One of the oldest issues in macroeconometric model building is whether primacy should be given to the theory or to the statistical evidence. In the 1930s, Tinbergen built the first macroeconometric models drawing informally on Keynes's General Theory and heavily on statistical methods. For the next 50 years, macroeconomic modelling and econometric methods developed side-by-side. These days, most macroeconometric models are based on dynamic stochastic general equilibrium (DSGE) theory and the emphasis has shifted away from the statistical properties of the models to their consistency with the theory. This is reflected in the near universal use of Bayesian estimation for these models in which strong prior distributions of the parameters based on the theory are often more influential than the data. The previous concern with checking the ability of the model to fit the data was downgraded. It is now rare to find anything more than a cursory assessment of the statistical properties of Bayesian estimated models.

The downgrading of statistical methods in macro model building in favour of economic theory was due in part to concerns about data mining. This can be expressed as "it is possible to torture the data as one wishes; the problem is to interpret the results". Observations like this led researchers to move away from judging models largely by their statistical fit. Recognising that economic theory was required, theoretical purity increasingly became the dominant criterion.

With the financial crisis this debate became quite toxic. The *Financial Times*, for example, ran article after article on the damage that had been caused by the use of DSGE models. Many eminent economists contributed (Gürkaynak and Tille 2017). The focus was mainly on the implausibility of DSGE models and not on their statistical properties, which were implicitly assumed to be poor. It is against this background that our paper (Pagan and Wickens 2019) was written. The issue is the roles of macroeconomic theory and of supporting statistical evidence in macroeconometric modelling.

Pesaran and Smith (2011) conclude their critique of the state of DSGE models around 2010 with the words, "[w]e have argued that macroeconometric modelling would benefit from a more flexible approach which does not require narrow adherence to one particular theoretical framework. In the process one would need to be more explicit about the trade-offs between consistency with theory, adequately representing the data and relevance for particular purposes", and "[w]e have argued that theory, while essential, should be regarded as a flexible framework rather than a straitjacket, because features that the theory abstracts from may be important in practice". Straitjackets come in different sizes. The objective is to keep them as small as possible but maintain the patient in a secure way. Because there are side-effects of being compressed in one, there is a need to ensure that they fit well. So it should be with DSGE models (and any other macroeconomic model being proposed for use). We ask whether that is being done and how it might be done better.

We begin with the question of whether there is a single criterion that might be used to check fit. This stems from the widespread adoption of Bayesian methods to estimate macroeconomic models and what seems to have become a single criterion, namely, the value of the marginal data density – essentially, the fit of the model under all parameterisations. Although there are many proposals in the Bayesian literature about model-checking and the need for it, they seem to have been little used in macroeconomics. Their emphasis is on the need for discrepancy measures to characterise the difference between data simulated from the estimated model and the observed data. These measures can be as simple as just checking the correspondence of moments, as was done in some of the earliest real business cycle studies. Several well-known DSGE models fail to produce moments that agree with those computed from the data. For example, in the model of Christiano, Motto and Rostagno (2014) (henceforth, the CMR model), where the main result is the importance of risk shocks in the macroeconomy, we found that their model without risk shocks gives a better fit. The same outcome of excess volatility is found for the ECB's latest version of their New Area Wide model by Coenen et al. (2018).
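A moment-based discrepancy check of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions: the AR(1) simulator stands in for an estimated model, the "observed" series is artificial, and the names `simulate_ar1` and `moment_check` are hypothetical rather than taken from any of the cited papers.

```python
import numpy as np

def simulate_ar1(rho, sigma, n_obs, rng):
    """Simulate a zero-mean AR(1); a stand-in for an estimated model's simulator."""
    y = np.zeros(n_obs)
    shocks = rng.normal(0.0, sigma, n_obs)
    for t in range(1, n_obs):
        y[t] = rho * y[t - 1] + shocks[t]
    return y

def moment_check(observed, simulator, n_sims=500):
    """Compare the observed standard deviation with its simulated distribution.

    Returns the observed moment, the mean simulated moment, and the share of
    simulated samples whose standard deviation is at least as large as the
    observed one.
    """
    obs_sd = observed.std(ddof=1)
    sim_sd = np.array([simulator().std(ddof=1) for _ in range(n_sims)])
    return obs_sd, sim_sd.mean(), float(np.mean(sim_sd >= obs_sd))

rng = np.random.default_rng(0)
# Artificial "data" (assumption: generated here only for illustration).
observed = simulate_ar1(0.5, 1.0, 200, rng)
# Hypothetical estimated model that is more persistent, hence more volatile.
obs_sd, sim_sd_mean, share = moment_check(
    observed, lambda: simulate_ar1(0.9, 1.0, 200, rng)
)
print(f"observed sd {obs_sd:.2f}, mean simulated sd {sim_sd_mean:.2f}, share {share:.2f}")
```

A share close to one (or to zero) indicates that the observed moment lies in the tail of what the model generates, which is the excess-volatility pattern in this artificial case.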

Pesaran and Smith highlighted the need to have flexible dynamics, and so it is natural to ask if the dynamics present in the data are being reproduced by the model. As the solution to a DSGE model is a vector autoregression (VAR) or a vector autoregressive moving average (VARMA), a natural and simple way of doing this is to compare a VAR fitted to the simulated data with one fitted to the observed data. Although this was a focus of attention in early DSGE modelling, its use seems to have lapsed. One problem may have been that researchers were fitting high-order VARs to the data, which involved a large number of imprecisely estimated parameters, and so the resulting test might well have low power. The indirect estimation literature, which tests whether there is a significant difference between a VAR using data simulated from the estimated DSGE model and a VAR based on the actual data, finds that low-order VARs can produce tests with good power. Using such a test, prominent New Keynesian DSGE models such as those of Smets and Wouters (2007) and, as we found, Ireland (2004) were rejected. The latter we found to have a zero probability of being correct. The CMR model performs better on this dynamic criterion.
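The logic of comparing a low-order VAR on simulated and observed data can be illustrated with a simulation-based Wald statistic. The sketch below is not the procedure of any cited paper: the "structural model" is simply a bivariate VAR(1) with a hypothetical coefficient matrix, the data are artificial, and the function names are assumptions made for the example.

```python
import numpy as np

def fit_var1(y):
    """OLS VAR(1) on demeaned data; returns the stacked coefficients.

    lstsq solves y[t] ~ y[t-1] @ B, so B is the transpose of the usual
    coefficient matrix; only consistent treatment across samples matters here.
    """
    y = y - y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(y[:-1], y[1:], rcond=None)
    return coef.ravel()

def simulate_var1(a, n_obs, rng):
    """Simulate y[t] = a @ y[t-1] + e[t] with standard normal shocks."""
    y = np.zeros((n_obs, a.shape[0]))
    e = rng.standard_normal((n_obs, a.shape[0]))
    for t in range(1, n_obs):
        y[t] = a @ y[t - 1] + e[t]
    return y

def wald_pvalue(observed, a_model, n_sims=500, seed=0):
    """Share of simulated Wald statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    n_obs = observed.shape[0]
    sims = np.array(
        [fit_var1(simulate_var1(a_model, n_obs, rng)) for _ in range(n_sims)]
    )
    mean = sims.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sims, rowvar=False))

    def wald(theta):
        d = theta - mean
        return float(d @ cov_inv @ d)

    w_obs = wald(fit_var1(observed))
    w_sim = np.array([wald(s) for s in sims])
    return float(np.mean(w_sim >= w_obs))

# Artificial data from a "true" process (assumption for illustration only).
a_true = np.array([[0.5, 0.1], [0.0, 0.5]])
observed = simulate_var1(a_true, 200, np.random.default_rng(1))

p_right = wald_pvalue(observed, a_true)              # model matches the data
p_wrong = wald_pvalue(observed, 0.9 * np.eye(2))     # model too persistent
print(p_right, p_wrong)
```

Because the auxiliary VAR has only four coefficients, the simulated distribution of the estimates is tight, which is one way to see why a low-order VAR can give a test with good power.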

An alternative measure based on using a VAR is a model's ability to replicate the data-based error covariance matrix. We found that the CMR model fails this test. A recent suggestion, called agnostic shocks, is to relax some of the restrictions imposed by the model and compare the resulting covariance matrix with that obtained from the original model. We had some difficulty implementing this check and concluded that an indirect inference test is much easier to perform.

DSGE models are usually heavily over-identified, with far more moment conditions than parameters. Many of these moment conditions imply zero correlations among the shocks and no serial correlation (the errors being assumed to be innovations). Checking these conditions provides another check on a model. For the CMR model, we found that some of the ‘innovations’ were correlated and were far from being white noise. For Ireland's model, we found that the mark-up and technology innovations were highly correlated and so unidentified.
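Both conditions, zero cross-correlation and no serial correlation, are straightforward to examine once the estimated shocks are available. The following sketch assumes the shocks are stored as columns of an array; the two series are artificial, one of them deliberately built to violate both conditions, and `ljung_box` and `shock_checks` are hypothetical helper names.

```python
import numpy as np

def ljung_box(x, n_lags=4):
    """Ljung-Box Q statistic for serial correlation up to n_lags."""
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x ** 2)
    q = 0.0
    for k in range(1, n_lags + 1):
        r_k = np.sum(x[k:] * x[:-k]) / denom  # lag-k sample autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

def shock_checks(shocks, n_lags=4):
    """Return the contemporaneous correlation matrix and one Q statistic per shock."""
    corr = np.corrcoef(shocks, rowvar=False)
    q_stats = [ljung_box(shocks[:, j], n_lags) for j in range(shocks.shape[1])]
    return corr, q_stats

# Artificial "estimated shocks" (assumption): `a` is white noise, while `b`
# is both correlated with `a` and serially correlated.
rng = np.random.default_rng(0)
n = 300
a = rng.standard_normal(n)
noise = 0.3 * rng.standard_normal(n)
b = np.zeros(n)
for t in range(1, n):
    b[t] = 0.7 * b[t - 1] + 0.8 * a[t] + noise[t]

corr, q_stats = shock_checks(np.column_stack([a, b]))
CRIT_5PCT_4DF = 9.488  # 5% critical value of a chi-square with 4 degrees of freedom
print(f"corr(a, b) = {corr[0, 1]:.2f}")
print([f"Q = {q:.1f}, reject white noise: {q > CRIT_5PCT_4DF}" for q in q_stats])
```

Applied to residuals from an estimated model, the chi-square reference distribution is only approximate, so the critical value here should be read as a rough guide.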

One defence of the empirical shortcomings of estimated DSGE models sometimes made is that of ‘theory ahead of data’, which maintains that we shouldn't be surprised if DSGE models fail to fit, and any such gap is of no interest. The way that the theory is imposed is via strong priors in the Bayesian estimation. The danger is that strong priors will dominate the data and produce estimates that make the fit of the estimated model poor.

Another defence is that DSGE models are subject to ‘measurement error’, implying that model residuals are not pure shocks. While this affects the interpretation of the residuals, it does not affect indirect inference. A third defence is to allow the parameters to be time-varying. The problem now is that impulse response functions will also involve movements in the expected values of the parameters and will not just reflect shocks.

Models based more loosely on macroeconomic theory than DSGE models also present problems, particularly in their treatment of non-stationarity. DSGE models typically deal with non-stationarity by transforming the data to stationarity by filtering. Models such as structural VARs leave the problem of how to deal with non-stationarity unresolved. Should the data be expressed as growth rates or be de-trended and, if this is done, what are the implications for the equation disturbances and the identification of shocks?

The important issue of how best to construct macroeconometric models has no easy solution. Our view is similar to that of Pesaran and Smith: the challenge is to incorporate greater flexibility into the theoretical model in order to match the data without compromising the theory. In this way the theory, which is essential, becomes less of a straitjacket and the models better explain the data.

## References

Christiano, L J, R Motto and M Rostagno (2014), "Risk Shocks", *American Economic Review* 104: 27-65.

Coenen, G, P Karadi, S Schmidt and A Warne (2018), "The New Area-Wide Model II: an extended version of the ECB's micro-founded model for forecasting and policy analysis with a financial sector", ECB Working Paper No 2200.

Gürkaynak, R and C Tille (2017), *DSGE Models in the Conduct of Policy: Use As Intended*, CEPR Press.

Ireland, P N (2004), "Technology shocks in the New Keynesian Model", *Review of Economics and Statistics* 86(4): 923-936.

Pagan, A and M R Wickens (2019), "Checking if the Straitjacket Fits", CEPR Discussion Paper 14140 (forthcoming in *Advances in Econometrics*).

Pesaran, M H and R P Smith (2011), "Beyond the DSGE Straitjacket", *Manchester School* 79: 5-16.

Smets, F and R Wouters (2007), "Shocks and Frictions in US Business Cycles: a Bayesian DSGE Approach", *American Economic Review* 97: 586-606.