Long-term interest rates, such as the yield on a ten-year Treasury bond, reflect expectations of future short-term interest rates, such as the course of the federal funds rate over the next ten years. The famous expectations hypothesis claims that these expectations are the only driver of long-term rates, but this is clearly false. Research going back to Fama and Bliss (1987) and Campbell and Shiller (1991) has shown that an additional component, the so-called term premium, also causes movements in long-term rates. Investors in long-term bonds bear interest rate risk, and the risk premium that compensates for it varies over time. Estimating this risk premium is of keen interest to investors, central bankers, forecasters, and academics, but unfortunately it is very difficult to do. For starters, it is not even clear what variables should be included to construct such estimates.
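In symbols, one common way to write this decomposition is the following. The notation is introduced here for illustration and is not taken from the studies cited:

$$
y_t^{(n)} = \frac{1}{n}\sum_{j=0}^{n-1} E_t\left[\, i_{t+j} \,\right] + TP_t^{(n)}
$$

Here $y_t^{(n)}$ is the yield at time $t$ on a bond with $n$ periods to maturity, $i_{t+j}$ is the one-period short rate, and $TP_t^{(n)}$ is the term premium. The expectations hypothesis amounts to the restriction that $TP_t^{(n)}$ does not vary over time.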

This issue boils down to the question of which variables help to predict the returns on long-term bonds in excess of a short-term (and risk-free) rate, and this question has generated much controversy over the years. The classic studies cited above documented that the slope of the yield curve – that is, the difference between long-term and short-term yields – predicts excess bond returns. But what about other variables? Interest rates are determined by many macroeconomic variables, including inflation, output, and actions by the central bank. That does not mean those variables are necessarily useful for forecasting bond returns. In fact, a simple argument implies that they should not be. The information set of bond investors contains many variables that are relevant for forecasting future interest rates or that determine risk premia, and the yield on any bond is a function of these variables. But yields on bonds with different maturities are necessarily different functions of these variables, and we can observe many different yields along the yield curve. In principle – under certain additional assumptions which are satisfied by essentially all theoretical finance models – one should therefore be able to back out the original variables from the cross section of yields (e.g. Duffee 2013). In other words, all the information that is useful for forecasting interest rates and estimating risk premia would be fully captured, or ‘spanned’, by the current yield curve itself.

In addition, it has been recognised since Litterman and Scheinkman (1991) that the first three principal components of yields provide an excellent summary of the entire yield curve. Though the principal components are calculated mechanically, they have a simple intuitive interpretation: they correspond to the level, slope, and curvature of the yield curve. Combining the theoretical argument above with this long-standing result on the factor structure of yields gives rise to a natural hypothesis, which we refer to as the ‘spanning hypothesis’: the level, slope, and curvature of the yield curve might be all we need to know if our goal is to forecast returns or estimate bond risk premia. This spanning hypothesis has a lot of appeal beyond its theoretical justification: it would vastly simplify the difficult task of estimating risk premia in long-term bonds, since beyond these three summary statistics of the current yield curve no other variables would be needed.
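As a rough illustration of how level, slope, and curvature factors can be extracted, the sketch below computes the first three principal components of a yield panel with NumPy. The function name `yield_pcs`, the maturities, and the synthetic data are all assumptions made for this example; they are not the data or code used in the studies cited.

```python
import numpy as np

def yield_pcs(yields, k=3):
    """First k principal components of a (T x N) panel of yields.

    Returns the (T x k) component series, the (N x k) loadings, and the
    share of total yield variance explained by each component.
    """
    demeaned = yields - yields.mean(axis=0)
    cov = np.cov(demeaned, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]            # re-sort, largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec[:, :k]
    pcs = demeaned @ loadings
    share = eigval[:k] / eigval.sum()
    return pcs, loadings, share

# Synthetic yields (hypothetical): three persistent latent factors with
# stylised level, slope, and curvature loadings, plus small noise.
rng = np.random.default_rng(0)
T = 400
maturities = np.array([1, 2, 3, 5, 7, 10], dtype=float)
m = (maturities - maturities.mean()) / maturities.max()
true_loadings = np.column_stack([np.ones_like(m), m, m**2 - np.mean(m**2)])

factors = np.zeros((T, 3))
shocks = rng.normal(size=(T, 3)) * np.array([0.30, 0.20, 0.10])
for t in range(1, T):
    factors[t] = 0.98 * factors[t - 1] + shocks[t]

yields = 4.0 + factors @ true_loadings.T + rng.normal(scale=0.02, size=(T, 6))

pcs, loadings, share = yield_pcs(yields)
print("variance share of first three PCs:", share.round(4))
```

The sign of each component is arbitrary, so in practice the loadings are usually inspected (and flipped if needed) before a component is labelled as level, slope, or curvature.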

In recent years, a consensus has emerged that the spanning hypothesis does not fit the data, due to a number of prominent studies that found variables in addition to the level, slope, and curvature that appear to be useful for predicting excess bond returns. These variables include measures of economic growth and inflation (Joslin et al. 2014), factors inferred from a large set of macro variables (Ludvigson and Ng 2009, 2010), long-term trends in inflation or inflation expectations (Cieslak and Povala 2015), the output gap (Cooper and Priestley 2008), measures of Treasury bond supply (Greenwood and Vayanos 2014), and the fourth and fifth principal components of yields (Cochrane and Piazzesi 2005).

But there are some knotty statistical problems underlying the predictive regressions used in these studies. Recognising and accounting for these problems casts significant doubt on these findings, as we document in a new paper (Bauer and Hamilton, forthcoming). The evidence against the spanning hypothesis is in fact much less convincing than may have originally appeared.
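For concreteness, predictive regressions of this kind can be written in the following generic form. The symbols are introduced here for illustration:

$$
rx_{t+h}^{(n)} = \alpha + \beta_1' x_{1t} + \beta_2' x_{2t} + u_{t+h},
\qquad H_0:\ \beta_2 = 0
$$

Here $rx_{t+h}^{(n)}$ is the excess return on an $n$-period bond held from $t$ to $t+h$, $x_{1t}$ collects the level, slope, and curvature of the yield curve, and $x_{2t}$ collects the proposed additional predictors. The spanning hypothesis corresponds to the null $\beta_2 = 0$, so the evidence against it rests on how reliably the estimate of $\beta_2$ and its standard error are measured.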

One previously unrecognised problem with evidence of bond return predictability is ‘standard error bias’: the coefficients relating bond returns to the proposed predictive variables are measured much less accurately than conventional statistical formulas would suggest. This means that high t-statistics and regression R² values might indicate strong predictive power for variables that are in fact useless predictors. Standard error bias occurs in a regression with the following characteristics:

- The regression includes explanatory variables that are correlated with lagged values of the dependent variable.

This is true in the cases cited above because the level, slope, and curvature of the term structure are by construction very strongly correlated with lagged returns.

- Both the true regressors and the proposed regressors are highly persistent.

This is also very much the case for most predictors that are considered in practice. As a result, standard error bias is prevalent in predictive regressions for bond returns. At the same time, there are additional statistical problems, such as the use of overlapping returns (e.g. annual returns in monthly data). The consequence is that conventional statistical tests are highly misleading.
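The two conditions above can be illustrated with a small Monte Carlo experiment. The sketch below is a stylised setup assumed for this example, not a replication of any published result: the dependent variable is pure noise, `x1` is a persistent regressor whose shocks are correlated (through `delta`) with lagged values of the dependent variable, and `x2` is an independent, equally persistent regressor with no predictive power. The function counts how often a conventional t-test on `x2` rejects at the nominal 5% level. All names and parameter values are hypothetical.

```python
import numpy as np

def ar1_paths(shocks, rho):
    """Turn an (n_sim x T) array of shocks into AR(1) paths."""
    x = np.zeros_like(shocks)
    x[:, 0] = shocks[:, 0]
    for t in range(1, shocks.shape[1]):
        x[:, t] = rho * x[:, t - 1] + shocks[:, t]
    return x

def t_stat_last(y, X):
    """Conventional OLS t-statistic on the last column of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])

def rejection_rate(delta, rho=0.99, T=100, n_sim=2000, seed=0):
    """Share of simulations in which |t| on the useless regressor x2 exceeds 1.96.

    delta is the correlation between the shock to x1 and the regression error,
    so x1 at time t is correlated with the dependent variable up to time t.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_sim, T + 1))
    e1 = delta * u + np.sqrt(1.0 - delta**2) * rng.normal(size=(n_sim, T + 1))
    e2 = rng.normal(size=(n_sim, T + 1))
    x1, x2 = ar1_paths(e1, rho), ar1_paths(e2, rho)

    rejections = 0
    for i in range(n_sim):
        y = u[i, 1:]  # y_{t+1} = u_{t+1}: nothing truly predicts y
        X = np.column_stack([np.ones(T), x1[i, :-1], x2[i, :-1]])
        rejections += int(abs(t_stat_last(y, X)) > 1.96)
    return rejections / n_sim

rate_exog = rejection_rate(delta=0.0)  # x1 unrelated to past errors
rate_corr = rejection_rate(delta=1.0)  # x1 driven entirely by past errors
print("rejection rate, delta=0:", rate_exog)
print("rejection rate, delta=1:", rate_corr)
```

With `delta=0.0` the regressors are strictly exogenous and the rejection rate should sit close to the nominal 5%. With `delta=1.0` both conditions hold, and the rate should come out noticeably higher even though `x2` is useless by construction.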

We propose a simple bootstrap procedure that can be used both to assess how important the statistical problems are in any given setting and to obtain more reliable results. The idea is to generate artificial data (for yields and the additional variables) in which the spanning hypothesis is true by construction but which otherwise have similar statistical properties, such as high persistence, to the actual data. First, we estimate a vector autoregression – a simple dynamic model – for the first three principal components (PCs) of yields, and generate a bootstrap sample of the principal components. From the way that any given bond yield loads on the principal components in the historical sample (a known linear function of the principal components plus a small random residual), we can generate a full set of bootstrap yields. Second, we separately estimate a vector autoregression for the proposed alternative variables (such as inflation and output) and generate a bootstrap sample for them. Because the two bootstrap samples are generated independently of each other, the alternative variables in the artificial data contain no information about future yields beyond what is already in the principal components.
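The two steps described above might be sketched as follows. This is a simplified illustration under several assumptions that are ours: VAR(1) models estimated by plain OLS with no bias correction, residuals resampled with replacement, the first observation used as the starting value, and normally distributed yield fitting errors. The function names and the synthetic inputs are hypothetical.

```python
import numpy as np

def fit_var1(X):
    """OLS estimate of X_t = c + A X_{t-1} + e_t for a (T x k) array X."""
    Z = np.column_stack([np.ones(len(X) - 1), X[:-1]])
    coef, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    resid = X[1:] - Z @ coef
    return coef[0], coef[1:].T, resid

def simulate_var1(c, A, resid, x0, T, rng):
    """Simulate T observations, resampling residual rows with replacement."""
    draws = resid[rng.integers(0, len(resid), size=T - 1)]
    out = np.empty((T, len(x0)))
    out[0] = x0
    for t in range(1, T):
        out[t] = c + A @ out[t - 1] + draws[t - 1]
    return out

def bootstrap_sample(yields, macro, rng, k=3):
    """One artificial sample in which the macro variables are spanned by construction."""
    T = len(yields)
    mean = yields.mean(axis=0)
    demeaned = yields - mean

    # Loadings of yields on their first k principal components.
    eigval, eigvec = np.linalg.eigh(np.cov(demeaned, rowvar=False))
    W = eigvec[:, np.argsort(eigval)[::-1][:k]]
    pcs = demeaned @ W
    fit_err_sd = (demeaned - pcs @ W.T).std(axis=0)

    # Step 1: VAR for the PCs, then rebuild yields from the bootstrapped PCs.
    c, A, e = fit_var1(pcs)
    pcs_star = simulate_var1(c, A, e, pcs[0], T, rng)
    noise = rng.normal(scale=fit_err_sd, size=yields.shape)
    yields_star = mean + pcs_star @ W.T + noise

    # Step 2: a separate VAR for the macro variables, simulated independently.
    cm, Am, em = fit_var1(macro)
    macro_star = simulate_var1(cm, Am, em, macro[0], T, rng)
    return yields_star, macro_star

# Hypothetical inputs, only to show the call.
rng = np.random.default_rng(1)
T, N = 300, 6
factors, macro = np.zeros((T, 3)), np.zeros((T, 2))
for t in range(1, T):
    factors[t] = 0.95 * factors[t - 1] + rng.normal(scale=0.2, size=3)
    macro[t] = 0.90 * macro[t - 1] + rng.normal(scale=0.5, size=2)
yields = 4.0 + factors @ rng.normal(size=(3, N)) + rng.normal(scale=0.05, size=(T, N))

yields_star, macro_star = bootstrap_sample(yields, macro, rng)
print(yields_star.shape, macro_star.shape)
```

The key design choice is that the residuals for the yield factors and for the macro variables are drawn independently, which is what makes the spanning hypothesis hold in the artificial data.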

In this artificial data, we can then calculate the same regressions and statistical tests that are used by researchers to test the spanning hypothesis, knowing that it is in fact true. Repeating this process many times, we can assess how frequently one would incorrectly reject this null hypothesis. We find that tests that were originally intended to have a size of 5% (i.e. incorrectly reject a true null hypothesis 5% of the time) have a true size that is often around 30-40%, and in some cases up to 60%, depending on the test and dataset. In other words, it is quite probable that a researcher would find what they think is persuasive evidence against the spanning hypothesis even if this hypothesis is in fact true.

To get more reliable statistical results, the test statistics should not be compared against the usual critical values (such as 2 for a t-test), but against the bootstrap distribution of the test statistic, which leads to much more conservative critical values (often closer to 4). With this more accurate test procedure, we find that in most cases the spanning hypothesis cannot be rejected.
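Once the test statistic has been recomputed on many artificial samples, the comparison itself is short. The sketch below assumes the bootstrap t-statistics are already collected in an array. The draws used here are placeholders with inflated dispersion, invented purely to show the mechanics, and the observed statistic of 2.5 is likewise hypothetical.

```python
import numpy as np

def bootstrap_test(t_obs, t_boot, level=0.05):
    """Two-sided test of an observed t-statistic against bootstrap draws.

    Returns the bootstrap critical value for |t|, the bootstrap p-value,
    and whether the null is rejected at the given level.
    """
    abs_boot = np.abs(np.asarray(t_boot, dtype=float))
    crit = float(np.quantile(abs_boot, 1.0 - level))
    p_value = float(np.mean(abs_boot >= abs(t_obs)))
    return crit, p_value, bool(abs(t_obs) > crit)

# Placeholder draws: twice as dispersed as a standard normal (not real results).
rng = np.random.default_rng(2)
t_boot = 2.0 * rng.normal(size=5000)

t_obs = 2.5
reject_conventional = abs(t_obs) > 1.96
crit, p_value, reject_bootstrap = bootstrap_test(t_obs, t_boot)
print("bootstrap critical value:", round(crit, 2))
print("conventional test rejects:", reject_conventional)
print("bootstrap test rejects:", reject_bootstrap)
```

In this made-up example a statistic that looks significant against the usual critical value is not significant against the bootstrap distribution, which is the pattern described in the text.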

Another way to investigate the spanning hypothesis is to use out-of-sample forecasting. While this approach can usually be criticised because of arbitrary selection of in-sample and out-of-sample periods (possibly to aid the researcher’s case), for the evidence cited above we have true out-of-sample data available, namely the data that have accumulated since publication of the original study. We can look at which specification did a better job of predicting data that were not available to the original researcher: a model that uses only level, slope, and curvature, or a model that adds other proposed measures. We find that the proposed additional predictors are rarely helpful in the new data.
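One simple way to set up such a comparison is sketched below: both specifications are estimated on the sample available at publication and then evaluated on later observations by root mean squared forecast error. This is only one of several possible designs, and the function `oos_rmse`, the split point, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def oos_rmse(y, X, split):
    """Fit OLS on observations before `split`; return forecast RMSE on the rest.

    Assumes y[i] is the excess return realised after the predictors X[i]
    were observed, so no future information enters the forecasts.
    """
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z[:split], y[:split], rcond=None)
    errors = y[split:] - Z[split:] @ beta
    return float(np.sqrt(np.mean(errors**2)))

# Hypothetical data in which returns depend on the yield factors only.
rng = np.random.default_rng(3)
T, split = 480, 360              # `split` plays the role of the publication date
pcs = rng.normal(size=(T, 3))    # stand-ins for level, slope, and curvature
macro = rng.normal(size=(T, 2))  # stand-ins for proposed extra predictors
rx = pcs @ np.array([0.1, 0.5, -0.2]) + rng.normal(size=T)

rmse_yield_only = oos_rmse(rx, pcs, split)
rmse_with_macro = oos_rmse(rx, np.column_stack([pcs, macro]), split)
print("RMSE, yield factors only:", round(rmse_yield_only, 4))
print("RMSE, with extra predictors:", round(rmse_with_macro, 4))
```

With overlapping multi-period returns, the estimation window would also need to stop early enough that no return in it extends past the split date.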

Our conclusion is that the tests which earlier researchers employed are highly unreliable, and that the evidence against the spanning hypothesis is in fact substantially less convincing than would appear from the published studies. Few, if any, macroeconomic variables are robust predictors of bond returns. By contrast, the information in the yield curve itself is tremendously important for estimating bond risk premia and the term premium component in long-term interest rates.

*Editor’s note: The views expressed here are those of the authors and do not necessarily reflect those of the Federal Reserve System.*

## References

Bauer, M D and J D Hamilton (forthcoming), “Robust risk premia”, *Review of Financial Studies*.

Campbell, J Y and R J Shiller (1991), “Yield spreads and interest rate movements: A bird’s eye view”, *Review of Economic Studies*, 58: 495-514.

Cieslak, A and P Povala (2015), “Expected returns in Treasury bonds”, *Review of Financial Studies*, 28: 2859-2901.

Cochrane, J H and M Piazzesi (2005), “Bond risk premia”, *American Economic Review*, 95: 138-160.

Cooper, I and R Priestley (2008), “Time-varying risk premiums and the output gap”, *Review of Financial Studies*, 22: 2801-2833.

Duffee, G R (2013), “Forecasting interest rates”, in G Elliott and A Timmermann (eds), *Handbook of Economic Forecasting*, Vol. 2, Part A, Elsevier: 385-426.

Fama, E F and R R Bliss (1987), “The information in long-maturity forward rates”, *American Economic Review*, 77: 680-692.

Greenwood, R and D Vayanos (2014), “Bond supply and excess bond returns”, *Review of Financial Studies*, 27: 663-713.

Joslin, S, M Priebsch and K J Singleton (2014), “Risk premiums in dynamic term structure models with unspanned macro risks”, *Journal of Finance*, 69: 1197-1233.

Litterman, R and J Scheinkman (1991), “Common factors affecting bond returns”, *Journal of Fixed Income*, 1: 54-61.

Ludvigson, S C and S Ng (2009), “Macro factors in bond risk premia”, *Review of Financial Studies*, 22: 5027-5067.

Ludvigson, S C and S Ng (2010), “A factor analysis of bond risk premia”, *Handbook of Empirical Economics and Finance*: 313.