feds · November 30, 2015

Does Realized Volatility Help Bond Yield Density Prediction?

Abstract

We suggest using "realized volatility" as a volatility proxy to aid in model-based multivariate bond yield density forecasting. To do so, we develop a general estimation approach to incorporate volatility proxy information into dynamic factor models with stochastic volatility. The resulting model parameter estimates are highly efficient, which one hopes would translate into superior predictive performance. We explore this conjecture in the context of density prediction of U.S. bond yields by incorporating realized volatility into a dynamic Nelson-Siegel (DNS) model with stochastic volatility. The results clearly indicate that using realized volatility improves density forecasts relative to popular specifications in the DNS literature that neglect realized volatility.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Minchul Shin and Molin Zhong

2015-115

Please cite this paper as: Shin, Minchul and Molin Zhong (2015). “Does Realized Volatility Help Bond Yield Density Prediction?,” Finance and Economics Discussion Series 2015-115. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2015.115.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Minchul Shin, University of Illinois
Molin Zhong∗, Federal Reserve Board

This version: September 19, 2015

Key words: Dynamic factor model, forecasting, stochastic volatility, term structure of interest rates, dynamic Nelson-Siegel model

JEL codes: C5, G1, E4

∗ Correspondence: Minchul Shin: 214 David Kinley Hall, 1407 W. Gregory, Urbana, Illinois 61801. E-mail: mincshin@illinois.edu. Molin Zhong: 20th Street and Constitution Avenue N.W., Washington, D.C. 20551. E-mail: Molin.Zhong@frb.gov. We are grateful for the advice of Frank Diebold, Jesus Fernandez-Villaverde, and Frank Schorfheide. We also thank Manabu Asai, Luigi Bocola, Todd Clark, Xu Cheng, Frank DiTraglia, Nikolaus Hautsch, Kyu Ho Kang, Michael McCracken, Andrew Patton, Neil Shephard, Dongho Song, Allan Timmermann, Jonathan Wright, and seminar participants at the University of Pennsylvania, International Symposium on Forecasting 2013, and OMI-SoFiE Financial Econometrics Summer School 2013 for their comments. The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any other person associated with the Federal Reserve System.

1 Introduction

Time-varying volatility exists in U.S. government bond yields. In this paper, we introduce volatility proxy data in the hope of better capturing this time-varying volatility for predictive purposes. To do so, we develop a general estimation approach to incorporate volatility proxy information into dynamic factor models with stochastic volatility. We apply it to the dynamic Nelson-Siegel (DNS) model of bond yields. We find that the higher-frequency movements of the yields in the realized volatility data contain valuable information for the stochastic volatility and lead to significantly better density predictions, especially in the short term.

Our approach can be applied to the existing classes of dynamic factor models with stochastic volatility. Specifically, we can account for stochastic volatility on the latent factors or stochastic volatility on the measurement errors. We derive a measurement equation to link realized volatility to the model-implied conditional volatility of the original observables. Incorporating realized volatility improves estimation of the stochastic volatility by injecting precise volatility information into the model.

The DNS model is a dynamic factor model that uses latent level, slope, and curvature factors to drive the intertemporal movements of the yield curve. This reduces the high-dimensional yield data to just three driving factors. The level of the yield curve has traditionally been linked to inflation expectations, and the slope to the real economy. Our preferred specification introduces stochastic volatility on these latent factors. This leads to a natural interpretation of the stochastic volatility as capturing the uncertainty surrounding well-understood aspects of the yield curve. It also reduces the dimension of modeling the time-varying volatility of the yield curve.
We then compare this specification to several others in the DNS framework, including random walk dynamics for the factors and stochastic volatilities, as well as stochastic volatility on the yield measurement equation. In a forecasting horserace on U.S. bond yields, our preferred specification features slight improvements in the point forecast performance and significant gains in the density forecast performance. We also find that allowing for time-varying volatility is important for density prediction, especially in the short run. Unlike for the conditional mean dynamics, modeling volatility as first-order autoregressive processes rather than random walks leads to better predictive performance. Furthermore, having stochastic volatility on the factor equation better captures the time-varying volatility in the bond yield data when compared to stochastic volatility on the measurement equation.

Our paper relates to the literature in three main areas. First, our paper relates to work started by Barndorff-Nielsen and Shephard (2002) in incorporating realized volatility in models with time-varying volatility. Takahashi et al. (2009) use daily stock return data in combination with high-frequency realized volatility to more accurately estimate the stochastic volatility. Maheu and McCurdy (2011) show that adding realized volatility directly into a model of stock returns can improve density forecasts over a model that only uses level data, such as the EGARCH. Jin and Maheu (2013) propose a model of stock returns and realized covariance based on time-varying Wishart distributions and find that their model provides superior density forecasts for returns. There also exists work adding realized volatility in observation-driven volatility models (Shephard and Sheppard, 2010; Hansen et al., 2012). As opposed to the other papers, we consider a dynamic factor model with stochastic volatility on the factor equation and use the realized volatility to help in the extraction of this stochastic volatility. In this sense, we bring the factor structure in the conditional mean to the conditional volatility as well. Cieslak and Povala (2015) have a similar framework in a no-arbitrage term structure model.
Furthermore, ours is the first paper to investigate the implications of realized volatility for bond yield density predictability.

Second, we contribute to a large literature on bond yield forecasting. Most of the work has been done on point prediction (see, for example, Diebold and Rudebusch, 2012; Duffee, 2012, for excellent surveys). There has been, however, a growing interest in density forecasting. Egorov et al. (2006) were the first to evaluate the joint density prediction performance of yield curve models. They overturn the point forecasting result of the superiority of random walk forecasts and find that affine term structure models perform better when forecasting the entire density, especially on the conditional variance and kurtosis. However, they do not consider time-varying conditional volatility dynamics in the bond yield predictive distribution. Hautsch and Ou (2012) and Hautsch and Yang (2012) add stochastic volatility to the DNS model by considering an independent AR(1) specification for the log volatilities of the latent factors. They do not conduct a formal density prediction evaluation of the model, but give suggestive results of the possible improvements from allowing for time-varying volatility. Carriero et al. (2013) find that using priors from a Gaussian no-arbitrage model in the context of a VAR with stochastic volatility improves short-run density forecasting performance. Building on this previous work, we introduce potentially highly accurate volatility information into the model in the form of realized volatility and evaluate bond yield density predictions to see whether this extra information about the bond yield volatility can improve the quality of the predictive distribution.

Finally, we also add to a growing literature on including realized volatility information in bond yield models. Andersen and Benzoni (2010) and Christensen et al. (2014) view realized volatility as a benchmark against which to compare the fits of affine term structure models. Cieslak and Povala (2015) are interested in using realized covariance to better extract stochastic volatility and linking the stochastic volatility to macroeconomic and liquidity factors. These papers focus on in-sample investigations of incorporating realized volatility in bond yield models. Another stream of research exploits information in high-frequency movements of bond prices to achieve better point prediction performance.
For example, Wright and Zhou (2009) report that the realized jump mean measure constructed from Treasury bond futures improves excess bond return point prediction by 40%. Our paper, in contrast to these others, considers the improvement from using realized volatility in out-of-sample bond yield density prediction.

In section 2, we introduce our methodology for incorporating volatility proxies into dynamic factor models in the context of the DNS model and other competitor specifications. We discuss the data in section 3. We present our estimation and forecast evaluation methodology in section 4. In section 5, we present in-sample and out-of-sample results. We conclude in section 6.

2 Model

We introduce the dynamic Nelson-Siegel model with stochastic volatility (DNS-SV) proposed by Bianchi et al. (2009), Hautsch and Ou (2012), and Hautsch and Yang (2012). Then, we discuss the incorporation of realized volatility information into this framework. Finally, we consider alternatives to our main approach.

2.1 The Dynamic Nelson-Siegel model and time-varying bond yield volatility

Denote $y_t(\tau)$ as the continuously compounded yield to maturity on a zero coupon bond with maturity of $\tau$ periods at time $t$. Following Diebold and Li (2006), we consider the factor model for the yield curve,

$$y_t(\tau) = f_{l,t} + f_{s,t}\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau}\right) + f_{c,t}\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right) + \epsilon_t(\tau), \qquad \epsilon_t \sim N(0, Q) \tag{1}$$

where $f_{l,t}$, $f_{s,t}$ and $f_{c,t}$ serve as latent factors and $\epsilon_t$ is a vector that collects the idiosyncratic component $\epsilon_t(\tau)$ for all maturities. As is well documented in the literature, the first factor mimics the level of the yield curve, the second the slope, and the third the curvature. We assume that the $Q$ matrix is diagonal. This leads to the natural interpretation of a few common factors driving the comovements in a large number of yields. All of the other movements in the yields are considered idiosyncratic. We model the dynamic factors as a

multivariate vector autoregressive process, given by,

$$f_t = (I_3 - \Phi_f)\mu_f + \Phi_f f_{t-1} + \eta_t, \qquad \eta_t \sim N(0, H_t) \tag{2}$$

where $f_t = [f_{l,t}, f_{s,t}, f_{c,t}]'$ is a 3 × 1 vector, $I_3$ is a 3 × 3 identity matrix, $\Phi_f$ is a 3 × 3 matrix, $\mu_f$ is a 3 × 1 vector, and $\eta_t$ is a vector that collects the innovations to each factor, with a potentially time-varying diagonal variance-covariance matrix $H_t$. We also assume that idiosyncratic shocks $\epsilon_t$ and factor shocks $\eta_t$ are independent. Following Bianchi et al. (2009), Hautsch and Ou (2012), and Hautsch and Yang (2012), we model the logarithm of the variance of the shocks to the factor equation as AR(1) processes,

$$h_{i,t} = \mu_{h,i}(1-\phi_{h,i}) + \phi_{h,i} h_{i,t-1} + \nu_{i,t}, \qquad \nu_{i,t} \sim N(0, \sigma^2_{h,i}) \tag{3}$$

for $i = l, s, c$, where $\exp(h_{i,t})$ corresponds to the $i$th diagonal element of the variance-covariance matrix $H_t$¹. In addition, shocks to the stochastic volatilities of the factor innovations are assumed to be independent. We call this specification the DNS-SV model (dynamic Nelson-Siegel with stochastic volatility).

¹ This formulation implies that shocks to the factors, $\{\eta_{l,t}, \eta_{s,t}, \eta_{c,t}\}$, are independent of each other. We maintain this independence assumption following the original dynamic Nelson-Siegel model of Diebold and Li (2006). We can relax this assumption by decomposing the covariance matrix of $\eta_t$ as in Cogley and Sargent (2005) and Primiceri (2005) to obtain,

$$\mathrm{cov}(\eta_t) = C H_t C' = \begin{pmatrix} 1 & 0 & 0 \\ c_{ls} & 1 & 0 \\ c_{lc} & c_{sc} & 1 \end{pmatrix} \begin{pmatrix} \exp(h_{l,t}) & 0 & 0 \\ 0 & \exp(h_{s,t}) & 0 \\ 0 & 0 & \exp(h_{c,t}) \end{pmatrix} \begin{pmatrix} 1 & c_{ls} & c_{lc} \\ 0 & 1 & c_{sc} \\ 0 & 0 & 1 \end{pmatrix},$$

where $c_{ls}$, $c_{lc}$, and $c_{sc}$ are real numbers. Our main idea goes through with this formulation by redefining $\Lambda_f$ in equation 4 as $\tilde{\Lambda}_f = \Lambda_f C$.

2.2 DNS-RV

We claim that by using high-frequency data to construct realized volatilities of the yields, it is possible to aid in the extraction of the stochastic volatilities governing the level, slope, and curvature of the DNS-SV model. Using realized volatility to augment our algorithm should make estimation of the stochastic volatility parameters more accurate and produce a superior predictive distribution. Crucially, we need to find an appropriate linkage between our volatility proxy - realized volatility - and the stochastic volatility in the model. Given the definition of the model-implied conditional volatility, we propose²

$$RV_t \approx \mathrm{Var}_{t-1}(y_t) = \mathrm{diag}(\Lambda_f H_t \Lambda_f' + Q) \tag{4}$$

where $RV_t$ is the realized volatility of bond yields, which has the same dimension as the bond yield vector $y_t$, and $\Lambda_f$ is the factor loading matrix given by equation 1. Insofar as realized volatility provides an accurate approximation to the true underlying conditional time-varying volatility, equation 4 is the one that links this information to the model.

Upon adding measurement error, one can view equation 4 as a nonlinear measurement equation. In principle, we have several tools to handle this nonlinearity, including the particle filter. To keep estimation computationally feasible, when estimating $h_t$, we choose to take a first order Taylor approximation of the logarithm of this equation around a 3 × 1 vector $\mu_h = [\mu_{h,l}, \mu_{h,s}, \mu_{h,c}]'$ with respect to $h_t$. This leads to a set of linear measurement equations that links the realized volatility of the bond yields and the underlying factor volatility,

$$\log(RV_t) = \beta + \Lambda_h \tilde{h}_t + \zeta_t, \qquad \zeta_t \sim N(0, S), \tag{5}$$

where we write the logarithm of volatility in deviation form, $\tilde{h}_{i,t} = h_{i,t} - \mu_{h,i}$ for $i = l, s, c$. We assume that $RV_t$ follows a log-normal distribution conditional on past histories of bond yields and bond yield realized volatilities. This assumption leads to a normally distributed measurement error $\zeta_t$³. We call this new model the dynamic Nelson-Siegel with realized volatility (DNS-RV) model. The difference between this model and DNS-SV comes from augmenting equation 1 with a new measurement equation 5.
This equation has a constant $\beta$ and a factor loading $\Lambda_h$. The parameter $\beta$ comes from the linearization⁴, while we can interpret $\Lambda_h$ as a loading for the factor volatility used to reduce the dimension of the volatility data, $\log(RV_t)$. This very naturally extends the dynamic factor model, which transforms high-dimensional data ($y_t$) into a small number of factors ($f_t$) via the factor loading matrix $\Lambda_f$. The volatility factor loading ($\Lambda_h$) is a function of other model parameters ($\Lambda_f$, $Q$, $\mu_h$) with the functional form given by the linearized version of equation 4⁵. We can view this as a model-consistent restriction on the linkage between the conditional volatility of observed data, approximated by $\log(RV_t)$, and the factor volatility $h_t$. For the same reasons as in the baseline DNS-SV model, we set the $S$ matrix to be diagonal. These are interpreted as idiosyncratic errors, and we therefore do not model them to be contemporaneously or serially correlated.

² This strategy of linking an observed volatility measure to the model is also used in other papers (Maheu and McCurdy (2011) in a univariate model and Cieslak and Povala (2015) in a multivariate context).
³ A detailed derivation for equation 5 can be found in the Appendix.
⁴ We estimate $\beta$ as a separate parameter as in Takahashi et al. (2009). See the Appendix for more detail.
⁵ Detailed formulas for the volatility factor loading matrix can be found in the Appendix.

In summary, the DNS-RV model introduces a new measurement equation into the state space of the DNS-SV model.

(Measurement equation)

$$\begin{aligned} y_t &= \Lambda_f f_t + \epsilon_t, & \epsilon_t &\sim N(0, Q) \\ \log(RV_t) &= \beta + \Lambda_h \tilde{h}_t + \zeta_t, & \zeta_t &\sim N(0, S) \end{aligned} \tag{6}$$

(Transition equation)

$$\begin{aligned} f_t &= (I_3 - \Phi_f)\mu_f + \Phi_f f_{t-1} + \eta_t, & \eta_t &\sim N\big(0, H_t = \mathrm{diag}(e^{[h_{l,t},\, h_{s,t},\, h_{c,t}]})\big) \\ h_{i,t} &= \mu_{h,i}(1-\phi_{h,i}) + \phi_{h,i} h_{i,t-1} + \nu_{i,t}, & \nu_{i,t} &\sim N(0, \sigma^2_{h,i}) \end{aligned} \tag{7}$$

for $i = l, s, c$. The operator $\mathrm{diag}(\cdot)$ turns a vector into a diagonal matrix. In our application, both observed bond yields ($y_t$) and realized volatilities ($\log(RV_t)$) are 17 × 1 vectors. Moreover, both sets of variables have a factor structure with dynamics following the transition equations. In our application, we follow Diebold and Li (2006) and assume $\Phi_f$ and $H_t$ to be diagonal matrices⁶.

⁶ This specification implies that the movements of the factors are unrelated to each other. While this may seem a tight restriction at first blush, Diebold and Rudebusch (2012) point out that the assumption does not seem poor insofar as the factors are related to the principal components of the yield curve.

Table 1 Model Specifications

Label        Factors (level, slope, curvature)   Conditional variance                  Realized volatility
RW-C         Random walk                         Constant                              Not used
RW-SV        Random walk                         log AR(1) in each factor              Not used
RW-RV        Random walk                         log AR(1) in each factor              Used
DNS-C        Diagonal Φ_f                        Constant                              Not used
DNS-SV       Diagonal Φ_f                        log AR(1) in each factor              Not used
DNS-RV       Diagonal Φ_f                        log AR(1) in each factor              Used
RW-SV-RW     Random walk                         Random walk                           Not used
DNS-SV-RW    Diagonal Φ_f                        Random walk                           Not used
DNS-ME-SV    Diagonal Φ_f                        log AR(1) in measurement equation     Not used
DNS-ME-RV    Diagonal Φ_f                        log AR(1) in measurement equation     Used

Note: We list the specifications for the DNS model considered in this paper.

2.3 Alternative specifications

We have four classes of alternative specifications against which to compare the forecasts of our baseline model. We briefly introduce them in this section and list all specifications considered in the paper in Table 1.

2.3.1 Dynamic Nelson-Siegel (DNS-C)

The first model is the standard Diebold-Li DNS model discussed at the beginning of the paper. It does not allow for stochastic volatility. This model has been shown to forecast the level of bond yields quite well, at times beating the random walk model of yields.
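To make the link between equations 1, 4, and 5 concrete, the following is a minimal NumPy sketch, not the estimation code used in the paper. It builds the Nelson-Siegel loading matrix, evaluates the model-implied variance diag(Λ_f H_t Λ_f' + Q) for diagonal H_t, and computes what a first-order expansion of its logarithm around μ_h yields. The function names, the decay value λ = 0.0609 (the value fixed by Diebold and Li (2006) for maturities in months), and the numbers chosen for μ_h and Q are illustrative assumptions; the exact formulas used by the authors are in their Appendix, and they estimate β as a free parameter rather than fixing it at the expansion constant.

```python
import numpy as np


def ns_loadings(maturities, lam):
    """Nelson-Siegel loading matrix Lambda_f (N x 3) implied by equation 1."""
    tau = np.asarray(maturities, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])


def implied_variance(Lam, h, q_diag):
    """diag(Lambda_f H Lambda_f' + Q) with H = diag(exp(h)), as in equation 4.

    With diagonal H the j-th element is sum_i Lam[j, i]**2 * exp(h[i]) + q_diag[j].
    """
    return (Lam ** 2) @ np.exp(h) + q_diag


def loglinear_link(Lam, mu_h, q_diag):
    """First-order Taylor expansion of log(implied_variance) in h around mu_h.

    Returns (const, Lam_h) with log Var ~= const + Lam_h @ (h - mu_h).
    `const` is only the expansion constant (hypothetical helper output);
    the paper instead estimates beta as a separate parameter.
    """
    var0 = implied_variance(Lam, mu_h, q_diag)
    Lam_h = (Lam ** 2) * np.exp(mu_h) / var0[:, None]
    return np.log(var0), Lam_h


# Illustrative inputs only: the 17 maturities (in months) from the data section,
# with made-up values for mu_h and the diagonal of Q.
maturities = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
Lam_f = ns_loadings(maturities, lam=0.0609)
mu_h = np.array([-2.0, -1.0, 0.0])
q_diag = np.full(len(maturities), 0.01)
const, Lam_h = loglinear_link(Lam_f, mu_h, q_diag)
print(Lam_h.shape)
```

Under these assumptions each row of `Lam_h` contains the shares of a yield's conditional variance attributable to the level, slope, and curvature volatilities at μ_h, so the row sums stay below one whenever the corresponding element of Q is positive.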

2.3.2 Dynamic Nelson-Siegel-Stochastic Volatility (DNS-SV)

The second is the DNS-SV model that adds stochastic volatility to the transition equation. It is summarized at the beginning of this section. By allowing for stochastic volatility, this model should improve upon the standard DNS model, especially in the second moments, as it can capture the time-varying volatility present in the bond yield data.

2.3.3 Dynamic Nelson-Siegel-Random Walk (RW)

The bond yield forecasting literature (e.g. van Dijk et al. (2014) and references therein) has shown that random walk specifications of the yield curve generally perform quite well. Oftentimes, the no-change forecast from the current period does best among a large group of forecasting models. It is in this sense that bond yield forecasting is difficult. Given these results, we also augment the DNS-C, DNS-SV, and DNS-RV model classes with random walk parameterizations of the factor processes.

The empirical macroeconomic literature (Cogley and Sargent, 2005; Justiniano and Primiceri, 2008; Clark, 2011) often specifies stochastic volatility as following a random walk. Doing so reduces the number of parameters estimated while also providing a simple no-change forecast benchmark for time-varying volatility. As long-horizon bond yield volatility links with macroeconomic volatility, we also have random walk specifications for the stochastic volatilities.

2.3.4 Dynamic Nelson-Siegel-Measurement Error Stochastic Volatility (+ Realized Volatility) (DNS-ME)

Koopman et al. (2010) argue that putting the time-varying conditional volatility on the measurement errors provides an improvement for the in-sample fit of the DNS class of models. To evaluate whether these results extend to forecasting as well, we model independent AR(1) specifications for the measurement error stochastic volatilities. While following this strategy

greatly increases the number of parameters estimated, it could improve forecasting as each yield has its own stochastic volatility process. Therefore, as opposed to a time-varying $H_t$ and constant $Q$ matrix in the DNS-SV and DNS-RV setups, now $H$ is constant while $Q_t$ has stochastic volatility.

$$Q_t = \mathrm{diag}(e^{q_t}) \tag{8}$$

$$q_{i,t} - \mu_{q,i} = \phi_{q,i}(q_{i,t-1} - \mu_{q,i}) + \nu_{i,t}, \qquad \nu_{i,t} \sim N(0, q_i), \tag{9}$$

for $i = 1, \ldots, N$, where $N$ is the number of bond yields in the observation equation. $q_t$ is a vector that collects all stochastic volatilities in the measurement errors. $Q_t$ remains a diagonal matrix, as equation 8 shows. We again model the logarithm of the variances as independent first order autoregressive processes.

We also consider incorporating realized volatility information into this model. Doing so leads to the following relationship

$$RV_t \approx \mathrm{Var}_{t-1}(y_t) = \mathrm{diag}(\Lambda_f H \Lambda_f' + Q_t). \tag{10}$$

As before, we take a first order Taylor approximation of the logarithm of this equation around the 17 × 1 vector $\mu_q$. We also add in measurement error for estimation. However, in contrast to the DNS-RV model, we link each element of $\log(RV_t)$ to its corresponding element in $q_t$.

2.4 Discussion

Note that our DNS-SV and DNS-RV model specifications have three factors governing time-varying conditional volatility that are independent of the yield curve factors. Hence, the total number of latent factors explaining the joint distribution of bond yields is larger than that of standard no-arbitrage affine term structure models considered in the literature (Duffie and Kan, 1996). In addition, our volatility factors are unspanned volatility in the sense that the

cross-sectional properties of bond yields (i.e., the yield curve) are determined by $f_t$ in equation 6. As we are not imposing no-arbitrage restrictions, the volatility factors ($h_t$) do not face the dual role of fitting the cross-section and the conditional volatility dynamics of bond yields as pointed out by Collin-Dufresne et al. (2009). Relative to the standard no-arbitrage affine term structure model, the DNS-SV and DNS-RV models provide a more flexible way to achieve our main goal of forecasting the joint distribution of bond yields. They do so by using three factors ($f_t$) to fit yield conditional mean dynamics and three other factors ($h_t$) to fit yield volatility dynamics⁷.

⁷ It seems that more than one volatility factor is essential to explain the bond yield distribution in the affine term structure model framework (see, for example, Creal and Wu, 2014; Cieslak and Povala, 2015).

3 Data

We use a panel of unsmoothed Fama and Bliss (1987) U.S. government bond yields at the monthly frequency with maturities of 3, 6, 9, 12, 15, 18, 21 months and 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10 years from January 1981 to December 2009. This dataset is provided by Jungbacker et al. (2013)⁸. To construct the monthly realized volatility series, we use daily U.S. government bond yield data with the same maturities from January 2, 1981 to December 30, 2009 taken from the Federal Reserve Board of Governors with the methodology of Gürkaynak et al. (2007)⁹ ¹⁰. We construct the realized variance of each month's yields using daily bond yield data.

⁸ http://qed.econ.queensu.ca/jae/datasets/jungbacker001/
⁹ http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html
¹⁰ Because most papers in the literature estimate the DNS model with unsmoothed Fama and Bliss data, we generate and evaluate predictions for the unsmoothed Fama and Bliss data. Unfortunately, this data is only available at the monthly frequency. Even though the two datasets use different methodologies, monthly yield data based on the daily bond yield from Gürkaynak et al. (2007) is very close to the one based on the unsmoothed Fama and Bliss method.

The formula for realized variance at time $t$ is

$$RV_t = \sum_{d=1}^{D} \left(\Delta y_{t+\frac{d}{D}}\right)^2,$$

where $D$ is the number of daily observations in one time period $t$. This formula converges in probability to the true conditional variance as the sampling frequency goes to infinity under assumptions laid out in Andersen et al. (2003). Usually, there are around 21 days in each month, with fewer depending upon the number of holidays in a month that fall on normal trading days.

We use daily data to construct our realized volatilities for a few reasons. First, we want to use realized volatility information starting in 1981 to use a sample period similar to other bond yield forecasting studies. The availability of higher-frequency intraday data begins much later. For instance, Cieslak and Povala (2015) start their estimation in 1992 for specifically this reason. Second, the month-to-month volatility movements we want the volatility proxy to capture do not necessitate using ultra-high frequency data. Finally, while results may improve with higher frequency data, we show that positive effects are present even with lower frequency realized volatility.

Figure 1 plots the time series of monthly U.S. government bond yields and the logarithm of realized volatilities. All yields exhibit a general downward trend from the start of the sample. For around the first 25 months, the realized volatility seems quite high and exhibits large time variation. After around 1983, yield volatility dies down and largely exhibits only temporary spikes. For a period of 2 years starting in 2008, the realized volatility picks up across all yields. We attribute this to the financial crisis. Another interesting feature of log realized volatility is that it shows large autocorrelation.
Its first-order autocorrelation coefficients range from 0.59 to 0.69 and the 12th-order autocorrelation coefficients range from 0.20 to 0.31.¹¹ This means that realized volatility data could help even for the long-horizon forecasts that we consider in this paper.

¹¹ We provide tables in the supporting material for the descriptive statistics of monthly realized volatility of bond yields.

Figure 1 U.S. Treasury Yields. Panels: (a) Yields (monthly, annualized %); (b) Yield Realized Volatilities (Monthly, Log).
Notes: We present monthly U.S. Treasury yields with maturities of 3, 6, 9, 12, 15, 18, 21 months and 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10 years over the period January 1981 – November 2009. Monthly yields are constructed using the unsmoothed Fama-Bliss method. Monthly yield realized volatilities are constructed based on daily yields using Wright's dataset. Blue shaded bars represent NBER recession dates.

Both the yields and realized volatilities do seem to exhibit a factor structure, meaning that each set of series co-moves over time. In fact, a principal components analysis shows that the first three factors for yields explain 99.95% of the variation in the U.S. yield curve. The first three factors for realized volatilities explain 98.53% of the variation (Table 2). Although the fact that U.S. bond yields can be explained by the first few principal components is well documented, it is interesting that the same feature carries over to the realized volatility of U.S. bond yields.
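The two data constructions used in this section, the monthly realized variance built from daily yield changes and the cumulative share of variance explained by principal components, can be sketched as follows. This is an illustrative NumPy sketch rather than the authors' code. The function names are hypothetical, the convention that the first daily change of a month is measured from the last observation of the previous month is an assumption, and so is the choice to run the principal components analysis on the covariance (rather than correlation) matrix, since the text does not specify either.

```python
import numpy as np


def realized_variance(daily_yields):
    """Sum of squared daily yield changes within one month, per maturity.

    daily_yields: array of shape (D + 1, N). Row 0 is assumed to be the last
    observation of the previous month, so that D daily changes are available.
    """
    dy = np.diff(np.asarray(daily_yields, dtype=float), axis=0)
    return np.sum(dy ** 2, axis=0)


def cumulative_pc_share(panel):
    """Cumulative % of total variance explained by the principal components.

    panel: array of shape (T, N), e.g. monthly yields or log realized variances.
    Uses the eigenvalues of the sample covariance matrix (an assumption).
    """
    X = np.asarray(panel, dtype=float)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    return 100.0 * np.cumsum(eigvals) / eigvals.sum()


# Tiny made-up example: three daily observations for two maturities.
example_month = np.array([[1.0, 2.0],
                          [2.0, 2.0],
                          [4.0, 1.0]])
rv = realized_variance(example_month)

# Simulated panel driven by a single common factor plus small noise.
rng = np.random.default_rng(0)
factor = rng.standard_normal((200, 1))
panel = factor @ np.array([[1.0, 0.8, 0.6, 0.4]]) + 0.01 * rng.standard_normal((200, 4))
shares = cumulative_pc_share(panel)
print(rv, shares)
```

Applied to the full panels, the entries of `cumulative_pc_share` would correspond to the kind of cumulative percentages reported in Table 2, although the exact numbers depend on the standardization choices noted above.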

Table 2 Variance explained by the first five principal components (%)

        Yield    log(RV)
pc 1    98.16    84.30
pc 2    99.84    94.62
pc 3    99.95    98.53
pc 4    99.98    99.46
pc 5    99.98    99.85

Notes: Numbers in the table are the percentage of total variance explained by the first five principal components for U.S. yield data and log(RV). log(RV) is the logarithm of the monthly realized volatility constructed based on the daily U.S. yield data.

4 Estimation/Evaluation Methodology

4.1 Estimation

We run a Gibbs sampling Markov Chain Monte Carlo algorithm for 15,000 draws. We keep every 5th draw and burn in for the first 5,000 draws. Due to our linearization approximation in introducing realized volatility, all specifications that we consider can be sampled by using the method developed in Kim et al. (1998) for the stochastic volatility state space model. Details of the state space representation and the estimation procedure can be found in the appendix. Details on the prior can be found in the appendix as well, although we comment that our choice of prior is loose and is not expected to impact the estimation results radically.

We highlight the difference in estimation procedure due to the additional measurement equation for the realized volatility. Roughly speaking, there are two sources of information for the latent volatility factor $h_t$. The first source is from the latent factor $f_t$. To see this, one can transform the transition equation for $f_t$ as follows

$$\log\left(\left(f_{i,t} - \mu_{i,f}(1-\phi_{i,f}) - \phi_{i,f} f_{i,t-1}\right)^2\right) = h_{i,t} + \log(x_{i,t}^2), \qquad x_{i,t} \sim N(0, 1) \tag{11}$$

for $i = l, s, c$. This is common to both DNS-SV and DNS-RV. The second source is from

equation 6, which relates $\log(RV_t)$ to $h_t$ and is unique to DNS-RV. For the estimation of the DNS-RV model, we add the realized volatility measurement equation to the Kim et al. (1998) state space representation defined by equation 3 and equation 11. Then, conditional on the other parameters and data, extraction of $h_t$ amounts to running a simulation smoother in conjunction with the Kalman filter, with equation 6 (DNS-RV) or without it (DNS-SV).

4.2 Forecast evaluation

We consider model performance along both the point and density forecasting dimensions. The appendix contains further details on the Bayesian simulation algorithm we use to generate the forecasts. We begin forecasting in February 1994, reestimate the model in an expanding window, and move forward two months at a time. For every forecast run at a given time $t$, we forecast all yields in our dataset for horizons ranging from 1 month to 12 months ahead. This leads to a total of 94 repetitions.

Point forecast

To evaluate the point prediction, we use the Root Mean Square Forecast Error (RMSE) statistic,

$$RMSE^{M}_{\tau,ho} = \sqrt{\frac{1}{F}\sum\left(\hat{y}^{M}_{t+ho}(\tau) - y_{t+ho}(\tau)\right)^2}. \tag{12}$$

Here $\hat{y}^{M}_{t+ho}(\tau)$ is the forecast of the yield with maturity $\tau$ at horizon $ho$ made by model $M$, $y_{t+ho}(\tau)$ is the realized value of the yield at time $t + ho$, and $F$ is the number of forecasts made. To gauge whether there are significant differences in the RMSE, we use the Diebold and Mariano (1995) t-test of equal predictive accuracy.

Density forecast

The log predictive score (Geweke and Amisano, 2010) gives an indication of how well a model performs in density forecasting,

$$LPS^{M}_{\tau,ho} = \frac{1}{F}\sum \log p\left(y_{t+ho}(\tau) \mid y^{t}, M\right), \tag{13}$$

where $p(y_{t+ho}(\tau) \mid y^{t}, M)$ denotes the ho-step-ahead predictive distribution of yield $\tau$ generated by model $M$ given time $t$ information. Following Carriero et al. (2013), we estimate the log predictive density with a kernel density estimator applied to the MCMC draws of the parameters and latent states, and we compute the p-value of the Amisano and Giacomini (2007) t-test of equal means to gauge whether differences in the log predictive score are significant.

5 Results

We first present in-sample results of the model, focusing on time-varying volatility. Then, we move to point and density forecasting results.

5.1 In-sample

We first present the full-sample estimation from January 1981 to November 2009. We focus on how adding realized volatility information alters the model. Adding second-moment information does not significantly change the conditional mean dynamics, so we relegate our discussion of the extracted factors to the appendix [12]. Our extracted factors are similar to those found in Diebold and Li (2006).

[12] All other in-sample estimation results are in the appendix.

The stochastic volatility dynamics deserve a more detailed discussion. Figure 2 shows the volatility estimates from the DNS-C, DNS-SV, and DNS-RV models. The fluctuations of the extracted stochastic volatilities in both the DNS-SV and DNS-RV models show that there is conditional time-varying volatility in the data. Relative to the DNS-SV specification, adding realized volatility data makes the extracted stochastic volatility much less persistent and more variable. This leads to lower autoregressive parameter estimates and higher innovation standard deviation estimates for all of the stochastic volatility processes (Table 3). The DNS-RV model also delivers lower stochastic volatility mean estimates. These differences lead to differences in forecasting. For example, the lower autoregressive parameter in the DNS-RV model means that it predicts faster mean reversion of the stochastic volatilities than the DNS-SV model, and the lower long-run mean estimate implies that the DNS-RV model produces a tighter density prediction in the long run. The smoothed stochastic volatilities from the DNS-SV model, however, generally capture the low-frequency volatility movements from the DNS-RV model.

Figure 2 Stochastic volatility for bond yield factors

[Figure: three rows of panels (Level Factor Volatility, Slope Factor Volatility, Curvature Factor Volatility), with DNS-RV in the left column and DNS-SV in the right column.]

Notes: Posterior median of log stochastic volatility ($h_t$) for bond yield factors from DNS-RV (left column) and DNS-SV (right column). Red dotted line is the volatility level estimated from DNS-C. Blue band is the 80% credible interval. Estimation sample is from January 1981 to November 2009.

Table 3 Posterior Estimates of Parameters on $h_t$ Equation

                     DNS-SV                      DNS-RV
             5%      50%     95%         5%      50%     95%
µ_{h,l}      -4.40   -2.56   -1.48       -4.19   -3.92   -2.21
µ_{h,s}      -3.41   -2.30   -1.67       -3.22   -2.85   -1.94
µ_{h,c}      -1.48   -0.96   -0.47       -2.18   -1.88   -1.39
φ_{h,l}      0.93    0.98    0.999       0.58    0.66    0.73
φ_{h,s}      0.92    0.97    0.995       0.57    0.64    0.72
φ_{h,c}      0.81    0.92    0.98        0.39    0.49    0.60
σ_{h,l}      0.01    0.03    0.08        0.41    0.75    0.92
σ_{h,s}      0.01    0.04    0.10        1.27    1.72    2.64
σ_{h,c}      0.02    0.09    0.27        1.30    1.78    4.30

Notes: Posterior moments are based on the estimation sample from January 1981 to November 2009.

We argue that this difference in the stochastic volatilities matters for density forecasting. The high-frequency data used to construct the realized volatilities bring information that the low-frequency monthly yield data miss. With more accurate estimates of the current level of time-varying volatility and of the volatility process parameters, the DNS-RV model both starts forecasting from a more accurate point and better captures the dynamics of the data moving forward.
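To make the mean-reversion argument concrete, the following sketch computes the expected path of log-volatility under the AR(1) process for $h_{i,t}$, using the posterior medians for the level factor reported in Table 3. The function names and the size of the initial deviation are illustrative assumptions, not part of the paper's procedure, and plugging in medians ignores parameter uncertainty.

```python
import numpy as np

def expected_log_vol(h_now, mu, phi, horizon):
    """E[h_{t+k} | h_t] for k = 1..horizon under h_t = mu(1-phi) + phi*h_{t-1} + nu_t."""
    k = np.arange(1, horizon + 1)
    return mu + phi ** k * (h_now - mu)

def half_life(phi):
    """Number of months until half of a deviation from the long-run mean has decayed."""
    return np.log(0.5) / np.log(phi)

# Posterior medians for the level factor volatility, taken from Table 3.
params = {
    "DNS-SV": {"mu": -2.56, "phi": 0.98},
    "DNS-RV": {"mu": -3.92, "phi": 0.66},
}

deviation = 1.0  # hypothetical: log-volatility starts one unit above its long-run mean
for name, p in params.items():
    path = expected_log_vol(p["mu"] + deviation, p["mu"], p["phi"], horizon=12)
    remaining = path[-1] - p["mu"]
    print(f"{name}: half-life {half_life(p['phi']):.1f} months, "
          f"deviation remaining after 12 months {remaining:.3f}")
```

Under these medians, a volatility shock in the DNS-RV model dies out within a few months, whereas in the DNS-SV model most of it is still present a year later, which is consistent with the diminishing gains at the 12-month horizon discussed in the text.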

Table 4 RMSE comparison

Maturity  RW-C    RW-SV    RW-RV    DNS-C    DNS-SV   DNS-RV   RW-SV-RW  DNS-SV-RW  DNS-ME-SV  DNS-ME-RV

1-step-ahead prediction
3         0.267   0.992    1.011    1.004    1.034**  1.007    0.991     1.038**    1.071      0.953**
12        0.229   1.005    1.002    1.076**  1.055**  1.003    1.002     1.060**    1.070**    1.072**
36        0.274   1.001    0.996    1.015    1.020    0.990    1.001     1.025      1.010      1.012
60        0.274   1.000    0.995    1.003    1.010    0.987    1.000     1.013      1.002      1.004
120       0.277   1.001    0.996    0.988    0.989    0.999    1.000     0.988      0.986      0.997

3-step-ahead prediction
3         0.506   1.002    1.011    1.036    1.066**  1.018    1.002     1.069**    1.055      1.024
12        0.537   1.002    0.999    1.072    1.073**  0.999    1.002     1.077**    1.064      1.063
36        0.580   1.000    0.998    1.022    1.036    0.990    0.999     1.040      1.018      1.018
60        0.551   1.000    0.998    1.004    1.017    0.986    0.999     1.021      1.005      1.004
120       0.489   1.000    0.998    0.982    0.987    0.995    1.000     0.988      0.987      0.987

6-step-ahead prediction
3         0.932   1.000    1.002    1.003    1.047    1.007    0.999     1.048      1.010      1.000
12        0.915   1.001    1.001    1.040    1.065    0.999    1.000     1.068      1.036      1.034
36        0.881   1.001    1.001    1.017    1.041    0.987    1.000     1.046      1.011      1.011
60        0.819   1.000    0.998    0.998    1.016    0.978    1.000     1.020      0.997      0.997
120       0.665   0.999    0.992*   0.977    0.982    0.980    0.998     0.986      0.981      0.980

12-step-ahead prediction
3         1.592   1.002*   0.999    0.936**  1.013    0.991    1.000     1.015      0.943**    0.940*
12        1.476   1.001    0.999    0.975    1.034    0.991    1.000     1.036      0.976      0.975
36        1.242   1.001    1.001    0.997    1.043    0.979**  1.001     1.046      0.993      0.993
60        1.069   1.002    1.000    0.997    1.031    0.967*   1.001     1.033      0.994      0.994
120       0.833   1.002    0.998    0.976    0.994    0.966    1.001     0.994      0.980      0.976

Notes: The RW-C column shows the RMSE based on the RW-C model. The other columns show the RMSE relative to the RW-C column. The RMSE from the best model for each variable and forecast horizon is in bold. Units are in percentage points. Divergences in accuracy that are statistically different from zero are given by * (10%), ** (5%), *** (1%). We construct the p-values based on the Diebold and Mariano (1995) t-statistics with a variance estimator robust to serial correlation using a rectangular kernel of h−1 lags and the small-sample correction proposed by Harvey et al. (1997).

5.2 Point prediction

Table 4 shows the RMSE of selected maturities for 1-, 3-, 6-, and 12-step-ahead predictions. The RW-C column has the calculated RMSE values for the RW-C model. All other values reported are ratios relative to the RW-C RMSE. Values below 1 indicate superior performance relative to the random walk benchmark. Stars in the table indicate significant differences relative to the RW-C model. As expected, the RMSE increases as the forecasting horizon lengthens. The models with random walk dynamics in the factors do well for short-horizon forecasts but deteriorate relative to the stationary models as the prediction horizon lengthens. In general, all relative RMSE values are close to 1, reproducing the well-known result in the bond yield forecasting literature on the difficulty of beating the no-change forecast. As alluded to in the previous section, adding time-varying second moments does not greatly affect point predictions, although the DNS-RV model forecasts middle maturities well across all horizons. For 12-month-horizon forecasts, the DNS-C and DNS-ME-RV models also do well for short-maturity yields.

Table 5 Log predictive score comparison

Maturity  RW-C    RW-SV    RW-RV    DNS-C    DNS-SV   DNS-RV   RW-SV-RW  DNS-SV-RW  DNS-ME-SV  DNS-ME-RV

1-step-ahead prediction
3         -0.538  0.205    0.343    0.008    0.198    0.375    0.181     0.173      0.048      0.102
12        -0.357  0.211    0.304    -0.009   0.195    0.316    0.189     0.172      0.012      0.013
36        -0.318  0.129    0.158    0.000    0.118    0.176    0.111     0.104      0.022      0.017
60        -0.265  0.109    0.077    -0.001   0.098    0.105    0.098     0.088      0.019      0.015
120       -0.248  0.110    0.111    0.011    0.121    0.110    0.106     0.108      0.048      0.037

3-step-ahead prediction
3         -1.061  0.183    0.302    0.016    0.165    0.322    0.151     0.133      0.036      0.061
12        -0.985  0.107    0.195    -0.012   0.064    0.196    0.089     0.045      0.007      0.010
36        -0.951  0.027    0.053    -0.002   -0.002   0.067    0.013     -0.018     0.009      0.007
60        -0.882  0.013    0.016    0.003    -0.011   0.035    0.007     -0.015     0.011      0.010
120       -0.779  0.033    0.020    0.014    0.033    0.031    0.036     0.038      0.027      0.029

6-step-ahead prediction
3         -1.485  0.048    0.157    0.028    0.023    0.159    0.033     0.005      0.038      0.053
12        -1.417  0.010    0.065    0.000    -0.058   0.066    0.000     -0.082     0.008      0.012
36        -1.342  -0.017   0.021    0.001    -0.077   0.030    -0.039    -0.095     0.010      0.012
60        -1.263  -0.031   0.008    0.010    -0.072   0.018    -0.045    -0.083     0.016      0.016
120       -1.100  0.031    0.049    0.023    0.038    0.061    0.029     0.033      0.036      0.034

12-step-ahead prediction
3         -1.940  -0.096   0.031    0.091    -0.098   0.041    -0.096    -0.119     0.086      0.094
12        -1.850  -0.071   -0.012   0.048    -0.147   -0.004   -0.089    -0.168     0.048      0.051
36        -1.688  -0.017   -0.001   0.023    -0.122   0.019    -0.049    -0.137     0.032      0.033
60        -1.560  0.005    0.036    0.024    -0.068   0.066    -0.011    -0.073     0.034      0.036
120       -1.387  0.099    0.151    0.030    0.101    0.193    0.098     0.092      0.047      0.045

Notes: The RW-C column shows the log predictive score based on the RW-C model. The other columns show the difference in log predictive score from the RW-C column. Log predictive score differences represent percentage point differences. Therefore, a difference of 0.1 corresponds to approximately a 10% more accurate density forecast. The log predictive score from the best model is in bold for each variable and forecast horizon.

5.3 Density prediction

Table 5 shows the density evaluation results in terms of the log predictive score. As in the RMSE table, the RW-C column gives the value of the log predictive score for the random walk case, while the numbers for the other models are differences relative to that column. A higher value indicates a larger log predictive score and better density forecasting results. We present p-values based on Amisano and Giacomini (2007) comparing the hypothesis of

equal log predictive score of the DNS-RV with those of the alternative models in Table 6.

Table 6 Log predictive score of DNS-RV versus other models: p-values

Maturity  RW-C    RW-SV    RW-RV    DNS-C    DNS-SV   RW-SV-RW  DNS-SV-RW  DNS-ME-SV  DNS-ME-RV

1-step-ahead prediction
3         0.00    0.00     0.00     0.00     0.00     0.00      0.00       0.00       0.00
12        0.00    0.00     0.13     0.00     0.00     0.00      0.00       0.00       0.00
36        0.00    0.22     0.01     0.00     0.18     0.10      0.11       0.01       0.01
60        0.08    0.93     0.02     0.10     0.88     0.87      0.72       0.17       0.15
120       0.07    0.99     0.94     0.11     0.82     0.93      0.98       0.30       0.22

3-step-ahead prediction
3         0.00    0.00     0.17     0.00     0.00     0.00      0.00       0.00       0.00
12        0.00    0.03     0.93     0.00     0.01     0.02      0.00       0.00       0.00
36        0.20    0.31     0.28     0.22     0.18     0.21      0.12       0.29       0.27
60        0.53    0.60     0.27     0.59     0.37     0.51      0.34       0.67       0.66
120       0.60    0.96     0.64     0.77     0.96     0.91      0.87       0.94       0.97

6-step-ahead prediction
3         0.12    0.04     0.93     0.25     0.04     0.02      0.03       0.24       0.32
12        0.38    0.28     0.98     0.44     0.15     0.20      0.12       0.45       0.50
36        0.57    0.16     0.74     0.64     0.19     0.09      0.16       0.73       0.75
60        0.77    0.20     0.74     0.90     0.18     0.12      0.14       0.98       0.97
120       0.35    0.52     0.78     0.55     0.64     0.51      0.59       0.68       0.66

12-step-ahead prediction
3         0.76    0.10     0.82     0.70     0.30     0.06      0.25       0.70       0.66
12        0.97    0.23     0.80     0.38     0.41     0.11      0.36       0.34       0.35
36        0.72    0.02     0.17     0.92     0.43     0.00      0.38       0.69       0.72
60        0.10    0.00     0.26     0.44     0.34     0.00      0.32       0.51       0.56
120       0.01    0.00     0.39     0.05     0.26     0.00      0.23       0.07       0.07

Notes: This table presents the p-values from Amisano and Giacomini (2007) tests comparing the hypothesis of equal log predictive score of the DNS-RV with alternative models. Bold indicates p-values less than 5%. Test statistics are computed with a variance estimator robust to serial correlation using a rectangular kernel of h−1 lags and the small-sample correction proposed by Harvey et al. (1997).

As opposed to the point prediction results, three interesting findings emerge when we consider the log predictive score.

First, for the short-run horizon, having realized volatility gives significant gains in density prediction. This is on top of a large improvement in log predictive score from adding stochastic volatility, which Carriero et al. (2013) find. Table 6 shows that the DNS-RV model has significantly higher log predictive score values for one- and three-month-ahead predictions for short-term maturities when compared to most competitors [13]. Because realized volatility improves the estimates of the current state of volatility, we would expect short-horizon forecasts to have the largest gain. The improved density forecasting performance of the DNS-RV and RW-RV models continues even up to a 6-month forecasting horizon. At one year ahead, most models with realized volatility have their volatility processes returning close to the unconditional mean, so the gain diminishes.

[13] We also compute the model confidence set of Hansen et al. (2011) and find a similar result. See the supporting material for the model confidence set results.

Second, comparing RW-SV to RW-SV-RW, RW-RV to RW-RV-RW, and DNS-SV to DNS-SV-RW shows that, given a fixed conditional mean specification, a random walk specification for the conditional volatility dynamics in general leads to poorer results. This illustrates the fact that even though the conditional mean dynamics of bond yields approximate a random walk in our sample, the conditional volatility dynamics exhibit mean reversion. Bond yields therefore do have forecastability, although simply looking at the conditional mean dynamics does not reveal this fact strongly.

Third, an alternative specification that introduces stochastic volatility through the measurement equation does not forecast as well as the specification with stochastic volatility in the transition equation. Comparing DNS-SV to DNS-ME-SV shows that for short-horizon forecasts DNS-SV performs better, whereas for longer-horizon forecasts DNS-ME-SV does better. A similar story holds for DNS-RV and DNS-ME-RV, although DNS-RV does better even up to 6-month-horizon forecasts, with mixed results at the 12-month horizon. The measurement error specifications give consistent improvements in the log predictive score over and above the constant volatility models, although the gains are small. One factor holding back the performance of the measurement error specifications is that the measurement error variance explains a small portion of total bond yield variance. For example, the ratio of the standard deviations of smoothed measurement errors to the standard deviations of smoothed factors in the DNS-ME-SV model is often below 3 percent and never above 8 percent [14]. In Figure 3, we see that the model with stochastic volatility on the measurement errors does not generate movements in the conditional time-varying volatility of various middle maturity yields. In fact, the conditional variances of the 1-year, 3-year, 5-year, and 8-year maturities are nearly on top of the black dotted line, which is the variance of the yields implied by the DNS-C model. Therefore, putting time-varying volatility in the measurement errors does not drastically change the model-implied predictive distributions. This explains why the density forecasting performance of this class of models mimics that of the DNS-C model.

[14] This fact is similar across different model specifications. The values from all models are presented in the supporting materials.

Figure 3 Stochastic variance of individual yield based on DNS-SV and ME-SV

[Figure: six panels, one per maturity: 3 month, 1 year, 3 year, 5 year, 8 year, and 10 year.]

Notes: Red dotted line: Conditional variance from DNS-SV. Blue solid line: Conditional variance from DNS-ME-SV with 80% credible interval. Black dotted horizontal line: Conditional variance from DNS-C. Estimation sample is from January 1981 to November 2009.

In contrast, Figure 3 shows that putting stochastic volatility in the shocks to the bond yield factors can better capture time-varying volatility. The DNS-SV model-implied time-varying volatility consistently fits the narrative evidence of the Great Moderation from the mid-1980s until the mid-2000s. It also picks up the increases in volatility from the early 2000s recession and recent financial crises.

6 Conclusion

We investigate the effects of introducing realized volatility information on U.S. bond yield density forecasting. To do so, we develop a general estimation approach to incorporate realized volatility into dynamic factor models with stochastic volatility. We compare the performance of our benchmark model, DNS-RV, with a variety of models proposed in the literature and find that the DNS-RV model produces superior density forecasts, especially at short horizons. In addition, we find that incorporating time-varying volatility in general improves density prediction, that time-varying volatility is better modeled as a stationary process than as a random walk, and that time-varying volatility in the factor equation generates better density predictions than time-varying volatility in the measurement equation.

References

Amisano, G. and R. Giacomini (2007): “Comparing Density Forecasts via Weighted Likelihood Ratio Tests,” Journal of Business & Economic Statistics, 25, 177–190.

Andersen, T. G. and L. Benzoni (2010): “Do Bonds Span Volatility Risk in the U.S. Treasury Market? A Specification Test for Affine Term Structure Models,” The Journal of Finance, 65, 603–653.

Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2003): “Modeling and Forecasting Realized Volatility,” Econometrica, 71, 579–625.

Barndorff-Nielsen, O. E. and N. Shephard (2002): “Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 253–280.

Bianchi, F., H. Mumtaz, and P. Surico (2009): “The Great Moderation of the Term Structure of UK Interest Rates,” Journal of Monetary Economics, 56, 856–871.

Carriero, A., T. E. Clark, and M. Marcellino (2013): “No Arbitrage Priors, Drifting Volatilities, and the Term Structure of Interest Rates,” Tech. rep., Working paper.

Carter, C. K. and R. Kohn (1994): “On Gibbs Sampling for State Space Models,” Biometrika, 81, 541–553.

Christensen, J. H., J. A. Lopez, and G. D. Rudebusch (2014): “Can Spanned Term Structure Factors Drive Stochastic Volatility?” Federal Reserve Bank of San Francisco Working Paper Series.

Cieslak, A. and P. Povala (2015): “Information in the Term Structure of Yield Curve Volatility,” Journal of Finance, forthcoming.

Clark, T. (2011): “Real-Time Density Forecasts From Bayesian Vector Autoregressions With Stochastic Volatility,” Journal of Business & Economic Statistics, 29, 327–341.

Cogley, T. and T. J. Sargent (2005): “Drifts and Volatilities: Monetary Policies and Outcomes in the Post WWII US,” Review of Economic Dynamics, 8, 262–302.

Collin-Dufresne, P., R. Goldstein, and C. Jones (2009): “Can Interest Rate Volatility Be Extracted from the Cross Section of Bond Yields?” Journal of Financial Economics, 94, 47–66.

Creal, D. and J. Wu (2014): “Interest Rate Uncertainty and Economic Fluctuations,” Working Paper, The University of Chicago Booth School of Business.

Del Negro, M. and F. Schorfheide (2013): “DSGE Model-Based Forecasting,” in Handbook of Economic Forecasting, ed. by G. Elliott and A. Timmermann, Elsevier, vol. 2A, chap. 2, 57–140.

Diebold, F. X. and C. Li (2006): “Forecasting the Term Structure of Government Bond Yields,” Journal of Econometrics, 130, 337–364.

Diebold, F. X. and R. S. Mariano (1995): “Comparing Predictive Accuracy,” Journal of Business & Economic Statistics, 20, 134–144.

Diebold, F. X. and G. D. Rudebusch (2012): Yield Curve Modeling and Forecasting, Princeton University Press.

Duffee, G. R. (2012): “Forecasting Interest Rates,” Working papers, The Johns Hopkins University, Department of Economics.

Duffie, D. and R. Kan (1996): “A Yield-Factor Model Of Interest Rates,” Mathematical Finance, 6, 379–406.

Egorov, A. V., Y. Hong, and H. Li (2006): “Validating Forecasts of the Joint Probability Density of Bond Yields: Can Affine Models Beat Random Walk?” Journal of Econometrics, 135, 255–284.

Fama, E. F. and R. R. Bliss (1987): “The Information in Long-Maturity Forward Rates,” The American Economic Review, 77, 680–692.

Geweke, J. and G. Amisano (2010): “Comparing and Evaluating Bayesian Predictive Distributions of Asset Returns,” International Journal of Forecasting, 26, 216–230.

Gürkaynak, R. S., B. Sack, and J. H. Wright (2007): “The U.S. Treasury Yield Curve: 1961 to the Present,” Journal of Monetary Economics, 54, 2291–2304.

Hansen, P. R., Z. Huang, and H. H. Shek (2012): “Realized GARCH: a Joint Model for Returns and Realized Measures of Volatility,” Journal of Applied Econometrics, 27, 877–906.

Hansen, P. R., A. Lunde, and J. M. Nason (2011): “The Model Confidence Set,” Econometrica, 79, 453–497.

Harvey, D., S. Leybourne, and P. Newbold (1997): “Testing the Equality of Prediction Mean Squared Errors,” International Journal of Forecasting, 13, 281–291.

Hautsch, N. and Y. Ou (2012): “Analyzing Interest Rate Risk: Stochastic Volatility in the Term Structure of Government Bond Yields,” Journal of Banking & Finance, 36, 2988–3007.

Hautsch, N. and F. Yang (2012): “Bayesian Inference in a Stochastic Volatility Nelson–Siegel Model,” Computational Statistics & Data Analysis, 56, 3774–3792.

Jin, X. and J. M. Maheu (2013): “Modeling Realized Covariances and Returns,” Journal of Financial Econometrics, 11, 335–369.

Jungbacker, B., S. J. Koopman, and M. van der Wel (2013): “Smooth Dynamic Factor Analysis with Application to the US Term Structure of Interest Rates,” Journal of Applied Econometrics.

Justiniano, A. and G. E. Primiceri (2008): “The Time-Varying Volatility of Macroeconomic Fluctuations,” The American Economic Review, 98, 604–641.

Kim, S., N. Shephard, and S. Chib (1998): “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models,” The Review of Economic Studies, 65, 361–393.

Koopman, S. J., M. I. P. Mallee, and M. Van der Wel (2010): “Analyzing the Term Structure of Interest Rates Using the Dynamic Nelson–Siegel Model With Time–Varying Parameters,” Journal of Business & Economic Statistics, 28, 329–343.

Maheu, J. M. and T. H. McCurdy (2011): “Do High-Frequency Measures of Volatility Improve Forecasts of Return Distributions?” Journal of Econometrics, 160, 69–76.

Primiceri, G. (2005): “Time Varying Structural Vector Autoregressions and Monetary Policy,” Review of Economic Studies, 72, 821–852.

Shephard, N. and K. Sheppard (2010): “Realising the Future: Forecasting with High-Frequency-Based Volatility (HEAVY) Models,” Journal of Applied Econometrics, 25, 197–231.

Takahashi, M., Y. Omori, and T. Watanabe (2009): “Estimating Stochastic Volatility Models Using Daily Returns and Realized Volatility Simultaneously,” Computational Statistics & Data Analysis, 53, 2404–2426.

van Dijk, D., S. J. Koopman, M. van der Wel, and J. H. Wright (2014): “Forecasting Interest Rates with Shifting Endpoints,” Journal of Applied Econometrics, 29, 693–712.

Wright, J. and H. Zhou (2009): “Bond Risk Premia and Realized Jump Risk,” Journal of Banking & Finance, 33, 2333–2345.

Appendices

A State Space Representation

For completeness, we present the full specification of the state space form of the model. We give a detailed explanation of these equations in sections 2.1 and 2.2 of the main text. Consider a set of bond yields $y_t = \{y_t(1), \ldots, y_t(N)\}'$. Here $\tau_j$ is the maturity in months of bond yield $j$, and $\lambda$ determines the maturity at which the curvature loading reaches its maximum.

\[
y_t = \underbrace{\begin{pmatrix}
1 & \dfrac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} & \dfrac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1} \\
\vdots & \vdots & \vdots \\
1 & \dfrac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} & \dfrac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N}
\end{pmatrix}}_{\Lambda_f}
\begin{pmatrix} f_{l,t} \\ f_{s,t} \\ f_{c,t} \end{pmatrix} + \epsilon_t, \qquad \epsilon_t \sim N(0, Q) \tag{A.1}
\]

\[ \log(RV_t) = \beta + \Lambda_h \tilde{h}_t + \zeta_t, \qquad \zeta_t \sim N(0, S) \tag{A.2} \]

\[ f_t = (I_3 - \Phi_f)\mu_f + \Phi_f f_{t-1} + \eta_t, \qquad \eta_t \sim N\left(0, \mathrm{diag}\left(e^{[h_{l,t},\, h_{s,t},\, h_{c,t}]}\right)\right) \tag{A.3} \]

\[ h_{i,t} = \mu_{i,h}(1-\phi_{i,h}) + \phi_{i,h} h_{i,t-1} + \nu_{i,t}, \qquad \nu_{i,t} \sim N(0, \sigma^2_{i,h}) \tag{A.4} \]

for $i = l, s, c$, where $Q$ and $S$ are diagonal matrices. We write $\tilde{h}_t$ for the vector of demeaned volatility processes, with individual elements $\tilde{h}_{i,t} = h_{i,t} - \mu_{i,h}$.

B Measurement Equation for RV: Derivation and Approximation

Equation A.2 is the linearized version of the nonlinear measurement equation that comes from adding realized volatility information to the dynamic factor model. We perform a

first-order approximation of the logarithm of the following equation:

\[
\begin{aligned}
RV_t \approx Var_{t-1}(y_t) &= \mathrm{diag}(\Lambda_f H_t \Lambda_f' + Q) \\
&= \mathrm{diag}(\tilde{\Lambda}_f \tilde{H}_t \tilde{\Lambda}_f' + Q)
\end{aligned}
\tag{A.5}
\]

where we write the logarithm of volatility in deviation form, $\tilde{h}_{i,t} = h_{i,t} - \mu_{h,i}$ for $i = l, s, c$. Then $\tilde{H}_t$ is a $3 \times 3$ diagonal matrix with diagonal elements $e^{\tilde{h}_{i,t}}$, and $\tilde{\Lambda}_f = \Lambda_f \, \mathrm{diag}\left(e^{\mu_{h,l}/2}, e^{\mu_{h,s}/2}, e^{\mu_{h,c}/2}\right)$, so that each column of $\Lambda_f$ is scaled by the corresponding factor volatility at its long-run mean. We first derive the nonlinear measurement equation that links realized volatility with the underlying factor volatility. Our derivation is similar to Maheu and McCurdy (2011), but we derive it under the dynamic factor model framework. Then, we describe the approximation that gives the linearized measurement equation for $RV_t$ [15].

[15] The derivation and approximation of the measurement equations when we add stochastic volatility to the measurement errors $Q_t$ follow the same basic procedure, so we suppress the details.

Derivation of the measurement equation. We assume that $RV_t$ is a noisy measure of the true conditional volatility. Following Maheu and McCurdy (2011), we assume that $RV_t$ follows a log-normal distribution conditional on past histories of bond yields and bond yield realized volatilities:

\[ RV_t = \mathrm{diag}(\tilde{\Lambda}_f \tilde{H}_t \tilde{\Lambda}_f' + Q) \odot Z_t, \]

where $\odot$ denotes element-by-element multiplication, $Z_t$ is a $17 \times 1$ vector, and each element of $Z_t$ follows a log-normal distribution with $E_{t-1}[Z_{i,t}] = 1$ for $i = 1, \ldots, 17$. Taking logarithms on both sides gives

\[ \log(RV_t) = \tilde{\beta} + \log\left(\mathrm{diag}(\tilde{\Lambda}_f \tilde{H}_t \tilde{\Lambda}_f' + Q)\right) + \tilde{S}\zeta_t, \qquad \zeta_t \sim N(0, I_{17}), \tag{A.6} \]

where $\tilde{\beta}$ is a $17 \times 1$ vector, $\tilde{S}$ is a $17 \times 17$ diagonal matrix, and $I_{17}$ is a $17 \times 17$ identity matrix. We separately estimate $\tilde{\beta}$ and $\tilde{S}$, where $\tilde{\beta}$ is a conditional variance of $Z_t$ plus a bias correction term (e.g. Takahashi et al., 2009, for the univariate case).
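The equality of the two expressions in equation A.5 can be checked numerically. The sketch below builds Nelson-Siegel loadings for the 17 maturities listed in Table A-2 with $\lambda = 0.0609$ (the value the paper fixes), and compares the variance computed from $h_t$ directly with the variance computed from the demeaned $\tilde{h}_t$ and rescaled loadings. The function names, the log-volatility values, and the measurement-error variances are hypothetical choices made for illustration only.

```python
import numpy as np

LAMBDA = 0.0609  # decay parameter, fixed as in the paper
MATURITIES = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30, 36,
                       48, 60, 72, 84, 96, 108, 120], dtype=float)

def ns_loadings(tau, lam=LAMBDA):
    """Nelson-Siegel loading matrix (N x 3): level, slope, curvature columns."""
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])

def implied_variance(load, log_vol, q_diag):
    """diag(load @ diag(exp(log_vol)) @ load.T + Q), computed element by element."""
    return (load ** 2) @ np.exp(log_vol) + q_diag

lam_f = ns_loadings(MATURITIES)
mu_h = np.array([-3.92, -2.85, -1.88])      # DNS-RV posterior medians from Table 3
h_t = mu_h + np.array([0.5, -0.3, 0.2])     # hypothetical log-volatilities at time t
q_diag = np.full(MATURITIES.size, 1e-3)     # hypothetical measurement-error variances

# First line of (A.5): original loadings and log-volatilities in levels.
direct = implied_variance(lam_f, h_t, q_diag)

# Second line of (A.5): scale column j of the loadings by exp(mu_j / 2)
# and use the demeaned log-volatilities.
lam_tilde = lam_f * np.exp(mu_h / 2.0)
demeaned = implied_variance(lam_tilde, h_t - mu_h, q_diag)

print(np.allclose(direct, demeaned))
```

Because $H_t$ is diagonal, the $i$th diagonal element of $\Lambda_f H_t \Lambda_f'$ is a weighted sum of the squared loadings in row $i$, which is why the sketch never forms the full $17 \times 17$ matrix.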

Linearization. We present the derivation of equation 5. We linearize equation A.6 for the $i$th element around $\tilde{h}_{j,t} = 0$ for $j = 1, 2, 3$ and $\tilde{S}_{i,i} = 0$ for $i = 1, \ldots, 17$ to get

\[ \log(RV_{i,t}) = \beta_i + \nu_i \left( \sum_{j=1}^{3} \tilde{\Lambda}^2_{f,i,j} \tilde{h}_{j,t} \right) + \tilde{S}_{i,i}\zeta_{i,t}, \qquad \zeta_t \sim N(0, I_{17}), \]

where $\tilde{\Lambda}^2_{f,i,j}$ is the square of the $(i,j)$th element of $\tilde{\Lambda}_f$ in equation 5 and

\[ \beta_i = \tilde{\beta}_i + \log\left( \sum_{j=1}^{3} \tilde{\Lambda}^2_{f,i,j} + Q_{i,i} \right), \qquad \nu_i = \frac{1}{\sum_{j=1}^{3} \tilde{\Lambda}^2_{f,i,j} + Q_{i,i}}. \]

Note that the term $\nu_i \left( \sum_{j=1}^{3} \tilde{\Lambda}^2_{f,i,j} \tilde{h}_{j,t} \right)$ is linear in $\tilde{h}_t$, so we can write it as $\Lambda_{h,i} \tilde{h}_t$. In addition, we write $S = \tilde{S}\tilde{S}'$. This completes the derivation of the RV measurement equation (equation A.2).

C Estimation Procedure

This section presents the algorithm for the posterior sampler. We draw 15,000 samples, saving every 5th draw, with the first 5,000 draws as burn-in. The priors we choose for the model are presented in Table A-1. We describe the algorithm for our main model, DNS-RV. Estimation of other model specifications is a straightforward simplification or generalization. We collect the parameters on which we would like to perform inference in one vector and write it as $\Theta^* = \{Q, \beta, S, f_{1:T}, \mu_f, \phi_f, h_{1:T}, \mu_h, \phi_h, \sigma^2_h\}$ [16]. In addition, we denote by $\Theta^*_{-x}$ the vector of all elements of $\Theta^*$ except $x$, and by data the U.S. bond yields and their corresponding realized volatilities used in the estimation.

[16] While some authors have estimated $\lambda$, we fix it at 0.0609, noting from Diebold and Li (2006) and others that the value does not move around too much across time and that its estimation does not seem to affect the results.

Table A-1 Prior Distribution

Parameter  Description                                                           Dim.   Dist.  Para(1)  Para(2)
H          Variance of the measurement error (y_t)                               17×1   IG     0        0.001
µ_f        Long-run mean parameter for f_t                                       3×1    N      0        100
φ_f        AR(1) coefficient for f_t                                             3×1    N      0.8      100
µ_h        Long-run mean parameter for h_t                                       3×1    N      0        100
φ_h        AR(1) coefficient for h_t                                             3×1    N      0.8      100
σ²_h       Variance of the innovation for h_t                                    3×1    IG     0.01     2
β          Intercepts in the RV measurement equation. Only used for M_RV         17×1   N      0        100
S          Variance of the measurement error (RV). Only used for M_RV            17×1   IG     0        0.001
σ²_f       Variance of the innovation for f_t. Only used for models              3×1    IG     0.1      2
           without time-varying volatility

Note:
a) All prior distributions are independent. For example, prior distributions for elements in H are independent from each other and follow the inverse gamma distribution.
b) Dim: Dimension of the parameters.
c) IG: Inverse gamma distribution. Para(1) and Para(2) are the scale and shape parameters, respectively.
d) N: Normal distribution. Para(1) and Para(2) are the mean and variance, respectively.
e) Priors for φ_f and φ_h are truncated so that the processes for factors and volatilities are stationary.
f) M_RV is the set of models with realized volatility data.

The posterior sampler iterates the following steps, starting with an initial value $\Theta^0$ and $s = 1$:

1. (Drawing $Q \mid \text{data}, \Theta^{s-1}_{-Q}$): Since $Q$ is diagonal, we draw the diagonal elements one at a time. Note that without the RV measurement equation, each diagonal element of $Q$ is distributed as an inverse gamma distribution. For DNS-RV, $Q$ enters the realized volatility measurement equation (equation A.6). In this case, we draw $Q$ using the Metropolis-Hastings algorithm, with an inverse gamma proposal distribution that has the same moments as the conditional posterior distribution of $Q$ without the RV measurement equation. Set $\Theta^{s-1} = \{Q^*, \beta^{s-1}, S^{s-1}, f^{s-1}_{1:T}, \mu^{s-1}_f, \phi^{s-1}_f, h^{s-1}_{1:T}, \mu^{s-1}_h, \phi^{s-1}_h, (\sigma^2_h)^{s-1}\}$, where $Q^*$ is a new draw from this Metropolis-Hastings sampler.

2. (Drawing $\beta, S \mid \text{data}, \Theta^{s-1}_{-\beta,S}$): We can likewise draw $\beta$ and the diagonal elements of $S$ equation-by-equation. It is a standard linear regression normal-inverse gamma framework. Set $\Theta^{s-1} = \{Q^*, \beta^*, S^*, f^{s-1}_{1:T}, \mu^{s-1}_f, \phi^{s-1}_f, h^{s-1}_{1:T}, \mu^{s-1}_h, \phi^{s-1}_h, (\sigma^2_h)^{s-1}\}$, where $\beta^*$ and $S^*$ are new draws from the conditional posterior distribution of $\beta$ and $S$.

3. (Drawing $f_{1:T} \mid \text{data}, \Theta^{s-1}_{-f_{1:T}}$): The Carter and Kohn (1994) multi-move Gibbs sampling procedure with stochastic volatility can be used to draw the level, slope, and curvature factors. Set $\Theta^{s-1} = \{Q^*, \beta^*, S^*, f^*_{1:T}, \mu^{s-1}_f, \phi^{s-1}_f, h^{s-1}_{1:T}, \mu^{s-1}_h, \phi^{s-1}_h, (\sigma^2_h)^{s-1}\}$, where $f^*_{1:T}$ is a new draw from the multi-move Gibbs sampler.

4. (Drawing $\mu_f, \phi_f \mid \text{data}, \Theta^{s-1}_{-\mu_f,\phi_f}$): Because we specify the factors and stochastic volatilities to have independent AR(1) processes, we can separate the drawing of the parameters for each factor. Drawing the parameters equation-by-equation is possible through the linear regression framework allowing for stochastic volatility. We generate draws from the conditional distribution using the Gibbs sampler laid out in Bianchi et al. (2009) and Hautsch and Yang (2012). Then, set $\Theta^{s-1} = \{Q^*, \beta^*, S^*, f^*_{1:T}, \mu^*_f, \phi^*_f, h^{s-1}_{1:T}, \mu^{s-1}_h, \phi^{s-1}_h, (\sigma^2_h)^{s-1}\}$.

5. (Drawing $h_{1:T} \mid \text{data}, \Theta^{s-1}_{-h_{1:T}}$): We have a measurement equation made up of two parts. The first part uses the Kim et al. (1998) method to transform the level, slope, and curvature factor equations (equation A.3). These measurement equations are

\[ \log\left( \left( f_{i,t} - (1-\phi_{f,i})\mu_{f,i} - \phi_{f,i} f_{i,t-1} \right)^2 \right) = h_{i,t} + \log(x^2_{i,t}), \qquad x_{i,t} \sim N(0,1) \tag{A.7} \]

for $i = l, s, c$, where $x_{i,t}$ is the standardized factor innovation, as in equation 11. The second part is the realized volatility measurement equation (equation A.2). Because of our linear approximation of the nonlinear RV measurement equation, we can simply use the standard Carter and Kohn (1994) multi-move Gibbs sampler in conjunction with the transition equation for the volatility processes (equation A.4). This step can be viewed as an extension of the Kim et al. (1998) sampler with extra linear measurement equations. We set $\Theta^{s-1} = \{Q^*, \beta^*, S^*, f^*_{1:T}, \mu^*_f, \phi^*_f, h^*_{1:T}, \mu^{s-1}_h, \phi^{s-1}_h, (\sigma^2_h)^{s-1}\}$, where $h^*_{1:T}$ is a new draw from the sampler.

6. (Drawing $\mu_h, \phi_h, \sigma^2_h \mid \text{data}, \Theta^{s-1}_{-\mu_h,\phi_h,\sigma^2_h}$): Conditional on the volatility process series $h^*_{1:T}$, we can use a standard linear regression normal-inverse gamma framework to draw $\mu_{h,i}$, $\phi_{h,i}$, and $\sigma^2_{h,i}$ equation-by-equation for $i = l, s, c$. Set

\[ \Theta^{s} = \{Q^*, \beta^*, S^*, f^*_{1:T}, \mu^*_f, \phi^*_f, h^*_{1:T}, \mu^*_h, \phi^*_h, (\sigma^2_h)^*\} \]

where $\mu^*_{h,i}$, $\phi^*_{h,i}$, and $(\sigma^2_{h,i})^*$ are newly drawn parameters from the conditional posterior distribution.

7. Go to step 1 with $s \leftarrow s + 1$.

D Forecasting Procedure

Equations A.8 to A.10 present the forecasting algorithm that we use. Because we are performing Bayesian analysis, we explicitly take parameter uncertainty into account when generating our forecasts. We first draw parameters from the relevant posterior distributions (indexed by $j$) and then simulate 10 trajectories of data given the parameter values (indexed by $k$). We do so for 2,000 parameter draws, for a total of 20,000 simulated data chains to compare with the realized data (Del Negro and Schorfheide, 2013). Note that for the DNS-C model, we would not have equation A.10 and $H_t$ would become $H$.

\[ \hat{y}^{j,k}_t = \Lambda_f f^{j,k}_t + \tilde{\epsilon}^{j,k}_t, \qquad \tilde{\epsilon}^{j,k}_t \sim N\left(0, Q^j\right) \tag{A.8} \]

\[ f^{j,k}_t = (I_3 - \Phi^j_f)\mu^j_f + \Phi^j_f f^{j,k}_{t-1} + \tilde{\eta}^{j,k}_t, \qquad \tilde{\eta}^{j,k}_t \sim N\left(0, \mathrm{diag}\left(e^{[h^{j,k}_{l,t},\, h^{j,k}_{s,t},\, h^{j,k}_{c,t}]}\right)\right) \tag{A.9} \]

\[ h^{j,k}_{i,t} - \mu^j_{i,h} = \phi^j_{i,h}\left(h^{j,k}_{i,t-1} - \mu^j_{i,h}\right) + \tilde{e}^{j,k}_{i,t}, \qquad \tilde{e}^{j,k}_{i,t} \sim N\left(0, \left(\sigma^j_{i,h}\right)^2\right) \tag{A.10} \]

for $j = 1, \ldots, 2{,}000$, $k = 1, \ldots, 10$, and $t = T, \ldots, T+12$, where $T$ is the beginning of the forecasting period.
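The simulation in equations A.8 to A.10, together with a kernel estimate of the log predictive density used in equation 13, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the parameter values are hypothetical stand-ins for stored MCMC draws, $\Phi_f$ is taken to be diagonal (consistent with the independent AR(1) factor processes described in Appendix C), and the Gaussian kernel with Silverman's rule-of-thumb bandwidth is an assumed choice, since the text does not state which kernel or bandwidth is used.

```python
import numpy as np

LAMBDA = 0.0609  # decay parameter fixed in the paper

def ns_loadings(tau, lam=LAMBDA):
    """Nelson-Siegel loadings (N x 3) for maturities tau in months."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def simulate_paths(rng, load, draw, f_T, h_T, horizon=12, n_paths=10):
    """Simulate yields from (A.8)-(A.10) for one parameter draw.

    `draw` is a dict of parameter vectors; the result has shape
    (n_paths, horizon, N).
    """
    n_yields = load.shape[0]
    out = np.empty((n_paths, horizon, n_yields))
    for k in range(n_paths):
        f, h = f_T.copy(), h_T.copy()
        for step in range(horizon):
            # (A.10): AR(1) log-volatility around its long-run mean
            h = (draw["mu_h"] + draw["phi_h"] * (h - draw["mu_h"])
                 + draw["sigma_h"] * rng.standard_normal(3))
            # (A.9): factor transition with innovation variance exp(h)
            f = ((1.0 - draw["phi_f"]) * draw["mu_f"] + draw["phi_f"] * f
                 + np.exp(h / 2.0) * rng.standard_normal(3))
            # (A.8): map factors into yields and add measurement error
            out[k, step] = load @ f + np.sqrt(draw["q"]) * rng.standard_normal(n_yields)
    return out

def log_predictive_density(draws, realized):
    """Gaussian-kernel estimate of the log predictive density at `realized`."""
    draws = np.asarray(draws, dtype=float)
    bw = 1.06 * draws.std(ddof=1) * draws.size ** (-0.2)  # Silverman's rule (assumed)
    z = (realized - draws) / bw
    log_k = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi) - np.log(bw)
    m = log_k.max()  # log-mean-exp for numerical stability
    return float(m + np.log(np.mean(np.exp(log_k - m))))

rng = np.random.default_rng(0)
load = ns_loadings([3, 12, 36, 60, 120])
# Hypothetical parameter values; a real application would loop over stored MCMC draws.
draw = {"mu_f": np.array([6.0, -1.5, -0.5]), "phi_f": np.array([0.99, 0.95, 0.85]),
        "mu_h": np.array([-3.9, -2.9, -1.9]), "phi_h": np.array([0.66, 0.64, 0.49]),
        "sigma_h": np.array([0.75, 1.72, 1.78]), "q": np.full(5, 1e-3)}
f_T, h_T = np.array([5.0, -2.0, -1.0]), draw["mu_h"].copy()

n_draws = 200  # the paper uses 2,000 parameter draws; reduced to keep the sketch fast
sims = np.concatenate([simulate_paths(rng, load, draw, f_T, h_T) for _ in range(n_draws)])
one_step_3m = sims[:, 0, 0]  # 1-month-ahead simulated values of the 3-month yield
print(sims.shape, log_predictive_density(one_step_3m, realized=3.2))
```

With 2,000 retained draws (15,000 draws minus 5,000 burn-in, thinned by 5) and 10 trajectories each, the same loop would produce the 20,000 simulated chains mentioned in the text.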

E Supporting Material

This section contains additional tables and figures that support our claims in the main text. First, we present descriptive statistics of the data that we used in the estimation and forecasting exercises (section E.1). Then, we report parameter estimates (section E.2) and estimated bond yield factors $f_t$ (section E.3). In section E.4, we report the relative importance (ratio in %) of variation between the measurement error and $f_t$. Finally, as a robustness check for the density forecasting evaluation conducted in section 5.3, we compute and present the model confidence set of Hansen et al. (2011) in section E.5. The model confidence set results list a subset of forecasting models that includes the best models in terms of the log predictive score at the 5% confidence level.

E.1 Descriptive statistics of data

Table A-2 Descriptive Statistics (Yields)

Maturity (months)  mean  std   min   max    ρ̂(1)  ρ̂(12)  ρ̂(24)
3                  5.35  3.14  0.04  16.02  0.97  0.65   0.39
6                  5.52  3.17  0.15  16.48  0.98  0.66   0.40
9                  5.64  3.19  0.19  16.39  0.98  0.67   0.42
12                 5.75  3.19  0.25  16.10  0.98  0.69   0.44
15                 5.87  3.21  0.38  16.06  0.98  0.70   0.46
18                 5.95  3.20  0.44  16.22  0.98  0.71   0.47
21                 6.03  3.19  0.53  16.17  0.98  0.71   0.49
24                 6.06  3.15  0.53  15.81  0.98  0.72   0.50
30                 6.18  3.11  0.82  15.43  0.98  0.73   0.52
36                 6.29  3.08  0.98  15.54  0.98  0.74   0.54
48                 6.48  3.02  1.02  15.60  0.98  0.75   0.57
60                 6.60  2.94  1.56  15.13  0.98  0.76   0.59
72                 6.73  2.92  1.53  15.11  0.98  0.77   0.61
84                 6.81  2.84  2.18  15.02  0.98  0.77   0.61
96                 6.90  2.81  2.11  15.05  0.98  0.78   0.63
108                6.95  2.79  2.15  15.11  0.98  0.78   0.63
120                6.95  2.72  2.68  15.19  0.98  0.77   0.63

Notes: For each maturity we present mean, standard deviation, minimum, maximum and the j-th order autocorrelation coefficients for j = 1, 12, and 24.
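The summary statistics reported in these descriptive tables could be computed along the following lines. This is a sketch under stated assumptions: the helper names `autocorr` and `describe` are hypothetical, and the source does not say which standard deviation or autocorrelation estimator was used, so the sample standard deviation (`ddof=1`) and the usual biased autocorrelation estimator are chosen here for illustration only.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))

def describe(x, lags=(1, 12, 24)):
    """Mean, std, min, max and autocorrelations at the given lags."""
    x = np.asarray(x, dtype=float)
    stats = {
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),  # assumption: sample standard deviation
        "min": float(x.min()),
        "max": float(x.max()),
    }
    for lag in lags:
        stats[f"rho({lag})"] = autocorr(x, lag)
    return stats
```

Applying `describe` to each maturity's monthly series would give one table row per maturity.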

Table A-3 Descriptive Statistics (log realized volatility)

Maturity (months)  mean   std   min    max    ρ̂(1)  ρ̂(12)  ρ̂(24)
3                  -3.05  1.29  -5.33  1.10   0.65  0.31   0.19
6                  -3.35  1.21  -6.66  0.64   0.69  0.28   0.18
9                  -3.24  1.10  -5.90  0.47   0.66  0.23   0.10
12                 -3.11  1.02  -5.76  0.41   0.63  0.21   0.04
15                 -2.99  0.96  -5.68  0.36   0.61  0.20   0.01
18                 -2.89  0.93  -5.59  0.29   0.61  0.20   -0.01
21                 -2.81  0.90  -5.52  0.20   0.60  0.20   -0.02
24                 -2.75  0.88  -5.46  0.11   0.60  0.20   -0.03
30                 -2.66  0.85  -5.34  -0.04  0.59  0.20   -0.03
36                 -2.61  0.83  -5.23  -0.12  0.60  0.21   -0.03
48                 -2.57  0.79  -5.02  -0.19  0.61  0.22   -0.02
60                 -2.57  0.77  -4.84  -0.22  0.61  0.23   -0.01
72                 -2.58  0.76  -4.67  -0.25  0.61  0.25   0.00
84                 -2.59  0.76  -4.66  -0.21  0.62  0.27   0.00
96                 -2.61  0.75  -4.66  -0.18  0.63  0.29   0.01
108                -2.62  0.75  -4.66  -0.17  0.64  0.30   0.01
120                -2.63  0.75  -4.65  -0.18  0.66  0.31   0.01

Notes: For each maturity we present mean, standard deviation, minimum, maximum and the j-th order autocorrelation coefficients for j = 1, 12, and 24.

E.2 In-sample estimation (posterior moments)

We denote $\{y_1, y_2, y_3, \ldots, y_{16}, y_{17}\}$ = monthly U.S. Treasury yields with maturities of (3, 6, 9, 12, 15, 18, 21 months and 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10 years).

Table A-4 Posterior moments of H

Yield  Pct   RW-C   RW-SV  RW-RV  DNS-C  DNS-SV  DNS-RV  RW-SV-RW  DNS-SV-RW
y1     5%    0.064  0.063  0.060  0.063  0.063   0.488   0.063     0.063
       50%   0.074  0.072  0.068  0.072  0.072   0.610   0.072     0.072
       95%   0.085  0.083  0.080  0.083  0.082   0.760   0.083     0.083
y2     5%    0.008  0.008  0.010  0.008  0.008   0.157   0.008     0.008
       50%   0.010  0.009  0.012  0.009  0.009   0.202   0.009     0.009
       95%   0.012  0.011  0.015  0.011  0.011   0.284   0.011     0.011
y3     5%    0.002  0.002  0.004  0.002  0.002   0.043   0.002     0.002
       50%   0.003  0.003  0.005  0.002  0.002   0.067   0.002     0.003
       95%   0.004  0.003  0.006  0.003  0.003   0.138   0.003     0.003
y4     5%    0.003  0.003  0.004  0.003  0.003   0.018   0.003     0.003
       50%   0.004  0.004  0.005  0.004  0.004   0.038   0.004     0.004
       95%   0.005  0.005  0.005  0.005  0.005   0.097   0.005     0.005
y5     5%    0.006  0.006  0.006  0.006  0.007   0.017   0.006     0.006
       50%   0.007  0.007  0.007  0.007  0.007   0.035   0.007     0.007
       95%   0.008  0.009  0.008  0.009  0.009   0.086   0.009     0.009
y6     5%    0.006  0.006  0.005  0.006  0.006   0.016   0.006     0.006
       50%   0.006  0.006  0.006  0.006  0.007   0.034   0.007     0.007
       95%   0.007  0.008  0.007  0.007  0.008   0.083   0.008     0.008
y7     5%    0.004  0.004  0.004  0.004  0.004   0.015   0.004     0.004
       50%   0.005  0.005  0.005  0.005  0.005   0.033   0.005     0.005
       95%   0.006  0.006  0.005  0.006  0.006   0.080   0.006     0.006
y8     5%    0.002  0.002  0.002  0.002  0.002   0.014   0.002     0.002
       50%   0.002  0.002  0.003  0.002  0.002   0.033   0.002     0.002
       95%   0.003  0.003  0.003  0.003  0.002   0.082   0.002     0.003
y9     5%    0.002  0.002  0.002  0.002  0.002   0.013   0.002     0.002
       50%   0.002  0.002  0.003  0.002  0.002   0.029   0.002     0.002
       95%   0.003  0.003  0.003  0.003  0.003   0.075   0.003     0.003
y10    5%    0.002  0.002  0.003  0.002  0.002   0.011   0.002     0.002
       50%   0.003  0.003  0.003  0.003  0.003   0.026   0.003     0.003
       95%   0.003  0.003  0.003  0.003  0.003   0.068   0.003     0.003
y11    5%    0.004  0.004  0.004  0.004  0.004   0.009   0.004     0.004
       50%   0.004  0.004  0.004  0.004  0.004   0.021   0.004     0.004
       95%   0.005  0.005  0.005  0.005  0.005   0.059   0.005     0.005
y12    5%    0.003  0.003  0.004  0.003  0.003   0.007   0.003     0.003
       50%   0.004  0.004  0.004  0.004  0.004   0.017   0.004     0.004
       95%   0.004  0.004  0.005  0.004  0.004   0.049   0.004     0.004
y13    5%    0.004  0.004  0.005  0.004  0.004   0.006   0.004     0.004
       50%   0.005  0.005  0.006  0.005  0.005   0.013   0.005     0.005
       95%   0.005  0.005  0.006  0.005  0.005   0.038   0.005     0.005
y14    5%    0.003  0.003  0.004  0.003  0.003   0.004   0.003     0.003
       50%   0.004  0.004  0.005  0.004  0.004   0.011   0.004     0.004
       95%   0.005  0.004  0.006  0.004  0.004   0.032   0.004     0.004
y15    5%    0.003  0.002  0.004  0.003  0.002   0.004   0.002     0.002
       50%   0.003  0.003  0.005  0.003  0.003   0.011   0.003     0.003
       95%   0.004  0.004  0.006  0.004  0.004   0.031   0.004     0.004
y16    5%    0.006  0.007  0.007  0.006  0.007   0.006   0.007     0.007
       50%   0.007  0.008  0.008  0.008  0.008   0.012   0.008     0.008
       95%   0.009  0.009  0.010  0.009  0.009   0.028   0.009     0.009
y17    5%    0.012  0.012  0.012  0.012  0.013   0.016   0.013     0.012
       50%   0.014  0.014  0.014  0.014  0.014   0.029   0.014     0.014
       95%   0.016  0.016  0.016  0.016  0.017   0.053   0.017     0.017

Notes: Variance of the measurement error. Not applicable to DNS-ME-SV and DNS-ME-RV. Estimation sample is from January 1981 to November 2009.
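The 5%, 50%, and 95% rows in the posterior tables are quantiles of the retained MCMC draws. A minimal sketch of that summary step follows; the helper name `posterior_summary` and the array layout (draws in rows, parameters in columns) are assumptions made for illustration.

```python
import numpy as np

def posterior_summary(draws, probs=(5, 50, 95)):
    """Return posterior percentiles along the draw axis.

    draws: array of shape (n_draws,) or (n_draws, n_params).
    Result has one row per requested percentile.
    """
    draws = np.asarray(draws, dtype=float)
    return np.percentile(draws, probs, axis=0)
```

For a matrix of draws with one column per yield, the three output rows correspond to the three rows reported for each yield.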

Table A-5 Posterior moments of parameters related to $f_t$

Param    Pct   RW-C   RW-SV  RW-RV  DNS-C  DNS-SV  DNS-RV  RW-SV-RW  DNS-SV-RW  DNS-ME-SV  DNS-ME-RV
µ_f,l    5%    0.00   0.00   0.00   -7.28  -5.83   -17.48  0.00      -5.44      -7.20      -7.40
         50%   0.00   0.00   0.00   4.63   4.57    -5.47   0.00      4.71       4.41       4.37
         95%   0.00   0.00   0.00   7.86   7.17    2.15    0.00      6.94       7.83       7.79
µ_f,s    5%    0.00   0.00   0.00   -4.28  -4.04   -1.14   0.00      -4.28      -4.47      -4.49
         50%   0.00   0.00   0.00   -2.54  -0.88   2.92    0.00      -0.98      -2.55      -2.59
         95%   0.00   0.00   0.00   -1.43  6.93    13.81   0.00      6.46       -1.50      -1.43
µ_f,c    5%    0.00   0.00   0.00   -2.64  -2.13   -6.31   0.00      -1.78      -2.97      -2.91
         50%   0.00   0.00   0.00   -0.82  -0.52   0.16    0.00      -0.42      -0.87      -0.79
         95%   0.00   0.00   0.00   0.92   0.84    8.66    0.00      0.70       1.26       1.33
φ_f,l    5%    1.00   1.00   1.00   0.98   0.98    0.99    1.00      0.98       0.98       0.98
         50%   1.00   1.00   1.00   0.99   0.99    1.00    1.00      0.99       0.99       0.99
         95%   1.00   1.00   1.00   1.00   1.00    1.00    1.00      1.00       1.00       1.00
φ_f,s    5%    1.00   1.00   1.00   0.94   0.97    0.99    1.00      0.97       0.94       0.94
         50%   1.00   1.00   1.00   0.96   0.99    1.00    1.00      0.99       0.96       0.97
         95%   1.00   1.00   1.00   0.99   1.00    1.00    1.00      1.00       0.99       0.99
φ_f,c    5%    1.00   1.00   1.00   0.92   0.91    0.97    1.00      0.90       0.92       0.93
         50%   1.00   1.00   1.00   0.95   0.95    0.99    1.00      0.94       0.96       0.96
         95%   1.00   1.00   1.00   0.99   0.99    1.00    1.00      0.98       0.99       0.99

The σ rows report fewer entries than there are model columns; their values appear below in the order given in the source.

σ_f,l    5%    0.10  0.10  0.98  0.99  0.98  0.09  0.09
         50%   0.12  0.11  0.99  1.00  0.99  0.11  0.10
         95%   0.13  0.13  1.00  1.00  1.00  0.12  0.12
σ_f,s    5%    0.16  0.16  0.16  0.15
         50%   0.19  0.19  0.18  0.17
         95%   0.22  0.21  0.20  0.19
σ_f,c    5%    0.42  0.43  0.40  0.37
         50%   0.49  0.51  0.46  0.44
         95%   0.57  0.59  0.54  0.51

Notes: Estimation sample is from January 1981 to November 2009.

Table A-6 Posterior moments of parameters related to $h_t$

Param    Pct   RW-C   RW-SV  RW-RV  DNS-C  DNS-SV  DNS-RV  RW-SV-RW  DNS-SV-RW
µ_h,l    5%    n/a    -4.75  -4.11  n/a    -4.40   -4.19   0.00      0.00
         50%   n/a    -2.55  -3.85  n/a    -2.56   -3.92   0.00      0.00
         95%   n/a    -1.11  -1.76  n/a    -1.48   -2.21   0.00      0.00
µ_h,s    5%    n/a    -3.39  -3.22  n/a    -3.41   -3.22   0.00      0.00
         50%   n/a    -2.30  -2.82  n/a    -2.30   -2.85   0.00      0.00
         95%   n/a    -1.69  -1.74  n/a    -1.66   -1.94   0.00      0.00
µ_h,c    5%    n/a    -1.47  -2.25  n/a    -1.48   -2.18   0.00      0.00
         50%   n/a    -0.97  -1.93  n/a    -0.96   -1.88   0.00      0.00
         95%   n/a    -0.53  -1.45  n/a    -0.47   -1.39   0.00      0.00
φ_h,l    5%    n/a    0.93   0.59   n/a    0.93    0.58    1.00      1.00
         50%   n/a    0.98   0.66   n/a    0.98    0.66    1.00      1.00
         95%   n/a    1.00   0.73   n/a    1.00    0.73    1.00      1.00
φ_h,s    5%    n/a    0.92   0.56   n/a    0.92    0.57    1.00      1.00
         50%   n/a    0.97   0.65   n/a    0.96    0.64    1.00      1.00
         95%   n/a    0.99   0.72   n/a    1.00    0.72    1.00      1.00
φ_h,c    5%    n/a    0.74   0.40   n/a    0.81    0.39    1.00      1.00
         50%   n/a    0.91   0.50   n/a    0.92    0.49    1.00      1.00
         95%   n/a    0.98   0.61   n/a    0.98    0.60    1.00      1.00
σ_h,l    5%    n/a    0.01   0.39   n/a    0.01    0.41    0.01      0.01
         50%   n/a    0.03   0.72   n/a    0.03    0.75    0.01      0.01
         95%   n/a    0.08   0.88   n/a    0.08    0.92    0.03      0.03
σ_h,s    5%    n/a    0.01   1.29   n/a    0.01    1.27    0.01      0.01
         50%   n/a    0.03   1.75   n/a    0.04    1.72    0.02      0.03
         95%   n/a    0.10   3.00   n/a    0.10    2.64    0.06      0.05
σ_h,c    5%    n/a    0.01   1.41   n/a    0.02    1.30    0.01      0.01
         50%   n/a    0.12   2.00   n/a    0.09    1.78    0.03      0.03
         95%   n/a    0.40   4.00   n/a    0.27    4.30    0.09      0.09

Notes: Parameters related to $h_t$. Not applicable to DNS-C, RW-C, DNS-ME-SV and DNS-ME-RV. Estimation sample is from January 1981 to November 2009.

Table A-7 Posterior moments of parameters related to $h_t$ for ME-SV model

             µ_h                     φ_h                     σ²_h
Yield  Pct   DNS-ME-SV  DNS-ME-RV   DNS-ME-SV  DNS-ME-RV   DNS-ME-SV  DNS-ME-RV
y1     5%    -4.80      -4.28       0.91       0.91        0.14       0.04
       50%   -3.61      -3.45       0.95       0.95        0.23       0.08
       95%   -2.50      -2.92       0.99       0.98        0.36       0.13
y2     5%    -5.75      -5.79       0.89       0.91        0.03       0.01
       50%   -4.88      -5.03       0.95       0.95        0.07       0.03
       95%   -4.14      -4.62       0.99       0.99        0.14       0.06
y3     5%    -6.28      -5.77       0.53       0.88        0.00       0.00
       50%   -5.48      -5.55       0.96       0.93        0.01       0.01
       95%   -4.96      -5.35       1.00       0.98        0.04       0.02
y4     5%    -6.52      -5.60       0.24       0.87        0.00       0.00
       50%   -5.43      -5.43       0.96       0.92        0.01       0.01
       95%   -4.65      -5.26       1.00       0.97        0.02       0.01
y5     5%    -7.89      -5.35       0.95       0.86        0.01       0.00
       50%   -5.40      -5.12       0.98       0.92        0.02       0.01
       95%   -4.15      -4.93       1.00       0.97        0.04       0.02
y6     5%    -10.34     -5.69       0.97       0.88        0.00       0.01
       50%   -5.95      -5.39       0.99       0.93        0.01       0.02
       95%   -5.14      -5.14       1.00       0.97        0.02       0.04
y7     5%    -9.71      -6.07       0.97       0.89        0.00       0.01
       50%   -6.16      -5.67       0.99       0.94        0.01       0.03
       95%   -5.46      -5.37       1.00       0.98        0.01       0.06
y8     5%    -8.41      -6.01       0.92       0.84        0.00       0.00
       50%   -5.92      -5.85       0.98       0.91        0.01       0.01
       95%   -5.03      -5.68       1.00       0.96        0.01       0.02
y9     5%    -7.06      -6.29       0.93       0.87        0.00       0.01
       50%   -6.11      -5.98       0.97       0.93        0.01       0.03
       95%   -5.63      -5.73       1.00       0.97        0.03       0.06
y10    5%    -8.33      -5.89       0.95       0.83        0.00       0.00
       50%   -5.91      -5.71       0.98       0.90        0.01       0.01
       95%   -5.00      -5.56       1.00       0.95        0.01       0.02
y11    5%    -7.80      -5.88       0.95       0.86        0.01       0.02
       50%   -5.53      -5.55       0.98       0.92        0.02       0.05
       95%   -0.82      -5.27       1.00       0.96        0.05       0.09
y12    5%    -8.28      -5.68       0.95       0.84        0.00       0.01
       50%   -5.69      -5.50       0.98       0.91        0.01       0.01
       95%   -4.91      -5.31       1.00       0.96        0.02       0.04
y13    5%    -8.00      -6.12       0.95       0.88        0.01       0.03
       50%   -5.48      -5.60       0.99       0.94        0.03       0.06
       95%   0.85       -5.20       1.00       0.98        0.06       0.10
y14    5%    -9.18      -6.51       0.97       0.92        0.01       0.03
       50%   -5.94      -5.77       0.99       0.96        0.02       0.05
       95%   -2.26      -5.14       1.00       0.99        0.04       0.08
y15    5%    -8.72      -6.07       0.96       0.88        0.00       0.01
       50%   -5.95      -5.69       0.99       0.93        0.01       0.03
       95%   -2.15      -5.36       1.00       0.98        0.03       0.07
y16    5%    -9.02      -6.04       0.96       0.89        0.01       0.03
       50%   -5.84      -5.42       0.99       0.94        0.01       0.06
       95%   -4.57      -4.92       1.00       0.98        0.04       0.11
y17    5%    -6.53      -5.66       0.93       0.91        0.04       0.04
       50%   -5.02      -4.87       0.97       0.95        0.07       0.07
       95%   -3.06      -4.25       1.00       0.99        0.14       0.12

Notes: Parameters related to $h_t$. For DNS-ME-SV and DNS-ME-RV. Estimation sample is from January 1981 to November 2009.

E.3 Extracted factors ($f_t$)

Figure A-1 Extracted factors

[Figure: three rows of panels (Level Factor, Slope Factor, Curvature Factor), each with a left panel "DNS-C" and a right panel "Various specifications".]

Notes: Left column: Factors estimated from the DNS-C model with 80% credible intervals. Right column: Estimated factors from the various specifications. Shaded bars on the right panel are NBER recession dates. 1) Factors are very similar to each other. 2) Factors are very accurately estimated. Estimation sample is from January 1981 to November 2009.

E.4 Relative importance (ratio in %) of variation between the measurement error and $f_t$

Table A-8 Relative importance (ratio in %) of variation between the measurement error and $f_t$

Maturity (months)  DNS-C  DNS-SV  DNS-RV  DNS-ME-SV  DNS-ME-RV
3                  7.72   7.73    7.77    7.95       7.66
6                  2.54   2.54    2.80    2.78       2.56
9                  1.13   1.12    1.60    1.55       1.51
12                 1.80   1.81    1.90    1.73       1.85
15                 2.24   2.26    2.22    2.13       2.09
18                 2.05   2.06    2.04    2.00       1.94
21                 1.77   1.78    1.75    1.75       1.74
24                 1.29   1.29    1.35    1.36       1.34
30                 1.31   1.28    1.44    1.43       1.41
36                 1.43   1.42    1.52    1.39       1.42
48                 2.04   2.04    2.11    2.08       2.10
60                 1.70   1.71    1.85    1.72       1.69
72                 2.18   2.18    2.43    2.32       2.25
84                 1.96   1.95    2.24    2.13       2.19
96                 1.48   1.45    2.01    1.51       1.64
108                2.74   2.78    2.92    2.57       2.66
120                4.04   4.07    4.11    3.92       4.16

Notes: We calculate $\frac{\mathrm{std}(e_t)}{\mathrm{std}(f_t)} \times 100$, where $e_t$ is the measurement error and $f_t$ is the vector of level, slope, and curvature factors. This table shows that variation in the measurement error is small relative to variation in the factor component. The variation from the measurement error is about 1% to 8% of the variation from the factors. The ratios are mostly below 3%, except for the 3-month and 10-year bond yields. This evidence supports the view that time-varying volatility in the transition equation (factor equation) plays a much larger role in prediction. Calculated at the posterior median. Estimation sample is from January 1981 to November 2009.
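A ratio of this kind could be computed as in the sketch below. The source writes the denominator only as std($f_t$) and does not spell out how the factor vector maps to a per-maturity number, so this example assumes the denominator is the standard deviation of the fitted factor component $\Lambda f_t$ for each maturity. The function name `measurement_error_share` and its arguments are likewise hypothetical.

```python
import numpy as np

def measurement_error_share(yields, factors, Lambda):
    """Per-maturity ratio std(e_t) / std(Lambda f_t) * 100.

    yields:  (T, N) observed yields
    factors: (T, 3) level, slope, curvature (e.g. posterior medians)
    Lambda:  (N, 3) factor loadings
    Assumption: the denominator uses the fitted factor component.
    """
    fitted = factors @ Lambda.T   # (T, N) factor component of each yield
    resid = yields - fitted       # (T, N) measurement error e_t
    return 100.0 * resid.std(axis=0, ddof=1) / fitted.std(axis=0, ddof=1)
```

A value near 2 for a given maturity would mean the measurement error varies about 2% as much as the factor-driven part of that yield.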

E.5 Model confidence set

Table A-9 Model Confidence Set (5%) Based on the log predictive score

Maturity  List of Models

1-step-ahead prediction
3         DNS-RV
12        RW-RV DNS-RV
36        DNS-RV
60        DNS-SV DNS-RV RW-SV
120       DNS-RV RW-SV RW-RV DNS-SV

3-step-ahead prediction
3         RW-RV DNS-RV
12        DNS-SV RW-SV-RW RW-SV RW-RV DNS-RV
36        RW-C DNS-SV DNS-ME-RV DNS-ME-SV RW-SV-RW RW-SV RW-RV DNS-RV
60        DNS-SV-RW DNS-SV RW-C DNS-C RW-SV-RW DNS-ME-RV DNS-ME-SV RW-RV RW-SV DNS-RV
120       RW-RV DNS-ME-SV DNS-ME-RV RW-SV DNS-RV DNS-SV RW-SV-RW DNS-SV-RW

6-step-ahead prediction
3         DNS-SV DNS-ME-SV RW-SV DNS-ME-RV RW-RV DNS-RV
12        DNS-SV-RW DNS-SV DNS-C RW-C RW-SV-RW DNS-ME-SV RW-SV DNS-ME-RV RW-RV DNS-RV
36        DNS-SV-RW DNS-SV RW-SV-RW RW-SV DNS-C RW-C DNS-ME-SV DNS-ME-RV RW-RV DNS-RV
60        DNS-SV-RW RW-SV-RW DNS-SV RW-SV RW-C RW-RV DNS-C DNS-ME-RV DNS-ME-SV DNS-RV
120       RW-SV-RW DNS-C RW-SV DNS-SV-RW DNS-ME-RV DNS-ME-SV DNS-SV RW-RV DNS-RV

12-step-ahead prediction
3         RW-RV DNS-RV DNS-ME-SV DNS-C DNS-ME-RV
12        DNS-SV-RW DNS-SV RW-SV-RW RW-SV RW-RV RW-C DNS-RV DNS-C DNS-ME-SV DNS-ME-RV
36        DNS-SV-RW DNS-SV RW-SV-RW RW-SV RW-RV RW-C DNS-C DNS-RV DNS-ME-SV DNS-ME-RV
60        DNS-ME-SV RW-RV DNS-ME-RV DNS-RV
120       DNS-ME-RV DNS-ME-SV RW-SV RW-SV-RW DNS-SV-RW DNS-SV RW-RV DNS-RV

Notes: This table lists a subset of forecasting models that includes the best models (in terms of the log predictive score) at the 5% confidence level. Specifically, we define the difference in the log predictive score for model $i$ and $j$ as

$$d_{ij,t} = LPS_{i,t} - LPS_{j,t}$$

and define $\mu_{ij} = E[d_{ij,t}]$. Then, the set of best forecasts is defined as

$$\mathcal{M}^* = \{i \in \mathcal{M} : \mu_{ij} \geq 0, \ \forall j \in \mathcal{M}\}.$$

We follow Hansen et al. (2011) to construct the model confidence set. We construct p-values using the stationary bootstrap with 10,000 replications and the average window length 12. Computation is based on the MFE Toolbox provided by Kevin Sheppard.
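The resampling step behind these p-values can be illustrated with a short sketch of the stationary bootstrap applied to a log predictive score differential $d_{ij,t}$. This is not the MFE Toolbox implementation and not the full model confidence set procedure (which also involves test statistics and sequential elimination). The function names `stationary_bootstrap_indices` and `bootstrap_mean_diff` are hypothetical and show only how bootstrap replications of the mean differential might be generated with an average block length of 12.

```python
import numpy as np

def stationary_bootstrap_indices(n, avg_block=12, rng=None):
    """Draw one set of stationary-bootstrap time indices of length n."""
    rng = np.random.default_rng() if rng is None else rng
    p = 1.0 / avg_block  # probability of starting a new block
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(n)        # start a new block at random
        else:
            idx[t] = (idx[t - 1] + 1) % n   # continue block, wrapping around
    return idx

def bootstrap_mean_diff(lps_i, lps_j, n_boot=10_000, avg_block=12, rng=None):
    """Bootstrap replications of the mean of d_ij,t = LPS_i,t - LPS_j,t."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(lps_i, dtype=float) - np.asarray(lps_j, dtype=float)
    return np.array([
        d[stationary_bootstrap_indices(d.size, avg_block, rng)].mean()
        for _ in range(n_boot)
    ])
```

Block lengths are geometric with mean `avg_block`, which preserves serial dependence in the score differentials while keeping the resampled series stationary.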

Cite this document
APA
Minchul Shin and Molin Zhong (2015). Does Realized Volatility Help Bond Yield Density Prediction? (FEDS 2015-115). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2015-115
BibTeX
@techreport{wtfs_feds_2015_115,
  author = {Minchul Shin and Molin Zhong},
  title = {Does Realized Volatility Help Bond Yield Density Prediction?},
  type = {Finance and Economics Discussion Series},
  number = {2015-115},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2015},
  url = {https://whenthefedspeaks.com/doc/feds_2015-115},
  abstract = {We suggest using "realized volatility" as a volatility proxy to aid in model-based multivariate bond yield density forecasting. To do so, we develop a general estimation approach to incorporate volatility proxy information into dynamic factor models with stochastic volatility. The resulting model parameter estimates are highly efficient, which one hopes would translate into superior predictive performance. We explore this conjecture in the context of density prediction of U.S. bond yields by incorporating realized volatility into a dynamic Nelson-Siegel (DNS) model with stochastic volatility. The results clearly indicate that using realized volatility improves density forecasts relative to popular specifications in the DNS literature that neglect realized volatility.},
}