feds · April 30, 2006

Do Macro Variables, Asset Markets, or Surveys Forecast Inflation Better?

Abstract

Surveys do! We examine the forecasting power of four alternative methods of forecasting U.S. inflation out-of-sample: time series ARIMA models; regressions using real activity measures motivated from the Phillips curve; term structure models that include linear, non-linear, and arbitrage-free specifications; and survey-based measures. We also investigate several methods of combining forecasts. Our results show that surveys outperform the other forecasting methods and that the term structure specifications perform relatively poorly. We find little evidence that combining forecasts produces superior forecasts to survey information alone. When combining forecasts, the data consistently places the highest weights on survey information.

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Do Macro Variables, Asset Markets, or Surveys Forecast Inflation Better? Andrew Ang, Geert Bekaert, and Min Wei 2006-15 NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Andrew Ang †, Columbia University and NBER
Geert Bekaert ‡, Columbia University, CEPR and NBER
Min Wei §, Federal Reserve Board of Governors

This Version: 13 February, 2006
JEL Classification: E31, E37, E43, E44
Keywords: ARIMA, Phillips curve, forecasting, term structure models, Livingston

∗ We thank Jean Boivin for kindly providing data. Andrew Ang acknowledges support from the National Science Foundation. We have benefitted from the comments of Todd Clark, Dean Croushore, Bob Hodrick, Jonas Fisher, Robin Lumsdaine, Michael McCracken, Antonio Moreno, Serena Ng, and Tom Stark, and seminar participants at Columbia University and Goldman Sachs Asset Management. We especially thank the editor, Charles Plosser, and an anonymous referee for excellent comments. The opinions expressed in this paper do not necessarily reflect those of the Federal Reserve Board or the Federal Reserve System.
† Columbia Business School, 805 Uris Hall, 3022 Broadway, New York, NY 10027; ph: (212) 854-9154; fax: (212) 662-8474; email: aa610@columbia.edu; WWW: http://www.columbia.edu/~aa610
‡ Columbia Business School, 802 Uris Hall, 3022 Broadway, New York, NY 10027; ph: (212) 854-9156; fax: (212) 662-8474; email: gb241@columbia.edu; WWW: http://www.gsb.columbia.edu/faculty/gbekaert
§ Federal Reserve Board of Governors, Division of Monetary Affairs, Washington, DC 20551; ph: (202) 736-5619; fax: (202) 452-2301; email: min.wei@frb.gov; WWW: www.federalreserve.gov/research/staff/weiminx.htm


1 Introduction

Obtaining reliable and accurate forecasts of future inflation is crucial for policymakers conducting monetary and fiscal policy; for investors hedging the risk of nominal assets; for firms making investment decisions and setting prices; and for labor and management negotiating wage contracts. Consequently, it is no surprise that a considerable academic literature evaluates different inflation forecasts and forecasting methods. In particular, economists use four main methods to forecast inflation. The first method is atheoretical, using time series models of the ARIMA variety. The second method builds on the economic model of the Phillips curve, leading to forecasting regressions that use real activity measures. Third, we can forecast inflation using information embedded in asset prices, in particular the term structure of interest rates. Finally, survey-based measures use information from agents (consumers or professionals) directly to forecast inflation. In this article, we comprehensively compare and contrast the ability of these four methods to forecast inflation out of sample.

Our approach makes four main contributions to the literature. First, our analysis is the first to comprehensively compare the four methods: time-series forecasts, forecasts based on the Phillips curve, forecasts from the yield curve, and all three available surveys (the Livingston, Michigan, and SPF surveys). The previous literature has concentrated on only one or two of these forecasting methodologies. For example, Stockton and Glassman (1987) show that pure time-series models outperform more sophisticated macro models, but do not consider term structure models or surveys. Fama and Gibbons (1984) compare term structure forecasts with the Livingston survey, but they do not consider forecasts from macro factors.
Whereas Grant and Thomas (1999), Thomas (1999) and Mehra (2002) show that surveys outperform simple time-series benchmarks for forecasting inflation, none of these studies compares the performance of survey measures with forecasts from Phillips curve or term structure models.

The lack of a study comparing these four methods of inflation forecasting implies that there is no well-accepted set of findings regarding the superiority of a particular forecasting method. The most comprehensive study to date, Stock and Watson (1999), finds that Phillips curve-based forecasts produce the most accurate out-of-sample forecasts of U.S. inflation compared with other macro series and asset prices, using data up to 1996. However, Stock and Watson only briefly compare the Phillips curve forecasts to the Michigan survey and to simple regressions using term structure information. Stock and Watson do not consider no-arbitrage term structure models, non-linear forecasting models, or combined forecasts from all four forecasting methods. Recent work also casts doubt on the robustness of the Stock-Watson findings. In particular, Atkeson and Ohanian (2001), Fisher, Liu and Zhou (2002), Sims (2002), and Cecchetti, Chu and Steindel (2000), among others, show that the accuracy of Phillips curve-based forecasts depends crucially on the sample period. Clark and McCracken (2006) address the issue of how instability in the output gap coefficients of the Phillips curve affects forecasting power. To assess the stability of the inflation forecasts across different samples, we consider out-of-sample forecasts over both the post-1985 and post-1995 periods.

Our second contribution is to evaluate inflation forecasts implied by arbitrage-free asset pricing models. Previous studies employing term structure data mostly use only the term spread in simple OLS regressions and usually do not use all available term structure data (see, for example, Mishkin, 1990, 1991; Jorion and Mishkin, 1991; Stock and Watson, 2003). Frankel and Lown (1994) use a simple weighted average of different term spreads, but they do not impose no-arbitrage restrictions. In contrast to these approaches, we develop forecasting models that use all available data and impose no-arbitrage restrictions. Our no-arbitrage term structure models incorporate inflation as a state variable because inflation is an integral component of nominal yields. The no-arbitrage framework allows us to extract forecasts of inflation from data on inflation and asset prices, taking into account potential time-varying risk premia.

No-arbitrage constraints are reasonable in a world where hedge funds and investment banks routinely eliminate arbitrage opportunities in fixed income securities. Imposing theoretical no-arbitrage restrictions may also lead to more efficient estimation. Just as Ang, Piazzesi and Wei (2004) show that no-arbitrage models produce superior forecasts of GDP growth, no-arbitrage restrictions may also produce more accurate forecasts of inflation.
In addition, this is the first article to investigate non-linear, no-arbitrage models of inflation. We investigate both an empirical regime-switching model incorporating term structure information and a no-arbitrage, non-linear term structure model following Ang, Bekaert and Wei (2006) with inflation as a state variable.

Our third contribution is that we thoroughly investigate combined forecasts. Stock and Watson (2002a, 2003), among others, show that the use of aggregate indices of many macro series measuring real activity produces better forecasts of inflation than individual macro series. To investigate this further, we also include the (Phillips curve-based) index of real activity constructed by Bernanke, Boivin and Eliasz (2005) from 65 macroeconomic series. In addition, several authors (see, e.g., Stock and Watson, 1999; Brave and Fisher, 2004; Wright, 2004) advocate combining several alternative models to forecast inflation. We investigate five different methods of combining forecasts: simple means or medians, OLS-based combinations, and Bayesian estimators with equal or unit weight priors.
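The combination schemes listed above can be illustrated with a short sketch. It assumes a matrix `F` of past model forecasts (one column per model) and a vector `y` of realized annual inflation. The "Bayesian" variant here is a hypothetical conjugate-normal shrinkage of the OLS combination weights toward a prior weight vector (equal weights 1/n, or a unit weight on one model), with a prior precision `kappa` chosen purely for illustration; the paper's own combination methodology is the one described in its Section 3.6 and may differ in its details.

```python
import numpy as np


def combine_mean(F: np.ndarray) -> np.ndarray:
    """Equal-weight (mean) combination of the model forecasts in each row of F."""
    return F.mean(axis=1)


def combine_median(F: np.ndarray) -> np.ndarray:
    """Median combination across models."""
    return np.median(F, axis=1)


def combination_weights(F, y, prior=None, kappa=0.0):
    """Weights [intercept, w_1, ..., w_n] from regressing y on the forecasts.

    kappa = 0 gives the OLS combination. kappa > 0 shrinks the weights toward
    `prior` (hypothetical normal prior with precision kappa); the intercept is
    shrunk toward zero.
    """
    T, n = F.shape
    Z = np.column_stack([np.ones(T), F])      # intercept plus one column per model
    b0 = np.zeros(n + 1)
    if prior is not None:
        b0[1:] = prior
    # Posterior mean (Z'Z + kappa I)^{-1} (Z'y + kappa b0); kappa = 0 is plain OLS.
    lhs = Z.T @ Z + kappa * np.eye(n + 1)
    rhs = Z.T @ y + kappa * b0
    return np.linalg.solve(lhs, rhs)


def combine_weighted(F, weights):
    """Apply estimated combination weights to a new set of forecasts."""
    return weights[0] + F @ weights[1:]
```

With `prior=np.full(n, 1 / n)` the shrinkage target is the equal-weight average; with `prior=np.array([1.0, 0.0, 0.0])` it is a unit weight on the first model. A very large `kappa` returns the prior, while `kappa=0` lets past performance alone determine the weights.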

Finally, our main focus is forecasting inflation rates. Because of the long-standing debate in macroeconomics on the stationarity of inflation rates, we also explicitly contrast the predictive power of some non-stationary models with that of stationary models and consider whether forecasting inflation changes alters the relative forecasting ability of different models.

Our major empirical results can be summarized as follows. The first major result is that survey forecasts outperform the other three methods in forecasting inflation. That the median Livingston and SPF survey forecasts do well is perhaps not surprising, because presumably many of the best analysts use time-series and Phillips curve models. However, even participants in the Michigan survey, who are consumers, not professionals, produce accurate out-of-sample forecasts, which are only slightly worse than those of the professionals in the Livingston and SPF surveys. We also find that the best survey forecasts are the survey median forecasts themselves; adjustments to take into account both linear and non-linear bias yield worse out-of-sample forecasting performance.

Second, term structure information does not generally lead to better forecasts and often leads to worse forecasts than models using only aggregate activity measures. Whereas this confirms the results in Stock and Watson (1999), our investigation of term structure models is much more comprehensive. The relatively poor forecasting performance of term structure models extends to simple regression specifications, iterated long-horizon VAR forecasts, no-arbitrage affine models, and non-linear no-arbitrage models. These results suggest that while inflation is very important for explaining the dynamics of the term structure (see, e.g., Ang, Bekaert and Wei, 2006), yield curve information is less important for forecasting future inflation.

Our third major finding is that combining forecasts does not generally lead to better out-of-sample forecasting performance than single forecasting models.
In particular, simple averaging, like using the mean or median of a number of forecasts, does not necessarily improve the forecast performance, whereas linear combinations of forecasts with weights computed based on past performance and prior information generate the biggest gains. Even the Phillips curve models using the Bernanke, Boivin and Eliasz (2005) forward-looking aggregate measure of real activity mostly do not perform well relative to simpler Phillips curve models and never outperform the survey forecasts. The strong success of the surveys in forecasting inflation out of sample extends to surveys dominating other models in forecast combination methods. The data consistently place the highest weights on the survey forecasts and little weight on other forecasting methods.

The remainder of this paper is organized as follows. Section 2 describes the data set. In Section 3, we describe the time-series models, predictive macro regressions, term structure models, and forecasts from survey data, and detail the forecasting methodology. Section 4 contains the empirical out-of-sample results. We examine the robustness of our results to a non-stationary inflation specification in Section 5. Finally, Section 6 concludes.

2 Data

2.1 Inflation

We consider four different measures of inflation. The first three are consumer price index (CPI) measures, including CPI-U for All Urban Consumers, All Items (PUNEW), CPI for All Urban Consumers, All Items Less Shelter (PUXHS), and CPI for All Urban Consumers, All Items Less Food and Energy (PUXX), which is also called core CPI. The latter two measures strip out highly volatile components in order to better reflect underlying price trends (see the discussion in Quah and Vahey, 1995). The fourth measure is the Personal Consumption Expenditure deflator (PCE). While all three surveys forecast a CPI-based inflation measure, PCE inflation features prominently in policy work at the Federal Reserve. All measures are seasonally adjusted and obtained from the Bureau of Labor Statistics website. The sample period is 1952:Q2 to 2002:Q4 for PUNEW and PUXHS, 1958:Q2 to 2002:Q4 for PUXX, and 1960:Q2 to 2002:Q4 for PCE.

We define the quarterly inflation rate, $\pi_t$, from $t-1$ to $t$ as:

$$\pi_t = \ln\left(\frac{P_t}{P_{t-1}}\right), \tag{1}$$

where $P_t$ is the level of one of the four price indices at time $t$. We use the terms "inflation" and "inflation rate" interchangeably as defined in equation (1). We take one quarter to be our base unit for estimation purposes, but forecast annual inflation, $\pi_{t+4,4}$, from $t$ to $t+4$:

$$\pi_{t+4,4} = \pi_{t+1} + \pi_{t+2} + \pi_{t+3} + \pi_{t+4}, \tag{2}$$

where $\pi_t$ is the quarterly inflation rate in equation (1).

Empirical work on inflation has failed to come to a consensus regarding its stationarity properties. For example, Bryan and Cecchetti (1993) assume a stationary inflation process, while Nelson and Schwert (1977) and Stock and Watson (1999) assume that the inflation process has a unit root. Most of our analysis assumes that inflation is stationary for two reasons.
First, it is difficult to generate non-stationary inflation in standard economic models, whether they are monetary in nature or of the New Keynesian variety (see Fuhrer and Moore, 1995; Holden and Driscoll, 2003). Second, the working paper version of Bai and Ng (2004) rejects the null of non-stationarity for inflation. That being said, Cogley and Sargent (2005) and Stock and Watson (2005) find evidence of changes in inflation persistence over time, with a random walk or integrated MA process providing an accurate description of inflation dynamics during certain periods. Furthermore, the use of a parsimonious non-stationary model may be attractive for forecasting. In particular, Atkeson and Ohanian (2001) have made the random walk a natural benchmark to beat in forecasting exercises. Therefore, we consider whether our results are robust to assuming non-stationary inflation in Section 5.

Table 1 reports summary statistics for all four measures of inflation for the full sample in Panel A, and for the post-1985 and post-1995 samples in Panels B and C, respectively. Our statistics pertain to annual inflation, $\pi_{t+4,4}$, but we sample the data quarterly. Therefore, we report the fourth-order autocorrelation of the quarterly-sampled annual inflation series, which corresponds to the first-order autocorrelation of annual inflation. Table 1 shows that all four inflation measures are lower and more stable during the last two decades, in common with many other macroeconomic series, including output (see Kim and Nelson, 1999; McConnell and Perez-Quiros, 2000; Stock and Watson, 2002b). Core CPI (PUXX) has the lowest volatility of all the inflation measures. PUXX volatility ranges from 2.56% per annum over the full sample to only 0.24% per annum post-1996. The higher variability of the other measures in the latter part of the sample must be due to food and energy price changes. In the later sample periods, PCE inflation is, on average, lower than CPI inflation, which may be partly due to its use of chain weighting, in contrast to the CPI measures, which use a fixed basket (see Clark, 1999).
Inflation is somewhat persistent (an autocorrelation of 0.79 for PUNEW over the full sample), but its persistence decreases over time, as can be seen from the lower autocorrelation coefficients for the PUNEW and PUXHS measures after 1986, and for all measures after 1995. The correlations of the four measures of inflation with each other are all over 75% over the full sample. The comovement can be clearly seen in the top panel of Figure 1. Inflation is lower prior to 1969 and after 1983, but reaches a high of around 14% during the oil crisis of 1973–1983. PUXX tracks both PUNEW and PUXHS closely, except during the 1973–1975 period, when it is about 2% lower than the other two measures, and after 1985, when it appears to be more stable than the other two measures. During the periods when inflation is decelerating, such as in 1955–1956, 1987–1988, 1998–2000 and most recently 2002–2003, PUNEW declines more gradually than PUXHS, suggesting that housing prices are less volatile than the prices of other consumption goods during these periods.
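Equations (1) and (2) can be checked numerically. The sketch below uses a made-up price index `P` (not one of the four series above): it computes quarterly log inflation, builds the overlapping, quarterly-sampled annual series $\pi_{t+4,4}$ by summing four consecutive quarters, and confirms that the sum telescopes to $\ln(P_{t+4}/P_t)$. The `autocorr` helper is a plain sample autocorrelation of the kind reported in Table 1 at lag 4; that this simple estimator matches the one behind Table 1 is an assumption.

```python
import numpy as np


def quarterly_inflation(P):
    """Equation (1): pi_t = ln(P_t / P_{t-1}); the output is one element shorter than P."""
    return np.diff(np.log(np.asarray(P, dtype=float)))


def annual_inflation(pi_q):
    """Equation (2): element j is pi_{j+1} + ... + pi_{j+4}, i.e. pi_{j+4,4} (overlapping)."""
    return np.convolve(np.asarray(pi_q, dtype=float), np.ones(4), mode="valid")


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))


P = np.array([100.0, 101.0, 102.5, 103.0, 104.2, 105.0, 106.1, 107.3, 108.0])  # hypothetical index
pi_q = quarterly_inflation(P)   # 8 quarterly rates
pi_a = annual_inflation(pi_q)   # 5 overlapping annual rates
# The four-quarter sum telescopes to ln(P_{t+4} / P_t).
assert np.allclose(pi_a, np.log(P[4:] / P[:-4]))
```

On a real series, `autocorr(pi_a, 4)` gives the lag-4 statistic of the quarterly-sampled annual series; adjacent elements of `pi_a` share three quarters, which is why lags 1 to 3 are not informative about annual persistence.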

2.2 Real Activity Measures

We consider six individual series for real activity along with one composite real activity factor. We compute GDP growth (GDPG) using the seasonally adjusted data on real GDP in billions of chained 2000 dollars. The unemployment rate (UNEMP) is also seasonally adjusted and computed for the civilian labor force aged 16 years and over. Both real GDP and the unemployment rate are from the Federal Reserve Economic Data (FRED) database. We compute the output gap either as the detrended log real GDP by removing a quadratic trend as in Gali and Gertler (1999), which we term GAP1, or by using the Hodrick-Prescott (1997) filter (with the standard smoothness parameter of 1,600), which we term GAP2. At time t, both measures are constructed using only current and past GDP values, so the filters are run recursively. We also use the labor income share (LSHR), defined as the ratio of nominal compensation to total nominal output in the U.S. nonfarm business sector. We use two forward-looking indicators: the Stock-Watson (1989) Experimental Leading Index (XLI) and their Alternative Nonfinancial Experimental Leading Index-2 (XLI-2). Because Stock and Watson (2002a), among others, show that aggregating the information from many factors has good forecasting power, we also use a single factor aggregating the information from 65 individual series constructed by Bernanke, Boivin and Eliasz (2005). This single real activity series, which we term FAC, aggregates real output and income, employment and hours, consumption, housing starts and sales, real inventories, and average hourly earnings. The sample period for all the real activity measures is 1952:Q2 to 2001:Q4, except the Bernanke-Boivin-Eliasz real activity factor, which spans 1959:Q1 to 2001:Q3.
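The recursive construction of GAP1 can be sketched as follows: at each date the quadratic trend is re-estimated on log real GDP through that date only, and the gap is the last residual of that fit. This is a minimal illustration using numpy's `polyfit`; the minimum window `min_obs` is an arbitrary choice for the sketch, and the recursive Hodrick-Prescott version (GAP2) is not shown.

```python
import numpy as np


def recursive_quadratic_gap(log_gdp, min_obs=12):
    """GAP1-style output gap computed in real time.

    For each date t >= min_obs - 1, regress log_gdp[0..t] on a quadratic time
    trend and record the residual at t; earlier dates are NaN. Because each fit
    uses only data up to t, the gap at t never depends on later observations.
    """
    y = np.asarray(log_gdp, dtype=float)
    gap = np.full(y.shape, np.nan)
    for t in range(min_obs - 1, len(y)):
        time = np.arange(t + 1, dtype=float)
        coefs = np.polyfit(time, y[: t + 1], deg=2)   # coefficients, highest power first
        gap[t] = y[t] - np.polyval(coefs, time[-1])
    return gap
```

The design choice worth noting is the loop: a single full-sample fit would be cheaper, but its residual at date t would use GDP observed after t, which is exactly the look-ahead that the recursive construction avoids.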
We use the composite real activity factor at the end of each quarter for forecasting inflation over the next year.¹ The real activity measures have the disadvantage that they may use information that is not actually available at the time of the forecast, either through data revisions or, in the case of the Bernanke-Boivin-Eliasz measure, because of full-sample estimation. This biases the Phillips curve forecasts to look better than what could actually be forecast using a real-time data set. Orphanides and van Norden (2001) find that real-time economic activity measures produce much worse forecasts of future inflation than revised series, whereas Bernanke and Boivin (2003) find only slightly worse forecasts for both inflation and real activity. Nevertheless, our forecast errors using real activity measures are likely biased downwards.

¹ To achieve stationarity of the underlying individual macro series, various transformations are employed by Bernanke, Boivin and Eliasz (2005). In particular, many series are first differenced at a monthly frequency. Better forecasting results might potentially be obtained by taking a long 12-month difference to forecast annual inflation (see comments by, among others, Plosser and Schwert, 1978), or by pre-screening the variables to be used in the construction of the composite factor (see Boivin and Ng, 2006). We do not consider these adjustments and use the original Bernanke-Boivin-Eliasz series.

2.3 Term Structure Data

The term structure variables are zero-coupon yields for the maturities of 1, 4, 8, 12, 16, and 20 quarters from CRSP spanning 1952:Q2 to 2001:Q4. The one-quarter rate is from the CRSP Fama risk-free rate file, while all other bond yields are from the CRSP Fama-Bliss discount bond file. All yields are continuously compounded and expressed at a quarterly frequency. We define the short rate (RATE) to be the one-quarter yield and define the term spread (SPD) to be the difference between the 20-quarter yield and the short rate. Some of our term structure models also use four-quarter and 12-quarter yields for estimation.

2.4 Surveys

We examine three inflation expectation surveys: the Livingston survey, the Survey of Professional Forecasters (SPF), and the Michigan survey.² The Livingston survey is conducted twice a year, in June and in December, and polls economists from industry, government, and academia. The Livingston survey records participants' forecasts of non-seasonally-adjusted CPI levels six and twelve months in the future and is usually conducted in the middle of the month. Unlike the Livingston survey, participants in the SPF and the Michigan survey forecast inflation rates. Participants in the SPF are drawn primarily from business, and forecast changes in the quarterly average of seasonally-adjusted CPI-U levels. The SPF is conducted in the middle of every quarter and the sample period for the SPF median forecasts is from 1981:Q3 to 2002:Q4. In contrast to the Livingston survey and SPF, the Michigan survey is conducted monthly and asks households, rather than professionals, to estimate expected price changes over the next twelve months.
We use the median Michigan survey forecast of inflation over the next year at the end of each quarter from 1978:Q1 to 2002:Q4.

² We obtain data for the Livingston survey and SPF from the Philadelphia Fed website (http://www.phil.frb.org/econ/liv and http://www.phil.frb.org/econ/spf, respectively). We take the Michigan survey data from the St. Louis Federal Reserve FRED database (http://research.stlouisfed.org/fred2/series/MICH/). Median Michigan survey data is also available from the University of Michigan's website (http://www.sca.isr.umich.edu/main.php). However, there are small discrepancies between the two sources before September 1996. We choose to use data from FRED because it is consistent with the values reported in Curtin (1996).

There are some reporting lags between the time the surveys are taken and the public dissemination of their results. For the Livingston and the SPF surveys, there is a lag of about one week between the due date of the survey and their publication. However, these reporting lags are largely inconsequential for our purposes. What matters is the information set used by the forecasters in predicting future inflation. Clearly, survey forecasts must use less up-to-date information than either macroeconomic or term structure forecasts. For example, the Livingston survey forecasters presumably use information up to at most the beginning of June and December, and mostly do not even have the May and November official CPI numbers available when making a forecast. The SPF forecasts can only use information up to at most the middle of the quarter, and while we take the final month of the quarter for the Michigan survey, consumers do not have up-to-date economic data available at the end of the quarter. But, for the economist forecasting annual inflation with the surveys, all survey data is publicly available at the end of each quarter for the SPF and Michigan surveys, and at the end of each semi-annual period for the Livingston survey. Together with the slight data advantages present in revised, fitted macro data, we are in fact biasing the results against survey forecasts.

The Livingston survey is the only survey available for our full sample. In the top panel of Figure 1, which graphs the full sample of inflation data, we also include the unadjusted median Livingston forecasts. We plot the survey forecast lagged one year, so that in December 1990, we plot inflation from December 1989 to December 1990 together with the survey forecasts of December 1989. The Livingston forecasts broadly track the movements of inflation, but there are several large movements that the Livingston survey fails to track, for example the pickup in inflation in 1956–1959, 1967–1971, 1972–1975, and 1978–1981.
In the bottom panel of Figure 1, we graph all three survey forecasts of future one-year inflation together with annual PUNEW inflation, where the survey forecasts are lagged one year for direct comparison. After 1981, all survey forecasts move reasonably closely together and track inflation movements relatively well. Nevertheless, there are still some notable failures, like the slowdowns in inflation in the early 1980s and in 1996.

3 Forecasting Models and Methodology

In this section, we describe the forecasting models and our statistical tests. In all our out-of-sample forecasting exercises, we forecast future annual inflation. Hence, for all our models, we compute annual inflation forecasts of:

$$E_t(\pi_{t+4,4}) = E_t\left(\sum_{i=1}^{4} \pi_{t+i}\right), \tag{3}$$

where $\pi_{t+4,4}$ is annual inflation from $t$ to $t+4$ defined in equation (2).

In Sections 3.1 to 3.4, we describe our 39 forecasting models. Table 2 contains a full nomenclature. Section 3.1 focuses on time-series models of inflation, which serve as our benchmark forecasts; Section 3.2 summarizes our OLS regression models using real activity macro variables; Section 3.3 describes the term structure models incorporating inflation data; and finally, Section 3.4 describes our survey forecasts. In Section 3.5, we define the out-of-sample periods and list the criteria that we use to assess the performance of out-of-sample forecasts. Finally, Section 3.6 describes our methodology to combine model forecasts.

For all models except OLS regressions, we compute implied long-horizon forecasts from single-period (quarterly) models. While Schorfheide (2005) shows that, in theory, iterated forecasts need not be superior to direct forecasts from horizon-specific models, Marcellino, Stock and Watson (2006) document the empirical superiority of iterated forecasts in predicting U.S. macroeconomic series. For the OLS models, we compute the forecasts directly from the long-horizon regression estimates.

3.1 Time-Series Models

ARIMA Models

If inflation is stationary, the Wold theorem suggests that a parsimonious ARMA(p,q) model may perform well in forecasting. We consider two ARMA(p,q) models: an ARMA(1,1) model and a pure autoregressive model with p lags, AR(p). The optimal lag length for the AR model is recursively selected using the Schwarz criterion (BIC) on the in-sample data. The motivation for the ARMA(1,1) model derives from a long tradition in rational expectations macroeconomics (see Hamilton, 1985) and finance (see Fama, 1975) that models inflation as the sum of expected inflation and noise. If expected inflation follows an AR(1) process, then the reduced-form model for inflation is given by an ARMA(1,1) model.
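For every non-OLS model, equation (3) turns a quarterly model into an annual forecast by summing its one- to four-quarter-ahead forecasts. The sketch below, with parameter values chosen only for illustration, checks that iterating the ARMA(1,1) recursion of equation (4) reproduces the closed-form ARMA(1,1) forecast stated later in this subsection, and that the companion-form expression used for the AR(p) model matches direct iteration. The companion-form function applies unchanged to any VAR(1) written as $X_{t+1} = A + \Phi X_t + U_{t+1}$, such as the VAR in equation (8), by picking out the inflation element with `idx`.

```python
import numpy as np


def arma11_annual_iterated(mu, phi, psi, pi_t, eps_t, h=4):
    """Sum of E_t[pi_{t+k}], k = 1..h, by iterating equation (4)."""
    f = mu + phi * pi_t + psi * eps_t     # E_t[pi_{t+1}]: eps_t is known at t
    total = f
    for _ in range(h - 1):
        f = mu + phi * f                  # future shocks have zero conditional mean
        total += f
    return total


def arma11_annual_closed(mu, phi, psi, pi_t, eps_t):
    """Closed-form four-quarter ARMA(1,1) forecast."""
    s = phi * (1 - phi**4) / (1 - phi)    # phi + phi^2 + phi^3 + phi^4
    return ((4 - s) / (1 - phi)) * mu + s * pi_t + ((1 - phi**4) * psi / (1 - phi)) * eps_t


def companion_annual_forecast(A, Phi, x_t, idx=0, h=4):
    """e'(I-Phi)^{-1}(hI - Phi(I-Phi)^{-1}(I-Phi^h))A + e'Phi(I-Phi)^{-1}(I-Phi^h)x_t."""
    I = np.eye(len(A))
    inv = np.linalg.inv(I - Phi)
    Phi_h = np.linalg.matrix_power(Phi, h)
    e = I[idx]                            # selection vector for the inflation element
    const = e @ inv @ (h * I - Phi @ inv @ (I - Phi_h)) @ A
    slope = e @ Phi @ inv @ (I - Phi_h) @ x_t
    return const + slope


def companion_annual_iterated(A, Phi, x_t, idx=0, h=4):
    """Same forecast by iterating E_t[X_{t+k}] = A + Phi E_t[X_{t+k-1}]."""
    x = np.asarray(x_t, dtype=float)
    total = 0.0
    for _ in range(h):
        x = A + Phi @ x
        total += x[idx]
    return total
```

For an AR(2) with $\mu = 0.3$, $\phi_1 = 0.5$ and $\phi_2 = 0.2$, the companion inputs are `A = np.array([0.3, 0.0])` and `Phi = np.array([[0.5, 0.2], [1.0, 0.0]])`, and both functions return the same annual forecast for any state `x_t = [pi_t, pi_{t-1}]`.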
The ARMA(1,1) model also nicely fits the slowly decaying autocorrelogram of inflation. The specifications of the ARMA(1,1) model,

$$\pi_{t+1} = \mu + \phi\pi_t + \psi\varepsilon_t + \varepsilon_{t+1}, \tag{4}$$

and the AR(p) model,

$$\pi_{t+1} = \mu + \phi_1\pi_t + \phi_2\pi_{t-1} + \dots + \phi_p\pi_{t-p+1} + \varepsilon_{t+1}, \tag{5}$$

are entirely standard. The ARMA(1,1) model is estimated by maximum likelihood, conditional on a zero initial residual. We compute the implied inflation level forecast over the next year expressed at a quarterly frequency. For the ARMA(1,1) model, the forecast is:

$$E_t(\pi_{t+4,4}) = \frac{1}{1-\phi}\left[4 - \frac{\phi(1-\phi^4)}{1-\phi}\right]\mu + \frac{\phi(1-\phi^4)}{1-\phi}\,\pi_t + \frac{(1-\phi^4)\psi}{1-\phi}\,\varepsilon_t.$$

To facilitate the forecasts of annual inflation, we write the AR(p) model in first-order companion form:

$$X_{t+1} = A + \Phi X_t + U_{t+1},$$

where

$$X_t = \begin{pmatrix}\pi_t \\ \pi_{t-1} \\ \vdots \\ \pi_{t-p+1}\end{pmatrix}, \quad A = \begin{pmatrix}\mu \\ 0 \\ \vdots \\ 0\end{pmatrix}, \quad \Phi = \begin{pmatrix}\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0\end{pmatrix} \quad \text{and} \quad U_t = \begin{pmatrix}\varepsilon_t \\ 0 \\ \vdots \\ 0\end{pmatrix}.$$

Then, the forecast for the AR(p) model is given by:

$$E_t(\pi_{t+4,4}) = e_1'(I-\Phi)^{-1}\left(4I - \Phi(I-\Phi)^{-1}(I-\Phi^4)\right)A + e_1'\Phi(I-\Phi)^{-1}(I-\Phi^4)X_t,$$

where $e_1$ is a $p \times 1$ selection vector containing a one in the first row and zeros elsewhere.

Our third ARIMA benchmark is a random walk (RW) forecast, where $\pi_{t+1} = \pi_t + \varepsilon_{t+1}$ and $E_t(\pi_{t+4,4}) = 4\pi_t$. Inspired by Atkeson and Ohanian (2001), we also forecast inflation using a random walk model on annual inflation, where the forecast is given by $E_t(\pi_{t+4,4}) = \pi_{t,4}$. We denote this forecast as AORW.

Regime-Switching Models

Evans and Wachtel (1993), Evans and Lewis (1995), and Ang and Bekaert (2004), among others, document regime-switching behavior in inflation. A regime-switching model may potentially account for non-linearities and structural changes, such as a sudden shift in inflation expectations after a supply shock, or a change in inflation persistence. We estimate the following univariate regime-switching model for inflation, which we term RGM:

$$\pi_{t+1} = \mu(s_{t+1}) + \phi(s_{t+1})\pi_t + \sigma(s_{t+1})\varepsilon_{t+1}. \tag{6}$$

The regime variable $s_t = 1, 2$ follows a Markov chain with constant transition probabilities $P = \Pr(s_{t+1} = 1 \mid s_t = 1)$ and $Q = \Pr(s_{t+1} = 2 \mid s_t = 2)$. The model can be estimated using the Bayesian filter algorithms of Hamilton (1989) and Gray (1996). We compute the implied annual horizon forecasts of inflation from equation (6), assuming that the current regime is the regime that maximizes the probability $\Pr(s_t \mid I_t)$. This probability is a byproduct of the estimation algorithm.

3.2 Regression Forecasts Based on the Phillips Curve

In standard Phillips curve models of inflation, expected inflation is linked to some measure of the output gap. There are both forward- and backward-looking Phillips curve models, but ultimately even forward-looking models link expected inflation to the current information set. According to the Phillips curve, measures of real activity should be an important part of this information set. We avoid the debate regarding the actual measure of the output gap (see, for instance, Gali and Gertler, 1999) by taking an empirical approach and using a large number of real activity measures. We choose not to estimate structural models because the BIC criterion is likely to choose the empirical model best suited for forecasting. Previous work often finds that models with the clearest theoretical justification have poor predictive content (see the literature summary by Stock and Watson, 2003). The empirical specification we estimate is:

$$\pi_{t+4,4} = \alpha + \beta(L)'X_t + \varepsilon_{t+4,4}, \tag{7}$$

where $X_t$ combines $\pi_t$ and one or two real activity measures. The lag length in the lag polynomial $\beta(L)$ is selected by BIC on the in-sample data and is set to be equal across all the regressors in $X_t$. The chosen specification tends to have two or three lags in our forecasting exercises. We list the complete set of real activity regressors in Table 2 as PC1 to PC10.

In our next section, we extend the information set to include term structure information.
Regression models where term structure information is included in $X_t$ along with inflation and real activity are potentially consistent with a forward-looking Phillips curve that includes inflation and real activity measures in the information set. Such models can approximate the reduced form of a more sophisticated, forward-looking rational expectations Phillips curve model of inflation (see, for instance, Bekaert, Cho and Moreno, 2005).
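The direct regression in equation (7), with one lag length chosen by BIC and shared by all regressors, can be sketched as below. Several details are assumptions made for the illustration rather than specifics from the paper: the regressor layout (current and lagged values of every column of $X_t$ plus a constant), the BIC formula $T\ln(\mathrm{SSR}/T) + k\ln T$, and holding the estimation rows fixed across candidate lag lengths so that the BIC values are comparable.

```python
import numpy as np


def lagged_design(X, L):
    """Rows t = L-1..T-1 of [1, X_t, X_{t-1}, ..., X_{t-L+1}]."""
    T = X.shape[0]
    blocks = [X[L - 1 - l : T - l] for l in range(L)]
    return np.column_stack([np.ones(T - L + 1)] + blocks)


def select_and_forecast(X, y, max_lag=4):
    """Pick L in 1..max_lag by BIC, then forecast annual inflation from the last row of X.

    X: (T, m) regressors (inflation and real activity measures).
    y: (T,) with y[t] = pi_{t+4,4}, NaN where not yet observed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max_lag - 1                               # same rows for every lag length
    best = None
    for L in range(1, max_lag + 1):
        Z = lagged_design(X, L)[start - (L - 1):]     # rows for t = start..T-1
        yy = y[start:]
        ok = np.isfinite(yy)                          # drop periods whose outcome is unknown
        beta, *_ = np.linalg.lstsq(Z[ok], yy[ok], rcond=None)
        resid = yy[ok] - Z[ok] @ beta
        n, k = int(ok.sum()), Z.shape[1]
        bic = n * np.log(resid @ resid / n) + k * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, L, beta, Z[-1])
    _, L, beta, z_last = best
    return L, float(z_last @ beta)                    # forecast of pi_{T+3,4} made at T-1
```

Because the dependent variable is already the four-quarter sum, the forecast comes straight from the fitted coefficients, with no iteration; this is the direct counterpart of the iterated forecasts used for the time-series models.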

3.3 Models Using Term Structure Data

We consider a variety of term structure forecasts, including augmenting the simple Phillips curve OLS regressions with short rate and term spread variables; long-horizon VAR forecasts; a regime-switching specification; affine term structure models; and term structure models incorporating regime switches. We outline each of these specifications in turn.

Linear Non-Structural Models

We begin by augmenting the OLS Phillips curve models in equation (7) with the short rate, RATE, and the term spread, SPD, as regressors in $X_t$. Specifications TS1–TS8 add RATE to the Phillips curve specifications PC1–PC8. TS9 and TS10 only use inflation and term structure variables as predictors. TS9 uses inflation and the lagged term spread, producing a forecasting model similar to the specification in Mishkin (1990, 1991). TS10 adds the short rate to this specification. Finally, TS11 adds GDP growth to the TS10 specification.

We also consider forecasts with a VAR(1) in $X_t$, where $X_t$ contains RATE, SPD, GDPG, and $\pi_t$:

$$X_{t+1} = \mu + \Phi X_t + \varepsilon_{t+1}. \tag{8}$$

Although the VAR is specified at a quarterly frequency, we compute the annual horizon forecast of inflation implied by the VAR. We denote this forecasting specification as VAR. As Ang, Piazzesi and Wei (2004) and Cochrane and Piazzesi (2005) note, a VAR specification can be economically motivated by the fact that a reduced-form VAR is equivalent to a Gaussian term structure model where the term structure factors are observable yields and certain assumptions on risk premia apply. Under these restrictions, a VAR coincides with a no-arbitrage term structure model only for those yields included in the VAR. However, the VAR does not impose the over-identifying restrictions generated by the term structure model for yields not included as factors in the VAR.

An Empirical Non-Linear Regime-Switching Model

A large empirical literature has documented the presence of regime switches in interest rates (see, among others, Hamilton, 1988; Gray, 1996; Bekaert, Hodrick and Marshall, 2001).
In particular, Ang and Bekaert (2002) show that regime-switching models forecast interest rates better than linear models. As interest rates reflect information in expected inflation, capturing the regime-switching behavior in interest rates may help in forecasting potentially regime-switching dynamics of inflation.
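The annual-horizon forecast from the quarterly VAR in equation (8) comes from iterating the VAR forward four quarters. The sketch below is a minimal illustration, not the authors' code. It assumes that quarterly inflation enters $X_t$ as a non-annualized log change, so that annual inflation from $t$ to $t+4$ is the sum of the next four quarterly rates. The function names `fit_var1` and `var1_annual_inflation_forecast` are hypothetical.

```python
import numpy as np


def fit_var1(X):
    """OLS estimates of mu and Phi in X_{t+1} = mu + Phi X_t + eps_{t+1}.

    X is a T x k array whose rows are quarterly observations of the state vector.
    """
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([np.ones(len(X) - 1), X[:-1]])  # regressors: constant and X_t
    B, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)       # B has shape (k + 1) x k
    return B[0], B[1:].T                                 # mu (k,), Phi (k x k)


def var1_annual_inflation_forecast(mu, Phi, x_t, infl_idx, horizon=4):
    """Sum of E_t[pi_{t+h}] for h = 1..horizon implied by the VAR(1).

    Assumes quarterly (non-annualized) inflation sits in position infl_idx of X,
    so the four-quarter sum is the annual inflation forecast.
    """
    x = np.asarray(x_t, dtype=float)
    total = 0.0
    for _ in range(horizon):
        x = mu + Phi @ x        # E_t[X_{t+h}] = mu + Phi E_t[X_{t+h-1}]
        total += x[infl_idx]    # accumulate E_t[pi_{t+h}]
    return total
```

In a recursive out-of-sample exercise, `fit_var1` would be re-run on the data available at each forecast date before the forecast is computed.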

We estimate a regime-switching VAR, denoted as RGMVAR:

$$X_{t+1} = \mu(s_{t+1}) + \Phi X_t + \Sigma(s_{t+1})\,\varepsilon_{t+1}, \tag{9}$$

where $X_t$ contains RATE, SPD and $\pi_t$. Similar to the univariate regime-switching model in equation (6), $s_t \in \{1, 2\}$ follows a Markov chain with constant transition probabilities. We compute out-of-sample forecasts from equation (9) assuming that the current regime is the regime with the highest probability $\Pr(s_t \mid I_t)$.

No-Arbitrage Term Structure Models

We estimate two no-arbitrage term structure models. Because such models have implications for the complete yield curve, it is straightforward to incorporate additional information from the yield curve into the estimation. Such additional information is absent in the empirical VAR specified in equation (8). Concretely, both no-arbitrage models have two latent variables and quarterly inflation as state variables, denoted by $X_t$. We estimate the models by maximum likelihood and, following Chen and Scott (1993), assume that the one- and 20-quarter yields are measured without error, and the other four- and 12-quarter yields are measured with error. The estimated models build on Ang, Bekaert and Wei (2006), who formulate a real pricing kernel as:

$$M_{t+1} = \exp\left(-r_t - \tfrac{1}{2}\lambda_t'\lambda_t - \lambda_t'\varepsilon_{t+1}\right). \tag{10}$$

Here, $\lambda_t$ is a $3 \times 1$ real price of risk vector. The real short rate, $r_t$, is an affine function of the state variables. The nominal pricing kernel is defined in the standard way as $\widehat{M}_{t+1} = M_{t+1}\exp(-\pi_{t+1})$. Bonds are priced using the recursion

$$\exp\left(-n\,y_t^n\right) = E_t\left[\widehat{M}_{t+1}\exp\left(-(n-1)\,y_{t+1}^{n-1}\right)\right],$$

where $y_t^n$ is the $n$-quarter zero-coupon bond yield.

The first no-arbitrage model (MDL1) is an affine model in the class of Duffie and Kan (1996) with affine, time-varying risk premia (see Dai and Singleton, 2002; Duffee, 2002) modeled as:

$$\lambda_t = \lambda_0 + \lambda_1 X_t, \tag{11}$$

where $\lambda_0$ is a $3 \times 1$ vector and $\lambda_1$ a $3 \times 3$ diagonal matrix. The state variables follow a linear VAR:

$$X_{t+1} = \mu + \Phi X_t + \Sigma\,\varepsilon_{t+1}. \tag{12}$$

The second model (MDL2) incorporates regime switches and is developed by Ang, Bekaert and Wei (2006). Ang, Bekaert and Wei show that this model fits the moments of yields and inflation very well and almost exactly matches the autocorrelogram of inflation. MDL2 replaces equation (12) with the regime-switching VAR:

$$X_{t+1} = \mu(s_{t+1}) + \Phi X_t + \Sigma(s_{t+1})\,\varepsilon_{t+1}, \tag{13}$$

and also incorporates regime switches in the prices of risk, replacing equation (11) with

$$\lambda_t = \lambda_0(s_{t+1}) + \lambda_1 X_t. \tag{14}$$

There are four regimes, $s_t = 1, \ldots, 4$, in the Ang, Bekaert and Wei (2006) model, representing all possible combinations of two regimes of inflation and two regimes of a real latent factor. In estimating MDL1 and MDL2, we impose the same parameter restrictions necessary for identification as Ang, Bekaert and Wei (2006) do. For both MDL1 and MDL2, we compute out-of-sample forecasts of annual inflation, but the models are estimated using quarterly data.

3.4 Survey Forecasts

We produce estimates of $E_t(\pi_{t+4,4})$ from the Livingston, SPF, and Michigan surveys. We denote the actual forecasts from the SPF, Livingston and Michigan surveys as SPF1, LIV1, and MCH1, respectively.

Producing Forecasts from Survey Data

Participants in the Livingston survey are asked to forecast a CPI level (not an inflation rate). Given the timing of the survey, Carlson (1977) carefully studies the forecasts of individual participants in the Livingston survey and finds that the participants generally forecast inflation over the next 14 months. We follow Thomas (1999) and Mehra (2002) and adjust the raw Livingston forecasts by a factor of 12/14 to obtain an annual inflation forecast.

Participants in both the SPF and the Michigan surveys do not forecast log year-on-year changes in CPI levels according to the definition of inflation in equation (1). Instead, the surveys record simple expected inflation changes, $E_t(P_{t+4}/P_t - 1)$. This differs from $E_t(\log P_{t+4}/P_t)$ by a Jensen's inequality term. In addition, the SPF participants are asked to forecast changes in the quarterly average of seasonally-adjusted PUNEW (CPI-U), as opposed to end-of-quarter changes in CPI levels.
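The two survey conversions just described can be illustrated with a few lines of arithmetic. This is a hedged sketch with made-up numbers, not survey data: the first helper applies the 12/14 rescaling to a raw Livingston forecast, and the second shows that, for a given price ratio, the simple change always weakly exceeds the log change because $x - 1 \ge \log x$. Both function names are hypothetical.

```python
import math


def livingston_annualize(raw_forecast):
    """Rescale a raw Livingston inflation forecast, which covers roughly 14 months,
    to a 12-month horizon using the factor 12/14."""
    return raw_forecast * 12.0 / 14.0


def simple_minus_log_change(simple_change):
    """Gap between a simple change P_{t+4}/P_t - 1 and the log change log(P_{t+4}/P_t)
    for the same price ratio; it is never negative because x - 1 >= log(x)."""
    return simple_change - math.log1p(simple_change)


# Hypothetical inputs for illustration only.
print(round(livingston_annualize(0.07), 4))      # 0.06
print(round(simple_minus_log_change(0.04), 6))   # 0.000779
```

At inflation rates of a few percent, the gap is well under a tenth of a percentage point.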
In both the SPF and the Michigan survey, we cannot directly recover forecasts of expected log changes in CPI levels. Instead, we directly use the SPF and Michigan survey forecasts to represent forecasts of future annual inflation as defined in equation (3). We expect that the effects of these measurement problems are small.[3] In any case, the Jensen's term biases our survey forecasts upwards, imparting a conservative upward bias to our Root Mean Squared Error (RMSE) statistics.

Adjusting Surveys for Bias

Several authors, including Thomas (1999), Mehra (2002), and Souleles (2004), document that survey forecasts are biased. We take the survey bias into account by estimating $\alpha_1$ and $\beta_1$ in the regression

$$\pi_{t+4,4} = \alpha_1 + \beta_1 f_t^S + \varepsilon_{t+4,4}, \tag{15}$$

where $f_t^S$ is the forecast from the candidate survey $S$. For an unbiased forecasting model, $\alpha_1 = 0$ and $\beta_1 = 1$. We denote survey forecasts that are adjusted using regression (15) as SPF2, LIV2, and MCH2 for the SPF, Livingston, and Michigan surveys, respectively. The bias adjustment occurs recursively; that is, we update the regression with new data points each quarter and re-estimate the coefficients.

Table 3 provides empirical evidence regarding these biases using the full sample. For each inflation measure, the first three rows report the results from regression (15). The SPF survey forecasts produce $\beta_1$ estimates that are smaller than one for all inflation measures and, with the exception of PUXX, significantly so at the 95% level. However, the point estimates of $\alpha_1$ are also positive, although mostly not significant, which implies that at low levels of inflation the surveys under-predict future inflation, and at high levels of inflation the surveys over-predict future inflation. The turning point is $0.852/(1 - 0.694) = 2.8\%$, so that the SPF survey mostly over-predicts inflation. The Livingston and Michigan surveys produce largely unbiased forecasts because the slope coefficients are insignificantly different from one and the constants are insignificantly different from zero. Nevertheless, because the intercepts are positive (negative) for the Livingston (Michigan) survey, and the slope coefficients largely smaller (larger) than one, the Livingston (Michigan) survey tends to produce forecasts that are mostly too low (high).
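A recursive version of regression (15) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the real-time convention that only pairs whose outcome $\pi_{s+4,4}$ is already observed at $t$ (that is, $s \le t-4$) enter the regression is an assumption. The helper `turning_point` reproduces the arithmetic behind the 2.8% SPF figure.

```python
import numpy as np


def recursive_bias_adjusted(pi_realized, f_survey, first_oos):
    """Recursively bias-adjust survey forecasts with regression (15).

    pi_realized[t] is pi_{t+4,4} (realized inflation from t to t+4) and f_survey[t]
    is the survey forecast made at t. At each date t >= first_oos, alpha_1 and beta_1
    are re-estimated by OLS on pairs s = 0, ..., t-4 (an assumed real-time convention:
    only outcomes observed by t are used); the adjusted forecast is a1 + b1 * f[t].
    """
    if first_oos < 6:
        raise ValueError("need at least two observed pairs before the first forecast")
    pi = np.asarray(pi_realized, dtype=float)
    f = np.asarray(f_survey, dtype=float)
    adjusted = np.full(len(f), np.nan)
    for t in range(first_oos, len(f)):
        s = np.arange(0, t - 3)                       # s = 0, ..., t - 4
        Z = np.column_stack([np.ones(len(s)), f[s]])  # constant and survey forecast
        (a1, b1), *_ = np.linalg.lstsq(Z, pi[s], rcond=None)
        adjusted[t] = a1 + b1 * f[t]
    return adjusted


def turning_point(alpha_1, beta_1):
    """Forecast level f at which alpha_1 + beta_1 * f = f, where the fitted bias changes sign."""
    return alpha_1 / (1.0 - beta_1)


print(round(turning_point(0.852, 0.694), 1))   # 2.8
```

Below the turning point the fitted value exceeds the survey forecast (the survey under-predicts), and above it the survey over-predicts, matching the interpretation in the text.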
[3] In the data, the correlation between log CPI changes, $\log(P_{t+4}/P_t)$, and simple inflation, $P_{t+4}/P_t - 1$, is 1.000 for all four measures of inflation across our full sample period. The correlation between end-of-quarter log CPI changes and quarterly average CPI changes is above 0.994. The differences among log CPI changes, simple inflation, and changes in quarterly average CPI are very small, and an order of magnitude smaller than the forecast RMSEs. As an illustration, for PUNEW, the means of $\log(P_{t+4}/P_t)$, $P_{t+4}/P_t - 1$, and changes in quarterly average CPI-U are 3.83%, 3.82%, and 3.86%, respectively, while the volatilities are 2.87%, 2.86%, and 2.91%, respectively.

Thomas (1999) and Mehra (2002) suggest that the bias in the survey forecasts may vary across accelerating versus decelerating inflation environments, or across the business cycle. To take account of this possible asymmetry in the bias, we augment equation (15) with a dummy variable, $D_t$, which equals one if inflation at time $t$ exceeds its past two-year moving average,

$$\pi_t - \frac{1}{8}\sum_{j=0}^{7}\pi_{t-j} > 0,$$

and is set equal to zero otherwise. The regression becomes:

$$\pi_{t+4,4} = \alpha_1 + \alpha_2 D_t + \beta_1 f_t^S + \beta_2 D_t f_t^S + \varepsilon_{t+4,4}. \tag{16}$$

We denote the survey forecasts that are non-linearly bias-adjusted using equation (16) as SPF3, LIV3, and MCH3 for the SPF, Livingston, and Michigan surveys, respectively.[4]

The bottom three rows of each panel in Table 3 report results from regression (16). Non-linear biases are reflected in significant $\alpha_2$ or $\beta_2$ coefficients. For the SPF survey, there is no statistical evidence of non-linear biases. For all inflation measures, the SPF's negative $\alpha_2$ and positive $\beta_2$ coefficients indicate that accelerating inflation implies a smaller intercept and a higher slope coefficient, bringing the SPF forecasts closer to unbiasedness. For the Michigan survey, the biases are larger in magnitude (except for the PUXX measure), but there is only one significant coefficient: accelerating inflation yields a significantly higher slope coefficient for the PUXHS measure. Economically, the Michigan survey is very close to unbiasedness in decelerating inflation environments, but over- (under-) predicts future inflation at low (high) inflation levels in accelerating inflation environments.

The Livingston survey, for which we also have the longest data sample, shows the strongest evidence of non-linear bias. The coefficients have the same sign as for the other surveys, but now the $\beta_2$ coefficients are significantly positive, so that the slope coefficient increases in accelerating inflation environments, for all inflation measures except PUXX. As in the case of the SPF survey, the Livingston survey is closer to being unbiased in accelerating inflation environments. Without accounting for non-linearity, the Livingston survey produces largely unbiased forecasts in Table 3.
However, the results of regression (16) for the Livingston survey show that it produces mostly biased forecasts in decelerating inflation environments, under-predicting future inflation when inflation is relatively low, and over-predicting future inflation when inflation is relatively high.

[4] We also examined bias adjustments using the change in annual inflation, using

$$\pi_{t+4,4} - \pi_{t,4} = \alpha_1 + \beta_1\left(f_t^S - \pi_{t,4}\right) + \varepsilon_{t+4,4}$$

in place of equation (15) and

$$\pi_{t+4,4} - \pi_{t,4} = \alpha_1 + \alpha_2 D_t + \beta_1\left(f_t^S - \pi_{t,4}\right) + \beta_2 D_t\left(f_t^S - \pi_{t,4}\right) + \varepsilon_{t+4,4}$$

in place of equation (16). Like the bias adjustments in equations (15) and (16), these bias adjustments also do not outperform the raw survey forecasts and generally perform worse than the bias adjustments using inflation levels.

3.5 Assessing Forecasting Models

Out-of-Sample Periods

We select two starting dates for our out-of-sample forecasts, 1985:Q4 and 1995:Q4. Our main analysis focuses on recursive out-of-sample forecasts, which use all the data available at time $t$ to forecast annual future inflation from $t$ to $t+4$. Hence, the windows used for estimation lengthen through time. We also consider out-of-sample forecasts with a fixed rolling window. All of our annual forecasts are computed at a quarterly frequency, with the exception of forecasts from the Livingston survey, which are only available for the second and fourth quarter of each year.[5] The out-of-sample periods end in 2002:Q4, except for forecasts with the composite real activity factor, which end in 2001:Q3.

Measuring Forecast Accuracy

We assess forecast accuracy with the Root Mean Squared Error (RMSE) of the forecasts produced by each model and also report the ratio of RMSEs relative to a time-series ARMA(1,1) benchmark that uses only information in the past series of inflation. We show below that the ARMA(1,1) model nearly always produces the lowest RMSE among all of the ARIMA time-series models that we examine.

To compare the out-of-sample forecasting performance of the various models, we perform a forecast comparison regression, following Stock and Watson (1999):

$$\pi_{t+4,4} = \lambda f_t^{ARMA} + (1 - \lambda) f_t^{x} + \varepsilon_{t+4,4}, \tag{17}$$

where $f_t^{ARMA}$ is the forecast of $\pi_{t+4,4}$ from the ARMA(1,1) time-series model, $f_t^{x}$ is the forecast from the candidate model $x$, and $\varepsilon_{t+4,4}$ is the forecast error associated with the combined forecast. If $\lambda = 0$, then forecasts from the ARMA(1,1) model add nothing to the forecasts from candidate model $x$, and we thus conclude that model $x$ outperforms the ARMA(1,1) benchmark. If $\lambda = 1$, then forecasts from model $x$ add nothing to forecasts from the ARMA(1,1) time-series benchmark.
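The RMSE ratio and the comparison regression (17) can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: `rmse` and `forecast_comparison` are hypothetical names. Because annual forecast errors sampled quarterly overlap, the standard error below equally weights autocovariances up to three lags, in the spirit of Hansen and Hodrick (1980); it does not include the West (1996) correction for estimated model parameters.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared forecast error."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def forecast_comparison(pi, f_arma, f_x, lags=3):
    """Estimate lambda in pi = lam * f_arma + (1 - lam) * f_x + e (regression (17)).

    Subtracting f_x from both sides gives pi - f_x = lam * (f_arma - f_x) + e,
    a one-regressor OLS problem without a constant.
    """
    pi, f_arma, f_x = (np.asarray(v, dtype=float) for v in (pi, f_arma, f_x))
    y = pi - f_x
    z = f_arma - f_x
    szz = z @ z
    lam = (z @ y) / szz
    u = z * (y - lam * z)               # score contributions z_t * e_t
    S = u @ u
    for j in range(1, lags + 1):
        S += 2.0 * (u[j:] @ u[:-j])     # equally weighted (truncated) autocovariances
    # The equally weighted estimator is not guaranteed to be positive in small samples.
    se = float(np.sqrt(S)) / szz if S > 0 else float("nan")
    return float(lam), se
```

The relative RMSE reported against the benchmark is `rmse(pi, f_x) / rmse(pi, f_arma)`; a value below one favors candidate model $x$. An estimate of $1 - \lambda$ near one points the same way.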
[5] While the RMSEs for the Livingston survey represent a different sample than those of all other models and surveys, we also produced forecasts for a common semi-annual sample. The results are robust, and we do not comment on them further.

Stock and Watson (1999) note that inference about $\lambda$ is complicated by the fact that the forecast errors, $\varepsilon_{t+4,4}$, follow an MA(3) process because the overlapping annual observations are sampled at a quarterly frequency. We compute standard errors that account for the overlap by using Hansen and Hodrick (1980) standard errors. To also take into account the estimated parameter uncertainty in one or both sets of forecasts, $f_t^{ARMA}$ and $f_t^{x}$, we also compute West (1996) standard errors. The Appendix provides a detailed description of the computations involved.

3.6 Combining Models

A long statistics literature documents that forecast combinations typically provide better forecasts than individual forecasting models.[6] For inflation forecasts, Stock and Watson (1999) and Wright (2004), among others, show that combined forecasts using real activity and financial indicators are usually more accurate than individual forecasts. To examine whether combining the information in different forecasts leads to gains in out-of-sample forecasting accuracy, we examine five different methods of combining forecasts. All these methods involve placing different weights on $n$ individual forecasting models. The five model combination methods can be summarized as follows:

Combination Methods
1. Mean
2. Median
3. OLS
4. Equal-Weight Prior
5. Unit-Weight Prior

[6] See the literature reviews by, among others, Clemen (1989), Diebold and Lopez (1996), and more recently Timmermann (2006).

All our model combinations are ex-ante. That is, we compute the weights on the models using the history of out-of-sample forecasts up to time $t$. Hence, the ex-ante method assesses the actual out-of-sample forecasting power of the combination methods. For example, the weights used to construct the ex-ante combined forecast at 2000:Q4 are based on a regression of realized annual inflation over 1985:Q4 to 2000:Q4 on the constructed out-of-sample forecasts over the same period.

In the first two model combination methods, we simply take the overall mean and median, respectively, over $n$ different forecasting models. Equal weighting of many forecasts has been used as early as Bates and Granger (1969) and, in practice, simple equal-weighting forecasting schemes are hard to beat. In particular, Stock and Watson (2003) show that this method produces superior out-of-sample forecasts of inflation.

In the last three combination methods, we compute individual model weights that vary over time. These weights are estimated as slope coefficients in a regression of realized inflation on model forecasts:

$$\pi_{t+4,4} = \sum_{i=1}^{n}\omega_t^i f_t^i + \varepsilon_{t,t+4}, \qquad t = 1, \ldots, T, \tag{18}$$

where $f_t^i$ is the $i$-th model forecast at time $t$. The $n \times 1$ weight vector $\omega_t = \{\omega_t^i\}$ is estimated either by OLS, as in our third model combination specification, or using the mixed regressor method proposed by Theil and Goldberger (1961) and Theil (1963), as in Combination Methods 4 and 5.

To describe the last two combination methods, we set up some notation. Suppose we have $T$ forecast observations with $n$ individual models. Let $F$ be the $T \times n$ matrix of forecasts and $\pi$ the $T \times 1$ vector of actual future inflation levels that are being forecast. Consequently, the $s$-th row of $F$ is given by $F_s = \{f_s^1, \ldots, f_s^n\}$. The mixed regression estimator can be viewed as a Bayesian estimator with the prior $\omega \sim N(\mu, \sigma_\omega^2 I)$, where $\sigma_\omega^2$ is a scalar and $I$ the $n \times n$ identity matrix. The estimator can be derived as:

$$\hat{\omega} = \left(F'F + \gamma I\right)^{-1}\left(F'\pi + \gamma\mu\right), \tag{19}$$

where the parameter $\gamma$ controls the amount of shrinkage towards the prior. In particular, when $\gamma = 0$, the estimator simplifies to standard OLS, and when $\gamma \to \infty$, the estimator approaches the weighted average of the forecasts, with the weights given by the prior weights. It is instructive to rewrite the estimator as a weighted average of the OLS estimator and the prior:

$$\hat{\omega} = \theta_{OLS}\,\hat{\omega}_{OLS} + \theta_{prior}\,\mu,$$

with $\theta_{OLS} = (F'F + \gamma I)^{-1}(F'F)$ and $\theta_{prior} = (F'F + \gamma I)^{-1}(\gamma I)$, so that the weights add up to the identity matrix.
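The first two combination methods and the unconstrained mixed estimator (19) can be sketched with NumPy. This is an illustrative sketch with hypothetical function names; it omits the positivity and adding-up constraints that the paper imposes when computing the final weights. The code also checks numerically that $\theta_{OLS}\hat{\omega}_{OLS} + \theta_{prior}\mu$ reproduces equation (19).

```python
import numpy as np


def mean_median_combinations(F):
    """Combination Methods 1 and 2: cross-model mean and median at each date (rows of F)."""
    F = np.asarray(F, dtype=float)
    return F.mean(axis=1), np.median(F, axis=1)


def mixed_estimator(F, pi, mu, gamma):
    """Mixed (Theil-Goldberger) estimator of equation (19):
    (F'F + gamma I)^{-1} (F'pi + gamma mu)."""
    F = np.asarray(F, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + gamma * np.eye(n), F.T @ pi + gamma * np.asarray(mu, dtype=float))


def shrinkage_matrices(F, gamma):
    """theta_OLS and theta_prior from the weighted-average form; they sum to the identity."""
    F = np.asarray(F, dtype=float)
    n = F.shape[1]
    A_inv = np.linalg.inv(F.T @ F + gamma * np.eye(n))
    return A_inv @ (F.T @ F), A_inv @ (gamma * np.eye(n))


# Synthetic example: three "models" and an equal-weight prior.
rng = np.random.default_rng(1)
F = rng.normal(size=(40, 3))
pi = F @ np.array([0.2, 0.5, 0.3]) + 0.1 * rng.normal(size=40)
mu = np.full(3, 1.0 / 3.0)
w_ols = np.linalg.lstsq(F, pi, rcond=None)[0]
th_ols, th_prior = shrinkage_matrices(F, 2.0)
print(np.allclose(th_ols @ w_ols + th_prior @ mu, mixed_estimator(F, pi, mu, 2.0)))  # True
```

Setting `gamma = 0` recovers OLS, while a very large `gamma` pulls the weights to the prior `mu`.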
We use empirical Bayes methods and estimate the shrinkage parameter as:

$$\hat{\gamma} = \hat{\sigma}^2 / \hat{\sigma}_\omega^2, \tag{20}$$

where

$$\hat{\sigma}^2 = \frac{1}{T}\,\pi'\left[I - F(F'F)^{-1}F'\right]\pi$$

and

$$\hat{\sigma}_\omega^2 = \frac{\pi'\pi - T\hat{\sigma}^2}{\operatorname{trace}(F'F)}.$$

To interpret the shrinkage parameter, observe that $\hat{\sigma}^2$ is simply the residual variance of the regression. The numerator of $\hat{\sigma}_\omega^2$ is the fitted variance of the regression, and the denominator is the average variance of the independent variables (the forecasts) in the regression. Consequently, the shrinkage parameter, $\hat{\gamma}$, in equation (20) increases when the variance of the independent variables becomes larger, and decreases as the $R^2$ of the regression increases. In other words, if forecasts are (not) very variable and the regression $R^2$ is small (large), we trust the prior (the regression).

We examine the effect of two priors. In Model Combination 4, we use an equal-weight prior where each element of $\mu$ is $\mu_i = 1/n$, $i = 1, \ldots, n$, which leads to the Ridge regressor used by Stock and Watson (1999). In the second prior (Model Combination 5), we assign unit weight to one type of forecast, for example, $\mu = \{0 \ldots 1 \ldots 0\}'$. One natural choice for a unit-weight prior would be the best-performing univariate forecast model.

When we compute the model weights, we impose the constraint that the weight on each model is positive and the weights sum to one. This ensures that the weights represent the best combination of models that produce good forecasts in their own right, rather than placing negative weights on models that give consistently wrong forecasts. This is also very similar to shrinkage methods of forecasting (see Stock and Watson, 2005). For example, Bayesian Model Averaging uses posterior probabilities as weights, which are, by construction, positive and sum to one.[7]

The positivity constraint is imposed by minimizing the usual loss function, $L$, associated with OLS for combination method 3:

$$L = (\pi - F\omega)'(\pi - F\omega),$$

and a loss function for the mixed regressor estimations (combination methods 4 and 5):

$$L = \frac{(\pi - F\omega)'(\pi - F\omega)}{\hat{\sigma}^2} + \frac{(\omega - \mu)'(\omega - \mu)}{\hat{\sigma}_\omega^2},$$

subject to the positivity constraints. These are standard constrained quadratic programming problems.
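The empirical Bayes quantities in equation (20) and the constrained weights can be sketched as follows. This is a hedged illustration rather than the authors' solver: it swaps in projected gradient descent onto the probability simplex (weights non-negative and summing to one) in place of a generic quadratic programming routine, and all function names are hypothetical. With `sigma2_w = inf` the prior term drops out and the loss reduces to the constrained OLS loss of combination method 3.

```python
import numpy as np


def empirical_bayes_variances(F, pi):
    """sigma^2, sigma_omega^2 and gamma = sigma^2 / sigma_omega^2, as in equation (20)."""
    F = np.asarray(F, dtype=float)
    pi = np.asarray(pi, dtype=float)
    T = F.shape[0]
    w_ols, *_ = np.linalg.lstsq(F, pi, rcond=None)
    resid = pi - F @ w_ols
    sigma2 = resid @ resid / T                             # residual variance
    sigma2_w = (pi @ pi - T * sigma2) / np.trace(F.T @ F)  # fitted SS over trace(F'F)
    return sigma2, sigma2_w, sigma2 / sigma2_w


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)


def constrained_weights(F, pi, mu=None, sigma2=1.0, sigma2_w=np.inf, iters=5000):
    """Minimize (pi - Fw)'(pi - Fw)/sigma2 + (w - mu)'(w - mu)/sigma2_w over the simplex
    by projected gradient descent."""
    F = np.asarray(F, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n = F.shape[1]
    mu = np.zeros(n) if mu is None else np.asarray(mu, dtype=float)
    prior_prec = 0.0 if np.isinf(sigma2_w) else 1.0 / sigma2_w
    # Up to a constant, the loss equals w'Hw - 2 g'w; its half-gradient is H w - g.
    H = F.T @ F / sigma2 + prior_prec * np.eye(n)
    g = F.T @ pi / sigma2 + prior_prec * mu
    step = 1.0 / np.linalg.eigvalsh(H).max()          # 1 / Lipschitz constant
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        w = project_to_simplex(w - step * (H @ w - g))
    return w
```

For combination methods 4 and 5, one would pass the estimated `sigma2` and `sigma2_w` from `empirical_bayes_variances` together with the chosen prior `mu`.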
[7] Diebold (1989) shows that when the target is persistent, as in the case of inflation, the forecast error from the combination regression will typically be serially correlated and hence predictable, unless the constraint that the weights sum to one is imposed.

4 Empirical Results

Section 4.1 lays out our main empirical results for the forecasts of time-series models, OLS Phillips curve regressions, term structure models, and survey forecasts. We summarize these results in Section 4.2. Section 4.3 investigates how consistently the best models perform through time, and Section 4.4 considers the effect of rolling windows. Section 4.5 reports the results of combining model forecasts.

4.1 Forecast Accuracy

Time-Series Models

In Table 4, we report RMSE statistics, in annual percentage terms, for the ARIMA model out-of-sample forecasts over the post-1985 and post-1995 periods. The ARIMA RMSEs generally range from around 0.4–0.7% for PUXX to around 1.4–2.2% for PUXHS. For the post-1985 sample, the ARMA(1,1) model generates the lowest RMSE among all ARIMA models in forecasting PUNEW and PUXHS, but the annual Atkeson-Ohanian (2001) random walk is superior in forecasting core inflation (PUXX) and PCE. As the best quarterly ARIMA model, we select the ARMA(1,1) model for the remainder of the paper.[8] In the post-1995 period, it beats both the quarterly RW and AR models in forecasting the PUXHS and PCE measures, but the AR model has a lower RMSE in forecasting PUNEW and PUXX, and the quarterly RW also generates a lower RMSE in forecasting PUXX. Yet, the improvements are minor, and the ARMA(1,1) model remains the best overall among the three quarterly ARIMA models. However, the annual random walk is the best forecasting model for PUXX and PCE. It beats the ARMA(1,1) model for three of the four inflation measures and generates a much lower RMSE for forecasting core inflation (PUXX).

Table 4 also reports the RMSEs of the non-linear regime-switching model, RGM. Over the post-1985 period, RGM generally performs in line with, and slightly worse than, a standard ARMA model.
There is some evidence that non-linearities are important for forecasting in the post-1995 sample, where the regime-switching model outperforms all the ARIMA models in forecasting PUNEW and PUXHS. Both these inflation series become much less persistent post-1995, and the RGM model captures this by transitioning to a regime of less persistent inflation. However, the Hamilton (1989) RGM model performs worse than a linear ARMA model for forecasting PUXX and PCE.

[8] The estimated ARMA models contain large autoregressive roots with negative MA roots. As Ng and Perron (2001) comment, the negative MA components lead unit root tests to over-reject the null of non-stationarity.

OLS Phillips Curve Forecasts

Table 5 reports the out-of-sample RMSEs and the model comparison regression estimates (equation (17)) for the Phillips curve models described in Section 3.2, relative to the benchmark of the ARMA(1,1) model. The overall picture in Table 5 is that the ARMA(1,1) model typically outperforms the Phillips curve forecasts. Of the 80 comparisons (10 models, 2 out-of-sample periods, and 4 inflation measures), the model comparison regression coefficient $(1 - \lambda)$ is not significantly positive at the 95% level in any of the 80 cases using West (1996) standard errors! It must be said that the coefficients are sometimes positive and far away from zero, but the standard errors are generally rather large. When we compute Hansen-Hodrick (1980) standard errors, we still obtain only 14 cases of significant $(1 - \lambda)$ coefficients with p-values less than 5%, and of these 14 cases, only nine are positive.

The OLS Phillips curve regressions are most successful in forecasting core inflation, PUXX. Of the nine cases where the Phillips curve produces lower RMSEs than the ARMA(1,1) model, five occur for PUXX. The best model for forecasting PUXX inflation uses the composite Bernanke-Boivin-Eliasz aggregate real activity factor (PC8). While the $(1 - \lambda)$ coefficients are large for PC8, their West (1996) standard errors are also large, so they are insignificant for both samples. Another relatively successful Phillips curve specification is the PC7 model, which uses the Stock-Watson nonfinancial Experimental Leading Index-2. This index does not embed asset pricing information. PC7 for PUXHS post-1985 is the only case, out of 80, that generates a positive $(1 - \lambda)$ coefficient that is significant at better than the 90% level using West standard errors, but its performance deteriorates in the post-1995 sample. All of the RMSEs of PC7 are also higher than the RMSE of an ARMA(1,1) model. In contrast, the PC1 model, which simply uses past inflation and past GDP growth, delivers five of the nine relative RMSEs below one and beats PC7 in all but one case.
Among the various Phillips curve models, it is also striking that the PC4 model consistently beats the PC2 and PC3 models, sometimes by a wide margin in terms of RMSE. The PC2 and PC3 models use detrended measures of output that are often used to proxy for the output gap. PC4 uses the labor share as a real activity measure, which is sometimes used as a proxy for the marginal cost concept in New Keynesian models. This is interesting because the recent Phillips curve literature (see Gali and Gertler, 1999) stresses that marginal cost measures provide a better characterization of (in-sample) inflation dynamics than detrended output measures. Our results suggest that the use of marginal cost measures also leads to better out-of-sample predictive power. However, GDP growth leads to significantly better forecasts than the labor share measure, yet GDP growth remains, so far, conspicuously absent from the recent Phillips curve literature.

Finally, using Table 4 together with Table 5, it is easy to verify whether the Atkeson-Ohanian (2001) results hold up for our models and data. Essentially, they do: the annual random walk beats the Phillips curve models in 72 out of 80 cases. All the cases where a Phillips curve model beats the annual random walk occur in forecasting the PUNEW or PUXHS measures.

Term Structure Forecasts

In Table 6, we report the out-of-sample forecasting results for the various term structure models (see Section 3.3). Generally, the term structure-based forecasts perform worse than the Phillips curve-based forecasts. Over a total of 120 statistics (15 models, 4 inflation measures, 2 sample periods), term structure-based models beat the ARMA(1,1) model in only eight cases in terms of producing smaller RMSE statistics. The $(1 - \lambda)$ coefficients are usually positive for forecasting PUXX in the post-1985 period, but half are negative in the post-1995 sample. Unfortunately, the use of West (1996) standard errors turns 10 cases of significantly positive $(1 - \lambda)$ coefficients under Hansen-Hodrick (1980) standard errors into insignificant coefficients. The performance of the term structure forecasts is so poor that, using West (1996) standard errors, in none of the 120 cases is the $(1 - \lambda)$ parameter significant at the 95% level. This may be caused by many of the term structure models, especially the no-arbitrage models, having relatively large numbers of parameters.

The term structure models most successfully forecast core inflation, PUXX, which accounts for six of the eight cases with smaller RMSEs than an ARMA(1,1) model. In particular, the TS1 model, which includes inflation, GDP growth, and the short rate, beats an ARMA(1,1) model and has a positive, but insignificant, $(1 - \lambda)$ coefficient in both the post-1985 and post-1995 samples.
The other models with term structure information that are successful at forecasting PUXX are TS6 and TS8, both of which also include short rate information.

The finance literature has typically used term spreads, not short rates, to predict future inflation changes (see, for example, Mishkin, 1990, 1991). In contrast to the relative success of the models with short rate information, models TS9–TS11, which incorporate information from the term spread, perform badly. They produce higher RMSE statistics than the benchmark ARMA(1,1) model for all four inflation measures. This is consistent with Estrella and Mishkin (1997) and Kozicki (1997), who find that the forecasting ability of the term spread is diminished after controlling for lagged inflation. However, we show that the short rate still contains modest predictive power even after controlling for lagged inflation. Thus, the short rate, not the term spread, contains the most predictive power in simple forecasting regressions.

Table 6 shows that the performance of iterated VAR forecasts is mixed. VARs produce lower RMSEs than ARMA(1,1) models. The relatively poor performance of long-horizon VAR forecasts for inflation contrasts with the good performance of VARs in forecasting GDP (see Ang, Piazzesi and Wei, 2004) and other macroeconomic time series (see Marcellino, Stock and Watson, 2006). The non-linear empirical regime-switching VAR (RGMVAR) generally fares worse than the VAR. This result stands in contrast to the relatively strong performance of the univariate regime-switching model using only inflation data (RGM in Table 4) for forecasting PUNEW and PUXHS. This implies that the non-linearities in term structure data have no marginal value for forecasting inflation beyond the non-linearities already present in inflation itself.

The last two lines of each panel in Table 6 show that there is some evidence that no-arbitrage forecasts (MDL1–2) are useful for forecasting PUXX in the post-1985 sample. While the $(1 - \lambda)$ coefficients are significant using Hansen-Hodrick (1980) standard errors, they are not significant with West (1996) standard errors. Moreover, neither no-arbitrage term structure model ever beats the ARMA(1,1) forecasts in terms of RMSE. While the finance literature shows that inflation is a very important determinant of yield curve movements, our results show that the no-arbitrage cross-section of yields appears to provide little marginal forecasting ability for the dynamics of future inflation over simple time-series models.

Surveys

Table 7 reports the results for the survey forecasts and reveals several notable results. First, surveys perform very well in forecasting PUNEW, PUXHS, and PUXX. With only one exception, the raw survey forecasts SPF1, LIV1 and MCH1 have lower RMSEs than ARMA(1,1) forecasts over both the post-1985 and the post-1995 samples (the exception is MCH1 for PUXX over the post-1985 sample).
For example, for the post-1985 (post-1995) sample, the RMSE ratio of the raw SPF forecasts relative to an ARMA(1,1) is 0.779 (0.861) when predicting PUNEW. The horse races always assign large, positive $(1 - \lambda)$ weights to the pure survey forecasts (the lowest one is 0.383) in both out-of-sample periods. Ignoring parameter uncertainty, the coefficients are significantly different from zero in every case, but taking parameter uncertainty into account, statistical significance disappears for the post-1995 samples and, in the case of the PUXX measure, even for the post-1985 sample. This is true for all three surveys.

Second, while the SPF and Livingston surveys do a good job at forecasting all three measures of CPI inflation (PUNEW, PUXHS, and PUXX) out-of-sample, the Michigan survey is relatively unsuccessful at forecasting core inflation, PUXX. It is not surprising that consumers in the Michigan survey fail to forecast PUXX, since PUXX excludes food and energy, which are integral components of the consumer's basket of goods. Note that while the annual PUNEW and PUXHS measures have the highest correlations with each other (99% in both out-of-sample periods), core inflation is less correlated with the other CPI measures. In particular, post-1995, the correlation of annual PUXX with annual PUNEW (PUXHS) is only 33% (21%). Surveys do less well at forecasting PCE inflation, always producing worse forecasts in terms of RMSE than an ARMA(1,1). This result is expected because the survey participants are asked to forecast CPI inflation rather than the PCE consumption deflator.

Third, the raw survey forecasts outperform the linear or non-linear bias-adjusted forecasts (the only notable exception being the bias-adjusted forecasts for PCE). As a specific example, for PUNEW, the relative RMSE ratios are always higher for the models with suffix 2 (linear bias adjustment) or suffix 3 (non-linear bias adjustment) than for the raw survey forecasts, across all three surveys. This result is perhaps not surprising given the mixed evidence regarding biases in the survey data (see Table 3). While there are some significant biases, these biases must be small relative to the total amount of forecast error in predicting inflation.

Finally, we might expect the Livingston and SPF surveys to produce good forecasts because they are conducted among professionals. In contrast, participants in the Michigan survey are consumers, not professionals. It is indeed the case that the professionals uniformly beat the consumers in forecasting inflation. Nevertheless, in most cases, the Michigan forecasts are of the same order of magnitude as the Livingston and SPF surveys.
For example, for PUNEW over the post-1995 sample, the Michigan RMSE ratio is 0.862, just slightly above the RMSE ratio of 0.861 for the SPF survey. It is striking that information aggregated over non-professionals also produces accurate forecasts that beat ARIMA time-series models.

It is conceivable that consumers simply extrapolate past information into the future, and that the Michigan survey forecasts are simply random walk forecasts, similar to the Atkeson and Ohanian (2001) (AORW) random walk forecasts. Indeed, Table 4 demonstrated the relatively good forecasting performance of the annual random walk model, which beats the ARMA(1,1) model in a number of cases. Nevertheless, comparing the performance of the survey forecasts with the AORW model, we find that the random walk model produces smaller RMSEs than the Michigan survey only for PUXX and PCE inflation, which consumers are not directly asked to forecast. The AORW also outperforms the SPF survey for PUXX inflation over the post-1995 period, but the AORW model always performs worse than the Livingston survey for the CPI inflation measures. Looking at PUNEW, the inflation measure which the survey participants are actually asked to forecast, the AORW model performs worse than all the surveys, including the Michigan survey. Thus, survey forecasts clearly are not simply random walk forecasts!

4.2 Summary

Let us summarize the results so far. First, among ARIMA time-series models, the ARMA(1,1) model is the best overall quarterly model, but the annual random walk also performs very well. Nevertheless, some models that incorporate real activity information, term structure information, or, especially, survey information beat the ARMA(1,1) model, even when ARMA(1,1) forecasts are used as the benchmark in a forecast comparison regression. Second, the simplest Phillips curve model, using only past inflation and GDP growth, is a good predictor. Third, adding term structure information occasionally leads to an improvement in inflation forecasts, but generally only for core inflation. No-arbitrage restrictions do not improve forecasting performance. Fourth, the survey forecasts perform very well in forecasting all inflation measures except PCE inflation.

To get an overall picture of the relative forecasting power of the various models, Table 8 reports the relative RMSE ratios of the best models from each of the first three categories (pure time-series, Phillips curve, and term structure models) and of each raw survey forecast. The most remarkable result in Table 8 is that, for CPI inflation (PUNEW, PUXHS, and PUXX), the survey forecasts completely dominate the Phillips curve and term structure models in both out-of-sample periods. For the post-1985 sample, the RMSEs are around 20% smaller for the survey forecasts than for forecasts from Phillips curve or term structure models. The natural exception is PCE inflation, where the best model in both samples is just the annual random walk model!
For the post-1985 sample, a survey forecast delivers the overall lowest RMSE for all CPI inflation measures. The performance of the survey forecasts remains impressive in the post-1995 sample, but the Hamilton (1989) regime-switching model (RGM) has a slightly lower RMSE for PUNEW and PUXHS. Impressively, the Livingston survey continues to deliver the most accurate forecast of PUXX post-1995.

For the Phillips curve forecasts, the simple PC1 regression using only past inflation and GDP growth frequently outperforms more complicated models for both PUNEW and PUXHS. Other measures of economic growth are more successful at forecasting PUXX and PCE. For PUXX inflation, PC8 produces forecasts that beat an ARMA(1,1) model for both the post-1985 and post-1995 samples. The PC8 forecasting model uses the Bernanke et al. (2005) composite indicator. For the PCE measure, models combining multiple time series (PC6 through PC8) continue to do well, and the PC6 model, which uses the Stock and Watson experimental leading index (XLI), produces the lowest RMSE for the post-1995 sample. For the post-1985 sample, PC4, which uses the labor share, performs best. However, all the Phillips curve models are always beaten by time-series models or surveys.

Among the term structure models, models incorporating past inflation, the short rate, and one of the combination real activity measures (TS6 through TS8) perform relatively well. TS7 (using XLI-2) is best for the PUNEW and PCE measures for the post-1985 sample, whereas TS8 (using the Bernanke et al., 2005, composite indicator) is best for all measures except PUXX in the post-1995 sample. For PUXX, the TS6 model (which uses XLI as the real activity measure) produces the lowest RMSE. Like the Phillips curve models, all the term structure forecasts are also soundly beaten by time-series models or survey forecasts.

4.3 Stability of the Best Forecasting Models

One requirement for a good forecasting model is that it must consistently perform well. In Table 9, we report the ex-ante best models within each category (time-series, Phillips curve, term structure, and surveys) and across all models over the post-1995 sample. Since we record the best models at the end of each quarter, we include only the SPF and Michigan survey forecasts, because the Livingston survey is only available semi-annually. This understates the performance of the surveys, as the Livingston survey sometimes outperforms the other two survey measures, especially for PUXX (see Table 8). The best models are evaluated recursively, so at each point in time, we select the model within each group that yields the lowest forecast RMSE over the sample from 1985:Q4 to the present.
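The recursive ex-ante selection just described can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes each model's realized forecast errors are stored in a NumPy array indexed by the quarter in which the outcome becomes known, so that a choice made at quarter t uses only information available at t. The error values in the example are hypothetical.

```python
import numpy as np


def ex_ante_best(errors, start, first_eval):
    """Recursively select the model with the lowest RMSE to date.

    errors: dict mapping a model name to a 1-D array of realized forecast
        errors, indexed by the quarter in which each outcome becomes known.
    start: index of the first error used in the evaluation (e.g. 1985:Q4).
    first_eval: first quarter at which a selection is made.
    Returns a dict {quarter: best model name}. Each choice uses only errors
    observed through that quarter, so it is available ex ante.
    """
    best = {}
    n = len(next(iter(errors.values())))
    for t in range(first_eval, n):
        scores = {
            name: np.sqrt(np.mean(e[start : t + 1] ** 2))
            for name, e in errors.items()
        }
        best[t] = min(scores, key=scores.get)
    return best


# Hypothetical errors: model "A" is accurate early, then deteriorates.
errors = {
    "A": np.array([0.1, 0.1, 0.1, 1.0, 1.0, 1.0]),
    "B": np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5]),
}
print(ex_ante_best(errors, start=0, first_eval=2))
```

Because the evaluation sample always begins at `start`, the selection at the last quarter coincides with the best model over the full evaluation period. This mirrors the convergence to the post-1985 results of Table 8 noted below.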
Naturally, as we roll through the sample, the best ex-ante models up to the end of each quarter converge to the best models reported for the post-1985 period in Table 8. If the best ex-ante models for 2002:Q4 were reported, these would be identical to the best models in the post-1985 sample in Table 8, except that the Livingston survey is excluded.

Table 9 shows that for PUNEW and PUXHS, the ARMA(1,1) model is consistently the best time-series model, whereas for PUXX and PCE, the Atkeson-Ohanian (2001) model is always best. Given the good forecasting performance of these time-series models, this implies that the time-series models represent extremely good benchmarks. In contrast, there is little stability for the best ex-ante Phillips curve model, a point also stressed by Brave and Fisher (2004). For PUNEW, the best Phillips curve models alternate between PC1 (using GDP growth) and PC5 (using unemployment). For PUXHS, the best Phillips curve is PC7 (using XLI-2) at the beginning of the period, but it transitions to PC1 at the end of the sample. For core inflation, PUXX, PC8 (using the composite Bernanke, Boivin and Eliasz, 2005, factor) alternates with PC1. This instability further reduces the usefulness of the Phillips curve forecasts. Hence, the knowledge that these Phillips curve forecasts sometimes beat an ARMA(1,1) model is hard to translate into consistent, accurate forecasts.

The best term structure models are also generally unstable over time for PUNEW and PUXX. The VAR model is consistently the best performer for PUXHS, and TS7 (using XLI-2 with the short rate) is always the best term structure model for PCE. However, this consistency is of limited use because neither of these models can beat an ARMA(1,1).

The survey results stand in sharp contrast to the unstable Phillips curve and term structure models. For all three CPI measures (PUNEW, PUXHS, and PUXX), professionals always forecast better than consumers, with the SPF beating the Michigan survey. A remarkable result is that the raw SPF survey always dominates all other models throughout the period for the CPI measures. Surveys consistently deliver superior inflation forecasts!

4.4 Rolling Window Forecasts

McConnell and Perez-Quiros (2000) and Stock and Watson (2002b), among others, document a structural break in the mid-1980s. This has been called the "Great Moderation" because it is characterized by lower volatility of many macro variables. It is conceivable that professional forecasters adapt quickly to structural changes. In contrast, the models use relatively long windows (necessary to retain some estimation efficiency and power) to estimate parameters. These model parameters would respond only slowly to a structural break as new data points are added.
If changes in the time-series properties of inflation play a role in the relative forecasting prowess of models versus the surveys, allowing the model parameters to change more quickly through rolling windows should improve model performance.

In Table 10, we use a constant 10-year rolling window to estimate all the linear time-series, Phillips curve, and term structure models. We do not consider the regime-switching models (RGM, RGMVAR) or the no-arbitrage term structure models (MDL1, which is an affine model, and MDL2, which is a regime-switching model). The regime-switching data-generating processes in the RGM, RGMVAR, and MDL2 models produce forecasts that may already account for structural breaks. We report the relative RMSEs of the ex-post best models in each category together with the raw survey forecast results, using the same recursively estimated ARMA(1,1) model as the benchmark.
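The difference between the recursive (expanding) estimation sample and the 10-year rolling window can be illustrated with a simple AR(1) fitted by OLS. This is a hypothetical sketch, not the paper's estimation code. The simulated series, its break date, and the 0.2 noise scale are assumptions chosen only to mimic a shift in mean inflation. With quarterly data, a 10-year window is taken to be 40 observations.

```python
import numpy as np


def ar1_ols(y):
    """OLS estimates (mu, phi) of y[t+1] = mu + phi * y[t] + e[t+1]."""
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(x, y[1:], rcond=None)
    return coef


def estimation_sample(y, t, window=None):
    """Observations available at index t (inclusive).

    window=None: recursive (expanding) sample starting at the first observation.
    window=40:   10-year rolling sample of quarterly data.
    """
    lo = 0 if window is None else max(0, t + 1 - window)
    return y[lo : t + 1]


# Hypothetical quarterly inflation whose mean shifts from 6 to 2 at index 60:
# y[t+1] = 3 + 0.5 y[t] + e before the break and 1 + 0.5 y[t] + e after it.
rng = np.random.default_rng(0)
y = np.empty(100)
y[0] = 6.0
for t in range(99):
    mu = 3.0 if t < 59 else 1.0
    y[t + 1] = mu + 0.5 * y[t] + rng.normal(scale=0.2)

t_end = len(y) - 1
mu_rec, phi_rec = ar1_ols(estimation_sample(y, t_end))
mu_roll, phi_roll = ar1_ols(estimation_sample(y, t_end, window=40))
print(f"recursive phi = {phi_rec:.2f}, rolling phi = {phi_roll:.2f}")
```

In this simulation the expanding sample mixes both regimes, and the level shift pushes its persistence estimate toward one. The rolling window contains only post-break data and recovers a value near the true 0.5. This is the mechanism by which rolling estimation could let models adapt to the Great Moderation.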

Table 10 shows that over both the post-1985 and post-1995 samples, surveys still provide the best forecasts for all CPI inflation measures. Note that with a 10-year rolling window, the post-1995 sample results involve models estimated only on the post-Great Moderation sample. Thus, surveys still outperform even when the models are estimated only with data from the Great Moderation regime. However, estimating the models with only post-1985 data does improve their performance, as a comparison of the RMSE ratios in Tables 8 and 10 reveals, especially for the PUXX and PCE measures. This implies that the model parameters may only have adjusted to the new regime by 1995, and it raises the possibility that the outperformance of the surveys may not last. In fact, it is striking that an older literature, summarized by Croushore (1998), stressed that the surveys performed relatively poorly in forecasting compared to models.

To investigate this, we use the Livingston survey, which is the only survey available over our full sample, from 1952 to 2002. We compute the RMSE ratio of the out-of-sample forecasts for the Livingston survey relative to an ARMA(1,1) model for 1960-1985 and 1986-2002, where the first eight years are used as an in-sample estimation period for the ARMA(1,1) model. Over the pre-1985 sample, the Livingston RMSE ratio is 1.046 (with an RMSE level of 2.324), while over the post-1985 sample, the RMSE ratio is 0.789 (with an RMSE level of 0.896). Consequently, professionals are more adept at forecasting inflation in the post-1985 period.[9]

4.5 Combining Model Forecasts

Surveys may be averaging information from many different sources, whereas our models implicitly always constrain the information set to a limited number of variables. If this is the source of the surveys' outperformance, the model combination techniques should perform better than any individual model by itself. Table 11 investigates whether we can improve the forecasting performance by combining different models.
We first combine models within each of the four categories (time-series, Phillips curve, term structure, and survey models), then combine the four ex-ante best models from each category in the column labelled "Best Models," and finally combine across all the models in the last column labelled "All Models." The models in the survey category comprise only the SPF and Michigan surveys because the Livingston survey is conducted at a semi-annual frequency. Table 7 shows that the Livingston forecasts are very similar to the SPF and Michigan surveys for PUNEW and PUXHS, and that the Livingston survey is the best single forecaster for PUXX. Thus, excluding the Livingston survey places a conservative upper bound on the RMSEs for the forecast combinations involving surveys.

[9] In contrast to the superior performance of surveys relative to models for forecasting inflation, Campbell (2004) finds that for forecasting GDP post-1985, surveys perform worse than a simple AR(1). However, Campbell shows that for forecasting GDP, surveys outperform an AR(1) benchmark prior to 1985.

We use five methods of model combination: means or medians over all the models; linear combinations using weights recursively computed by OLS; and linear combinations using weights recursively computed by mixed combination regressions, either with an equal-weight prior or with a prior that places a unit weight on the ex-ante best model. We start the model combination regressions at 1995:Q4 using realized inflation and the out-of-sample forecasts over 1985:Q4 to 1995:Q4. At each subsequent period, we advance the data sample by one quarter and re-run the model combination regression to obtain the slope coefficient estimates. For comparison, the last row in each panel reports the RMSE ratio, relative to an ARMA(1,1) forecast, of the recursively updated ex-ante best performing individual model, as reported in Table 9.[10]

There are three main findings in Table 11. First, using mean or median forecasts mostly does not improve the forecast performance relative to the best individual ex-ante model. There are 24 cases to consider: four inflation measures and six different sets of model combinations. Combining forecasts by taking their means improves out-of-sample forecasts in only six of the 24 cases. Taking medians produces the same results, improving forecasts in exactly the same cases as taking means. The mean or median combination methods work best for PUNEW and PUXHS using time-series models. However, when these forecasting improvements occur, they are small. Thus, simple methods of combining forecasts provide little additional predictive power relative to the best model.

Second, updating the model weights based on previous model performance does not always lead to superior performance.
For the Phillips curve models, OLS model combinations outperform means and medians for all inflation measures. However, when the combinations are taken across all models, using an OLS combination is never better than the best individual model.

Finally, the performance of the equal-weight prior and of the unit prior that places weight only on the best ex-ante model is generally close to that of the OLS forecast combination method. Across all models, the unit-weight prior produces lower RMSE ratios than the OLS combination or the equal-weight prior. However, only for PUXX do the various regression-based model combination methods produce better forecasts than the best individual forecasts. For PUNEW, PUXHS, and PCE, the best individual models beat the model combinations, and for PUNEW and PUXHS, the best individual ex-ante forecasts are surveys.

[10] We also ask whether, ex post, a particular combination of models would have performed better than individual forecasts. This ex-post analysis cannot be used for actual forecasting, but it indicates which models would have been most successful at forecasting inflation out-of-sample ex post. For the ex-post combinations, we find that the improvement generated by the combined forecasts is also relatively minor, even for the unit-weight prior model, which uses forward-looking information to find the best performing model over the whole sample. These results are available upon request.

To help interpret the results, we investigate the ex-ante OLS weights on some selected models. In Figure 2, we plot the OLS slope estimates of regression (18) for various inflation measures over the period 1995:Q4 to 2002:Q4. For clarity, we restrict the regression to combinations of the ex-ante best model within each category (time-series, Phillips curve, term structure) together with the SPF survey. Note that by choosing the best model in each category, we handicap the survey forecasts. We compute the weights in the regression recursively, like the forecasts in Table 11. That is, we start in 1995:Q4, using the out-of-sample forecasts from 1985:Q4 to 1995:Q4 as the initial estimation sample.

Figure 2 shows that when forecasting all the CPI inflation measures (PUNEW, PUXHS, and PUXX), the data consistently place the largest ex-ante weights on survey forecasts and very little weight on the other models. The weights on the SPF survey forecast are generally constant and lie around 0.8 for PUNEW, PUXX, and PUXHS. No single model consistently dominates for the remaining 0.1-0.2 weight. The weights on the time-series models are always zero for PUNEW, but temporarily spike upward in the middle of the sample to around 0.15 for PUXHS and 0.20 for PUXX. For PUNEW and PUXHS, the Phillips curves fare best at the beginning of the sample, but the regressions place very little weight on Phillips curve forecasts at the end of the sample. For PCE inflation, surveys contain little information. The weight on the best survey stays close to zero until late 1999, then rises to 0.2. Among the other categories of models, the Phillips curve forecast stands out for forecasting PCE, with weights ranging from 0.2 to 0.6. Term structure models receive the highest weight at the end of the sample.
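The combination schemes discussed in this section can be sketched as below. This is a hypothetical illustration, not the paper's implementation. The OLS weights omit an intercept, by analogy with the weight regression written out for inflation changes in equation (22). The mixed estimator is a standard Theil-Goldberger-style shrinkage toward a prior weight vector. The tuning constant `kappa`, the ratio of residual variance to prior variance, is an assumption, because the exact prior specification of Section 3.6 is not reproduced here. Row s of the forecast matrix is taken to be the forecast for an outcome realized four quarters later. For that reason, only rows up to t - 4 enter the regression at date t.

```python
import numpy as np


def combine_mean(f):
    """Equal-weight average across models; f is a (T, n) forecast matrix."""
    return f.mean(axis=1)


def combine_median(f):
    """Cross-model median forecast for each period."""
    return np.median(f, axis=1)


def ols_weights(y, f):
    """Weights minimizing ||y - f @ w||^2, with no intercept."""
    w, *_ = np.linalg.lstsq(f, y, rcond=None)
    return w


def mixed_weights(y, f, prior, kappa):
    """Theil-Goldberger-style mixed estimate shrinking OLS weights toward
    `prior` (e.g. equal weights, or a unit weight on one model).
    kappa (hypothetical) = residual variance / prior variance."""
    n = f.shape[1]
    lhs = f.T @ f + kappa * np.eye(n)
    rhs = f.T @ y + kappa * prior
    return np.linalg.solve(lhs, rhs)


def recursive_combination(y, f, first, horizon=4, prior=None, kappa=0.0):
    """Combined forecasts for t >= first, re-estimating weights each period.

    Row s of f holds the forecasts made at s for outcome y[s], realized
    `horizon` quarters later, so only rows s <= t - horizon are used at t.
    """
    out = []
    for t in range(first, len(y)):
        ys, fs = y[: t - horizon + 1], f[: t - horizon + 1]
        if prior is None:
            w = ols_weights(ys, fs)
        else:
            w = mixed_weights(ys, fs, prior, kappa)
        out.append(f[t] @ w)
    return np.array(out)


# Hypothetical forecasts from two models; the outcome puts 0.8 on the first.
f = np.column_stack([np.arange(1.0, 11.0), np.sin(np.arange(10.0))])
y = f @ np.array([0.8, 0.2])
print(ols_weights(y, f))
```

Setting `kappa` to zero returns the OLS weights. Letting it grow pins the weights to the prior, so the equal-weight and unit-weight priors span the range between the OLS combination and a fixed rule.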
We conclude that combining model forecasts, at least using the techniques here, is not a very useful forecasting tool, especially compared to using just survey data for forecasting CPI inflation.

5 Robustness to Non-Stationary Inflation

5.1 Definition and Models

In this section we investigate the robustness of our results to the alternative assumption that quarterly inflation is difference stationary. Our exercise is now to forecast four-quarter-ahead inflation changes:

$$ E_t(\pi_{t+4,4} - \pi_{t,4}) = E_t\left[\sum_{i=-3}^{3} (4 - |i|)\,\Delta\pi_{t+1+i}\right] = E_t\left[\sum_{i=0}^{3} (4 - i)\,\Delta\pi_{t+1+i}\right] + 4\pi_t - \pi_{t,4}, \tag{21} $$

where $\pi_{t+4,4}$ is annual inflation defined in equation (2).

We now replace quarterly inflation, $\pi_t$, by quarterly inflation changes, $\Delta\pi_{t+1} = \pi_{t+1} - \pi_t$, in all the models considered in Sections 3.1 to 3.3. For example, we estimate an ARMA(1,1) on first differences of inflation:

$$ \Delta\pi_{t+1} = \mu + \phi\,\Delta\pi_t + \psi\,\varepsilon_t + \varepsilon_{t+1}, $$

and an AR(p) on first differences of inflation:

$$ \Delta\pi_{t+1} = \mu + \phi_1\,\Delta\pi_t + \phi_2\,\Delta\pi_{t-1} + \dots + \phi_p\,\Delta\pi_{t-p+1} + \varepsilon_{t+1}. $$

The OLS Phillips curve and term structure regressions include quarterly inflation changes as one of the regressors, rather than quarterly inflation. From the models estimated on $\Delta\pi_t$, we compute forecasts of inflation changes over the next year, $E_t(\pi_{t+4,4} - \pi_{t,4})$.

There are three models for which we do not estimate a counterpart using quarterly inflation differences. We do not consider a random walk model for inflation changes, and we do not specify the no-arbitrage term structure models (MDL1 and MDL2) to have non-stationary inflation dynamics, although we still consider the forecasts of annual inflation changes implied by the original stationary models. In all other cases, we examine the forecasts of both the original stationary models and the new non-stationary models that use first differences of inflation.

Because $\pi_{t,4}$ is known at time $t$, the original models estimated on inflation levels make the same forecast errors for annual inflation changes as for annual inflation levels, and therefore generate identical RMSEs. Hence, the question is whether models estimated on differences provide superior forecasts to models estimated on levels. By including a new set of models estimated on inflation changes, we also enrich the set of forecasts that we can combine. We maintain the ARMA(1,1) model estimated on inflation rate levels as a benchmark.

5.2 Performance of Individual Models

Table 12 reports the RMSE ratios of the best performing models estimated on levels or differences within each model category.
Time-series models estimated on levels always provide lower RMSEs than time-series models estimated on differences. For both Phillips curve and term structure models, using inflation differences or levels produces similar forecasting performance for both the PUNEW and PUXHS measures. For these inflation measures, the Phillips curve models are slightly better when estimated on levels, but for term structure models there is no clear overall winner. However, for the PUXX and PCE measures, Phillips curve and term structure regressions using past inflation changes are more accurate than regressions with past inflation levels.

Our major finding that surveys generally outperform other model forecasts is robust to specifying the models in inflation differences. For the CPI inflation measures (PUNEW, PUXHS, PUXX) over the post-1985 sample, surveys deliver lower RMSEs than the best time-series, Phillips curve, and term structure forecasts. First-difference models are most helpful for lowering RMSEs for core inflation (PUXX) over the post-1995 sample, where the best time-series model estimated on differences (ARMA) produces a relative RMSE ratio of 0.649. This is still beaten by the raw Livingston survey, with an RMSE ratio of 0.557.[11]

5.3 Performance of Combining Models

In this section, we run forecast combination regressions to determine the best combination of models to forecast inflation changes (similar to Section 3.6 for inflation levels). The model weights are computed from the regression:

$$ \pi_{s+4,4} - \pi_{s,4} = \sum_{i=1}^{n} \omega^i f^i_s + \varepsilon_{s,s+4}, \qquad s = 1, \dots, t. \tag{22} $$

We repeat the exercise of Table 11 and compute ex-ante recursive weights over 1995:Q4-2002:Q4 using the best ex-ante forecasting models in each category and across all models. In unreported results available upon request, we find that our original results for forecasting inflation levels also extend to forecasting inflation changes. Specifically, combining model forecasts generally brings no improvement, and when model combinations do outperform, the improvement is small.
Specifically, for PUNEW and PUXHS, using means, medians, OLS, or an equal-weight prior produces higher RMSEs than the best individual model. For these inflation measures, all model combinations produce RMSEs that are higher than those of the survey forecasts. This result is robust both to combining models in levels and to combining models in differences. There are some improvements for forecasting PCE inflation using models in differences, but the forecasting gains are very small.

[11] We also ran model comparison regressions as in equation (17), but with inflation changes on the left-hand side, keeping the stationary ARMA(1,1) model as the benchmark model. These results are available upon request. We find that while the models specified in differences generally do not fare any better than the models specified in levels in terms of beating the RMSE of a stationary ARMA(1,1), there are more I(1) models with significant $(1-\lambda)$ coefficients using Hansen-Hodrick (1980) standard errors. The largest increase occurs for PUXX inflation. As in the model comparisons for forecasting inflation levels, surveys consistently provide a significant improvement over an ARMA(1,1) model on levels in forecasting CPI inflation changes, especially for the post-1985 sample period.

In Figures 3 and 4, we plot the OLS coefficient estimates of equation (22) for the models specified in differences and the models specified in levels, respectively, together with the best survey forecast. We consider only the SPF and the Michigan surveys at the end of each quarter, and the SPF survey always dominates the Michigan survey. Similar to Figure 2, we choose the best ex-ante performing time-series, Phillips curve, and term structure models at each time, and compute the OLS ex-ante weights recursively over 1995:Q4 to 2002:Q4. Both Figures 3 and 4 confirm that the surveys produce superior forecasts of inflation changes.

In Figure 3, the weight on the SPF survey for PUNEW and PUXHS changes is around or above 0.8. The surveys clearly dominate the I(1) time-series, Phillips curve, and term structure models. For PUXX changes, the regressions still place the largest weight on the survey, but the weight is only around 0.5. In contrast, for forecasting PUXX inflation levels, the weights on the survey range from 0.6 to above 0.9. Thus, the other models now contain additional information for forecasting PUXX changes, most notably the Phillips curve PC1 model, which has a weight of around 0.4. Nevertheless, surveys still receive the highest weight. Consistent with the results for forecasting inflation levels, surveys provide little information for forecasting PCE changes. For PCE changes, the largest ex-ante weight in the forecast combination regression is on the ARMA(1,1) estimated on inflation differences.

Figure 4 combines the surveys with stationary models.
While Table 12 reveals that the RGM model estimated on inflation levels yields the lowest RMSE over the post-1995 sample in forecasting PUNEW and PUXHS differences, there appears to be little additional value in the RGM forecast once surveys are included. Figure 4 shows that the forecast combination regression places almost zero ex-ante weight on the RGM model. The weights on the other I(0) models are also low, whereas the survey weights are around 0.8 or higher. Compared to the other stationary model categories, surveys also have an edge at forecasting PUXX inflation. Again, surveys do not perform well relative to I(0) models for forecasting PCE changes.
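Equation (21) rewrites the four-quarter change in annual inflation as a weighted sum of quarterly inflation changes, with weights 1, 2, 3, 4, 3, 2, 1. The three earliest terms are already known at time $t$, and they collapse to $4\pi_t - \pi_{t,4}$. The numerical check below is a sketch under one stated assumption: annual inflation $\pi_{t,4}$ is taken to be the sum of the four most recent quarterly rates. Equation (2) is not reproduced in this section, and this is the definition under which the weights in (21) hold exactly. The quarterly inflation series is simulated and hypothetical.

```python
import numpy as np


def annual(pi, t):
    """Annual inflation pi_{t,4}, taken here (an assumption) to be the sum of
    the four quarterly rates ending at t."""
    return pi[t - 3 : t + 1].sum()


def check_equation_21(pi, t):
    """Return the left-hand side of (21) and both right-hand-side forms,
    evaluated with realized values in place of expectations."""
    def d(k):
        # Quarterly inflation change, Delta pi_k = pi_k - pi_{k-1}.
        return pi[k] - pi[k - 1]

    lhs = annual(pi, t + 4) - annual(pi, t)
    rhs_full = sum((4 - abs(i)) * d(t + 1 + i) for i in range(-3, 4))
    rhs_split = (sum((4 - i) * d(t + 1 + i) for i in range(4))
                 + 4 * pi[t] - annual(pi, t))
    return lhs, rhs_full, rhs_split


rng = np.random.default_rng(1)
pi = rng.normal(loc=0.8, scale=0.3, size=12)  # hypothetical quarterly inflation
print(check_equation_21(pi, t=4))
```

All three returned values agree up to floating-point error for any series. This is why a model for $\Delta\pi$ only needs to forecast the next four quarterly changes to deliver a forecast of the annual change.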

6 Conclusions

We conduct a comprehensive analysis of different inflation forecasting methods using four inflation measures and two different out-of-sample periods (post-1985 and post-1995). We investigate forecasts based on time-series models, Phillips curve inspired forecasts, and forecasts embedding information from the term structure. Our analysis of term structure models includes linear regressions, non-linear regime-switching models, and arbitrage-free term structure models. We compare these model forecasts with the forecasting performance of three different survey measures (the SPF, Livingston, and Michigan surveys), examining both raw and bias-adjusted survey measures.

Our results can be summarized as follows. First, the best time-series model is mostly a simple ARMA(1,1) model, which can be motivated by thinking of inflation as comprising stochastic expected inflation following an AR(1) process, plus shocks to inflation. Post-1995, the annual random walk used by Atkeson and Ohanian (2001) is a serious competitor. Second, while the ARMA(1,1) model is hard to beat in terms of RMSE forecast accuracy, it is never the best model. For CPI measures, the survey measures consistently deliver better forecasts than ARMA(1,1) models. In fact, they deliver much better forecasts than Phillips curve-based regressions, term structure models based on OLS regressions, non-linear models, iterated VAR forecasts, and even no-arbitrage term structure models that use information from the entire cross-section of yields. Naturally, surveys do a relatively poor job at forecasting PCE inflation, which they are not designed to forecast.

Some of our results shed light on the validity of some simple explanations of the superior performance of survey forecasts. One possibility is that the surveys simply aggregate information from many different sources, not captured by a single model.
The superior information in median survey forecasts may be due to an effect similar to Bayesian Model Averaging, or to averaging across potentially hundreds of different individual forecasts and extracting common components (see Stock and Watson, 2002a; Timmermann, 2004). For example, it is striking that the Michigan survey, which is conducted among relatively unsophisticated consumers, beats time-series, Phillips curve, and term structure forecasts. The Livingston and SPF surveys, conducted among professionals, do even better.

If surveys contain information not included in any single model, combining model forecasts may lead to superior forecasts. However, when we examine forecasts that combine information across models or from various data sources (like the Bernanke et al., 2005, index of real activity that uses 65 macro factors measuring real activity), we find that the surveys still outperform.

Across all models, combination methods using simple means or medians, or forecast combination regressions that use prior information, never outperform survey forecasts. In ex-ante model combination exercises for forecasting CPI inflation, almost all the weight is placed on survey forecasts. One avenue for future research is to investigate whether alternative techniques for combining forecasts perform better (see Inoue and Kilian, 2005, for a survey and study of one promising technique).

Another potential reason why surveys outperform is that survey information is not captured in any of the variables or models that we use. If this is the case, our results strongly suggest that there would be additional value in including survey forecasts in the large data sets used to construct a small number of composite factors, which are designed to summarize aggregate macroeconomic dynamics (see, among others, Bernanke et al., 2005; Stock and Watson, 2005).

Our results also have important implications for term structure modelling. Extant sophisticated no-arbitrage term structure models, while performing well in sample, seem to provide relatively poor out-of-sample forecasts compared to simpler term structure or Phillips curve models. A potential solution is to introduce the information present in the surveys as additional state variables in the term structure models. Pennacchi (1991) was an early attempt in that direction, and Kim (2004) is a recent attempt to build survey expectations into a no-arbitrage quadratic term structure model. Brennan, Wang and Xia (2004) also recently use the Livingston survey to estimate an affine asset pricing model.

Finally, surveys may forecast well because they react quickly to changes in the data-generating process for inflation in the post-1985 sample. In particular, since the mid-1980s, the volatility of many macroeconomic series, including inflation, has declined. This "Great Moderation" may also explain why a univariate regime-switching model for inflation provides relatively good forecasts over this sample period.
Nevertheless, when we redo our forecasting exercises using a 10-year rolling window, the survey forecasts remain superior. We conjecture that the surveys likely perform well for all of these reasons: the pooling of large amounts of information; the efficient aggregation of that information; and the ability to adapt quickly to major changes in the economic environment such as the Great Moderation. While our analysis shows that surveys provide superior forecasts of CPI inflation, the PCE deflator is often the Federal Reserve's preferred inflation indicator for the conduct of monetary policy. Since existing surveys target only the CPI index, professional surveys designed to forecast the PCE deflator may also deliver superior forecasts of PCE inflation.

Appendix: Computation of West (1996) Standard Errors

Subtract $f^{ARMA}_t$ from both sides of equation (17). Let $e^{ARMA}_{t,t+4}$ denote the forecast residuals of the ARMA(1,1) model and $e^x_{t,t+4}$ denote the forecast residuals of candidate model $x$. We can then write:

$$ e^{ARMA}_{t,t+4} = (1-\lambda)\left(e^{ARMA}_{t,t+4} - e^x_{t,t+4}\right) + \varepsilon_{t+4,4}. \tag{A-1} $$

The estimated slope coefficient $\hat\lambda$ has the asymptotic distribution:

$$ \sqrt{P}\,(\hat\lambda - \lambda) \xrightarrow{d} N\!\left(0,\; E(d_{t,t+4} d_{t,t+4}')^{-1}\,\Omega_{ff}\,E(d_{t,t+4} d_{t,t+4}')^{-1}\right), \tag{A-2} $$

where $P$ is the length of the out-of-sample period, $\Omega_{ff} = \mathrm{var}(f_{t,t+4})$, $f_{t,t+4} = e^{ARMA}_{t,t+4}\left(e^{ARMA}_{t,t+4} - e^x_{t,t+4}\right)$, and $d_{t,t+4} = e^{ARMA}_{t,t+4} - e^x_{t,t+4}$. West (1996) derives the long-run asymptotic variance $\Omega_{ff}$ after taking into account parameter uncertainty.

We use the notation of West (2006). The forecast horizon is four quarters ahead. For each model $i$ there are $P$ out-of-sample forecasts in all, which rely on estimates of a $k_i \times 1$ unknown parameter vector $\theta_i$. The first forecast uses data from a sample of length $R$ to predict a time $t = R+4$ variable, while the last forecast uses data from time $t = R+P-1 \equiv T$ to forecast a time $t = T+4$ variable. The total sample size is $R+P-1+4 = T+4$.

For the $i$th candidate model, the small-sample estimate $\hat\theta_i$ of the parameters $\theta_i$ satisfies:

$$ \hat\theta_i(t) - \theta_i = B_i(t)\,H_i(t), \tag{A-3} $$

where $B_i(t)$ is a $k_i \times q_i$ matrix and $H_i(t)$ is a $q_i \times 1$ vector. The vector $H_i(t)$ represents the orthogonality conditions of the model, and the matrix $B_i(t)$ is a linear combination of the orthogonality conditions that recovers the parameters. We assume that $B_i(t) \xrightarrow{p} B_i$, where $B_i$ is a matrix with rank $k_i$. The moment conditions $H_i(t)$ are given by:[12]

$$ H_i(t) = \frac{1}{t}\sum_{s=1}^{t} h^i_s(\theta_i), \tag{A-4} $$

for the recursive forecast case that we investigate, where $h^i_s(\theta_i)$ are $q_i \times 1$ orthogonality conditions. For models estimated by maximum likelihood, the matrix $B_i(t)$ is the inverse of the Hessian and $h^i_t(\theta_i)$ is the score. For linear models of the form $y_t = X^{i\prime}_t \theta_i + \varepsilon_t$, $B_i(t) = E(X^i_t X^{i\prime}_t)^{-1}$ and $h^i_t(\theta_i) = X^i_t\left(y_t - X^{i\prime}_t \theta_i\right)$.

We stack the parameters of the ARMA(1,1) benchmark model and the parameters of the $i$th candidate model in the vector $\theta = (\theta_{ARMA}, \theta_i)$.
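Then we can write $\hat\theta(t) - \theta = B(t)H(t)$, where $H(t) = \frac{1}{t}\sum_{s=1}^{t} h_s(\theta)$ and

$$ B(t) = \begin{bmatrix} B_{ARMA}(t) & 0 \\ 0 & B_i(t) \end{bmatrix}, \qquad h_t(\theta) = \begin{bmatrix} h^{ARMA}_t(\theta_{ARMA}) \\ h^i_t(\theta_i) \end{bmatrix}, \tag{A-5} $$

and $B(t) \xrightarrow{p} B$, where

$$ B = \begin{bmatrix} B_{ARMA} & 0 \\ 0 & B_i \end{bmatrix}. \tag{A-6} $$

[12] West and McCracken (1998) derive similar forms for $\Omega_{ff}$ for the cases of rolling and fixed out-of-sample forecasts.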

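We define the derivative $F$ of the moment conditions with respect to $\theta$ as:

$$ F = E\left[\frac{\partial f_{t,t+4}(\theta)}{\partial \theta}\right] = \begin{bmatrix} F_1 & F_2 \end{bmatrix}, \tag{A-7} $$

where $F_1$ and $F_2$ are given by:

$$ F_1 = E\left[\frac{\partial f_{t,t+4}(\theta)}{\partial \theta_{ARMA}}\right] = E\left[\left(2e^{ARMA}_{t,t+4} - e^x_{t,t+4}\right)\frac{\partial e^{ARMA}_{t,t+4}}{\partial \theta_{ARMA}}\right], \qquad F_2 = E\left[\frac{\partial f_{t,t+4}(\theta)}{\partial \theta_i}\right] = -E\left[e^{ARMA}_{t,t+4}\,\frac{\partial e^x_{t,t+4}}{\partial \theta_i}\right]. \tag{A-8} $$

Finally, for the asymptotic results, we need $P \to \infty$ and $R \to \infty$ with

$$ \rho = \lim_{T\to\infty} \frac{P}{R} < \infty. \tag{A-9} $$

Following West (2006), we define the constants $\lambda_{fh}$ and $\lambda_{hh}$:

$$ \lambda_{fh} = 1 - \rho^{-1}\ln(1+\rho), \qquad \lambda_{hh} = 2\left[1 - \rho^{-1}\ln(1+\rho)\right]. \tag{A-10} $$

Under these assumptions, West (1996) shows that the asymptotic variance $\Omega_{ff}$ is given by:

$$ \Omega_{ff} = S_{ff} + \lambda_{fh}\left(F B S_{fh}' + S_{fh} B' F'\right) + \lambda_{hh}\, F B S_{hh} B' F', \tag{A-11} $$

where

$$ S_{ff} = \sum_{j=-\infty}^{\infty} E\left[(f_{t,t+4} - Ef_{t,t+4})(f_{t-j,t-j+4} - Ef_{t,t+4})'\right], \quad S_{fh} = \sum_{j=-\infty}^{\infty} E\left[(f_{t,t+4} - Ef_{t,t+4})\,h_{t-j}'\right], \quad S_{hh} = \sum_{j=-\infty}^{\infty} E\left[h_t\,h_{t-j}'\right]. \tag{A-12} $$

Note that the estimate without parameter uncertainty is simply $S_{ff}$. Taking parameter uncertainty into account can increase or decrease the long-run variance of $\hat\lambda$, depending on the covariances of $f_{t,t+4}$ with $h_{t+4}$.

A consistent estimator can be constructed using the small-sample counterparts. In particular, we compute $\hat\lambda_{fh}$ and $\hat\lambda_{hh}$ setting $\hat\rho = P/R$,

$$ \hat F = \frac{1}{P}\sum_{t=R}^{T} \left.\frac{\partial f_{t,t+4}(\theta)}{\partial \theta}\right|_{\theta = \hat\theta(t)}, \qquad \hat B \equiv B(T) \xrightarrow{p} B, \tag{A-13} $$

and construct $\hat f_{t,t+4} = f_{t,t+4}(\hat\theta(t))$ and $\hat h_t = h_t(\hat\theta(t))$ using the estimates $\hat\theta(t)$, which are recursively updated each time using data up to time $t$. The sample covariances $\hat S_{ff}$, $\hat S_{fh}$ and $\hat S_{hh}$ converge to their population equivalents in equation (A-12). To estimate these, we define the vector of moments:

$$ \hat g_t = \begin{bmatrix} \hat f_{t,t+4} \\ \hat F \hat B \hat h_t \end{bmatrix}. \tag{A-14} $$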

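To construct a non-singular estimate for the covariance of $\hat g_t$, which we denote as $\hat\Omega$, we use a Newey-West (1987) covariance estimator with three lags. We partition $\hat\Omega$ as the $2 \times 2$ matrix:

$$ \hat\Omega = \begin{bmatrix} \hat\Omega_{11} & \hat\Omega_{12} \\ \hat\Omega_{21} & \hat\Omega_{22} \end{bmatrix}. \tag{A-15} $$

Then a consistent estimate of $\Omega_{ff}$ is given by:

$$ \hat\Omega_{ff} = \hat\Omega_{11} + \hat\lambda_{fh}\left(\hat\Omega_{12} + \hat\Omega_{21}\right) + \hat\lambda_{hh}\,\hat\Omega_{22}. \tag{A-16} $$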
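The final estimation steps, (A-10) and (A-14) through (A-16), together with the variance formula in (A-2), can be sketched in a few lines of NumPy. This is a hypothetical illustration rather than the authors' code. It assumes the model-specific inputs have already been computed for each out-of-sample date: the series $\hat f_{t,t+4}$, the scalar series $\hat F \hat B \hat h_t$, and $d_{t,t+4}$. The derivatives behind $\hat F$ and $\hat B$ are not shown. The Bartlett kernel weights $1 - j/(L+1)$ are the standard Newey-West choice, and the default of three lags follows the text.

```python
import numpy as np


def west_lambdas(P, R):
    """Constants (A-10) for recursive estimation, evaluated at rho-hat = P/R."""
    rho = P / R
    lam_fh = 1.0 - np.log1p(rho) / rho
    return lam_fh, 2.0 * lam_fh


def newey_west(g, lags=3):
    """Newey-West (1987) long-run covariance of the rows of g (T x k),
    using Bartlett weights 1 - j / (lags + 1)."""
    g = g - g.mean(axis=0)
    T = g.shape[0]
    omega = g.T @ g / T
    for j in range(1, lags + 1):
        gamma = g[j:].T @ g[:-j] / T  # sum over t of g_t g_{t-j}' / T
        omega += (1.0 - j / (lags + 1)) * (gamma + gamma.T)
    return omega


def omega_ff_hat(f_hat, fbh_hat, P, R, lags=3):
    """(A-14)-(A-16): stack g_t = [f_t, F B h_t], estimate its long-run
    covariance, and combine the blocks with the lambda constants."""
    om = newey_west(np.column_stack([f_hat, fbh_hat]), lags)
    lam_fh, lam_hh = west_lambdas(P, R)
    return om[0, 0] + lam_fh * (om[0, 1] + om[1, 0]) + lam_hh * om[1, 1]


def se_lambda(d, omega_ff):
    """Standard error of lambda-hat implied by (A-2), with scalar d_{t,t+4}."""
    P = len(d)
    edd = np.mean(d ** 2)
    return float(np.sqrt(omega_ff / edd ** 2 / P))
```

Setting `fbh_hat` to zeros switches off the parameter-uncertainty terms, leaving only the Newey-West estimate of $S_{ff}$. This makes it easy to see how much the West (1996) correction moves the standard error in either direction.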

References

Atkeson, A., Ohanian, L.E., 2001. Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review 25, 2–11.
Ang, A., Bekaert, G., 2002. Regime switches in interest rates. Journal of Business and Economic Statistics 20, 163–182.
Ang, A., Bekaert, G., Wei, M., 2006. The term structure of real rates and expected inflation. Working paper, Columbia University.
Ang, A., Piazzesi, M., Wei, M., 2004. What does the yield curve tell us about GDP growth? Journal of Econometrics, forthcoming.
Bai, J., Ng, S., 2004. A PANIC attack on unit roots and cointegration. Econometrica 72, 1127–1177.
Bates, J.M., Granger, C.W.J., 1969. The combination of forecasts. Operational Research Quarterly 20, 451–468.
Bekaert, G., Cho, S., Moreno, A., 2005. New Keynesian macroeconomics and the term structure. Working paper, Columbia University.
Bekaert, G., Hodrick, R.J., Marshall, D., 2001. Peso problem explanations for term structure anomalies. Journal of Monetary Economics 48, 241–270.
Bernanke, B.S., Boivin, J., 2003. Monetary policy in a data-rich environment. Journal of Monetary Economics 50, 525–546.
Bernanke, B.S., Boivin, J., Eliasz, P., 2005. Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach. Quarterly Journal of Economics 120, 387–422.
Boivin, J., Ng, S., 2006. Are more data always better for factor analysis? Journal of Econometrics, forthcoming.
Brave, S., Fisher, J.D.M., 2004. In search of a robust inflation forecast. Federal Reserve Bank of Chicago Economic Perspectives 28, 12–30.
Brennan, M.J., Wang, A.W., Xia, Y., 2004. Estimation and test of a simple model of intertemporal capital asset pricing. Journal of Finance 59, 1743–1775.
Bryan, M.F., Cecchetti, S.G., 1993. The consumer price index as a measure of inflation. Economic Review of the Federal Reserve Bank of Cleveland 29, 15–24.
Campbell, S.D., 2004. Volatility, predictability and uncertainty in the great moderation: Evidence from the Survey of Professional Forecasters. Working paper, Federal Reserve Board of Governors.
Carlson, J.A., 1977. A study of price forecasts. Annals of Economic and Social Measurement 1, 27–56.
Clark,T.E.,1999.AcomparisonoftheCPIandthePCEpriceindex.FederalReserveBankofKansasCity EconomicReview3,15–29. Clark,T.E.,McCracken,M.W.,2006.Thepredictivecontentoftheoutputgapforinflation:Resolvingin-sample andout-of-sampleevidence.JournalofMoney,CreditandBanking,forthcoming. Clemen,R.T.,1989.Combiningforecasts:Areviewandannotatedbibliography.InternationalJournalof Forecasting5,559–581. Chen,R.R.,Scott,L.,1993.Maximumlikelihoodestimationforamulti-factorequilibriummodeloftheterm structureofinterestrates.JournalofFixedIncome3,14–31. Cecchetti,S.,Chu,R.,Steindel,C.,2000.Theunrealiabilityofinflationindicators.FederalReserveBankofNew YorkCurrentIssuesinEconomicsandFinance6,1–6. Cochrane,J.,Piazzesi,M.,2005.Bondriskpremia.AmericanEconomicReview95,1,138–160. Cogley,T.,SargentT.J.,2005.Driftsandvolatilities:MonetarypoliciesandoutcomesinthepostWWIIU.S. ReviewofEconomicDynamics8,262–302. Croushore,D.,1998.Evaluatinginflationforecasts.WorkingPaper98-14,FederalReserveBankofSt.Louis. Curtin,R.T.,1996.Proceduretoestimatepriceexpectations.Mimeo,UniversityofMichiganSurveyResearch Center. Dai,Q.,Singleton,K.J.,2002.Expectationpuzzles,time-varyingriskpremia,andaffinemodelsoftheterm structure.JournalofFinancialEconomics63,415–41. 40

Diebold,F.X.,1989.Forecastcombinationandencompassing:Reconcilingtwodivergentliteratures. InternationalJournalofForecasting5,589–92. Diebold,F.X.,Lopez,J.A.,1996.Forecastingevaluationandcombination,inG.S.MaddalaandC.R.Rao,eds., Handbookofstatistics(Elsevier,Amsterdam)241–268. Duffee,G.R.,2002.Termpremiaandtheinterestrateforecastsinaffinemodels.JournalofFinance57,405–443. Duffie,D.,Kan,R.,1996.Ayield-factormodelofinterestrates.MathematicalFinance6,379–406. Estrella,A.,Mishkin,F.S.,1997.ThepredictivepowerofthetermstructureofinterestratesinEuropeandthe UnitedStates: ImplicationsfortheEuropeanCentralBank.EuropeanEconomicReview41,1375–401. Evans,M.D.D.,Lewis,K.K.,1995.Doexpectedshiftsininflationaffectestimatesofthelong-runFisherrelation? JournalofFinance50,225–253. Evans,M.D.D.,Wachtel,P.,1993.Inflationregimesandthesourcesofinflationuncertainty.JournalofMoney, CreditandBanking25,475–511. Fama,E.F.,1975.Short-terminterestratesaspredictorsofinflation.AmericanEconomicReview65,269–282. Fama,E.F.,Gibbons,M.R.,1984.Acomparisonofinflationforecasts.JournalofMonetaryEconomics13, 327–348. Fisher,J.D.M.,Liu,C.T.,Zhou,R.,2002.Whencanweforecastinflation?FederalReserveBankofChicago EconomicPerspectives1,30–42. Frankel,J.A.,Lown,C.S.,1994.Anindicatoroffutureinflationextractedfromthesteepnessoftheinterestrate yieldcurvealongitsentirelength.QuarterlyJournalofEconomics59,517–530. Fuhrer,J.,Moore,G.,1995.Inflationpersistence.QuarterlyJournalofEconomics110,127–159. Gali,J.,andM.Gertler,1999,“InflationDynamics:AStructuralEconometricsAnalysis,”JournalofMonetary Economics,44,2,195–222. Grant,A.P.,Thomas,L.B.,1999.Inflationexpectationsandrationalityrevisited.EconomicsLetters62,331–338. Gray,S.F.,1996.Modelingtheconditionaldistributionofinterestratesasaregime-switchingprocess.Journalof FinancialEconomics42,27–62. Hamilton,J.D.,1985.Uncoveringfinancialmarketexpectationsofinflation.JournalofPoliticalEconomy93, 1224–1241. 
Hamilton,J.,1988,Rational-expectationseconometricanalysisofchangesinregime:Aninvestigationofthe termstructureofinterestrates.JournalofEconomicDynamicsandControl12,385–423. Hamilton,J.,1989.Anewapproachtotheeconomicanalysisofnonstationarytimeseriesandthebusinesscycle. Econometrica57,357–384. Hansen,L.P.,Hodrick,R.J.,1980.Forwardexchangeratesasoptimalpredictorsoffuturespotrates: an econometricanalysis.JournalofPoliticalEconomy88,829–853. Hodrick,R.J.,Prescott,E.C.,1997.PostwarU.S.businesscycles:Anempiricalinvestigation.JournalofMoney, CreditandBanking29,1–16. Holden,S.,Driscoll,J.C.,2003.Inflationpersistenceandrelativecontracting.AmericanEconomicReview93, 1369–1372. Inoue,A.,Kilian,L.,2005.Howusefulisbagginginforecastingeconomictimeseries? AcasestudyofU.S.CPI inflation.Workingpaper,UniversityofMichigan. Jorion,P.,Mishkin,F.S.,1991.Amulti-countrycomparisonoftermstructureforecastsatlonghorizons.Journal ofFinancialEconomics29,59–80. Kim,D.H.,2004.Inflationandtherealtermstructure.Workingpaper,FederalReserveBoardofGovernors. Kim,C.J.,Nelson,C.R.,1999.HastheU.S.economybecomemorestable? ABayesianapproachbasedona Markovswitchingmodelofthebusinesscycle.ReviewofEconomicsandStatistics81,608–616. Kozicki,S.,1997.Predictingrealgrowthandinflationwiththeyieldspread.FederalReserveBankofKansas CityEconomicReview82,39–57. Marcellino,M.,Stock,J.H.,Watson,M.W.,2006.AcomparisonofdirectanditeratedmultistepARmethodsfor forecastingmacroeconomictimeseries.JournalofEconometrics,forthcoming. 41

Mehra,Y.P.,2002.Surveymeasuresofexpectedinflation:Revisitingtheissuesofpredictivecontentand rationality.FederalReserveBankofRichmondEconomicQuarterly88,17–36. McConnell,M.M.,Perez-Quiros,G.,2000.OutputfluctuationsintheUnitedStates: Whathaschangedsincethe early1950’s.AmericanEconomicReview90,1464–1476. Mishkin,F.S.,1990.Whatdoesthetermstructuretellusaboutfutureinflation?JournalofMonetaryEconomics 25,77–95. Mishkin,F.S.,1991.Amulti-countrystudyoftheinformationinthetermstructureaboutfutureinflation.Journal ofInternationalMoneyandFinance19,2–22. Nelson,C.R.,Schwert,G.W.,1977.Ontestingthehypothesisthattherealrateofinterestisconstant.American EconomicReview67,478–486. Newey,W.K.,WestK.D.,1987.Asimplepositive,semi-definite,heteroskedasticityandautocorrelation consistentcovariancematrix.Econometrica55,703–708. Ng,S.,Perron,P.,2001.Laglengthselectionandtheconstructionofunitroottestswithgoodsizeandpower. Econometrica69,1519–1554. Orphanides,A.,vanNorden,S.,2003.Thereliabilityofinflationforecastsbasedonoutputgapestimatesinreal time.Workingpaper,CIRANO. Pennacchi,G.G.,1991.Identifyingthedynamicsofrealinterestratesandinflation:Evidenceusingsurveydata. ReviewofFinancialStudies4,53–86. Plosser,C.I.,Schwert,G.W.,1978.Money,income,andsunspots:Measuringtheeconomicrelationshipsandthe effectsofdifference.JournalofMonetaryEconomics4,637–660. Quah,D.,Vahey,S.P.,1995.Measuringcoreinflation.EconomicJournal105,1130–1144. Schorfheide,F.,2005.VARforecastingundermisspecification.JournalofEconometrics128,99-136. Sims,C.A.,2002.Theroleofmodelsandprobabilitiesinthemonetarypolicyprocess.BrookingsPaperson EconomicActivity2,1–40. Souleles,N.S.,2004.Expectations,heterogeneousforecasterrorsandconsumption:Microevidencefromthe Michiganconsumersentimentsurveys.JournalofMoney,CreditandBanking36,39–72. Stock,J.H.,Watson,M.W.,1989.Newindexesofcoincidentandleadingeconomicindicators,inO.J.Blanchard andS.Fischer,eds.,NBERMacroeconomicsAnnual(MITPress,Boston)351–394. 
Stock,J.H.,Watson,M.W.,1999.Forecastinginflation.JournalofMonetaryEconomics44,293–335. Stock,J.H.,Watson,M.W.,2002a.Forecastingusingprincipalcomponentsfromalargenumberofpredictors. JournaloftheAmericanStatisticalAssociation97,1167–1179. Stock,J.H.,Watson,M.W.,2002b.Hasthebusinesscyclechangedandwhy?inM.GertlerM.andK.Rogoff, eds.,NBERMacroeconomicsAnnual2002(MITPress,Boston)159–218. Stock,J.H.,Watson,M.W.,2003.Forecastingoutputandinflation:Theroleofassetprices.JournalofEconomic Literature41,788–829. StockJ.H.,Watson,M.W.,2005.Anempiricalcomparisonofmethodsforforecastingusingmanypredictors. Workingpaper,HarvardUniversity. Stockton,D.,Glassman,J.,1987.Anevaluationoftheforecastperformanceofalternativemodelsofinflation. ReviewofEconomicsandStatistics69,108–117. Theil,H.,1963.Ontheuseofincompletepriorinformationinregressionanalysis.JournaloftheAmerican StatisticalAssociation58,401–414. Theil,H.,Goldberger,A.S.,1961.Onpureandmixedestimationineconomics.InternationalEconomicReview 2,65–78. Thomas,L.B.,1999.SurveymeasuresofexpectedU.S.inflation.JournalofEconomicPerspectives13,125–144. Timmermann,A.,2006.Forecastcombinations,inG.Elliot,C.W.J.GrangerandA.Timmermann,eds., HandbookofEconomicForecasting(Elsevier,Amsterdam),inpress. West,K.D.,1996.Asymptoticinferenceaboutpredictiveability.Econometrica64,1067–1084. 42

West,K.D.,2006.Forecastevaluation,inG.Elliott,C.W.J.Granger,andA.Timmermann,eds.,Handbookof EconomicForecasting(Elsevier,Amsterdam),inpress. West,K.D.,McCracken,M.W.,1998.Regression-basedtestsofpredictiveability.InternationalEconomicReview 39,817–840. Wright,J.H.,2004.ForecastingU.S.inflationbyBayesianmodelaveraging.Workingpaper,FederalReserve BoardofGovernors. 43

Table 1: Summary Statistics

                     PUNEW    PUXHS    PUXX     PCE
Panel A: 1952:Q2–2002:Q4*
Mean                 3.84     3.60     4.24     3.84
                    (0.20)   (0.20)   (0.19)   (0.19)
Standard Deviation   2.86     2.78     2.56     2.45
                    (0.14)   (0.14)   (0.14)   (0.13)
Autocorrelation      0.78     0.74     0.77     0.79
                    (0.08)   (0.09)   (0.11)   (0.09)
Correlations
  PUXHS              0.99
  PUXX               0.94     0.91
  PCE                0.98     0.98     0.93

Panel B: 1986:Q1–2002:Q4
Mean                 3.09     2.87     3.21     2.58
                    (0.14)   (0.17)   (0.12)   (0.14)
Standard Deviation   1.12     1.37     0.97     1.08
                    (0.10)   (0.12)   (0.09)   (0.10)
Autocorrelation      0.47     0.37     0.77     0.69
                    (0.07)   (0.10)   (0.08)   (0.07)
Correlations
  PUXHS              0.99
  PUXX               0.85     0.79
  PCE                0.95     0.93     0.90

Panel C: 1996:Q1–2002:Q4
Mean                 2.27     1.84     2.32     1.70
                    (0.17)   (0.25)   (0.05)   (0.13)
Standard Deviation   0.81     1.19     0.24     0.62
                    (0.12)   (0.17)   (0.03)   (0.09)
Autocorrelation     -0.13    -0.19    -0.38     0.05
                    (0.23)   (0.23)   (0.14)   (0.18)
Correlations
  PUXHS              0.99
  PUXX               0.33     0.21
  PCE                0.89     0.88     0.19

This table reports various moments of different measures of annual inflation, sampled at a quarterly frequency, for different sample periods. PUNEW is CPI-U All Items; PUXHS is CPI-U Less Shelter; PUXX is CPI-U All Items Less Food and Energy, also called core CPI; and PCE is the Personal Consumption Expenditure deflator. All measures are in annual percentage terms. The reported autocorrelation is the fourth-order autocorrelation of the quarterly-sampled data, which represents the first-order autocorrelation of annual inflation. Standard errors reported in parentheses are computed by GMM.
* For PUXX, the start date is 1958:Q2, and for PCE, the start date is 1960:Q2.
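The autocorrelation row in Table 1 is built from quarterly observations of annual inflation, so adjacent non-overlapping years sit four observations apart. A minimal Python sketch of that calculation follows. It assumes annual inflation is measured as $100 \ln(P_t/P_{t-4})$ from a quarterly price index. The log approximation and the function names are assumptions, and the GMM standard errors are not reproduced.

```python
import math
from typing import List, Sequence


def annual_inflation(prices: Sequence[float]) -> List[float]:
    """Four-quarter inflation in percent, observed every quarter.

    Uses the log approximation 100 * ln(P_t / P_{t-4}) (an assumption of this sketch).
    """
    return [100.0 * math.log(prices[t] / prices[t - 4]) for t in range(4, len(prices))]


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag`, demeaned, scaled by the full-sample variance."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def annual_first_order_autocorr(annual: Sequence[float]) -> float:
    # Quarterly observations of annual inflation: lag 4 compares non-overlapping years.
    return autocorrelation(annual, lag=4)
```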

Table 2: Forecasting Models

  Abbreviation  Specification
Time-Series Models
  ARMA          ARMA(1,1)
  AR            Autoregressive model
  RW            Random walk on quarterly inflation
  AORW          Random walk on annual inflation
  RGM           Univariate regime-switching model
Phillips Curve (OLS)
  PC1           INFL + GDPG
  PC2           INFL + GAP1
  PC3           INFL + GAP2
  PC4           INFL + LSHR
  PC5           INFL + UNEMP
  PC6           INFL + XLI
  PC7           INFL + XLI-2
  PC8           INFL + FAC
  PC9           INFL + GAP1 + LSHR
  PC10          INFL + GAP2 + LSHR
OLS Term Structure Models
  TS1           INFL + GDPG + RATE
  TS2           INFL + GAP1 + RATE
  TS3           INFL + GAP2 + RATE
  TS4           INFL + LSHR + RATE
  TS5           INFL + UNEMP + RATE
  TS6           INFL + XLI + RATE
  TS7           INFL + XLI-2 + RATE
  TS8           INFL + FAC + RATE
  TS9           INFL + SPD
  TS10          INFL + RATE + SPD
  TS11          INFL + GDPG + RATE + SPD
Empirical Term Structure Models
  VAR           VAR(1) on RATE, SPD, INFL, GDPG
  RGMVAR        Regime-switching model on RATE, SPD, INFL
No-Arbitrage Term Structure Models
  MDL1          Three-factor affine model
  MDL2          General three-factor regime-switching model
Inflation Surveys
  SPF1          Survey of Professional Forecasters
  SPF2          Linear bias-corrected SPF
  SPF3          Non-linear bias-corrected SPF
  LIV1          Livingston Survey
  LIV2          Linear bias-corrected Livingston
  LIV3          Non-linear bias-corrected Livingston
  MICH1         Michigan Survey
  MICH2         Linear bias-corrected Michigan
  MICH3         Non-linear bias-corrected Michigan

INFL refers to the inflation rate over the previous quarter; GDPG to GDP growth; GAP1 to detrended log real GDP using a quadratic trend; GAP2 to detrended log real GDP using the Hodrick-Prescott filter; LSHR to the labor income share; UNEMP to the unemployment rate; XLI to the Stock-Watson Experimental Leading Index; XLI-2 to the Stock-Watson Experimental Leading Index-2; FAC to an aggregate composite real activity factor constructed by Bernanke, Boivin and Eliasz (2005); RATE to the one-quarter yield; and SPD to the difference between the 20-quarter and the one-quarter yield.
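The two random-walk benchmarks in Table 2 differ only in how much past data they carry forward. A minimal sketch follows. It assumes quarterly inflation is already expressed at an annualized rate, and it approximates past annual inflation by the average of the last four annualized quarterly rates. The function names are hypothetical.

```python
from typing import Sequence


def rw_forecast(quarterly_infl: Sequence[float]) -> float:
    """RW: random walk on quarterly inflation.

    Inflation over the next four quarters is forecast by the most recent
    (annualized) quarterly inflation rate.
    """
    return quarterly_infl[-1]


def aorw_forecast(quarterly_infl: Sequence[float]) -> float:
    """AORW: random walk on annual inflation.

    Inflation over the next four quarters is forecast by inflation over the past
    four quarters, approximated here by the mean of the last four annualized rates.
    """
    if len(quarterly_infl) < 4:
        raise ValueError("need at least four quarters of data")
    return sum(quarterly_infl[-4:]) / 4.0
```

RW extrapolates a single, noisy quarterly observation, while AORW averages four, so AORW forecasts are much smoother. This is one plausible reason the RW rows show the largest RMSE ratios in Table 4.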

Table 3: Bias of Survey Forecasts

             α1        α2        β1        β2
PUNEW
SPF          1.321               0.482**
            (0.694)             (0.190)
Livingston   0.637               0.993
            (0.375)             (0.161)
Michigan    -0.823               1.276
            (0.658)             (0.205)
SPF          1.437*   -0.188     0.414**   0.128
            (0.671)   (0.585)   (0.180)   (0.140)
Livingston   0.589**  -0.295     0.806**   0.461**
            (0.184)   (0.506)   (0.068)   (0.160)
Michigan     0.039    -1.261     0.959     0.482
            (0.429)   (0.822)   (0.099)   (0.249)
PUXHS
SPF          0.638               0.601*
            (0.803)             (0.199)
Livingston   0.561               0.942
            (0.337)             (0.130)
Michigan    -0.741               1.167
            (0.621)             (0.166)
SPF          0.612    -0.269     0.580*    0.147
            (0.717)   (1.085)   (0.164)   (0.279)
Livingston   0.568**  -0.191     0.765**   0.389**
            (0.202)   (0.576)   (0.070)   (0.129)
Michigan    -0.267    -0.723     1.002     0.262*
            (0.613)   (0.571)   (0.143)   (0.132)
PUXX
SPF          0.852               0.694
            (0.612)             (0.179)
Livingston   0.381               1.055
            (0.429)             (0.133)
Michigan    -0.279               1.194
            (0.466)             (0.124)
SPF          0.966    -0.201     0.643     0.100
            (0.662)   (0.495)   (0.192)   (0.123)
Livingston   0.433     0.124     0.931     0.165
            (0.303)   (0.558)   (0.104)   (0.136)
Michigan    -0.160    -0.042     1.137     0.059
            (0.579)   (0.842)   (0.146)   (0.245)
PCE
SPF          0.041               0.728*
            (0.500)             (0.125)
Livingston   0.234               0.949
            (0.479)             (0.136)
Michigan    -0.547               1.058
            (0.521)             (0.139)
SPF          0.122    -0.571     0.689**   0.213
            (0.482)   (0.751)   (0.108)   (0.187)
Livingston   0.278    -0.094     0.785*    0.399**
            (0.453)   (0.480)   (0.087)   (0.085)
Michigan    -0.061    -0.688     0.900     0.228
            (0.581)   (0.559)   (0.145)   (0.117)

This table reports the coefficient estimates in equations (15) and (16). Estimates of α1, α2 and β2 that reject the hypothesis that the coefficient is equal to zero are marked with * and ** at the 95% and 99% levels, respectively. The same markers flag estimates of β1 that reject the hypothesis β1 = 1. All tests are based on Hansen and Hodrick (1980) standard errors, reported in parentheses. For the SPF survey, the sample is 1981:Q3 to 2002:Q4. For the Livingston survey, the sample is 1952:Q2 to 2002:Q4 for PUNEW and PUXHS, 1958:Q2 to 2002:Q4 for PUXX, and 1960:Q2 to 2002:Q4 for PCE. For the Michigan survey, the sample is 1978:Q1 to 2002:Q4.
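Equations (15) and (16) are not reproduced in this section. As an illustration only, assume the simplest bias regression: realized annual inflation regressed on a constant and the survey forecast, $\pi_{t+4} = \alpha_1 + \beta_1 f_t + \varepsilon_{t+4}$. Unbiased forecasts then correspond to $\alpha_1 = 0$ and $\beta_1 = 1$. The sketch estimates this regression by OLS and forms a linear bias-corrected forecast $\hat{\alpha}_1 + \hat{\beta}_1 f_t$, in the spirit of the SPF2/LIV2/MICH2 entries of Table 2. It does not compute the Hansen-Hodrick standard errors used in the table, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def bias_regression(realized: Sequence[float], survey: Sequence[float]) -> Tuple[float, float]:
    """OLS of realized inflation on a constant and the survey forecast.

    Returns (alpha, beta). Unbiased survey forecasts would give alpha = 0, beta = 1.
    """
    n = len(realized)
    mx = sum(survey) / n
    my = sum(realized) / n
    sxx = sum((x - mx) ** 2 for x in survey)
    sxy = sum((x - mx) * (y - my) for x, y in zip(survey, realized))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def linear_bias_corrected(alpha: float, beta: float, survey_now: float) -> float:
    """Linear bias-corrected forecast: alpha + beta * current survey forecast."""
    return alpha + beta * survey_now
```

In a real-time exercise, only pairs whose realized inflation is already observed at the forecast date would enter the regression. Those are survey forecasts made at least four quarters earlier.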

Table 4: Time-Series Forecasts of Annual Inflation

         Post-1985 Sample  Post-1995 Sample
         RMSE     ARMA=1   RMSE     ARMA=1
PUNEW
ARMA     1.136    1.000    1.144    1.000
AR       1.140    1.003    1.130    0.988
RGM      1.420    1.250    0.873    0.764
AORW     1.177    1.036    1.128    0.986
RW       1.626    1.431    1.529    1.337
PUXHS
ARMA     1.490    1.000    1.626    1.000
AR       1.515    1.017    1.634    1.005
RGM      1.591    1.068    1.355    0.833
AORW     1.580    1.061    1.670    1.027
RW       2.172    1.458    2.146    1.320
PUXX
ARMA     0.630    1.000    0.600    1.000
AR       0.644    1.023    0.593    0.988
RGM      0.677    1.075    0.727    1.211
AORW     0.516    0.819    0.372    0.620
RW       0.675    1.072    0.549    0.915
PCE
ARMA     0.878    1.000    0.944    1.000
AR       0.942    1.073    1.014    1.074
RGM      0.945    1.077    1.081    1.145
AORW     0.829    0.945    0.869    0.921
RW       1.140    1.298    1.215    1.288

We forecast annual inflation out-of-sample from 1985:Q4 to 2002:Q4 and from 1995:Q4 to 2002:Q4 at a quarterly frequency. Table 2 contains full details of the time-series models. Numbers in the RMSE columns are reported in annual percentage terms. The column labeled "ARMA=1" reports the ratio of the RMSE relative to the ARMA(1,1) specification.
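The "ARMA=1" column divides each model's RMSE by the ARMA(1,1) RMSE over the same forecast dates. Below is a minimal Python sketch with hypothetical function names, plus a check against the reported PUNEW post-1985 numbers.

```python
import math
from typing import Sequence


def rmse(forecasts: Sequence[float], realized: Sequence[float]) -> float:
    """Root mean squared forecast error."""
    errors = [f - y for f, y in zip(forecasts, realized)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_rmse(model: Sequence[float], benchmark: Sequence[float],
                  realized: Sequence[float]) -> float:
    """RMSE of a model divided by the RMSE of a benchmark such as ARMA(1,1)."""
    return rmse(model, realized) / rmse(benchmark, realized)


if __name__ == "__main__":
    # Reported PUNEW post-1985 RMSEs (Table 4): RW 1.626, ARMA 1.136.
    print(round(1.626 / 1.136, 3))  # prints 1.431, the RW entry in the ARMA=1 column
```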

Table 5: OLS Phillips Curve Forecasts of Annual Inflation

         Post-1985 Sample                    Post-1995 Sample
         Relative          HH       West     Relative          HH       West
         RMSE     1−λ      SE       SE       RMSE     1−λ      SE       SE
PUNEW
PC1      0.979    0.639    0.392    0.596    0.977    0.673    0.624    0.984
PC2      1.472    0.066    0.145    0.155    1.956   -0.117    0.199    0.169
PC3      1.166    0.269    0.233    0.258    1.295    0.171    0.349    0.344
PC4      1.078   -1.043    0.632    1.266    1.025    0.046    0.890    1.389
PC5      1.032    0.354    0.288    0.372    1.115   -0.174    0.222    0.458
PC6      1.103   -0.303    0.575    0.634    1.086   -0.633    0.488    1.054
PC7      1.022    0.460    0.161**  0.283    1.040    0.367    0.406    0.531
PC8      1.039    0.319    0.477    0.515    0.993    0.468    0.793    0.901
PC9      1.576    0.006    0.119    0.144    1.994   -0.121    0.174    0.159
PC10     1.264    0.146    0.205    0.235    1.426    0.119    0.246    0.287
PUXHS
PC1      1.000    0.498    0.458    0.758    0.992    0.618    0.814    1.182
PC2      1.328   -0.022    0.218    0.239    1.586   -0.192    0.317    0.266
PC3      1.113    0.200    0.310    0.329    1.105    0.239    0.522    0.519
PC4      1.096   -0.988    0.497*   1.064    1.029    0.008    0.745    1.229
PC5      1.083   -0.080    0.299    0.491    1.076   -0.411    0.358    0.708
PC6      1.131   -1.074    0.519*   0.822    1.061   -1.316    0.512**  1.463
PC7      1.001    0.498    0.186**  0.301    1.070    0.085    0.529    0.590
PC8      1.094   -0.325    0.466    0.713    1.007    0.101    1.259    1.337
PC9      1.394   -0.055    0.186    0.224    1.624   -0.204    0.290    0.254
PC10     1.165    0.125    0.273    0.308    1.202    0.150    0.340    0.392
PUXX
PC1      0.866    1.432    0.340**  1.632    0.825    1.182    0.120**  1.384
PC2      2.463   -0.120    0.072    0.100    3.257   -0.227    0.093*   0.119
PC3      1.664    0.054    0.213    0.190    2.076   -0.063    0.275    0.226
PC4      1.234    0.126    0.143    0.261    1.330    0.187    0.214    0.230
PC5      1.024    0.460    0.207*   0.370    1.185    0.134    0.445    0.551
PC6      1.005    0.479    0.477    1.053    0.916    1.009    0.277**  1.935
PC7      1.074    0.381    0.277    0.426    1.089    0.293    0.500    0.731
PC8      0.862    0.809    0.297**  0.751    0.767    1.127    0.275**  1.340
PC9      2.485   -0.076    0.069    0.100    3.262   -0.168    0.069*   0.120
PC10     1.873    0.079    0.136    0.153    2.562    0.038    0.150    0.151
PCE
PC1      1.053    0.029    0.469    0.972    1.088   -0.240    0.434    1.119
PC2      1.698   -0.136    0.141    0.178    1.997   -0.240    0.223    0.218
PC3      1.274   -0.031    0.280    0.252    1.407   -0.239    0.354    0.340
PC4      1.027    0.343    0.392    1.004    1.031    0.339    0.535    1.138
PC5      1.125   -0.080    0.327    0.434    1.214   -0.635    0.389    0.629
PC6      1.053    0.036    0.484    1.233    1.020    0.273    0.509    1.795
PC7      1.033    0.436    0.175*   0.359    1.116    0.034    0.334    0.651
PC8      1.040    0.269    0.476    0.807    1.044    0.044    1.101    2.018
PC9      1.518   -0.100    0.166    0.193    1.786   -0.282    0.258    0.258
PC10     1.247    0.120    0.201    0.297    1.432   -0.068    0.235    0.322

We forecast annual inflation out-of-sample over 1985:Q4 to 2002:Q4 and over 1995:Q4 to 2002:Q4 at a quarterly frequency. Table 2 contains full details of the Phillips curve models. The column labelled "Relative RMSE" reports the ratio of the RMSE relative to the ARMA(1,1) specification. The column titled "1−λ" reports the coefficient (1−λ) from equation (17). Standard errors computed using the Hansen-Hodrick (1980) method and the West (1996) method are reported in the columns titled "HH SE" and "West SE," respectively. We denote standard errors that reject the hypothesis of (1−λ) equal to zero at the 95% (99%) level by * (**).
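Equation (17) is not reproduced in this section. This sketch assumes the standard forecast-encompassing form $\pi_{t+4} = \lambda f^{ARMA}_t + (1-\lambda) f^{m}_t + \varepsilon_{t+4}$, which is consistent with how the tables read. Under it, $1-\lambda$ near zero means model $m$ adds little beyond the ARMA(1,1) forecast, and $1-\lambda$ near one means model $m$ dominates. Rewriting it as $\pi_{t+4} - f^{ARMA}_t = (1-\lambda)(f^{m}_t - f^{ARMA}_t) + \varepsilon_{t+4}$ lets $(1-\lambda)$ be estimated by OLS through the origin. The function name is hypothetical.

```python
from typing import Sequence


def one_minus_lambda(realized: Sequence[float], model_fc: Sequence[float],
                     arma_fc: Sequence[float]) -> float:
    """Weight (1 - lambda) on a candidate forecast relative to the ARMA benchmark.

    Assumed regression: y = lambda * f_arma + (1 - lambda) * f_model + e,
    estimated as OLS of (y - f_arma) on (f_model - f_arma) without a constant.
    """
    d = [m - a for m, a in zip(model_fc, arma_fc)]   # forecast differential
    e = [y - a for y, a in zip(realized, arma_fc)]   # benchmark forecast error
    return sum(di * ei for di, ei in zip(d, e)) / sum(di * di for di in d)
```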

Table 6: Term Structure Forecasts of Annual Inflation

         Post-1985 Sample                    Post-1995 Sample
         Relative          HH       West     Relative          HH       West
         RMSE     1−λ      SE       SE       RMSE     1−λ      SE       SE
PUNEW
TS1      1.096    0.137    0.332    0.393    1.030    0.362    0.410    0.653
TS2      1.444    0.019    0.145    0.148    1.826   -0.147    0.229    0.182
TS3      1.176    0.193    0.229    0.259    1.226    0.156    0.335    0.358
TS4      1.166   -0.108    0.249    0.321    1.018    0.370    0.474    0.959
TS5      1.134    0.088    0.186    0.278    1.122    0.006    0.187    0.429
TS6      1.194   -0.241    0.326    0.371    1.112   -0.162    0.406    0.578
TS7      1.091    0.309    0.252    0.290    1.039    0.373    0.434    0.523
TS8      1.119    0.116    0.332    0.365    1.010    0.380    0.816    0.864
TS9      1.363    0.086    0.085    0.129    1.229   -0.008    0.083    0.305
TS10     1.196   -0.024    0.143    0.220    1.043    0.132    0.639    0.685
TS11     1.198   -0.124    0.431    0.414    1.052    0.286    0.318    0.611
VAR      1.106    0.307    0.187    0.225    1.328   -0.101    0.259    0.270
RGMVAR   1.647    0.050    0.050    0.090    1.518   -0.170    0.198    0.226
MDL1     1.323    0.161    0.064*   0.356    1.345   -0.088    0.192    0.247
MDL2     1.192    0.225    0.117    0.392    1.329   -0.118    0.251    0.278
PUXHS
TS1      1.080   -0.025    0.413    0.508    1.014    0.373    0.553    0.824
TS2      1.345   -0.017    0.205    0.216    1.584   -0.197    0.329    0.265
TS3      1.116    0.186    0.278    0.309    1.118    0.195    0.435    0.463
TS4      1.085   -0.275    0.499    0.670    0.996    0.542    0.592    1.077
TS5      1.113   -0.082    0.214    0.358    1.094   -0.191    0.265    0.557
TS6      1.140   -0.566    0.342    0.534    1.069   -0.360    0.419    0.776
TS7      1.081    0.161    0.298    0.342    1.070    0.089    0.410    0.564
TS8      1.083   -0.054    0.411    0.497    0.975    0.559    1.057    1.055
TS9      1.173    0.114    0.105    0.201    1.130   -0.123    0.211    0.478
TS10     1.140   -0.594    0.468    0.658    1.032   -0.034    0.090    0.855
TS11     1.102   -0.121    0.423    0.482    1.049    0.093    0.164    0.667
VAR      1.001    0.496    0.264    0.354    1.137    0.041    0.426    0.433
RGMVAR   1.363    0.070    0.085    0.159    1.285   -0.149    0.366    0.383
MDL1     1.225    0.127    0.081    0.263    1.186   -0.048    0.266    0.320
MDL2     1.047    0.395    0.203    0.702    1.156    0.000    0.406    0.386
PUXX
TS1      0.945    0.667    0.322*   0.655    0.945    0.665    0.317*   0.924
TS2      2.262   -0.092    0.084    0.100    2.982   -0.225    0.099*   0.117
TS3      1.399    0.121    0.260    0.249    1.698   -0.057    0.344    0.288
TS4      1.232    0.260    0.156    0.229    1.268    0.319    0.225    0.248
TS5      1.081    0.392    0.203    0.299    1.258    0.085    0.407    0.454
TS6      0.969    0.567    0.294    0.601    0.866    0.788    0.078**  0.882
TS7      1.068    0.419    0.203*   0.354    1.118    0.342    0.289    0.505
TS8      0.948    0.568    0.197**  0.459    0.958    0.520    0.253*   0.832
TS9      1.372    0.050    0.239    0.247    1.282   -0.101    0.457    0.504
TS10     1.034    0.433    0.284    0.467    1.208   -0.048    0.548    0.737
TS11     1.017    0.474    0.246    0.439    1.192    0.099    0.502    0.686
VAR      1.651    0.041    0.178    0.154    2.238   -0.276    0.151    0.183
RGMVAR   1.572    0.120    0.138    0.147    1.622   -0.211    0.340    0.278
MDL1     1.506    0.253    0.091**  0.381    1.593   -0.004    0.280    0.303
MDL2     1.834    0.262    0.039**  0.443    1.329    0.355    0.069**  0.298

Table 6 Continued

         Post-1985 Sample                    Post-1995 Sample
         Relative          HH       West     Relative          HH       West
         RMSE     1−λ      SE       SE       RMSE     1−λ      SE       SE
PCE
TS1      1.075   -0.073    0.453    0.847    1.078   -0.207    0.433    1.192
TS2      1.670   -0.149    0.145    0.181    1.966   -0.247    0.226    0.221
TS3      1.279   -0.053    0.288    0.259    1.373   -0.245    0.376    0.360
TS4      1.075    0.018    0.372    0.864    1.059    0.234    0.442    0.816
TS5      1.126   -0.115    0.331    0.456    1.202   -0.645    0.383    0.663
TS6      1.094   -0.149    0.428    0.896    1.100   -0.358    0.397    1.322
TS7      1.018    0.443    0.271    0.481    1.106    0.033    0.303    0.673
TS8      1.027    0.374    0.414    0.720    1.025    0.346    1.058    1.855
TS9      1.141   -0.024    0.192    0.304    1.121   -0.825    0.584    0.939
TS10     1.087   -0.569    0.549    0.992    1.110   -0.850    0.638    1.177
TS11     1.086    0.006    0.418    0.665    1.132   -0.396    0.288    0.878
VAR      1.286   -0.179    0.274    0.298    1.511   -0.337    0.392    0.327
RGMVAR   1.507   -0.242    0.131    0.237    1.461   -0.356    0.233    0.424
MDL1     1.169    0.144    0.235    0.432    1.271   -0.374    0.284    0.481
MDL2     1.314   -0.205    0.159    1.220    1.339   -0.331    0.120**  0.589

We forecast annual inflation out-of-sample over 1985:Q4 to 2002:Q4 and over 1995:Q4 to 2002:Q4 at a quarterly frequency. Table 2 contains full details of the term structure models. The column labelled "Relative RMSE" reports the ratio of the RMSE relative to the ARMA(1,1) specification. The column titled "1−λ" reports the coefficient (1−λ) from equation (17). Standard errors computed using the Hansen-Hodrick (1980) method and the West (1996) method are reported in the columns titled "HH SE" and "West SE," respectively. We denote standard errors that reject the hypothesis of (1−λ) equal to zero at the 95% (99%) level by * (**).
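Annual inflation is forecast every quarter, so consecutive four-quarter forecast errors overlap. Under the null they are serially correlated for up to three lags. The Hansen-Hodrick (1980) correction behind the "HH SE" columns handles this with equally weighted autocovariances up to lag 3. Below is a minimal sketch for a single-regressor regression through the origin, such as the (1−λ) regression. Its exact form, equation (17), is not shown here, so the no-constant OLS setup is an assumption.

```python
import math
from typing import Sequence, Tuple


def hansen_hodrick_se(x: Sequence[float], y: Sequence[float],
                      lags: int = 3) -> Tuple[float, float]:
    """Slope and Hansen-Hodrick (1980) standard error for y = b * x + e (no constant).

    Uses equally weighted (truncated) autocovariances of the scores x_t * e_t up
    to `lags`; for h-step forecasts with overlapping errors, lags = h - 1.
    """
    n = len(x)
    sxx = sum(v * v for v in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    scores = [xi * (yi - b * xi) for xi, yi in zip(x, y)]
    s = sum(h * h for h in scores)
    for j in range(1, lags + 1):
        s += 2.0 * sum(scores[t] * scores[t - j] for t in range(j, n))
    if s < 0:
        # Unlike Newey-West, uniform weights do not guarantee a non-negative variance.
        raise ValueError("Hansen-Hodrick variance estimate is negative")
    return b, math.sqrt(s) / sxx
```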

Table 7: Survey Forecasts of Annual Inflation

         Post-1985 Sample                    Post-1995 Sample
         Relative          HH       West     Relative          HH       West
         RMSE     1−λ      SE       SE       RMSE     1−λ      SE       SE
PUNEW
SPF1     0.779    1.051    0.177**  0.439*   0.861    0.869    0.407*   0.554
SPF2     0.964    0.564    0.216**  0.308    0.902    0.745    0.377*   0.484
SPF3     0.976    0.541    0.207**  0.302    0.915    0.728    0.414    0.479
LIV1     0.789    1.164    0.102**  0.585    0.792    1.140    0.203**  0.913
LIV2     1.180    0.335    0.177    0.281    1.092    0.403    0.437    0.550
LIV3     1.299    0.251    0.163    0.226    1.152    0.275    0.517    0.549
MICH1    0.902    0.771    0.324*   0.379*   0.862    1.113    0.520*   0.684
MICH2    0.961    0.675    0.327*   0.370    0.930    0.861    0.644    0.609
MICH3    0.968    0.655    0.347    0.375    0.947    0.776    0.653    0.567
PUXHS
SPF1     0.819    0.939    0.171**  0.430*   0.914    0.773    0.394*   0.546
SPF2     0.924    0.666    0.227**  0.312*   0.888    0.825    0.357*   0.504
SPF3     1.348    0.103    0.183    0.193    0.958    0.582    0.323    0.362
LIV1     0.844    1.098    0.099**  0.573    0.856    1.072    0.214**  0.878
LIV2     1.054    0.554    0.176**  0.386    1.031    0.550    0.366    0.615
LIV3     1.199    0.327    0.156*   0.299    1.053    0.502    0.443    0.605
MICH1    0.881    0.876    0.273**  0.398*   0.937    0.750    0.434    0.476
MICH2    0.918    0.815    0.290**  0.395*   0.932    0.814    0.515    0.528
MICH3    0.970    0.608    0.251*   0.347    0.953    0.684    0.492    0.474
PUXX
SPF1     0.691    0.968    0.140**  0.654    0.699    1.260    0.225**  1.437
SPF2     1.145    0.125    0.362    0.555    1.104    0.091    0.852    1.177
SPF3     1.179    0.035    0.373    0.555    1.180   -0.358    0.956    1.390
LIV1     0.655    0.803    0.192**  0.730    0.557    1.227    0.134**  1.453
LIV2     1.355   -0.185    0.177    0.185    1.387   -0.423    0.415    0.557
LIV3     1.289   -0.095    0.259    0.262    1.278   -0.496    0.735    0.850
MICH1    1.185    0.383    0.159*   0.301    0.822    1.041    0.208**  2.124
MICH2    1.343   -0.153    0.248    0.272    1.566   -0.385    0.286    0.356
MICH3    1.360   -0.242    0.253    0.285    1.617   -0.493    0.273    0.363
PCE
SPF1     1.199    0.147    0.267    0.241    1.250    0.090    0.395    0.349
SPF2     0.980    0.537    0.206**  0.375    0.924    0.655    0.325*   0.570
SPF3     1.034    0.454    0.180*   0.306    1.040    0.453    0.234    0.362
LIV1     1.082    0.175    0.325    0.300    1.101    0.132    0.412    0.400
LIV2     1.397   -0.050    0.189    0.234    1.303   -0.026    0.265    0.358
LIV3     1.380   -0.123    0.149    0.212    1.341   -0.191    0.272    0.375
MICH1    1.217    0.108    0.216    0.192    1.338   -0.030    0.327    0.283
MICH2    1.194    0.039    0.253    0.216    1.205    0.056    0.415    0.350
MICH3    1.248   -0.022    0.239    0.200    1.255   -0.003    0.399    0.334

We forecast annual inflation out-of-sample over 1985:Q4 to 2002:Q4 and over 1995:Q4 to 2002:Q4 at a quarterly frequency for the SPF survey (SPF1–3) and the Michigan survey (MICH1–3). The Livingston survey (LIV1–3) is semiannual, with forecasts made at the end of the second and the end of the fourth quarter. Table 2 contains full details of the survey models. The column labelled "Relative RMSE" reports the ratio of the RMSE relative to the ARMA(1,1) specification. The column titled "1−λ" reports the coefficient (1−λ) from equation (17). Standard errors computed using the Hansen-Hodrick (1980) method and the West (1996) method are reported in the columns titled "HH SE" and "West SE," respectively. We denote standard errors that reject the hypothesis of (1−λ) equal to zero at the 95% (99%) level by * (**).

Table 8: Best Models in Forecasting Annual Inflation

                           PUNEW         PUXHS         PUXX          PCE
Panel A: Post-1985 Sample
Best Time-Series Model     ARMA  1.000   ARMA  1.000   AORW  0.819   AORW  0.945*
Best Phillips-Curve Model  PC1   0.979   PC1   1.000   PC8   0.862   PC4   1.027
Best Term-Structure Model  TS7   1.091   VAR   1.001   TS1   0.945   TS7   1.018
Raw Survey Forecasts       SPF1  0.779*  SPF1  0.819*  SPF1  0.691   SPF1  1.199
                           LIV1  0.789   LIV1  0.844   LIV1  0.655*  LIV1  1.082
                           MICH1 0.902   MICH1 0.881   MICH1 1.185   MICH1 1.217
Panel B: Post-1995 Sample
Best Time-Series Model     RGM   0.764*  RGM   0.833*  AORW  0.620   AORW  0.921*
Best Phillips-Curve Model  PC1   0.977   PC1   0.992   PC8   0.767   PC6   1.020
Best Term-Structure Model  TS8   1.010   TS8   0.975   TS6   0.866   TS8   1.025
Raw Survey Forecasts       SPF1  0.861   SPF1  0.914   SPF1  0.699   SPF1  1.250
                           LIV1  0.792   LIV1  0.856   LIV1  0.557*  LIV1  1.101
                           MICH1 0.862   MICH1 0.937   MICH1 0.822   MICH1 1.338

The table reports the best time-series model, the best OLS Phillips curve model, and the best model using term structure data, along with the SPF1, LIV1, and MICH1 forecasts, for out-of-sample forecasting of annual inflation at a quarterly frequency. Each entry reports the ratio of the model RMSE to the RMSE of an ARMA(1,1) forecast. The smallest RMSEs for each inflation measure are marked with an asterisk.

Table 9: Ex-Ante Best Models in Forecasting Annual Inflation

        PUNEW                                               PUXHS
Date    Time      Phillips  Term      Surveys   All         Time      Phillips  Term      Surveys   All
        Series    Curve     Structure           Models      Series    Curve     Structure           Models
1995Q4  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1996Q1  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1996Q2  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1996Q3  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1996Q4  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1997Q1  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1997Q2  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1997Q3  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1997Q4  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1998Q1  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1998Q2  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1998Q3  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1998Q4  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1999Q1  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1999Q2  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1999Q3  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
1999Q4  ARMA      PC5       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
2000Q1  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
2000Q2  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
2000Q3  ARMA      PC1       VAR       SPF1      SPF1        ARMA      PC7       VAR       SPF1      SPF1
2000Q4  ARMA      PC1       TS1       SPF1      SPF1        ARMA      PC1       VAR       SPF1      SPF1
2001Q1  ARMA      PC1       TS1       SPF1      SPF1        ARMA      PC1       VAR       SPF1      SPF1
2001Q2  ARMA      PC1       TS1       SPF1      SPF1        ARMA      PC1       VAR       SPF1      SPF1
2001Q3  ARMA      PC1       TS1       SPF1      SPF1        ARMA      PC1       VAR       SPF1      SPF1
2001Q4  ARMA      PC1       TS7       SPF1      SPF1        ARMA      PC1       VAR       SPF1      SPF1

Table 9 Continued

        PUXX                                                PCE
Date    Time      Phillips  Term      Surveys   All         Time      Phillips  Term      Surveys   All
        Series    Curve     Structure           Models      Series    Curve     Structure           Models
1995Q4  AORW      PC1       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1996Q1  AORW      PC1       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1996Q2  AORW      PC1       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1996Q3  AORW      PC1       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1996Q4  AORW      PC8       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     AORW
1997Q1  AORW      PC1       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     AORW
1997Q2  AORW      PC8       TS11      SPF1      SPF1        AORW      PC7       TS7       MICH1     AORW
1997Q3  AORW      PC8       TS11      SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
1997Q4  AORW      PC8       TS11      SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
1998Q1  AORW      PC8       TS1       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
1998Q2  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
1998Q3  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
1998Q4  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
1999Q1  AORW      PC8       TS8       SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1999Q2  AORW      PC8       TS8       SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1999Q3  AORW      PC8       TS8       SPF1      SPF1        AORW      PC7       TS7       MICH1     TS7
1999Q4  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     TS7
2000Q1  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
2000Q2  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
2000Q3  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
2000Q4  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       MICH1     AORW
2001Q1  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       SPF1      AORW
2001Q2  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       SPF1      AORW
2001Q3  AORW      PC8       TS8       SPF1      SPF1        AORW      PC4       TS7       SPF1      AORW
2001Q4  AORW      PC8       TS1       SPF1      SPF1        AORW      PC4       TS7       SPF1      AORW

The table reports the ex-ante best model within each category of time-series, Phillips curve, and term structure models, together with the best of the SPF and Michigan surveys. We also report the best ex-ante model across all models. The best models within each category, and across all models, yield the lowest out-of-sample RMSE for forecasting annual inflation at a quarterly frequency during the post-1985 sample period. The ex-ante best models are evaluated recursively through the sample. Each evaluation starts with the first forecast in 1985:Q4 and ends with the forecast made on the date given in the first column.
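At each date, Table 9 picks the model with the lowest RMSE computed only over forecasts from 1985:Q4 up to that date. Below is a minimal sketch of that recursive selection. It assumes each model's forecast errors are already aligned in time and start with the first out-of-sample forecast. The dictionary layout and the names are hypothetical.

```python
from typing import Dict, List, Sequence


def ex_ante_best(errors: Dict[str, Sequence[float]], first_eval: int) -> List[str]:
    """Recursively pick the model with the lowest RMSE to date.

    errors maps each model name to its forecast errors ordered in time. For each
    index k >= first_eval, the winner is chosen using errors[0..k] only. Every
    model has the same number of errors at each k, so ranking by the sum of
    squared errors is equivalent to ranking by RMSE. Ties go to the model listed first.
    """
    names = list(errors)
    n = len(errors[names[0]])
    sse = {name: 0.0 for name in names}
    winners = []
    for k in range(n):
        for name in names:
            sse[name] += errors[name][k] ** 2
        if k >= first_eval:
            winners.append(min(names, key=lambda candidate: sse[candidate]))
    return winners
```

In Table 9 terms, `errors` would hold each candidate's errors starting with the 1985:Q4 forecast. `first_eval` would index the forecast made in 1995:Q4.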

Table 10: Best Models in Forecasting Annual Inflation: Rolling Estimation

                           PUNEW         PUXHS         PUXX          PCE
Panel A: Post-1985 Sample
Best Time-Series Model     AR    0.967   AR    1.002   AORW  0.819   AORW  0.945*
Best Phillips-Curve Model  PC7   1.070   PC1   1.068   PC8   1.179   PC8   1.082
Best Term-Structure Model  TS1   1.199   TS9   1.073   TS6   1.350   TS6   1.182
Raw Survey Forecasts       SPF1  0.779*  SPF1  0.819*  SPF1  0.691   SPF1  1.199
                           LIV1  0.789   LIV1  0.844   LIV1  0.655*  LIV1  1.082
                           MICH1 0.902   MICH1 0.881   MICH1 1.185   MICH1 1.217
Panel B: Post-1995 Sample
Best Time-Series Model     AR    0.879   AR    0.914   ARMA  0.635   ARMA  0.730*
Best Phillips-Curve Model  PC6   0.951   PC6   0.955   PC7   0.560   PC6   0.799
Best Term-Structure Model  VAR   0.987   VAR   0.998   TS5   0.881   TS3   0.990
Raw Survey Forecasts       SPF1  0.861*  SPF1  0.914   SPF1  0.699   SPF1  1.250
                           LIV1  0.792   LIV1  0.856*  LIV1  0.557*  LIV1  1.101
                           MICH1 0.862   MICH1 0.937   MICH1 0.822   MICH1 1.338

The table reports the ex-post best ARIMA and random walk time-series models, the best OLS Phillips curve model, and the best linear model using term structure data, along with the SPF1, LIV1, and MICH1 forecasts, for out-of-sample forecasting of annual inflation at a quarterly frequency. All models are estimated using a rolling window of 10 years. We do not consider the regime-switching models (RGM and RGMVAR) or the no-arbitrage term structure models (MDL1 and MDL2). Each entry reports the ratio of the model RMSE to the RMSE of a recursively estimated ARMA(1,1) model. Models with the smallest RMSEs are marked with an asterisk.
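Table 10 re-estimates each model on a rolling 10-year window, which is 40 quarters, instead of an expanding sample. The difference can be sketched with a hypothetical one-regressor direct forecasting regression, $y_t = a + b x_t$, where $y_t$ is annual inflation over the four quarters after $t$. At each forecast date, only pairs whose target has already been realized enter the estimation. The function name and the single-regressor setup are assumptions of this sketch.

```python
from typing import List, Optional, Sequence


def direct_forecasts(x: Sequence[float], y: Sequence[float], first: int,
                     window: Optional[int] = None, h: int = 4) -> List[float]:
    """Out-of-sample forecasts of y[t] for t = first, ..., len(x) - 1.

    x[t] is known at date t; y[t] is the h-quarter-ahead target, realized at t + h.
    At date t, only pairs (x[s], y[s]) with s <= t - h are observed.
    window=None gives recursive (expanding) estimation; window=W keeps only the
    latest W pairs (W = 40 for a 10-year window of quarterly data).
    """
    out = []
    for t in range(first, len(x)):
        hi = t - h + 1                                 # use pairs s = lo, ..., t - h
        lo = 0 if window is None else max(0, hi - window)
        xs, ys = x[lo:hi], y[lo:hi]
        if len(xs) < 2:
            raise ValueError("not enough observed pairs to estimate the regression")
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((u - mx) ** 2 for u in xs)
        b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sxx
        a = my - b * mx
        out.append(a + b * x[t])
    return out
```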

Table 11: Combined Forecasts of Annual Inflation

Model                   Time-      Phillips   Term                  Best       All
Combination Method      Series     Curve      Structure  Surveys    Models     Models

PUNEW
Mean                    0.898      1.123      1.057      0.851      0.992      0.998
Median                  0.934      1.093      1.079      0.851      1.016      1.045
OLS                     0.970      1.007      1.116      0.858      0.867      0.876
Equal Weight Prior      0.955      1.007      1.102      0.858      0.861      0.879
Unit Weight Prior       0.977      0.951      1.115      0.859      0.862      0.873
Best Individual Model   1.000      0.960      1.207      0.861      0.861      0.861

PUXHS
Mean                    0.954      1.065      1.012      0.921      0.975      0.992
Median                  0.953      1.082      1.053      0.921      1.009      1.039
OLS                     0.963      1.001      1.069      0.917      0.919      0.924
Equal Weight Prior      0.950      1.008      1.058      0.918      0.920      0.935
Unit Weight Prior       0.977      0.992      1.085      0.916      0.914      0.914
Best Individual Model   1.000      1.029      1.137      0.914      0.914      0.914

PUXX
Mean                    0.835      1.547      1.322      0.719      0.727      1.235
Median                  0.940      1.167      1.211      0.719      0.735      1.052
OLS                     0.631      0.885      0.964      0.699      0.665      0.706
Equal Weight Prior      0.687      0.878      0.956      0.699      0.652      0.661
Unit Weight Prior       0.650      0.836      0.947      0.699      0.658      0.658
Best Individual Model   0.620      0.779      0.977      0.699      0.699      0.699

PCE
Mean                    0.968      1.160      1.127      1.285      0.999      1.105
Median                  0.979      1.136      1.130      1.285      0.999      1.118
OLS                     0.935      0.974      1.019      1.288      0.921      0.964
Equal Weight Prior      0.938      0.984      1.017      1.287      0.922      0.968
Unit Weight Prior       0.917      0.967      1.010      1.287      0.911      0.948
Best Individual Model   0.921      1.057      1.106      1.289      0.887      0.887

The table reports RMSEs relative to the ARMA(1,1) model for forecasting annual inflation at a quarterly frequency out-of-sample from 1995:Q4 to 2002:Q4, combining models within each category (time-series, Phillips curve, term structure, surveys), across the ex-ante best models in each category, or over all models. The forecasts reported include the mean and median forecasts, and linear combinations of forecasts with weights computed recursively from OLS or from model combination regressions with various priors. We investigate an equal weight prior and a prior that places a unit weight on the best ex-ante model only. We consider only unadjusted SPF and Michigan survey forecasts in the survey category. For comparison, the last row in each panel reports the relative RMSE of using the ex-ante best-performing single forecast model at each period (as reported in Table 9).
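The combination rows of Table 11 differ only in how the weights on the individual forecasts are set. The sketch below illustrates the mean, median, OLS, and prior-based rows, assuming the prior-based regressions take a conjugate normal form whose posterior-mean weights shrink OLS toward a prior mean: equal weights 1/n for the equal weight prior, or a weight of one on the best model for the unit weight prior. The shrinkage strength `lam`, the absence of an intercept, the in-sample choice of the best model, and the simulated data are hypothetical simplifications, not the paper's specification.

```python
import numpy as np


def mean_combination(F):
    """Equal-weight average across models (one column of F per model)."""
    return F.mean(axis=1)


def median_combination(F):
    """Cross-model median forecast at each date."""
    return np.median(F, axis=1)


def prior_weights(F, y, w0, lam):
    """Posterior-mean weights for y = F @ w + e, e ~ N(0, sigma^2 I), under the
    assumed prior w ~ N(w0, (sigma^2 / lam) I).
    lam = 0 gives OLS weights without an intercept; large lam pulls w toward w0."""
    n = F.shape[1]
    A = F.T @ F + lam * np.eye(n)
    b = F.T @ y + lam * np.asarray(w0, dtype=float)
    return np.linalg.solve(A, b)


rng = np.random.default_rng(1)
T = 80
y = rng.normal(3.0, 1.0, T)  # hypothetical realised inflation
# Hypothetical forecasts from three models with increasing noise.
F = np.column_stack([y + rng.normal(0.0, s, T) for s in (0.3, 0.6, 1.0)])
n = F.shape[1]

equal_prior = np.full(n, 1.0 / n)
# Best model picked in-sample here for brevity; an ex-ante choice would use past errors only.
best = int(np.argmin([np.sqrt(np.mean((F[:, i] - y) ** 2)) for i in range(n)]))
unit_prior = np.eye(n)[best]

w_ols = prior_weights(F, y, equal_prior, lam=0.0)
w_equal = prior_weights(F, y, equal_prior, lam=50.0)
w_unit = prior_weights(F, y, unit_prior, lam=50.0)
combined = F @ w_equal
print(w_ols.round(3), w_equal.round(3), w_unit.round(3))
```

Setting `lam` to zero recovers the OLS weights, while increasing it moves the weights toward the prior mean; this is one way a prior can stabilize weights estimated on a short out-of-sample window.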

Table 12: Best Models in Forecasting Annual Inflation Changes

                            Post-1985 Sample              Post-1995 Sample
                            Estimated on   Estimated on   Estimated on   Estimated on
                            Levels         Differences    Levels         Differences
                            Model  RMSE    Model  RMSE    Model  RMSE    Model  RMSE
PUNEW
Best Time-Series Model      ARMA   1.000   ARMA   1.071   RGM    0.764*  ARMA   1.025
Best Phillips-Curve Model   PC1    0.979   PC7    1.005   PC1    0.977   PC7    0.976
Best Term-Structure Model   TS7    1.091   TS7    1.023   TS8    1.010   TS1    0.968
Raw Survey Forecasts        SPF1   0.779*                 SPF1   0.861
                            LIV1   0.789                  LIV1   0.792
                            MICH1  0.902                  MICH1  0.862

PUXHS
Best Time-Series Model      ARMA   1.000   ARMA   1.098   RGM    0.833*  ARMA   1.046
Best Phillips-Curve Model   PC1    1.000   PC7    1.027   PC1    0.992   PC1    1.023
Best Term-Structure Model   VAR    1.001   TS7    1.004   TS8    0.975   TS7    0.987
Raw Survey Forecasts        SPF1   0.819*                 SPF1   0.914
                            LIV1   0.844                  LIV1   0.856
                            MICH1  0.881                  MICH1  0.937

Table 12 (continued)

                            Post-1985 Sample              Post-1995 Sample
                            Estimated on   Estimated on   Estimated on   Estimated on
                            Levels         Differences    Levels         Differences
                            Model  RMSE    Model  RMSE    Model  RMSE    Model  RMSE
PUXX
Best Time-Series Model      AORW   0.819   ARMA   0.837   AORW   0.620   ARMA   0.649
Best Phillips-Curve Model   PC8    0.862   PC1    0.722   PC8    0.767   PC1    0.652
Best Term-Structure Model   TS1    0.945   TS8    0.861   TS6    0.866   TS6    0.655
Raw Survey Forecasts        SPF1   0.691                  SPF1   0.699
                            LIV1   0.655*                 LIV1   0.557*
                            MICH1  1.185                  MICH1  0.822

PCE
Best Time-Series Model      AORW   0.945   ARMA   1.029   AORW   0.921   ARMA   1.004
Best Phillips-Curve Model   PC4    1.027   PC8    0.978   PC6    1.020   PC6    1.018
Best Term-Structure Model   TS7    1.018   TS8    0.945*  TS8    1.025   TS4    0.951*
Raw Survey Forecasts        SPF1   1.199                  SPF1   1.250
                            LIV1   1.082                  LIV1   1.101
                            MICH1  1.217                  MICH1  1.338

This table reports the relative RMSE for forecasting annual inflation changes of the best-performing out-of-sample forecasting model in each model category (time-series, Phillips Curve, and term structure models) and of the raw survey forecasts. The models are estimated on either inflation levels or inflation differences. Table 2 contains full details of all the forecasting models. We report the RMSE ratios relative to an ARMA(1,1) specification estimated on levels. Models with the smallest RMSEs are marked with an asterisk.
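When a forecast of the change pi[t+4] - pi[t] is built from a forecast of the level pi[t+4], the known current rate pi[t] cancels from the forecast error, so level and change errors coincide. This is consistent with the raw survey entries in Table 12 matching those in Table 10, since the levels-estimated ARMA(1,1) benchmark has the same property. Models estimated on differences forecast the change directly, so their errors generally differ. A minimal sketch of the identity, using simulated (hypothetical) inflation and survey values:

```python
import numpy as np


def level_and_change_errors(level_fc, pi_now, pi_future):
    """Errors for the level pi[t+h] and for the change pi[t+h] - pi[t]
    when the change forecast is implied by a level forecast."""
    level_err = pi_future - level_fc
    change_fc = level_fc - pi_now  # implied forecast of the change
    change_err = (pi_future - pi_now) - change_fc
    return level_err, change_err


rng = np.random.default_rng(2)
pi_now = rng.normal(3.0, 1.0, 50)                        # hypothetical inflation at t
pi_future = 0.6 + 0.8 * pi_now + rng.normal(0, 0.8, 50)  # hypothetical inflation at t + 4
survey = 1.0 + 0.6 * pi_now + rng.normal(0, 0.5, 50)     # hypothetical survey level forecasts

lvl_err, chg_err = level_and_change_errors(survey, pi_now, pi_future)
print(np.allclose(lvl_err, chg_err))  # True: pi[t] cancels from the error
```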

In the top panel, we graph the four inflation measures: CPI-U All Items, PUNEW; CPI-U Less Shelter, PUXHS; CPI-U All Items Less Food and Energy, or core CPI, PUXX; and the Personal Consumption Expenditure deflator, PCE. We also plot the Livingston survey forecast. The survey forecast is lagged one year, so that in December 1990, we plot inflation from December 1989 to December 1990 together with the survey forecast made in December 1989. In the bottom panel, we plot all three survey forecasts (SPF, Livingston, and the Michigan survey) together with PUNEW inflation. The survey forecasts are also lagged one year for comparison.
Figure 1: Annual Inflation and Survey Forecasts
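The one-year lag used in Figure 1 amounts to re-indexing the survey series before plotting, so each date shows the forecast made one year earlier for the year that just ended. A minimal sketch, assuming semiannual (June and December) observations like the Livingston survey; the forecast values and the function name are hypothetical placeholders, not survey data.

```python
import numpy as np


def lag_one_year(forecast, periods_per_year):
    """Shift one-year-ahead forecasts forward by one year, so the value at date t
    is the forecast made one year earlier for inflation over the year ending at t."""
    if periods_per_year < 1:
        raise ValueError("periods_per_year must be at least 1")
    f = np.asarray(forecast, dtype=float)
    out = np.full_like(f, np.nan)  # no forecast is available for the first year
    out[periods_per_year:] = f[:-periods_per_year]
    return out


# Hypothetical semiannual survey forecasts, in percent.
dates = ["1989-06", "1989-12", "1990-06", "1990-12"]
livingston = [4.1, 4.3, 4.0, 3.8]
aligned = lag_one_year(livingston, periods_per_year=2)
for d, v in zip(dates, aligned):
    print(d, v)
```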

We graph the ex-ante OLS weights on models from regression (18) over the period 1995:Q4 to 2002:Q4. We combine the ex-ante best model within each category (time-series, Phillips Curve, and term structure) from Table 11 with the raw SPF survey. The weights are computed recursively through the sample.
Figure 2: Ex-Ante Weights on Best Models for Forecasting Annual Inflation
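The recursive weights in Figure 2 come from re-running a combination regression each quarter on the forecasts whose outcomes are already known. A minimal sketch, assuming regression (18) is an OLS regression of realised inflation on a constant and the candidate forecasts (the intercept is an assumption) and that forecasts are four quarters ahead, so at origin t only forecasts made at t - 4 or earlier can enter. The simulated forecasts, the `min_obs` threshold, and the function names are hypothetical.

```python
import numpy as np


def recursive_ols_weights(F, target, h=4, min_obs=12):
    """Ex-ante combination weights re-estimated at every origin t.

    F[s, i]   : forecast made by model i at origin s of inflation at s + h
    target[s] : realised inflation at s + h
    At origin t only origins s <= t - h have observed outcomes, so the
    regression target ~ const + F uses rows 0..t-h. Row t of the result holds
    [intercept, w_1, ..., w_n]; rows without enough data are NaN."""
    T, n = F.shape
    W = np.full((T, n + 1), np.nan)
    for t in range(T):
        m = t - h + 1  # number of usable rows at origin t
        if m < min_obs:
            continue
        X = np.column_stack([np.ones(m), F[:m]])
        W[t] = np.linalg.lstsq(X, target[:m], rcond=None)[0]
    return W


def combined_forecast(F, W):
    """Apply the weights available at each origin to that origin's forecasts."""
    return W[:, 0] + np.sum(W[:, 1:] * F, axis=1)


rng = np.random.default_rng(4)
T = 60
target = rng.normal(3.0, 1.0, T)  # hypothetical outcomes
# Hypothetical forecasts: three model categories plus a survey, with different noise.
F = np.column_stack([target + rng.normal(0.0, s, T) for s in (0.4, 0.8, 1.2, 1.5)])
W = recursive_ols_weights(F, target, h=4, min_obs=12)
fc = combined_forecast(F, W)
print(W[-1].round(3))
```

Plotting each column of `W[:, 1:]` against time would give a chart in the spirit of Figure 2, with weights moving as new outcomes enter the regression.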

We graph the ex-ante OLS weights on models from regression (22) over the period 1995:Q4 to 2002:Q4. We combine the ex-ante best non-stationary model within each category (time-series, Phillips Curve, and term structure) together with the raw SPF survey. The weights are computed recursively through the sample.
Figure 3: Ex-Ante Weights on Best I(1) Models for Forecasting Annual Inflation Changes
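Selecting the ex-ante best model within a category, as in Figures 2 to 4 and the last row of each panel of Table 11, can only use forecast errors already observed at each date. A minimal sketch, assuming "best" means the lowest RMSE over all past observed errors (an expanding window) and four-quarter-ahead forecasts; the `min_obs` threshold, the function names, and the simulated data are hypothetical.

```python
import numpy as np


def ex_ante_best(F, target, h=4, min_obs=8):
    """For each origin t, the index of the model with the lowest RMSE over
    forecasts whose outcomes are observed by t (origins 0..t-h); -1 if fewer
    than `min_obs` such outcomes exist."""
    T = F.shape[0]
    best = np.full(T, -1, dtype=int)
    for t in range(T):
        m = t - h + 1  # number of observed outcomes at origin t
        if m < min_obs:
            continue
        err = F[:m] - target[:m, None]
        best[t] = int(np.argmin(np.sqrt(np.mean(err ** 2, axis=0))))
    return best


def best_model_forecast(F, best):
    """Forecast series that uses the ex-ante best model's forecast at each origin."""
    out = np.full(F.shape[0], np.nan)
    rows = np.flatnonzero(best >= 0)
    out[rows] = F[rows, best[rows]]
    return out


rng = np.random.default_rng(3)
T = 60
target = rng.normal(3.0, 1.0, T)  # hypothetical outcomes
# Hypothetical forecasts from three models; the second is the most accurate.
F = np.column_stack([target + rng.normal(0.0, s, T) for s in (1.0, 0.2, 0.7)])
best = ex_ante_best(F, target)
fc = best_model_forecast(F, best)
print(best[-5:])
```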

We graph the ex-ante OLS weights on models from regression (22) over the period 1995:Q4 to 2002:Q4. We combine the ex-ante best stationary model within each category (time-series, Phillips Curve, and term structure) together with the raw SPF survey. The weights are computed recursively through the sample.
Figure 4: Ex-Ante Weights on Best I(0) Models for Forecasting Annual Inflation Changes

Cite this document

APA

Andrew Ang, Geert Bekaert, & Min Wei (2006). Do Macro Variables, Asset Markets, or Surveys Forecast Inflation Better? (FEDS 2006-15). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2006-15

BibTeX

@techreport{wtfs_feds_2006_15,
  author = {Andrew Ang and Geert Bekaert and Min Wei},
  title = {Do Macro Variables, Asset Markets, or Surveys Forecast Inflation Better?},
  type = {Finance and Economics Discussion Series},
  number = {2006-15},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2006},
  url = {https://whenthefedspeaks.com/doc/feds_2006-15},
  abstract = {Surveys do! We examine the forecasting power of four alternative methods of forecasting U.S. inflation out-of-sample: time series ARIMA models; regressions using real activity measures motivated from the Phillips curve; term structure models that include linear, non-linear, and arbitrage-free specifications; and survey-based measures. We also investigate several methods of combining forecasts. Our results show that surveys outperform the other forecasting methods and that the term structure specifications perform relatively poorly. We find little evidence that combining forecasts produces superior forecasts to survey information alone. When combining forecasts, the data consistently places the highest weights on survey information.},
}