feds · August 31, 1999

The Reliability of Output Gap Estimates in Real Time

Abstract

Despite its central role in policy discussions in the United States and most other developed countries, the reliability of output gap measurement has attracted relatively little academic study. Furthermore, both the academic literature and the debate among practitioners have tended to neglect a key factor. Although in a policy setting it is necessary to estimate the current (i.e. end-of-sample) output gap without the benefit of knowing the future, most studies concentrate on measurement that employs data that only become available later. In this paper we examine the reliability of alternative output detrending methods, with special attention to the accuracy of real-time estimates. We show that ex post revisions of the output gap are of the same order of magnitude as the output gap itself, that these ex post revisions are highly persistent, and that real-time estimates tend to be severely biased around business cycle turning points, when the cost of policy-induced errors due to incorrect measurement is at its greatest. We investigate the reasons for these ex post revisions, and find that, although important, the ex post revision of published data is not the primary source of revisions in output gap measurements. The bulk of the problem is due to the pervasive unreliability of end-of-sample estimates of the trend in output.

Athanasios Orphanides*, Board of Governors of the Federal Reserve System
Simon van Norden, École des Hautes Études Commerciales, Montréal
August 1999

Keywords: Real-time data, output gap, business cycle measurement.

Correspondence: Orphanides: Division of Monetary Affairs, Board of Governors of the Federal Reserve System, Washington, D.C. 20551, USA. Tel.: (202) 452-2654, e-mail: aorphanides@frb.gov. van Norden: H.E.C., 3000 Chemin de la Côte Sainte Catherine, Montréal QC, Canada H3T 2A7. Tel.: (514) 340-6781, e-mail: simon.van-norden@hec.ca.
* We would like to thank Bryan Campbell and Andy Filardo for their comments, as well as seminar participants at the Atelier économétrique de Montréal, the University of Ottawa, Optimization Days and the meetings of the Canadian Economics Association. Prof. van Norden would also like to thank the SSHRC for their support. The opinions expressed are those of the authors and do not necessarily reflect views of the Board of Governors of the Federal Reserve System.

1 Introduction

One of the fundamental issues in macroeconomics is understanding macroeconomic fluctuations. At the most aggregate level this entails the study of an economy's output relative to its potential level. Understanding whether the economy is operating at its full potential, however, presupposes accurate measurement of both actual and potential output. The difference between the two is commonly referred to as the business cycle or the output gap. Although macroeconomic analysis often takes the availability of such measures for granted, considerable uncertainty surrounds them in practice.

The issue is of some importance for empirical macroeconomics, since testing and comparison of alternative models can easily be obscured by inaccurate measurements. Bluntly, to evaluate whether a specific theory or model can provide an adequate accounting of macroeconomic fluctuations, we must first measure the fluctuations that are to be accounted for.

The problem is especially acute for economic policy. While academic investigations can afford the luxury of waiting for the accumulation of accurate historical data before estimates of past actual and potential output need to be constructed, policy decisions require such estimates in real time, and policy actions based on incorrect real-time estimates may inadvertently contribute to undesirable macroeconomic outcomes. For fiscal policy, it is often useful to abstract from cyclical influences to assess whether policy is expansionary or contractionary and also to evaluate the path of government expenditures and finance. The resulting "full employment" budget estimates, however, rest squarely on accurate assessments of the economy's performance relative to potential. Uncertainty regarding the measurement of the business cycle arguably presents a bigger problem for monetary policy. A central bank can influence credit conditions and consequently aggregate demand via its monetary policy instrument.
This potentially allows monetary policy to dampen aggregate demand fluctuations and, when necessary, counteract inflationary pressures. However, since such policy actions affect aggregate demand and inflation with a lag, timely measures and forecasts of the output gap are essential. Obviously, unless the economy's potential can be reliably measured, policy choices may fail to react to the true underlying economic conditions and may instead partially reflect measurement error.

Three distinct issues complicate assessment of the economy's performance relative to its potential in real time. First, output data (and other officially published macroeconomic time series) are continually revised in response to more complete reporting, adjustment of seasonal factors, refinements in concept or methodology, etc. This implies that measures of the output gap available in real time may differ from those constructed from data published many years later. Second, most methods for estimating potential provide different estimates of potential output for a given quarter if data on actual output in years following the relevant quarter are made available. This may be because hindsight makes clearer which part of the business cycle the economy was in at a particular point in time, even if our beliefs about the processes driving output growth do not change. In this way, the passage of time may allow better estimates of a specific quarter's output gap to be made ex post, even if no revisions are made to actual output data. Third, the subsequent evolution of output may indicate that the economy has undergone a structural change. This in turn may lead to a change in our beliefs about the economy and the expected evolution of potential output. It may also cause us to revise our beliefs about potential output and the output gap in the period prior to our becoming convinced that a structural change had taken place.

This paper investigates the quantitative relevance of these issues for the measurement of the output gap in the United States over the last thirty years. We investigate several well-known methods for estimating the output gap. For each method, we examine the behaviour of end-of-sample output gap estimates and of the revisions of these estimates over time. Specifically, we calculate the statistical properties of the revisions and decompose them into their various sources, including the component due to revisions of the underlying output data and that due to re-estimation of the process generating potential output. We then compare the revision behaviour of the alternative methods.

In the present paper, we restrict our attention to univariate methods for estimation of the output gap. A thorough analysis based on multivariate techniques would require compilation of unrevised data series for all variables involved and would also introduce additional conceptual issues. Briefly, by utilizing information from additional sources, multivariate techniques may reduce the errors associated with the end-of-sample estimates of the output trend from univariate methods. However, multivariate techniques also introduce additional sources of misspecification and parameter estimation problems, which may more than offset the potential improvement these methods offer. By concentrating on the univariate methods we provide a benchmark against which these additional issues can be examined. To help assess the pertinence of our results, we also provide a brief comparison of our real-time univariate estimates of the output gap to "official" real-time output gap estimates as constructed in the United States from the mid-1960s.

The potential quantitative relevance of the issues we investigate has been pointed out before. Using final data, Kuttner (1994) and St-Amant and van Norden (1998) pointed out that end-of-sample and mid-sample estimates of the output gap can differ substantially for some commonly used methods for estimating the output gap, such as unobserved component and smoothing spline methods. Orphanides (1997, 1999) documented that the errors in "official" estimates of the output gap available to policymakers have indeed been substantial, and several authors, including Kuttner (1992), McCallum and Nelson (1998), Orphanides (1998) and Smets (1998), have elaborated on the policy implications of this issue. As far as we know, this study is the first attempt at comprehensive measurement and evaluation of the measurement errors associated with various techniques based on real-time data for the past thirty years.

2 Data Sources, Revisions and Concepts

2.1 How to measure the reliability of measured output gaps

Our aim in this study is to understand better the reliability and statistical accuracy of commonly used estimates of output gaps. While there are many approaches to measuring their reliability and accuracy, none is without limitations.

One way would be to generate artificial output data from an economic model, which would then be detrended by the various methods under study. The different estimates of the true output gap could then be compared to the known output gaps from the economic model. The problem with such an approach is that results will in general depend on the specification of the economic model, and a wide range of specifications could reasonably be considered plausible.[1] Furthermore, it would ignore the uncertainty introduced by the ongoing revision of published data.

Another way would be to simply use the statistical uncertainty associated with our estimate of potential or trend output to put confidence intervals around these estimates and therefore around our calculated output gaps. Unfortunately, some popular methods (such as the HP filter) do not give statistical confidence intervals. Furthermore, this method implicitly assumes that the statistical model is not misspecified, an assumption which often appears to be at odds with the evidence. Finally, this too would ignore the effects of data revision.

A third way would be to specify a particular criterion for judging the value of output gap measures. For example, if the goal of measuring output gaps is to aid policy makers in controlling inflation, we might seek to measure the marginal forecasting power of output gaps for subsequent inflation. Again, ambiguity about the goal of measuring output gaps implies that different criteria might reasonably be used and could give varying results for any particular method.[2] More seriously, such a methodology would have to address the special problems posed by the Lucas Critique.[3]

[1] For an example of such sensitivity analysis, see Guay and St-Amant (1996).
[2] This is not a criticism of the methodology, of course. This simply reflects the fact that some measures may be better suited for some purposes than for others.
[3] To see this, consider the case where we directly observe the true output gap. If the gap is unknown to the monetary policy authority, it will presumably have some forecasting power for inflation. However, if this information is available to the policy maker and is used efficiently in setting policy, then it will appear to have no forecasting power if there have been offsetting adjustments in monetary policy.

The alternative approach which we use in this paper allows us to capture the effects of errors due both to data revision and to misspecification of the statistical models used to estimate output gaps. At the same time, it is simple to implement and does not require a priori assumptions on the true structure of the economy or on the time-series model generating observed output. We explain this method in detail below. Briefly, it consists of measuring the degree to which estimates of the output gap at any point in time vary as data are revised and as data about the subsequent evolution of output become available.

To be sure, this method is not without its own limitations. We measure only revisions in estimates of the output gap, yet it is reasonable to assume that some uncertainty remains even in long-past historical estimates of the output gap. Since the total amount of uncertainty at the end of a sample is presumably the sum of the uncertainty from these two sources, this approach gives us an overestimate of the precision and accuracy associated with any detrending method. This has implications for the way we can interpret our results. A finding that revision errors are small might not be very meaningful, since it would not necessarily imply that the remaining "unrevised" errors are small. Similarly, it would be naive to attempt to rank different methods on the basis of the size of their revisions. It is nonetheless informative when and if we find that revision errors are relatively large, since we can conclude that the total error of these estimators must be larger still.

2.2 Data

Most of our data are taken from the real-time data set compiled by Croushore and Stark (1999). From their database we use the real-time variables for real output from 1965 to 1997. In each quarter, these time series reflect real output as published during that quarter by the Department of Commerce. The latest observation is always the one corresponding to the previous quarter.[4] The data are seasonally adjusted, and therefore alternative data vintages reflect, among other changes, re-estimation of seasonal factors.

The concept of real output has also evolved over time. In the U.S. the benchmark series was GNP until the end of 1991 and GDP since then. In addition, changes also reflect the choice of deflator. Until the end of 1995 real output was measured in constant dollars, with the benchmark year changing once or twice every decade. Since then a chain-weighted deflator has been used. We use 1999:Q1 data as "final data", recognizing, of course, that "final" is very much an ephemeral concept in the measurement of output.

Even when the output concept and deflator are the same, first-released output data differ significantly from subsequent releases. The biggest revisions are in the first few quarters after the release. In addition, once a year a major revision is made and seasonal factors are re-estimated, with changes that are at times substantial for the most recent few years.

[4] The Croushore and Stark database samples information in the middle of every quarter. As a result, on a few occasions when the data were released later than usual, the real output data for the previous quarter are not available. To avoid missing observations we supplemented the data on those occasions with information published towards the end of the quarter, using the first Survey of Current Business issue in which information for the previous quarter was reported.

2.3 Measuring the revision of output gaps

We use the data set mentioned above with a variety of detrending methods (described in the next section) to produce many different estimated output gap series. In addition, we apply each of these detrending methods in a number of different ways in order to estimate and decompose the extent of the revisions in the estimated gap series. To understand how the extent of the revisions is measured, we define several conceptually different ways in which any existing detrending method may be applied. In the remainder of this section, we describe how these methods were applied and their corresponding interpretations.[5]

[5] A more technical description of the methods we used may be found in the Appendix.

2.3.1 Final Estimates

The first of these methods gives rise to a "Final" estimate of the output gap. This simply takes the last vintage of data available to us (in our case, the series as published in 1999:Q1) and detrends it. The resulting series of deviations from trend constitutes the "Final" estimate of the output gap. This is the typical way in which such detrending methods are employed.

2.3.2 RealTime Estimates

The "RealTime" estimate of the output gap is constructed in two stages. First, we detrend each and every vintage of data available to construct an ensemble of output gap series. Of course, earlier vintage output gap series are shorter than later vintages, since the output series on which they are based end earlier. Next, we use these different vintages to construct a new series which consists entirely of the first available estimate of the output gap for each point in time. This new series is the "RealTime" estimate of the output gap. It represents the most timely estimate of the output gap which policy makers could have constructed at any point in time.

The difference between the RealTime and the Final estimates gives us the total revision in the estimated output gap at each point in time. We use the statistical properties of these revisions as our guide to the reliability and accuracy of estimated output gaps, recalling, of course, that this is an overestimate of the true reliability of the RealTime estimates, since it ignores the estimation error in the final series.

2.3.3 QuasiReal Estimates

The differences between the RealTime and the Final estimates have several sources, one of which is the ongoing revision of published data. To isolate the importance of this factor, we define a third output gap measure, the "QuasiReal" estimate. Like the RealTime estimate, the QuasiReal estimate is constructed in two steps. The first step is to construct an ensemble of "rolling" estimates of the output gap. That is, we begin by taking the Final vintage of the output series but use only the observations up to and including 1966:Q1 in order to compute the QuasiReal estimate for 1966:Q1. Next, we extend the sample period by one observation and repeat the detrending. We continue in this way until we have used the full sample period for the Final output series and we have a full set of corresponding output gap series. The second step is the same as that used to construct the RealTime series: we collect the first available estimate of the output gap at each point in time from the various series constructed in step one. This sequence of output gaps is the QuasiReal series. The difference between the RealTime and the QuasiReal series is entirely due to the effects of data revision, since estimates in the two series at any particular point in time are based on data samples covering exactly the same time period.

2.3.4 QuasiFinal Estimates

For unobserved component (UC) models, we are able to further decompose the revision in the estimated gap by defining another estimate of the output gap. This QuasiFinal estimate uses more information than the QuasiReal estimate (which uses subsamples of Final data) but less than the Final estimate (which uses the full sample of Final data). This is relevant because UC models use the data in two distinct phases. First, they use the available data sample to estimate the parameters of a time-series model of output. Next, they use these estimated parameters in the Kalman filter to arrive at estimates of the output gap. Here a distinction arises between "filtered" and "smoothed" estimates of the output gap. The smoothed estimate uses the full-sample parameter estimates and data from 1 to $T$ to form an optimal estimate of the gap in quarter $t$ ($1 \le t \le T$). The filtered estimate, by contrast, uses only data from 1 to $t$, together with the full-sample parameter estimates, to make an optimal estimate of the output gap at $t$.
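To make the filtered/smoothed distinction concrete, here is a minimal sketch for the simplest UC model, the local level model of equation (11) in section 3.4. It assumes NumPy, variances treated as known (standing in for the full-sample parameter estimates), an approximately diffuse initial state (`p0=1e7`) and simulated data; the function name `local_level_filter_smoother` is hypothetical. It runs a Kalman filter forward and a Rauch-Tung-Striebel smoother backward, and here the deviation $y_t - \mu_t$ plays the role of the gap. The richer UC models of section 3.4 would need a larger state vector and estimated parameters.

```python
import numpy as np


def local_level_filter_smoother(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter and RTS smoother for the local level model
    y_t = mu_t + eps_t,  mu_t = mu_{t-1} + eta_t.

    Variances are taken as given (e.g. full-sample estimates).
    Returns (filtered, smoothed) means of mu_t: filtered uses y_1..y_t,
    smoothed uses y_1..y_T. A large p0 approximates a diffuse prior.
    """
    n = len(y)
    a_pred = np.empty(n)  # E[mu_t | y_1..y_{t-1}]
    p_pred = np.empty(n)
    a_filt = np.empty(n)  # E[mu_t | y_1..y_t]
    p_filt = np.empty(n)
    for t in range(n):
        # Prediction: a random walk keeps the mean and adds the state variance.
        if t == 0:
            a_pred[t], p_pred[t] = a0, p0
        else:
            a_pred[t], p_pred[t] = a_filt[t - 1], p_filt[t - 1] + sigma_eta2
        # Update with the observation y_t.
        gain = p_pred[t] / (p_pred[t] + sigma_eps2)
        a_filt[t] = a_pred[t] + gain * (y[t] - a_pred[t])
        p_filt[t] = p_pred[t] * sigma_eps2 / (p_pred[t] + sigma_eps2)
    # Rauch-Tung-Striebel backward pass; the last date is left unchanged.
    a_smooth = a_filt.copy()
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        a_smooth[t] = a_filt[t] + j * (a_smooth[t + 1] - a_pred[t + 1])
    return a_filt, a_smooth


# Simulated (hypothetical) data with known variances.
rng = np.random.default_rng(1)
mu = np.cumsum(rng.normal(scale=0.5, size=80))
y = mu + rng.normal(scale=1.0, size=80)

filtered, smoothed = local_level_filter_smoother(y, sigma_eps2=1.0, sigma_eta2=0.25)
gap_quasi_final = y - filtered  # analogue of the QuasiFinal series
gap_final = y - smoothed        # analogue of the Final series
print("largest filtered/smoothed gap difference:",
      np.abs(gap_final - gap_quasi_final).max())
```

By construction the two estimates coincide at the last date, so an end-of-sample estimate gains nothing from smoothing; earlier dates change once later observations are allowed to inform them.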
For this class of models, smoothed estimates of the output gap are used to construct the Final series, while filtered estimates are used for the QuasiFinal series.[6] The difference between the QuasiFinal and the QuasiReal series then reflects solely the effects of using different parameter estimates for the model to filter the data (i.e. the full-sample ones versus the partial-sample ones). The extent of the difference will reflect the importance of parameter instability in the underlying UC model. The difference between the Final and the QuasiFinal series reflects the importance of ex post information in estimating the output gap given the parameter values of the process generating output.[7]

[6] In both cases, the UC model's parameters are estimated using the full sample of the Final vintage data, and the same data are then used for filtering and smoothing.
[7] St-Amant and van Norden (1998) argue that the degree to which the subsequent behaviour of output is informative about the output gap is linked to the presence or absence of hysteresis in output.

3 Alternative Detrending Methods

Having explained how we will measure the precision and reliability of different detrending methods, we now briefly review a variety of detrending methods. We consider four types of methods:

1. Deterministic Trends
2. The Hodrick-Prescott Filter
3. The Beveridge-Nelson Decomposition
4. Unobserved Component Models

Next we briefly discuss each of these four groups and the variants of these methods which we apply. Readers familiar with these detrending methods may wish to skim this section and pass directly to section 4, where we discuss our results.

3.1 Deterministic Trends

The first set of detrending methods we consider assumes that the trend in (the logarithm of) output is well approximated by a simple deterministic function of time. We consider three such functions: linear, quadratic, and piecewise linear.

The linear trend is the oldest and simplest of these models. It assumes that output may be decomposed into a cyclical component and a linear function of time,

$$y_t = \alpha + \beta \cdot t + c_t \tag{1}$$

where $c_t$ is the business cycle and $y_t$ is our chosen measure of output (in logarithms). The quadratic trend adds a second term in the deterministic component:

$$y_t = \alpha + \beta \cdot t + \gamma \cdot t^2 + c_t \tag{2}$$

This allows the flexibility to detect a slowly changing trend in a simple way.

Because of the noticeable downturn in GDP growth after 1973, another simple deterministic technique is a breaking linear trend that allows for the slowdown in that year. In general, the breaking trend model can be written as:

$$y_t = \begin{cases} \alpha + \beta \cdot t + c_t & \text{for } t \le t_1 \\ \alpha + \beta \cdot t + \gamma \cdot (t - t_1) + c_t & \text{for } t > t_1 \end{cases} \tag{3}$$

Breaking trends were first formally studied by Perron (1989), who allowed also for multiple breaks in the trend. Our implementation of the breaking trend method incorporates the assumption that the location of the break is fixed and known. Specifically, we assume that a break in the trend at the end of 1973 would have been incorporated in real time from 1977 on. This conforms with the debate regarding the productivity slowdown during the 1970s and with evidence (e.g. Council of Economic Advisers, 1977) that it would not have been reasonable to introduce the 1973 break earlier but would be appropriate to do so as early as 1977.[8]

[8] We also investigated alternatives, including ones with a break of unknown location and also the possibility of multiple breaks. For compactness we only report the case of a fixed break in 1973, since this method is more common in practical applications, especially ones relating to productivity and output. Qualitatively, the results were similar for the other alternatives.

Due to their simplicity, deterministic trends remain appealing. Some authors use deterministic trend methods, particularly when simplicity is greatly valued, as in some applications regarding monetary policy evaluation. For example, Taylor (1993) relied on deviations from a linear trend to measure the cycle, and Clarida, Gali and Gertler (1998) employed a quadratic trend. The use of deterministic trends, however, remains a matter of controversy. Nelson and Plosser's (1982) seminal critique of the adequacy of the deterministic trend model has sparked nearly two decades of research and debate. To briefly summarize a vast and still unsettled literature, there is still no consensus on the adequacy of the model, with at least some recent papers disputing Nelson and Plosser's claim that output is better modeled as containing a stochastic rather than a deterministic trend.[9] However, the possibility that output contains a unit root (and possibly more than one) suggested a variety of other detrending methods, which we consider next.

[9] For example, see Rudebusch (1993), Rothman (1997), Cheung and Chinn (1997).

3.2 The Hodrick-Prescott Filter and Smoothing Splines

In recent years, smoothing splines have frequently been used to detrend output and other time series. The most popular of these is that proposed by Hodrick and Prescott (1997), commonly called the HP filter.[10] The HP filter decomposes a time series $y_t$ into an additive cyclical component, $y^c_t$, and a growth component, $y^g_t$,

$$y_t = y^c_t + y^g_t \tag{4}$$

and then chooses the series $\{y^g_t\}$ to minimize the variance of the cyclical component $y^c_t$ subject to a penalty for the variation in the second difference of the growth component $y^g_t$. Formally, the HP-filtered trend is given by

$$\{y^g_t\}_{t=0}^{T+1} = \arg\min \sum_{t=1}^{T} \left\{ (y_t - y^g_t)^2 + \lambda \left[ (y^g_{t+1} - y^g_t) - (y^g_t - y^g_{t-1}) \right]^2 \right\} \tag{5}$$

and $y^c_t$ is the resulting measure of the output gap. $\lambda$ is called the "smoothness parameter" and penalizes the variability in the growth component. The larger the value of $\lambda$, the smoother the growth component and the greater the variability of the output gap. As $\lambda$ approaches infinity, the growth component corresponds to a linear time trend. For quarterly data, Hodrick and Prescott propose setting $\lambda$ equal to 1600.

[10] The method was proposed by Hodrick and Prescott in their influential 1981 working paper. The development of smoothing splines dates back to the work of Whittaker (1923) and Henderson (1924).
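The detrending methods in equations (1) to (5) are easy to prototype, and combining them with the QuasiReal construction of section 2.3.3 shows how end-of-sample estimates differ from full-sample ones. The sketch below assumes NumPy. It fits the linear, quadratic and fixed-break trends by ordinary least squares and computes the HP filter in its common matrix form, $(I + \lambda D'D)\, y^g = y$ with $D$ the second-difference operator; that form defines the trend only on the observed dates, a minor indexing difference from equation (5). The helper names (`hp_gap`, `quasi_real` and so on) are hypothetical, and the simulated series is an illustrative assumption, not the Croushore-Stark data.

```python
import numpy as np


def _ols_gap(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def linear_trend_gap(y):
    """Deviation from a linear time trend, as in equation (1)."""
    t = np.arange(len(y), dtype=float)
    return _ols_gap(y, np.column_stack([np.ones_like(t), t]))


def quadratic_trend_gap(y):
    """Deviation from a quadratic time trend, as in equation (2)."""
    t = np.arange(len(y), dtype=float)
    return _ols_gap(y, np.column_stack([np.ones_like(t), t, t ** 2]))


def breaking_trend_gap(y, t1):
    """Deviation from a linear trend whose slope changes after index t1 (equation (3)).

    The break index t1 is taken as fixed and known, as in the text.
    """
    t = np.arange(len(y), dtype=float)
    kink = np.maximum(t - t1, 0.0)  # zero up to t1, (t - t1) afterwards
    return _ols_gap(y, np.column_stack([np.ones_like(t), t, kink]))


def hp_gap(y, lam=1600.0):
    """Cyclical component of the HP filter in its usual matrix form.

    The trend g solves (I + lam * D'D) g = y, where D is the
    (n-2) x n second-difference operator.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return y - trend


def quasi_real(y, detrend, first_end):
    """Last gap estimate from each expanding subsample y[:end] of one vintage.

    Element k is the gap at index first_end - 1 + k computed from
    y[:first_end + k] only, i.e. without any later observations.
    """
    return np.array([detrend(y[:end])[-1] for end in range(first_end, len(y) + 1)])


# Illustration on a simulated (hypothetical) log-output series:
# a linear drift plus an AR(1) cycle.
rng = np.random.default_rng(0)
n = 136
cycle = np.zeros(n)
for t in range(1, n):
    cycle[t] = 0.9 * cycle[t - 1] + rng.normal(scale=0.008)
y = 0.008 * np.arange(n) + cycle

final = hp_gap(y)                         # Final: full-sample HP gap
qr = quasi_real(y, hp_gap, first_end=20)  # QuasiReal: end-of-sample HP gaps
revision = final[19:] - qr                # aligned on dates 19, ..., n-1
print("mean |Final - QuasiReal|:", np.abs(revision).mean())
```

Because the last subsample is the full sample, the final QuasiReal value coincides with the last Final value; differences arise only at earlier dates, where the Final estimate draws on later observations. Any of the other detrenders can be passed to `quasi_real` in place of `hp_gap`.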

King and Rebelo (1993) show that under some conditions the HP filter will be the optimal filter for identifying the cyclical component of a series. Harvey and Jaeger (1993) compare it to a structural time-series model and conclude "...that the HP filter is tailor made for extracting the business cycle component from US GNP" (p. 236). Baxter and King (1995) show that the HP filter "...can, in some cases, produce reasonable approximations to an ideal business cycle filter" (pp. 21-22). However, use of the HP filter remains controversial. King and Rebelo note that the conditions for optimality are unlikely to be satisfied, and Harvey and Jaeger find the HP filter performs less well on other series. Cogley and Nason (1995) discuss the dangers of spurious cyclicality induced by the HP filter, while Guay and St-Amant (1996) argue that the HP filter does a poor job of extracting business cycle frequencies from macroeconomic time series.[11] Despite this, the HP filter remains popular in applied work (e.g. Taylor, 1998). Multivariate applications of the filter have also been developed (e.g. Laxton and Tetlow, 1992, and Kozicki, 1998).

[11] A summary of these critiques and others may be found in St-Amant and van Norden (1997). See also Christiano and Fitzgerald (1999) for comparisons of the HP filter with the band-pass filter.

3.3 The Beveridge-Nelson Decomposition

Beveridge and Nelson (1981) consider the case of an ARIMA(p,1,q) series, $y_t$, which is to be decomposed into a trend and a cyclical component. For simplicity, we can assume that all deterministic components belong to the trend component and have already been removed from the series. Since the first difference of the series is stationary, it has an infinite-order MA representation of the form

$$\Delta y_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots \equiv e_t \tag{6}$$

where $\varepsilon_t$ is assumed to be an innovations sequence. The change in the series over the next $s$ periods is simply

$$y_{t+s} - y_t = \sum_{j=1}^{s} \Delta y_{t+j} = \sum_{j=1}^{s} e_{t+j} \tag{7}$$

The trend is defined to be

$$\lim_{s \to \infty} E_t(y_{t+s}) = y_t + \lim_{s \to \infty} E_t\Big(\sum_{j=1}^{s} e_{t+j}\Big) \tag{8}$$

From equation (6), we can see that

$$E_t(e_{t+j}) = E_t(\varepsilon_{t+j} + \beta_1 \varepsilon_{t+j-1} + \beta_2 \varepsilon_{t+j-2} + \cdots) = \sum_{i=0}^{\infty} \beta_{j+i}\, \varepsilon_{t-i} \tag{9}$$

Since changes in the trend are therefore unforecastable, this has the effect of decomposing the series into a random walk and a cyclical component, so that

$$y_t = \tau_t + c_t \tag{10}$$

where the trend follows a random walk, $\tau_t = \tau_{t-1} + (1 + \beta_1 + \beta_2 + \cdots)\,\varepsilon_t$, whose increment is white noise.

To use the Beveridge-Nelson decomposition we must therefore:

(1) Identify $p$ and $q$ in our ARIMA(p,1,q) model.
(2) Identify the $\{\beta_j\}$ in equation (6).
(3) Choose some large enough but finite value of $s$ to approximate the limit in equation (8).[12]
(4) For all $t$ and for $j = 1, \ldots, s$, calculate $E_t(e_{t+j})$ from equation (9).
(5) Calculate the trend at time $t$ as $y_t + E_t\big(\sum_{j=1}^{s} e_{t+j}\big)$ and the cycle as $y_t$ minus the trend.

Based on results for the full sample, we use an ARIMA(1,1,2), with parameters re-estimated by maximum likelihood methods before each recalculation of the trend. When applied to GDP, the Beveridge-Nelson decomposition typically implies relatively small and not very persistent output gaps.[13]

[12] This need not be very large, since changes in the detrended log of output may not be very persistent. For example, Blanchard and Fischer (1989) argue that changes in the detrended log of U.S. GDP are well approximated by an MA(2), implying that the correct model for log output is an ARIMA(0,1,2) and that $s = 2$ is sufficient.
[13] This reflects the fact that ARMA models have little ability to forecast changes in output.

The Beveridge-Nelson decomposition was influential in the 1980s, when the small variance of its cycles in output was interpreted as implying that real rather than nominal shocks dominated output fluctuations. This reasoning has been discredited by the work of Watson (1986) and Quah (1992), who stressed
that other decompositions could lead to other conclusions, and by Lippi and Reichlin (1994), who noted that the random walk assumption imposed on the trend does not match the implications of business cycle models.[14] Perhaps as a result, multivariate extensions of this method have been much more influential in recent years (see e.g. Rotemberg and Woodford, 1996, for such an application to business cycle analysis). Such methods currently form the basis of the OECD's measures of the output gap and of its work on cyclical adjustment of government deficits and surpluses (Giorno et al., 1995).

[14] Quah (1992) notes that of all possible decompositions, the Beveridge-Nelson decomposition minimizes the variance of the cyclical component.

3.4 Unobserved Component Models

Unobserved component (UC) models attempt to specify the time-series properties of output and use the resulting model to identify cyclical and trend components. Surveys of their use in business cycle estimation may be found in Enders (1994) and Maravall (1996). Among the simplest UC models are the Local Level model,

$$\begin{aligned} y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \eta_t \end{aligned} \tag{11}$$

and the Local Linear Trend model,

$$\begin{aligned} y_t &= \mu_t + \varepsilon_t \\ \mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t &= \beta_{t-1} + \zeta_t \end{aligned} \tag{12}$$

In the former (equation (11)), the observed output series $y_t$ is composed of a random walk component $\mu_t$ and white noise $\varepsilon_t$. $\varepsilon_t$ and the increments of the random walk are assumed to be mutually uncorrelated and to follow independent Gaussian distributions. This implies that $y_t$ follows an IMA(1,1), with the size of the MA term determined by the relative variances of $\varepsilon_t$ and $\eta_t$. The local linear trend model modifies the local level model by assuming that the
increments to the trend component, $\mu_t$, are not i.i.d., but themselves follow a local level model.[15] This implies that $y_t$ must be I(2) rather than I(1).

[15] Again, all the error terms are assumed to be normally distributed and mutually independent at all leads and lags.

Popular models of quarterly output are typically based on one of these two basic models, adding only richer short-term dynamics. The first of these to be applied was that of Watson (1986), who modified the local level model by replacing the white noise error term $\varepsilon_t$ with an AR(2) process to allow for more business cycle persistence:

$$
\begin{aligned}
y_t &= \mu_t + c_t\\
\mu_t &= \delta + \mu_{t-1} + \eta_t\\
c_t &= \rho_1 c_{t-1} + \rho_2 c_{t-2} + \varepsilon_t
\end{aligned}
\tag{13}
$$

Next was Clark (1987), who similarly modified the local linear trend model to allow for an AR(2) cycle:

$$
\begin{aligned}
y_t &= n_t + x_t\\
n_t &= g_{t-1} + n_{t-1} + \nu_t\\
g_t &= g_{t-1} + w_t\\
x_t &= \phi_1 x_{t-1} + \phi_2 x_{t-2} + e_t
\end{aligned}
\tag{14}
$$

where $\nu_t$, $w_t$ and $e_t$ are i.i.d. mean-zero Gaussian processes.

Finally, Harvey and Jaeger (1993) offered a different modification of the local linear trend model in which Clark's AR(2) cycle is replaced by a sinusoidal stochastic process, $\psi_t$:

$$
\begin{aligned}
y_t &= \mu_t + \psi_t + \varepsilon_t\\
\mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t\\
\beta_t &= \beta_{t-1} + \zeta_t\\
\psi_t &= \rho\cos(\lambda_c)\,\psi_{t-1} + \rho\sin(\lambda_c)\,\psi^*_{t-1} + \chi_t\\
\psi^*_t &= -\rho\sin(\lambda_c)\,\psi_{t-1} + \rho\cos(\lambda_c)\,\psi^*_{t-1} + \chi^*_t
\end{aligned}
\tag{15}
$$

where $\{\varepsilon, \eta, \zeta, \chi, \chi^*\}$ are all mean-zero Gaussian i.i.d. errors that are uncorrelated at all leads and lags.

All three of the above-mentioned papers suggested using the cycle-trend decompositions implied by these models as a measure of the business cycle.[16] These univariate models have led to a series of multivariate extensions which are currently used extensively in output gap measurement.[17] We examine the simpler univariate models in this paper for a variety of reasons. First, there are some indications that the multivariate versions are not always much more precise than their simpler univariate counterparts. In those cases, our analysis of the revision errors should help us understand the reliability of the resulting estimates. Second, the inclusion of the UC models allows us to further decompose the difference between the QuasiReal and the Final estimates and thereby better understand the importance of parameter instability in causing revisions to output gap estimates. Finally, since UC models also allow us to calculate confidence intervals around our estimated output gaps, the revision errors serve as a useful check on the accuracy of these standard errors in the face of possible misspecification.

[16] Clark (1987) also considered a bivariate model of output growth.

[17] For example, see Kuttner (1992, 1994), Amato (1997), Gerlach and Smets (1997) and Kichian (1999).

4 Results

Figure 1 compares the estimated business cycles for the eight different methods mentioned in Section 3. RealTime estimates are shown in the top half of the figure, while Final estimates are shown in the lower half. Several features are readily apparent. First, the different methods have strong short-term comovements. Most appear to be moving upwards or downwards at roughly the same time, although the size of these moves varies from one method to another.

Second, despite having similar short-term movements, the different methods typically give rise to a wide range of different estimates of the output gap. The difference between the highest and lowest estimate is frequently over 4 percent of output and is of the same order of magnitude as the size of the business cycle itself. The dispersion of estimates is sufficiently great that estimates of both signs can usually be found, and exceptions to this rule tend to be short-lived. Curiously, both the RealTime and the Final estimates show a period during which all the estimates tended to be tightly clustered. However, these periods are quite different for the two kinds of estimates: around 1973 for the RealTime estimates and 1984-1990 for the Final estimates.

To provide a first impression of the variation and size of the revisions implied by the real-time and final estimates shown in figure 1, we plot the difference of the two series for each method in figure 2. As with the estimates themselves, the dispersion of revisions is great, especially in the mid-1970s, suggesting that interpreting the accuracy of estimates during that period might have been especially difficult. The mid-1970s also coincide with the period when the "official" estimates of the output gap (which were prepared at the time by the Council of Economic Advisers) were most inaccurate. At the time, those estimates were based on a segmented trend method for estimating potential output, which proved particularly misleading for assessing the productive capacity of the economy following the productivity slowdown of the late 1960s and early 1970s. By 1975, these estimates suggested that output was more than 10 percentage points below potential, similar to what is shown in figure 1 for our linear and quadratic trend method estimates.
4.1 Revision size and persistence

To better understand the differences between the RealTime and the Final estimates, Table 1 provides descriptive statistics on the various RealTime, QuasiReal, QuasiFinal and Final estimates, while Table 2 provides similar statistics for the total revision (i.e. Final estimate minus RealTime estimate). Comparing the two tables, we see that the revisions are of the same order of magnitude as the estimated output gaps, although this varies somewhat across methods. The last column of Table 2 reports the estimated first-order autocorrelation coefficients for the revisions, showing that they are highly persistent. Aside from the Beveridge-Nelson model, the persistence ranges from 0.80 for the Breaking Trend to 0.96 for the Linear Trend and the Watson model.

It is worth noting that the statistical properties of these revisions are broadly in line with those of the revisions of "official" output gap estimates for the U.S. One such series is examined in Orphanides (1999), who has compiled the real-time output gap estimates available at the Federal Reserve from 1965 to 1993. These were based on the Council of Economic Advisers estimates during the 1960s and 1970s and Federal Reserve staff estimates during the 1980s and 1990s. The standard deviation of these real-time estimates from 1966Q1 to 1993Q4 is 3.8 percent. Comparison of these real-time estimates with the historical Federal Reserve staff estimates available in 1994Q4 suggests large and highly persistent revisions. The standard deviation of these revisions is 2.6 percent and their first-order serial correlation is about 0.9.

Because the various methods have substantial variation in the size of the cyclical component they produce, it is easier to compare their reliability in real time by looking at comparably scaled measures of the revisions. Table 3 presents some such measures. In column 1 we present the correlation between the Final and RealTime series for each method. (This would be 1 in the ideal case where no revisions to the RealTime estimates were ever required.) As can be seen, these correlations range from a low of 0.53 for the Hodrick-Prescott filter and 0.56 for the Harvey-Jaeger model to a high of 0.87 for the Breaking Trend and 0.81 for the Linear Trend. The remaining three indicators in Table 3 measure in different ways the relative importance of the revisions. (In the ideal case of no revisions, each of these indicators would equal 0.)
The first of these indicators, NS, reports the ratio of the standard deviation of the total revision to the standard deviation of the final estimate of the gap; this gives us a proxy for the "noise-to-signal ratio" in the RealTime estimates. For example, looking at the Hodrick-Prescott method, we see that this ratio is 1.03 (i.e. the revision has a slightly larger variance than the final estimate of the output gap itself). This is the worst ratio of the eight methods, although it is not far from the 0.93 and 0.92 for the Quadratic Trend and Harvey-Jaeger models, respectively. By this criterion, even the best models have rather large ratios, between one-half and two-thirds.

The last two columns provide the frequencies with which the RealTime estimate is "bad." The OPSIGN column shows the frequency with which the RealTime and Final gaps were of opposite signs. For the Watson and Linear Trend models, this frequency exceeds 50 percent. Not all methods do as badly by this criterion: the Breaking Trend model misclassifies the sign only 12 percent of the time and the Beveridge-Nelson only 21 percent. The XSIZE column shows the frequency with which the absolute value of the revision exceeds the absolute value of the Final series. The different detrending methods give more similar results in this respect. In five of the eight models this frequency exceeds 50 percent, and in two others it exceeds 40 percent. The Breaking Trend again stands out as the best, with revisions larger than Final gaps only 30 percent of the time.

We reiterate that the revision errors we measure here are underestimates of the total estimation error; we are measuring only the estimation errors which we subsequently correct. This also means that we must be particularly cautious in trying to compare the reliability of the different methods. With this caveat firmly in mind, we may note that some methods appear on the surface to be less desirable than others. For example, the Hodrick-Prescott filter combines the lowest correlation (0.53) between the Final and RealTime estimates and the worst noise-to-signal ratio with a higher than average persistence of revisions (0.93).
The Quadratic Trend does not fare much better, with the second-worst noise-to-signal ratio and the third-highest persistence (0.95). In contrast, the Breaking Trend combines the highest correlation (0.87) with the second-lowest persistence (0.80) and by far the best frequency of correctly signing the output gap.
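The indicators in Tables 2 and 3 are simple functions of a method's RealTime and Final series. The sketch below assumes both series are equal-length Python lists (in percent of output) and computes COR, NS, OPSIGN and XSIZE as defined above, plus a first-order autocorrelation of the revision. The paper does not state which estimator lies behind its AR column, so the sample autocorrelation used here, like the function names, is an illustrative assumption rather than the authors' code.

```python
from statistics import fmean, pstdev


def correlation(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_order_autocorrelation(x):
    """Sample first-order autocorrelation (one common estimator; assumed here).
    Requires a non-constant series."""
    m = fmean(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def reliability_indicators(real_time, final):
    """COR, NS, OPSIGN and XSIZE as described for Table 3, plus AR of the revision."""
    revision = [f - r for r, f in zip(real_time, final)]  # Final minus RealTime
    n = len(final)
    return {
        "COR": correlation(real_time, final),
        # noise-to-signal: sd of revision relative to sd of the Final gap
        "NS": pstdev(revision) / pstdev(final),
        # share of quarters where RealTime and Final gaps have opposite signs
        "OPSIGN": sum(r * f < 0 for r, f in zip(real_time, final)) / n,
        # share of quarters where |revision| exceeds |Final gap|
        "XSIZE": sum(abs(d) > abs(f) for d, f in zip(revision, final)) / n,
        "AR": first_order_autocorrelation(revision),
    }
```

Applied to one method's RealTime and Final series, the output should be comparable to that method's rows in Tables 2 and 3, with possible small differences in AR depending on the estimator. NS is the same whether population or sample standard deviations are used, as long as numerator and denominator use the same one.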

4.2 Decomposition of Revisions

Figures 3 through 6 help us understand the importance of different factors in accounting for the total revision in the estimated output gap as we move from RealTime to Final estimates. Table 4 presents related detailed summary statistics for the various methods.

Figure 3 shows results for the Linear Trend method in the upper panel and the Watson model in the lower panel. (The reason for this grouping will become clear shortly.) In each graph, we see the RealTime estimate of the output gap together with the total subsequent revision (Final minus RealTime) of that estimate. The fact that the revision is roughly equal in magnitude but opposite in sign to the RealTime estimate at the trough of the 1975 recession tells us that our final estimate of the output gap is roughly zero. In other words, despite the extreme evidence of recession in the RealTime estimate, ex post we would judge that the economy was operating roughly at potential at that time. The size of these revisions (about 8 to 10 percentage points in this period) underlines the lack of precision of these methods' RealTime estimates.

To understand the source of these revisions, both graphs also show the effects of data revision. (This is constructed as the QuasiReal estimate minus the RealTime estimate.) This is simply the component of the overall revision which is due to subsequent changes in the published data (as opposed to the addition of new data points to the sample). For example, since the total revision and data revision are roughly equal in both graphs in late 1995, nearly all of the revision in our estimated output gap for those quarters was due to subsequent revisions in the published data. Looking at the whole sample period, the data revision is typically less than ±2 percent of output and its variability tends to be small compared to that of the total revision. This in turn means that most of the revision is due to the addition of new points to our data sample. However, data revisions still play a role, as can be confirmed by looking at the summary statistics of the difference between the QuasiReal and RealTime estimates of the output gap shown in Table 4.

In the case of the Watson model, we can further identify the source of the revisions by identifying the effects of parameter revisions (calculated as QuasiFinal minus QuasiReal). The lower panel of Figure 3 shows that these parameter revisions account for much of the revisions of our estimates of the output gap.[18]

Considering the evidence presented so far on the Linear Trend and Watson models, we are led to the conclusion that they are not well suited to the estimation of business cycles due to their assumption of a constant long-term trend in output growth. This assumption leads to parameter instability as samples are lengthened and the trend rate of growth is revised downwards. It gives us output gap estimates which seem to contain a downward trend (see Figure 2), output gaps which are furthest from zero, and the largest standard deviation of revisions.

Figure 4 considers the two other deterministic trend models, the Quadratic Trend and the Breaking Trend. The two give visually similar RealTime estimates, the main difference coming in 1977, when the Breaking Trend estimates undergo a discrete shift as the trend break is introduced in 1973. The total revision is again often close to the size of the RealTime output gap (particularly in the mid-1990s). We note that although the data revisions seem to play a secondary role in explaining the total revision of the RealTime estimates, a major exception appears during 1974 and 1975, when substantial data revisions eventually helped to moderate initial perceptions of a disastrous recession.

Figure 5 again presents visually similar results from two conceptually different methods, this time from the Hodrick-Prescott filter and the Harvey-Jaeger unobserved component model.[19] In both cases we find revisions that are fully as large as the RealTime estimates and that cannot be attributed to the effects of data revisions (particularly once we exclude the 1974-75 revisions).
Results for the Harvey-Jaeger model further indicate that the effects of parameter revision are similarly small, unlike the first case we considered above. The large revision of our estimates must therefore be due almost entirely to the addition of subsequent observations to our sample.

[18] The parameter instability was evident when performing the rolling estimation of the Watson model; parameter estimates tended to fluctuate between two different sets of parameters with quite different implications for the estimated business cycle.

[19] The similarity in the Final estimates from these two methods was noted in the original article by Harvey and Jaeger (1993).

A further striking feature of these two methods is that the revision seems to systematically lead the RealTime estimate by about one year. This does not imply that the RealTime estimates use the available data inefficiently, since the revisions can obviously only be calculated with Final data. These results appear to contrast with those of St-Amant and van Norden (1997), who examined the spectral properties of HP filters at the end of the sample (similar to our QuasiReal estimates). They found that while there was a phase lag of about 2 quarters at most business cycle frequencies, the overall phase shift was effectively zero due to the effects of spectral leakage from lower frequencies.[20]

In Figure 6, we consider the results from the last pair of models, the Beveridge-Nelson and Clark models. The upper frame shows that results for the Beveridge-Nelson decomposition are atypical in almost every way. The estimated output gap is much smaller and much less persistent than that produced by any other method, facts which were also evident in Table 1. However, we now also see that the RealTime estimates look very little like the output gaps we would associate with U.S. business cycles. For example, the recessions of 1982 and 1991 are difficult to distinguish from the background "noise" and appear to be very brief and mild (with the gap never exceeding 1.5 percent of output in absolute value). By far the largest output gap in the sample, that of 1975, is largely accounted for by data revisions and becomes unremarkable in Final estimates. Indeed, the total revisions for this method are dominated by the effects of data revision; the two series are highly correlated and their plots are often difficult to separate visually.
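The Beveridge-Nelson steps (1) to (5) in Section 3 help explain why its gaps are so small: the cycle is simply minus the forecastable part of future (demeaned) output growth. The sketch below assumes, purely for illustration, the ARIMA(0,1,q) case mentioned in footnote 12 with known MA coefficients, rather than the ARIMA(1,1,2) re-estimated by maximum likelihood that the paper actually uses. Pre-sample innovations are set to zero, and the function name `bn_cycle_ma` is hypothetical.

```python
def bn_cycle_ma(dy, mu, theta):
    """Beveridge-Nelson cycle for log output modelled as ARIMA(0,1,q).

    dy    : first differences of log output
    mu    : mean growth (drift), so that e_t = dy_t - mu
    theta : MA coefficients [beta_1, ..., beta_q] of e_t (equation 6 notation)
    Returns c_t = y_t - trend_t = -sum_{j=1}^{q} E_t(e_{t+j}) for each t.
    """
    q = len(theta)
    # Recover innovations eps_t by inverting the MA polynomial:
    # eps_t = e_t - beta_1 eps_{t-1} - ... - beta_q eps_{t-q}, pre-sample eps = 0.
    eps = []
    for t, d in enumerate(dy):
        e = d - mu
        for i in range(1, q + 1):
            if t - i >= 0:
                e -= theta[i - 1] * eps[t - i]
        eps.append(e)
    # For an MA(q), E_t(e_{t+j}) = 0 when j > q, so truncating at s = q is exact.
    cycle = []
    for t in range(len(dy)):
        expected = 0.0
        for j in range(1, q + 1):
            for i in range(0, q - j + 1):  # equation 9, truncated at beta_q
                if t - i >= 0:
                    expected += theta[j + i - 1] * eps[t - i]
        cycle.append(-expected)
    return cycle


# Usage: a random walk (all MA coefficients zero) has no forecastable growth,
# so its Beveridge-Nelson cycle is identically zero.
print(bn_cycle_ma([0.3, -0.1, 0.2], 0.1, [0.0, 0.0]))
```

Because the cycle is a weighted sum of recent innovations with the MA coefficients as weights, small coefficients (output growth that is hard to forecast) mechanically produce small, short-lived gaps, consistent with the pattern described above.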
The lower frame shows that the results for Clark are much more typical of those for the other unobserved components models. Revisions are almost as large as the RealTime gaps and are persistent. Both parameter revision and data revision effects are relatively minor. Perhaps the most striking feature of the RealTime estimates is that after 1973 they are almost never strongly positive; that is, in real time the economy appears to be virtually continuously at or below potential for twenty-five years with this method.

[20] St-Amant and van Norden (1997), p. 32.

4.3 Turning Points

It is particularly interesting to know how the different business cycle measures perform around business cycle turning points, since these are presumably periods when accurate and timely estimates of the output gap (and its changes) would be of particular interest to policymakers. To help assess this, we calculated a number of descriptive statistics regarding the size of the revision in RealTime estimates in the three quarters centered on each of the NBER business cycle peaks from 1966 to 1997. Results are shown in Table 5. We see that all methods seem to underestimate the output gap in real time at cyclical peaks, although the degree to which this is true varies considerably from one method to another. The Linear Trend and Watson methods have by far the most severe underestimates, while the Beveridge-Nelson has the smallest.

4.4 Revisions and Confidence Intervals

As noted previously, our revision errors overestimate the overall reliability of the output gap series, since they neglect the estimation error which remains in the Final estimates. Alternatively, we can use standard statistical methods to calculate the reliability of some of the output gap measures. These too will overestimate the reliability of the gap, since they ignore the effects of data revision and model misspecification. Of course, if these two are relatively small, statistical methods may be a useful guide to the reliability of RealTime output gap estimates.
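As a minimal illustration of the standard statistical approach, the sketch below runs the Kalman filter for the local level model of equation 11 with known variances and reports a 95% band (the filtered level plus or minus 1.96 standard deviations) around each end-of-sample estimate. This is a deliberate simplification: the three UC models examined here (Watson, Clark and Harvey-Jaeger) need larger state vectors, and, like the intervals described in this section, this band ignores parameter uncertainty, data revision and misspecification. The function name and the approximately diffuse default prior are assumptions.

```python
import math


def local_level_filter(y, var_eps, var_eta, m0=0.0, p0=1e7):
    """Kalman filter for the local level model (equation 11):
        y_t  = mu_t + eps_t,       eps_t ~ N(0, var_eps)
        mu_t = mu_{t-1} + eta_t,   eta_t ~ N(0, var_eta)
    m0, p0 : prior mean and variance of the level (large p0 ~ diffuse prior).
    Returns filtered levels E_t(mu_t), their variances, and 95% bands."""
    m, p = m0, p0
    levels, variances, bands = [], [], []
    for obs in y:
        p_pred = p + var_eta                # predict: the level is a random walk
        gain = p_pred / (p_pred + var_eps)  # Kalman gain
        m = m + gain * (obs - m)            # update the level with the new observation
        p = (1.0 - gain) * p_pred           # filtered (end-of-sample) variance
        half_width = 1.96 * math.sqrt(p)
        levels.append(m)
        variances.append(p)
        bands.append((m - half_width, m + half_width))
    return levels, variances, bands


# Usage with hypothetical data and variances.
levels, variances, bands = local_level_filter([1.0, 1.4, 0.9, 1.6, 2.1], 0.5, 0.1)
print(bands[-1])
```

In this model the implied gap estimate is $y_t$ minus the filtered level; since $y_t$ is observed, its 95% band is simply $y_t$ minus the band for the level. Comparing such real-time bands with later smoothed estimates is the kind of check carried out in figures 7 through 9.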
To investigate this question, we focused on the three UC models and calculated 95% confidence intervals about the RealTime estimates of the output gap.[21] The results are shown in figures 7 through 9, which compare these confidence intervals to the final estimates of the output gap. If the statistical confidence intervals are reliable, we should find that our Final estimates fall outside the 95% RealTime confidence interval very infrequently.

[21] These were calculated using the usual formulas for the standard errors surrounding estimates produced by the Kalman filter. Note that in addition to the effects of data revision and model misspecification mentioned above, these also do not take account of the uncertainty in the model's estimated parameters.

The figures show that the reliability of calculated confidence intervals varies. Final estimates from the Watson model are often outside the real-time confidence intervals. This happens only rarely (and then briefly) for the Harvey-Jaeger model, and not at all for the Clark model. This finding suggests that the assumption of a constant drift rate for trend output growth embedded in the Watson model is at odds with the data, and it implies that the calculated confidence intervals for this model omit an important source of error.

5 Conclusions

We have examined the reliability of univariate detrending methods for estimating the output gap in real time. In doing so, we have focused on the internal consistency of output gap estimates over time as more information arrives and data are revised. This gives us results which are robust to alternative assumptions about the structure of the economy, but which may tend to overestimate the reliability of the estimated output gaps from any given method.

Our results suggest that the reliability of output gap estimates in real time tends to be quite low. Different methods give widely different estimates of the output gap in real time and often do not even agree on the sign of the gap. The standard deviation of the revisions is of the same order of magnitude as the standard deviation of the output gap for all the methods. The measurement error problem is compounded by a high degree of persistence of the revisions and further by a systematic bias around business cycle turning points. These findings suggest that measurement error would pose a serious policy problem for any of these measures of the output gap.
The relative size and persistence of the revision errors we report are also similar to those associated with "official" real-time output gap estimates, such as those reported in Orphanides (1999).

Some important differences between the alternative methods also emerged. The Beveridge-Nelson method does not give reasonably sized or persistent gaps. Methods which assume a cycle around a constant growth trend (Linear Trend and Watson models) have particularly large revisions due to parameter instability in the estimated trend rate of growth. This confirms that models with time-varying trend rates of growth should be preferred.

We also found that, although important, the revision of published data does not appear to be the primary source of revisions for any of the methods we examined, except the Beveridge-Nelson decomposition, whose comparatively small revisions are dominated by data revisions. Rather, the subsequent evolution of the economy seems to be very informative for estimation of the current position in the business cycle. Thus, even if the reliability of the underlying real-time data were to improve, real-time estimates of the output gap would remain unreliable.

References

Amato, Jeffrey D. (1997), Empirical Models for Monetary Policy Making, unpublished doctoral dissertation, Harvard University, Cambridge, MA.

Baxter, Marianne and Robert G. King (1995), "Measuring Business Cycles: Approximate Band-Pass Filters for Economic Time Series," NBER Working Paper No. 5022.

Beveridge, S. and C. R. Nelson (1981), "A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the 'Business Cycle'," Journal of Monetary Economics, 7, 151-174.

Blanchard, Olivier Jean and Stanley Fischer (1989), Lectures on Macroeconomics, MIT Press, Cambridge, MA.

Cheung, Yin-Wong and Menzie Chinn (1997), "Further Investigation of the Uncertain Unit Root in GNP," Journal of Business and Economic Statistics, 15(1), 68-73.

Christiano, Lawrence J. and Terry J. Fitzgerald (1999), "The Band Pass Filter," NBER Working Paper No. 7257, July.

Clarida, Richard, Jordi Gali, and Mark Gertler (1998), "Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory," NBER Working Paper No. 6442, March.

Clark, Peter K. (1987), "The Cyclical Component of U.S. Economic Activity," Quarterly Journal of Economics, 102(4), 797-814.

Cogley, T. and J. Nason (1995), "Effects of the Hodrick-Prescott Filter on Trend and Difference Stationary Time Series: Implications for Business Cycle Research," Journal of Economic Dynamics and Control, 19(1-2), 253-78.

Council of Economic Advisers (1977), Economic Report of the President, U.S. Printing Office, Washington, D.C.

Croushore, Dean and Tom Stark (1999), "A Real-Time Data Set for Macroeconomists," Manuscript, May.

Enders, Walter (1994), Applied Econometric Time Series, Wiley.

Gerlach, Stefan and Frank Smets (1997), "Output Gaps and Inflation: Unobservable-Components Estimates for the G-7 Countries," Bank for International Settlements mimeo, Basel.

Giorno, Claude, Pete Richardson, Deborah Roseveare, and Paul van den Noord (1995), "Potential Output, Output Gaps and Structural Budget Balances," OECD Economic Studies No. 24, 1, 167-209.

Guay, Alain and Pierre St-Amant (1996), "Do Mechanical Filters Provide a Good Approximation of Business Cycles?" Bank of Canada Technical Report No. 78, Ottawa.

Harvey, A. C. and A. Jaeger (1993), "Detrending, Stylized Facts, and the Business Cycle," Journal of Applied Econometrics, 8, 231-247.

Henderson, R. (1924), "A New Method of Graduation," Actuarial Society of America Transactions, 25, 29-40.

Hodrick, R. and E. Prescott (1997), "Post-war Business Cycles: An Empirical Investigation," Journal of Money, Credit, and Banking, 29, 1-16.

Kichian, Maral (1999), "Measuring Potential Output within a State-Space Framework," Bank of Canada Working Paper 99-9.

King, Robert G. and Sergio Rebelo (1993), "Low Frequency Filtering and Real Business Cycles," Journal of Economic Dynamics and Control, 17(1-2), 207-31.

Kuttner, Kenneth N. (1992), "Monetary Policy with Uncertain Estimates of Potential Output," Economic Perspectives, Federal Reserve Bank of Chicago, 16, 2-15.

Kuttner, Kenneth N. (1994), "Estimating Potential Output as a Latent Variable," Journal of Business and Economic Statistics, 12(3), 361-68.

Kozicki, Sharon (1998), "Multivariate Detrending Under Common Trend Restrictions: Implications for Business Cycle Research," Journal of Economic Dynamics and Control, forthcoming.

Laxton, Doug and Robert Tetlow (1992), A Simple Multivariate Filter for the Measurement of Potential Output, Technical Report No. 59, Ottawa: Bank of Canada.

Lippi, M. and L. Reichlin (1994), "Diffusion of Technical Change and the Decomposition of Output into Trend and Cycle," Review of Economic Studies, 61(1), 19-30.

Maravall, Agustin (1996), "Unobserved Components in Economic Time Series," Banco de España Documento de Trabajo 9609.

McCallum, Bennett and Edward Nelson (1998), "Performance of Operational Policy Rules in an Estimated Semi-Classical Structural Model," NBER Working Paper No. 6599, June.

Nelson, Charles R. and Charles I. Plosser (1982), "Trends and Random Walks in Macroeconomic Time Series: Some Evidence and Implications," Journal of Monetary Economics, 10(2), 139-62.

Orphanides, Athanasios (1997), "Monetary Policy Rules Based on Real-Time Data," Finance and Economics Discussion Series, 1998-03, Federal Reserve Board, December.

Orphanides, Athanasios (1998), "Monetary Policy Evaluation With Noisy Information," Finance and Economics Discussion Series, 1998-50, Federal Reserve Board, October.

Orphanides, Athanasios (1999), "The Quest for Prosperity Without Inflation," Manuscript, May.

Perron, Pierre (1989), "The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis," Econometrica, 57(6), 1361-1401.

Quah, Danny (1992), "The Relative Importance of Permanent and Transitory Components: Identification and Some Theoretical Bounds," Econometrica, 60(1), 107-18.

Rotemberg, Julio and Michael Woodford (1996), "Real-Business-Cycle Models and the Forecastable Movements in Output, Hours, and Consumption," American Economic Review, 86(1), 71-89, March.

Rothman, Philip (1997), "More Uncertainty about the Unit Root in U.S. Real GNP," Journal of Macroeconomics, 19(4), 771-780.

Rudebusch, Glenn (1993), "The Uncertain Unit Root in GNP," American Economic Review, 83, 264-72.

Smets, Frank (1998), "Output Gap Uncertainty: Does it Matter for the Taylor Rule?" BIS Working Paper No. 60, November.

St-Amant, Pierre and Simon van Norden (1998), "Measurement of the Output Gap: A Discussion of Recent Research at the Bank of Canada," Bank of Canada Technical Report No. 79.

Taylor, John B. (1993), "Discretion versus Policy Rules in Practice," Carnegie-Rochester Conference Series on Public Policy, 39, December, 195-214.

Taylor, John B. (1998), "An Historical Analysis of Monetary Policy Rules," NBER Working Paper No. 6768, October.

Watson, Mark W. (1986), "Univariate Detrending Methods with Stochastic Trends," Journal of Monetary Economics, 18, 49-75.

Whittaker, E. T. (1923), "On a New Method of Graduation," Proceedings of the Edinburgh Mathematical Society, 41, 63-75.

Appendix: Alternative Measures of the Output Gap

Let $y^v_t$ be the value of output published at time $v$ for an observation at time $t$. Due to publication lags, we require $t \le v - 1$. The full series published at any point in time $v$ can be written as the vector $Y^v \equiv [y^v_1, y^v_2, \ldots, y^v_{v-1}]$. We can also refer to the subvector $Y^v_N \equiv [y^v_1, y^v_2, \ldots, y^v_N]$.

Now suppose $Z$ is an $N \times M$ matrix consisting of real and non-real numbers. We restrict all its non-real entries (which represent unavailable observations) to lie below the main diagonal. We construct matrices of this form by placing series of different length in the columns, with each series starting in row 1. The remaining entries in each column (after the end of each series) are then filled with some non-real constant. We denote this as

$$Z \equiv z(A, B, \ldots, M) \tag{A.1}$$

where the arguments $A, B, \ldots, M$ are simply column vectors of (possibly) unequal length. We also define the last-value function $l(Z): \mathbb{R}^{N \times M} \to \mathbb{R}^M$, which simply selects the last real observation (i.e. the one with the highest row number) in each column of $Z$. Combining the $z$ and $l$ functions into one gives us

$$\ell(A, B, \ldots, M) \equiv l\big(z(A, B, \ldots, M)\big) \tag{A.2}$$

Suppose that we also have an arbitrary detrending function $f(X): \mathbb{R}^N \to \mathbb{R}^N$. The Final estimate of the output gap for this detrending function is just

$$\hat{Y}_{\text{Final}} \equiv f(Y^M) \tag{A.3}$$

where $M$ is the "final" vintage of data available (in our case, 1999Q1). The RealTime estimate of the output gap is

$$\hat{Y}_{\text{RealTime}} \equiv \ell\big(f(Y^1), f(Y^2), \ldots, f(Y^M)\big) \tag{A.4}$$

The QuasiReal estimate of the output gap is given by

$$\hat{Y}_{\text{QuasiReal}} \equiv \ell\big(f(Y^M_{N-M+1}), f(Y^M_{N-M+2}), \ldots, f(Y^M_N)\big) \tag{A.5}$$

The QuasiFinal estimate of the output gap only exists for detrending functions of the form $f(X; \theta)$, where $\theta$ is a set of parameters. Typically, these parameters describe the data-generating process for $X$, and the maximum-likelihood estimate of the parameters may be denoted $\hat{\theta}(X)$. When the samples which we detrend are the same as those used to estimate the parameters, we may define a new detrending function $g(X) \equiv f(X; \hat{\theta}(X))$, which can be used to construct the conventional RealTime, QuasiReal and Final output gap estimates. In the case of the QuasiFinal estimate, however, we compute

$$\hat{Y}_{\text{QuasiFinal}} \equiv \ell\big(f(Y^M_{N-M+1}; \hat{\theta}(Y^M)), f(Y^M_{N-M+2}; \hat{\theta}(Y^M)), \ldots, f(Y^M_N; \hat{\theta}(Y^M))\big) \tag{A.6}$$
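The constructions in equations A.3 to A.5 can be sketched in a few lines. The version below is a simplification under stated assumptions: each vintage is a plain list holding every observation published at that vintage, each new vintage adds exactly one observation (so the publication lag is absorbed into the indexing), the detrending function $f$ is a linear trend fitted by ordinary least squares, and estimates start once three observations are available. All function names are hypothetical. QuasiFinal is omitted because, for a deterministic trend whose coefficients come from the full final sample, each QuasiFinal value would coincide with the corresponding Final value.

```python
def linear_trend_gap(y):
    """Detrending function f: residuals from OLS of y on a constant and a time trend."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def real_time_estimates(vintages, f, min_obs=3):
    """Equation A.4: the last element of f applied to each vintage's full series."""
    return [f(v)[-1] for v in vintages if len(v) >= min_obs]


def quasi_real_estimates(final_vintage, f, min_obs=3):
    """Equation A.5: the last element of f applied to growing prefixes of the final vintage."""
    return [f(final_vintage[:n])[-1] for n in range(min_obs, len(final_vintage) + 1)]


def final_estimates(final_vintage, f, min_obs=3):
    """Equation A.3, trimmed so that it lines up with the two functions above."""
    return f(final_vintage)[min_obs - 1:]


if __name__ == "__main__":
    final_vintage = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
    # Hypothetical vintages in which the newest observation is first published
    # 0.5 too high and corrected in the next vintage; the last vintage is final.
    vintages = [final_vintage[:n - 1] + [final_vintage[n - 1] + 0.5] for n in range(1, 6)]
    vintages.append(final_vintage)
    rt = real_time_estimates(vintages, linear_trend_gap)
    qr = quasi_real_estimates(final_vintage, linear_trend_gap)
    fe = final_estimates(final_vintage, linear_trend_gap)
    for r, q, fin in zip(rt, qr, fe):
        # Total revision = (Final - QuasiReal) + (QuasiReal - RealTime)
        print(f"total {fin - r:+.3f} = new data {fin - q:+.3f} + data revision {q - r:+.3f}")
```

The printed lines split each total revision into the part attributable to later observations (Final minus QuasiReal) and the data-revision part (QuasiReal minus RealTime), mirroring the rows of Table 4. With unrevised vintages the data-revision part is exactly zero.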

Table 1
Output Gap Summary Statistics: 1966:1–1997:4

| Method | Estimate | MEAN | SD | MIN | MAX | COR |
|---|---|---:|---:|---:|---:|---:|
| Hodrick-Prescott | Final | 0.06 | 1.71 | -4.58 | 3.70 | 1.00 |
| | Quasi-Real | -0.15 | 1.75 | -4.30 | 3.84 | 0.56 |
| | Real-Time | -0.27 | 1.90 | -6.63 | 3.84 | 0.53 |
| Breaking Trend | Final | 0.33 | 2.51 | -6.24 | 4.84 | 1.00 |
| | Quasi-Real | 0.25 | 2.86 | -6.90 | 6.94 | 0.91 |
| | Real-Time | 0.21 | 3.15 | -10.52 | 5.02 | 0.87 |
| Quadratic Trend | Final | 0.55 | 2.54 | -6.93 | 5.35 | 1.00 |
| | Quasi-Real | -1.02 | 2.72 | -7.57 | 6.16 | 0.72 |
| | Real-Time | -0.96 | 3.03 | -10.83 | 4.70 | 0.65 |
| Linear Trend | Final | 1.47 | 4.96 | -7.15 | 9.68 | 1.00 |
| | Quasi-Real | -3.74 | 4.17 | -11.32 | 6.94 | 0.88 |
| | Real-Time | -3.45 | 3.98 | -10.52 | 5.02 | 0.81 |
| Beveridge-Nelson | Final | -0.06 | 0.53 | -1.80 | 1.66 | 1.00 |
| | Quasi-Real | -0.10 | 0.51 | -1.81 | 1.54 | 0.99 |
| | Real-Time | -0.20 | 0.75 | -4.14 | 1.98 | 0.79 |

(Continued next page)

Table 1 (continued)

| Method | Estimate | MEAN | SD | MIN | MAX | COR |
|---|---|---:|---:|---:|---:|---:|
| Clark | Final | 0.24 | 2.11 | -5.38 | 3.84 | 1.00 |
| | Quasi-Final | -0.61 | 1.45 | -4.15 | 3.11 | 0.87 |
| | Quasi-Real | -0.69 | 1.63 | -4.34 | 3.41 | 0.79 |
| | Real-Time | -0.93 | 1.91 | -6.99 | 3.02 | 0.77 |
| Harvey-Jaeger | Final | 0.03 | 1.55 | -3.89 | 3.91 | 1.00 |
| | Quasi-Final | -0.07 | 1.22 | -2.68 | 2.91 | 0.63 |
| | Quasi-Real | -0.04 | 1.35 | -3.12 | 3.21 | 0.58 |
| | Real-Time | -0.10 | 1.48 | -5.04 | 3.01 | 0.56 |
| Watson | Final | 1.32 | 3.44 | -4.37 | 7.19 | 1.00 |
| | Quasi-Final | 0.16 | 3.35 | -4.73 | 6.37 | 0.96 |
| | Quasi-Real | -2.38 | 2.65 | -7.75 | 4.41 | 0.81 |
| | Real-Time | -2.08 | 2.61 | -7.43 | 3.56 | 0.78 |

Notes: The alternative detrending methods are as described in the text. The statistics shown for each variable are: MEAN, the mean; SD, the standard deviation; and MIN and MAX, the minimum and maximum values. COR denotes the correlation with the final estimate of the gap for that method.

Table 2: Summary Revision Statistics, Final vs. Real-Time Estimates, 1966:1–1997:4

Method                    MEAN      SD     MIN     MAX      AR
Hodrick-Prescott          0.32    1.77   −3.41    3.42    0.93
Breaking Trend            0.12    1.54   −4.85    5.40    0.80
Quadratic Trend           1.49    2.36   −3.40    7.56    0.95
Linear Trend              4.97    2.83   −2.33   11.51    0.96
Beveridge-Nelson          0.14    0.46   −1.11    2.66    0.29
Clark                     1.17    1.37   −1.90    4.35    0.92
Harvey-Jaeger             0.12    1.43   −2.93    3.67    0.85
Watson                    3.40    2.16   −1.93    7.53    0.96

Notes: The revision is the final minus the real-time estimate. The detrending methods and statistics are as described in the notes to Table 1. AR denotes the first-order serial correlation of the revision series shown.
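A sketch of the Table 2 revision statistics follows. It takes the revision to be the final minus the real-time estimate, as in the notes to Table 5, and estimates AR as the sample correlation between the revision and its own first lag; the paper may use a slightly different estimator (for example, a regression coefficient), so that is an assumption. `revision_stats` is a hypothetical name.

```python
import numpy as np


def revision_stats(final, real_time):
    """Summary statistics of the revision (final minus real-time), as in Table 2."""
    rev = np.asarray(final, dtype=float) - np.asarray(real_time, dtype=float)
    # First-order serial correlation: correlate the series with itself lagged one quarter.
    ar1 = np.corrcoef(rev[1:], rev[:-1])[0, 1]
    return {
        "MEAN": rev.mean(),
        "SD": rev.std(ddof=1),  # sample standard deviation (assumed convention)
        "MIN": rev.min(),
        "MAX": rev.max(),
        "AR": ar1,
    }


# A steadily growing revision is perfectly persistent; an alternating one is not.
print(revision_stats([1, 2, 3, 4, 5], [0, 0, 0, 0, 0])["AR"])
print(revision_stats([1, -1, 1, -1, 1], [0, 0, 0, 0, 0])["AR"])
```

An AR value near one, as for most methods in Table 2, means that a revision in one quarter tends to carry over into the next, which is what makes the revisions highly persistent.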

Table 3: Summary Reliability Indicators, 1966:1–1997:4

Method                     COR      NS  OPSIGN   XSIZE
Hodrick-Prescott          0.53    1.03    0.40    0.60
Breaking Trend            0.87    0.62    0.12    0.30
Quadratic Trend           0.65    0.93    0.34    0.52
Linear Trend              0.81    0.57    0.52    0.59
Beveridge-Nelson          0.79    0.87    0.21    0.43
Clark                     0.77    0.65    0.31    0.49
Harvey-Jaeger             0.56    0.92    0.41    0.58
Watson                    0.78    0.63    0.51    0.57

Notes: The table shows measures evaluating the size, sign and variability of the revision between the final and the real-time estimates for alternative methods. COR denotes the correlation of the real-time and final estimates (from Table 1). NS is the ratio of the standard deviation of the revision to the standard deviation of the final estimate of the gap. OPSIGN is the frequency with which the real-time and final gap estimates have opposite signs. XSIZE is the frequency with which the absolute value of the revision exceeds the absolute value of the final gap.
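The four indicators defined in the notes to Table 3 can be computed directly from a pair of real-time and final gap series. The sketch below follows those definitions literally. Treating a zero estimate as having no sign (so it never counts toward OPSIGN) and using the sample standard deviation are assumptions, and `reliability` is a hypothetical name. The last few lines check the NS definition against the published tables: dividing the revision SD in Table 2 by the final-estimate SD in Table 1 reproduces the NS column to within about 0.01 (for Watson, 2.16 / 3.44 ≈ 0.63).

```python
import numpy as np


def reliability(real_time, final):
    """COR, NS, OPSIGN and XSIZE as defined in the notes to Table 3."""
    rt = np.asarray(real_time, dtype=float)
    fin = np.asarray(final, dtype=float)
    rev = fin - rt  # revision: final minus real-time
    return {
        "COR": np.corrcoef(rt, fin)[0, 1],
        "NS": rev.std(ddof=1) / fin.std(ddof=1),
        # Share of quarters in which the two estimates have strictly opposite signs.
        "OPSIGN": np.mean(np.sign(rt) * np.sign(fin) < 0),
        # Share of quarters in which the revision exceeds the final gap in absolute value.
        "XSIZE": np.mean(np.abs(rev) > np.abs(fin)),
    }


# Illustrative series (not from the paper).
print(reliability([1, -1, 2, -2], [2, 1, 1, -3]))

# Consistency check against the published tables: NS = SD(revision) / SD(final).
table2_rev_sd = {"Hodrick-Prescott": 1.77, "Linear Trend": 2.83, "Watson": 2.16}
table1_final_sd = {"Hodrick-Prescott": 1.71, "Linear Trend": 4.96, "Watson": 3.44}
table3_ns = {"Hodrick-Prescott": 1.03, "Linear Trend": 0.57, "Watson": 0.63}
for name, ns in table3_ns.items():
    ratio = table2_rev_sd[name] / table1_final_sd[name]
    print(f"{name}: {ratio:.3f} vs reported {ns}")
```

The small gaps in the check come from computing the ratio with the two-decimal values printed in Tables 1 and 2 rather than the unrounded series.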

Table 4: Detailed Breakdown of Revision Statistics, 1966:1–1997:4

Method                        MEAN      SD     MIN     MAX      AR
Hodrick-Prescott
  Final/Real-Time             0.32    1.77   −3.41    3.42    0.93
  Final/Quasi-Real            0.21    1.62   −3.52    3.27    0.97
  Quasi-Real/Real-Time        0.11    0.59   −0.97    2.71    0.60
Breaking Trend
  Final/Real-Time             0.12    1.54   −4.85    5.40    0.80
  Final/Quasi-Real            0.08    1.18   −3.76    2.24    0.87
  Quasi-Real/Real-Time        0.03    1.06   −2.98    3.84    0.77
Quadratic Trend
  Final/Real-Time             1.49    2.36   −3.40    7.56    0.95
  Final/Quasi-Real            1.56    1.97   −1.80    5.14    0.98
  Quasi-Real/Real-Time       −0.09    1.04   −2.89    3.80    0.76
Linear Trend
  Final/Real-Time             4.97    2.83   −2.33   11.51    0.96
  Final/Quasi-Real            5.20    2.35    0.00    8.25    0.96
  Quasi-Real/Real-Time       −0.27    1.20   −3.62    3.84    0.81
Beveridge-Nelson
  Final/Real-Time             0.14    0.46   −1.11    2.66    0.29
  Final/Quasi-Real            0.04    0.06   −0.25    0.26    0.59
  Quasi-Real/Real-Time        0.10    0.46   −1.19    2.62    0.31
Clark
  Final/Real-Time             1.17    1.37   −1.90    4.35    0.92
  Final/Quasi-Final           0.85    1.11   −1.23    3.24    0.93
  Quasi-Final/Quasi-Real      0.07    0.47   −0.86    1.20    0.93
  Quasi-Real/Real-Time        0.24    0.59   −0.75    2.65    0.84
Harvey-Jaeger
  Final/Real-Time             0.12    1.43   −2.93    3.67    0.85
  Final/Quasi-Final           0.10    1.22   −2.47    3.30    0.87
  Quasi-Final/Quasi-Real     −0.03    0.24   −0.48    0.60    0.95
  Quasi-Real/Real-Time        0.06    0.39   −0.60    1.91    0.82
Watson
  Final/Real-Time             3.40    2.16   −1.93    7.53    0.96
  Final/Quasi-Final           1.16    0.92   −0.56    2.81    0.95
  Quasi-Final/Quasi-Real      2.53    1.59   −0.45    4.59    0.98
  Quasi-Real/Real-Time       −0.29    0.95   −2.45    2.35    0.85

Notes: See notes to Tables 1 and 2. Each row gives statistics for the first estimate named minus the second; for example, Final/Real-Time is the revision reported in Table 2.
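Because the intermediate estimates are nested, the total revision telescopes into the components listed in Table 4:
$\text{Final} - \text{RealTime} = (\text{Final} - \text{QuasiFinal}) + (\text{QuasiFinal} - \text{QuasiReal}) + (\text{QuasiReal} - \text{RealTime})$.
For methods without a QuasiFinal estimate, the first two terms collapse into $\text{Final} - \text{QuasiReal}$. The sketch below assumes, following the figure legends, that QuasiReal minus RealTime is the "data revision" and QuasiFinal minus QuasiReal the "parameter revision"; the helper `decompose` and the example arrays are hypothetical. As a check, the Watson means in Table 4 satisfy 1.16 + 2.53 − 0.29 = 3.40, the Final/Real-Time mean.

```python
import numpy as np


def decompose(final, quasi_final, quasi_real, real_time):
    """Split the total revision (Final minus RealTime) into nested components."""
    f, qf, qr, rt = (
        np.asarray(a, dtype=float) for a in (final, quasi_final, quasi_real, real_time)
    )
    parts = {
        "final_vs_quasi_final": f - qf,
        "parameter_revision": qf - qr,  # same data and sample end; parameters re-estimated
        "data_revision": qr - rt,  # same method and sample end; final vs real-time data
    }
    total = f - rt
    # Sanity check: the components telescope, so their sum equals the total revision.
    assert np.allclose(sum(parts.values()), total)
    return total, parts


total, parts = decompose([3.0, 1.0], [2.0, 0.5], [1.5, 0.0], [1.0, -1.0])
print(total.tolist(), {k: v.tolist() for k, v in parts.items()})

# Watson means from Table 4: Final/Quasi-Final, Quasi-Final/Quasi-Real, Quasi-Real/Real-Time.
watson_means = [1.16, 2.53, -0.29]
print(round(sum(watson_means), 2))  # compare with the Final/Real-Time mean of 3.40
```

The identity holds exactly for each quarter, so it also holds for the means; the printed table values can still disagree by a few hundredths because each mean is rounded separately.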

Table 5: Revision Statistics at NBER Peaks, Final vs. Real-Time Estimates

Method                    MEAN      SD     MIN     MAX
Hodrick-Prescott          2.33    0.73    0.77    3.42
Breaking Trend            0.71    0.73   −0.79    1.81
Quadratic Trend           2.95    1.73   −0.51    5.17
Linear Trend              6.11    1.98    2.49    8.60
Beveridge-Nelson          0.31    0.49   −0.44    1.26
Clark                     1.71    1.09    0.01    3.74
Harvey-Jaeger             1.68    0.94   −0.16    2.96
Watson                    4.29    1.64    0.94    6.65

Notes: The revision is defined as the difference between the final and the real-time estimates. For each method, the sample used to compute the revision statistics is limited to the three quarters centered on each of the NBER peaks from 1966 to 1997. See also notes to Table 1.
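Restricting the sample to the three quarters centered on each NBER peak amounts to keeping the quarter before, the quarter of, and the quarter after each peak. A minimal sketch, assuming the revision series and the peaks are indexed by position in the same quarterly sample; the peak positions are inputs here rather than a reproduction of the NBER chronology, and counting any overlapping quarter only once is an assumption. `peak_window_stats` and the example series are hypothetical.

```python
import numpy as np


def peak_window_stats(revision, peak_positions):
    """MEAN, SD, MIN, MAX of the revision over quarters t-1, t, t+1 around each peak t."""
    rev = np.asarray(revision, dtype=float)
    # A set keeps each quarter once even if two peak windows overlap (assumed convention).
    idx = sorted(
        {t + d for t in peak_positions for d in (-1, 0, 1) if 0 <= t + d < len(rev)}
    )
    window = rev[idx]
    return {
        "MEAN": window.mean(),
        "SD": window.std(ddof=1),  # sample standard deviation (assumed convention)
        "MIN": window.min(),
        "MAX": window.max(),
    }


# Illustrative revision series (not from the paper) with "peaks" at positions 2 and 7.
rev = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 4.0, 5.0, 6.0, 0.0])
print(peak_window_stats(rev, [2, 7]))
```

Comparing the MEAN column of Table 5 with that of Table 2 shows how much larger the average revision is near peaks; for Hodrick-Prescott it rises from 0.32 over the full sample to 2.33 around peaks.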

Figure 1: Real-Time Estimates of the Business Cycle (first panel) and Final Estimates of the Business Cycle (second panel), in percent, for the Linear Trend, Quadratic Trend, Breaking Trend, Hodrick-Prescott, Beveridge-Nelson, Watson, Clark and Harvey-Jaeger methods. [Chart not reproduced.]

Figure 2: Total Revision in Business Cycle Estimates, in percent, for the Linear Trend, Quadratic Trend, Breaking Trend, Hodrick-Prescott, Beveridge-Nelson, Watson, Clark and Harvey-Jaeger methods. [Chart not reproduced.]

Figure 3: Estimated Business Cycle: Linear Trend (first panel; series RealTime, Total Revision and Data Revision) and Estimated Business Cycle: Watson (second panel; series RealTime, Total Revision, Data Revision and Parameter Revision), in percent. [Chart not reproduced.]

Figure 4: Estimated Business Cycle: Breaking Linear Trend (first panel) and Estimated Business Cycle: Quadratic Trend (second panel); each panel shows the series RealTime, Total Revision and Data Revision, in percent. [Chart not reproduced.]

Figure 5: Estimated Business Cycle: Hodrick-Prescott (first panel; series RealTime, Total Revision and Data Revision) and Estimated Business Cycle: Harvey-Jaeger (second panel; series RealTime, Total Revision, Data Revision and Parameter Revision), in percent. [Chart not reproduced.]

Figure 6: Estimated Business Cycle: Beveridge-Nelson (first panel; series RealTime, Total Revision and Data Revision) and Estimated Business Cycle: Clark (second panel; series RealTime, Total Revision, Data Revision and Parameter Revision), in percent. [Chart not reproduced.]

Figure 7: Real-Time 95% Confidence Interval and Final Estimates, Unobserved Component Models, with one panel each for Watson, Clark and Harvey-Jaeger, in percent. [Chart not reproduced.]

Cite this document

APA

Athanasios Orphanides and Simon van Norden (1999). The Reliability of Output Gap Estimates in Real Time (FEDS 1999-38). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_1999-38

BibTeX

@techreport{wtfs_feds_1999_38,
  author = {Athanasios Orphanides and Simon van Norden},
  title = {The Reliability of Output Gap Estimates in Real Time},
  type = {Finance and Economics Discussion Series},
  number = {1999-38},
  institution = {Board of Governors of the Federal Reserve System},
  year = {1999},
  url = {https://whenthefedspeaks.com/doc/feds_1999-38},
  abstract = {Compared to its central role in policy discussions in the United States and most other developed countries, the reliability of the measurement of the output gap has attracted relatively little academic study. Furthermore, both the academic literature and the debate among practitioners have tended to neglect a key factor. Although in a policy setting it is necessary to estimate the current (i.e. end-of-sample) output gap without the benefit of knowing the future, most studies concentrate on measurement that employs data that only become available later. In this paper we examine the reliability of alternative output detrending methods, with special attention to the accuracy of real-time estimates. We show that ex post revisions of the output gap are of the same order of magnitude as the output gap itself, that these ex post revisions are highly persistent and that real-time estimates tend to be severely biased around business cycle turning points, when the cost of policy induced errors due to incorrect measurement is at its greatest. We investigate the reasons for these ex post revisions, and find that, although important, the ex post revision of published data is not the primary source of revisions in output gap measurements. The bulk of the problem is due to the pervasive unreliability of end-of-sample estimates of the trend in output.},
}