Missing Variation in the Great Moderation: Lack of Signal Error and OLS Regression
Abstract
This paper studies measurement errors that subtract signal from true variables of interest, labeled lack of signal errors (LoSE). The effect on OLS regression of LoSE is opposite the conventional wisdom about classical measurement errors, with LoSE in the dependent variable, not the explanatory variables, causing attenuation bias under some conditions. The paper provides evidence of LoSE in US GDP growth during the period known as the Great Moderation (roughly the mid-1980s to the mid-2000s), illustrating attenuation bias in regressions of GDP growth on asset prices. These biases may have contributed to conventional macroeconomic analysis missing the severity of the adverse shocks hitting the economy in the Great Recession.
Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.
Jeremy J. Nalewaik
2014-27
NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.
Jeremy J. Nalewaik ∗
April 7, 2014

∗ Board of Governors of the Federal Reserve System, 20th Street and Constitution Avenue NW, Washington, DC 20551. Telephone: 1-202-452-3792. Fax: 1-202-872-4927. E-mail: jeremy.j.nalewaik@frb.gov. Thanks to Katherine Abraham, Charles Fleischman, Michael Kiley, David Lebow, Claudia Sahm, Jonathan Millar and Rob Vigfusson for comments. The ideas and analysis in this paper grew out of an earlier one called: “Lack of Signal Error (LoSE) and Implications for OLS Regression: Measurement Error for Macro Data” (FEDS 2008-15). The views expressed in this paper are solely those of the author.
1 Introduction

This paper examines a simple generalization of the classical measurement error model and studies its implications for ordinary least squares (OLS) regression. The usual model starts with the true variable of interest and adds noise, or classical measurement error (CME); see Klepper and Leamer (1984), Griliches (1986), Fuller (1987), Leamer (1987), Angrist and Krueger (1999), Bound, Brown and Mathiowetz (2001) or virtually any econometrics textbook. The generalization discussed here incorporates measurement error that subtracts signal from the true variable of interest, a type of measurement error that this paper calls Lack of Signal Error, or LoSE for short.

The implications of LoSE for OLS regression are opposite the usual intuition about measurement error, which is applicable to CME only. The CME intuition says that, in a regression, measurement error in the dependent variable Y poses no real problems for standard estimation and inference, with parameter estimates unbiased and consistent. It is CME in the explanatory variables X that causes the real problems, namely attenuation bias and inconsistency. However, with LoSE these results are reversed. For the baseline case considered here, LoSE in the explanatory variables X does not lead to bias or inconsistency, similar to CME in Y. It is LoSE in the dependent variable Y that introduces an attenuation-type bias and inconsistency into the regression under some circumstances, namely, when the explanatory variables contain some signal missing from the dependent variable.

This point is obvious when we consider the extreme case of maximum LoSE, an error-ridden estimate Y of the true variable $Y^\star$ that is just a constant equal to the unconditional mean of $Y^\star$. Then if $Y^\star = X\beta + U$ for some X with positive variance, a standard OLS regression of Y on X recovers $\hat{\beta} = \frac{\mathrm{cov}(X,Y)}{\mathrm{var}(X)} = 0$ regardless of the true $\beta$. All
of the variation in $Y^\star$ from X is missing from the estimate Y, so the parameter estimate is biased all the way to zero. In addition, the LoSE in Y shrinks the variance of the regression residuals U and thus the standard errors, which are zero in this extreme case, raising serious concerns about the robustness of hypothesis tests. Indeed, parameter estimates that have been attenuated and estimated with false precision due to LoSE easily could have led to the rejection of hypotheses that are actually true. The paper derives instrumenting strategies to eliminate bias from LoSE, strategies not derived in the previous literature.

Is LoSE just a curiosity, interesting because it runs contrary to conventional wisdom about the effect of measurement error on regression estimates, but not relevant for the type of work economists actually do? It has long been known that the initial releases of macroeconomic quantities like US gross domestic product (GDP) are contaminated with LoSE; see Mankiw and Shapiro (1986), who show that revisions to GDP growth add news missing from its initial estimates, implying they lack signal. However, it has always been an open question as to whether all of the news, or close to all of the news, about true output growth eventually becomes incorporated through revisions. This paper provides evidence that, over the period known as the Great Moderation (roughly the mid-1980s to the mid-2000s), the answer is no: GDP growth still appears to be contaminated with substantial LoSE even after it has passed through all of its revisions.

Regressions of GDP growth and its subcomponents on asset prices are widespread in macroeconomics and finance, and if asset prices capture some of the signal missing from these quantities, the estimated coefficients are biased. The paper examines this hypothesis over the Great Moderation period by regressing different measures of output growth on a fixed set of stock or bond prices. As expected, the regression coefficients
increase when we switch the dependent variable from the initial GDP growth estimates based on limited source data to the revised GDP growth estimates reflecting news from more-comprehensive source data, consistent with LoSE in the initial GDP growth estimates. Tellingly, the coefficients increase again when we switch the dependent variable from latest, revised GDP growth to an alternative and likely superior measure of US output growth, GDI growth.1 This increase in the coefficients is consistent with the hypothesis that LoSE remains in GDP growth even after it has passed through all of its revisions.

1 On the superiority of GDI, see Nalewaik (2010) and Aruoba, Diebold, Nalewaik, Schorfheide, and Song (2012, 2013).

Finally, the paper implements the instrumenting strategies derived here for producing unbiased and consistent parameter estimates when the dependent variable of a regression is contaminated with LoSE. The instrumental variables estimates, which do not employ GDI growth at all, provide independent corroborating evidence of substantial LoSE in latest, revised GDP growth over the Great Moderation period.

The attenuation biases from LoSE discussed here can lead applied macroeconomic analysis astray in ways not fully appreciated by the prior literature. For example, the paper illustrates how these biases may have contributed to conventional analysis underestimating the size of the shocks hitting the economy at the height of the Great Recession, leading to some prominent and widely-discussed forecast errors. A better understanding of the implications of LoSE in GDP growth might help avoid such forecasting mistakes in the future.

Section 2 discusses the relation of the work here to the previous literature. After providing a brief introductory motivation for the generalized measurement error model in section 3, section 4 shows the implications of LoSE for OLS regression and derives
valid instruments for dealing with LoSE-induced bias. Section 5 discusses the data and choice of instruments. Section 6 shows regression-based tests and instrumental variables estimates providing evidence for LoSE in GDP growth. Section 7 shows how the attenuation biases from LoSE in GDP growth may have contributed to conventional macroeconomic analysis missing the severity of the shocks hitting the economy at the height of the Great Recession. Section 8 concludes the paper.

2 Relation to Previous Literature

Much of the econometrics literature on non-classical measurement error has focused on binary or categorical response data, for which the classical measurement error assumptions cannot hold; see Card (1996), Bollinger (1996), and Kane, Rouse and Staiger (1999). In a more general linear regression context, Berkson (1950) was an early paper tackling some of the issues addressed here; see the discussion in Durbin (1954), Griliches (1986, section 4), and Fuller (1987, section 1.6.4). Berkson had in mind a regression using “controlled” measurements as the explanatory variable X, readings from a scientific experiment where the unobserved true values of interest $X^\star$ fluctuate around the observed controlled measurements in a random way. Berkson showed that if the unobserved fluctuations $X^\star - X$ are uncorrelated with the measurements X, then regression parameter estimates are unbiased. The literature following Berkson has generally focused on extending his results to regressions employing non-linear functions of X; see Geary (1953), Federov (1974), Carroll and Stefanski (1990), Huwang and Huang (2000), and Wang (2003, 2004). This literature has focused less on the implications of “controlled” measurements of the dependent variable Y.
Use of measurement equations has a long history in economics, with Friedman (1957) being a famous and notable early example, and several papers discuss different LoSE-related estimation issues. These include Sargent (1989), who uses measurement equations in state space models to discuss optimally filtered estimates, Bound, Brown and Mathiowetz (2001), Koenig, Dolmas and Piger (2003), Kimball, Sahm and Shapiro (2008), Kishor and Koenig (2011), Jacobs and van Norden (2011), and Clements and Galvao (2013). Following the pioneering work of Koenig, Dolmas and Piger (2003), many of these papers, such as Clements and Galvao (2013), focus on the implications for forecasting of LoSE in initial macroeconomic estimates (so later revisions yield “news”). In this context, using the LoSE-biased parameter estimates often can yield the most accurate forecasts (especially of the initial estimates), although section 7 below points out some previously-unknown pitfalls of using LoSE-biased parameter estimates in a structural model.

Perhaps the closest paper to this one is Hyslop and Imbens (HI, 2001), which shows some of the major implications of LoSE, while simultaneously considering some other measurement error biases. The results in this paper are distinct from those in HI in at least four ways.

First, in defining the LoSE in a variable as the difference between its true value and a conditional expectation of that true value, this paper considers arbitrary conditioning information sets Z, while HI consider more specialized information sets in a univariate regression context.2

2 An estimate may be informed by many different variables, as government statistical agencies draw on vast information sets in producing their estimates. However, the examples in HI are stylized ones meant to make a point, and they do acknowledge the importance of the conditioning information set: “A crucial ingredient ... is the information set. It may be that the respondent had only a single unbiased measurement of the underlying true variable. Alternatively, other variables, which themselves may enter the econometric model of interest, may be used to produce this estimate.”

Second, when variables are mismeasured with LoSE, this
paper derives precise conditions under which instrumental variables produce consistent estimates. The previous literature has not derived valid instrumenting strategies.

Third, the previous literature does not discuss the problems that LoSE-shrunken standard errors pose for hypothesis testing.

And fourth, this paper shows evidence of LoSE in US GDP growth even after it has passed through all of its revisions, both by comparing GDP growth with an alternative measure of output, GDI growth, and by implementing the instrumenting strategies derived here. These are important contributions of the paper.

A large body of empirical work has now accumulated on mismeasurement of microeconomic survey data, which generally rejects the CME assumptions and points to negative correlation between the measurement errors and the true variables of interest; see Bound and Krueger (1991), Bound, Brown, Duncan and Rodgers (1994), Pischke (1995), Bollinger (1998), Bound, Brown and Mathiowetz (2001) and the references therein, and Escobal and Laszlo (2008). Such negative correlation is an implication of LoSE, although other measurement error models may generate such a result as well, such as those considered in the appendix of this paper. Some of the problems of imputation in microeconomic surveys are very much related to LoSE as well; see Hirsch and Schumacher (2004) and Bollinger and Hirsch (2006).
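To see why LoSE implies this negative correlation, a short derivation may help. It is only a sketch, and it borrows the notation of the generalized model introduced in section 3: measured $Y_t = Y^\star_t - \zeta_t + \varepsilon_t$, with the CME $\varepsilon_t$ independent of $Y^\star_t$ and $\mathrm{cov}(Y^\star_t, \zeta_t) = \mathrm{var}(\zeta_t)$. Under those assumptions:

$$\mathrm{cov}(Y_t - Y^\star_t,\; Y^\star_t) = \mathrm{cov}(-\zeta_t + \varepsilon_t,\; Y^\star_t) = -\mathrm{cov}(\zeta_t, Y^\star_t) = -\mathrm{var}(\zeta_t) \le 0.$$

The total measurement error $Y_t - Y^\star_t$ is therefore negatively correlated with the true value whenever the LoSE has positive variance, regardless of the amount of CME.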
3 A Generalization of the Classical Measurement Error Model

Let $Y^\star_t$ be the true variable of interest and $Y_t$ be a mismeasured estimate of that variable. The generalized model of mismeasurement considered here is:

$$Y_t = Y^\star_t - \zeta_t + \varepsilon_t. \tag{1}$$

The term $\varepsilon_t$ is “noise” or the classical measurement error (CME) in the estimate, with $\varepsilon_t$ and $Y^\star_t$ independent. The CME may arise from estimation errors or other sources. Since many estimates $Y_t$ are based on surveys, survey sampling errors are often thought to be a source of CME. The other measurement error is defined as:

$$\zeta_t = Y^\star_t - E(Y^\star_t \mid Z_t). \tag{2}$$

$Z_t$ is a $(1 \times l)$ vector of possibly stochastic variables used to construct $Y_t$, with $\varepsilon_t$ and $Z_t$ independent. In many cases a government statistical agency or some other organization computes $Y_t$ based on information from surveys, administrative records, and other data sources (source data for short); then $Z_t$ consists of functions of the source data. We place no restrictions on $Z_t$; it may be arbitrarily large, unlike the stylized examples of LoSE studied in HI.3

3 However, $Z_t$ need not be an exhaustive information set, i.e. it need not contain all available relevant pieces of information about unobserved $Y^\star_t$. Resource and other constraints certainly preclude this from being the case, and the sections below considering the implications of LoSE allow for this possibility.

The $\zeta_t$ represents the information about $Y^\star_t$ not contained in $Z_t$, i.e. mismeasurement from lack of signal about $Y^\star_t$ in the information used to construct $Y_t$. As such, $\zeta_t$ may be labelled the Lack of Signal Error, or LoSE for short. The LoSE is uncorrelated with all functions of $Z_t$, so $\mathrm{cov}(E(Y^\star_t \mid Z_t), \zeta_t) = 0$ and $\mathrm{cov}(Y^\star_t, \zeta_t) = \mathrm{cov}(E(Y^\star_t \mid Z_t) + \zeta_t, \zeta_t) = \mathrm{var}(\zeta_t)$, so:

$$\begin{aligned}
\mathrm{var}(Y_t) &= \mathrm{var}(Y^\star_t) + \mathrm{var}(\zeta_t) - 2\,\mathrm{cov}(Y^\star_t, \zeta_t) + \mathrm{var}(\varepsilon_t) \\
&= \mathrm{var}(Y^\star_t) - \mathrm{var}(\zeta_t) + \mathrm{var}(\varepsilon_t).
\end{aligned} \tag{3}$$

Depending on whether the variance of the LoSE is greater than or less than the variance of the CME, the variance of the estimate $Y_t$ may be greater than or less than the variance of true $Y^\star_t$. With CME alone, the variance of the estimate $Y_t$ must exceed the variance of the true variable, but it is easy to think of counterexamples, such as when $Y^\star_t$ has positive variance but the estimate $Y_t$ is just a constant. Note that while the generalized model here is less restrictive than the CME model, some restrictions do remain. In particular, zero covariance between $\zeta_t$ and $Y_t$ is a restriction violated by systematic biases in the estimates. Appendix A considers some measurement error models of this form.

4 Implications for OLS Estimation

Consider ordinary least squares estimation of the relation between a mismeasured variable $Y_t$ and a $(1 \times k)$ set of explanatory variables $X_t$, using a sample of length T. When stacking together the T observations, time subscripts are dropped for convenience. The results below are for the case in which $Y_t$ follows the generalized model of section 3, and $X_t$ is measured without error, as is the case in the empirical work below. The most interesting empirical results show through to this specialized case of no measurement
error in $X_t$; the more general case, in which both $X_t$ and $Y_t$ follow the generalized model of section 3, is analyzed in an appendix. Our full set of assumptions follows:

Assumption 1 $Y^\star_t = X^\star_t \beta + U^\star_t$. $U^\star_t$ is i.i.d., mean zero, with $\mathrm{var}(U^\star_t) = \sigma^2_{U^\star}$ and $U^\star_s$ independent of $X^\star_t$, $\forall\, t, s$. Measured $Y_t = E(Y^\star_t \mid Z^y_t) + \varepsilon_t$, with:
• The CME $\varepsilon_t$ is i.i.d., mean zero, and independent of $X^\star_t$ and $Z^y_t$, with $\mathrm{var}(\varepsilon_t) = \sigma^2_\varepsilon$.
• The LoSE $\zeta_t = (X^\star_t - E(X^\star_t \mid Z^y_t))\beta + (U^\star_t - E(U^\star_t \mid Z^y_t)) = \zeta^{xy}_t \beta + \zeta^u_t$. $\zeta^u_t$ is i.i.d., independent of $X^\star_t$, and mean zero with $\mathrm{var}(\zeta^u_t) = \sigma^2_{\zeta,u}$, while $\zeta^{xy}_t$ is i.i.d. and mean zero with $\mathrm{var}(\zeta^{xy}_t) = \sigma^2_{\zeta,xy}$, a $k \times k$ matrix.
Measured $X_t = X^\star_t$, with:
• $\frac{1}{T}(X^\star)' X^\star \xrightarrow{p} Q_{xx}$
• $\frac{1}{T}\left(E(X^\star \mid Z^y)\right)' E(X^\star \mid Z^y) \xrightarrow{p} Q^{zy}_{xx} = Q_{xx} - \sigma^2_{\zeta,xy}$
All relevant fourth moments exist.

We impose the i.i.d. assumptions because they are approximately met in the applications below, and because it allows discussion of bias as well as consistency.4 However, for other time series applications, the i.i.d. assumption will be overly restrictive, and relaxing it could be a topic for future research.

4 The time series of latest GDP growth estimates over the Great Moderation sample studied here has an AR1 coefficient of only 0.2, and the errors from the GDP growth regressions in section 6 are even less persistent, with Breusch-Godfrey tests not rejecting independence of the errors across various lag lengths.
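As a rough numerical illustration of Assumption 1 (a sketch only, not part of the paper's analysis), the Python simulation below builds a scalar $X^\star$ from one component that is in the information set $Z^y$ and one that is not, constructs measured Y as the conditional expectation plus CME, and reports the OLS slope of Y on X together with the variances of Y and $Y^\star$. The function name `simulate_lose`, the Gaussian draws, and every parameter value are assumptions made for illustration.

```python
import numpy as np


def simulate_lose(T=200_000, beta=1.0, share_seen=0.5,
                  var_u_seen=0.5, var_u_missing=0.5, sd_eps=0.5, seed=0):
    """Simulate Assumption 1 with scalar X and return summary statistics.

    All names and parameter values are hypothetical, chosen for illustration.
    var(X*) is normalized to 1, and `share_seen` is the share of that
    variance captured by the information set Z^y.
    """
    rng = np.random.default_rng(seed)

    # X* = E(X*|Z^y) + zeta_xy, with the two pieces independent.
    x_seen = rng.normal(0.0, np.sqrt(share_seen), T)
    zeta_xy = rng.normal(0.0, np.sqrt(1.0 - share_seen), T)
    x = x_seen + zeta_xy

    # U* = E(U*|Z^y) + zeta_u, both pieces independent of X*.
    u_seen = rng.normal(0.0, np.sqrt(var_u_seen), T)
    zeta_u = rng.normal(0.0, np.sqrt(var_u_missing), T)

    y_star = beta * x + u_seen + zeta_u      # true variable Y*
    eps = rng.normal(0.0, sd_eps, T)         # classical measurement error
    y = beta * x_seen + u_seen + eps         # measured Y = E(Y*|Z^y) + eps

    # OLS slope of measured Y on X, computed on demeaned data.
    xc, yc = x - x.mean(), y - y.mean()
    slope = float(xc @ yc / (xc @ xc))

    return {
        "ols_slope": slope,
        "var_y": float(y.var()),
        "var_y_star": float(y_star.var()),
    }


if __name__ == "__main__":
    print(simulate_lose())
```

Under these assumed parameters the population OLS slope is $\beta \times$ `share_seen` $= 0.5$ rather than the true $\beta = 1$, and the population variance of measured Y is $1.25$ against $2$ for $Y^\star$, so the simulated values should land close to those figures in a large sample.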
Given assumption 1, $Y_t$ can be written as:

$$\begin{aligned}
Y_t &= E(X_t \mid Z^y_t)\beta + E(U^\star_t \mid Z^y_t) + \varepsilon_t \\
&= X_t\beta + \left(E(X^\star_t \mid Z^y_t) - X_t\right)\beta + E(U^\star_t \mid Z^y_t) + \varepsilon_t \\
&= X_t\beta - \zeta^{xy}_t\beta + U^\star_t - \zeta^u_t + \varepsilon_t.
\end{aligned} \tag{4}$$

The OLS regression estimator is:

$$\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'Y \\
&= \beta + (X'X)^{-1}X'\left(-\zeta^{xy}\beta + U^\star - \zeta^u + \varepsilon\right).
\end{aligned} \tag{5}$$

It is well known that the CME in Y introduces no bias or inconsistency, since $\varepsilon$ is independent of X. The LoSE in $U^\star$ introduces no bias or inconsistency either, since it is uncorrelated with X. However, $X = E(X \mid Z^y) + \zeta^{xy}$ is clearly not independent of $-\zeta^{xy}\beta$, and:

$$E\left(\hat{\beta}\right) = \beta - E\left((X'X)^{-1}X'\zeta^{xy}\right)\beta \tag{6}$$

$$\hat{\beta} \xrightarrow{p} \beta - (Q_{xx})^{-1}\sigma^2_{\zeta,xy}\,\beta \tag{7}$$

The inconsistency of $\hat{\beta}$ tends towards zero, since some variation in X that appears in $Y^\star$ is missing from mismeasured Y, essentially driving down the covariance between X and Y, and the parameter estimates as well since the variance of X is not biased down. If X is univariate, the inconsistency of $\hat{\beta}$ is unambiguously towards zero, similar to standard attenuation bias from CME in the explanatory variable of a regression.

The inconsistency of $\hat{\beta}$ can be corrected by instrumenting with a $(1 \times m)$ set of
instruments $W_t$, with $m \ge k$, if the instruments meet the following set of assumptions:

Assumption 2 With $P_W = W(W'W)^{-1}W'$, $\frac{1}{T}X'P_W X \xrightarrow{p} Q^w_{xx}$, a positive semi-definite matrix, and $\frac{1}{T}X'P_W\left(-\zeta^{xy}\beta + U^\star - \zeta^u + \varepsilon\right) \xrightarrow{p} 0$. All relevant fourth moments exist.

The instruments must be uncorrelated with $-\zeta^{xy}_t$, for example if $W_t \in Z^y_t$, so that $W_t$ is independent of the information about $X_t$ missing from $Y_t$. With valid instruments, we have:

$$\begin{aligned}
\hat{\beta} &= \left(X'P_W X\right)^{-1}X'P_W Y \\
&= \beta + \left(X'P_W X\right)^{-1}X'P_W\left(-\zeta^{xy}\beta + U^\star - \zeta^u + \varepsilon\right),
\end{aligned} \tag{8}$$

and $\hat{\beta} \xrightarrow{p} \beta$.

The asymptotic distribution of the IV estimate $\hat{\beta}$ is:

$$\sqrt{T}\left(\hat{\beta} - \beta\right) \xrightarrow{d} N\left(0,\; (Q^w_{xx})^{-1}\left(\sigma^2_{U^\star} - \sigma^2_{\zeta,u} + \sigma^2_\varepsilon + \beta'\sigma^2_{\zeta,xy}\beta\right)\right),$$

where $\xrightarrow{d}$ denotes convergence in distribution as $T \to \infty$, and $N(a, b)$ is a Gaussian distribution with mean a and variance b. The usual estimator of the variance of the error term is:

$$\begin{aligned}
s^2 &= \frac{1}{T}\left(E(X \mid Z^y)\beta + E(U^\star \mid Z^y) + \varepsilon - X\hat{\beta}\right)'\left(E(X \mid Z^y)\beta + E(U^\star \mid Z^y) + \varepsilon - X\hat{\beta}\right) \\
&= \frac{1}{T}E(U^\star \mid Z^y)'E(U^\star \mid Z^y) + \frac{1}{T}\varepsilon'\varepsilon + \frac{1}{T}\beta'E(X \mid Z^y)'E(X \mid Z^y)\beta - \frac{1}{T}\beta'E(X \mid Z^y)'X\hat{\beta} \\
&\quad - \frac{1}{T}\hat{\beta}'X'E(X \mid Z^y)\beta + \frac{1}{T}\hat{\beta}'X'X\hat{\beta} + \frac{1}{T}\,\text{cross terms}.
\end{aligned}$$

The first two terms converge in probability to $\sigma^2_{U^\star} - \sigma^2_{\zeta,u} + \sigma^2_\varepsilon$, and the cross terms
converge in probability to zero. The terms involving $\beta$ and $\hat{\beta}$ simplify in the limit since $\hat{\beta} \xrightarrow{p} \beta$, producing a consistent estimate of the asymptotic error variance:

$$s^2 \xrightarrow{p} \sigma^2_{U^\star} - \sigma^2_{\zeta,u} + \sigma^2_\varepsilon + \beta'\sigma^2_{\zeta,xy}\beta.$$

The $\zeta^u$ component of the LoSE in Y decreases the variance of the regression residuals and standard errors (whether or not the estimator is consistent), and the LoSE in Y can be particularly pernicious when it both attenuates parameter estimates and shrinks the regression standard errors. In these circumstances, the econometrician runs a high risk of rejecting a candidate hypothesis $\beta = \beta_0$, as long as $\beta_0$ is non-zero, even when the hypothesis is actually true. Regressions with such LoSE in Y will tend to show an estimated relation between Y and X that is smaller in absolute value than the true relation, with the true size of the relation appearing implausible because of excessively small standard errors.

5 Data: US Macroeconomic Quantities

The decision to test for LoSE in US GDP growth over the Great Moderation period is motivated by several considerations. GDP is estimated in a bottom-up, component-by-component fashion using government survey data to estimate spending for each category (consumption, investment, etc.) and then aggregating.5

5 The BEA does not use the information in stock or bond prices to make any direct or indirect adjustments to this bottom-up estimation procedure, to the author’s knowledge.

But government survey data at the quarterly frequency is unavailable for many categories comprising a large share of
GDP, including most services categories of personal consumption expenditures.6 Growth rates for these categories are typically interpolated or extrapolated using related indicators, or estimated as a “trend extrapolation.” It is difficult to imagine how this lack of hard information would not introduce some LoSE into GDP growth, and that LoSE may have become more consequential over time as the share of services in US output has increased.

6 This situation has begun to change with the introduction of the Quarterly Services Survey (QSS) in 2002, but these data have no material effect on the Great Moderation period.

Several comparisons with an alternative measure of output growth, GDI growth, are also consistent with LoSE in GDP growth over the Great Moderation.7 The output growth estimates are plotted in Figure 1 over this period, from the mid-1980s to the mid-2000s.8 GDI growth has higher variance than GDP growth over this sample, which, under the generalized measurement error model in section 3, may stem from some combination of: (1) a relatively large amount of CME in GDI growth, boosting its variance, and (2) a relatively large amount of LoSE in GDP growth, damping its variance. The upcoming evidence in section 6 favors placing more weight on the second explanation.

7 GDI is also estimated in a bottom-up fashion, estimating income from various categories (wages and salaries, profits, etc.) and then aggregating. Most of the income-side data is ultimately based on tax and administrative records, rather than samples as is the case for GDP.

8 The “latest” time series employed in this paper are as they appeared close to the end of the Great Moderation, in August 2007. The results in section 6 are similar using later data vintages.

Earlier research on revisions (see Fixler and Nalewaik (2007)) supports this notion as well. Briefly, Table 1 shows that the variance of GDI growth becomes relatively large only after the data pass through their sequence of annual revisions (GDI is unavailable when the “advance” estimates are released for each quarter about a month after the quarter ends, but is always available when the “3rd” estimates are released about three
months after the quarter ends, and the variances of the “3rd” GDP and GDI growth estimates are almost equal).9 Subsequent annual and benchmark revisions incorporate more comprehensive and higher-quality source data, plausibly reducing measurement error in the estimates, either LoSE or CME. If the bulk of the measurement error eliminated by the revisions is LoSE, so the revisions mainly add news to the estimates as in Mankiw and Shapiro (1986), the variance of the estimates should increase as in Table 1. Moreover, the revisions increase the variance of GDI growth more than the variance of GDP growth, consistent with the revisions adding more news to GDI growth than GDP growth. The implication is that GDP growth is missing some news or signal, and is thus contaminated with LoSE.

9 Each quarterly observation in the “advance” or “3rd” time series is the estimate for that quarter released about one or three months after that quarter ends.

The statistics in Table 1 suggest that, of these output growth estimates, “advance” GDP growth is contaminated with the most LoSE, as one would expect since it is based on the least amount of information. Somewhat counterintuitively, it is just such variables that are likely to meet the conditions of Assumption 2 and provide valid instruments W, motivating the instrumenting strategy below. The baseline regressions are of an output growth estimate on current and lagged stock and bond prices, which may reflect some information missing from the output growth estimates.10

10 Dynan and Elmendorf (2001) show that asset prices predict revisions to GDP growth, evidence that asset prices contain information missed by the initial estimates of GDP growth. Asset prices may contain information missed by the fully-revised estimates of GDP growth as well, entering through publicly-available information about the state of the economy not fully incorporated into GDP growth, such as the source data used to compute GDI, or as the aggregation of all the private information of asset market participants.

Since stock and bond prices
are measured with little error, we have:

$$\begin{aligned}
\Delta Y^\star &= X\beta + U^\star \\
\Delta Y^i &= X\hat{\beta}^i + U^i,
\end{aligned}$$

where i indexes output growth estimates. In this case, the instruments must be uncorrelated with $\left(E\left(X \mid Z^{y^i}\right) - X\right)$, which is the information missing from output growth estimate i that is captured by the asset prices X. Paradoxically, an instrument based on a smaller information set, while remaining correlated with X, is more likely to be uncorrelated with this missing information and thus meet the conditions of Assumption 2. In particular, contemporaneous and lagged “advance” GDP growth rates are presumably in the information sets used to compute the various output growth estimates examined here, all released after the “advance” estimate. The identifying assumption employed here is that the “advance” GDP growth estimate for each quarter, and lagged “advance” estimates, are uncorrelated with whatever information remains missing from later, revised estimates of GDP growth for that quarter.

Subcomponents of “advance” GDP growth are likely in the information sets used to compute those later, revised GDP growth estimates as well. Equipment and software (E&S) investment is an appealing subcomponent to use as an instrument because it produces a high first-stage R-square, its growth rate being highly correlated with stock price changes and bond spreads as predicted by Q-theory; see Tobin (1969) and Philippon (2009). For this reason, current and lagged “advance” growth rates of real E&S investment are the main set of instruments W employed in the paper.
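The identifying logic of this section can be sketched in a short simulation. It is a stylized illustration under assumed parameters, not the paper's estimation code: `w` stands in for an “advance” series that is in the information set $Z^y$, `s` is a second hypothetical source-data series also in $Z^y$, and `x` plays the role of an asset price that additionally carries signal $\zeta^{xy}$ missing from measured output growth `y`. The helper `tsls`, the function `simulate_iv`, and all parameter values are hypothetical.

```python
import numpy as np


def tsls(y, X, W):
    """Two-stage least squares of y on X with instruments W (no constant).

    y: (T,), X: (T, k), W: (T, m) with m >= k; variables assumed mean zero.
    """
    first_stage, *_ = np.linalg.lstsq(W, X, rcond=None)   # shape (m, k)
    X_hat = W @ first_stage                               # fitted values P_W X
    beta_iv, *_ = np.linalg.lstsq(X_hat, y, rcond=None)   # shape (k,)
    return beta_iv


def simulate_iv(T=200_000, beta=0.5, gamma=0.6, lam=0.5,
                var_zeta=0.39, sd_eps=1.0, seed=0):
    """Return (OLS slope, IV slope) from a stylized LoSE-in-Y design."""
    rng = np.random.default_rng(seed)

    w = rng.normal(size=T)                     # instrument, inside Z^y
    s = rng.normal(size=T)                     # other source data, inside Z^y
    zeta_xy = rng.normal(0.0, np.sqrt(var_zeta), T)   # signal in X missing from Y

    x = gamma * w + lam * s + zeta_xy          # explanatory variable
    eps = rng.normal(0.0, sd_eps, T)           # classical measurement error

    # Measured Y = E(Y*|Z^y) + eps. For simplicity this sketch assumes
    # E(U*|Z^y) = 0, so U* does not enter measured Y at all.
    y = beta * (gamma * w + lam * s) + eps

    X = x[:, None]
    W = w[:, None]
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_iv = tsls(y, X, W)
    return float(beta_ols[0]), float(beta_iv[0])


if __name__ == "__main__":
    print(simulate_iv())
```

With these assumed values var(X) equals 1 and the share of it contained in $Z^y$ is 0.61, so the OLS slope has a population value of $0.5 \times 0.61 = 0.305$, while the IV slope has a population value equal to the true $\beta = 0.5$. The instrument uses less information than the full set $Z^y$, which is the sense in which a smaller information set can still deliver a valid instrument.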
6 Regression Evidence for LoSE in GDP growth in the Great Moderation Under the model of section 4, the OLS βi estimated in this section are governed by equation (7) with X measured without error, and we have: (9) βGDI βGDP p (Q ) −1 σ2 σ2 β, − −→ xx ζGDP,xy − ζGDI,xy (cid:0) (cid:1) d d where σ2 is the variance of bias-inducing LoSE ζxy in estimate ∆Yi. When X is uni- ζi,xy variate, a test of βGDI = βGDP equivalent to a test of σ2 = σ2 if Q and β ζGDP,xy ζGDI,xy xx are treated as consdtants. βdGDI > βGDP is then consistent with positive bias-inducing | | | | LoSE variance in ∆YGDP.1d1 Furtherdmore, for the instrumental variables estimates reported below, standard Durbin-Wu-Hausman tests (see Hausman (1978)) are available to test whether the OLS estimates βGDP are biased towards zero as in (7). Table 2 shows estimation resultsdusing as the explanatory variable X an average of current and lagged stock price growth; standard errors are in parentheses.12 Such a specification can be motivated in several ways, but for our purposes, it suffices that a relation between true output growth ∆Y⋆ and stock prices X exists governed by a true parameter vector β.13 Comparing the first two specifications of Table 2, we see that β 11Anearlierworkingpaperreportedtestsofequalitybetweentheβsacrossregressions,whichMontecarlo simulations (of an environment where one dependent variable is more contaminated with LoSE than another) showed have the expected properties of a textbook t-statistic. 12Thestandarderrorsarecorrectedforheteroskedasticityandautocorrelation,althoughthereislittle evidence of either. Standard errors computed under the assumption of i.i.d. errors are very similar to those reported here. 13An earlier workingpaper examined multivariate specifications of Table 2 that estimated the coefficientoneachlagseparately,whichyieldedresultsverysimilartothosereportedhere. Themainresults 17
increases when switching the dependent variable from “advance” GDP growth to latest GDP growth, consistent with LoSE in “advance” GDP growth.14 Switching from latest GDP growth to latest GDI growth, β increases again, consistent with LoSE in not only the “advance” GDP growth estimates, but also latest, revised GDP growth. Of course, other explanations for this result are possible, but appear less likely. First, alternative measurement error modelsthatdonot meet therestrictions ofsection3could hold. Appendix A examines such a model in which GDP and GDI growth are crudely rescaled versions of true output growth, and finds that it is inconsistent with results from reverse regressions. Second, and more obviously, stock prices could be reacting to estimates of corporate profits, which are a component of GDI, more than to output. However, if this were true, β should be particularly large using the initial estimates of GDI (and profits) to which the stock market reacts in real time. The fourth column of the table shows this is not the case. Moreover, the fifth column shows that β actually increases when corporate profits are stripped out of GDI. The last specification of table 2, the instrumental variables estimate, does not use GDI growth at all, and is consistent with even more LoSE-inducing bias in latest GDP growth than is evident based on the comparison with GDI growth. In particular, this estimate implies attenuation of the OLS β computed using latest GDP growth of about here are also robust to the inclusion of control variables such as lags of the output growth measures. The stock price changes are quarterly growth rates of the Wilshire 5000 stock price index, while the output growth measures are annualized quarterly growth rates as in table 1, so the effect on the level of output in percentage points of a permanent 1 percent stock price increase is roughly the reported coefficientdividedby4. 
The stock price index is nominal, and the results change little if the stock price index is deflated.

14 Substituting either the 2nd or 3rd GDP growth estimates for the “advance” estimates yields a very similar β of 0.13, consistent with these early revisions not adding any of the missing signal that is reflected in stock price movements. The greater signal in the latest estimates is added in subsequent annual revisions.
60 percent.15 The Durbin-Wu-Hausman test rejects the hypothesis of no bias in that OLS β with a p-value of 0.02.

Table 3 shows similar results using bond spreads: the difference in yield between 10-year and 2-year US treasury notes (TERM), and the difference in yield between corporate bonds and 10-year treasury notes (DEF).16 Many papers have used similar variables to forecast output growth; see Chen (1991) and Estrella and Hardouvelis (1991), for example. The results (where each pair of βs is from a separate regression) provide almost uniform evidence favoring LoSE-induced attenuation of the OLS coefficients computed using either “advance” or latest, revised GDP growth. All of the $\beta_{DEF}$ coefficients increase in absolute value when switching the dependent variable from “advance” to latest GDP growth and again when switching from latest GDP growth to latest GDI growth. Similarly, all of the $\beta_{TERM}$ coefficients increase when switching from latest GDP growth to latest GDI growth except for k ≤ 2, horizons where the explanatory power of TERM is weakest.

The instrumental variables estimates in Table 3 are generally consistent with even more LoSE-inducing bias in latest GDP growth than is evident from the comparison with GDI growth. The Durbin-Wu-Hausman tests reject the hypothesis of no bias in the OLS βs computed using latest GDP growth, with p-values ranging from 0.07 (for k = 1) to 0.002 (for k = 4 and k = 5). The instruments are highly correlated with DEF at all horizons k, with high first-stage $R^2$s. The very large instrumental variables

15 This result is robust to the choice of instruments likely to meet the conditions of Assumption 2. In particular, estimates using only lagged “advance” E&S growth rates, excluding the contemporaneous growth rate from W, yield a β of 0.47. Substituting “advance” GDP growth for “advance” E&S growth in W cuts down on the first-stage $R^2$ considerably, but yields the same β of 0.47.

16 The corporate bond yield measure is the Merrill Lynch High Yield Master II Index.
This series extends back only as far as 1986; hence the shorter sample for these regressions.
estimates of $\beta_{TERM}$ should be discounted for k ≤ 2 as the instruments are weak, but for the longer horizons where the instruments have higher first-stage $R^2$s, the IV $\beta_{TERM}$s are consistent with LoSE-induced attenuation of the OLS $\beta_{TERM}$s computed using latest GDP growth of between 50 and 70 percent. This degree of attenuation bias, similar to that found using stock prices, could be related to some puzzles regarding the continued forecasting power of the yield curve, as noted in Rudebusch and Williams (2009). In particular, the LoSE in GDP growth may have masked the long-horizon forecasting power of the yield curve to forecasters focused on predicting GDP growth rather than recessions, which are dated based on a broad array of indicators, including income data, that may be less contaminated with LoSE than is GDP growth.

7 Application: Underestimating the Depth of the Great Recession

The key macroeconomic forecasting question at the end of 2008 and early 2009 was, given the extraordinary turmoil in financial markets, how sharply would the real economy turn down? Financial markets had already tanked by that time, so the issue was how to translate the information in financial markets into a forecast of real economic activity. The Blue Chip Consensus Forecast for the unemployment rate issued in January 2009 (the solid blue line in Figure 2) underpredicted the actual rise in the unemployment rate (the black line) by a wide margin, as did even the average of the top ten Blue Chip forecasts (the dashed blue line). Publicly-available government forecasts, such as the red line, did not do much better (see Romer and Bernstein, 2009). Macroeconomic analysts typically use an Okun’s law-type relation to translate GDP forecasts into unemployment
rate forecasts, so LoSE in GDP growth may have contributed to these forecast errors. The LoSE in the “advance” 2008Q4 GDP growth estimate is obvious: it badly underestimated the severity of the downturn, revising down from -3.8 percent (annualized) to -6.3 percent two months later and even more subsequently, and this poor initial estimate may have contributed to the overly-optimistic unemployment rate forecasts.

More subtly, LoSE in the latest available GDP growth estimates over the preceding Great Moderation period may have been problematic for forecasting the depth of the Great Recession as well.17 Consider the OLS regressions from table 3 using latest GDP growth.18 Figure 2 plots three additional forecasts of the unemployment rate, the green solid, dashed and dotted lines, using the first difference of the unemployment rate, real GDI growth, and real GDP growth as they appeared in December 2008 as dependent variables in the regression specification in Table 4.19 The forecasts for GDI growth and GDP growth are translated into unemployment rate forecasts using an Okun’s law relation.20

17 Note that while this was an important episode in the history of macroeconomic forecasting, the results in this section are meant to be illustrative only. For more comprehensive out-of-sample forecast analyses, see Koenig, Dolmas and Piger (2003) and Clements and Galvao (2013).

18 These regressions were posted to the Federal Reserve Board website in March 2008, prior to the massive intensification of financial market turmoil discussed in this section.

19 Specifically, the forecasts for 2008Q4, 2009Q1, 2009Q2, 2009Q3, and 2009Q4 are predicted values from five regressions as in table 4 (for k = 0, 1, 2, 3, and 4), with the average values for the corporate bond spread and the slope of the yield curve in December 2008 used to produce predicted values.
The average level of the high-yield corporate bond spread was almost 20 percentage points in December 2008, compared to an average level of around 4 percentage points during expansions. During the previous two recessions, this spread had peaked at around 10 percentage points.

20 This is estimated by regressing the quarterly change in the unemployment rate on the contemporaneous quarterly output growth measure and two of its lags, using a 1959Q4 to 2008Q3 sample. Note that, if the primary source of measurement error in the output growth measures is LoSE, these regressions yield consistent parameter estimates since the LoSE-ridden variables are explanatory, and the downward biases from the first stage regressions using bond spreads are passed through to the unemployment rate forecasts. In contrast, in the crude rescaling model outlined in Appendix A, the
The unemployment rate forecasts produced directly from bond spreads (the solid green line) track the rise in unemployment almost perfectly over the first three quarters of the projection, before overshooting in the second half of 2009. The Okun’s law translation of the GDP growth projection (the dotted green line) undershoots these direct forecasts of the unemployment rate by about a half a percentage point in 2008Q4 and one and a half percentage points in 2009Q4. Interestingly, the Okun’s law translation of the GDI growth projection (the dashed green line) also undershoots the direct forecasts of the unemployment rate, and the unemployment rate itself for much of the forecast period. But, since the bond spread coefficients are larger in absolute value, the undershooting is considerably less than using GDP. In particular, in the first half of 2009, more than half of the forecast error from the Okun’s law translation of GDP growth disappears when we switch from GDP growth to GDI growth, likely because GDI is less contaminated with LoSE.

This suggests that, after financial markets tanked in late 2008, LoSE in GDP growth over the Great Moderation period contributed to the failure of conventional macroeconomic models and analysis to forecast the severity of the Great Recession. Had that analysis employed the information in GDI growth, instead of focusing solely on GDP growth, it might not have misread the signals from financial markets so badly.

8 Conclusions

The canonical classical measurement error (CME) model is too restrictive to handle important types of measurement error, including measurement error in one of the most

bias in the second stage regressions would largely offset the bias in the first stage regressions using bond spreads, which does not appear to be the case empirically.
widely-followed macroeconomic time series, US GDP growth. The paper studies a simple generalization of the CME model that is mathematically tractable, embeds the CME model as a special case, and adds useful flexibility. Instead of just allowing measurement error that adds noise to the true variable of interest, the generalization permits measurement errors that subtract signal from that variable, called Lack of Signal Errors, or LoSE, for short.

In some ways, this generalization of the CME model is the flip side of the coin regarding the effect of errors in variables on ordinary least squares regression. CME in the dependent variable of a regression Y does not bias parameter estimates and increases standard errors, and, in the baseline case studied here, LoSE in the explanatory variables X has the same effect. Of course, CME in the explanatory variables X does bias regression parameter estimates, towards zero in the univariate case; LoSE in the dependent variable Y introduces a similar attenuation bias under some circumstances, namely, when some of the signal missing from the dependent variable Y is captured by the explanatory variables X. LoSE in Y also shrinks the variance of the regression residuals, raising concerns about the robustness of hypothesis tests by increasing the probability of type I errors. In the limiting case of maximal LoSE, Y approaches a constant, and in a regression of Y on any non-constant variable X, $\hat{\beta} = \frac{cov(X,Y)}{var(X)}$ and $var(\hat{\beta})$ approach zero, regardless of the true β. The result is badly attenuated parameter estimates, estimated with false precision.

On a positive note, the results derived here provide some clear prescriptions for handling this type of attenuation, in terms of choice of instruments. The previous literature had not developed instrumenting strategies for dealing with bias from LoSE.

The paper provides evidence for LoSE not only in the initial GDP growth estimates
based on limited source data, but also the latest, revised GDP growth estimates based on more comprehensive data. In particular, coefficients from regressions of the GDP growth estimates on a fixed set of stock or bond prices are smaller than coefficients from regressions that substitute for GDP an alternative measure of US output, GDI, that is likely more accurate than GDP over the Great Moderation period; see Nalewaik (2010) and Aruoba, Diebold, Nalewaik, Schorfheide, and Song (2012, 2013). These results are consistent with LoSE in GDP growth even after it has passed through all of its revisions. The paper shows that some other forms of non-classical measurement error cannot explain the differences in coefficients across these regressions. Furthermore, implementation of the instrumenting strategies derived in this paper, which rely in no way on the information in GDI growth, corroborates and provides independent amplifying evidence of substantial LoSE in latest, revised GDP growth over the Great Moderation period.

Some implications of significant LoSE in latest, revised GDP growth and its major subcomponents follow immediately. Those variables are simply less informative than many macroeconomists currently believe, given the common but incorrect presumption that the fully-revised estimates are measured with little error. And in a macroeconomic forecasting context, the attenuation biases discussed here can lead to serious mistakes. In particular, in late 2008 and early 2009, conventional macroeconomic analysis severely underestimated the size of the shocks that had hit the economy and that were already reflected in the behavior of asset prices. The paper demonstrates that part of that forecast error may have been due to the focus of conventional macroeconomic analysis on GDP growth: LoSE in GDP growth likely biased down the coefficients employed to translate asset prices into forecasts of output and unemployment. A better understanding of the
implications of LoSE in GDP growth may help avoid such forecast errors in the future.

References

[1] Angrist, J., and Krueger, A. (1999), “Empirical Strategies in Labor Economics,” in Handbook of Econometrics (Vol. 5), eds. O. Ashenfelter and D. Card, Amsterdam: Elsevier.

[2] Aruoba, S.B., F.X. Diebold, J. Nalewaik, F. Schorfheide, and D. Song (2012), “Improving GDP Measurement: A Forecast Combination Perspective,” in X. Chen and N. Swanson (eds.), Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honour of Halbert L. White Jr., Springer, 1-26.

[3] Aruoba, S.B., F.X. Diebold, J. Nalewaik, F. Schorfheide, and D. Song (2013), “Improving GDP Measurement: A Measurement Error Perspective,” submitted by invitation, Journal of Econometrics.

[4] Berkson, J. (1950), “Are There Two Regressions?” Journal of the American Statistical Association, 45, 164-180.

[5] Bernstein, J., and Romer, C. (2009), “The Job Impact of the American Recovery and Reinvestment Plan,” Council of Economic Advisers report, January 10, 2009.

[6] Bollinger, C. (1996), “Bounding Mean Regression When a Binary Regressor is Mismeasured,” Journal of Econometrics, 73, 387-399.
[7] Bollinger, C. (1998), “Measurement Error in the Current Population Survey: A Nonparametric Look,” Journal of Labor Economics, 16, 576-594.

[8] Bollinger, C., and Hirsch, T. (2006), “Match Bias from Earnings Imputation in the Current Population Survey: The Case of Imperfect Matching,” Journal of Labor Economics, 24, 483-519.

[9] Bound, J., Brown, C., and Mathiowetz, N. (2001), “Measurement Error in Survey Data,” in Handbook of Econometrics (Vol. 5), eds. J.J. Heckman and E. Leamer, Amsterdam: Elsevier.

[10] Bound, J., Brown, C., Duncan, G., and Rodgers, W. (1994), “Evidence on the Validity of Cross-sectional and Longitudinal Labor Market Data,” Journal of Labor Economics, 12, 345-368.

[11] Bound, J., and Krueger, A. (1991), “The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?,” Journal of Labor Economics, 9, 1-24.

[12] Campbell, J. and Mankiw, G. (1989), “Consumption, Income, and Interest Rates: Reinterpreting the Time Series Evidence,” in NBER Macroeconomics Annual, eds. O. Blanchard and S. Fischer, Cambridge, NBER.

[13] Card, D. (1996), “The Effect of Unions on the Structure of Wages: A Longitudinal Analysis,” Econometrica, 64, 957-979.

[14] Carroll, R., and Stefanski, L. (1990), “Approximate Quasi-likelihood Estimation in Models with Surrogate Predictors,” Journal of the American Statistical Association, 85, 652-663.
[15] Chen, N. (1991), “Financial Investment Opportunities and the Macroeconomy,” Journal of Finance, 46, 529-554.

[16] Clements, M. P., and Galvao, A. B. (2013), “Real-Time Forecasting of Inflation and Output Growth with Autoregressive Models in the Presence of Data Revisions,” Journal of Applied Econometrics, 28, 458-477.

[17] de Leeuw, Frank, and McKelvey, Michael J. (1983), “A ‘True’ Time Series and Its Indicators,” Journal of the American Statistical Association, 78, 37-46.

[18] Durbin, J. (1954), “Errors in Variables,” Review of the International Statistical Institute, 22, 23-32.

[19] Dynan, K. and Elmendorf, D. (2001), “Do Provisional Estimates of Output Miss Economic Turning Points?” Board of Governors of the Federal Reserve System, FEDS working paper 2001-52.

[20] Escobal, J., and Laszlo, S. (2008), “Measurement Error in Access to Markets,” Oxford Bulletin of Economics and Statistics, 70, 209-243.

[21] Estrella, A., and Hardouvelis, G. (1991), “The Term Structure as a Predictor of Real Economic Activity,” Journal of Finance, 46, 555-576.

[22] Fixler, D. and Nalewaik, J. (2007), “News, Noise, and Estimates of the True Unobserved State of the Economy,” Board of Governors of the Federal Reserve System, FEDS working paper 2007-34.

[23] Federov, V. V. (1974), “Regression Problems with Controllable Variables Subject to Error,” Biometrika, 61, 49-56.
[24] Fuller, W. (1987), Measurement Error Models, New York: John Wiley and Sons.

[25] Geary, R. C. (1953), “Non-Linear Functional Relationship Between Two Variables When One Variable is Controlled,” Journal of the American Statistical Association, 48, 94-103.

[26] Griliches, Z. (1986), “Economic Data Issues,” in Handbook of Econometrics (Vol. 3), eds. Z. Griliches and M.D. Intriligator, Amsterdam: Elsevier.

[27] Hausman, J. (1978), “Specification tests in econometrics,” Econometrica, 46, 1251-1272.

[28] Hirsch, T., and Schumacher, E. (2004), “Match Bias in Wage Gap Estimates Due to Earnings Imputation,” Journal of Labor Economics, 22, 689-722.

[29] Huwang, L., and Huang, Y. H. (2000), “On Errors-In-Variables in Polynomial Regression-Berkson Case,” Statistica Sinica, 10, 923-936.

[30] Hyslop, R., and Imbens, Guido R. (2001), “Bias from Classical and Other Forms of Measurement Error,” Journal of Business and Economic Statistics, 19, 475-481.

[31] Jacobs, J. and van Norden, S. (2011), “Modelling Data Revisions: Measurement Error and the Dynamics of “True” Values,” Journal of Econometrics, 161, 101-109.

[32] Kane, T., Rouse, C., and Staiger, D. (1999), “Estimating Returns to Schooling when Schooling is Misreported,” working paper 7235, NBER, Cambridge, MA.

[33] Koenig, E., Dolmas, S., and Piger, J. (2003), “The Use and Abuse of Real-Time Data in Economic Forecasting,” The Review of Economics and Statistics, 85, 618-628.
[34] Kimball, Miles; Sahm, Claudia; and Shapiro, Matthew (2008), “Imputing Risk Tolerance from Survey Responses,” Journal of the American Statistical Association, 103, 1028-1038.

[35] Kishor, N. K. and Koenig, E. (2011), “VAR Estimation and Forecasting when Data are Subject to Revision,” Journal of Business and Economic Statistics, 29.

[36] Klepper, S., and Leamer, E. (1984), “Consistent Sets of Estimates for Regressions with Errors in All Variables,” Econometrica, 52, 163-183.

[37] Leamer, E. (1987), “Errors in Variables in Linear Systems,” Econometrica, 55, 893-909.

[38] Mankiw, N. and Shapiro, M. (1986), “News or Noise: An Analysis of GNP Revisions,” Survey of Current Business, 66, 20-25.

[39] Nalewaik, J. (2010), “The Income- and Expenditure-Side Estimates of U.S. Output Growth,” Brookings Papers on Economic Activity, 1, 71-106.

[40] Newey, W.K. and West, K.D. (1987), “A Simple, Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703-708.

[41] Philippon, T. (2009), “The Bond Market’s q,” Quarterly Journal of Economics, 124, 1011-1056.

[42] Pischke, J-S. (1995), “Measurement Error and Earnings Dynamics: Some Estimates from the PSID Validation Study,” Journal of Business and Economic Statistics, 13, 305-314.
[43] Rudebusch, G., and Williams, J. (2009), “Forecasting Recessions: The Puzzle of the Enduring Power of the Yield Curve,” Journal of Business and Economic Statistics, 27, 492-503.

[44] Sargent, T. (1989), “Two Models of Measurement and the Investment Accelerator,” Journal of Political Economy, 97, 251-287.

[45] Tobin, J. (1969), “A General Equilibrium Approach to Monetary Theory,” Journal of Money, Credit, and Banking, 1, 15-29.

[46] Wang, L. (2003), “Estimation of Nonlinear Berkson-Type Measurement Error Models,” Statistica Sinica, 13, 1201-1210.

[47] Wang, L. (2004), “Estimation of Nonlinear Models with Berkson Measurement Errors,” Annals of Statistics, 32, 2559-2579.
Table 1: Summary Statistics on Vintages of GDP and GDI Growth, Quarterly Data, 1984Q3-2004

Vintage                          var(ΔY_t^GDP)    var(ΔY_t^GDI)
Current Quarterly, “Advance”     3.1              .
Current Quarterly, “3rd”         4.1              4.0
Latest Vintage Available         4.2              4.8

Note: Each quarterly observation in the “advance” or “3rd” time series is the estimate for that quarter released about one or three months after that quarter ends.

Table 2: Regressions of Different Measures of Quarterly Output Growth on Current and Lagged Stock Price Growth, 1984Q3 to 2004Q4:

$\Delta Y_t^i = \alpha + \beta\,(\Delta p_t + \Delta p_{t-1} + \ldots + \Delta p_{t-6})/7 + U_t^i$

Measure:      ΔY^GDP       ΔY^GDP     ΔY^GDI     ΔY^GDI     ΔY^(GDI-CP)   ΔY^GDP
Vintage:      “Advance”    Latest     Latest     “3rd”      Latest        Latest
Estimation:   OLS          OLS        OLS        OLS        OLS           IV
β:            0.142        0.214      0.325      0.152      0.389         0.522
              (0.060)      (0.068)    (0.073)    (0.075)    (0.078)       (0.213)

Note: The instruments are the time t “advance” growth rate of real equipment and software investment, scaled by its share of nominal GDP to approximate contributions to real GDP growth, and 6 of its lags; the first stage $R^2$ is 0.22.
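The attenuation of "about 60 percent" cited in section 6 can be reproduced from the Table 2 point estimates. The short sketch below is only an arithmetic check: the variable and function names are illustrative (not from the paper), standard errors are ignored, and the IV or GDI coefficient is treated as the benchmark purely for the sake of the calculation.

```python
# Point estimates copied from Table 2 (standard errors are ignored here).
beta_ols_gdp_latest = 0.214   # OLS, latest GDP growth
beta_ols_gdi_latest = 0.325   # OLS, latest GDI growth
beta_iv_gdp_latest = 0.522    # IV, latest GDP growth


def implied_attenuation(beta_ols, beta_benchmark):
    """Share of the benchmark coefficient that is missing from the OLS one."""
    return 1.0 - beta_ols / beta_benchmark


# Attenuation of the OLS GDP coefficient relative to the IV estimate.
attenuation_vs_iv = implied_attenuation(beta_ols_gdp_latest, beta_iv_gdp_latest)
# Attenuation relative to the OLS coefficient obtained with GDI growth.
attenuation_vs_gdi = implied_attenuation(beta_ols_gdp_latest, beta_ols_gdi_latest)
# Rough effect on the level of output of a permanent 1 percent stock price
# increase: coefficient divided by 4, because output growth is annualized.
level_effect_iv = beta_iv_gdp_latest / 4.0

print(f"attenuation vs IV:  {attenuation_vs_iv:.2f}")
print(f"attenuation vs GDI: {attenuation_vs_gdi:.2f}")
print(f"level effect (IV):  {level_effect_iv:.3f}")
```

The comparison with GDI alone implies a smaller attenuation than the IV estimate does, which matches the text's statement that the IV results point to more bias than the GDI comparison reveals.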
Table 3: Regressions of Different Measures of Quarterly Output Growth on Lagged Interest Rates Spreads (TERM and DEF), 1988Q3 to 2004Q4:

$\Delta Y_t^i = \alpha + \beta_{TERM}\left(r^{10yr}_{t-k} - r^{2yr}_{t-k}\right) + \beta_{DEF}\left(r^{corp}_{t-k} - r^{10yr}_{t-k}\right) + U_t^i$

Measure:      ΔY^GDP, “Advance”    ΔY^GDP, Latest      ΔY^GDI, Latest      ΔY^GDP, Latest
Estimation:   OLS                  OLS                 OLS                 IV, E&S
              β_TERM    β_DEF      β_TERM    β_DEF     β_TERM    β_DEF     β_TERM    β_DEF
k=1           0.20      -0.50      0.31      -0.61     0.23      -0.79     2.83      -1.11
              (0.26)    (0.13)     (0.26)    (0.13)    (0.29)    (0.10)    (2.68)    (0.36)
k=2           0.42      -0.44      0.48      -0.53     0.43      -0.69     2.75      -0.79
              (0.26)    (0.12)     (0.31)    (0.12)    (0.33)    (0.13)    (1.18)    (0.27)
k=3           0.58      -0.38      0.60      -0.40     0.68      -0.65     1.68      -0.64
              (0.30)    (0.12)     (0.36)    (0.15)    (0.37)    (0.15)    (0.59)    (0.20)
k=4           0.62      -0.23      0.57      -0.28     0.70      -0.50     1.87      -0.49
              (0.32)    (0.15)     (0.39)    (0.17)    (0.40)    (0.17)    (0.55)    (0.21)
k=5           0.59      -0.19      0.67      -0.29     0.75      -0.41     1.97      -0.41
              (0.35)    (0.14)     (0.38)    (0.14)    (0.44)    (0.19)    (0.63)    (0.23)
k=6           0.72      -0.27      0.76      -0.32     0.92      -0.39     1.75      -0.31
              (0.35)    (0.10)     (0.38)    (0.13)    (0.41)    (0.16)    (0.63)    (0.24)
k=7           0.73      -0.19      0.81      -0.20     0.96      -0.39     1.84      -0.18
              (0.35)    (0.10)     (0.36)    (0.13)    (0.38)    (0.15)    (0.73)    (0.22)
k=8           0.66      -0.10      0.72      -0.15     0.94      -0.27     1.72      -0.15
              (0.34)    (0.13)     (0.36)    (0.14)    (0.37)    (0.15)    (0.71)    (0.21)

Note: The instruments are the time t “advance” growth rate of real equipment and software investment, scaled by its share of nominal GDP to approximate contributions to real GDP growth, and k of its lags. The first stage $R^2$s for
DEF range from 0.43 to 0.53, depending on k. The first stage $R^2$s for TERM range from 0.01 (k=1) to 0.26 (k=8).

Appendix A: Alternative forms of mismeasurement

Start with the conditioning information set $Z_t$, and assume $E(Y_t^\star \mid Z_t) = Z_t\gamma$. In an alternative form of mismeasurement, the estimate $Y_t$ misuses $Z_t$, so $Y_t = Z_t\tilde{\gamma} + \varepsilon_t$ with $\tilde{\gamma} \neq \gamma$. The estimate “misses” in a systematic way, inconsistent with the efficiency assumptions of section 2. For estimation and inference about $Y^\star$ (for example in regressions), these systematic “misses” clearly lead to biased and inconsistent estimates. Unless additional information is available about the nature of $Z_t\gamma - Z_t\tilde{\gamma}$, the direction and magnitude of these biases are unclear, but in highly stylized examples the biases may be derived. One such example is $Y_t = \alpha_0 + \alpha_1 Y_t^\star + \varepsilon_t$, with $\alpha_0 \neq 0$ and $\alpha_1 \neq 1$, and $\varepsilon_t$ noise. This model is employed by de Leeuw and McKelvey (1983), Bound, Brown, Duncan and Rodgers (1994), Pischke (1995), and Bound, Brown and Mathiowetz (2001).

In the case of latest GDP and GDI growth, ignoring constants, consider:

$Y^{GDP} = \alpha^{GDP} Y^\star + \varepsilon^{GDP}$

and:

$Y^{GDI} = \alpha^{GDI} Y^\star + \varepsilon^{GDI},$

with $\varepsilon^{GDP}$ and $\varepsilon^{GDI}$ noise. In this model, the regressions in table 2 pin down the relative αs, since:

(10) $\frac{\hat{\beta}^{GDI}}{\hat{\beta}^{GDP}} \xrightarrow{p} \frac{\alpha^{GDI}}{\alpha^{GDP}}.$
Interestingly, reverse regressions $X = Y\beta_r + U_r$ yield:

(11) $\frac{\hat{\beta}_r^{GDI}}{\hat{\beta}_r^{GDP}} \xrightarrow{p} \frac{\alpha^{GDP}}{\alpha^{GDI}} \cdot \frac{var(Y^\star) + \sigma^2_{\varepsilon^{GDP}}/(\alpha^{GDP})^2}{var(Y^\star) + \sigma^2_{\varepsilon^{GDI}}/(\alpha^{GDI})^2}.$

While an increase in $\alpha^{GDP}$, ceteris paribus, decreases the ratio (10) from the forward regression, if $var(Y^\star) > \sigma^2_{\varepsilon^{GDP}}$, it increases the ratio (11) from the reverse regression. So, if the variance of true GDP growth exceeds the variance of the noise in measured GDP growth (and GDI growth), which seems plausible, these ratios (10) and (11) move in opposite directions with respect to $\alpha^{GDP}$ (and $\alpha^{GDI}$) under this crude rescaling model.

Table 2 implies $\alpha^{GDI}/\alpha^{GDP} = 1.5$, so we should observe $\hat{\beta}_r^{GDI}/\hat{\beta}_r^{GDP} = 0.66$ in the reverse regression in table 2A with no noise in either estimate. The ratio is very far from that, 1.32. Adding some noise variance to both estimates takes the implied ratio from (11) closer to 1, but it exceeds 1 only if the noise variance in GDP growth exceeds the noise variance in GDI growth by an implausibly large amount. Specifically, a ratio of 1.32 could be consistent with (11) if half the variance of GDP growth were noise uncorrelated with GDI growth, but the estimated covariance between GDP and GDI growth is larger than half the variance of GDP growth, so this is highly unlikely: a test of the hypothesis that this covariance (about 3.1 over the 1984Q3 to 2004Q4 sample) is half the variance of GDP growth (about 4.2) rejects with a p-value of 0.01.21

Similarly, univariate specifications similar to table 3 but using only DEF imply $\alpha^{GDI}/\alpha^{GDP}$ ranging from 1.3 to 2.0, as can be seen comparing the third and fourth

21 These calculations assume no noise in GDI growth, but examination of (11) shows that allowing for noise in GDI growth only increases the already implausibly-large fraction of the variance of GDP growth that must be noise under the crude rescaling model. Allowing a realistic amount of noise in GDI growth, then, only reduces the plausibility of the crude rescaling model.
columns of table 3A. However, comparing the sixth and seventh columns, we see the coefficients using GDI growth as the explanatory variable are once again larger in absolute value than the coefficients using GDP growth, inconsistent with the crude rescaling model and plausible assumptions about the noise variances.

By contrast, the generalized LoSE model outlined in section 3 yields the following for the reverse regressions:

(12) $\hat{\beta}_r^{GDI} - \hat{\beta}_r^{GDP} \xrightarrow{p} \left(\sigma^2_{\varepsilon^{GDP}} - \sigma^2_{\varepsilon^{GDI}}\,\frac{var(Y^{GDP})}{var(Y^{GDI})}\right)\frac{\beta_r}{var(Y^{GDP})}.$

An increase in $\hat{\beta}^{GDI}$ relative to $\hat{\beta}^{GDP}$ implies an increase in the variance of LoSE in $Y^{GDP}$ relative to $Y^{GDI}$, which reduces $\frac{var(Y^{GDP})}{var(Y^{GDI})}$ and increases $\hat{\beta}_r^{GDI} - \hat{\beta}_r^{GDP}$, all else equal. This model is much more consistent with the results from the reverse regressions.

Table 2A: Reverse Regressions of Current and Lagged Stock Price Growth on Different Measures of Quarterly Output Growth, 1984Q3 to 2004Q4:

$(\Delta p_t + \Delta p_{t-1} + \ldots + \Delta p_{t-6})/7 = \alpha + \beta_r \Delta Y_t^i + U^i_{r,t}$

Measure:      ΔY^GDP       ΔY^GDP     ΔY^GDI     ΔY^GDI     ΔY^(GDI-CP)
Vintage:      “Advance”    Latest     Latest     “3rd”      Latest
β_r:          0.411        0.454      0.600      0.338      0.497
              (0.194)      (0.182)    (0.169)    (0.161)    (0.140)
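The behavior of the forward and reverse ratios under the crude rescaling model can be checked with a small simulation. The sketch below is illustrative only and is not the paper's code: it assumes a scalar X, Gaussian draws, and hypothetical parameter values, and it computes the reverse-regression limit directly from the model definitions $Y^i = \alpha^i Y^\star + \varepsilon^i$ (noise variances divided by the squared α).

```python
import numpy as np


def slope(dep, reg):
    """OLS slope from regressing dep on reg (with a constant)."""
    c = np.cov(dep, reg)
    return c[0, 1] / c[1, 1]


def rescaling_ratios(alpha_gdp, alpha_gdi, noise_var_gdp, noise_var_gdi,
                     beta=0.5, n=400_000, seed=0):
    """Simulate the crude rescaling model and return
    (forward ratio, reverse ratio, analytical limit of the reverse ratio)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # explanatory variable
    y_star = beta * x + rng.normal(size=n)      # true output growth
    # Rescaled measures plus classical noise (hypothetical parameter values).
    y_gdp = alpha_gdp * y_star + np.sqrt(noise_var_gdp) * rng.normal(size=n)
    y_gdi = alpha_gdi * y_star + np.sqrt(noise_var_gdi) * rng.normal(size=n)

    forward = slope(y_gdi, x) / slope(y_gdp, x)   # Y on X, GDI over GDP
    reverse = slope(x, y_gdi) / slope(x, y_gdp)   # X on Y, GDI over GDP

    var_y_star = beta ** 2 + 1.0                  # var(Y*) in this design
    reverse_limit = (alpha_gdp / alpha_gdi) * (
        (var_y_star + noise_var_gdp / alpha_gdp ** 2)
        / (var_y_star + noise_var_gdi / alpha_gdi ** 2)
    )
    return forward, reverse, reverse_limit


# No noise: the forward ratio is alpha_gdi / alpha_gdp and the reverse
# ratio is its reciprocal, so the two move in opposite directions.
no_noise = rescaling_ratios(1.0, 1.5, 0.0, 0.0)

# GDP noise variance equal to var(Y*), i.e. half of var(Y_GDP) is noise,
# and no noise in GDI: the reverse ratio now exceeds 1.
noisy_gdp = rescaling_ratios(1.0, 1.5, 1.25, 0.0)

print("no noise  (forward, reverse, limit):", no_noise)
print("noisy GDP (forward, reverse, limit):", noisy_gdp)
```

In this design a reverse ratio above 1 appears only once roughly half of the variance of measured GDP growth is noise, which is the magnitude the text argues is implausible.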
Table 3A: Regressions of Lagged Interest Rates Spreads (DEF) on Different Measures of Quarterly Output Growth, 1988Q3 to 2004Q4:

Forward: $\Delta Y_t^i = \alpha + \beta\left(r^{corp}_{t-k} - r^{10yr}_{t-k}\right) + U_t^i$

Reverse: $\left(r^{corp}_{t-k} - r^{10yr}_{t-k}\right) = \alpha + \beta_r \Delta Y_t^i + U_t^i$

              Forward βs                           Reverse β_r s
Measure:      ΔY^GDP       ΔY^GDP     ΔY^GDI      ΔY^GDP       ΔY^GDP     ΔY^GDI
Vintage:      “Advance”    Latest     Latest      “Advance”    Latest     Latest
k=1           -0.47        -0.58      -0.76       -0.48        -0.46      -0.52
              (0.13)       (0.13)     (0.11)      (0.12)       (0.11)     (0.10)
k=2           -0.40        -0.48      -0.64       -0.39        -0.38      -0.43
              (0.13)       (0.13)     (0.14)      (0.12)       (0.10)     (0.09)
k=3           -0.31        -0.33      -0.57       -0.30        -0.26      -0.38
              (0.14)       (0.15)     (0.17)      (0.12)       (0.11)     (0.09)
k=4           -0.15        -0.21      -0.41       -0.15        -0.17      -0.28
              (0.16)       (0.16)     (0.17)      (0.15)       (0.12)     (0.10)
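The ratios that Appendix A reads off Table 3A can be recomputed directly from the latest-vintage point estimates. The sketch below is a plain arithmetic check with illustrative variable names; the "noise-free" benchmark is simply the reciprocal of the forward ratio, which is what the crude rescaling model would imply for the reverse regressions if neither measure contained noise.

```python
# Latest-vintage DEF coefficients copied from Table 3A, keyed by horizon k.
forward_gdp = {1: -0.58, 2: -0.48, 3: -0.33, 4: -0.21}
forward_gdi = {1: -0.76, 2: -0.64, 3: -0.57, 4: -0.41}
reverse_gdp = {1: -0.46, 2: -0.38, 3: -0.26, 4: -0.17}
reverse_gdi = {1: -0.52, 2: -0.43, 3: -0.38, 4: -0.28}

# Forward ratio: estimate of alpha_GDI / alpha_GDP under crude rescaling.
forward_ratio = {k: forward_gdi[k] / forward_gdp[k] for k in forward_gdp}
# Reverse ratio actually observed in the table.
reverse_ratio = {k: reverse_gdi[k] / reverse_gdp[k] for k in reverse_gdp}
# Reverse ratio the crude rescaling model implies with no noise.
noise_free_reverse = {k: 1.0 / forward_ratio[k] for k in forward_ratio}

for k in sorted(forward_ratio):
    print(f"k={k}: forward {forward_ratio[k]:.2f}, "
          f"observed reverse {reverse_ratio[k]:.2f}, "
          f"noise-free implied reverse {noise_free_reverse[k]:.2f}")
```

At every horizon the observed reverse ratio is above 1 while the noise-free implied value is below 1, which is the inconsistency with the crude rescaling model described in the text.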
Figure 1: Annualized 1984 to 2004 Growth Rates of Real GDP and Real GDI, percent change. Latest available data as of August 2008. [Line chart with two series, GDP and GDI; vertical axis from -4 to 10 percent; horizontal axis from 1984 to 2004.]
Figure 2: Unemployment Rate Forecasts at the End of 2008 or Early 2009, Compared to Actual. [Line chart; vertical axis from 4 to 12 percent; horizontal axis from 2008Q1 to 2009Q4. Series: Actual; Predicted From Bond Spread Forecasts of GDI and GDI-Based Okun's Law; Predicted From Bond Spread Forecasts of GDP and GDP-Based Okun's Law; Predicted Directly From Bond Spreads; Forecast from 'The Job Impact of the American Recovery and Reinvestment Plan', no fiscal stimulus; Blue Chip Consensus Forecast, January 10th, 2009; Blue Chip Top 10 Average Forecast, January 10th, 2009.]
Appendix: Regression with Measurement Error in both X and Y, and examples highlighting the implications of LoSE

This appendix discusses regression when both X and Y follow the generalized measurement error model of section 3, before examining special cases highlighting the most important implications of LoSE in X and Y for parameter estimates and standard errors. Our full set of assumptions follows:

Assumption 1 $Y_t^\star = X_t^\star\beta + U_t^\star$. $U_t^\star$ is i.i.d., mean zero, with $var(U_t^\star) = \sigma^2_{U^\star}$ and $U_s^\star$ independent of $X_t^\star$, $\forall\, t,s$. Measured $Y_t = E(Y_t^\star \mid Z_t^y) + \varepsilon_t$, with:

• The CME $\varepsilon_t$ is i.i.d., mean zero, and independent of all conditioning information sets, with $var(\varepsilon_t) = \sigma^2_\varepsilon$.

• $Z^y$ may be partitioned into two sets of variables, $Z_x^y$ and $Z_u^y$, with variables in $Z_x^y$ independent of $U^\star$ and $Z_u^y$, and variables in $Z_u^y$ independent of $X^\star$ and $Z_x^y$.

• The LoSE $\zeta_t = \left(X_t^\star - E(X_t^\star \mid Z_{x,t}^y)\right)\beta + \left(U_t^\star - E(U_t^\star \mid Z_{u,t}^y)\right) = \zeta_t^{xy}\beta + \zeta_t^u$. $\zeta_t^u$ is i.i.d. and mean zero with $var(\zeta_t^u) = \sigma^2_{\zeta,u}$, and $\zeta_t^{xy}$ is i.i.d. and mean zero with $var(\zeta_t^{xy}) = \sigma^2_{\zeta,xy}$, a k × k matrix.

Measured $X_t = E(X_t^\star \mid Z_t^x) + \varepsilon_t^x$, with:

• The CME $\varepsilon_t^x$ is i.i.d., mean zero, independent of $\varepsilon_t$ and all conditioning information sets, with $var(\varepsilon_t^x) = \sigma^2_{\varepsilon,x}$, a k × k matrix.

• The variables in $Z^x$ are independent of $U^\star$ and $Z_u^y$.
• The LoSE $\zeta_t^x = X_t^\star - E(X_t^\star \mid Z_t^x)$ is i.i.d. and mean zero with $var(\zeta_t^x) = \sigma^2_{\zeta,x}$, a k × k matrix.

• As $T \longrightarrow \infty$:

– $\frac{1}{T}(X^\star)'X^\star \xrightarrow{p} Q_{xx}$

– $\frac{1}{T}\left(E(X^\star \mid Z_x^y)\right)'E(X^\star \mid Z_x^y) \xrightarrow{p} Q_{xx}^{zy} = Q_{xx} - \sigma^2_{\zeta,xy}$

– $\frac{1}{T}\left(E(X^\star \mid Z^x)\right)'E(X^\star \mid Z^x) \xrightarrow{p} Q_{xx}^{zx} = Q_{xx} - \sigma^2_{\zeta,x}$

– $\frac{1}{T}\left(E(X^\star \mid Z_x^y)\right)'E(X^\star \mid Z^x) \xrightarrow{p} Q_{xx}^{zb}$

– $\frac{1}{T}X'X \xrightarrow{p} Q_{xx}^{zx} + \sigma^2_{\varepsilon,x}$.

All relevant fourth moments exist.

The assumptions imposed on the information sets $Z^y$ and $Z^x$ regarding partitioning and independence allow us to factor the joint distribution of the relevant variables as follows:

$f(U^\star, X^\star, Z^y, Z^x) = f_{UZ}(U^\star, Z_u^y)\, f_{XZ}(X^\star, Z_x^y, Z^x).$

Without these assumptions, the conditioning may introduce correlation between the measurement error in X and the regression residual (which includes the measurement error in Y). For example, assume the information sets $Z_x^y$ and $Z_u^y$ are univariate and $Z^x = Z_u^y + Z_x^y$; then $E(X_t^\star \mid Z_t^x)$ and $\zeta_t^x$ are correlated with $U^\star$ (as long as $Z_u^y$ captures some variation in $U^\star$), and the above factorization is not valid. Another example is in section 3.1 of HI, and while these biases are likely worthy of further empirical study, they are of a different nature from those introduced by LoSE, and studying them in detail is beyond the scope of this paper.1

1 Most important, these biases are not applicable to regressions using X variables measured without
Given assumption 1, $Y_t$ can be written as:

(1) $Y_t = E(X_t^\star \mid Z_{x,t}^y)\beta + E(U_t^\star \mid Z_{u,t}^y) + \varepsilon_t$
$\quad = X_t\beta + \left(E(X_t^\star \mid Z_{x,t}^y) - X_t\right)\beta + E(U_t^\star \mid Z_{u,t}^y) + \varepsilon_t$
$\quad = X_t\beta + \left(E(X_t^\star \mid Z_{x,t}^y) - E(X_t^\star \mid Z_t^x) - \varepsilon_t^x\right)\beta + U_t^\star - \zeta_t^u + \varepsilon_t.$

The OLS regression estimator is:

(2) $\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\left(\left(E(X^\star \mid Z_x^y) - E(X^\star \mid Z^x) - \varepsilon^x\right)\beta + U^\star - \zeta^u + \varepsilon\right).$

Taking expectations and probability limits of (2) yields:

(3) $E(\hat{\beta}) = \beta + E\left((X'X)^{-1}X'\left(E(X^\star \mid Z_x^y) - E(X^\star \mid Z^x) - \varepsilon^x\right)\right)\beta,$ and:

(4) $\hat{\beta} \xrightarrow{p} \beta + \left(Q_{xx}^{zx} + \sigma^2_{\varepsilon,x}\right)^{-1}\left(Q_{xx}^{zb} - Q_{xx}^{zx} - \sigma^2_{\varepsilon,x}\right)\beta.$

The usual attenuation bias and inconsistency from $\sigma^2_{\varepsilon,x}$ is evident. The additional inconsistency from LoSE depends on the difference between $Q_{xx}^{zb}$ and $Q_{xx}^{zx}$.

The inconsistency of $\hat{\beta}$ can be corrected by instrumenting with a (1 × m) set of instruments $W_t$, with m ≥ k, if the instruments meet the following set of assumptions:

Assumption 2 With $P_W = W(W'W)^{-1}W'$, $\frac{1}{T}X'P_W X \xrightarrow{p} Q_{xx}^w$, a positive semi-definite matrix, and $\frac{1}{T}X'P_W\left(\left(E(X^\star \mid Z_x^y) - E(X^\star \mid Z^x) - \varepsilon^x\right)\beta + U^\star - \zeta^u + \varepsilon\right) \xrightarrow{p} 0$. All relevant fourth moments exist.

error; see also section 3.2 of HI. All of the regressions in the empirical sections 6 and 7 of this paper use X variables measured without error.
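As a numerical illustration of the probability limit (4) and of the role of Assumption 2, the sketch below simulates the case relevant for the empirical sections: X measured without error and LoSE in Y. Everything in it is an assumption made for illustration (scalar variables, Gaussian draws, unit variances, no CME, no LoSE in the residual, and an instrument built as the agency's signal plus independent noise); it is not the paper's estimation code.

```python
import numpy as np


def simulate_lose_in_y(beta=1.0, lose_var=1.0, n=200_000, seed=0):
    """Return (OLS slope, IV slope, OLS limit implied by (4)) when Y
    contains LoSE and X is measured without error."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                          # signal in X* captured by measured Y
    zeta = np.sqrt(lose_var) * rng.normal(size=n)   # signal in X* missing from Y
    x = z + zeta                                    # X = X*, no measurement error
    u = rng.normal(size=n)                          # true regression residual
    # Measured Y = E(X*|Z)beta + U*: LoSE only through X*, and no CME.
    y = beta * z + u
    # Hypothetical instrument: correlated with z, uncorrelated with zeta and u.
    w = z + rng.normal(size=n)

    c_xy = np.cov(x, y)
    beta_ols = c_xy[0, 1] / c_xy[0, 0]
    beta_iv = np.cov(w, y)[0, 1] / np.cov(w, x)[0, 1]

    q_xx = 1.0 + lose_var                           # var(X*) in this design
    ols_limit = beta * (1.0 - lose_var / q_xx)      # (4) with X = X*
    return beta_ols, beta_iv, ols_limit


beta_ols, beta_iv, ols_limit = simulate_lose_in_y()
print(f"OLS: {beta_ols:.3f} (limit {ols_limit:.3f}), IV: {beta_iv:.3f}")
```

With these settings half of the variance of X* is missing from measured Y, so the OLS slope settles near half of the true β while the IV slope stays near the true value.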
To correct the biases in OLS, valid instruments must be uncorrelated with $\varepsilon^x$, a standard condition. However, an additional condition must be met: the instruments must be uncorrelated with $E(X^\star \mid Z^y_x) - E(X^\star \mid Z^x)$. This condition is met by instruments $W$ that are common to both information sets (if such information exists), so $W \subset Z^x$ and $W \subset Z^y_x$, since $W' E(X^\star \mid Z^y_x)$ and $W' E(X^\star \mid Z^x)$ then have the same probability limit. With valid instruments, we have:

$$
\begin{aligned}
\hat\beta &= \left(X' P_W X\right)^{-1} X' P_W Y \\
&= \beta + \left(X' P_W X\right)^{-1} X' P_W \left( \left(E(X^\star \mid Z^y_x) - E(X^\star \mid Z^x) - \varepsilon^x\right)\beta + U^\star - \zeta^u + \varepsilon \right),
\end{aligned} \tag{5}
$$

and $\hat\beta \xrightarrow{p} \beta$. The asymptotic distribution of the estimator is:

$$\sqrt{T}\left(\hat\beta - \beta\right) \xrightarrow{d} N\left(0,\; \left(Q^w_{xx}\right)^{-1} \left( \sigma^2_{U^\star} - \sigma^2_{\zeta,u} + \sigma^2_\varepsilon + \beta' \left(Q^{zy}_{xx} - 2Q^{zb}_{xx} + Q^{zx}_{xx} + \sigma^2_{\varepsilon,x}\right) \beta \right) \right),$$

where $\xrightarrow{d}$ denotes convergence in distribution as $T \to \infty$, and $N(a, b)$ is a Gaussian distribution with mean $a$ and variance $b$. The usual estimator of the variance of the error term, $s^2 = \frac{1}{T} (Y - X\hat\beta)'(Y - X\hat\beta)$, converges to the error variance in this asymptotic distribution:

$$
\begin{aligned}
s^2 &= \frac{1}{T} \left( E(X^\star \mid Z^y_x)\beta + E(U^\star \mid Z^y_u) + \varepsilon - \left(E(X^\star \mid Z^x) + \varepsilon^x\right)\hat\beta \right)' \left( E(X^\star \mid Z^y_x)\beta + E(U^\star \mid Z^y_u) + \varepsilon - \left(E(X^\star \mid Z^x) + \varepsilon^x\right)\hat\beta \right) \\
&= \frac{1}{T} E(U^\star \mid Z^y_u)' E(U^\star \mid Z^y_u) + \frac{1}{T} \varepsilon'\varepsilon + \frac{1}{T} \beta' E(X^\star \mid Z^y_x)' E(X^\star \mid Z^y_x) \beta \\
&\quad - \frac{1}{T} \beta' E(X^\star \mid Z^y_x)' E(X^\star \mid Z^x) \hat\beta - \frac{1}{T} \hat\beta' E(X^\star \mid Z^x)' E(X^\star \mid Z^y_x) \beta \\
&\quad + \frac{1}{T} \hat\beta' E(X^\star \mid Z^x)' E(X^\star \mid Z^x) \hat\beta + \frac{1}{T} \hat\beta' \varepsilon^{x\prime} \varepsilon^x \hat\beta + \frac{1}{T}\,\text{cross terms}.
\end{aligned}
$$

The first two terms converge in probability to $\sigma^2_{U^\star} - \sigma^2_{\zeta,u} + \sigma^2_\varepsilon$; the terms involving $\hat\beta$ and $\beta$ simplify in the limit since $\hat\beta \xrightarrow{p} \beta$; and the cross terms converge in probability to zero. Then:

$$s^2 \xrightarrow{p} \sigma^2_{U^\star} - \sigma^2_{\zeta,u} + \sigma^2_\varepsilon + \beta' \left(Q^{zy}_{xx} - 2Q^{zb}_{xx} + Q^{zx}_{xx} + \sigma^2_{\varepsilon,x}\right) \beta.$$

Several specialized examples of this general measurement error model follow, highlighting important implications of LoSE in $X$ and $Y$ for parameter estimates and standard errors.

Example 1: X Mismeasured, Y Not Mismeasured: No LoSE Problems

The LoSE in $X$, $\zeta^x$, introduces no bias or inconsistency into the estimates, as long as all $k$ explanatory variables are conditioned on the same information set $Z^x$. Similar to CME in $Y$, the only effect of LoSE in $X$ is to increase the variance of the regression residuals.

Given the traditional focus on the effects of mismeasurement in $X$ on regression estimation, we begin with this subsection, making the following assumption (on top of assumption 1):

Assumption 3 $Y$ is not mismeasured: $Y_t = Y^\star_t$.

Then (1) simplifies to:

$$
\begin{aligned}
Y^\star_t &= X^\star_t\beta + U^\star_t \\
&= X_t\beta + (X^\star_t - X_t)\beta + U^\star_t \\
&= X_t\beta - \varepsilon^x_t\beta + \zeta^x_t\beta + U^\star_t.
\end{aligned}
$$

Not all of the true variation in $X^\star_t$ appears in $X_t$ due to LoSE, but all of that variation does appear in $Y^\star_t$ through $X^\star_t\beta$. The variation in $Y^\star_t$ missing from $X_t$ is relegated to the error term of this equation.
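The claim for Example 1 can be illustrated with a small simulation. The sketch below is only an illustration under assumed values, not the paper's own computation: it takes a univariate $X$ with no classical measurement error, sets $X_t = E(X^\star_t \mid Z^x_t)$ to a standard normal draw, adds an independent LoSE term $\zeta^x_t$, and regresses $Y^\star_t$ on $X_t$. The helper names `ols` and `example_1` and all parameter values are hypothetical.

```python
import random


def ols(x, y):
    """Univariate OLS with intercept: return (slope, residual variance s^2)."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sxx = sum((xi - mx) ** 2 for xi in x) / T
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / T
    slope = sxy / sxx
    s2 = sum(((yi - my) - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y)) / T
    return slope, s2


def example_1(beta=2.0, s2_zeta_x=1.0, s2_u=1.0, T=100_000, seed=1):
    """LoSE in X only: X = E(X*|Z^x), Y = Y* = X* beta + U*."""
    rng = random.Random(seed)
    x, y_star = [], []
    for _ in range(T):
        seen = rng.gauss(0, 1)                   # E(X*|Z^x), the measured X
        zeta_x = rng.gauss(0, s2_zeta_x ** 0.5)  # signal missing from measured X
        u_star = rng.gauss(0, s2_u ** 0.5)
        x.append(seen)                           # assumed: no CME in X
        y_star.append((seen + zeta_x) * beta + u_star)
    return ols(x, y_star)
```

Under these assumptions the slope returned by `example_1()` should be close to the true coefficient of 2, while the residual variance should be close to $\sigma^2_{U^\star} + \beta^2 \sigma^2_{\zeta,x} = 5$ rather than $\sigma^2_{U^\star} = 1$: the missing signal inflates the error variance without biasing the slope.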
The OLS regression estimator in this case is:

$$
\begin{aligned}
\hat\beta &= (X'X)^{-1} X'Y \\
&= \beta + (X'X)^{-1} X' \left( -\varepsilon^x\beta + \zeta^x\beta + U^\star \right).
\end{aligned}
$$

Since $\zeta^x$ is uncorrelated with $E(X^\star \mid Z^x) + \varepsilon^x = X$, the LoSE in $X$ introduces no bias into $\hat\beta$ in this case. Given assumption 1, $\frac{1}{T} X' \zeta^x \xrightarrow{p} 0$, and the LoSE introduces no inconsistency either. These results rely on the assumption that the LoSE is the difference between truth and a conditional expectation, and for multivariate regressions, the consistency result also relies on all $k$ explanatory variables being conditioned on the same information set $Z^x$. Bound, Brown, and Mathiowetz (2001), and Kimball, Sahm, and Shapiro (2008) discuss the case where different elements of $X$ are conditioned on different information sets, causing bias and inconsistency.[2]

[2] In that case, consistency may be achieved by instrumenting with variables from the smallest information set only.

Of course, the CME in $X$ produces the usual attenuation bias. By way of review, and for comparison with later results:

$$E(\hat\beta) = \beta - E\left( (X'X)^{-1} X' \varepsilon^x \right)\beta, \tag{6}$$

and:

$$\hat\beta \xrightarrow{p} \beta - \left(Q^{zx}_{xx} + \sigma^2_{\varepsilon,x}\right)^{-1} \sigma^2_{\varepsilon,x}\,\beta. \tag{7}$$

Instruments uncorrelated with the CME in $X$ yield consistent estimates. To focus more tightly on the implications of LoSE, the remainder of this subsection considers the case of no CME in $X$:

Assumption 4 $\mathrm{var}(\varepsilon^x_t) = 0$.
Then $E(\hat\beta) = \beta$, and $\hat\beta \xrightarrow{p} \beta$. The variation in $X^\star$ that appears in $Y^\star$ but is missing from $X$ shows up in the regression error, increasing the variance of the parameter estimates. We have $\mathrm{var}(\hat\beta) = E\left(\mathrm{var}(\hat\beta \mid X)\right) + \mathrm{var}\left(E(\hat\beta \mid X)\right)$, but $E(\hat\beta \mid X) = \beta$ and $\mathrm{var}(\beta) = 0$, so the second term vanishes. Then since $U^\star$ and $\zeta^x$ are uncorrelated, and both are uncorrelated with $X$, standard manipulations show:

$$
\begin{aligned}
\mathrm{var}(\hat\beta) &= E\left(\mathrm{var}(\hat\beta \mid X)\right) = E\left( E\left( (\hat\beta - \beta)(\hat\beta - \beta)' \mid X \right) \right) \\
&= E\left( E\left( (X'X)^{-1} X' (U^\star + \zeta^x\beta)(U^\star + \zeta^x\beta)' X (X'X)^{-1} \mid X \right) \right) \\
&= E\left( (X'X)^{-1} X' E\left( (U^\star U^{\star\prime} + \zeta^x \beta\beta' \zeta^{x\prime}) \mid X \right) X (X'X)^{-1} \right) \\
&= E\left( (X'X)^{-1} \right) \left( \sigma^2_{U^\star} + \beta' \sigma^2_{\zeta,x} \beta \right).
\end{aligned}
$$

Asymptotically, the analogous distributional results hold, as:

$$\sqrt{T}\left(\hat\beta - \beta\right) \xrightarrow{d} N\left(0,\; \left(Q^{zx}_{xx}\right)^{-1} \left( \sigma^2_{U^\star} + \beta' \sigma^2_{\zeta,x} \beta \right) \right),$$

and $s^2$ converges to this error variance $\sigma^2_{U^\star} + \beta' \sigma^2_{\zeta,x} \beta$. So the LoSE in $X$ increases the variance of the regression error.

Example 2: Y Mismeasured, X Not Mismeasured, $X_t \in Z^y_{x,t}$: Shrunken Standard Errors

The $\zeta^u$ component of the LoSE in $Y$ introduces no bias or inconsistency into the estimates, but decreases the variance of the regression residuals and standard errors.

In addition to assumption 1, this subsection makes the following assumptions:

Assumption 5 $X$ is not mismeasured: $X_t = X^\star_t$, and $X_t \in Z^y_{x,t}$.
Then $Y^\star_t = X_t\beta + U^\star_t$. The relation between $X_t$ and the information set $Z^y_{x,t}$ has an important effect on the properties of the OLS regression estimates; this subsection considers $X_t \in Z^y_{x,t}$, and the next $X_t \notin Z^y_{x,t}$.

Since $E(X_t \mid Z^y_{x,t}) = X_t$, we have $Y_t = X_t\beta + E(U^\star_t \mid Z^y_{u,t}) + \varepsilon_t$ in this case. The LoSE impacts only $U^\star$, so $\zeta_t = U^\star_t - E(U^\star_t \mid Z^y_{u,t})$, and $\mathrm{var}\left(E(U^\star_t \mid Z^y_{u,t})\right) = \sigma^2_{U^\star} - \sigma^2_\zeta$. The OLS regression estimates $\beta$ as:

$$
\begin{aligned}
\hat\beta &= (X'X)^{-1} X'Y \\
&= \beta + (X'X)^{-1} X' \left( E(U^\star \mid Z^y_u) + \varepsilon \right) \\
&= \beta + (X'X)^{-1} X' \left( U^\star - \zeta + \varepsilon \right).
\end{aligned}
$$

LoSE in $U^\star$ introduces no bias or inconsistency since $Z^y_u$ is uncorrelated with $X$, so the overall measurement error in $Y$ introduces no bias or inconsistency in this case.

For the variance of the point estimates, $\mathrm{var}(\hat\beta) = E\left(\mathrm{var}(\hat\beta \mid X)\right)$ since $\mathrm{var}\left(E(\hat\beta \mid X)\right) = 0$, and:

$$
\begin{aligned}
E\left(\mathrm{var}(\hat\beta \mid X)\right) &= E\left( E\left( (X'X)^{-1} X' \left(E(U^\star \mid Z^y_u) + \varepsilon\right) \left(E(U^\star \mid Z^y_u) + \varepsilon\right)' X (X'X)^{-1} \mid X \right) \right) \\
&= E\left( (X'X)^{-1} \right) \left( \sigma^2_{U^\star} - \sigma^2_\zeta + \sigma^2_\varepsilon \right),
\end{aligned}
$$

since $E(U^\star \mid Z^y_u)$ and $\varepsilon$ are uncorrelated. The analogous asymptotic results hold. The CME in $Y$ increases the variance of the regression residuals and parameter estimates, and reduces the power of hypothesis tests, similar to LoSE in $X$. The LoSE in $Y$ has the opposite effect, decreasing the variance of the regression residuals and parameter estimates.

Such excessively precise standard errors can be a serious problem, especially in the next example where $\hat\beta$ is biased towards zero. As we approach the limiting case of maximal LoSE in $Y$ where $Y$ approaches a constant, $\hat\beta$ and $\mathrm{var}(\hat\beta)$ both approach zero. Under this limiting case, a test of $\beta = \beta_0$ rejects with certainty when $\beta_0$ is non-zero, even if the hypothesis is true. The shrunken standard errors increase the risk that the econometrician rejects such true hypotheses.

Example 3: Y Mismeasured, X Not Mismeasured, $X_t \notin Z^y_{x,t}$: Biased Point Estimates

In addition to assumption 1, this subsection makes the following assumptions:

Assumption 6 $X$ is not mismeasured: $X_t = X^\star_t$, and $X_t \notin Z^y_{x,t}$.

This is the case studied in section 4 of the main paper.

Example 4: Both X and Y Mismeasured: Illuminating Special Cases

The $\zeta^{xy}$ component of the LoSE in $Y$ introduces an attenuation-type bias (i.e. towards zero in the univariate case) and inconsistency into the estimates under some circumstances. In particular, when $Z^x \not\subset Z^y_x$, so measured $X$ contains information about $X^\star$ missed by the information set used to compute measured $Y$, then the LoSE in $Y$ introduces bias and inconsistency. Put another way, bias and inconsistency occur when the explanatory variables $X$ contain signal missing from the dependent variable $Y$.

Again for simplicity, and to focus on the effects of LoSE, this section considers the case of no CME in $X$, so assumption 4 holds, as well as assumption 1. Three special cases are illuminating. The first is where the information sets used to construct $Y$ and $X$ coincide in the universe of variables correlated with $X$, so $Z^y_x = Z^x$. Then $E(X^\star \mid Z^y_x) = E(X^\star \mid Z^x)$, so their difference in (3) and (4) disappears, leaving unbiased and consistent regression parameter estimates. The variance and asymptotic distribution of $\hat\beta$, and the probability limit of $s^2$, are as in example 2.

The second illuminating case is where $Z^y_x \subset Z^x$, so $Z^x$ contains all the information about $X^\star$ in $Z^y_x$, plus additional information. The difference $E(X^\star \mid Z^x) - E(X^\star \mid Z^y_x)$ is uncorrelated with $Z^y_x$; substituting this difference for $\zeta^{xy}$ in example 3 then leaves the results of that section unchanged. The estimate $\hat\beta$ is biased and inconsistent, with the bias towards zero in the univariate case; some variation in measured $X$ that appears in $Y^\star$ is missed by measured $Y$, biasing down the covariance between $X$ and $Y$. Valid instruments must be drawn from the information set used to compute the more poorly measured $Y$.

The last illuminating case is where $Z^y_x$ contains all the information about $X^\star$ in $Z^x$ plus additional information, so $Z^y_x \supset Z^x$. Then $E(X^\star \mid Z^y_x) - E(X^\star \mid Z^x)$ is uncorrelated with $Z^x$ and $X$, and if this difference replaces $\zeta^x$ in example 1, the results in that subsection carry over to this case, except LoSE in $U^\star$ shrinks the error and parameter variances. The estimates are unbiased and consistent.

These cases should help provide some intuition about the potential effects of LoSE in particular regression applications where the econometrician has some knowledge of the relative degree of LoSE mismeasurement in the explanatory and dependent variables. For each application, whether $Z^y_x \supset Z^x$, $Z^y_x = Z^x$, or $Z^y_x \subset Z^x$ provides the best description of reality determines which results are most relevant: those from example 1 (augmented with LoSE in $U^\star$), example 2, or example 3. For example, the extent of any bias in the parameter estimates depends on the degree to which the mismeasured explanatory variables contain signal missing from the dependent variable.
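The three special cases of Example 4 can be compared side by side in a stylized simulation. The design below is an assumption made only for illustration and is not taken from the paper: $X^\star = a + b$ with independent standard normal components, no CME in $X$ or $Y$, and $E(U^\star \mid Z^y_u)$ drawn as a standard normal variable `u_seen`. The case labels and the helper names `ols` and `three_cases` are hypothetical.

```python
import random


def ols(x, y):
    """Univariate OLS with intercept: return (slope, residual variance s^2)."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sxx = sum((xi - mx) ** 2 for xi in x) / T
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / T
    slope = sxy / sxx
    s2 = sum(((yi - my) - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y)) / T
    return slope, s2


def three_cases(beta=2.0, T=100_000, seed=2):
    """Return {case: (slope, s^2)} for the three information-set orderings."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(T)]
    b = [rng.gauss(0, 1) for _ in range(T)]
    u_seen = [rng.gauss(0, 1) for _ in range(T)]  # E(U*|Z^y_u)
    full = [ai + bi for ai, bi in zip(a, b)]      # X* = a + b
    y_from_a = [ai * beta + ui for ai, ui in zip(a, u_seen)]
    y_from_full = [xi * beta + ui for xi, ui in zip(full, u_seen)]
    data = {
        "equal": (a, y_from_a),           # Z^y_x = Z^x = {a}
        "y_subset": (full, y_from_a),     # Z^y_x = {a}, Z^x = {a, b}
        "y_superset": (a, y_from_full),   # Z^y_x = {a, b}, Z^x = {a}
    }
    return {name: ols(x, y) for name, (x, y) in data.items()}
```

In this assumed design the limiting slopes are $\beta$, $\beta/2$, and $\beta$ respectively, so only the case where measured $X$ carries signal missing from measured $Y$ (`y_subset`) shows attenuation. Comparing the residual variances of `equal` and `y_superset` also shows how extra signal in $Y$ that is absent from $X$ ends up in the regression error.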
Cite this document
Jeremy J. Nalewaik (2014). Missing Variation in the Great Moderation: Lack of Signal Error and OLS Regression (FEDS 2014-27). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2014-27
@techreport{wtfs_feds_2014_27,
author = {Jeremy J. Nalewaik},
title = {Missing Variation in the Great Moderation: Lack of Signal Error and OLS Regression},
type = {Finance and Economics Discussion Series},
number = {2014-27},
institution = {Board of Governors of the Federal Reserve System},
year = {2014},
url = {https://whenthefedspeaks.com/doc/feds_2014-27},
abstract = {This paper studies measurement errors that subtract signal from true variables of interest, labeled lack of signal errors (LoSE). The effect on OLS regression of LoSE is opposite the conventional wisdom about classical measurement errors, with LoSE in the dependent variable, not the explanatory variables, causing attenuation bias under some conditions. The paper provides evidence of LoSE in US GDP growth during the period known as the Great Moderation (roughly the mid-1980s to the mid-2000s), illustrating attenuation bias in regressions of GDP growth on asset prices. These biases may have contributed to conventional macroeconomic analysis missing the severity of the adverse shocks hitting the economy in the Great Recession.},
}