feds · November 12, 2017

Common Factors, Trends, and Cycles in Large Datasets

Abstract

This paper considers a non-stationary dynamic factor model for large datasets to disentangle long-run from short-run co-movements. We first propose a new Quasi Maximum Likelihood estimator of the model based on the Kalman Smoother and the Expectation Maximisation algorithm. The asymptotic properties of the estimator are discussed. Then, we show how to separate trends and cycles in the factors by mean of eigenanalysis of the estimated non-stationary factors. Finally, we employ our methodology on a panel of US quarterly macroeconomic indicators to estimate aggregate real output, or Gross Domestic Output, and the output gap. Accessible materials (.zip)

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Common Factors, Trends, and Cycles in Large Datasets Matteo Barigozzi and Matteo Luciani 2017-111 Please cite this paper as: Barigozzi, Matteo, and Matteo Luciani (2017). “Common Factors, Trends, and Cycles in Large Datasets,” Finance and Economics Discussion Series 2017-111. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2017.111. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Common factors, trends, and cycles in large datasets Matteo Barigozzi Matteo Luciani London School of Economics Federal Reserve Board m.barigozzi@lse.ac.uk matteo.luciani@frb.gov November 6, 2017 Abstract This paper considers a non-stationary dynamic factor model for large datasets to disentanglelong-runfromshort-runco-movements. WefirstproposeanewQuasiMaximum Likelihood estimator of the model based on the Kalman Smoother and the Expectation Maximisationalgorithm. Theasymptoticpropertiesoftheestimatorarediscussed. Then, we show how to separate trends and cycles in the factors by mean of eigenanalysis of the estimated non-stationary factors. Finally, we employ our methodology on a panel of US quarterlymacroeconomicindicatorstoestimateaggregaterealoutput,orGrossDomestic Output, and the output gap. JEL classification: C32, C38, C55, E0. Keywords: Non-stationary Approximate Dynamic Factor Model; Trend-Cycle Decomposition; Quasi Maximum Likelihood; EM Algorithm; Kalman Smoother; Gross Domestic Output; Output Gap. ∗We thank for helpful comment the participants to the conferences: “Inference in Large Econometric Models”, CIREQ, Montréal, May 2017; “Big Data in Dynamic Predictive Econometric Modelling”, University of Pennsylvania, Philadelphia, May 2017; “Computing in Economics and Finance”, Fordham University, New York City, June 2017; and to the seminars at: Federal Reserve Board, September 2016; Warwick Business School, February 2017; Department of Statistics, Universidad Carlos III, Madrid, May 2017. We would like also to thank: Stephanie Aaronson, Gianni Amisano, Massimo Franchi, Marco Lippi, Filippo Pellegrino, Ivan Petrella, Lucrezia Reichlin, John Roberts, and Esther Ruiz. Disclaimer: the views expressed in this paper are those of the authors and do not necessarily reflect the views and policies of the Board of Governors or the Federal Reserve System.

1 Introduction This paper is about two stylized facts of macroeconomic time series: co-movements and nonstationarity (Lippi and Reichlin, 1994a). More precisely, this paper is about disentangling long-run co-movements (common trends) from short-run co-movements (common cycles) in a large dataset of non-stationary US macroeconomic indicators. Since the seminal work of Beveridge and Nelson (1981), the issue of decomposing GDP into a trend and a cycle has been a central question in both time series econometrics and policy analysis. This is not surprising, as long-run trends are mainly influenced by supply-side factors, while short-run cycles are mainly associated with demand-side factors, and therefore different estimates of the trend and of the cycle can lead to different policy recommendations. Given the relevance of the issue, in the last 30 years, many papers have suggested different ways to obtain a Trend-Cycle (TC) decomposition of GDP. Roughly speaking, those works can be grouped under two main approaches: one based on univariate methods (e.g. Watson, 1986; Lippi and Reichlin, 1994b; Morley, Nelson, and Zivot, 2003; Dungey, Jacobs, Tian, and Van Norden, 2015), and another using multivariate, but low-dimensional, time series techniques (e.g. Stock and Watson, 1988; Lippi and Reichlin, 1994a; Gonzalo and Granger, 1995; Garratt, Robertson, and Wright, 2006; Creal, Koopman, and Zivot, 2010). In this paper we use a novel approach to decompose GDP into a trend and cycle based on large datasets. We first disentangle common and idiosyncratic dynamics by using a Non- Stationary Approximate Dynamic Factor Model (DFM), and then we disentangle common trends from common cycles by applying a non-parametric TC decomposition to the latent commonfactors. Ourmethodologybuildsonfourpoints: first, focusingonahigh-dimensional setting is crucial, as only in a high-dimensional setting it is possible to disentangle common from idiosyncratic dynamics in a consistent way (Forni, Hallin, Lippi, and Reichlin, 2000; Bai and Ng, 2002; Stock and Watson, 2002) — i.e., we can separate macroeconomic fluctuations from sectoral dynamics and measurement error only in a high-dimensional setting. Second, assuming the existence of a factor structure is a realistic and convenient way to represent co-movements in large macroeconomic datasets. Third, considering non-stationary data is necessary to account for the presence of common trends or, equivalently, cointegration (Bai, 2004; Bai and Ng, 2004; Barigozzi, Lippi, and Luciani, 2016a,b). And fourth, by using a non-parametric TC decomposition we do not have to make assumptions on the law of motion of either the trend, or the cycle. Our approach is deliberately reduced form, and therefore our empirical analysis is conducted “without pretending to have too much a priori economic theory” (Sargent and Sims, 1977), thus letting the data speak as freely as possible. The first contribution of this paper is methodological. Namely, we propose a Quasi MaximumLikelihoodestimatorofthenon-stationaryDFMbasedontheExpectationMaximisation (EM) algorithm combined with the Kalman Filter and the Kalman Smoother estimators of the factors. The theoretical properties of this approach in the large stationary DFM case have been studied in Doz, Giannone, and Reichlin (2011, 2012), and here we extend their results to the non-stationary case by proving consistency and by providing rates of convergence for the factors and the parameters of the model. Compared to the non-stationary principal component estimator (Bai and Ng, 2004), the estimator proposed in this paper is more efficient, and it is more flexible in that, thanks to the use of the Kalman Filter, it allows us to explicitly model the idiosyncratic dynamics, and to impose economically meaningful restrictions. Thesecondcontributionofthispaperistoshowhowtoisolatecommontrendsandcommon cycles in large macroeconomic datasets. In detail, we use a non-parametric approach that 2

identifiesthecommontrendsasthoselinearcombinationsofthefactorsobtainedbytheleading eigenvectors of the long-run covariance matrix (Bai, 2004; Peña and Poncela, 2006), and the common cycles as deviations from the long-run equilibria, which coincide with the space orthogonal to that of the common trends — i.e., the cointegration space (Zhang, Robinson, and Yao, 2016). Because our approach is non-parametric, we are not imposing any particular form to the trend, which is not constrained to be a random walk, or to the cycle. This is what differentiates our approach from the standard state-space, which normally is applied on a handful of variables and where the trend and the cycle dynamics are explicitly specified and jointly estimated with the parameters of the model (Harvey, 1990). Ourfinalcontributionsareempirical. Specifically,weemployourmethodologytoanalysea large panel of US quarterly macroeconomic time series with the goal of estimating the cyclical position of the economy and the observation error. With the expression “estimating the observation error,” we mean estimating aggregate real output. With the expression “estimating the cyclical position of the economy,” we mean decomposing aggregate real output into potential output and output gap. To the best of our knowledge, Fleischman and Roberts (2011) and Aruoba, Diebold, Nalewaik, Schorfheide, andSong(2016)aretheonlyworksthat, sofar, have used (small) factor models to estimate aggregate real output. On the other hand, a few papers have used low-dimensional factor models to estimate the cyclical position of the economy (e.g. Fleischman and Roberts, 2011; Jarociński and Lenza, 2016), and a few more to estimate long-run trends (e.g. Antolin-Diaz, Drechsel, and Petrella, 2016). Finally, Aastveit and Trovik (2014) and Morley and Wong (2017) have used a high-dimensional setting for estimating the output gap by means of a factor model and a large Bayesian VAR, respectively. However, in both works the variables are transformed to stationarity prior to model estimation. The first part of our empirical analysis is about estimating aggregate real output, to which we refer as Gross Domestic Output (GDO). We first show that our model naturally produces anestimateofGDOasthatpartofGDP/GDIthatisdrivenbythemacroeconomic(common) shocks. WethencompareourestimateofGDOwith“theaverageofGDPandGDI” releasedby the Bureau of Economic Analysis, and with “GDPplus” proposed by Aruoba et al. (2016) and released by the Philadelphia Fed. Our results show that these three measures are very similar, which is not surprising, as they are attempting to estimate the same thing. However, we estimate that since 2010 quarterly annualized GDO growth was on average 1⁄ of a percentage 2 point higher than estimated by the BEA or the Philadelphia Fed, thus pointing out that — basedonthecommonalityinthedata—theUSeconomygrewatafasterpacethanmeasured by national account statistics. The second part of our empirical analysis is about estimating the output gap. To this end, we use the above-mentioned TC decomposition in order to separate long-run from shortrun co-movements, and in particular we focus on the decomposition derived for GDO. We compare our estimate with the one produced by the Congressional Budget Office (CBO), which estimates potential output as that level of output consistent with current technologies andnormalutilisationofcapitalandlabour,andtheoutputgapastheresidualpartofoutput. Although these two estimates are obtained in completely different ways, in practice they look very similar. The two estimates are comparable for most of the sample considered, but from the late nineties to the financial crisis, when our measure suggests that a greater part of the produced output was driven by transitory factors. In particular, according to our estimate between 2001:Q1 and 2005:Q4 the output gap was on average 21⁄ percentage points higher 2 than estimated by the CBO. Therestofthispaperisstructuredasfollows.InSection2wediscussrepresentationoflarge 3

non-stationarypanelsoftimeseries. Inthissectionwefirstpresentthenon-stationarydynamic factor model, and we define the concepts of commonality — i.e., the common factors. Then wediscusshowtodisentanglelong-runco-movementsfromshort-runco-movements—i.e., we define what common trends and common cycles are. In Section 3 we discuss estimation. We first introduce in Section 3.1 the static representation of the DFM, which is just a convenient way to approach estimation of the dynamic model presented in Section 2. We then present in Section 3.2 our estimator, we discuss its properties, and we compare it with existing methods. Finally, in Section 3.3 we present the non-parametric TC decomposition that we use in the empirical section. Then, Section 4 presents the empirical analysis. This section is split in two, with the first part presenting our estimate of GDO (Section 4.1), and the second part presenting our estimate of the output gap (Section 4.2). To conclude, in Section 5 we discuss ourfindingsandtheadvantagesandlimitationsofourmethodology,andweproposedirections for further research. In the Appendix we report all technical proofs and the description of the data used and their transformation. Notation A vector z is I(1) if the higher-order of integration among all its components is 1, thus under t this definition some components of z can be stationary. Eigenvalues are always considered t as ordered from the largest to the smallest, so for a given set of eigenvalues {µ }m , we have j j=1 µ ≥ µ ≥ ... ≥ µ ≥ µ . Therefore, the spectral norm of A is defined as (cid:107)A(cid:107)2 = µA(cid:48)A. 1 2 m−1 m 1 The j-th largest eigenvalue of a spectral density matrix at frequency ω is denoted as µ (ω). j The generic(i,j)-th entry of a matrix A is denoted as[A] . We denote byL the lag operator, ij such that Lky = y , for any k ∈ Z and we use the notation ∆y := (1−L)y . Finally, we t t−k t t let M,M ,M ... denote generic positive and finite constants that do not depend on the panel 0 1 dimensions n or T, and whose value may change from line to line. 2 Representation of non-stationary panels of time series Let us assume to observe a vector of n time series {y = (y ···y )(cid:48) : t = 1,...,T} such that t 1t nt y = D +x , (1) it it it where D is a deterministic component — e.g., a linear trend — and x = (x ···x )(cid:48) is such it t 1t nt that x ∼ I(1). We also assume that E[x ] = 0, for any i and t, therefore, x contains all the t it t stochastic trends but no deterministic component. Throughout, the spectral density matrix of ∆x is assumed to exist. t In a high-dimensional setting, it is reasonable to assume that there are common trends and common cycles, but also idiosyncratic terms. Thus, for each variable x we write it x = T +C +ξ , (2) it it it it where T ∼ I(1) is the trend component, C ∼ I(0) is the cycle component, and ξ is it it it the idiosyncratic component, which is allowed to be either I(1) (in presence of idiosyncratic trends) or I(0) (e.g. measurement errors). The trend and the cycle are capturing the common dynamics across series, and thus constitute the common component defined as χ = T +C . it it it Hence, (2) is also written as x = χ +ξ . (3) it it it 4

We define the vectors of common and idiosyncratic components as χ = (χ ···χ )(cid:48) and t 1t nt ξ = (ξ ···ξ )(cid:48), respectively. Finally, notice that consistently with the data considered in t 1t nt this paper: (i) some (but not all) components of x are allowed to be stationary, and (ii) t the deterministic components D are not common to all series — i.e., there are no common it deterministic trends. We assume that the co-movements in χ are driven by q “structural” shocks, with q (cid:28) n, t which are collected in a weak white noise vector process u = (u ···u )(cid:48). Then, for a given t 1t qt q, we decompose each element of x as t x = b(cid:48)(L)f +ξ , (4) it i t it ∆f = C(L)u , (5) t t where from (3) the common component is given by χ = b(cid:48)(L)f and the following properties it i t hold: w.n. A1. u ∼ (0 ,I ), with q is independent of n; t q q A2. E[u ξ ] = 0, for any j = 1,...q, i = 1,...,n, and s,t = 1,...,T; jt is A3. B(L) = (b(cid:48)(L)···b(cid:48) (L))(cid:48) is an n×q one-sided, matrix polynomial matrix of finite order 1 n s, f ∼ I(1) of dimension q; t A4. C(L) = (c(cid:48)(L)···c(cid:48)(L))(cid:48) is a q ×q one-sided, infinite matrix polynomial with square- 1 q summable coefficients and such that rk(C(1)) = (q−d) with 0 < d < q; A5. the q-th largest eigenvalue µ∆χ(ω) of the spectral density matrix of ∆χ is such that q t M ≤ liminf n−1µ∆χ(ω) ≤ limsup n−1µ∆χ(ω) ≤ M , ω-a.e. ∈ [−π,π], 1 q q 2 n→∞ n→∞ while the largest eigenvalue µ∆ξ(ω) of the spectral density matrix of ∆ξ is such that 1 t M ≤ liminf µ∆ξ(ω) ≤ limsup µ∆ξ(ω) ≤ M , ω-a.e. ∈ [−π,π]. 3 1 1 4 n→∞ n→∞ Equations (4) and (5) together with properties A1-A5 define a Non-Stationary Approximate Dynamic Factor Model (DFM). In the case of stationary time series our model is a special case of the Generalised Dynamic Factor Model originally proposed by Forni et al. (2000). Condition A5 is crucial and it allows for identification of the common component by defining it according to its spectral properties. An explanation for A5 in the time domain is provided by Hallin and Lippi (2013) who show that this condition is equivalent to defining the common and idiosyncratic component by asking that for any dynamic aggregation scheme given by an n-dimensional vector of weights a such that (cid:80) a(cid:48)a = 1, the following holds k k∈Z k k (cid:32) ∞ (cid:33) (cid:32) ∞ (cid:33) 1 (cid:88) 1 (cid:88) 0 < lim Var a(cid:48)∆χ ≤ M and lim Var a(cid:48)∆ξ = 0. (6) n→∞ n k t−k n→∞ n k t−k k=−∞ k=−∞ The following asymptotic conditions for the eigenvalues µ (ω) of the spectral density of i ∆x are a direct consequence of A4, A5, and Weyl’s inequality: t 5

B1. for ω-a.e. ∈ [−π,π] the following holds: M ≤ liminf n−1µ (ω) ≤ limsup n−1µ (ω) ≤ M , 1 n→∞ q n→∞ q 2 M ≤ liminf µ (ω) ≤ limsup µ (ω) ≤ M ; 3 n→∞ q+1 n→∞ q+1 4 B2. for ω = 0 the following holds: M ≤ liminf n−1µ (0) ≤ limsup n−1µ (0) ≤ M , 1 n→∞ q−d n→∞ q−d 2 M ≤ liminf µ (0) ≤ limsup µ (0) ≤ M . 3 n→∞ q−d+1 n→∞ q−d+1 4 BymeansofB1thenumberofshocksq canthenbeidentified(HallinandLiška,2007,Onatski, 2009). Similarly, by means of B2 the number of common trends, (q −d), can be identified (Barigozzi et al., 2016b). In particular, from the intuition given in (6) and because of B1 and B2, it is clear that the DFM is identifiable only in the limit n → ∞. Condition A4 allows for the presence of (q −d) common trends in the factors f . In line t with our empirical results in Section 4 we rule out the degenerate cases d = 0 or d = q. This impliesthatthevectorf admitsaVECMrepresentationwithdcointegrationrelations(Engle t and Granger, 1987), as well as the factor representation (Escribano and Peña, 1994): f = Ψτ +γ , (7) t t t whereΨisq×(q−d)andτ isthevectorof(q−d)commontrendswithcomponentsτ ∼ I(1) t jt forj = 1,...,(q−d),whileγ isaq-dimensionalstationaryvector.1 Noticethat(7)isdifferent t from the common trends representation (or multivariate Beveridge-Nelson decomposition) of Stock and Watson (1988) in that the trend τ is not constrained to be a vector random walk, t a property advocated for by many authors (e.g. Lippi and Reichlin, 1994a). ForagivenchoiceofΨ,the(q−d)commontrendscanthenbeobtainedbylinearprojection onto the space spanned by the columns of Ψ: τ = (Ψ(cid:48)Ψ)−1Ψ(cid:48)f = Ψ(cid:48)f . t t t where the second equality holds because, without loss of generality, we can always assume the identifying constraint Ψ(cid:48)Ψ = I . (q−d) Different choices of Ψ lead to different definitions of common trends. Here we opt for a non-parametric approach and we identify the elements of τ as the first (q −d) principal t components of f , as proposed by Bai (2004) and Peña and Poncela (2006) (see Section 3.3 for t details on estimation). Given this definition, the columns of Ψ are orthonormal and therefore there exists a q×d matrix Ψ such that Ψ(cid:48) Ψ = I and Ψ(cid:48) Ψ = 0 . Now, consider ⊥ ⊥ ⊥ d ⊥ d×(q−d) the d-dimensional process obtained by projecting f onto the space orthogonal to the common t trends c = (Ψ(cid:48) Ψ )−1Ψ(cid:48) f = Ψ(cid:48) f = Ψ(cid:48) γ . t ⊥ ⊥ ⊥ t ⊥ t ⊥ t It is straightforward to see that c ∼ I(0), that its components are d common cycles in the t sense of Vahid and Engle (1993), and that the columns of Ψ are a basis of the cointegration ⊥ space of f , thus these common cycles represent deviations from long-run equilibria — see also t e.g. Johansen (1991) and Kasa (1992) for similar definitions.2 1Notice that in general all factors are non-stationary, unless some ad hoc zero-constraint is imposed on the elements of C(1). On the other hand if we were to ask for one of the factors to be stationary then the correspondingrowofΨmustbesettozero. However,wedonotconsiderthiscasefurthersinceitcouldeasily be included in our framework by imposing the appropriate identifying assumptions. 2Other TC decompositions based on a different definitions of cyclesthan the one used here arein Gonzalo and Granger (1995) and Gonzalo and Ng (2001). 6

According to our definition, common trends and common cycles are orthogonal by construction, and we have the TC decomposition of the factors: f = ΨΨ(cid:48)f +Ψ Ψ(cid:48) f = Ψτ +Ψ c , (8) t t ⊥ ⊥ t t ⊥ t and therefore, by combining (1), (4) and (8), we have the TC decomposition of the data: y = D +b(cid:48)(L)Ψτ +b(cid:48)(L)Ψ c +ξ = D +T +C +ξ . (9) it it i t i ⊥ t it it it it it 3 Estimation In order to estimate (9), we need to estimate the factors, f and their TC decomposition. We t opt for a two-step approach, where we first extract the common factors and then we estimate their TC decomposition. In particular, we first introduce a convenient re-parametrization of the DFM based on its static state-space representation (Section 3.1), which is then used for retrieving the factors space by means of the EM algorithm (Section 3.2). Then, in a second step we use principal component analysis for extracting common trends and cycles (Section 3.3). Noticethatcomparedtotheclassicalstate-spaceapproach(e.g. FleischmanandRoberts, 2011) or from the Bayesian approach (e.g. Jarociński and Lenza, 2016) in which the trend and the cycle are estimated in one-step together with the parameters of the models, our approach has the advantage that it does not require us to specify a law of motion for the trend and the cycles. For simplicity of exposition we assume in this section that there is no deterministic component and we refer to Section 4 and to Appendix D for the treatment of these terms in practice. 3.1 The static representation of dynamic factor models Consider the state-space form of the DFM in (4)-(5) (Stock and Watson, 2005; Forni, Giannone, Lippi, and Reichlin, 2009): x = λ(cid:48)F +ξ , (10) it i t it ∆F = D(L)u , (11) t t where from (3) the common component is now given by χ = λ(cid:48)F and u is the same as in it i t t (5). We assume that A1, A2 and A5 still hold and in addition we require: C1. D(L) = (d(cid:48)(L)···d(cid:48)(L))(cid:48) is an r×q one-sided, infinite matrix polynomial with square- 1 r summable coefficients and such that rk(D(1)) = (q−d) with 0 < d < q; C2. Λ = (λ ···λ )(cid:48) is an n×r loadings matrix such that lim (cid:107)n−1Λ(cid:48)Λ−I (cid:107) = 0 and 1 n n→∞ r |[Λ] | < M, for any i = 1,...,n and j = 1,...,r; ij C3. F ∼ I(1) of dimension r, with E[∆F ∆F(cid:48)] positive definite. t t t Condition C1 is equivalent to A4 in that it requires the existence of (q −d) common trends driving the common component. Conditions C2 and C3 are standard in the literature and imply that the eigenvalues of the covariance of ∆χ diverge as n → ∞ at a rate n (Stock t and Watson, 2002; Bai and Ng, 2002; Fan, Liao, and Mincheva, 2013). Finally, from A5 we 7

immediatelyhavethatthelargesteigenvalueofthecovarianceof∆ξ isfiniteforanyn. Given t the way F and f are loaded by the data, hereafter we call F static factors and f dynamic t t t t factors. Let us stress once more the fact that here the DFM and the related TC decomposition are our focus, while the static representation is just a convenient way to approach estimation of the dynamic model. In particular, for (10)-(11) to be equivalent to (4)-(5) we need the following restrictions to hold: R1. there exists an invertible r × r matrix K such that F = K(f(cid:48)···f(cid:48) )(cid:48) and λ(cid:48) = t t t−s i (b(cid:48) ···b(cid:48) )K−1, for any i = 1,...,n, where b , for k = 0,...,s, are the coefficients of i0 is ik b (L) defined in A3; i R2. the dimension of F is r = q(s+1); t R3. the cointegration rank of F is d. t Let us consider each restriction in detail. Restriction R1 implies that the spectral density of ∆F has reduced rank q. In the following, we impose this restriction when estimating the t model but we do not attempt to identify K. RestrictionR2offersanalternativewaytodeterminer withrespecttothetypicalmethods available in the literature based on the behavior of the eigenvalues of the covariance matrix of ∆x and therefore on C2, C3, and A5 (e.g. Bai and Ng, 2002). Specifically, by virtue t of restriction R2, once we set q using B1, we can choose r such that the share of variance explained by the static factors F coincides with the share of variance explained by the q t dynamic factors f — see also D’Agostino and Giannone (2012). t Finally, restriction R3 tells us that the autoregressive representation for (11) is a VECM with d cointegration relations (a proof is in Appendix A). Moreover, since the vector F is t singular,theautoregressiverepresentationhasafiniteorder(Barigozzietal.,2016a). However, in the next section we do not estimate a VECM, rather we estimate an unrestricted VAR in the levels (Sims, Stock, and Watson, 1990). We use the knowledge of the cointegration rank to determine the dimension of the common cycles space (see Section 3.3). Summing up, by not fully imposing R1 and R3 when estimating the factors, we opt for simplicity of estimation versus complexity of a more realistic representation, which implies that the model considered is deliberately mis-specified. The effects of such mis-specification will appear clear in Section 3.3, when we consider TC decompositions of F as opposed to t those of f . t 3.2 Estimating the space of factors and loadings We consider the following state-space form of (10)-(11) in which we assume a VAR(2) for the static factors as in the empirical analysis of Section 4: x = λ(cid:48)F +ξ , (12) it i t it F = A F +A F +Hu , (13) t 1 t−1 2 t−2 t ξ = ρ ξ +e . (14) it i it−1 it We estimate (12)-(14) via the EM algorithm (Dempster, Laird, and Rubin, 1977), combined with the Kalman Filter (KF) and the Kalman Smoother (KS) estimators of the factors (Anderson and Moore, 1979; Harvey, 1990). In the stationary, low-dimensional — i.e., finite n — 8

setting, estimation of a factor model by means of the EM algorithm can be found in Shumway and Stoffer (1982) and Watson and Engle (1983), while the asymptotic properties of this factors’ estimator are studied by Doz et al. (2011, 2012) under the joint limit n,T → ∞.3 In the non-stationary case, applications of the EM algorithm can be found in Quah and Sargent (1993) and Seong, Ahn, and Zadrozny (2013) in a low-dimensional setting. Here, we study the theoretical properties in the non-stationary case when n,T → ∞. In order to run the KF-KS it is necessary to make some additional assumptions on the idiosyncratic component. Let R be the covariance matrix of the vector e = (e ···e )(cid:48) of t 1t nt the idiosyncratic innovations in (14), then we assume: D1. ρ = 1 if ξ ∼ I(1) or ρ = 0 if ξ ∼ I(0); i it i it w.n. D2. e ∼ N(0 ,R), with [R] > 0 and [R] = 0 for any i (cid:54)= j and i,j = 1,...,n; t n ii ij w.n. D3. u ∼ N(0 ,I ). t q q It is clear from D1, D2 and (14) that if some idiosyncratic components are I(1), we can still consider a factor model for x with stationary errors in (12) by adding additional latent t states with unit loadings and evolving as random walks. Notice that the dimension of the parameter space does not increase by increasing the number of I(1) idiosyncratic components. On the other hand modeling the dynamics of I(0) idiosyncratic components would increase the complexity of the estimation problem. For this reason, in D1 we choose to leave the dynamics of the stationary idiosyncratic components unspecified — see Section 4 for practical implementation of this assumption. Assumptions D1-D3 define a mis-specified approximating modelofthetrueDFMandinthissenseourEMapproachdeliversQuasiMaximumLikelihood (QML) estimators. The effect of these mis-specifications are discussed at the end of this section, but before discussing them we present the asymptotic properties of the estimated factors and loadings. We collect all unknown parameters of the model into the vector Θ := (vec(Λ)(cid:48) vec(A )(cid:48) vec(A )(cid:48) vec(H)(cid:48) diag(R)(cid:48))(cid:48). 1 2 We denote by Q the dimension of Θ, then we assume that the true values of the parameters satisfy: D4. Θ ∈ int(Ω), with Ω ⊆ RQ and compact. This condition is standard in QML theory and ensures existence of the true values of the parameters. The EM algorithm is based on the iteration of two steps. In the E-step, for a given estimator of the parameters Θ(cid:98)k , we compute the expected likelihood conditional on all observed data {x ,...,x }. This is in turn a function of the first and second conditional moments of 1 T the static factors, which are computed by means of the KS when using Θ(cid:98)k . Note that, under the assumption of normality, as in D2 and D3, and for a given value of the parameters Θ, the KF-KS give the conditional expectations: F := E [F |x ,...,x ], F := E [F |x ,...,x ], F := E [F |x ,...,x ], t|t−1 Θ t 1 t−1 t|t Θ t 1 t t|T Θ t 1 T 3For recent applications of this approach see e.g. Reis and Watson (2010); Bańbura and Modugno (2014); Juvenal and Petrella (2015); Luciani (2015); Coroneo, Giannone, and Modugno (2016). 9

with the associated covariance matrices denoted as P , P , and P , respectively. These t|t−1 t|t t|T arethereforeoptimalestimatorsofthestatic factorssincetheyminimizetheassociatedMean- Square-Error (MSE) for a given value of the parameters. In the M-step a new estimator of the parameters Θ(cid:98)k+1 is computed by maximizing the expectedlikelihood. AtconvergenceoftheEMalgorithm,sayatiterationk∗,weobtaintheestimator of the parameters, which we denote by Θ(cid:98) := Θ(cid:98)k∗ . The estimator of the factors is then obtainedbyrunningtheKSalasttimesusingΘ(cid:98) anditisdenotedbyF(cid:98)t := E Θ(cid:98) [F t |x 1 ,...,x T ]. The estimated common and idiosyncratic components are then given by χ (cid:98)it = λ(cid:98) (cid:48) i F(cid:98)t and ξ(cid:98)it = x it −χ (cid:98)it . Details of the EM algorithm, as well as closed form expressions for all the estimators, are in Appendix B. To initialise the EM algorithm we use as initial estimator of the loadings the r leading eigenvectors of the covariance of ∆x , from which we have an estimator of the static factors t as the integrated principal components of ∆x (Bai and Ng, 2004). This factors’ estimator t is in turn used to: (i) initialize the KF, together with a diffuse prior for the factors’ covariance (Koopman, 1997; Koopman and Durbin, 2000) and (ii) estimate the VAR parameters (Barigozzi et al., 2016b). Define as V the n × r matrix having as columns the r leading normalised eigenvectors of the covariance of ∆χ , then the following identifying assumptions t are convenient for proving consistency: √ E1. Λ = nV with [Λ] > 0 for all j = 1,...,r; 1j E2. F = n−1/2V(cid:48)χ with F = 0 . t t 0 r Since the static factors have no economic meaning, these identifying assumptions are perfectly valid and — together with assumption C2 on the loadings scale — they rule out any indeterminacy in the estimators used to initialize the EM algorithm — see Doz et al. (2011) for similar assumptions. We have the following consistency result. Proposition 1. Let A1, A2, A5, C1, C2, C3, D1, D2, D3, D4, E1, and E2 hold and let t¯(T) > 0 be such that limsup Te−t¯(T) ≤ M. (15) T→∞ Define F† := (f(cid:48)···f(cid:48) )(cid:48) and λ† := (b(cid:48) ···b(cid:48) )(cid:48). Then, there exists an invertible r×r matrix t t t−s i i0 is K such that, as n,T → ∞, for all t¯(T) ≤ t ≤ T and any given i = 1,...n, √ T (cid:107)λ(cid:98)i −K−1(cid:48) λ† i (cid:107) = O p (1), (16) √ √ min( n, T)(cid:107)F(cid:98)t −KF† t (cid:107) = O p (1), (17) √ √ min( n, T)|χ −χ | = O (1). (18) (cid:98)it it p Proposition1statesthatundertheassumptionspresentedbefore,wecanconsistentlyestimate the common component, as well as the spaces spanned by the dynamic factors f and the t corresponding dynamic loadings which are the coefficients of b (L) defined A3. i Our proof, which is presented in detail in Appendix C, is based on the same approach followed by Poncela and Ruiz (2015) in the one-factor case, and it is made of two main parts which we summarize here. Population results. We first show that, when the parameters are known the one-step-ahead factors’ MSE, P , converges to a steady state, while both the MSEs of the KF, P , and t|t−1 t|t 10

Figure 1: Conditional Mean Squared Errors 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 This figure reports tr(P t|s ) when using Θ(cid:98), computed for the data analyzed in Section 4, where: s = t−1 is the one-step-ahead conditional MSE (solid line); s=t is KF conditional MSE (dashed line); s=T is theKSconditionalMSE(dashed-dottedline). of the KS, P , tend to zero as n → ∞ (Lemmas 4 and 5). Notice that this is true also when t|T initializingwithadiffusepriorsincethishasaneffectonlyforafinitenumberofinitialperiods, say t (Koopman, 1997). In particular, convergence to the steady state is exponentially fast 0 (Anderson and Moore, 1979), hence our result holds for any t ≥ t¯(T) > t , where t¯(T) satisfies 0 condition(15),whichasymptoticallyrequirest¯(T) = O(logT). Inpractice,though,thesteady state is reached very quickly as shown in Figure 1, where we report the trace of P (solid t|t−1 line), P (dashed line) and P (dashed-dotted line), computed for the data analysed in t|t t|T Section 4. Estimation results. Inthesecondstepoftheproof,consistencyoftheKFandKSestimators of the static factors when using estimated parameters is proved (Lemma 7). This is done by taking into account an additional parameter estimation error which has two components: (i) theerroroftheQMLestimatoroftheparametersforthecaseofknownfactors,sayΘ(cid:98) ∗(Lemma 6) and (ii) the error due to the numerical approximation of Θ(cid:98) to Θ(cid:98) ∗ which is related to the stopping rule of the EM algorithm (Meng and Rubin, 1994, and Lemma 9). In particular, the latter error is shown to be negligible with respect to the former one. Therefore the rate of convergenceoftheloadingsestimatedviatheEMalgorithmisthesamethatonewouldobtain byQMLestimation,werethetruefactorsobservable,andmoreover,becauseofassumptionD2 the loadings are estimated equation by equation, thus such error depends only on T. Results similar to (16) hold also for all other estimated parameters in Θ(cid:98). On the other hand the rate of convergence for the estimated static factors is standard in the literature. The results in Proposition 1 extend those by Doz et al. (2011, 2012) to the non-stationary case. A major difference between the EM algorithm in levels proposed in this paper, and the EM algorithm in first differences proposed by Doz et al. (2012), is relative to the way idiosyncratic components are modelled. Indeed, while by considering first differences it is implicitly assumed that all idiosyncratic components have a unit root, in our case we can distinguish between stationary and non-stationary idiosyncratic components — i.e., we can allow for idiosyncratic trends only in some variables. This is not a minor difference, as it has 11

substantial implications for the properties of the estimators. First of all we model non-stationary idiosyncratic components as additional latent states rather than differencing them, thus improving efficiency (see also Remark 2 below). Second, when ξ ∼ I(0), under D1 and D2 the QML estimator of the loadings of the i-th variable it is obtained by minimizing the sample variance of ξ . In this case this is not the same as it differencing before estimation, since in that case the loadings would be estimated by minimizing the sample variance of ∆ξ . The resulting common component of the i-th variable it has therefore different empirical properties: compared to our non-stationary approach, the common component estimated in first differences is likely to provide a better fit of the first differenced data, but not necessarily of the levels. Conversely, the common component obtained with our approach is likely to provide a better fit of the levels thus capturing better the lower frequencies — and so the long-run trends — and resulting in a smoother estimator, which however might have a worse fit of the differenced data. Weconcludethissectionbybrieflydiscussingthepossiblemis-specificationsintroducedby assumptions D1, D2 and D3. In particular, we assume the vector of idiosyncratic shocks e to t be i.i.d. Gaussian, thus imposing four restrictions on: (1) the cross-sectional dependence; (2) the variances; (3) the serial dependence; (4) the distribution. Let us consider the implications for the properties of the estimators of each of these restrictions — see also Doz et al. (2011) for a similar discussion. Remark 1. Iftheidiosyncraticcomponentshavesomecross-sectionaldependence, asallowed byA5,thenthestate-spaceformofthemodelismis-specified,howeverbyinspectingtheproofs we see that, as long as we use an invertible estimator of R, consistency is not affected as long as n → ∞. As a consequence of this asymptotic argument, we do not attempt here to model the off-diagonal terms of R. This is better illustrated by a simple example showing the properties of the KF (an analogous argument holds for the KS). Denote as P the steady state of P then it can be shown t|t−1 that P = HH(cid:48) (Lemma 4). Consider the case in which the parameters are given, ξ ∼ I(0), t and r = q, so that P is invertible, then for t ≥ t¯(T) the KF estimator is such that F = F +PΛ(cid:48)(ΛPΛ(cid:48)+R)−1(x −ΛF ) t|t t|t−1 t t|t−1 = F +(Λ(cid:48)R−1Λ+P−1)−1Λ(cid:48)R−1(x −ΛF ) t|t−1 t t|t−1 = (Λ(cid:48)R−1Λ)−1Λ(cid:48)R−1x +O(n−1) t = F +(Λ(cid:48)R−1Λ)−1Λ(cid:48)R−1ξ +O(n−1) t t = F +O (n−1/2), t p where we used (in order) the Woodbury formula, assumption C2, the definition of x in (12), t andassumptionA5. ClearlyconsistencyoftheKFdoesnotdependonthespecificassumption for R, as long as it is invertible. However for finite n the KF depends on R and modeling also its out of diagonal terms could in principle improve its efficiency (e.g. Bai and Liao, 2016). Remark 2. From the example in Remark 1 it is clear that for finite n the KF estimator is a weighted average of the data where the heteroskedasticity of the idiosyncratic components is accounted for. Again the same argument holds also for the KS. In this respect the KF-KS approach is analogous to the generalized principal component estimator, which is however derived in a stationary setting and without explicitly addressing the dynamics of the data (Choi, 2012). 12

Remark 3. If the idiosyncratic components are autocorrelated, then, unless we model them explicitly as additional latent states, optimality is lost, in particular the loadings’ estimators are still consistent but not efficient. By means of D1 we partially solve the problem at least for the series with I(1) idiosyncratic components. Remark 4. If the idiosyncratic components are non-Gaussian then the estimator is not optimal being only the best linear estimator. Nevertheless, it has to be noticed that typical macroeconomic data show little deviations from normality, so we are minimally concerned by the restrictions imposed by this assumption. Summing up, regardless of these mis-specifications even though we might not have the most efficient estimator, we are likely to have gains in efficiency with respect to those estimators obtained by integrating the principal components of first differences of the data (Bai and Ng, 2004). Indeed, principal components are optimal only in the case of serially and crosssectionally i.i.d. Gaussian idiosyncratic components (Lawley and Maxwell, 1971; Tipping and Bishop, 1999), and such conditions clearly do not hold in a time series context, especially when non-stationarities are present and the cross-sectional dimension is large. On the contrary, our approach explicitly takes into account the autocorrelation in the factors and in the idiosyncratic components as well as their heteroscedasticity, and, as discussed above, it delivers consistent estimates even when some degree of cross-sectional dependence is present but not modelled. 3.3 Trend and cycles We now turn to estimation of common trends and common cycles. Notice that since we do not fully impose R1, the dynamic factors f are not identified and instead we have to deal t with a TC decomposition of the static factors F , which can be carried out analogously to t the one described in Section 2 for f . Because of assumption C1 and restriction R3, for given t values of q and d, the vector F admits the factor representation: t F = Φ T +Γ , t 1 t t where Γ ∼ I(0), Φ is r × (q − d) and T is the vector of (q − d) common trends with t 1 t components T ∼ I(1) for j = 1,...,(q−d). Hence, in general the common trends admit the jt MA representation: ∆T = B(L)η , t t w.n. where η ∼ (0 ,Σ ) with Σ positive definite and B(L) is a (q−d)×(q−d) one-sided, t q−d η η infinite matrix polynomial with square-summable coefficients and rk(B(1)) = (q−d). As a consequence of the results by Peña and Poncela (1997) and Proposition 1 above, given the estimated factors F(cid:98)t , it is clear that, as n,T → ∞, 1 (cid:88) T (cid:18)(cid:90) 1 (cid:19) S(cid:98) := T2 F(cid:98)t F(cid:98) (cid:48) t ⇒ Φ 1 B(1)Σ1 η /2 W(u)W(u)(cid:48)du Σ η 1/2B(1)(cid:48)Φ(cid:48) 1 , (19) 0 t=1 where convergence is in the sense of weak convergence of the associated probability measures and {W(u), 0 ≤ u ≤ 1} is a (q−d)-dimensional standard Wiener process. Hence, by virtue of (19), we can estimate the common trends T as the first (q −d) principal components of t the estimated static factors F(cid:98)t (Bai, 2004; Peña and Poncela, 2006). Specifically, we denote 13

by (Φ(cid:98)1 Φ(cid:98)0 ) the r×r matrix with columns given by the normalized eigenvectors of S(cid:98), ordered accordingtothedecreasingvalueofthecorrespondingeigenvalues,andsuchthatΦ(cid:98)1 isr×(q− d) and Φ(cid:98)0 is r×(r−q+d). This leads to the estimator of common trends as the projection: T(cid:98)t = Φ(cid:98) (cid:48) 1 F(cid:98)t . As for the common cycles, notice first that, by projecting F(cid:98)t onto the columns of Φ(cid:98)0 , we obtain the (r−q+d)-dimensional process G(cid:98)t = Φ(cid:98) (cid:48) 0 F(cid:98)t , which, by construction, is orthogonal to T(cid:98)t . Moreover, G(cid:98)t is stationary since it belongs to the cointegrationspaceofF(cid:98)t (Zhangetal.,2016). However, byR3weknowthatthecointegration space must have dimension d, but we do not impose R3 when estimating the static factors. Thus, we face the problem of identifying d cycles from the higher-dimensional stationary process G(cid:98)t . In order to identify the common cycles we then look for the d-dimensional projection of G(cid:98)t with maximum spectral density. In the empirical analysis of Section 4, we consider the VAR(2): G(cid:98)t = A 1 G(cid:98)t−1 +A 2 G(cid:98)t−2 +v t , (20) where v w ∼ .n. (0 ,Σ ) and det(I −A z −A z2) (cid:54)= 0 for |z| ≤ 1. Once we estimate t r−q+d v r−q+d 1 2 (20) we have its residuals v (cid:98)t and their covariance matrix Σ(cid:98)v . Denote as H(cid:98) the (r−q+d)×d matrix having as columns the leading d normalized eigenvectors of Σ(cid:98)v . We then define the estimated cycle component as the d-dimensional projection: (cid:48) C(cid:98)t = H(cid:98) G(cid:98)t . The estimated TC decomposition is then given by F(cid:98)t = Φ(cid:98)1 Φ(cid:98) (cid:48) 1 F(cid:98)t +Φ(cid:98)0 Φ(cid:98) (cid:48) 0 F(cid:98)t = Φ(cid:98)1 T(cid:98)t +Φ(cid:98)0 G(cid:98)t (cid:48) (cid:48) = Φ(cid:98)1 T(cid:98)t +Φ(cid:98)0 H(cid:98)H(cid:98) G(cid:98)t +Φ(cid:98)0 H(cid:98)⊥ H(cid:98) ⊥ G(cid:98)t = Φ(cid:98)1 T(cid:98)t +Φ(cid:98)0 H(cid:98)C(cid:98)t +Φ(cid:98)0 (G(cid:98)t −H(cid:98)C(cid:98)t ), (21) (cid:48) where H(cid:98)⊥ is (r −q +d)×(r −q) and such that H(cid:98) ⊥ H(cid:98) = 0 (r−q)×d . The last term on the right-hand-side of (21) appears due to the mis-specification caused by not fully imposing R1 and R3 and in particular it has covariance of rank (r−q) and since r > q it is in general not zero. To appreciate the meaning and the appropriateness of decomposition (21), in Figure 2 we show the spectral densities of the first differences of the three components of F(cid:98)t for the data analyzed in Section 4, where r = 6, q = 3, and d = 2. As expected the estimated common trend T(cid:98)t (black line) contributes most at the lowest frequencies — i.e., lower than 1 π 0 — which correspond to periods higher than five years. Once we remove the common trend, of the remaining five processes G(cid:98)t , the two estimated common cycles C(cid:98)t (red lines) capture most of the variation for almost all frequencies: one cycle dominates at periods longer than two years — i.e., frequencies lower than π — and the other cycle dominates at periods shorter than two 4 14

Figure 2: Spectral Densities of Common Trends and Common Cycles 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 10Y 5Y 2Y 1Y ThisfigurereportsforthedataanalyzedinSection4thespectraldensitiesofthecommontrend∆T(cid:98)t(blackline),thecommoncycles∆C(cid:98)t(red lines),andtheresidualcycles(∆G(cid:98)t−H(cid:98)∆C(cid:98)t)(bluelines). Onthehorizontal axis we report periods τj measured in years and corresponding tofrequenciesωj = 4 2 τ π j (thedataconsideredisquarterly). years — i.e., frequencies higher than π. With respect to those two cycles, the residual three 4 cycles (G(cid:98)t −H(cid:98)C(cid:98)t ) (blue lines) give a negligible contributions to the total variation. Given thisempiricalresult, theextratermin(21)canbeneglectedandtreatedasamis-specification error. Finally, from (21), the estimated TC decomposition of the data immediately follows: x it = λ(cid:98) (cid:48) i Φ(cid:98)1 T(cid:98)t +λ(cid:98) (cid:48) i Φ(cid:98)0 H(cid:98)C(cid:98)t +λ(cid:98) (cid:48) i Φ(cid:98)0 (G(cid:98)t −H(cid:98)C(cid:98)t )+ξ(cid:98)it , which is the estimated counterpart of the representation given in (9). 4 Estimating the cyclical position of the economy and the observation error WenowuseourmodeltoestimatethecyclicalpositionoftheUSeconomyandtheobservation error. In particular, in Section 4.1 we will estimate “the observation error” by estimating the non-stationary approximate DFM as explained in Section 3.2. And, in Section 4.2 we will estimate “the cyclical position of the economy” by decomposing the common factors into common trends and common cycles using the TC decomposition discussed in Sections 2 and 3.3. Thefollowinganalysisiscarriedoutonalargemacroeconomicdatasetcomprisingn = 103 quarterly series from 1960:Q1 to 2017:Q1 describing the US economy. The complete list of variables and transformations is reported in Appendix D. ComparedtothepapersthatusesmallDFMstoestimatethecyclicalpositionoftheeconomy, which typically estimate the output gap using only high level variables such as GDP, the unemployment rate, and PCE price inflation, we include several other indicators, thus being able to capture information coming from a wider spectrum of the economy. Specifically, our datasets includes national account statistics, industrial production indexes, various 15

Table 1: Percentage of explained variance 1 2 3 4 5 6 7 8 9 10 q 33.4 45.8 53.3 58.9 63.6 67.4 70.6 73.4 75.8 77.9 r 23.4 33.9 42.1 47.9 51.8 55.3 58.2 60.6 62.7 64.9 This table reports the percentage of total variance explained by the q largest eigenvalues of the spectral density matrixof∆xt andbyther largesteigenvaluesofthecovariancematrixof∆xt. price indexes including CPIs, PPIs, and PCE price indexes, various labor market indicators including indicators from both the household survey and the establishment survey as well as labor cost and compensation indexes, monetary aggregates, credit and loans indicators, housing market indicators, interest rates, the oil price, and the S&P500 index. Broadly speaking, all the variables that are I(1) are not transformed, while all the variables that are I(2) are differenced once. Notice that some variables should from a theoretical economic point of view always be considered as I(0) (e.g. inflation rates, unemployment rate, and interest rates) but since they exhibit a great deal of persistence are here treated as I(1). Finally, a linear trend is estimated where necessary before applying our methodology, thus accounting for the deterministic component in (1). A thorough empirical analysis requires tackling two main preliminary problems. First, we need to determine the number of common trends (q − d), of common shocks q, and of static factors r. To determine the number of common trends (q−d) we use the criterion by Barigozzietal.(2016b), whichexploitsthebehaviouroftheeigenvaluesdescribedincondition B2. This criterion indicates the presence of (q−d) = 1 common trend, which is in line with many theoretical models assuming a common productivity trend as the sole driver of longrun dynamics (e.g. Del Negro, Schorfheide, Smets, and Wouters, 2007). To determine the number of common shocks q we use the test by Onatski (2009) and the criterion by Hallin and Liška (2007), which exploit the behaviour of the eigenvalues described in condition B1. Both methods indicate the presence of q = 3 common shocks. Having determined q, as we explained in Section 3.1 by virtue of R2 we can set the number of static factors r according to their explained variance. By looking at Table 1 we can clearly see that r (cid:39) 2q, and therefore in our benchmark specification we set q = 3 and r = 6.4 Second, we need to choose which idiosyncratic components to model as random walk, and which as white noises. Following the methodology proposed by Bai and Ng (2004), we can explicitly test the null-hypothesis H : ρ = 1, and if we do not reject H , we set ρ = 1, while 0 i 0 i if we reject H , we set ρ = 0. This approach is applied to all variables in the dataset except 0 i GDP, GDI, unemployment rate, Federal funds rate, and CPI, core CPI, PCE, and core PCE inflation, for which we impose a priori ρ = 0. That is, while for most of the variables in the i dataset we let the data determine what is driving their long run dynamics, we impose that the long-run dynamics of GDP, GDI, unemployment rate, Federal funds rate, and CPI, core CPI, PCE, and core PCE inflation are driven exclusively by macroeconomic shocks, with the idiosyncratic shocks accounting only for short-run movement. 4Analternativewaytoselectthenumberofstaticfactorsristoresorttooneofthemanyavailablemethods, such as, for example, the criterion of Bai and Ng (2002), which for our dataset gives results in line with our choice of r. 16

4.1 Measuring Gross Domestic Output A fundamental issue in economics is the measurement of aggregate real output, henceforth Gross Domestic Output (GDO). Historically, GDO has been measured mainly by the Gross Domestic Product (GDP), but GDP, which tracks all expenditures on final goods and services produced, is just an estimate of GDO. An equally acceptable estimate of the concept of GDO isrepresentedbytheGrossDomesticIncome(GDI),whichtracksallincomereceivedbythose who produced the output. GDP is almost always preferred to GDI, the main reason being that it is released before GDI.5 However it has been shown that GDI reflects the business cycle fluctuations in true output growth better than GDP and moreover GDI is better than GDP in recognising the start of a recession (Nalewaik, 2010, 2012). In recent years, there has been interest in combining GDP and GDI to come up with a better estimate of GDO, where the rationale for doing so is that the difference between GDP and GDI is exclusively the result of measurement error — using the NIPA table definition “statistical discrepancy” — as these two statistics are in fact measuring the same thing. For example, starting from November 4, 2013, the Philadelphia Fed releases an estimate of GDO, called“GDPplus” proposedbyAruobaetal.(2016),whichisdefinedasthecommoncomponent of a bivariate one-factor model built with GDP and GDI growth rates. Similarly, and starting from July 30, 2015, the Bureau of Economic Analysis (BEA) releases “the average of GDP and GDI”, which the Council of Economic Advisers refers to as GDO (Council of Economic Advisers, 2015). Our approach differs from those mentioned above in that our estimate of GDO is not obtained by combining GDP and GDI, rather it is obtained by using all the 103 variables included in our dataset. In detail, we define GDO as that part of GDP/GDI that is driven by the macroeconomic (common) shocks, i.e., GDO = χGDP = χGDI. To estimate GDO in this t t t way, we estimate a constrained version of model (12)-(13), where we impose the restriction of equal common components: χGDP = χGDI. This restriction is indeed corroborated by the t t data, as even if we do not impose it, the estimated χGDP and χGDI are nearly identical. In t t numbers, the standard deviation of (∆yGDP −∆yGDI) is 1.93, while the standard deviation t t of (∆χGDP−∆χGDI) is reduced to 0.28. t t Figure 3 shows our proposed estimate of GDO (red line) together with “GDPplus” (blue line) and the “the average of GDP and GDI” released by the BEA (black line). Overall, the three measures are very similar, which is not surprising, as they are attempting to estimate the same quantity. However, three important differences emerges. First, our estimate of GDO is smoother than the other two. This is not surprising. Compared to “GDPplus” and “the average of GDP and GDI”, our estimate of GDO is constructed to contain a larger low frequency component, because it is estimated on data in levels rather than on growth rates. Moreover, because it is derived under the assumption that the idiosyncratic components of GDP and GDI are stationary, by construction our estimate of GDO captures all the low frequency movements of GDP and GDI. Second, our estimate of GDO does not show any kind of residual seasonality in the last fifteen years, where the term “residual seasonality” refers to the presence of “lingering seasonal effects even after seasonal adjustment processes have been applied to the data” (Moulton and Cowan, 2016). Mainly motivated by the fact that since 2010 GDP growth in Q1 has been on averagemorethan1percentagepointlowerthanintheotherquarters(NWplotofFigure4),in 5ThefirstestimateofGDPisreleasedonemonthafterthereferencequarter,whileGDIisgenerallyreleased two months after the reference quarter, together with the second release of GDP. 17

Figure 3: Gross Domestic Output Quarterly annualised percentage change 4-quarter percentage change 8 12 10 6 BEA BEA 8 GDPplus GDPplus BL BL 6 4 4 2 2 0 0 -2 -4 -2 -6 -8 -4 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 ThisfigurereportsdifferentestimatesofGDO.Blackline: “theaverageofGDPandGDI” releasedbytheBEA; blueline: “GDPplus” releasedbythePhiladelphiaFed;redline: ourestimate. recent years there has been lots of discussion on whether US GDP exhibit residual seasonality or not. The profession is not in agreement on this issue, as some authors (e.g. Gilbert et al., 2015; Lengermann et al., 2017) conclude that US GDP does not exhibit residual seasonality, while others (e.g. Rudebusch, Wilson, and Mahedy, 2015; Lunsford, 2017) find evidence of residual seasonality — see Moulton and Cowan (2016) for a technical discussion on causes and remedies for residual seasonality in US GDP. Figure 4 shows average real GDO growth by quarter for our estimate of GDO (SE plot), “GDPplus” (SW plot), and “the average of GDP and GDI” (NE plot). As can be clearly seen, our estimate of GDO exhibits no residual seasonality whatsoever in the last 15 years. Third, our estimate of GDO in the recent years gives a different signal about the economy than the one given by ‘GDPplus” and “the average of GDP and GDI”. According to our estimate, since 2010 quarterly annualized GDO growth was on average 1⁄ of a percentage 2 point higher than estimated by the BEA or the Philadelphia Fed, where this difference comes mainly from our estimate of GDO growth in the first quarter (see Figure 4), and therefore from the fact that our measure do not suffer of residual seasonality. In other words, based on the commonality in the data, the US economy grew at a faster pace than measured by national account statistics. 4.2 Measuring the output gap Decomposing aggregate real output into potential output and output gap is a critical task for both monetary and fiscal policy, as the former is a key input for long-term projections, and the latter can be an important gauge of inflationary pressure. There exist many definitions of potential output and of output gap — see Kiley, 2013, for a survey of different methods and definitions. Here we use the definition implied by the TC decomposition discussed in Sections 2 and 3.3. Among the many existing approaches the most similar to ours are Fleischman and Roberts (2011) and Jarociński and Lenza (2016), who use small dynamic factor models, Aastveit and Trovik (2014), who use a large stationary dynamic factor model combined with the Hodrick Prescott filter, and Morley and Wong (2017), who use a large stationary BVAR combined with the Beveridge and Nelson decomposition. We compare our output gap estimate with the one produced by the Congressional Budget 18

Figure 4: Residual Seasonality GDP BEA 4.5 4.5 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 4 3.8 4 3.9 3.7 3.6 3.5 3.2 3.33.2 3.3 3.5 3.4 3.4 3.3 3.2 3 3 2.8 2.5 2.5 2.6 2.6 2.52.5 2.3 2.5 2.3 2.1 2.2 2.7 2.2 2 1.7 2 1.8 1.5 1.5 1.4 1.5 1.4 1.2 1.1 1 0.9 1 0.5 0.5 0 0 1980s 1990s 2000s 2010-2016 1980s 1990s 2000s 2010-2016 GDPplus BL 4.5 4.5 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 4 4 3.5 3.2 3.33.4 3.3 3.6 3.3 3.6 3.5 3.2 3.7 3.2 3.63.53.4 2.5 3 2.6 2.3 2.6 2.5 3 2.82.7 2.62.62.5 2.8 2.1 2.1 2 2.0 2 1.7 1.71.7 1.5 1.4 1.2 1.5 1.4 1.3 1 1 0.5 0.5 0 0 1980s 1990s 2000s 2010-2016 1980s 1990s 2000s 2010-2016 ThisfigurereportsaveragegrowthatanannualratebyquarterforGDP,“theaverageofGDPand GDI” released by the BEA, the Philadelphia Fed estimate of GDO (GDPplus), and our estimate ofGDO(BL). Office (CBO). The CBO estimates potential output and the output gap by using the so-called “production function approach” according to which potential output is that level of output consistent with current technologies and normal utilisation of capital and labour, and the output gap is the residual part of output. Specifically, the CBO model is based upon a textbook Solow growth model, with a neoclassical production function. Labour and productivity trends areestimatedbyusinga variantofthe Okun’slaw, sothat actualoutputisaboveitspotential (the output gap is positive), when the unemployment rate is below the natural rate of unemployment, which is in turn defined as the non-accelerating inflation rate of unemployment (NAIRU), i.e., that level of unemployment consistent with a stable inflation — for further details see Congressional Budget Office (2001). In Figure 5, we compare our measure of the output gap (red line) with the one produced by the CBO (blue line), where the left plot shows the level of the output gap, while the right plot shows the 4-quarter percentage change of the output gap. The main result emerging from Figure 5 is that our estimate of the output gap is remarkably similar to that of the CBO. However, there are a few periods in which the two estimates diverge, among which the main one is from the late nineties to the financial crisis. In particular, while according to the CBO the level of the output gap was negative between 2001:Q1 and 2005:Q4, according to our estimate in that same period the output gap was positive — on average 21⁄ percentage 2 points higher than estimated by the CBO. Therefore, according to our estimate the level of the output gap right before the great financial crisis in 2007:Q4 was 1.3%, while according to the CBO was -0.7%, and hence we estimate that the level of slack in the economy at the trough of the crisis in 2009:Q2 was -4.5%, approximately 13⁄ percentage points higher than 4 estimated by the CBO. 19

Figure 5: Output gap Level 4-quarter percentage change CB0 4 BL 4 CBO 3 BL 2 2 1 0 0 -1 -2 -2 -4 -3 -4 -6 -5 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 The left plot shows the level of the output gap estimated by the CBO (blue line) together with our estimate (red line). Therightplotshowsthe4-quarterpercentagechangeoftheoutputgap. To conclude, let us emphasize that the fact that our estimate of the output gap is close to that of the CBO is a remarkable result, particularly so because our estimate of the output gap is very different from that of the CBO from both a technical and an interpretational point of view. Indeed, while the CBO constructs the output gap so that its level has a specific economic meaning, our measure of the output gap is simply the transitory/stationary part of thecommoncomponentofoutput—i.e.,thatpartofaggregaterealoutputthatwilldisappear in the long-run.6 Therefore, our output gap estimate provides different and complementary information on the cyclical position of the economy than that contained in the CBO estimate. In particular, our estimate of the output gap seems more suitable to answer the question “which part of current growth is due to temporary factors?”, while the measure of the CBO is certainly more suitable as a gauge of inflation pressure. This can explain in part the divergence of the two estimates in the 2000s. This period is characterized by stable and low inflation — on average core CPI inflation between 2001:Q1 and 2007:Q4 was approximately 2.1%. Accordingly, the CBO estimates that slack is positive (i.e., the output gap negative). By contrast, our measure, which is not specifically affected by inflation, but it is more broadly influenced by the co-movement in the data, estimates that a part of the aggregate real output was transitory. This makes sense given that the years before the crisis were characterized by several factors that proved indeed transitory, such as the housing boom, a historically high share of sub-prime loan origination (Haughwout and Okah, 2009), and a large amount of equity withdrawal from housing (Fuster, Geddes, and Haughwout, 2017). And, since our model includes a large number of variables, including housing indicators as well as loan and credit indicators, these transitory factors are captured by our model. 5 Discussion and conclusions In this paper we disentangle long-run co-movements (common trends) from short-run comovements (common cycles) in large datasets. To this end, we first estimate a non-stationary dynamicfactormodelbymeansofaQuasiMaximumLikelihoodestimatorbasedontheExpectation Maximisation algorithm, combined with the Kalman Filter and the Kalman Smoother 6Notice that also for the CBO the output gap is assumed to revert to zero in the long-run as it imposes in its forecast that in 10 years the output gap will be zero — see e.g. Congressional Budget Office (2004). 20

estimators of the factors. We then disentangle common trends from common cycles by applying a non-parametric Trend-Cycle decomposition to the latent common factors and based on eigenanalysis of their long-run covariance. The asymptotic properties of this estimator are derived and discussed in the paper. We estimate our model on a large panel of US quarterly macroeconomic time series with the goal of estimating the cyclical position of the economy and the observation error. After backing out the observation error, we show that our model naturally produces an estimate of aggregate real output, which we refer to as Gross Domestic Output (GDO). According to our estimate of GDO, since 2010 the US economy grew at a faster pace than measured by national account statistics. We then use a Trend-Cycle decomposition to estimate the output gap. We compare our estimate of the output gap, which is entirely data-driven, with that produced by the Congressional Budget Office (CBO), which is instead based on theoretical economic models. It turns out that our estimate of the output gap is remarkably similar to that of the CBO except from the late nineties to the financial crisis, when our measure suggests that a greater part of the produced output was driven by transitory factors. There are a number of aspects of our model that we have not fully developed in our empirical analysis and that are left for future research. First, due to the use of the Kalman Filter,ourfactorestimatorisinprincipleabletohandlebothmixedfrequencyandmissingdata (e.g. Mariano and Murasawa, 2003; Jungbacker, Koopman, and Van der Wel, 2011; Bańbura and Modugno, 2014) and, therefore, it can be used for real-time analysis (Giannone, Reichlin, and Small, 2008). This aspect is well-known to be particularly relevant when estimating the output gap, since as shown by Orphanides and van Norden (2002), end-of-sample revisions of GDP are of the same order of magnitude as the gap itself. Second, the use of the Kalman Filter makes our model suitable for scenario and counterfactual analysis based on conditional forecasts (Bańbura, Giannone, and Lenza, 2015). Third, as shown in equation (21), our model naturallyproducesaTrend-Cycledecompositionforeachvariableinthedataset,andtherefore it is possible to estimate other policy-relevant indicators, such as the unemployment gap (in our framework, the cycle component of the unemployment rate) or trend inflation (in our framework, the trend component of core CPI or the core PCE price indexes). Ourapproachhasbeensofardeliberatelyentirelydatadriven, andwehavebeencarefulin imposing the least possible amount of restrictions to let the data speak freely. This approach has undeniably some important merits, as estimation of GDO seems to fit naturally in our framework, and the Trend-Cycle decomposition that we obtain for GDO is economically sensible. However, we believe that imposing the statistical restrictions described in Section 3.1, thus eliminating the miss-specification error when computing the Trend-Cycle decomposition, as well as imposing economically meaningful constraints, seems to be an essential step forward. Our view is that one way to proceed is to consider Bayesian estimation of the model, so that our economic and statistical knowledge of the data can be included by means of suitable priors. All this is the subject of our current research. References Aastveit,K.A.andT.Trovik(2014).Estimatingtheoutputgapinrealtime: Afactormodelapproach. The Quarterly Review of Economics and Finance 54, 180–193. Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Dover Publications, Inc. 21

Antolin-Diaz, J., T.Drechsel, andI.Petrella(2016). Trackingtheslowdowninlong-runGDPgrowth. The Review of Economics and Statistics. forthcoming. Antsaklis, P. J. and A. M. Michel (2007). A Linear Systems Primer. Birkhaüser. Aruoba, S. B., F. X. Diebold, J. Nalewaik, F. Schorfheide, and D. Song (2016). Improving GDP measurement: A measurement-error perspective. Journal of Econometrics 191, 384–397. Bai,J.(2004).Estimatingcross-sectioncommonstochastictrendsinnonstationarypaneldata.Journal of Econometrics 122, 137–183. Bai, J. and Y. Liao (2016). Efficient estimation of approximate factor models via penalized maximum likelihood. Journal of Econometrics 191, 1–18. Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221. Bai, J. and S. Ng (2004). A PANIC attack on unit roots and cointegration. Econometrica 72, 1127– 1177. Bańbura, M., D. Giannone, and M. Lenza (2015). Conditional forecasts and scenario analysis with vector autoregressions for large cross-sections. International Journal of Forecasting 31, 739–756. Bańbura, M. and M. Modugno (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics 29, 133–160. Barigozzi, M., M. Lippi, and M. Luciani (2016a). Dynamic factor models, cointegration, and error correction mechanisms. FEDS 2016-18, Board of Governors of the Federal Reserve System. Barigozzi, M., M. Lippi, and M. Luciani (2016b). Non-stationary dynamic factor models for large datasets. FEDS 2016-24, Board of Governors of the Federal Reserve System. Beveridge, S.andC.R.Nelson(1981). Anewapproachtodecompositionofeconomictimeseriesinto permanent and transitory components with particular attention to measurement of the ‘business cycle’. Journal of Monetary Economics 7, 151–174. Choi, I. (2012). Efficient estimation of factor models. Econometric Theory 28, 274–308. Congressional Budget Office (2001). CBO’s method for estimating potential output: An update. CongressionalBudgetOffice(2004). AsummaryofalternativemethodsforestimatingpotentialGDP. CBO Background Paper. Coroneo, L., D. Giannone, and M. Modugno (2016). Unspanned macroeconomic factors in the yield curve. Journal of Business and Economic Statistics 34, 472–485. Council of Economic Advisers (2015). A better measure of economic growth: Gross Domestic Output (GDO). CEA Issue Brief. Creal, D., S. J. Koopman, and E. Zivot (2010). Extracting a robust US business cycle using a timevarying multivariate model-based bandpass filter. Journal of Applied Econometrics 25, 695–719. D’Agostino,A.andD.Giannone(2012). Comparingalternativepredictorsbasedonlarge-panelfactor models. Oxford Bulletin of Economics and Statistics 74, 306–326. DelNegro,M.,F.Schorfheide,F.Smets,andR.Wouters(2007). OnthefitofNewKeynesianmodels. Journal of Business and Economic Statistics 25, 123–143. Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1–38. Doz, C., D. Giannone, and L. Reichlin (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics 164, 188–205. 22

Doz, C., D. Giannone, and L. Reichlin (2012). A quasi maximum likelihood approach for large approximate dynamic factor models. The Review of Economics and Statistics 94(4), 1014–1024. Dungey, M., J. P. Jacobs, J. Tian, and S. Van Norden (2015). Trend in cycle or cycle in trend? New structural identifications for unobserved-components models of US real GDP. Macroeconomic Dynamics 19, 776–790. Durbin,J.andS.J.Koopman(2001). TimeSeriesAnalysisbyStateSpaceMethods. OxfordUniversity Press. Engle, R. F. and C. W. J. Granger (1987). Cointegration and error correction: Representation, estimation, and testing. Econometrica 55, 251–76. Escribano, A. and D. Peña (1994). Cointegration and common factors. Journal of Time Series Analysis 15, 577–586. Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 603–680. Fleischman, C. A. and J. M. Roberts (2011). From many series, one cycle: improved estimates of the business cycle from a multivariate unobserved components model. FEDS 2011-046, Board of Governors of the Federal Reserve System. Forni, M., D. Giannone, M. Lippi, and L. Reichlin (2009). Opening the black box: Structural factor models versus structural VARs. Econometric Theory 25, 1319–1347. Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000). The Generalized Dynamic Factor Model: Identification and estimation. The Review of Economics and Statistics 82, 540–554. Franchi, M.(2017). Onthestructureofstatespacesystemswithunitroots. Technicalreport. mimeo. Fuster, A., E. Geddes, and A. Haughwout (2017). Houses as ATMs no longer. Federal Reserve Bank of New York Liberty Street Economics blog, February 15. http://libertystreeteconomics.newyorkfed.org/2017/02/houses-as-atms-no-longer.html. Garratt, A., D. Robertson, and S. Wright (2006). Permanent vs transitory components and economic fundamentals. Journal of Applied Econometrics 21, 521–542. Giannone, D., L. Reichlin, and D. Small (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics 55, 665–676. Gilbert, C., N. Morin, A. D. Paciorek, and C. R. Sahm (2015). Residual seasonality in GDP. FEDS Notes 2015-05-14, Board of Governors of the Federal Reserve System. Gonzalo, J. and C. Granger (1995). Estimation of common long-memory components in cointegrated systems. Journal of Business and Economic Statistics 13, 27–35. Gonzalo,J.andS.Ng(2001). Asystematicframeworkforanalyzingthedynamiceffectsofpermanent and transitory shocks. Journal of Economic Dynamics and Control 25, 1527–1546. Hallin, M. and M. Lippi (2013). Factor models in high-dimensional time series? A time-domain approach. Stochastic Processes and their Applications 123, 2678–2695. Hallin, M. and R. Liška (2007). Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association 102, 603–617. Harvey, A. C. (1990). Forecasting, structural time series models and the Kalman filter. Cambridge University Press. Haughwout, A. F. and E. Okah (2009). Below the line: Estimates of negative equity among nonprime mortgage borrowers. Economic Policy Review 15, 31–43. Federal Reserve Bank of New York. 23

Jarociński, M. and M. Lenza (2016). An inflation-predicting measure of the output gap in the euro area. ECB Working Paper Series 1966, European Central Bank. Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551–80. Jungbacker, B., S. J. Koopman, and M. Van der Wel (2011). Maximum likelihood estimation for dynamic factor models with missing data. Journal of Economic Dynamics and Control 35, 1358– 1368. Juvenal, L.andI.Petrella(2015). Speculationintheoilmarket. Journal of Applied Econometrics 30, 1099–1255. Kasa, K. (1992). Common stochastic trends in international stock markets. Journal of Monetary Economics 29, 95–124. Kiley, M. T. (2013). Output gaps. Journal of Macroeconomics 37(C), 1–18. Kohn, R. and C. F. Ansley (1983). Fixed interval estimation in state space models when some of the data are missing or aggregated. Biometrika 70, 683–688. Koopman, S. J. (1997). Exact initial Kalman filtering and smoothing for nonstationary time series models. Journal of the American Statistical Association 92, 1630–1638. Koopman,S.J.andJ.Durbin(2000). Fastfilteringandsmoothingformultivariatestatespacemodels. Journal of Time Series Analysis 21, 281–296. Lawley, D. N. and A. E. Maxwell (1971). Factor Analysis as a Statistical Method. Butterworths, London. Lengermann,P.,N.Morin,A.D.Paciorek,E.Pinto,andC.R.Sahm(2017). Anotherlookatresidual seasonality in GDP. FEDS Notes 2017-07-28, Board of Governors of the Federal Reserve System. Lippi, M. and L. Reichlin (1994a). Common and uncommon trends and cycles. European Economic Review 38, 624–635. Lippi, M.andL.Reichlin(1994b). Diffusionoftechnicalchangeandthedecompositionofoutputinto trend and cycle. The Review of Economic Studies 61, 19–30. Luciani, M. (2015). Monetary policy and the housing market: A structural factor analysis. Journal of Applied Econometrics 30, 199–218. Lunsford, K. G. (2017). Lingering residual seasonality in GDP growth. Economic Commentary 2017- 06, Federal Reserve Bank of Cleveland. Mariano, R. S. and Y. Murasawa (2003). A new coincident index of business cycles based on monthly and quarterly series. Journal of Applied Econometrics 18, 427–443. Meng, X.-L. and D. B. Rubin (1994). On the global and componentwise rates of convergence of the EM algorithm. Linear Algebra and its Applications 199, 413–425. Morley, J., C. R. Nelson, and E. Zivot (2003). Why are unobserved component and Beveridge-Nelson trend-cycle decompositions of GDP so different. The Review of Economics and Statistics 85, 235– 243. Morley, J. and B. Wong (2017). Estimating and accounting for the output gap with large Bayesian vector autoregressions. CAMA Working Papers 2017-46, Centre for Applied Macroeconomic Analysis. Moulton, B. R. and B. D. Cowan (2016). Residual seasonality in GDP and GDI: Findings and next steps. Survey of Current Business 96, 1–6. Nalewaik, J. J. (2010). The income- and expenditure-side measures of output growth. Brookings 24

Papers on Economic Activity 1, 71–106. Nalewaik,J.J.(2012). EstimatingprobabilitiesofrecessioninrealtimeusingGDPandGDI. Journal of Money Credit and Banking 44, 235–253. Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica 77, 1447–1479. Orphanides, A.andS.vanNorden(2002). Theunreliabilityofoutput-gapestimatesinrealtime. The Review of Economics and Statistics 84, 569–583. Peña, D. and P. Poncela (1997). Eigenstructure of nonstationary factor models. UC3M Working Papers. Statistics and Econometrics 97-90-29, Universidad Carlos III Madrid. Peña,D.andP.Poncela(2006).Nonstationarydynamicfactoranalysis.JournalofStatisticalPlanning and Inference 136, 1237–1257. Poncela,P.andE.Ruiz(2015). Moreisnotalwaysbetter: Kalmanfilteringindynamicfactormodels. InS.J.KoopmanandN.Shephard(Eds.),Unobserved Components and Time Series Econometrics. Oxford Scholarship Online. Proietti, T. (1997). Short-run dynamics in cointegrated systems. Oxford Bulletin of Economics and Statistics 59, 405–422. Quah, D. and T. J. Sargent (1993). A dynamic index model for large cross sections. In Business cycles, indicators and forecasting. University of Chicago Press. Reis,R.andM.W.Watson(2010). Relativegoods’prices,pureinflation,andthePhillipscorrelation. American Economic Journal Macroeconomics 2, 128–157. Rudebusch, G. D., D. Wilson, and T. Mahedy (2015). The puzzle of weak first-quarter growth. Economic Letter 2015-16, Federal Reserve Bank of San Francisco. Sargent, T. J. and C. A. Sims (1977). Business cycle modeling without pretending to have too much a priori economic theory. In New methods in business cycle research. Federal Reserve Bank of Minneapolis. Seong, B., S. K. Ahn, and P. A. Zadrozny (2013). Estimation of vector error correction models with mixed-frequency data. Journal of Time Series Analysis 34, 194–205. Shumway, R.H.andD.S.Stoffer(1982). Anapproachtotimeseriessmoothingandforecastingusing the EM algorithm. Journal of Time Series Analysis 3, 253–264. Sims, C., J. H. Stock, and M. W. Watson (1990). Inference in linear time series models with some unit roots. Econometrica 58, 113–144. Stock,J.H.andM.W.Watson(1988). Testingforcommontrends. JournaloftheAmericanStatistical Association 83, 1097–1107. Stock, J. H. and M. W. Watson (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179. Stock, J. H. and M. W. Watson (2005). Implications of dynamic factor models for VAR analysis. Working Paper 11467, NBER. Tipping, M. E. and C. M. Bishop (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61, 611–622. Vahid, F. and R. F. Engle (1993). Common trends and common cycles. Journal of Applied Econometrics 8, 341–360. Watson, M. W. (1986). Univariate detrending methods with stochastic trends. Journal of Monetary Economics 18, 49–75. 25

Watson, M. W. and R. F. Engle (1983). Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficients regression models. Journal of Econometrics 23, 385–400. Wu, J.C.F.(1983). OntheconvergencepropertiesoftheEMalgorithm. The Annals of Statistics 11, 95–103. Zhang, R., P. Robinson, and Q. Yao (2016). Identifying cointegration by eigenanalysis. https://arxiv.org/abs/1505.00821. 26

Appendix A Representation results Hereafter, and throughout all appendices, we consider restriction R2 when s = 1 as found empirically in Section 4. Therefore, r = 2q. A.1 Proof of restriction R3 For the dynamic factors consider the VECM(2) ∆f = −ab(cid:48)f +Γ ∆f +Γ ∆f +u , (A1) t t−3 1 t−1 2 t−2 t where a and b are q×d and for simplicity we consider just the case of two lags since this will imply a VECM(1) and therefore a VAR(2) for the static factors as implemented in (13). First assume that in R1 we have K = I . Our aim is then to find the correct VECM repr resentation for F = (f(cid:48) f(cid:48) )(cid:48) when the VECM in (A1), and restrictions R1 and R2 hold. t t t−1 Since we model F as a VAR(2) we know that we must have a VECM(1) with reduced rank t innovations by R1, hence ∆F = −αβ(cid:48)F +M∆F +Hu , (A2) t t t−1 t where α and β are r×c with c < r and H is r×q. Moreover, from Barigozzi et al. (2016a) we have d ≤ c ≤ (r−q+d). We are then interested in finding c, and the expressions of M, α, β, and H as functions of the parameters a, b, Γ , and Γ in (A1). Let us write α = (α(cid:48) α(cid:48))(cid:48) 1 2 1 2 and β = (β(cid:48) β(cid:48))(cid:48) where α , α , β , β are all q ×c. We also denote as M for i,j = 1,2 1 2 1 2 1 2 ij the four q×q blocks of M and as H and H the two q×q blocks of H. Following Proietti 1 2 (1997), we define the (2r+c)-dimensional vector   ∆f t   ∆F t  ∆f t−1    G t =  ∆F t−1  =  ∆f t−1 .   β(cid:48)F t−2  ∆f t−2  β(cid:48)f +β(cid:48)f 1 t−2 2 t−3 Then, the state-space form of (A2) is given by ∆F = ZG , t t G = TG +Z(cid:48)Hu , (A3) t t−1 t with the r×(2r+c) matrix Z = (I 0 0 ). Then, r r r×c   H 1   Z(cid:48)H =  0 0 I r r c×r  (cid:18) H H 1 2 (cid:19) =      H 0 0 q q 2      . 0 c×q 27

and the (2r+c)×(2r+c) matrix T is given by  M M −α β(cid:48) −α β(cid:48) −α  11 12 1 1 1 2 1  M −αβ(cid:48) −α   M 21 M 22 −α 2 β 1 (cid:48) −α 2 β 2 (cid:48) −α 2    T =  I r 0 r 0 r×c  =  I q 0 q 0 q 0 q 0 q×c .   0 c×r β(cid:48) I c  0 q I q 0 q 0 q 0 q×c  0 0 β(cid:48) β(cid:48) I c×q c×q 1 2 c Now using these definitions into (A3) we have five q-dimensional equations. The first one is ∆f = M ∆f +M ∆f −α β(cid:48)f −α β(cid:48)f +H u , t 11 t−1 12 t−2 1 1 t−2 1 2 t−3 1 t which is equivalent to (A1) when M = Γ , M = Γ , α = a, β = 0 , β = b, H = I , c = d. (A4) 11 1 12 2 1 1 q×c 2 1 q The second equation is ∆f = M ∆f +M ∆f −α β(cid:48)∆f −α β(cid:48)∆f −α β(cid:48)f −α β(cid:48)f +H u , t−1 21 t−1 22 t−2 2 1 t−2 2 2 t−3 2 1 t−3 2 2 t−4 2 t from which we see that we must also have M = I , M = 0 , α = 0 , H = 0 . (A5) 21 q 22 q 2 q×c 2 q Under (A4) and (A5) the third, fourth and fifth equation in (A3) are just identities. By imposing these restrictions we have the mapping between the VECM(1) for F in (A2) t and the VECM(2) for f in (A1) t (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) Γ Γ a 0 I M = 1 2 , α = , β = q×d , H = q . (A6) I 0 0 b 0 q q q×d q If we now consider a generic K in R1, then (A2) holds for F = K(f(cid:48) f(cid:48) )(cid:48) and (A6) becomes t t t−1 (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) M = K Γ 1 Γ 2 K−1, α = K a , β = K−1(cid:48) 0 q×d , H = K I q . I 0 0 b 0 q q q×d q The cointegration rank c of F is given by rk(αβ(cid:48)) = d. t A.2 Reduced and structural form of the state-space representation Consider (12)-(13) written in matrix notation and using the companion form of the VAR x = ΛF +ξ , (A7) t t t (cid:18) (cid:19) (cid:18) (cid:19)(cid:18) (cid:19) (cid:18) (cid:19) F A A F H t = 1 2 t−1 + u , (A8) F I 0 F 0 t t−1 r r t−2 r×q with Λ = (λ ···λ )(cid:48) the n×r loadings matrix. We call (A7)-(A8) the reduced form of the 1 n model. Similarly consider the structural form, where, for convenience, in the VAR we write 28

twice the same equation: x = B f +B f +ξ , (A9) t 0 t 1 t−1 t        f Π Π 0 Π f I t 1 2 q 3 t−1 q    f f t t − − 1 1    =    I I q q 0 0 q q 0 0 q q 0 0 q q       f f t t − − 2 2    +    0 0 q q    u t , (A10) f 0 I 0 0 f 0 t−2 q q q q t−3 q where B = (b ···b )(cid:48), B = (b ···b ) are both n×q. Because of R1 there exists an 0 01 0n 1 11 1n invertible r×r matrix K such that F = K(f(cid:48)f(cid:48) )(cid:48), (f(cid:48)f(cid:48) )(cid:48) = K−1F , (A11) t t t−1 t t−1 t Λ = (B B )K−1, (B B ) = ΛK. (A12) 0 1 0 1 By comparing (A7)-(A8) with (A9)-(A10) and using (A11)-(A12), we have the parameters of the reduced form (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) Π Π 0 Π I A = K 1 2 K−1, A = K q 3 K−1, H = K q . (A13) 1 I 0 2 0 0 0 q q q q q The relations (A11), (A12) and (A13) are used throughout the following. Moreover, since a VAR(2) of dimension r can always be written as a VAR(1) of dimension 2r, to avoid introducing further notation hereafter we consider the case of a VAR(1) for F , where A ≡ A . t 1 A.3 Properties of the structural and reduced form of the linear system Lemma 1. The structural model (A9)-(A10) is stabilizable and detectable. Proof. Equations (A9)-(A10) define a linear system with r = 2q latent states (f(cid:48) f(cid:48) )(cid:48). We t t−1 say that a linear system is stabilizable if its unstable (non-stationary) states are controllable and all uncontrollable states are stable (see Anderson and Moore, 1979, page 342), where stability is dictated by the eigenvalues of the matrix of VAR coefficients, which we denote as (cid:18) (cid:19) Π Π A(cid:101) = 1 2 (A14) I 0 q q Because of cointegration, A(cid:101) has (q −d) unit eigenvalues corresponding to (q −d) unstable states. Moreover,(I −Π −Π ) = ab(cid:48),whereaandbhavefullcolumn-rankq×dmatrices,so q 1 2 that rk(ab(cid:48)) = d. Define the q×(q−d) matrices a and b such that a(cid:48) a = b(cid:48) b = 0 . ⊥ ⊥ ⊥ ⊥ (q−d)×d Then, since rk(a(cid:48) I ) = (q −d), the unstable states are controllable because they satisfy the ⊥ q Popov-Belevitch-Hautus rank test (see Franchi, 2017, Theorem 2.1, and Antsaklis and Michel, 2007, Corollary 6.11, page 249). Now, by looking at (A10), we see that A(cid:101) has also (r−q+d) = (q+d) eigenvalues which are smaller than one in absolute value. Of these q correspond to states which are uncontrollable because they are not driven by any shock, but are also stable since have no dynamics (see the secondequationin(A10)). TheremainingdstatesfollowastableVAR,hencearecontrollable. Similarly, we say that a linear system is detectable if its unstable states are observable and all unobservable states are stable (see Anderson and Moore, 1979, page 342). First, notice 29

that rk(B ) = q and rk(B ) = q because of C2 and (A12), therefore rk(B b ) = (q − d) 0 1 0 ⊥ and rk(B b ) = (q −d), which implies that the unstable states are observable because they 1 ⊥ satisfy the Popov-Belevitch-Hautus rank test (see Franchi, 2017, Theorem 2.1, and Antsaklis and Michel, 2007, Corollary 6.11, page 249). Since B and B have full column-rank there 0 1 are no unstable unobservable states. This completes the proof. (cid:3) Appendix B Details of estimation This appendix provides details on estimation of factors and parameters which are necessary to introduce the notation required in the proofs in Appendix C. The model considered is (12)- (14), where for simplicity of exposition we consider a VAR(1) for the factors. As explained in the text without loss of generality we can assume ξ ∼ I(0), thus considering as latent states t only the r static factors. Adding I(1) idiosyncratic components as latent states does not increase the dimension of the parameter space but it increases the dimension of the latent states vector. However, since the idiosyncratic components are assumed to be orthogonal (see D2), and moreover they are orthogonal to the static factors (see A2), the results in Appendix C can be generalized to this case by treating each new state separately. B.1 Expectation Maximization algorithm In what follows we denote the whole sample of observed data as X := (x ···x )(cid:48) and T 1 T the whole history of the unknown factors as F := (F ···F )(cid:48). Recall that the vector of T 1 T parameters is given by Θ := (vec(Λ)(cid:48) vec(A)(cid:48) vec(H)(cid:48) diag(R))(cid:48). To avoid heavier notation we use Θ to indicate both a generic value of the parameters and the true value, whether we refer to one or the other is either clearly implied by the context or explicitly stated. The joint pdf of data and factors is denoted as f(X ,F ;Θ) and the corresponding joint T T log-likelihood is denoted as (cid:96)(X ,F ;Θ) := logf(X ,F ;Θ) and it is such that T T T T (cid:96)(X ,F ;Θ) = (cid:96)(X |F ;Θ)+(cid:96)(F ;Θ), (B1) T T T T T where (cid:96)(X |F ;Θ) is the log-likelihood of the data conditional on the factors and (cid:96)(F ;Θ) T T T is the marginal log-likelihood of the factors. Because of D2 and D3 all log-likelihoods are Gaussian and in particular (cid:96)(X |F ;Θ) = − nT log(2π)− T logdet(R)− 1 tr (cid:2) (X −F Λ(cid:48))R−1(X −F Λ(cid:48))(cid:48)(cid:3) . T T T T T T 2 2 2 (B2) We first briefly review the steps of the EM algorithm, while in Section B.1.4 we prove that the values of the parameters obtained at convergence of the EM algorithm converge to the QML estimator. B.1.1 Initialization The EM algorithm is initialised with estimated parameters Θ(cid:98)0 := (vec(Λ(cid:98)0 )(cid:48) vec(A(cid:98)0 )(cid:48) vec(H(cid:98)0 )(cid:48) diag(R(cid:98)0 ))(cid:48). (B3) 30

These are obtained as follows. From the integration of the first r principal components of ∆x t we have an estimator of the factors, F(cid:101)t , and then of the loadings, Λ(cid:98)0 . The VAR parameters, A(cid:98)0 are obtained by fitting a VAR on the estimated factors F(cid:101)t and the columns of H(cid:98)0 are given by the q leading eigenvectors of the covariance matrix of VAR residuals. Finally, the diagonal entries of R(cid:98)0 , are obtained as sample variances of ξ(cid:98)it,0 = x it −Λ(cid:98)0 F(cid:101)t . Consistency of these estimators is discussed in Section C.3. B.1.2 E-step At iteration k ≥ 0, given X T and an estimate of the parameters Θ(cid:98)k , we compute the expected log-likelihood as function of a generic value of the parameters Θ, where the expectation is computed with respect to the conditional distribution of F T given X T and when using Θ(cid:98)k : (cid:90) Q(Θ;Θ(cid:98)k ) := Rr×T (cid:96)(X T ,F T ;Θ)f(F T |X T ;Θ(cid:98)k )dF T = E Θ(cid:98)k [(cid:96)(X T ,F T ;Θ)|X T ]. (B4) IntheGaussiancase(B4)dependsontheconditionalmeanofthefactorsandtheirconditional secondmoments,whichareobtainedwiththeKSwhenusingtheparametersΘ(cid:98)k andaregiven by (see Section B.2 for details) F := E [F |X ], P := E [(F −F )(F −F )(cid:48)|X ]. (B5) t|T,k Θ(cid:98)k t T t|T,k Θ(cid:98)k t t|T,k t t|T,k T B.1.3 M-step A new estimator of the parameters is obtained by maximising the expected log-likelihood over all possible values of the parameters: Θ(cid:98)k+1 = argmax Q(Θ;Θ(cid:98)k ). (B6) Θ∈Ω⊆RQ Thus, maximizing the conditional expectation of (B2) and using (B5), we have the loadings estimator (cid:32) T (cid:33)(cid:32) T (cid:33)−1 (cid:88) (cid:88) Λ(cid:98)k+1 = E Θ(cid:98)k [x t F(cid:48) t |X T ] E Θ(cid:98)k [F t F(cid:48) t |X T ] t=1 t=1 (cid:32) T (cid:33)(cid:32) T (cid:33)−1 (cid:88) (cid:88)(cid:16) (cid:17) = x F(cid:48) F F(cid:48) +P . t t|T,k t|T,k t|T,k t|T,k t=1 t=1 Similarly we can obtain estimates of the other parameters A(cid:98)k+1 and R(cid:98)k+1 (see e.g. Bańbura andModugno,2014, fortheirexpressions). ThecolumnsofH(cid:98)k+1 areobtainedastheq leading eigenvectors of the matrix (cid:32) T T (cid:33) 1 (cid:88) (cid:88) Σ(cid:98)k+1 = T E Θ(cid:98)k [F t F(cid:48) t |X T ]−A(cid:98)k+1 E Θ(cid:98)k [F t F(cid:48) t−1 |X T ] , t=1 t=1 whichisanestimatorofthecovarianceoftheVARresiduals,andwherethesecondexpectation can also be computed from the output of the KS. 31

B.1.4 Convergence Denote the QML estimator of the parameters as Θ(cid:98) ∗ := (vec(Λ(cid:98) ∗)(cid:48) vec(A(cid:98) ∗)(cid:48) vec(H(cid:98) ∗)(cid:48) diag(R(cid:98) ∗))(cid:48), (B7) then by definition we have Θ(cid:98) ∗ = argmax (cid:96)(X T ;Θ). (B8) Θ∈Ω⊆RQ where (cid:96)(X ;Θ) is the log-likelihood of the data such that T (cid:96)(X ;Θ) = (cid:96)(X ,F ;Θ)−(cid:96)(F |X ;Θ), (B9) T T T T T where the first term on the rhs is given by (B1) and the second can be computed using the output of the KS for a given value of Θ. Define the expectation (cid:90) H(Θ;Θ(cid:98)k ) := Rr×T (cid:96)(F T |X T ;Θ)f(F T |X T ;Θ(cid:98)k )dF T = E Θ(cid:98)k [(cid:96)(F T |X T ;Θ)|X T ], (B10) and recall the definition of Q(Θ;Θ(cid:98)k ) in the E-step in (B4). Since the lhs of (B9) does not depend on F , by taking its expectation with respect to the conditional distribution of F T T given X T and when using Θ(cid:98)k , for any Θ ∈ Ω, we have (cid:96)(X T ;Θ) = Q(Θ;Θ(cid:98)k )−H(Θ;Θ(cid:98)k ). (B11) Now, by definition of Kullback-Leibler divergence, we have (see also Lemma 1 in Dempster et al., 1977) H(Θ(cid:98)k+1 ;Θ(cid:98)k ) ≤ H(Θ(cid:98)k ;Θ(cid:98)k ). (B12) Hence, from (B11) and (B12), for any k, (cid:96)(X T ;Θ(cid:98)k+1 )−(cid:96)(X T ;Θ(cid:98)k ) ≥ Q(Θ(cid:98)k+1 ;Θ(cid:98)k )−Q(Θ(cid:98)k ;Θ(cid:98)k ) ≥ 0 where the last inequality is a consequence of the M-step in (B6). This shows that the loglikelihoodincreasesmonotonicallyaskincreases. Moreover,sinceduetoGaussianityQ(Θ;Θ(cid:48)) is continuous in Θ and Θ(cid:48) and its gradient ∇ Q(Θ;Θ(cid:48)) is continuous in Θ, then conditions Θ for Theorems 1 and 2 and Corollary 1 in Wu (1983) are satisfied and we have convergence of the log-likelihood to its unique maximum and of the parameters to the corresponding QML estimators lim (cid:96)(X T ;Θ(cid:98)k ) = (cid:96)(X T ;Θ(cid:98) ∗), lim Θ(cid:98)k = Θ(cid:98) ∗. (B13) k→∞ k→∞ The previous result holds in the limit k → ∞, but in practice we can run the EM algorithm only for a finite number of iterations k . Define, for any k, max |(cid:96)(X T ,F T,k+1 ;Θ(cid:98)k+1 )−(cid:96)(X T ,F T,k ;Θ(cid:98)k )| ∆(cid:96) = , k |(cid:96)(X T ,F T,k+1 ;Θ(cid:98)k+1 )|+|(cid:96)(X T ,F T,k ;Θ(cid:98)k )| 32

where F := (F ···F )(cid:48). We say that the algorithm has converged at iteration T,k 1|T,k T|T,k k∗ < k according to the following rule, which is defined for a given threshold η, max ∆(cid:96) < η, but ∆(cid:96) ≥ η. k∗ k∗−1 Once we find k∗, our estimator of the parameters is defined as Θ(cid:98) := Θ(cid:98)k∗ . The corresponding estimator of the factors is then defined as F(cid:98)t := F t|T,k∗ , thus running the KS once last time using Θ(cid:98)k∗ . The rate of convergence of Θ(cid:98) to Θ(cid:98) ∗ in (B13) is studied in Lemma 9 below. B.2 Kalman filter and Kalman smoother ForeaseofnotationassumetoknowthetrueparametercollectedinthevectorΘ. Whenusing the KF-KS in the EM algorithm at a given iteration k, the factors’ estimators given below are obtained by replacing Θ with Θ(cid:98)k throughout this section. We denote the conditional expectation and covariance of the factors as F := E [F |X ], P := E [(F −F )(F −F )(cid:48)|X ], (B14) t|s Θ t s t|s Θ t t|s t t|s s where X := (x ···x )(cid:48). Under Gaussianity (D2 and D3) these can be computed with the s 1 s KF-KS. Specifically, when s = t − 1 we have the optimal one-step-ahead prediction, when s = t we have the optimal in-sample estimator, when s = T we have the optimal smoother. The KF gives the first two cases while the KS gives the latter. In particular, we denote the KF-KS estimators respectively as: F and F when using the true value Θ and as F t|t t|T t|t,k and F t|T,k when using Θ(cid:98)k (see also (B5)). B.2.1 Forward iterations - Filtering For given initial conditions F and P , the KF is based on the forward iterations for 0|0 0|0 t = 1,...,T: F = AF , (B15) t|t−1 t−1|t−1 P = AP A(cid:48)+HH(cid:48), (B16) t|t−1 t−1|t−1 F = F +P Λ(cid:48)(ΛP Λ(cid:48)+R)−1(x −ΛF ), (B17) t|t t|t−1 t|t−1 t|t−1 t t|t−1 P = P −P Λ(cid:48)(ΛP Λ(cid:48)+R)−1ΛP . (B18) t|t t|t−1 t|t−1 t|t−1 t|t−1 Moreover, by combining (B16) and (B18), we obtain the Riccati difference equation P −AP A(cid:48)+AP Λ(cid:48)(ΛP Λ(cid:48)+R)−1ΛP A(cid:48) = HH(cid:48). (B19) t+1|t t|t−1 t|t−1 t|t−1 t|t−1 TheKFisstartedwithgivenvaluesofF andP . Thelattercanbeobtainedwithadiffuse 0|0 0|0 prior run for t < 0 (see Koopman, 1997, and Koopman and Durbin, 2000, for details). 33

B.2.2 Backward iterations - Smoothing The KS is then based on the backward iterations for t = T,...,1: F = F +P A(cid:48)P−1 (F −F ), (B20) t|T t|t t|t t+1|t t+1|T t+1|t P = P +P A(cid:48)P−1 (P −P )P−1 AP . (B21) t|T t|t t|t t+1|t t+1|T t+1|t t+1|t t|t The KS iterations in (B20) require T inversions of P and in the singular case r > q these t|t−1 matrices are likely to be singular (see also Lemma 5). There are two possible solutions to this problem. Kohn and Ansley (1983) suggest to use a generalized inverse of P , like the t|t−1 Moore-Penrose one. Alternatively, it can be proved that (B20) can be written in an equivalent way, which does not require matrix inversion, and which is defined by the backward iterations for t = T,...,1: F = F +P r , (B22) t|T t|t−1 t|t−1 t−1 r = Λ(cid:48)(ΛP Λ(cid:48)+R)−1(x −ΛF )+L(cid:48)r , (B23) t−1 t|t−1 t t|t−1 t t P = P −P N P , (B24) t|T t|t−1 t|t−1 t−1 t|t−1 N = Λ(cid:48)(ΛP Λ(cid:48)+R)−1Λ+L(cid:48)N L , (B25) t−1 t|t−1 t t t L = A−AP Λ(cid:48)(ΛP Λ(cid:48)+R)−1Λ, (B26) t t|t−1 t|t−1 where r = 0 , N = 0 and by consturction AP = L P (see also Durbin and T r×1 T r t|t t t|t−1 Koopman, 2001, pp.70-73). Although numerically no appreciable differences emerge with respect to the chosen method, (B22)-(B26) are particularly useful for our proofs. Appendix C Consistency of the EM algorithm C.1 Preliminary results Lemma 2. For m < n, and given symmetric positive definite matrices A of dimension m×m, B of dimension n×n, and for C of dimension n×m with full column-rank, the following holds AC(cid:48)(CAC(cid:48)+B)−1 = (A−1+C(cid:48)B−1C)−1C(cid:48)B−1. (C27) Proof. Recall the Woodbury forumla (CAC(cid:48)+B)−1 = B−1−B−1C(A−1+C(cid:48)B−1C)−1C(cid:48)B−1. (C28) Denote D = (A−1+C(cid:48)B−1C)−1 then from (C28) the lhs of (C27) is equivalent to AC(cid:48)(cid:2) B−1−B−1CDC(cid:48)B−1(cid:3) = A (cid:2) C(cid:48)B−1−C(cid:48)B−1CDC(cid:48)B−1(cid:3) = A (cid:2) I −C(cid:48)B−1CD (cid:3) C(cid:48)B−1. Then, (C27) becomes A (cid:2) I −C(cid:48)B−1CD (cid:3) C(cid:48)B−1 = DC(cid:48)B−1, or equivalently multiplying both sides on the right by BC(C(cid:48)C)−1 A (cid:2) I −C(cid:48)B−1CD (cid:3) = D. (C29) 34

Now notice that D = (A−1+C(cid:48)B−1C)−1 = A(I +AC(cid:48)B−1C)−1. (C30) Substituting (C30) in (C29) and multiplying both sides on the left by A−1 (cid:2) I −C(cid:48)B−1CA(I +AC(cid:48)B−1C)−1(cid:3) = (I +AC(cid:48)B−1C)−1. Multiplying both sides on the right by (I +AC(cid:48)B−1C) we have that (C27) is equivalent to I +AC(cid:48)B−1C −C(cid:48)B−1CA = I (C31) Therefore (C27) is correct provided that AC(cid:48)B−1C = C(cid:48)B−1CA which is always true since both A and C(cid:48)B−1C are symmetric. (cid:3) Lemma 3. For m < n with m independent of n and given (a) an m×m matrix A symmetric and positive definite with µA ≤ M for j = 1,...,m; j (b) an n×n matrix B symmetric and positive definite with µB ≤ M for j = 1,...,n; j (c) an n×m matrix C such that C(cid:48)C is positive definite with µC(cid:48)C = M n for j = 1,...,m; j j then the following holds (A−1+C(cid:48)B−1C)−1C(cid:48)B−1C = I +O(n−1). m Proof. First notice that for two matrices K and H we have (H +K)−1 = (H +K)−1−K−1+K−1 = (H +K)−1(K −(H +K))K−1+K−1 = (H +K)−1(−H)K−1+K−1 = K−1−(H +K)−1HK−1. (C32) Then setting K = C(cid:48)B−1C and H = A−1 from (C32) we have (A−1+C(cid:48)B−1C)−1 = (C(cid:48)B−1C)−1−(A−1+C(cid:48)B−1C)−1A−1(C(cid:48)B−1C)−1. which implies (A−1+C(cid:48)B−1C)−1C(cid:48)B−1C = I −(A−1+C(cid:48)B−1C)−1A−1. (C33) m Now consider the second term on the rhs of (C33) (cid:107)(A−1+C(cid:48)B−1C)−1A−1(cid:107)2 ≤ (cid:107)(A−1+C(cid:48)B−1C)−1(cid:107)2(cid:107)A−1(cid:107)2 ≤ (cid:107)(C(cid:48)B−1C)−1(cid:107)2(µA)−1 ≤ (cid:107)(C(cid:48)B−1C)−1(cid:107)2M−1, (C34) n 1 where we use norm sub-additivity and the fact that by condition (a) A and A−1 are positive definite and therefore µA ≥ M > 0 and moreover µA−1 ≥ M > 0 thus by Weyl’s inequality n 1 n 2 µA−1+C(cid:48)B−1C ≥ µA−1 +µC(cid:48)B−1C ≥ µC(cid:48)B−1C, n n n n 35

therefore, (cid:107)(A−1+C(cid:48)B−1C)−1(cid:107) = (µA−1+C(cid:48)B−1C)−1 ≤ (µC(cid:48)B−1C)−1 = (cid:107)(C(cid:48)B−1C)−1(cid:107). n n Then, the first term on the rhs of (C34) is m (cid:107)(C(cid:48)B−1C)−1(cid:107)2 ≤ tr (cid:2) (C(cid:48)B−1C)−2(cid:3) = (cid:88) 1 = O(n−2). (C35) (µC(cid:48)B−1C) 2 j=1 j Indeed,themeigenvaluesofC(cid:48)B−1C arealsothemnon-zeroeigenvaluesofB−1/2CC(cid:48)B−1/2, which are all O(n) by conditions (b) and (c). By using (C34) and (C35) in (C33) we prove the Lemma. (cid:3) C.2 Consistency of KF and KS using the true value of the parameters Lemma 4. For the conditional covariance P of the static factors given X , there exists t|t−1 t−1 a steady state for the reduced form denoted as P solving the algebraic Riccati equation (ARE) and such that P = P+O(e−t). t|t−1 Moreover, as n → ∞, (cid:18) (cid:19) I 0 P = K q q K(cid:48)+O(n−1) = HH(cid:48)+O(n−1). 0 0 q q Proof. DefineP(cid:101)t|t−1 astheconditionalcovariancematrixforthevector(f t (cid:48)f t (cid:48) −1 )(cid:48) givenX t−1 . Then, due to stabilizability and detectability proved in Lemma 1, there exists a steady state for the structural model denoted as P(cid:101) solving the algebraic Riccati equation (ARE) and such that (see Anderson and Moore, 1979, pp.76-77, and Harvey, 1990, pp.118-119) P(cid:101)t|t−1 = P(cid:101) +O(e−t). In presence of a diffuse prior its effect is limited to the first few periods, say t (see Koopman, 0 1997), then the result above holds for t > t . The ARE for the structural model is then (see 0 also (B19)) (cid:18) (cid:19) I 0 P(cid:101) −A(cid:101)P(cid:101)A(cid:101) (cid:48)+A(cid:101)P(cid:101)B(cid:101) (cid:48)(B(cid:101)P(cid:101)B(cid:101) (cid:48)+R)−1B(cid:101)P(cid:101)A(cid:101) (cid:48) = q q , (C36) 0 0 q q where (cid:18) (cid:19) Π Π A(cid:101) = I 1 0 2 , B(cid:101) = (B 0 B 1 ). (C37) q q Now since the structural model has only q controllable and observable states (see Lemma 1) and P(cid:101) is the steady state covariance of those states, rk(P(cid:101)) = q. Define as V the r×r matrix of eigenvectors of P(cid:101) and as D the q×q diagonal matrix of its non zero eigenvalues, then (cid:18) D 0 (cid:19) (cid:18) D1/2 0 (cid:19)(cid:18) D1/2 0 (cid:19) (cid:18) I 0 (cid:19) P(cid:101) = V q V(cid:48) = V q q V(cid:48) = W q q W(cid:48), (C38) 0 0 0 0 0 0 0 0 q q q q q q q q 36

with (cid:18) D1/2 0 (cid:19) W = V q . 0 I q q Define B∗ and B∗ as the n×q matrices such that B(cid:101)W = (B∗B∗). Then, from (C38) 0 1 0 1 (cid:18) I 0 (cid:19) (cid:18) I 0 (cid:19)(cid:18) B∗(cid:48) (cid:19) B(cid:101)P(cid:101)B(cid:101) (cid:48) = B(cid:101)W 0 q 0 q W(cid:48)B(cid:101) (cid:48) = (B∗ 0 B∗ 1 ) 0 q 0 q B 0 ∗(cid:48) = B∗ 0 B∗ 0 (cid:48), (C39) q q q q 1 and (cid:18) I 0 (cid:19) (cid:18) I 0 (cid:19)(cid:18) B∗(cid:48) (cid:19) (cid:18) B∗(cid:48) (cid:19) 0 q 0 q W(cid:48)B(cid:101) (cid:48) = 0 q 0 q B 0 ∗(cid:48) = 0 0 . (C40) q q q q 1 q×n From (C38), (C39), (C40), Lemmas 2 and 3, we have (cid:18) I 0 (cid:19) (cid:18) I 0 (cid:19) (cid:18) B∗(cid:48)(B∗B∗(cid:48)+R)−1 (cid:19) (cid:18) I 0 (cid:19) q q W(cid:48)B(cid:101) (cid:48)(B(cid:101)P(cid:101)B(cid:101) (cid:48)+R)−1B(cid:101)W q q = 0 0 0 (B∗B∗) q q 0 0 0 0 0 0 1 0 0 q q q q q×n q q (cid:18) (B∗(cid:48)R−1B∗+I )−1B∗(cid:48)R−1B∗ 0 (cid:19) = 0 0 q 0 0 q 0 0 q q (cid:18) I +O(n−1) 0 (cid:19) = q q . (C41) 0 0 q q Notice that we can apply Lemma 3 to the top left q × q block of (C41) since: I trivially q satisfies condition (a), R−1 satisfies condition (b) because of D2 and B∗(cid:48)B∗ satisfies condition 0 0 (c). Indeed, from definition (A12) we have 1 (cid:18) B∗(cid:48) (cid:19) Λ(cid:48)Λ 0 (B∗B∗) = W(cid:48)K(cid:48) KW, n B∗(cid:48) 0 1 n 1 and because of assumption C2 the top left q×q block of this matrix which is n−1B∗(cid:48)B∗ has 0 0 full column-rank for any n. By substituting (C38) and (C41) into (C36) we have (cid:18) (cid:19) I 0 P(cid:101) = q q +O(n−1). (C42) 0 0 q q Now, notice that by construction P t|t−1 = KP(cid:101)t|t−1 K(cid:48) and since K is full-rank then also the reduced form system is stabilizable and detectable, thus it has a steady state P such that P = P+O(e−t). (C43) t|t−1 Moreover, since K does not depend on t nor n, we have P = KP(cid:101)K(cid:48) and the result follows directly from (C42). Last, from the definition of H in (A13) we have also P = HH(cid:48). (cid:3) Lemma 5. For the static factors estimated via KF and KS when using the true value of the parameters Θ, under condition (15) in the text, the following hold, for all t¯≤ t ≤ T and as n → ∞, √ n(cid:107)F −F (cid:107) = O (1), t|t t p √ n(cid:107)F −F (cid:107) = O (1). t|T t p 37

Proof. By Lemma 4, the conditional covariance P of the static factors given X has a t|t t steady state S such that (see (B18)) S = P−PΛ(cid:48)(ΛPΛ(cid:48)+R)−1ΛP. (C44) Then, notice that by Lemma 4 and (A12) (cid:18) I 0 (cid:19) (cid:18) B(cid:48) (cid:19) (cid:18) B(cid:48) (cid:19) PΛ(cid:48) = K q q K(cid:48)(K(cid:48))−1 0 +O(n−1) = K 0 +O(n−1), (C45) 0 0 B(cid:48) 0 q q 1 q (cid:18) I 0 (cid:19) (cid:18) B(cid:48) (cid:19) ΛPΛ(cid:48) = (B B )K−1K q q K(cid:48)(K(cid:48))−1 0 +O(n−1) = B B(cid:48) +O(n−1). (C46) 0 1 0 0 B(cid:48) 0 0 q q 1 Using (C45) and (C46) and by applying Lemmas 2 and 3 we have (cid:18) B(cid:48)(B B(cid:48) +R)−1B 0 (cid:19) PΛ(cid:48)(ΛPΛ(cid:48)+R)−1ΛP = K 0 0 0 0 q K(cid:48)+O(n−1) 0 0 q q (cid:18) (B(cid:48)R−1B +I )−1B(cid:48)R−1B 0 (cid:19) = K 0 0 q 0 0 q K(cid:48)+O(n−1) 0 0 q q (cid:18) I +O(n−1) 0 (cid:19) = K q q K(cid:48)+O(n−1). (C47) 0 0 q q Notice that we can apply Lemma 3 to the top q×q block of (C47) since: I trivially satisfies q condition (a), R−1 satisfies condition (b) because of D2 and B (cid:48)B satisfies condition (c) 0 0 because of assumption C2 and definition (A12) (see also (C41) in the proof of Lemma 4). By substituting (C47) into (C44) and because of Lemma 4, we have (cid:18) (cid:19) (cid:18) (cid:19) I 0 I 0 S = K q q K(cid:48)+O(n−1)−K q q K(cid:48)+O(n−1) = O(n−1). (C48) 0 0 0 0 q q q q By substituting (C43) in (B18), from (C44) we have P = S+O(e−t). (C49) t|t Therefore, by substituting (C48) into (C49) and letting n = Tγ for γ > 0 and t¯ ≡ t¯(T), because of (15) for t¯≤ t ≤ T we have P = O(n−1)+O(e−t) = O(n−1). (C50) t|t Now, let us consider P defined in (B24). From (B18) t|T P = P +P Λ(cid:48)(ΛP Λ(cid:48)+R)−1ΛP . (C51) t|t−1 t|t t|t−1 t|t−1 t|t−1 By substituting (C51) and (B25) in (B24) we have P = P +P L(cid:48)N L P . (C52) t|T t|t t|t−1 t t t t|t−1 Since N is function of P , because of Lemma 4, it has a steady state N such that (cid:107)N(cid:107) = t t|t−1 O(1) and N = N+O(e−t). (C53) t 38

Now, since AP = L P , using (C50) and (C53), because of (15) for t¯≤ t ≤ T we have t|t t t|t−1 P L(cid:48)N L P = P A(cid:48)N AP = O(n−2). (C54) t|t−1 t t t t|t−1 t|t t t|t By using (C50) and (C54) into (C52), for t¯≤ t ≤ T we have P = O(n−1). (C55) t|T By the law of iterated expectations, for t¯≤ t ≤ T we have (see also the definitions in (B14)) E [(F −F )(F −F )(cid:48)] = E [E [(F −F )(F −F )(cid:48)|X ]] = E [P ] = O(n−1), Θ t t|t t t|t Θ Θ t t|t t t|t t Θ t|t E [(F −F )(F −F )(cid:48)] = E [E [(F −F )(F −F )(cid:48)|X ]] = E [P ] = O(n−1), Θ t t|T t t|T Θ Θ t t|T t t|T T Θ t|T which imply mean-square convergence of the KF and KS when the parameters are known and for all t¯≤ t ≤ T: r E [(cid:107)F −F (cid:107)2] = (cid:88) E [(F −F )2] = tr (cid:8) E [P ] (cid:9) = O(n−1), Θ t t|t Θ j,t j,t|t Θ t|t j=1 r E [(cid:107)F −F (cid:107)2] = (cid:88) E [(F −F )2] = tr (cid:8) E [P ] (cid:9) = O(n−1). Θ t t|T Θ j,t j,t|T Θ t|T j=1 The result follows from Chebychev’s inequality. (cid:3) C.3 Consistency of KF and KS using estimated parameters Lemma 6. Consider the QML estimator of the parameters Θ(cid:98) ∗ defined in (B7) and obtained using the true values of the static factors F , then, as T → ∞: t √ T (cid:107)λ(cid:98) ∗ i −λ i (cid:107) = O p (1), i = 1,...,n, √ T (cid:107)A(cid:98) ∗−A(cid:107) = O p (1), √ T (cid:107)H(cid:98) ∗−H(cid:107) = O p (1), √ T |[R(cid:98)]∗ ii −[R] ii | = O p (1), i = 1,...,n. Proof. The QML estimator of the loadings, for any i = 1,...,n, is given by (cid:32) T (cid:33)(cid:32) T (cid:33)−1 λ(cid:98) ∗ i (cid:48) = (cid:88) x it F(cid:48) t (cid:88) F t F(cid:48) t . (C56) t=1 t=1 We know that F is driven by (q − d) common trends (see C2), therefore we can find an t orthonormal linear basis of dimension (q −d) such that the projection of F onto this basis t spanthesamespaceasthecommontrends. Collecttheelementsofthisbasisinther×(q−d) matrix γ, and denote as γ the r×(r−q+d) matrix such that γ(cid:48) γ = 0 . Then, ⊥ ⊥ (r−q+d)×(q−d) consider the r×r linear transformation (cid:18) γ(cid:48)F (cid:19) (cid:18) Z (cid:19) DF = t = 1t , (C57) t γ(cid:48) F Z ⊥ t 0t 39

where Z has all (q −d) components which are I(1) while Z ∼ I(0) and is of dimension 1t 0t (r−q+d). Moreover, for Z we have the MA representation 1t ∆Z = Q(L)ζ , (C58) 1t t w.n. ζ ∼ (0 ,Σ ) with Σ positive definite and Q(L) is a (q−d)×(q−d) one-sided, infinite t q−d ζ ζ matrix polynomial with square-summable coefficients and rk(Q(1)) = (q−d). Because of orthonormality D(cid:48)D = I . Then, the corresponding transformation of the loadings r givesλ(cid:48) i D(cid:48) = (λ(cid:48) i1 λ(cid:48) i0 )suchthatx it = λ(cid:48) i1 Z 1t +λ(cid:48) i0 Z 0t +ξ it andwealsohaveλ(cid:98) ∗ i (cid:48)D(cid:48) = (λ(cid:98) ∗ i1 (cid:48) λ(cid:98) ∗ i0 (cid:48)). Recall that Z and Z are orthogonal by construction, then we have 1t 0t (cid:32) (cid:33) λ(cid:98) ∗(cid:48) −λ(cid:48) i1 i1 (C59) λ(cid:98) ∗(cid:48) −λ(cid:48) i0 i0  (cid:16) (cid:17)(cid:16) (cid:17)−1  1 (cid:80)T ξ Z(cid:48) 1 (cid:80)T Z Z(cid:48) 0 T2 t=1 it 1t T2 t=1 1t 1t (q−d)×(r−q+d) =  (cid:16) (cid:17)(cid:16) (cid:17)−1 . 0 1 (cid:80)T ξ Z(cid:48) 1 (cid:80)T Z Z(cid:48) (r−q+d)×(q−d) T t=1 it 0t T t=1 0t 0t By Theorem 1 in Peña and Poncela (1997, 2006), under C1 and C3, and from (C58), as T → ∞, 1 (cid:88) T Z Z(cid:48) ⇒ Q(1)Σ 1/2 (cid:18)(cid:90) 1 W(u)W(u)(cid:48)du (cid:19) Σ 1/2 Q(1)(cid:48), (C60) T2 1t 1t ζ ζ 0 t=1 where W(·) is a (q −d)-dimensional standard Wiener process. Thus this term is O (1) and p positive definite therefore invertible. Last, from (C58) we see that each component of ∆Z 1t has an MA representation with square summable coefficients (∆Z ∼ I(0) by construction), 1t therefore Var(t−1Z ) = O(1) for any j = 1,...,(q − d) and any t = 1,...,T. Thus, by 1jt using Gaussianity (see D2 and D3) and by A2 also independence of factors and idiosyncratic components, we can prove that, as T → ∞, T 1 (cid:88) ξ Z(cid:48) = O (cid:0) T−1(cid:1) . (C61) T2 it 1t p t=1 Moreover, from C1 it is easy to see that Z has an MA representation with square summable 0t coefficients (it is stationary) and because of A2 and C3 we have, as T → ∞, T T 1 (cid:88) 1 (cid:88) ξ Z(cid:48) = O (T−1/2), Z Z(cid:48) = E[Z Z(cid:48) ]+O (T−1/2) = O (1). (C62) T it 0t p T 0t 0t 0t 0t p p t=1 t=1 From (C59), (C60), (C61), and (C62), and since D does not depend on T, we obtain the result. Consider the VAR DF = (DAD(cid:48))DF +DHu . (C63) t t−1 t such that DHu = (e(cid:48) e(cid:48) )(cid:48) where e and e are white noise processes of dimensions (q−d) t 1t 0t 1t 0t 40

and (r−q+d), respectively. Then, similarly to (C59) we have D(A(cid:98) ∗−A)D(cid:48) = (C64)  (cid:16) (cid:17)(cid:16) (cid:17)−1 (cid:16) (cid:17)(cid:16) (cid:17)−1  1 (cid:80)T e Z(cid:48) 1 (cid:80)T Z Z(cid:48) 1 (cid:80)T e Z(cid:48) 1 (cid:80)T Z Z(cid:48) T2 t=1 1t 1t−1 T2 t=1 1t−1 1t−1 T t=1 0t 1t−1 T t=1 0t−1 0t−1 =  (cid:16) (cid:17)(cid:16) (cid:17)−1 (cid:16) (cid:17)(cid:16) (cid:17)−1  1 (cid:80)T e Z(cid:48) 1 (cid:80)T Z Z(cid:48) 1 (cid:80)T e Z(cid:48) 1 (cid:80)T Z Z(cid:48) T2 t=1 1t 0t−1 T2 t=1 1t−1 1t−1 T t=1 0t 0t−1 T t=1 0t−1 0t−1 Then, using the fact that e and e are white noise, it can be shown that 1t 0t T T 1 (cid:88) 1 (cid:88) e Z(cid:48) = O (T−1), e Z(cid:48) = O (T−1/2), (C65) T2 1t 1t−1 p T2 1t 0t−1 p t=1 t=1 T T 1 (cid:88) 1 (cid:88) e Z(cid:48) = O (T−1), e Z(cid:48) = O (T−1/2). T 1t 0t−1 p T 0t 0t−1 p t=1 t=1 Substituting (C60), (C62), and (C65) into (C64) and since D does not depend on T, we have the result for the VAR parameters. Similar results can be proved for all other parameters. (cid:3) Lemma 7. Define the KF and KS estimators of the static factors and their conditional covariances when using the QML estimator of the parameters Θ(cid:98) ∗ as F∗ = E [F |X ], P∗ = E [(F −F∗ )(F −F∗ )(cid:48)|X ], t|t Θ(cid:98)∗ t t t|t Θ(cid:98)∗ t t|t t t|t t F∗ = E [F |X ], P∗ = E [(F −F∗ )(F −F∗ )(cid:48)|X ]. t|T Θ(cid:98)∗ t T t|T Θ(cid:98)∗ t t|T t t|T T Then, under condition (15) in the text, the following hold, for all t¯≤ t ≤ T and as n,T → ∞, √ √ min( n, T)(cid:107)F∗ −F (cid:107) = O (1), t|t t p √ √ min( n, T)(cid:107)F∗ −F (cid:107) = O (1), t|T t p √ min(n, T)(cid:107)P∗ (cid:107) = O (1), t|t p √ min(n, T)(cid:107)P∗ (cid:107) = O (1). t|T p Proof. We start with three preliminary results. First, from (C50) in the proof of Lemma 5 we have F −F = O(e−(t−1)/2)+O(n−1/2). (C66) t−1|t−1 t−1 Second, because of Lemma 5 and (C66), we have (cid:13) (cid:13) x t ΛF t|t−1 (cid:13) (cid:13) (cid:13) (cid:13)ΛF t +ξ t ΛF t|t−1 (cid:13) (cid:13) (cid:13) (cid:13) Λ (cid:13) (cid:13)(cid:0) (cid:1) (cid:13) (cid:13) ξ t (cid:13) (cid:13) (cid:13) (cid:13) √ n − √ n (cid:13) (cid:13) = (cid:13) (cid:13) √ n − √ n (cid:13) (cid:13) ≤ (cid:13) (cid:13) √ n (cid:13) (cid:13) (cid:107)A(cid:107) (cid:107)F t−1 −F t−1|t−1 (cid:107)+(cid:107)Hu t (cid:107) +(cid:13) (cid:13) √ n (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) = (cid:13) (cid:13)√ Λ (cid:13) (cid:13) (cid:104) (cid:107)A(cid:107) (cid:16) O(e−(t−1)/2)+O(n−1/2) (cid:17) +(cid:107)Hu t (cid:107) (cid:105) + (cid:13) (cid:13)√ ξ t (cid:13) (cid:13) = O p (1), (cid:13) n(cid:13) (cid:13) n(cid:13) (C67) w.n. since u ∼ (0 ,I ) and ξ ∼ I(0). t q q t Third, from transformation (C57), defined in the proof of Lemma 6, DF = (Z(cid:48) Z(cid:48) )(cid:48), we √ t 1t 0t have(cid:107)Z (cid:107) = O ( T)and(cid:107)Z (cid:107) = O (1). Then, sinceD(cid:48)D = I , asaconsequenceofLemma 1t p 0t p r 41

6 (see in particular (C59) and (C64)), the following hold (A(cid:98) ∗−A)F t = D(cid:48)D(A(cid:98) ∗−A)D(cid:48)DF t = O p (T−1/2), (C68) (λ(cid:98) ∗ i (cid:48) −λ(cid:48) i )AF t = (λ(cid:98) ∗ i (cid:48) −λ(cid:48) i )D(cid:48)DAD(cid:48)DF t = O p (T−1/2), i = 1,...,n. (C69) Now, we compare the KF iterations, (B15)-(B18), with those obtained when using Θ(cid:98) ∗: F∗ = A(cid:98) ∗F∗ , (C70) t|t−1 t−1|t−1 P∗ = A(cid:98) ∗P∗ A(cid:98) ∗(cid:48) +H(cid:98) ∗H(cid:98) ∗(cid:48) , (C71) t|t−1 t−1|t−1 F∗ t|t = F∗ t|t−1 +P∗ t|t−1 Λ(cid:98) ∗(cid:48) (Λ(cid:98) ∗P∗ t|t−1 Λ(cid:98) ∗(cid:48) +R(cid:98) ∗)−1(x t −Λ(cid:98) ∗F∗ t|t−1 ), (C72) P∗ = P∗ −P∗ Λ(cid:98) ∗(cid:48) (Λ(cid:98) ∗P∗ Λ(cid:98) ∗(cid:48) +R(cid:98) ∗)−1Λ(cid:98) ∗P∗ . (C73) t|t t|t−1 t|t−1 t|t−1 t|t−1 From (C70) we have F∗ t|t−1 −F t|t−1 = A(F∗ t−1|t−1 −F t−1|t−1 )+(A(cid:98) ∗−A)(F∗ t−1|t−1 −F t−1|t−1 )+(A(cid:98) ∗−A)F t−1|t−1 = A(F∗ t−1|t−1 −F t−1|t−1 )+(A(cid:98) ∗−A)(F∗ t−1|t−1 −F t−1|t−1 )+O p (T−1/2), (C74) since (A(cid:98) ∗−A)F t−1|t−1 = (A(cid:98) ∗−A)(F t−1|t−1 −F t−1 )+(A(cid:98) ∗−A)F t−1 = O (T−1/2)O(e−(t−1)/2)+O (T−1/2)O(n−1/2)+O (T−1/2), p p p because of Lemma 6, (C66) and (C68). Similarly, from (C71) we have P∗ t|t−1 −P t|t−1 = A(P∗ t−1|t−1 −P t−1|t−1 )A(cid:48)+(A(cid:98) ∗−A)(P∗ t−1|t−1 −P t−1|t−1 )A(cid:48) (C75) +(A(cid:98) ∗−A)(P∗ t−1|t−1 −P t−1|t−1 )(A(cid:98) ∗−A)(cid:48)+A(P∗ t−1|t−1 −P t−1|t−1 )(A(cid:98) ∗−A)(cid:48) +(A(cid:98) ∗−A)P t−1|t−1 A(cid:48)+(A(cid:98) ∗−A)P t−1|t−1 (A(cid:98) ∗−A)(cid:48)+AP t−1|t−1 (A(cid:98) ∗−A)(cid:48) +(H(cid:98) ∗−H)H(cid:48)+(H(cid:98) ∗−H)(H(cid:98) ∗−H)(cid:48)+H(H(cid:98) ∗−H)(cid:48) = A(P∗ t−1|t−1 −P t−1|t−1 )A(cid:48)+(A(cid:98) ∗−A)(P∗ t−1|t−1 −P t−1|t−1 )A(cid:48) +(A(cid:98) ∗−A)(P∗ t−1|t−1 −P t−1|t−1 )(A(cid:98) ∗−A)(cid:48)+A(P∗ t−1|t−1 −P t−1|t−1 )(A(cid:98) ∗−A)(cid:48)+O p (T−1/2), since (A(cid:98) ∗−A)P t−1|t−1 A(cid:48) = AP t−1|t−1 (A(cid:98) ∗−A)(cid:48) = O p (T−1/2)O(e−(t−1))+O p (T−1/2)O(n−1), (A(cid:98) ∗−A)P t−1|t−1 (A(cid:98) ∗−A)(cid:48) = O p (T−1)O(e−(t−1))+O p (T−1)O(n−1). because of Lemma 6 and (C50) in the proof of Lemma 5 and H(cid:98) ∗H(cid:98) ∗(cid:48) −HH(cid:48) = (H(cid:98) ∗−H)H(cid:48)+H(H(cid:98) ∗−H)(cid:48)+(H(cid:98) ∗−H)(H(cid:98) ∗−H)(cid:48) = O p (T−1/2), because of Lemma 6. 42

Define K = P Λ(cid:48)(ΛP Λ(cid:48)+R)−1, t t|t−1 t|t−1 ∗ and analogously define K(cid:98) when using P∗ , Λ(cid:98) ∗ and R(cid:98) ∗. From (C72) we have t t|t−1 ∗ F∗ t|t −F t|t = F∗ t|t−1 −F t|t−1 +(K(cid:98) t −K t )(x t −ΛF t|t−1 ) ∗ +(K(cid:98) t −K t )(ΛF t|t−1 −Λ(cid:98) ∗F∗ t|t−1 )+K t (ΛF t|t−1 −Λ(cid:98) ∗F∗ t|t−1 ). (C76) Moreover, because of (C69) Λ(cid:98) ∗F∗ t|t−1 √ −ΛF t|t−1 = √ Λ (F∗ −F )+ (cid:32) Λ(cid:98) ∗ √ −Λ (cid:33) (F∗ −F )+ (cid:32) Λ(cid:98) ∗ √ −Λ (cid:33) F n n t|t−1 t|t−1 n t|t−1 t|t−1 n t|t−1 (cid:32) (cid:33) Λ Λ(cid:98) ∗−Λ = √ (F∗ −F )+ √ (F∗ −F )+O (T−1/2), n t|t−1 t|t−1 n t|t−1 t|t−1 p (C77) since (cid:32) (cid:33) (cid:32) (cid:33) (cid:32) (cid:33) (cid:32) (cid:33) Λ(cid:98) ∗−Λ Λ(cid:98) ∗−Λ Λ(cid:98) ∗−Λ Λ(cid:98) ∗−Λ √ F = √ AF = √ A(F −F )+ √ AF n t|t−1 n t−1|t−1 n t−1|t−1 t−1 n t−1 = O (T−1/2)O(e−(t−1)/2)+O (T−1/2)O(n−1/2)+O (T−1/2), p p p because of Lemma 6, (C66), (C69) and since (cid:107)A(cid:107) = O(1). Similarly, from (C73) (cid:104) ∗ P∗ t|t −P t|t = P∗ t|t−1 −P∗ t|t−1 − (K(cid:98) t −K t )ΛP t|t−1 ∗ (cid:105) + (K(cid:98) t −K t )(Λ(cid:98) ∗P∗ t|t−1 −ΛP t|t−1 )+K t (Λ(cid:98) ∗P∗ t|t−1 −ΛP t|t−1 ) . (C78) Moreover, Λ(cid:98) ∗P∗ t|t−1 √ −ΛP t|t−1 = √ Λ (P∗ −P )+ (cid:32) Λ(cid:98) ∗ √ −Λ (cid:33) (P∗ −P )+ (cid:32) Λ(cid:98) ∗ √ −Λ (cid:33) P n n t|t−1 t|t−1 n t|t−1 t|t−1 n t|t−1 (cid:32) (cid:33) Λ Λ(cid:98) ∗−Λ = √ (P∗ −P )+ √ (P∗ −P )+O (T−1/2), n t|t−1 t|t−1 n t|t−1 t|t−1 p (C79) since (cid:32) (cid:33) (cid:32) (cid:33) Λ(cid:98) ∗ √ −Λ P = Λ(cid:98) ∗ √ −Λ (cid:0) AP A(cid:48)+HH(cid:48)(cid:1) n t|t−1 n t−1|t−1 = O (T−1/2)O(e−(t−1))+O (T−1/2)O(n−1)+O (T−1/2), p p p because of Lemma 6 and (C50) in the proof of Lemma 5 and since (cid:107)A(cid:107) = O(1) and (cid:107)H(cid:107) = 43

O(1). Following the same reasoning we also have ΛP √ t|t−1 = √ Λ (cid:0) AP A(cid:48)+HH(cid:48)(cid:1) = √ Λ (cid:16) O(e−(t−1))+O(n−1)+HH(cid:48) (cid:17) . (C80) n n t−1|t−1 n Now, set t = 1. Then, by noticing that F∗ = F and P∗ = P , from (C74) and (C75) at 0|0 0|0 0|0 0|0 t = 1 we have F∗ −F = O (T−1/2), P∗ −P = O (T−1/2). (C81) 1|0 1|0 p 1|0 1|0 p Then, because of (C67), (C81) and Lemma 6, at t = 1 we have √ ∗ (cid:18) x 1 ΛF 1|0 (cid:19) n(K(cid:98) 1 −K 1 ) √ n − √ n = (C82)   = P∗ 1|0 Λ √ (cid:98) ∗ n (cid:48) (cid:32) √ Λ(cid:98) n P∗ 1|0 Λ √ (cid:98) ∗ n (cid:48) + R(cid:98) n ∗ (cid:33)−1 −P 1|0 √ Λ n (cid:48) (cid:18) √ Λ n P 1|0 √ Λ n (cid:48) + R n (cid:19)−1  (cid:18) √ x 1 n − Λ √ F n 1|0 (cid:19) = O (T−1/2). p Moreover, from (C80) √ ΛP n(K(cid:98) ∗ 1 −K 1 ) √ n 1|0 = O p (T−1/2). (C83) From (C76) and using (C74), (C77), (C81), and (C82) at t = 1 we have ∗ F∗ 1|1 −F 1|1 = F∗ 1|0 −F 1|0 +(K(cid:98) 1 −K 1 )(x 1 −ΛF 1|0 ) ∗ +(K(cid:98) 1 −K 1 )(ΛF 1|0 −Λ(cid:98)F∗ 1|0 )+K 1 (ΛF 1|0 −Λ(cid:98)F∗ 1|0 ) = O p (T−1/2). (C84) Similarly, from (C78) and using (C75), (C79), (C81), and (C83) at t = 1 we have (cid:104) ∗ P∗ 1|1 −P 1|1 = P∗ 1|0 −P∗ 1|0 − (K(cid:98) 1 −K 1 )ΛP 1|0 ∗ (cid:105) + (K(cid:98) 1 −K 1 )(Λ(cid:98) ∗P∗ 1|0 −ΛP 1|0 )+K 1 (Λ(cid:98) ∗P∗ 1|0 −ΛP 1|0 ) = O p (T−1/2). (C85) Then substituting (C84) into (C76) and (C85) into (C78) we have F∗ −F = O (T−1/2), P∗ −P = O (T−1/2). (C86) 2|1 2|1 p 2|1 2|1 p Then, because of (C67), (C86) and Lemma 6, at t = 2 we have √ n(K(cid:98) ∗ 2 −K 2 ) (cid:18) √ x 2 n − Λ √ F n 2|1 (cid:19) = O p (T−1/2), (C87) and from (C80) √ ΛP n(K(cid:98) ∗ 2 −K 2 ) √ n 2|1 = O p (T−1/2). (C88) 44

From, (C76) and using (C74), (C77), (C86), and (C87), at t = 2 we have F∗ −F = O (T−1/2), 2|2 2|2 p and from (C78) and using (C75), (C79), (C86), and (C88), at t = 2 we have P∗ −P = O (T−1/2). 2|2 2|2 p By repeating the same reasoning for t = 3,...,T we have (cid:107)F∗ −F (cid:107) = O (T−1/2), (cid:107)P∗ −P (cid:107) = O (T−1/2), (C89) t|t t|t p t|t t|t p and also (cid:107)F∗ −F (cid:107) = O (T−1/2), (cid:107)P∗ −P (cid:107) = O (T−1/2). (C90) t|t−1 t|t−1 p t|t−1 t|t−1 p Because of Lemma 5 and (C89), we have for t¯≤ t ≤ T, (cid:107)F∗ −F (cid:107) ≤ (cid:107)F∗ −F (cid:107)+(cid:107)F −F (cid:107) = O (T−1/2)+O(n−1/2), t|t t t|t t|t t|t t p (cid:107)P∗ (cid:107) ≤ (cid:107)P∗ −P (cid:107)+(cid:107)P (cid:107) = O (T−1/2)+O(n−1). t|t t|t t|t t|t p Now compare the KS iterations, (B22)-(B26), with those obtained when using Θ(cid:98) ∗: F∗ = F∗ +P∗ r∗ , (C91) t|T t|t−1 t|t−1 t−1 r∗ t−1 = Λ(cid:98) ∗(cid:48) (Λ(cid:98) ∗P∗ t|t−1 Λ(cid:98) ∗(cid:48) +R(cid:98) ∗)−1(x t −Λ(cid:98) ∗F∗ t|t−1 )+L∗ t (cid:48) r∗ t , (C92) P∗ = P∗ −P∗ N∗ P∗ , (C93) t|T t|t−1 t|t−1 t−1 t|t−1 N∗ = Λ(cid:98) ∗(cid:48) (Λ(cid:98) ∗P∗ Λ(cid:98) ∗(cid:48) +R(cid:98) ∗)−1Λ(cid:98) ∗+L∗(cid:48) N∗L∗, (C94) t−1 t|t−1 t t t L∗ = A(cid:98) ∗−A(cid:98) ∗P∗ Λ(cid:98) ∗(cid:48) (Λ(cid:98) ∗P∗ Λ(cid:98) ∗(cid:48) +R(cid:98) ∗)−1Λ(cid:98) ∗, (C95) t t|t−1 t|t−1 where r∗ = 0 , N∗ = 0 . First notice that obviously at t = T both KF and KS give the T r×1 T r same result hence (C89) applies also in this case, and because of Lemma 6, (C67), (C77), and (C90), we have r∗ −r = O (T−1/2), N∗ −N = O (T−1/2). (C96) T−1 T−1 p T−1 T−1 p Moreover, from (C95), because of Lemma 6 and (C90), we have (cid:34) (cid:35) √ ∗Λ(cid:98) ∗ Λ L∗ t −L t = A(cid:98) ∗−A− n A(cid:98) ∗K(cid:98) t √ n −AK t √ n = O p (T−1/2). (C97) Then, from (C92), because of (C67), (C77), (C90), (C96) and (C97), at t = T −1 we have r∗ −r = O (T−1/2), N∗ −N = O (T−1/2). (C98) T−2 T−2 p T−2 T−2 p Therefore, from (C91) and (C93), because of (C90) and (C98), we have F∗ −F = O (T−1/2), P∗ −P = O (T−1/2). (C99) T−1|T T−1|T p T−1|T T−1|T p 45

By repeating the same reasoning for t = (T −2),...,1, we have (cid:107)F∗ −F (cid:107) = O (T−1/2), (cid:107)P∗ −P (cid:107) = O (T−1/2). (C100) t|T t|T p t|T t|T p Because of Lemma 5 and (C100), we have for t¯≤ t ≤ T (cid:107)F∗ −F (cid:107) ≤ (cid:107)F∗ −F (cid:107)+(cid:107)F −F (cid:107) = O (T−1/2)+O(n−1/2), t|T t t|T t|T t|T t p (cid:107)P∗ (cid:107) ≤ (cid:107)P∗ −P (cid:107)+(cid:107)P (cid:107) = O (T−1/2)+O(n−1), t|T t|T t|T t|T p which completes the proof. (cid:3) Lemma 8. Consider the initial estimator of the parameters Θ(cid:98)0 defined in (B3), then there exists an invertible r×r matrix J such that, as n,T → ∞: √ √ min( n, T)(cid:107)λ(cid:98) (cid:48) i0 −λ(cid:48) i J−1(cid:107) = O p (1), i = 1,...,n, √ √ min( n, T)(cid:107)A(cid:98)0 −JAJ−1(cid:107) = O p (1), √ √ min( n, T)(cid:107)H(cid:98)0 −JH(cid:107) = O p (1), √ √ min( n, T)|[R(cid:98)] ii,0 −[R] ii | = O p (1), i = 1,...,n. Moreover, under E1 and E2 we have J = I . r Proof. When E2 holds the proof Lemmas 3 and 5 in Barigozzi et al., 2016b where is shown that J is a diagonal matrix with entries ±1. If we impose also E1 the sign indeterminacy is fixed and J = I . (cid:3) r Lemma 9. Consider the estimator of the parameters obtained at convergence of the EM algorithm Θ(cid:98) := Θ(cid:98)k∗ , then, as n,T → ∞: √ T (cid:107)λ(cid:98)ik∗ −λ i (cid:107) = O p (1), i = 1,...,n, √ T (cid:107)A(cid:98)k∗ −A(cid:107) = O p (1), √ T (cid:107)H(cid:98)k∗ −H(cid:107) = O p (1), √ T |[R(cid:98)] ii,k∗ −[R] ii | = O p (1), i = 1,...,n. Proof. Define the Q×Q matrices I(Θ) = −∇2 (cid:96)(X ;Θ), ΘΘ(cid:48) T (cid:90) I (Θ) = − ∇2 (cid:96)(X ,F ;Θ)f(F |X ;Θ)dF = −E [∇2 (cid:96)(X ,F ;Θ)|X ], 0 ΘΘ(cid:48) T T T T T Θ ΘΘ(cid:48) T T T Rr×T (cid:90) I (Θ) = − ∇2 (cid:96)(F |X ,F ;Θ)f(F |X ;Θ)dF = −E [∇2 (cid:96)(F |X ;Θ)|X ], 1 ΘΘ(cid:48) T T T T T T Θ ΘΘ(cid:48) T T T Rr×T then, since I(Θ) does not depend on F from (B1) and (B11) T I(Θ) = I (Θ)−I (Θ). (C101) 0 1 46

Moreover,atconvergenceoftheEMalgorithm(iterationk∗)wehavetheTaylorapproximation (cid:16) (cid:17) (Θ(cid:98)k∗ −Θ(cid:98) ∗) = (Θ(cid:98)k∗−1 −Θ(cid:98) ∗)R(Θ(cid:98) ∗)+O (cid:107)Θ(cid:98)k∗ −Θ(cid:98) ∗(cid:107)2 , (C102) where following Meng and Rubin (1994) we have (see also (C101)) (cid:16) (cid:17)−1 (cid:16) (cid:17)−1 R(Θ(cid:98) ∗) = I 1 (Θ(cid:98) ∗) I 0 (Θ(cid:98) ∗) = I Q −I(Θ(cid:98) ∗) I 0 (Θ(cid:98) ∗) . Hence, by iterating (C102) k∗ times and neglecting the second term on the rhs which at convergence is always smaller the the first term, we obtain (cid:107)Θ(cid:98)k∗ −Θ(cid:98) ∗(cid:107) ≤ (cid:107)Θ(cid:98)0 −Θ(cid:98) ∗(cid:107) (cid:107)R(Θ(cid:98) ∗)(cid:107)k∗ . (C103) Hereafter, denote ζ := max(n−1/2,T−1/2). Consider (C103) for the estimated loadings, for nT any i = 1,...,n, and using Gaussianity (see D2 and D3), we have T T (cid:13) (cid:16)(cid:88) (cid:17)(cid:16)(cid:88) (cid:17)−1(cid:13)k∗ (cid:107)λ(cid:98)ik∗ −λ(cid:98) ∗ i (cid:107) ≤ (cid:107)λ(cid:98)i0 −λ(cid:98) ∗ i (cid:107) (cid:13) (cid:13) I r − F t F(cid:48) t E Θ(cid:98)∗ [F t F(cid:48) t |X T ] (cid:13) (cid:13) t=1 t=1 T T = (cid:107)λ(cid:98)i0 −λ(cid:98) ∗ i (cid:107) (cid:13) (cid:13) (cid:13) I r − (cid:16) T 1 2 (cid:88) F t F(cid:48) t (cid:17)(cid:16) T 1 2 (cid:88)(cid:16) F∗ t|T F∗ t| (cid:48) T +P∗ t|T (cid:17)(cid:17)−1(cid:13) (cid:13) (cid:13) k∗ . (C104) t=1 t=1 Therefore, by condition (15) in the text t¯= O(logT), because of Lemma 7, we have T t¯−1 T (cid:16) 1 (cid:88)(cid:16) F∗ F∗(cid:48) +P∗ (cid:17)(cid:17)−1 = (cid:16) 1 (cid:88)(cid:16) F∗ F∗(cid:48) +P∗ (cid:17) + 1 (cid:88)(cid:16) F∗ F∗(cid:48) +P∗ (cid:17)(cid:17)−1 T2 t|T t|T t|T T2 t|T t|T t|T T2 t|T t|T t|T t=1 t=1 t=t¯ = (cid:18) 1 (cid:88) T F F(cid:48) +O (cid:18) √ ζ nT (cid:19) +O (cid:18) logT (cid:19)(cid:19)−1 T2 t t p T p T t=1 T (cid:16) 1 (cid:88) (cid:17)−1 = F F(cid:48) +o (T−1/2), (C105) T2 t t p t=1 since (cid:107)F∗ F∗(cid:48) +P∗ (cid:107) = O (T) and (cid:107)F F(cid:48)(cid:107) = O (T), because F ∼ I(1) and F∗ ∼ I(1), t|T t|T t|T p t t p t t|T and t¯−1 (cid:18) (cid:19) T T (cid:18) (cid:19) 1 (cid:88)(cid:16) F∗ F∗(cid:48) +P∗ (cid:17) = O logT , 1 (cid:88) F F(cid:48) − 1 (cid:88) F F(cid:48) = O logT . T2 t|T t|T t|T p T T2 t t T2 t t p T t=1 t=1 t=t¯ Moreover, because of Lemmas 8 and 6, we have (cid:107)λ(cid:98)i0 −λ(cid:98) ∗ i (cid:107) ≤ (cid:107)λ(cid:98)i0 −λ i (cid:107)+(cid:107)λ(cid:98) ∗ i −λ i (cid:107) = O p (ζ nT )+O p (T−1/2). (C106) By substituting (C105) and (C106) into (C104), we have (cid:107)λ(cid:98)ik∗ −λ(cid:98) ∗ i (cid:107) = o p (ζ nT T−k∗/2). (C107) 47

Finally, because of Lemma 6 and (C107), we have (cid:107)λ(cid:98)ik∗ −λ i (cid:107) ≤ (cid:107)λ(cid:98)ik∗ −λ(cid:98) ∗ i (cid:107)+(cid:107)λ(cid:98) ∗ i −λ i (cid:107) = o p (T−k∗/2)+O p (T−1/2). The proof for the other parameters follows the same steps by taking the appropriate second derivatives and applying the results in Lemma 7. (cid:3) C.4 Proof of Proposition 1 Consistency of the estimated loadings is proved in Lemma 9. Recalling that Θ(cid:98) := Θ(cid:98)k∗ and also λ = K−1(cid:48)(b(cid:48) b(cid:48) ), because of (A12) we prove (16). i i0 i1 Consistency of the estimated static factors is then proved as in Lemma 7 but using the results of Lemma 9 for the estimated parameters. In particular, for all t¯≤ t ≤ T (cid:107)F −F (cid:107) ≤ (cid:107)F −F (cid:107)+(cid:107)F −F (cid:107) = O (T−1/2)+O(n−1/2). t|T,k∗ t t|T,k∗ t|t t|t t p Recalling that F(cid:98)t := F t|T,k∗ and also F t = K(f t (cid:48)f t (cid:48) −1 )(cid:48) because of (A11) we prove (17). A similar result can be proved for the KF estimator of the static factors, F using again t|t,k∗ Lemmas 7 and 9. Denote ζ nT := max(n−1/2,T−1/2). Then, recalling that λ(cid:98) (cid:48) i F(cid:98)t := λ(cid:98) (cid:48) ik∗ F t|T,k∗ , because of (16), (17) and Lemma 5, we have |χ (cid:98)it −χ it | = |λ(cid:98) (cid:48) i F(cid:98)t −λ(cid:48) i F t | ≤ (cid:107)(λ(cid:98)i −λ i )F t (cid:107)+(cid:107)λ i (cid:107) (cid:107)F(cid:98)t −F t (cid:107)+(cid:107)λ(cid:98)i −λ i (cid:107) (cid:107)F(cid:98)t −F t (cid:107) = (cid:107)(λ(cid:98)i −λ i )F t (cid:107)+O p (ζ nT ) = (cid:107)λ(cid:98)i −λ(cid:98) ∗ i (cid:107) (cid:107)F t (cid:107)+(cid:107)(λ(cid:98) ∗ i −λ i )F t (cid:107)+O p (ζ nT ). (C108) Now, from Lemma 9 (see in particular (C107)) and since (cid:107)F (cid:107) = O (T−1/2), we have t p (cid:107)λ(cid:98)i −λ(cid:98) ∗ i (cid:107) (cid:107)F t (cid:107) = o p (ζ nT T−(k∗−1)/2). (C109) Moreover, from transformation (C57), defined in the proof of Lemma 6, DF = (Z(cid:48) Z(cid:48) )(cid:48), √ t 1t 0t we have (cid:107)Z (cid:107) = O ( T) and (cid:107)Z (cid:107) = O (1). Then, since D(cid:48)D = I , as a consequence of 1t p 0t p r Lemma 6 (see in particular (C59)), we have (λ(cid:98) ∗ i −λ i )(cid:48)D(cid:48)DF t = O p (T−1/2). (C110) By substituting (C109) and (C110) into (C108) and since k∗ ≥ 1 because we run the EM algorithm at least once after initialization, we prove (18). (cid:3) 48

Appendix D Data Description and Data Treatment This Appendix present the dataset used in the analysis. All variables where downloaded from Haver on June 16th 2017. None of the variables where adjusted for outliers but variables 57, 83, 87, and 94. All variables are from the USECON database but variable 103 that is from the DAILY database. All monthly and daily series are transformed into quarterly observation by simple averages. In order to choose whether or not to de-trend a variable we apply the following procedure: let m be the sample mean of ∆y , γ (j) be the auto-covariance of order j of ∆y , and i it i it (cid:113) γ¯ = 1 (cid:80)J γ (j), then if |mi| ≥ 1.96 we estimate a and b from an OLS regression of y i T j=1 i γ¯i i i it on a constant and a time trend, whereas if |m γ¯i i| < 1.96 we set (cid:98) a i = m i and(cid:98)b i = 0. List of Abbreviations Source: BLS=U.S. Department of Labor: Bureau of Labor Statistics BEA=U.S. Department of Commerce: Bureau of Economic Analysis ISM = Institute for Supply Management CB=U.S. Department of Commerce: Census Bureau FRB=Board of Governors of the Federal Reserve System EIA=Energy Information Administration WSJ=Wall Street Journal CBO=Congressional Budget Office FRBPHIL=Federal Reserve Bank of Philadelphia F = Frequency T=Transformation SA ξ=Idiosyncratic Q = Quarterly 0 = None 0 = no 0=I(0) M = Monthly 1 = log 1 = yes 1=I(1) D = Daily 2 = ∆log D = Deterministic Component U=Units 0 = (cid:98) a i = T 1 (cid:80)T t=1 ∆y it ,(cid:98)b i = 0 1000–P = Thousands of Persons 1 = OLS Detrending 1000–U = Thousands of Units BoC = Billions of Chained $–B = Dollars per Barrel 49

N Series ID Definition Unit F S SA T D ξ 1 GDPH Real Gross Domestic Product BoC 2009$ Q BEA 1 1 1 0 2 GDYH Real gross domestic income BoC 2009$ Q BEA 1 1 1 0 3 FSH Real Final Sales of Domestic Product BoC 2009$ Q BEA 1 1 1 1 4 IH Real Gross Private Domestic Investment BoC 2009$ Q BEA 1 1 1 1 5 GSH Real State & Local∗ BoC 2009$ Q BEA 1 1 1 1 6 FRH Real Private Residential Fixed Investment BoC 2009$ Q BEA 1 1 1 0 7 FNH Real Private Nonresidential Fixed Investment BoC 2009$ Q BEA 1 1 1 1 8 MH Real Imports of Goods & Services BoC 2009$ Q BEA 1 1 1 0 9 GH Real Government∗ BoC 2009$ Q BEA 1 1 1 1 10 XH Real Exports of Goods & Services BoC 2009$ Q BEA 1 1 1 0 14 CH Real Personal Consumption Expenditures (PCE) BoC 2009$ Q BEA 1 1 1 0 11 CNH Real PCE: Nondurable Goods BoC 2009$ Q BEA 1 1 1 1 12 CSH Real PCE: Services BoC 2009$ Q BEA 1 1 1 0 13 CDH Real PCE: Durable Goods BoC 2009$ Q BEA 1 1 1 0 15 GFDIH Real National Defense Gross Investment BoC 2009$ Q BEA 1 1 1 0 16 GFNIH Real Federal Nondefense Gross Investment BoC 2009$ Q BEA 1 1 1 0 17 YPDH Real Disposable Personal Income BoC 2009$ Q BEA 1 1 1 0 18 JI Gross Private Domestic Investment:(cid:63) 2009=100 Q BEA 1 2 0 0 19 JGDP Gross Domestic Product:(cid:63) 2009=100 Q BEA 1 2 0 1 20 LXNFU Unit Labor Cost† 2009=100 Q BLS 1 1 1 1 21 LXNFR Real Compensation Per Hour† 2009=100 Q BLS 1 1 1 1 22 LXNFC Compensation Per Hour† 2009=100 Q BLS 1 1 1 1 23 LXNFH Hours of All Persons† 2009=100 Q BLS 1 1 1 0 24 LXNFA Output Per Hour of All Persons† 2009=100 Q BLS 1 1 1 0 25 LXMU Unit Labor Cost‡ 2009=100 Q BLS 1 1 1 1 26 LXMR Real Compensation Per Hour‡ 2009=100 Q BLS 1 1 1 1 27 LXMC Compensation Per Hour‡ 2009=100 Q BLS 1 1 1 0 28 LXMH Hours of All Persons‡ 2009=100 Q BLS 1 1 1 0 29 LXMA Output Per Hour of All Persons‡ 2009=100 Q BLS 1 1 1 1 30 IP Industrial Production (IP) Index 2012=100 M FRB 1 1 1 0 31 IP521 IP: Business Equipment 2012=100 M FRB 1 1 1 1 32 IP511 IP: Durable Consumer Goods 2012=100 M FRB 1 1 1 0 33 IP531 IP: Durable Materials 2012=100 M FRB 1 1 1 1 34 IP512 IP: Nondurable Consumer Goods 2012=100 M FRB 1 1 1 0 35 IP532 IP: nondurable Materials 2012=100 M FRB 1 1 1 0 ∗ ConsumptionExpenditures&GrossInvestment (cid:63) Chain-typePriceIndex † NonfarmBusinessSector ‡ ManufacturingSector 50

N Series ID Definition Unit F S SA T D ξ 36 PCU CPI-U: All Items 82-84=100 M BLS 1 2 0 0 37 PCUSE CPI-U: Energy 82-84=100 M BLS 1 2 0 0 38 PCUSLFE CPI-U: All Items Less Food and Energy 82-84=100 M BLS 1 2 0 0 39 PCUFO CPI-U: Food 82-84=100 M BLS 1 2 0 0 40 JCBM PCE: Chain Price Index 2009=100 M BEA 1 2 0 0 41 JCEBM PCE: Energy Goods & Services–price index 2009=100 M BEA 1 2 0 0 42 JCNFOM PCE: Food & Beverages–price index∗ 2009=100 M BEA 1 2 0 0 43 JCXFEBM PCE less Food & Energy–price index 2009=100 M BEA 1 2 0 0 44 JCSBM PCE: Services–price index 2009=100 M BEA 1 2 0 0 45 JCDBM PCE: Durable Goods–price index 2009=100 M BEA 1 2 0 0 46 JCNBM PCE: Nondurable Goods–price index 2009=100 M BEA 1 2 0 0 47 PC1 PPI: Intermediate Demand Processed Goods 1982=100 M BLS 1 2 0 0 48 P05 PPI: Fuels and Related Products and Power 1982=100 M BLS 0 2 0 0 49 SP3000 PPI: Finished Goods 1982=100 M BLS 1 2 0 0 50 PIN PPI: Industrial Commodities 1982=100 M BLS 0 2 0 0 51 PA PPI: All Commodities 1982=100 M BLS 0 2 0 0 52 PC1 PPI: Intermediate Demand Processed Goods 1982=100 M BLS 1 2 0 0 53 FMC Money Stock: Currency Bil. of $ M FRB 1 2 0 0 54 FM1 Money Stock: M1 Bil. of $ M FRB 1 2 0 1 55 FM2 Money Stock: M2 Bil. of $ M FRB 1 2 0 0 56 FABWC C & I Loans in Bank Credit:† Bil. of $ M FRB 1 1 1 1 57 FABWQ Consumer Loans in Bank Credit:† Bil. of $ M FRB 1 1 1 1 58 FAB Bank Credit:† Bil. of $ M FRB 1 1 1 1 59 FABW Loans & Leases in Bank Credit:† Bil. of $ M FRB 1 1 1 1 60 FABYO Other Securities in Bank Credit:† Bil. of $ M FRB 1 1 1 1 61 FABWR Real Estate Loans in Bank Credit:† Bil. of $ M FRB 1 1 1 0 62 FOT Consumer Credit Outstanding Bil. of $ M FRB 1 1 1 0 63 HSTMW Housing Starts: Midwest 1000–U M CB 1 1 0 0 64 HSTNE Housing Starts: Northeast 1000–U M CB 1 1 0 0 65 HSTS Housing Starts: South 1000–U M CB 1 1 0 0 66 HSTGW Housing Starts: West 1000–U M CB 1 1 0 0 67 HPT Building Permit(cid:63) 1000–U M CB 1 1 0 0 68 FBPR Bank Prime Loan Rate Percent M FRB 0 0 0 0 69 FFED Federal Funds [effective] Rate Percent M FRB 0 0 0 0 70 FCM1 1-Year Treasury Bill Yield‡ Percent M FRB 0 0 0 0 71 FCM10 10-Year Treasury Note Yield‡ Percent M FRB 0 0 0 0 ∗ PurchasedforOff-PremisesConsumption † AllCommercialBanks (cid:63) NewPrivateHousingUnitsAuthorizedby‡ atConstantMaturity 51

N Series ID Definition Unit F S SA T D ξ 72 LP Civilian Participation Rate: 16 yr + Percent M BLS 0 0 0 1 73 LQ Civilian Employment/Population Ratio: 16 yr + Percent M BLS 0 0 0 1 74 LE Civilian Employment: Sixteen Years & Over 1000–P M BLS 0 1 1 0 75 LR Civilian Unemployment Rate: 16 yr + Percent M BLS 0 0 0 0 76 LU0 Civilians Unemployed for Less Than 5 Weeks 1000–P M BLS 0 1 0 0 77 LU5 Civilians Unemployed for 5-14 Weeks 1000–P M BLS 0 1 0 1 78 LU15 Civilians Unemployed for 15-26 Weeks 1000–P M BLS 0 1 0 1 79 LUT27 Civilians Unemployed for 27 Weeks and Over 1000–P M BLS 0 1 0 1 80 LUAD Average [Mean] Duration of Unemployment Weeks M BLS 0 1 0 0 81 LANAGRA All Employees: Total Nonfarm 1000–P M BLS 0 1 1 1 82 LAPRIVA All Employees: Total Private Industries 1000–P M BLS 0 1 1 0 83 LANTRMA All Employees: Mining and Logging 1000–P M BLS 0 1 1 1 84 LACONSA All Employees: Construction 1000–P M BLS 0 1 1 1 85 LAMANUA All Employees: Manufacturing 1000–P M BLS 0 1 1 0 86 LATTULA AllEmployees: Trade,Transportation&Utilities 1000–P M BLS 0 1 1 1 87 LAINFOA All Employees: Information Services 1000–P M BLS 0 1 1 1 88 LAFIREA All Employees: Financial Activities 1000–P M BLS 0 1 1 1 89 LAPBSVA All Employees: Professional & Business Services 1000–P M BLS 0 1 1 1 90 LAEDUHA All Employees: Education & Health Services 1000–P M BLS 0 1 1 1 91 LALEIHA All Employees: Leisure & Hospitality 1000–P M BLS 0 1 1 1 92 LASRVOA All Employees: Other Services 1000–P M BLS 0 1 1 1 93 LAGOVTA All Employees: Government 1000–P M BLS 0 1 1 0 94 LAFGOVA All Employees: Federal Government 1000–P M BLS 0 1 1 1 95 LASGOVA All Employees: State Government 1000–P M BLS 0 1 1 0 96 LALGOVA All Employees: Local Government 1000–P M BLS 0 1 1 0 97 PETEXA West Texas Intermediate Spot Price FOB∗ $–B M EIA 0 2 0 0 98 NAPMNI ISM Mfg: New Orders Index Index M ISM 1 0 0 1 99 NAPMOI ISM Mfg: Production Index Index M ISM 1 0 0 1 100NAPMEI ISM Mfg: Employment Index Index M ISM 1 0 0 1 101NAPMVDI ISM Mfg: Supplier Deliveries Index Index M ISM 1 0 0 0 102NAPMII ISM Mfg: Inventories Index Index M ISM 1 0 0 0 103SP500 Standard & Poor’s 500 Stock Price Index 41-43=10 D WSJ 0 1 1 0 ∗ Cushing,Oklahoma Series ID Definition Unit F Source GDPPOTHQ Real Potential Gross Domestic Product BoC 2009$ Q CBO NAIRUQ Natural Rate of Unemployment percent Q CBO GDPPLUS US GDPplus percent Q FRBPHIL 52

Cite this document
APA
Matteo Barigozzi and Matteo Luciani (2017). Common Factors, Trends, and Cycles in Large Datasets (FEDS 2017-111). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2017-111
BibTeX
@techreport{wtfs_feds_2017_111,
  author = {Matteo Barigozzi and Matteo Luciani},
  title = {Common Factors, Trends, and Cycles in Large Datasets},
  type = {Finance and Economics Discussion Series},
  number = {2017-111},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2017},
  url = {https://whenthefedspeaks.com/doc/feds_2017-111},
  abstract = {This paper considers a non-stationary dynamic factor model for large datasets to disentangle long-run from short-run co-movements. We first propose a new Quasi Maximum Likelihood estimator of the model based on the Kalman Smoother and the Expectation Maximisation algorithm. The asymptotic properties of the estimator are discussed. Then, we show how to separate trends and cycles in the factors by mean of eigenanalysis of the estimated non-stationary factors. Finally, we employ our methodology on a panel of US quarterly macroeconomic indicators to estimate aggregate real output, or Gross Domestic Output, and the output gap. Accessible materials (.zip)},
}