feds · July 31, 2017

Non-Stationary Dynamic Factor Models for Large Datasets

Abstract

We study a Large-Dimensional Non-Stationary Dynamic Factor Model where (1) the factors $F_t$ are I(1) and singular, that is, $F_t$ has dimension $r$ and is driven by $q$ dynamic shocks with $q < r$, and (2) the idiosyncratic components are either I(0) or I(1). Under these assumptions the factors $F_t$ are cointegrated and modeled by a singular Error Correction Model. We provide conditions for consistent estimation, as both the cross-sectional size $n$ and the time dimension $T$ go to infinity, of the factors, the loadings, the shocks, the ECM coefficients and therefore the Impulse Response Functions. Finally, the numerical properties of our estimator are explored by means of a Monte Carlo exercise and of a real-data application, in which we study the effects of monetary policy and supply shocks on the US economy.

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Non-Stationary Dynamic Factor Models for Large Datasets Matteo Barigozzi, Marco Lippi, and Matteo Luciani 2016-024 Please cite this paper as: Barigozzi, Matteo, Marco Lippi, and Matteo Luciani (2016). “Non-Stationary Dynamic Factor Models for Large Datasets,” Finance and Economics Discussion Series 2016-024. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2016.024r1. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Matteo Barigozzi¹, Marco Lippi², Matteo Luciani³

July 18, 2017

JEL subject classification: C0, C01, E0.

Key words and phrases: Dynamic Factor models, unit root processes, cointegration, common trends, impulse response functions.

¹ m.barigozzi@lse.ac.uk – London School of Economics and Political Science, UK.
² marco.lippi@eief.it – Einaudi Institute for Economics and Finance, Roma, Italy.
³ matteo.luciani@frb.gov – Federal Reserve Board of Governors, Washington DC, USA.

Special thanks go to Paolo Paruolo and Lorenzo Trapani for helpful comments. This paper has also benefited from discussions with Antonio Conti, Domenico Giannone, Dietmar Bauer, and all participants in the 39th Annual NBER Summer Institute. This paper was written while Matteo Luciani was chargé de recherches F.R.S.-F.N.R.S., and he gratefully acknowledges their financial support. Of course, any errors are our responsibility.

Disclaimer: the views expressed in this paper are those of the authors and do not necessarily reflect those of the Board of Governors or the Federal Reserve System.

1 Introduction

Since the early 2000s Large-Dimensional Dynamic Factor Models (DFM) have become increasingly popular in the econometric and macroeconomic literature, and they are nowadays commonly used by policy institutions. Economists have been attracted by these models because they make it possible to analyze large panels of time series without suffering from the curse of dimensionality. Furthermore, these models have proved successful in forecasting (Stock and Watson, 2002a,b; Forni et al., 2005; Giannone et al., 2008; Luciani, 2014), in the construction of both business cycle indicators and inflation indexes (Cristadoro et al., 2005; Altissimo et al., 2010), and also in policy analysis based on impulse response functions (Giannone et al., 2005; Stock and Watson, 2005; Forni et al., 2009; Forni and Gambetti, 2010; Barigozzi et al., 2014; Luciani, 2015), thus becoming a standard econometric tool in empirical macroeconomic analysis.

DFMs are based on the idea that all the variables in an economic system are driven by a few common (macroeconomic) shocks, with their residual dynamics being explained by idiosyncratic components, such as measurement errors and sectoral or regional shocks. Formally, each variable in the $n$-dimensional dataset $x_{it}$, $i = 1, 2, \ldots, n$, can be decomposed into the sum of two unobservable components: the common component $\chi_{it}$ and the idiosyncratic component $\xi_{it}$ (Forni et al., 2000; Forni and Lippi, 2001; Stock and Watson, 2002a,b). Moreover, the common components are linear combinations of an $r$-dimensional vector of common factors $F_t = (F_{1t}\ F_{2t}\ \cdots\ F_{rt})'$,

$$
\begin{aligned}
x_{it} &= \chi_{it} + \xi_{it},\\
\chi_{it} &= \lambda_{i1}F_{1t} + \lambda_{i2}F_{2t} + \cdots + \lambda_{ir}F_{rt} = \lambda_i' F_t,
\end{aligned}
\tag{1}
$$

where $\lambda_i = (\lambda_{i1}\ \lambda_{i2}\ \cdots\ \lambda_{ir})'$. The stochastic vector $F_t$ is in turn dynamically driven by a $q$-dimensional orthonormal white-noise vector $u_t = (u_{1t}\ u_{2t}\ \cdots\ u_{qt})'$, the common shocks:

$$
F_t = B(L)u_t, \tag{2}
$$

where $B(L)$ is an $r \times q$ square-summable matrix in the lag operator $L$ (Stock and Watson, 2005; Bai and Ng, 2007; Forni et al., 2009). The dimension $n$ of the dataset is assumed to be large as compared to $r$ and $q$, which are independent of $n$, with $q \le r$. More precisely, all assumptions and results are formulated assuming that both $T$, the number of observations for each $x_{it}$, and $n$, the number of variables, tend to infinity.

In the standard version of the DFM, the components $\chi_{it}$ and $\xi_{it}$, and therefore the observable variables $x_{it}$, are assumed to be stationary. Under the stationarity assumption, the factors $F_t$ and the loadings $\lambda_i$ can be consistently estimated by means of the first $r$ principal components of the observable variables $x_{it}$ (Stock and Watson, 2002a; Bai and Ng, 2002). Estimation of the matrix $B(L)$ is usually obtained by means of a VAR for the estimated factors $F_t$, which provides an estimate of the reduced-form, not identified, impulse-response functions (IRF) of the variables $x_{it}$ with respect to the common shocks $u_t$, that is $\lambda_i' B(L)$. Lastly, as shown in Stock and Watson (2005) and Forni et al. (2009), the identification techniques used in Structural VAR analysis (SVAR) can be applied to obtain shocks and IRFs fulfilling restrictions based on macroeconomic theory.

Of course the stationarity assumption does not hold for most of the variables contained in macroeconomic datasets. Assume for simplicity that all the variables $x_{it}$ and the factors are I(1). Equations (1) do not change, while the MA representation (2) becomes:

$$
\Delta F_t = C(L)u_t. \tag{3}
$$

In this case, the common practice in the applied DFM literature consists in taking first differences of the non-stationary variables, thus obtaining a stationary dataset $\Delta x_{it}$ with stationary factors $\Delta F_t$, and then applying the procedure described above to $\Delta x_{it}$ and $\Delta F_t$. This transformation is harmless as far as $F_t$ and $\lambda_i$ are concerned, as the first $r$ principal components of the variables $\Delta x_{it}$ consistently estimate $\Delta F_t$ and therefore, by integration, $F_t$, up to initial conditions (Bai and Ng, 2004). However, important issues arising in connection with estimation of the IRFs in the non-stationary case have not been systematically analysed so far. In particular, estimation of $C(L)$ by means of a VAR is not trivial.

Firstly, if the factors $F_t$ are cointegrated, consistent estimation of the long-run features of the IRFs requires modeling $F_t$ as a Vector Error Correction Model (VECM).
The few papers considering estimation of the IRFs in the non-stationary case model instead the dynamics of $F_t$ as a VAR in differences (Stock and Watson, 2005; Forni et al., 2009).

Secondly, irrespective of whether the dataset is stationary or not, as a rule the vector $F_t$ is singular, i.e. the number $r$ of static common factors is greater than the number $q$ of common shocks. This finding is strongly supported by empirical evidence, see e.g. Giannone et al. (2005), Amengual and Watson (2007), Forni and Gambetti (2010), Luciani (2015) for US macroeconomic databases, Barigozzi et al. (2014) for the Euro area.

The contribution of the present paper is the asymptotic analysis (consistency and rates) of estimators of IRFs for Non-Stationary Dynamic Factor Models for large datasets under the assumptions that:

(I) the factors $F_t$ are I(1), singular and cointegrated (singular I(1) vectors are trivially cointegrated, see below);
(II) the idiosyncratic components $\xi_{it}$ are I(1) or I(0).

As regards (I), singularity of the vector $F_t$ is consistent with a point made in several papers, see e.g. Stock and Watson (2005), Forni et al. (2009), that (1) and (2) are just a convenient static representation derived from a “deeper” set of dynamic equations linking the common components $\chi_{it}$ to the common shocks $u_t$. For example, assuming for simplicity stationarity, suppose that $q = 1$ and that the common components load the single shock $u_t$ with the simple MA dynamics

$$
\chi_{it} = \mu_{i0}u_t + \mu_{i1}u_{t-1}.
$$

Representation (1) is obtained by setting $r = 2$, $F_{1t} = u_t$, $F_{2t} = u_{t-1}$, $\lambda_i = (\mu_{i0}\ \mu_{i1})'$, while (2) takes the form

$$
F_t = \begin{pmatrix} u_t \\ u_{t-1} \end{pmatrix} = \begin{pmatrix} 1 \\ L \end{pmatrix} u_t,
$$

so that $B(L) = (1\ L)'$. This elementary model helps in understanding the assumption of singularity for $F_t$, but also helps to point out that $F_t$ has no autonomous economic content. Estimation of $F_t$ and of $B(L)$, or $C(L)$, is used to obtain the dynamic response of $x_{it}$ to the common shocks $u_t$. In particular, the factors $F_t$ are identified only up to a linear transformation, and replacing $F_t$ with $H^{-1}F_t$, $H$ being an invertible matrix, obviously requires replacing $\lambda_i$ with $H'\lambda_i$ and fairly obvious transformations of $B(L)$, or $C(L)$. However, it is easily seen that application of identification restrictions based on economic logic, such as recursive schemes or long-run effects, only applies to the shocks $u_t$, which implies that the identified IRFs are independent of $H$, so that in this sense the factors $F_t$ are just playing an auxiliary role, see Remark 2 in Section 2.

Second, in the companion paper Barigozzi et al. (2016) we address the representation problems for singular cointegrated vectors.
Denoting by $c$ the cointegration rank of $F_t$, $c$ is at least $r - q$, that is $c = r - q + d$ with $0 \le d < q$, hence singular I(1) vectors are always cointegrated. Moreover, under the assumption that the entries of $C(L)$ are rational functions of $L$, $F_t$ has a representation as a VECM:

$$
G(L)\Delta F_t + \alpha\beta' F_{t-1} = h + K u_t, \tag{4}
$$

where $\alpha$ and $\beta$ are both $r \times c$ and full rank, $K$ is $r \times q$ and $G(L)$ is a finite-degree matrix polynomial. Trivially, representation (4) implies the existence of $q - d$ common trends for $F_t$. In the present paper we study estimation of the DFM and the IRFs for non-stationary data when (4) holds. Specifically, we consider the case in which (4) is estimated as a VECM or by means of an unrestricted VAR in levels.

As regards (II), with the exception of Bai and Ng (2004), DFMs for I(1) variables are studied under the assumption of stationary idiosyncratic components, see e.g. Bai (2004) and Peña and Poncela (2006) (which is however a model with fixed $n$). This is a crucial assumption with non-trivial consequences on the model.

First, I(0) idiosyncratic components imply that the $x$'s and the factors are cointegrated. This property is exploited in Banerjee et al. (2017), who assume I(0) idiosyncratic components and study a Factor Augmented Error Correction Model. However, not only is the assumption of I(0) idiosyncratic components not supported empirically by typical macroeconomic datasets, such as the one analyzed in this paper, but also, as we argue in Section 2, I(0) idiosyncratic components imply “too much cointegration” among the variables $x_{it}$ themselves.

Second, under the assumption of I(0) idiosyncratic components, it is possible to separately estimate I(1) non-cointegrated factors and I(0) factors, see Bai (2004). As a consequence, representation (4) becomes trivial, the I(0) factors being the error terms. On the other hand, if the idiosyncratic terms are either I(0) or I(1), as we assume in the present paper, the estimated factors are all I(1) in general, and estimation of (4) is not trivial.

The paper is organized as follows. In Section 2 we summarize and discuss the representation results proved in the companion paper Barigozzi et al. (2016). Moreover, we state the main assumptions of the model. Section 3 establishes consistency and rates for our estimators. In Section 4 we propose an information criterion for determining the number of common trends in a DFM. In Section 5, by means of a Monte Carlo simulation exercise, we study the finite sample properties of our estimators. Finally, in Section 6 we use our model to study the impact of monetary policy and supply shocks for the US economy. In Section 7 we conclude and discuss possible further applications of the model presented. The proofs of our main results and auxiliary lemmas are in Appendix A.
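
The elementary example with $q = 1$ and $r = 2$ discussed earlier in this Introduction can be checked numerically. The following is a minimal Python sketch, assuming NumPy; the sizes `n`, `T`, the seed and the coefficients `mu` are arbitrary illustrative choices, not values taken from the paper. It builds the common components both from the dynamic equation and from the static representation (1), and verifies that a single shock generates a static factor space of dimension two.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 400                       # illustrative sizes (assumed)
u = rng.standard_normal(T + 1)       # single common shock u_0, ..., u_T, so q = 1
mu = rng.standard_normal((n, 2))     # hypothetical coefficients (mu_i0, mu_i1)

# Static factors F_t = (u_t, u_{t-1})' for t = 1, ..., T: r = 2 although q = 1.
F = np.vstack([u[1:], u[:-1]])       # shape (2, T)
chi = mu @ F                         # chi_it = lambda_i' F_t with lambda_i = (mu_i0, mu_i1)'

# The same common components written with the "deeper" dynamic equation.
chi_dynamic = np.outer(mu[:, 0], u[1:]) + np.outer(mu[:, 1], u[:-1])

# Rank of the sample covariance of the n common components.
static_rank = np.linalg.matrix_rank(np.cov(chi))
print(np.allclose(chi, chi_dynamic), static_rank)
```

The two constructions coincide by definition; the point of the sketch is only that the covariance of the common components has rank $r = 2$ even though one shock drives the whole panel.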

2 The Non-Stationary Dynamic Factor model

2.1 I(1) vectors and cointegration

Throughout the paper we will adopt the following definitions for I(0), I(1) and cointegrated stochastic vectors. They are standard (see Johansen, 1995, Ch. 3), except that here the vectors can be singular, i.e. they can be driven by a number of shocks $q$ and be of dimension $r$, with $r > q$.

(I) Consider an $r \times q$ matrix $A(L) = A_0 + A_1 L + \cdots$, with the assumption that the series $\sum_{j=0}^{\infty} A_j z^j$ converges for all complex numbers $z$ such that $|z| < 1 + \delta$ for some $\delta > 0$. This condition is fulfilled when the entries of $A(L)$ are rational functions of $L$ with no poles inside or on the unit circle (the VARMA case). Given the $r$-dimensional stationary stochastic vector

$$
y_t = A(L)v_t,
$$

where $v_t$ is a $q$-dimensional white noise, $q \le r$, we say that $y_t$ is I(0) if $A(1) \ne 0$.

(II) The $r$-dimensional stochastic vector $y_t$ is I(1) if $\Delta y_t$ is I(0).

(III) The $r$-dimensional I(1) vector $y_t$ is cointegrated of order $c$, $0 < c < r$, if (1) there exist linearly independent $r$-dimensional vectors $\beta_k$, $k = 1, 2, \ldots, c$, such that $\beta_k' y_t$ is stationary, (2) if $\gamma' y_t$ is stationary then $\gamma$ is a linear combination of the vectors $\beta_k$.

Some important properties for our model follow from these definitions.

Remark 1

(a) Some of the coordinates of an I(1) vector can be stationary.

(b) If one of the coordinates of the I(1) vector $y_t$ is stationary, then $y_t$ is cointegrated.

(c) The cointegration rank of $y_t$ is equal to $r$ minus the rank of $A(1)$.

(d) It is easy to see that $y_t$ is cointegrated with cointegration rank $c$ if and only if $y_t$ can be linearly transformed into a vector whose first $c$ coordinates are stationary and the remaining $r - c$ are I(1). For, let $y_t$ be cointegrated of order $c$ with cointegration vectors $\beta_k$, $k = 1, 2, \ldots, c$. Let $\beta = (\beta_1\ \beta_2\ \cdots\ \beta_c)$ and $B = (\beta\ \beta_\perp)$, where $\beta_\perp$ is an $r \times (r - c)$ matrix whose columns are linearly independent and orthogonal to the columns of $\beta$. Then, the first $c$ coordinates of $z_t = B' y_t$ are stationary while the remaining $r - c$ are I(1).

(e) As is well known, in model (1) the factors $F_t$ are identified up to a linear transformation, see Remark 2 for details. Thus, in view of (d), the question whether some of the factors are stationary while the remaining ones are I(1) is perfectly equivalent to the question whether and “how much” the factors are cointegrated, see Bai (2004).

(f) Note that if $y_t$ is I(1) and $r > q$, then obviously $y_t$ is cointegrated with cointegration rank at least $r - q$:

$$
c = (r - q) + d, \qquad 0 \le d < q. \tag{5}
$$

2.2 Assumptions on common and idiosyncratic components

Under the assumption that $F_t$ is I(1), defining

$$
x_t = (x_{1t}\ x_{2t}\ \cdots\ x_{nt})', \quad \chi_t = (\chi_{1t}\ \chi_{2t}\ \cdots\ \chi_{nt})', \quad \xi_t = (\xi_{1t}\ \xi_{2t}\ \cdots\ \xi_{nt})', \quad \Lambda = (\lambda_1\ \lambda_2\ \cdots\ \lambda_n)',
$$

equations (1)–(3) become:

$$
\begin{aligned}
x_t &= \chi_t + \xi_t = \Lambda F_t + \xi_t,\\
\Delta F_t &= C(L)u_t.
\end{aligned}
\tag{6}
$$

Firstly, we suppose that the I(1) stochastic vector $F_t$ has an ARIMA representation:

$$
S(L)\Delta F_t = Q(L)u_t, \tag{7}
$$

or

$$
\Delta F_t = C(L)u_t = S(L)^{-1}Q(L)u_t, \tag{8}
$$

where:

(i) $u_t$ is a $q$-dimensional white noise, $\mathrm{rk}(E[u_t u_t']) = q$;
(ii) $S(L)$ is an $r \times r$ finite-degree matrix polynomial with no zeros inside or on the unit circle;
(iii) $S(0) = I_r$;
(iv) $Q(L)$ is a finite-degree $r \times q$ matrix polynomial, $Q(1) \ne 0$;
(v) $\mathrm{rk}(Q(0)) = q$.

Setting $d = q - \mathrm{rk}(Q(1))$, the cointegration rank of $F_t$ is $c = r - \mathrm{rk}(Q(1)) = (r - q) + d$. It is easy to show, see Barigozzi et al. (2016), that there exists a non-singular $q \times q$ matrix $R$ such that, defining

$$
v_t = \begin{pmatrix} v_{1t} \\ v_{2t} \end{pmatrix} = R u_t,
$$

where $v_{1t}$ has dimension $d$ while $v_{2t}$ has dimension $r - c = q - d$, the $d$ shocks in $v_{1t}$ have a temporary effect on $F_t$ whereas the $q - d$ shocks in $v_{2t}$ have a permanent effect. Thus the number of permanent shocks is $r$ minus the cointegration rank, as in the non-singular case, while the number of transitory shocks is the complement to $q$, not $r$, as though $r - q$ transitory shocks had a zero coefficient. In applications to macroeconomic datasets permanent and transitory shocks can be interpreted as the usual supply and demand causes of fluctuation of the GDP and other key variables. In Sections 5 and 6 permanent and transitory effects on some of the variables $x_{it}$ are used to identify structural IRFs.

The main result in Barigozzi et al. (2016), which is crucial for the present paper, is the following. Assume equation (7) for $F_t$, suppose that $c = (r - q) + d$ and set $\beta = (\beta_1\ \cdots\ \beta_c)$. Then, for generic values of the parameters in the matrices $S(L)$ and $Q(L)$, $F_t$ has the VECM representation:

$$
G(L)\Delta F_t + \alpha\beta' F_{t-1} = h + K u_t, \tag{9}
$$

where:

(A) $\alpha$ and $\beta$ are full rank $r \times c$ matrices;
(B) $K = Q(0)$;
(C) $h$ is a constant vector;
(D) $G(L)$ is a finite-degree matrix polynomial with $G(0) = I_r$.

This VECM representation is obtained by combining the Granger Representation Theorem with recent results on singular stochastic vectors, see Anderson and Deistler (2008a,b). The existence of a finite-degree inverse of $Q(L)$, see (D) above, is a consequence of singularity. Note that the VECM, fulfilling (A) through (D), holds generically, that is with the exception of a subset of lower dimension (thus except for a negligible subset) in the parameter space (details in Barigozzi et al., 2016). We use this result as a motivation for assuming that (9) holds. In particular, we make the following assumptions on the factors, loadings and idiosyncratic components.
Assumption 1 (Common factors)

(a) The I(1) $r$-dimensional stochastic vector $F_t$, with cointegration rank $c = r - q + d$, has an ARIMA representation

$$
S(L)\Delta F_t = Q(L)u_t,
$$

fulfilling properties (i) through (v), and a VECM representation

$$
G(L)\Delta F_t = h + \alpha\beta' F_{t-1} + K u_t,
$$

fulfilling properties (A) through (D).

(b) $\mathrm{rk}(E[\Delta F_t \Delta F_t']) = \mathrm{rk}\left(\sum_{k=0}^{\infty} C_k C_k'\right) = r$.

Part (a) of the next assumption implies that the $r$ factors are not redundant, i.e. no representation with a number of factors smaller than $r$ is possible.

Assumption 2 (Loadings)

(a) There exists an $r \times r$ positive definite matrix $V$ such that, as $n \to \infty$, $n^{-1}\Lambda'\Lambda \to V$.

(b) Denoting by $\lambda_{ij}$ the $(i,j)$ element of $\Lambda$, $|\lambda_{ij}| < C$ for some positive real $C$ independent of $i$ and $j$.

The idiosyncratic components are driven by idiosyncratic shocks with univariate dynamics and are orthogonal to the common components at any lead and lag:

Assumption 3 (Idiosyncratic components)

$$
(1 - \rho_i L)\xi_{it} = d_i(L)\varepsilon_{it}, \tag{10}
$$

where

(a) $\varepsilon_t = (\varepsilon_{1t}\ \varepsilon_{2t}\ \cdots\ \varepsilon_{nt})'$ is a vector white noise;
(b) $d_i(L) = \sum_{k=0}^{\infty} d_{ik}L^k$, with $\sum_{k=0}^{\infty} k|d_{ik}| \le M_1$, for some positive real $M_1$ independent of $i$;
(c) $|\rho_i| \le 1$, so that I(1) idiosyncratic components are allowed;
(d) $u_{jt}$ and $\varepsilon_{is}$ are orthogonal for any $j = 1, 2, \ldots, q$, $i \in \mathbb{N}$, and $t, s \in \mathbb{Z}$.

Condition (b) implies square summability of the matrix polynomials in (10) so that $\xi_{it}$ is non-stationary if and only if $\rho_i = 1$. Assuming that $|\rho_i| < 1$, that is all idiosyncratic components are stationary, implies that any $p$-dimensional vector $(x_{i_1,t}\ x_{i_2,t}\ \cdots\ x_{i_p,t})$, with $p \ge q - d + 1$, would be cointegrated. For, as we have seen above, the factors $F_t$ are driven by $r - c = q - d$ permanent shocks and the same holds for the variables $x_{i_h,t}$ if the idiosyncratic components are stationary. For example, if $q = 3$ and $d = 0$ then all 4-dimensional subvectors of $x_t$ are cointegrated (3-dimensional if $d = 1$). Moreover, applying the test proposed in Bai and Ng (2004) on a panel of 101 quarterly US macroeconomic time series (see Section 6 and Appendix B), one of the datasets typically analysed in the empirical DFM literature, we found that the I(0) hypothesis is rejected for half of the estimated idiosyncratic components.

Finally, note that contemporaneous cross-sectional dependence of the white noise $\varepsilon_t$ is not excluded. More on this in Assumption 4.

To prove our consistency results we need to strengthen Assumptions 1 and 3, in which we only require that $u_t$ and the shocks $\varepsilon_{it}$ are white noise, orthogonal at any lead and lag.

Assumption 4 (Common and idiosyncratic shocks)

(a) $u_t$ is a strong orthonormal white noise, i.e. $E[u_t u_t'] = I_q$, and $u_t$ and $u_{t-k}$ are independent for any $k \ne 0$;
(b) $E[u_{jt}^4] \le M_2$, for any $j = 1, 2, \ldots, q$ and a positive real $M_2$;
(c) $\varepsilon_t = (\varepsilon_{1t}\ \varepsilon_{2t}\ \cdots\ \varepsilon_{nt})'$ is a strong vector white noise;
(d) $E[|\varepsilon_{it}|^{\kappa_1}|\varepsilon_{jt}|^{\kappa_2}] \le M_3$ for any $\kappa_1 + \kappa_2 = 4$, $i, j \in \mathbb{N}$ and a positive real $M_3$;
(e) $\max_{j=1,2,\ldots,n} \sum_{i=1}^{n} |E[\varepsilon_{it}\varepsilon_{jt}]| \le M_4$ for any $n \in \mathbb{N}$ and a positive real $M_4$;
(f) $u_{jt}$ and $\varepsilon_{is}$ are independent for any $j = 1, 2, \ldots, q$, $i \in \mathbb{N}$, and $t, s \in \mathbb{Z}$.

As noted above, contemporaneous cross-sectional dependence of the white noise $\varepsilon_t$ is allowed. In particular, with condition (e) we require a mild form of sparsity as proposed by Fan et al. (2013) and found empirically in a stationary setting by Boivin and Ng (2006), Bai and Ng (2008), and Luciani (2014). The components of $\Delta\xi_t$ are allowed to be both cross-sectionally and serially correlated.
Condition (f) is in agreement with the economic interpretation of the model, in which common and idiosyncratic shocks are two independent sources of variation.

Lemmas 1 and 2 provide basic results for the eigenvalues of the covariance matrices of the idiosyncratic shocks $\varepsilon_{it}$ and the variables $\Delta x_{it}$, $\Delta\chi_{it}$, $\Delta\xi_{it}$.

Lemma 1 Under Assumptions 1 through 4, there exists a positive real $M_5$ such that $\mu_1^{\varepsilon} \le M_5$ and $n^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} |E[\varepsilon_{it}\varepsilon_{jt}]| \le M_5$ for any $n \in \mathbb{N}$.

Lemma 2 Under Assumptions 1 through 4, for any $n \in \mathbb{N}$, there exist positive reals $\underline{M}_6$, $\overline{M}_6$, $M_7$, $\underline{M}_8$, $\overline{M}_8$ and an integer $\bar{n}$ such that

(i) $\underline{M}_6 \le n^{-1}\mu_j^{\Delta\chi} \le \overline{M}_6$ for any $j = 1, 2, \ldots, r$ and $n > \bar{n}$;
(ii) $\mu_1^{\Delta\xi} \le M_7$;
(iii) $\underline{M}_8 \le n^{-1}\mu_j^{\Delta x} \le \overline{M}_8$ for any $j = 1, 2, \ldots, r$ and $n > \bar{n}$;
(iv) $\mu_{r+1}^{\Delta x} \le M_7$.

The results in Lemma 2 are crucial to estimate the number of factors $r$, the loadings, the differenced factors and the factors themselves. Analogous results on the eigenvalues of the spectral density matrices of the $x$'s, the $\chi$'s and the $\xi$'s allow the estimation of $q$ and the cointegration rank $c$ of the factors $F_t$, see Section 4.

Remark 2 In model (6) the factors $F_t$ are not identified. For, given the non-singular $r \times r$ matrix $H$,

$$
x_t = [\Lambda H]\,[H^{-1}F_t] + \xi_t = \Lambda^* F_t^* + \xi_t. \tag{11}
$$

Using $F_t^*$ implies changes in the matrices in (8) and (9) and the loadings that are easy to compute:

$$
\Lambda^* = \Lambda H,\quad S^*(L) = H^{-1}S(L)H,\quad Q^*(L) = H^{-1}Q(L),\quad C^*(L) = H^{-1}C(L),
$$
$$
G^*(L) = H^{-1}G(L)H,\quad \alpha^* = H^{-1}\alpha,\quad \beta^* = H'\beta,\quad K^* = H^{-1}K.
$$

Note that $\Lambda^* C^*(L) = \Lambda C(L)$, so that the raw IRFs of the $x$'s with respect to $u_t$, corresponding to the factors $F_t^*$ and to the factors $F_t$, are equal. As a consequence, identification of the IRFs based on any economic criterion is independent of the particular factors used.

The following choice of the factors is very convenient and will be adopted in the sequel. Let $W$ be the $n \times r$ matrix whose columns are the right normalised eigenvectors of the variance-covariance matrix of $\Delta\chi_t$, corresponding to the first $r$ eigenvalues $\mu_j^{\Delta\chi}$, $j = 1, 2, \ldots, r$. Define

$$
\Delta F_t^* = W'\Delta\chi_t.
$$

Now project $\Delta\chi_t$ on $\Delta F_t^*$:

$$
\Delta\chi_t = A\,\Delta F_t^* + R_t.
$$

We see that $A = W$ and that the variance-covariance matrices of $\Delta\chi_t$ and of $W\Delta F_t^*$ are equal, so that $R_t = 0$ and the projection becomes $\Delta\chi_t = WW'\Delta\chi_t$, that is

$$
(I_n - WW')\Delta\chi_t = 0.
$$
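
Both facts used so far in this remark, the invariance $\Lambda^* C^*(L) = \Lambda C(L)$ and the identity $(I_n - WW')\Lambda = 0$ implied by the projection argument, can be checked numerically. The following is a minimal Python sketch, assuming NumPy; the dimensions, the matrices `Lam`, `C`, `H` and the covariance `Gamma_F` are hypothetical draws used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, q = 30, 3, 2                                   # illustrative dimensions (assumed)
Lam = rng.standard_normal((n, r))                    # hypothetical loadings Lambda
C = [rng.standard_normal((r, q)) for _ in range(4)]  # hypothetical coefficients C_0, ..., C_3

# A well-conditioned invertible H: orthogonal matrix times a positive diagonal.
Qm, _ = np.linalg.qr(rng.standard_normal((r, r)))
H = Qm @ np.diag([0.5, 1.0, 2.0])

# Transformed loadings and MA coefficients: Lambda* = Lambda H, C*_k = H^{-1} C_k.
Lam_star = Lam @ H
C_star = [np.linalg.solve(H, Ck) for Ck in C]

# Raw IRFs are unchanged: Lambda* C*_k = Lambda C_k for every lag k.
same_irf = all(np.allclose(Lam @ Ck, Lam_star @ Cks) for Ck, Cks in zip(C, C_star))

# Covariance of Delta chi_t implied by a full-rank covariance of Delta F_t.
A = rng.standard_normal((r, r))
Gamma_F = A @ A.T + np.eye(r)                        # positive definite, rank r
Gamma_chi = Lam @ Gamma_F @ Lam.T

# W collects the eigenvectors of the r largest eigenvalues (eigh sorts ascending).
eigval, eigvec = np.linalg.eigh(Gamma_chi)
W = eigvec[:, ::-1][:, :r]

# (I_n - W W') Lambda should vanish, hence (I_n - W W') Delta chi_t = 0.
residual = (np.eye(n) - W @ W.T) @ Lam
print(same_irf, np.abs(residual).max())
```

The matrix `H` is built as an orthogonal matrix times a diagonal one only to guarantee invertibility in the sketch; any non-singular `H` would do.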

Setting $\chi_0 = 0$ we obtain $\chi_t = W[W'\chi_t]$, for $t > 0$, or, in our preferred specification,

$$
\chi_t = \left[\sqrt{n}\,W\right]\left[\frac{1}{\sqrt{n}}\,W'\chi_t\right]. \tag{12}
$$

We do not need to complicate the notation by introducing new symbols and set henceforth

$$
\Lambda = \sqrt{n}\,W, \qquad F_t = \frac{1}{\sqrt{n}}\,W'\chi_t = \frac{1}{n}\,\Lambda'\chi_t. \tag{13}
$$

Note that now the factors $F_t$ and the loadings $\lambda_i$, for a given $i$, depend on $n$ and that in the new specification:

$$
\frac{1}{n}\,\Lambda'\Lambda = I_r, \tag{14}
$$

for all $n \in \mathbb{N}$. Moreover, the variance-covariance matrix of the differenced factors $\Delta F_t$ is the diagonal $r \times r$ matrix with $\mu_j^{\Delta\chi}/n$ as the $(j,j)$ entry. By Lemma 2 (i), which is a consequence of Assumptions 1 (b) and 2, all such entries are bounded and bounded away from zero.

We conclude with the following assumption, which has the consequence that $\chi_0 = 0$, $\xi_0 = 0$ and $x_0 = 0$.

Assumption 5 For all $i \in \mathbb{N}$ and $t \le 0$, $u_t = 0$ and $\varepsilon_{it} = 0$.

3 Estimation

We proceed in the same way as Stock and Watson (2005) and Forni et al. (2009) do in their stationary setting: (i) we estimate the loadings, the common factors, their VECM dynamics and the raw IRFs, (ii) we identify the structural common shocks and IRFs by imposing a set of restrictions based on economic logic.

We observe an $n$-dimensional vector $x_t$ over the period $0, 1, \ldots, T$, i.e. the $n \times (T+1)$ panel $x = (x_0 \cdots x_T)$. Asymptotics for all our estimators are studied for both $n$ and $T$ tending to infinity. The number of common factors $r$, of common shocks $q$, and of cointegration relations $c = r - q + d$ are assumed to be known in the present section; their estimation is studied in Section 4.

We denote estimated quantities with a hat, as in $\widehat{F}_t$, without explicit notation for their dependence on both $n$ and $T$. Moreover, (1) the spectral norm of a matrix $B$ is denoted by $\|B\| = (\mu_1^{B'B})^{1/2}$, where $\mu_1^{B'B}$ is the largest eigenvalue of $B'B$, (2) we denote by $J$ a diagonal $r \times r$ matrix, depending on $n$ and $T$, whose diagonal entries are either $1$ or $-1$.

3.1 Loadings and common factors

We start with the model in differences,

$$
\Delta x_t = \Lambda\Delta F_t + \Delta\xi_t.
$$

Consider the $n \times T$ data matrix $\Delta x = (\Delta x_1 \cdots \Delta x_T)$. Let $\widehat{\Gamma}_0 = T^{-1}\Delta x\,\Delta x'$ be the sample covariance matrix of $\Delta x_t$ and $\widehat{W}$ the $n \times r$ matrix with the normalized eigenvectors of $\widehat{\Gamma}_0$, corresponding to the first $r$ eigenvalues, on the columns. The standard estimators of the loadings and the differenced factors are

$$
\widehat{\Lambda} = \sqrt{n}\,\widehat{W}, \qquad \Delta\widehat{F}_t = \frac{1}{\sqrt{n}}\,\widehat{W}'\Delta x_t = \frac{1}{n}\,\widehat{\Lambda}'\Delta x_t.
$$

Integrating $\Delta\widehat{F}_t$ under the condition $x_0 = 0$,

$$
\widehat{\Lambda} = \sqrt{n}\,\widehat{W}, \qquad \widehat{F}_t = \frac{1}{\sqrt{n}}\,\widehat{W}'x_t = \frac{1}{n}\,\widehat{\Lambda}'x_t. \tag{15}
$$

Lemma 3 Under Assumptions 1 through 5, as $n, T \to \infty$,

(i) Given $i$, $\|\widehat{\lambda}_i' - \lambda_i' J\| = O_p(\max(n^{-1/2}, T^{-1/2}))$;
(ii) Given $t$, $\|\Delta\widehat{F}_t - J\Delta F_t\| = O_p(\max(n^{-1/2}, T^{-1/2}))$;
(iii) Given $t$, $T^{-1/2}\|\widehat{F}_t - JF_t\| = O_p(\max(n^{-1/2}, T^{-1/2}))$.

Our proof of statements (i) and (ii) is close to the one given in Forni et al. (2009). However, they make direct assumptions on the estimate of the covariance matrix of the $x$'s, whereas we start with “deeper” assumptions on common and idiosyncratic components. Moreover, we do not need to assume that the eigenvalues $\mu_j^{\Delta\chi}$, $j = 1, 2, \ldots, r$, are asymptotically separated. Bai and Ng (2004) define their estimators as in (15) and prove statement (iii). In this respect, their paper and the present one only differ in the technique used in the proof. A significant difference, concerning detrending, which is needed when the actual data $x$ contain a deterministic component, is discussed in Section 3.5.

Lemma 3, though interesting per se, is not sufficient to prove our main result on the VECM representation of $F_t$ and the IRFs. In particular, we need the asymptotic properties of the sample second moments of $\widehat{F}_t$ and $\Delta\widehat{F}_t$.
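
The estimators in (15) and the role of the sign matrix $J$ in Lemma 3 can be illustrated on simulated data. The following is a minimal Python sketch, assuming NumPy; the data-generating process (two random-walk factors, ten I(1) idiosyncratic components, Gaussian shocks, and the sizes `n`, `T`) is a hypothetical design chosen for illustration, not the Monte Carlo design of Section 5.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, r = 100, 300, 2                            # illustrative sizes (assumed)

# Simulated truth satisfying the normalisation (14): Lambda'Lambda / n = I_r.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
Lam = np.sqrt(n) * Q
dF = rng.standard_normal((r, T)) * np.array([[2.0], [1.0]])  # Var(dF) = diag(4, 1)
F = np.cumsum(dF, axis=1)                        # I(1) factors with F_0 = 0

eps = rng.standard_normal((n, T))
xi = eps.copy()
xi[:10] = np.cumsum(eps[:10], axis=1)            # first 10 idiosyncratic components are I(1)
x = Lam @ F + xi                                 # columns are x_1, ..., x_T, with x_0 = 0

# First differences, using x_0 = 0 for the first observation.
dx = np.diff(np.hstack([np.zeros((n, 1)), x]), axis=1)

# Principal components of the differenced data.
Gamma0 = dx @ dx.T / T                           # sample covariance of Delta x_t
eigval, eigvec = np.linalg.eigh(Gamma0)          # eigenvalues in ascending order
W_hat = eigvec[:, ::-1][:, :r]                   # eigenvectors of the r largest eigenvalues

Lam_hat = np.sqrt(n) * W_hat                     # estimated loadings
dF_hat = Lam_hat.T @ dx / n                      # estimated differenced factors
F_hat = Lam_hat.T @ x / n                        # estimated factors, as in (15)

# Sign matrix J aligning each estimated factor with the true one.
J = np.diag(np.sign(np.diag(Lam_hat.T @ Lam / n)))
corr = [np.corrcoef(dF_hat[j], (J @ dF)[j])[0, 1] for j in range(r)]
print(np.round(corr, 3))
```

Because eigenvectors are defined only up to sign, the comparison with the simulated factors is meaningful only after multiplying by `J`, which is the reason the lemma is stated in terms of $J\Delta F_t$ and $JF_t$.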
The main results, proved in Appendix A, Lemma A5, are worth mentioning here. As $n, T \to \infty$,

(I) $\left\|T^{-1}\sum_{t=1}^{T}\Delta\widehat{F}_t\Delta\widehat{F}_t' - E[\Delta F_t\Delta F_t']\right\| = O_p(\max(n^{-1/2}, T^{-1/2}))$;

(II) $T^{-2}\sum_{t=1}^{T}\widehat{F}_t\widehat{F}_t' \xrightarrow{d} \int_0^1 W(\tau)W'(\tau)\,d\tau$, where $W(\cdot)$ is an $r$-dimensional Brownian motion with finite covariance matrix of rank $q - d$.

3.2 VECM for the common factors

We now turn to estimation of the VECM in (9), with $c = r - q + d$ cointegration relations, see Assumption 1:

$$
\Delta F_t = \alpha\beta' F_{t-1} + G_1\Delta F_{t-1} + w_t, \qquad w_t = K u_t. \tag{16}
$$

For simplicity, we assume that the degree of $G(L)$ is $p = 1$. Generalization to any degree $p > 1$ is straightforward. As a consequence of Assumption 5 we set $h = 0$.

Different estimators for the cointegration vector, $\beta$, are possible. As suggested by the asymptotic and numerical studies in Phillips (1991) and Gonzalo (1994), we opt for the estimation approach proposed by Johansen (1991, 1995). Although typically derived from the maximization of a Gaussian likelihood, this estimator is nothing else but the solution of an eigen-problem naturally associated with a reduced rank regression model, where no specific assumption about the distribution of the errors is made in order to establish consistency (see e.g. Velu et al., 1986).¹

Since $F_t$ are unobserved, we estimate the parameters of (16) by using the estimated factors $\widehat{F}_t$ instead. Denote as $\widehat{e}_{0t}$ and $\widehat{e}_{1t}$ the residuals of the least squares regressions of $\Delta\widehat{F}_t$ and of $\widehat{F}_{t-1}$ on $\Delta\widehat{F}_{t-1}$, respectively, and define the matrices $\widehat{S}_{ij} = T^{-1}\sum_{t=1}^{T}\widehat{e}_{it}\widehat{e}_{jt}'$. Then, the $c$ cointegration vectors are estimated as the normalized eigenvectors corresponding to the $c$ largest eigenvalues $\widehat{\mu}_j$, such that, for $j = 1, 2, \ldots, c$,

$$
\left(\widehat{S}_{11} - \widehat{S}_{10}\widehat{S}_{00}^{-1}\widehat{S}_{01}\right)\widehat{\beta}_j = \widehat{\mu}_j\widehat{\beta}_j.
$$

The vectors $\widehat{\beta}_j$ are then the $c$ columns of the estimated matrix $\widehat{\beta}$.
The other parameters of the VECM, $\alpha$ and $G_1$, are estimated in a second step as the least squares estimators in the regressions of $\Delta\widehat{F}_t$ on $\widehat{\beta}'\widehat{F}_{t-1}$ and on $\Delta\widehat{F}_{t-1}$, respectively.

Finally, a linear combination of the $q$ columns of $K$ can be estimated as the first $q$ eigenvectors of the sample covariance matrix of the VECM residuals $\widehat{w}_t$, rescaled by the square root of their corresponding eigenvalues (see Stock and Watson, 2005; Bai and Ng, 2007; Forni et al., 2009, for analogous definitions). This estimator is denoted as $\widehat{K}$.

Consistent estimation of (16) in the presence of estimated factors is possible under the following additional assumption.

Assumption 6

(a) Let $n_1$ be the number of I(1) variables among $\xi_{1t}, \xi_{2t}, \ldots, \xi_{nt}$ (i.e. the number of idiosyncratic components $\xi_{it}$ such that $\rho_i = 1$, see Assumption 3). Then $n_1 = O(n^{\delta})$ for some $\delta \in [0, 1)$;

(b) $Tn^{-(2-\delta)} \to 0$, as $n, T \to \infty$;

(c) let $I_0$ and $I_1$ be the sets $\{i \le n \text{ such that } \xi_{it} \text{ is I(0)}\}$ and $\{i \le n \text{ such that } \xi_{it} \text{ is I(1)}\}$ respectively. Then:

$$
n^{-\gamma}\sum_{i \in I_0}\sum_{j \in I_1} |E[\varepsilon_{it}\varepsilon_{jt}]| \le M_9
$$

for some $\gamma < \delta$, some positive real $M_9$ and any $n \in \mathbb{N}$.

Under condition (a), we put an asymptotic limit on the number of I(1) idiosyncratic components. Their number $n_1$ can grow to infinity but slower than the number of the I(0) components. Condition (b) imposes a constraint on the relative growth rates of $n$ and $T$ and it implies that at least $T^{1/2}/n \to 0$ (when $\delta = 0$). Further motivations for, and the implications of, these two requirements are given below. Finally, with reference to the partitioning of the vector of idiosyncratic components into I(1) and I(0) coordinates, condition (c) limits the dependence between the two blocks more than the dependence within each block, which is in turn controlled by Lemma 1.²

We then have consistency of the estimated VECM parameters.

Lemma 4 Define $\vartheta_{nT,\delta} = \max\left(T^{1/2}n^{-(2-\delta)/2},\ n^{-(1-\delta)/2},\ T^{-1/2}\right)$. Under Assumptions 1 through 6, and given $J$ defined in Lemma 3, there exist a $c \times c$ orthogonal matrix $Q$ and a $q \times q$ orthogonal matrix $R$, such that, as $n, T \to \infty$,

(i) $\|\widehat{\beta} - J\beta Q\| = O_p(T^{-1/2}\vartheta_{nT,\delta})$;
(ii) $\|\widehat{\alpha} - J\alpha Q\| = O_p(\vartheta_{nT,\delta})$;
(iii) $\|\widehat{G}_1 - JG_1J\| = O_p(\vartheta_{nT,\delta})$;
(iv) $\|\widehat{K} - JKR\| = O_p(\vartheta_{nT,\delta})$.

¹ Other existing estimators of the cointegration vector, not considered here, are, for example: ordinary least squares (Engle and Granger, 1987), non-linear least squares (Stock, 1987), principal components (Stock and Watson, 1988), instrumental variables (Phillips and Hansen, 1990), and dynamic ordinary least squares (Stock and Watson, 1993).
p nT,δ 2We could in principle consider any γ < 1, in which case the rate of convergence of Lemma 4 and Proposition 1 below would depend also on γ. However, since the main message of those results would be qualitatively unaffected, we impose, for simplicity, γ <δ. 15
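To get a feel for the rate $\vartheta_{nT,\delta}$ defined in Lemma 4, the following minimal sketch evaluates it for given $n$, $T$ and $\delta$. The function name `vartheta` and the sample sizes in the usage loop are illustrative assumptions, not part of the paper.

```python
def vartheta(n: int, T: int, delta: float) -> float:
    """Rate of Lemma 4: max(T^{1/2} n^{-(2-delta)/2}, n^{-(1-delta)/2}, T^{-1/2})."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return max(
        T ** 0.5 * n ** (-(2.0 - delta) / 2.0),  # term driven by non-stationarity
        n ** (-(1.0 - delta) / 2.0),             # cross-sectional term
        T ** (-0.5),                             # classical time-series term
    )


if __name__ == "__main__":
    # Hypothetical sample sizes: the rate deteriorates as delta increases.
    for delta in (0.0, 0.25, 0.5):
        print(delta, vartheta(n=200, T=200, delta=delta))
```

Under these assumptions, a larger share of I(1) idiosyncratic components (larger $\delta$) yields a larger value of $\vartheta_{nT,\delta}$ for the same $n$ and $T$, that is, a slower rate.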

The rate of convergence in Lemma 4 is determined by $\vartheta_{nT,\delta}$. In particular, for generic values of $\delta \in [0,1)$ we have
\[
\vartheta_{nT,\delta} =
\begin{cases}
T^{1/2} n^{-(2-\delta)/2} & \text{if } T^{1/(2-\delta)} < n < T,\\
T^{-(1-\delta)/2} = n^{-(1-\delta)/2} & \text{if } n = T,\\
n^{-(1-\delta)/2} & \text{if } T < n < T^{1/(1-\delta)},\\
T^{-1/2} & \text{if } n > T^{1/(1-\delta)}.
\end{cases}
\tag{17}
\]
Consistency of the estimated parameters is guaranteed if and only if $\vartheta_{nT,\delta} \to 0$, as $n, T \to \infty$, which is ensured by Assumption 6. The intuitive explanation for Assumption 6 is as follows: due to non-stationarity the factor estimation error grows with $T$, but since, as defined in (15), the estimated factors are cross-sectional averages of the $x$'s, we can keep this error under control by allowing for an increasingly large cross-sectional dimension, $n$. In particular, the factor estimation error is a weighted average of the idiosyncratic components, and therefore the trade-off between $n$ and $T$ depends on how many of those components are non-stationary. The following remarks provide some more intuition about the results in Lemma 4.

Remark 3 From (17), we see that the classical $T^{1/2}$-consistency is achieved if and only if $T^{1/(1-\delta)}/n \to 0$, that is when $n$ grows much faster than $T$. On the other hand, in the case $n = O(T)$, which is of particular interest since it corresponds to typical macroeconomic datasets, the first two rates in $\vartheta_{nT,\delta}$ are equal and we have convergence at a rate $T^{(1-\delta)/2}$, which for small values of $\delta$ is close to the classical $T^{1/2}$-rate. Finally, in the case $\delta = 0$, which is asymptotically equivalent to saying that all idiosyncratic components are stationary, we need at least $T^{1/2}/n \to 0$ and we have the classical $T^{1/2}$-consistency if and only if $T/n \to 0$.

Remark 4 Due to the factor estimation error we do not have in general the classical $T$-consistency for the estimated cointegration vector $\widehat\beta$. Still, $\widehat\beta$ converges to the true value, $\beta$, at a faster rate with respect to the rate of consistency of the other VECM parameters.
This is enough to consistently apply the two-step VECM estimation as in Johansen (1995).

Remark 5 The estimated parameters approach the true parameters only up to three transformations $J$, $Q$, and $R$. First of all, since the estimated factors identify the true ones only up to a sign, determined by $J$, the same holds for the estimated VECM parameters with obvious multiplications by $J$. As already explained in Remark 2, this issue does not affect estimation and identification of IRFs. Second, the matrix $Q$ represents the usual indeterminacy in the identification of the cointegration relations, but again its identification does not affect the IRFs. This is also in agreement with the fact that, in our setup, neither the factors nor their cointegration relations have any economic meaning. Last, the matrix $R$ represents indeterminacy in the identification of the matrix $K$, and, as discussed below, $R$ has to be determined in order to identify the structural IRFs.

3.3 Common shocks and impulse response functions

Throughout the rest of the section we denote the true IRF of $x_{it}$, for $i = 1, 2, \ldots, n$, to the shock $u_{jt}$, for $j = 1, 2, \ldots, q$, as (see also (6))
\[
\phi_{ij}(L) = \lambda_i' \left[ \frac{c_j(L)}{1-L} \right], \tag{18}
\]
where $\lambda_i'$ is the $i$-th row of $\Lambda$, $c_j(L)$ is the $j$-th column of $C(L)$, and the notation used is convenient and makes sense, provided that we do not forget that such IRF is not square summable.

A VECM($p$) with cointegration rank $c$ can also be written as a VAR($p+1$) with $r - c$ unit roots. Therefore, after estimating (16), we have the estimated matrix polynomial
\[
\widehat A^{VECM}(L) = I_r - \sum_{k=1}^{p+1} \widehat A_k^{VECM} L^k,
\]
with coefficients given by
\[
\begin{aligned}
\widehat A_1^{VECM} &= \widehat G_1 - \widehat\alpha \widehat\beta' + I_r,\\
\widehat A_k^{VECM} &= \widehat G_k - \widehat G_{k-1}, \qquad k = 2, 3, \ldots, p,\\
\widehat A_{p+1}^{VECM} &= -\widehat G_p,
\end{aligned}
\tag{19}
\]
such that $\mathrm{rk}(\widehat A^{VECM}(1)) = \mathrm{rk}(\widehat\alpha \widehat\beta') = c$. Then, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, q$, the raw IRFs estimator is defined as
\[
\widetilde\phi_{ij}^{VECM}(L) = \widehat\lambda_i' \left[ \widehat A^{VECM}(L) \right]^{-1} \widehat k_j, \tag{20}
\]
where $\widehat\lambda_i'$ is the $i$-th row of $\widehat\Lambda$, $\widehat k_j$ is the $j$-th column of $\widehat K$ (see also Lütkepohl, 2006, for an explicit expression as function of the VECM parameters).
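The mapping in (19) and the expansion of the inverse polynomial in (20) can be sketched as follows. This is only an illustration under stated assumptions: it uses plain Python lists to stay dependency-free, it follows the sign convention of (19) as printed, it requires $p \ge 1$, and the function names (`vecm_to_var`, `raw_irf`) and the toy parameter values are hypothetical. The coefficients of $[\widehat A^{VECM}(L)]^{-1}$ are obtained from the standard recursion $\Psi_0 = I_r$, $\Psi_h = \sum_{k=1}^{\min(h,p+1)} \widehat A_k \Psi_{h-k}$.

```python
from typing import List

Matrix = List[List[float]]


def eye(r: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def vecm_to_var(alpha: Matrix, beta: Matrix, gammas: List[Matrix]) -> List[Matrix]:
    """Return [A_1, ..., A_{p+1}] as in (19); alpha, beta are r x c, gammas = [G_1, ..., G_p]."""
    r, p = len(alpha), len(gammas)
    beta_t = [list(col) for col in zip(*beta)]   # beta' (c x r)
    ab = matmul(alpha, beta_t)                   # alpha beta' (r x r)
    ident = eye(r)
    # A_1 = G_1 - alpha beta' + I_r (sign as printed in (19))
    coeffs = [[[gammas[0][i][j] - ab[i][j] + ident[i][j] for j in range(r)]
               for i in range(r)]]
    # A_k = G_k - G_{k-1}, k = 2, ..., p
    for k in range(1, p):
        coeffs.append([[gammas[k][i][j] - gammas[k - 1][i][j] for j in range(r)]
                       for i in range(r)])
    # A_{p+1} = -G_p
    coeffs.append([[-gammas[p - 1][i][j] for j in range(r)] for i in range(r)])
    return coeffs


def raw_irf(coeffs: List[Matrix], loading_row: List[float],
            k_col: List[float], horizon: int) -> List[float]:
    """Coefficients 0..horizon of lambda_i' [A(L)]^{-1} k_j."""
    r = len(k_col)
    psi = [eye(r)]                               # Psi_0 = I_r
    for h in range(1, horizon + 1):
        acc = [[0.0] * r for _ in range(r)]
        for lag, a_k in enumerate(coeffs, start=1):
            if lag > h:
                break
            term = matmul(a_k, psi[h - lag])
            acc = [[acc[i][j] + term[i][j] for j in range(r)] for i in range(r)]
        psi.append(acc)
    return [sum(loading_row[a] * m[a][b] * k_col[b]
                for a in range(r) for b in range(r)) for m in psi]


if __name__ == "__main__":
    # Toy example (hypothetical values): r = 2, c = 1, p = 1 with G_1 = 0.
    A = vecm_to_var([[0.5], [0.0]], [[1.0], [-1.0]], [[[0.0, 0.0], [0.0, 0.0]]])
    print(raw_irf(A, loading_row=[1.0, 0.0], k_col=[0.0, 1.0], horizon=5))
```

In the toy example one root of the implied VAR is equal to one, so the response does not die out as the horizon grows, which mirrors the fact that the IRF in (18) is not square summable.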

However, since $K$ is not identified, the IRFs in (20) are in general not identified. Now, while orthogonality of $R$ in Lemma 4 is a purely mathematical result due to non-uniqueness of eigenvectors, economic theory tells us that the choice of the identifying transformation can be determined by the economic meaning attached to the common shocks, $u_t$. We then need to impose at most $q(q-1)/2$ restrictions in order to achieve under- or just-identification.³ In this case, $R$ is a function of the parameters of the model and it can be estimated as a function of the estimated parameters: $\widehat R \equiv \widehat R(\widehat\Lambda, \widehat A^{VECM}(L), \widehat K)$ (see also Forni et al., 2009, for a discussion). Two examples of restrictions are considered in Section 6 when analyzing real data.

The estimated and identified IRFs are then defined by combining the estimated parameters and the identification restrictions. In particular, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, q$, the dynamic reaction of the $i$-th variable to the $j$-th common shock is estimated as
\[
\widehat\phi_{ij}^{VECM}(L) = \widehat\lambda_i' \left[ \widehat A^{VECM}(L) \right]^{-1} \widehat K\, \widehat r_j, \tag{21}
\]
where $\widehat\lambda_i'$ is the $i$-th row of $\widehat\Lambda$, $\widehat r_j$ is the $j$-th column of $\widehat R$. By denoting as $\widehat\phi_{ijk}^{VECM}$ the $k$-th coefficient of the polynomial in (21), and as $\phi_{ijk}$ the corresponding coefficients of $\phi_{ij}(L)$, we have the following consistency result.

Proposition 1 (Consistency of Impulse Response Functions based on VECM) Under Assumptions 1 through 6, as $n, T \to \infty$, we have
\[
\left| \widehat\phi_{ijk}^{VECM} - \phi_{ijk} \right| = O_p(\vartheta_{nT,\delta}), \tag{22}
\]
for any $k \ge 0$, $i = 1, 2, \ldots, n$, and $j = 1, 2, \ldots, q$.

The proof of Proposition 1 follows directly by combining Lemma 3(i) and Lemma 4. As noticed above, this result is not affected by the fact that common factors and their cointegration relations are not identified. All previous remarks on convergence rates apply also in this case.

³ In principle any invertible transformation can be considered in order to achieve such identification. However, traditional macroeconomic practice assumes Gaussianity of the shocks and therefore restricts to orthogonal matrices only, that is to uncorrelated common shocks.
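As one concrete instance of $q(q-1)/2$ restrictions, a recursive (zero-impact) scheme requires the impact responses of $q$ selected variables to the $q$ shocks to form a lower triangular matrix. The sketch below is an illustration under assumptions and not necessarily the authors' implementation: given the $q \times q$ raw impact matrix $B$ (the selected rows of $\widehat\Lambda$ times $\widehat K$, assumed to have full rank), it finds an orthogonal $R$ such that $BR$ is lower triangular by Gram-Schmidt orthonormalisation of the rows of $B$. The function name `recursive_rotation` is hypothetical, and sign normalisations of the shocks are left out.

```python
import math
from typing import List

Matrix = List[List[float]]


def recursive_rotation(impact: Matrix) -> Matrix:
    """Orthogonal R (q x q) such that impact @ R is lower triangular."""
    q = len(impact)
    basis: Matrix = []  # orthonormalised rows of `impact`
    for row in impact:
        v = list(row)
        for u in basis:
            proj = sum(a * b for a, b in zip(row, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            raise ValueError("impact matrix must have full rank")
        basis.append([a / norm for a in v])
    # The j-th column of R is the j-th orthonormal row, i.e. R = Q'.
    return [[basis[j][i] for j in range(q)] for i in range(q)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


if __name__ == "__main__":
    B = [[3.0, 4.0], [1.0, 2.0]]   # hypothetical raw impact responses
    R = recursive_rotation(B)
    print(matmul(B, R))            # entries above the diagonal are numerically zero
```

With $q = 3$, the same construction delivers the three zero impact restrictions $\phi_{12}(0) = \phi_{13}(0) = \phi_{23}(0) = 0$ on the first three selected variables.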

3.4 The case of unrestricted VAR for the common factors

In the presence of non-singular cointegrated vectors, several papers have addressed the issue of whether and when a VECM or an unrestricted VAR for the levels should be used for estimation. Sims et al. (1990) show that the parameters of a cointegrated VAR are consistently estimated using an unrestricted VAR in the levels. On the other hand, Phillips (1998) shows that if the variables are cointegrated, then the long-run features of the IRFs are consistently estimated only if the unit roots are explicitly taken into account, that is within a VECM specification (see also Paruolo, 1997). This result is confirmed numerically in Barigozzi et al. (2016) also for the singular case, $r > q$. Nevertheless, since by estimating an unrestricted VAR it is still possible to estimate consistently short-run IRFs without the need of determining the number of unit roots and therefore without having to estimate the cointegration relations, this approach has become very popular in empirical research.

For this reason, here we also study the properties of IRFs when, following Sims et al. (1990), we consider least squares estimation of an unrestricted VAR($p$) model for the common factors.⁴ For simplicity we fix $p = 1$ and we replace the VECM model (16) with the VAR
\[
F_t = A_1 F_{t-1} + w_t, \qquad w_t = K u_t. \tag{23}
\]
Denote by $\widehat A_1^{VAR}$ the least squares estimator of the coefficient matrix, obtained using $\widehat F_t$, and by $\widehat K$ the estimator of $K$, which is obtained as in the VECM case but this time starting from the sample covariance of the VAR residuals. Consistency of these estimators is given in the following Lemma.

Lemma 5 Under Assumptions 1 through 5, and given $J$ defined in Lemma 3, there exists a $q \times q$ orthogonal matrix $R$, such that, as $n, T \to \infty$,
i. $\|\widehat A_1^{VAR} - J A_1 J\| = O_p(\max(n^{-1/2}, T^{-1/2}))$;
ii. $\|\widehat K - J K R\| = O_p(\max(n^{-1/2}, T^{-1/2}))$.

These results can be straightforwardly extended to a generic VAR($p$) with coefficients $\widehat A_k^{VAR}$ such that
\[
\widehat A^{VAR}(L) = I_r - \sum_{k=1}^{p} \widehat A_k^{VAR} L^k.
\]

⁴ For alternative approaches, not considered here, see for example the fully modified least squares estimation by Phillips (1995).

As before, we can compute an estimator $\widehat R$ of the identifying matrix $R$ by imposing appropriate economic restrictions on the non-identified IRFs. Then, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, q$, the estimated and identified IRF of the $i$-th variable to the $j$-th shock is defined as
\[
\widehat\phi_{ij}^{VAR}(L) = \widehat\lambda_i' \left[ \widehat A^{VAR}(L) \right]^{-1} \widehat K\, \widehat r_j, \tag{24}
\]
where $\widehat\lambda_i'$ is the $i$-th row of $\widehat\Lambda$, $\widehat r_j$ is the $j$-th column of $\widehat R$. After denoting as $\widehat\phi_{ijk}^{VAR}$ the $k$-th coefficient of the polynomial in (24), and as $\phi_{ijk}$ the corresponding coefficients of $\phi_{ij}(L)$, we have the following consistency result.

Proposition 2 (Consistency of Impulse Response Functions based on VAR) Under Assumptions 1 through 5, as $n, T \to \infty$, we have
\[
\left| \widehat\phi_{ijk}^{VAR} - \phi_{ijk} \right| = O_p\left( \max\left( n^{-1/2}, T^{-1/2} \right) \right), \tag{25}
\]
for any finite $k \ge 0$, $i = 1, 2, \ldots, n$, and $j = 1, 2, \ldots, q$.

Two last remarks are in order.

Remark 6 For any finite horizon $k$ the impulse response $\widehat\phi_{ijk}^{VAR}$ is also a consistent estimator of $\phi_{ijk}$. This result is consistent with the result for observed variables by Sims et al. (1990). On the other hand, it is also possible to prove that the same unit roots affect the estimated long-run IRFs in such a way that their least squares estimator is no longer consistent, i.e. $\lim_{k \to \infty} |\widehat\phi_{ijk}^{VAR} - \phi_{ijk}| = O_p(1)$ (see Theorem 2.3 in Phillips, 1998). For this reason, Proposition 2 holds only for finite horizons $k$.

Remark 7 For any finite $k$, the estimator $\widehat\phi_{ijk}^{VAR}$ can converge faster than $\widehat\phi_{ijk}^{VECM}$ to the true value $\phi_{ijk}$. However, as shown in the proof of Lemma 5, the rate of convergence of the parameters associated to the non-stationary components is slower than what it would be were the factors observed, that is we do not have super-consistency. This is due to the factors' estimation error.
Moreover, convergence in Proposition 2 is achieved without the need of Assumption 6. In particular, consistency holds even when all idiosyncratic components are I(1) and without requiring any constraint on the relative rates of divergence of $n$ and $T$.

Summing up, as a consequence of Propositions 1 and 2, the empirical researcher faces a trade-off between (i) estimating correctly the whole IRFs with a slower rate and more restrictive assumptions, as in Proposition 1, or (ii) giving up consistent estimation of the long-run behavior in exchange for a faster rate of convergence, as in Proposition 2.

3.5 The case of deterministic trends

We conclude this section considering the case of deterministic components. Assumptions 1 and 3 imply $E[\Delta F_t] = 0$ and $E[\Delta\xi_t] = 0$. Because of Assumption 5, we also have $E[F_t] = 0$ and $E[\xi_t] = 0$, which imply that no deterministic components are present in the model for $x_t$. However, macroeconomic data often have at least a linear trend, in which case the model for an observed time series, denoted as $y_{it}$, would read
\[
y_{it} = a_i + b_i t + \lambda_i' F_t + \xi_{it}, \tag{26}
\]
where $x_{it} = \lambda_i' F_t + \xi_{it}$ follows the Non-Stationary DFM described by Assumptions 1 through 5. Notice that we also allow for non-zero initial conditions ($a_i \ne 0$), this posing no difficulty in terms of estimation.

The IRFs defined in (18) are then to be considered for the de-trended data, $x_{it}$ (see Section 6 for their economic interpretation in this case), and, therefore, in order to estimate them, we have to first estimate the trend slope, $b_i$ in (26). This can be done either by de-meaning first differences or by least squares regression, the two approaches respectively giving for $i = 1, 2, \ldots, n$,
\[
\widetilde b_i = \frac{1}{T} \sum_{t=1}^{T} \Delta y_{it} = \frac{y_{iT} - y_{i0}}{T}, \qquad
\widehat b_i = \frac{\sum_{t=0}^{T} \left(t - \frac{T}{2}\right)(y_{it} - \bar y_i)}{\sum_{t=0}^{T} \left(t - \frac{T}{2}\right)^2}. \tag{27}
\]

Lemma 6 Under Assumptions 1 and 3, for any $i = 1, 2, \ldots, n$ and as $T \to \infty$, we have $|\widetilde b_i - b_i| = O_p(T^{-1/2})$ and $|\widehat b_i - b_i| = O_p(T^{-1/2})$. If $x_{it} \sim I(0)$ then $|\widehat b_i - b_i| = O_p(T^{-3/2})$.

Given these results and the rates in Propositions 1 and 2, the IRFs can still be estimated consistently, as described above, also when using de-trended data.

However, it has to be noticed that the finite sample properties of $\widehat b_i$ and $\widetilde b_i$ might differ substantially. First, suppose we follow Bai and Ng (2004) and de-mean the first differences.
Then, from principal component analysis on $\Delta\widetilde x_{it} = \Delta y_{it} - \widetilde b_i$, we can estimate the first differences of the factors, which, once integrated, give us the estimated factors, $\widetilde F_t$, such that, due to differencing, $\widetilde F_0 = 0$. Moreover, since the sample mean of $\Delta\widetilde x_{it}$ is zero by construction, then also $\Delta\widetilde F_t$ have zero sample mean and therefore we always have $\widetilde F_0 = \widetilde F_T = 0$.

If instead we use least squares then we can estimate the factors as in (15) starting directly from $\widehat x_{it} = y_{it} - \widehat b_i t$, without integrating $\Delta\widehat F_t$. Since, now, in general, $\Delta\widehat x_{it}$ has sample mean different from zero, then those estimated factors have $\widehat F_0 \ne 0$ and $\widehat F_0 \ne \widehat F_T$.
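The two slope estimators in (27) can be compared on a single series with the following minimal sketch. The function names and the toy series are illustrative assumptions; the input list is taken to hold $y_{i0}, y_{i1}, \ldots, y_{iT}$.

```python
from typing import Sequence


def slope_from_differences(y: Sequence[float]) -> float:
    """Mean of the first differences, i.e. (y_T - y_0) / T."""
    T = len(y) - 1
    return (y[-1] - y[0]) / T


def slope_from_least_squares(y: Sequence[float]) -> float:
    """Least squares slope of y_t on a linear trend, t = 0, ..., T."""
    T = len(y) - 1
    y_bar = sum(y) / len(y)
    num = sum((t - T / 2) * (y_t - y_bar) for t, y_t in enumerate(y))
    den = sum((t - T / 2) ** 2 for t in range(T + 1))
    return num / den


if __name__ == "__main__":
    # On an exact linear trend the two estimators coincide ...
    trend = [2.0 + 0.5 * t for t in range(11)]
    print(slope_from_differences(trend), slope_from_least_squares(trend))
    # ... but they differ when only the end point moves (hypothetical series).
    jump = [0.0, 0.0, 0.0, 0.0, 4.0]
    print(slope_from_differences(jump), slope_from_least_squares(jump))
```

The first estimator depends only on the two end points of the sample, which is one way to see why the finite sample behaviour of the two estimators can differ.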

In this paper, we opt for this second solution, while a complete numerical and empirical comparison of the finite sample properties of the two methods is left for further research.

4 Determining the number of factors and shocks

In the previous section we made the assumption that $r$, $q$, and $d$ are known. Of course this is not the case in practice and we need a method to determine them. Hereafter for simplicity of notation we define $\tau = q - d$, the number of shocks with permanent effects.

In light of the results in Lemma 2, we can determine $r$ by using existing methods based on the behaviour of the eigenvalues of the covariance of the variables $\Delta x_{it}$. A non-exhaustive list of possible approaches includes the contributions by Bai and Ng (2002), Onatski (2009), Alessi et al. (2010) and Ahn and Horenstein (2013).

In order to determine $q$ and $\tau$, we have instead to study the spectral density matrix of $\Delta x_{it}$, $\Delta\chi_{it}$ and $\Delta\xi_{it}$, which are defined by
\[
\Sigma^{\Delta x}(\theta) = \Sigma^{\Delta\chi}(\theta) + \Sigma^{\Delta\xi}(\theta) = \frac{1}{2\pi} \Lambda C(e^{-i\theta}) C'(e^{i\theta}) \Lambda' + \Sigma^{\Delta\xi}(\theta), \qquad \theta \in [-\pi, \pi]. \tag{28}
\]
Lemma 7 provides results for the behaviour of the eigenvalues of these matrices.

Lemma 7 Under Assumptions 1 through 4, for any $n \in \mathbb{N}$, there exist positive reals $\underline M_9$, $\overline M_9$, $M_{10}$, $\underline M_{11}$, $\overline M_{11}$ and an integer $\bar n$ such that
(i) $\underline M_9 \le n^{-1}\mu_j^{\Delta\chi}(\theta) \le \overline M_9$ a.e. in $[-\pi, \pi]$, and for any $j = 1, 2, \ldots, q$ and $n > \bar n$;
(ii) $\sup_{\theta \in [-\pi, \pi]} \mu_1^{\Delta\xi}(\theta) \le M_{10}$;
(iii) $\underline M_{11} \le n^{-1}\mu_j^{\Delta x}(\theta) \le \overline M_{11}$ a.e. in $[-\pi, \pi]$, and for any $j = 1, 2, \ldots, q$ and $n > \bar n$;
(iv) $\sup_{\theta \in [-\pi, \pi]} \mu_{q+1}^{\Delta x}(\theta) \le M_{10}$;
(v) $\underline M_{12} \le n^{-1}\mu_j^{\Delta x}(0) \le \overline M_{12}$, for any $j = 1, 2, \ldots, \tau$ and $n > \bar n$, and $\mu_{\tau+1}^{\Delta x}(0) \le M_{10}$.

Parts (i) to (iv) are already known in the literature, but part (v) is a consequence of cointegration in the common components and it is determined by $C(e^{-i\theta})$ in (28). Indeed, while $\mathrm{rk}(C(e^{-i\theta})) = q$ a.e. in $[-\pi, \pi]$, this is clearly not true when $\theta = 0$, since, because of the existence of $\tau < q$ common trends, we have $\mathrm{rk}(C(1)) = \tau$, which in turn implies $\mathrm{rk}(\Sigma^{\Delta\chi}(0)) = \tau$.
Part (v) of Lemma 7 is then just a consequence of Weyl's inequality.

Therefore, based on parts (iii) and (iv) of Lemma 7, we can employ the information criterion by Hallin and Liška (2007) to determine $q$, by analyzing the behaviour of the eigenvalues of the spectral density matrix $\Sigma^{\Delta x}(\theta)$ over a window of frequencies (see also Onatski, 2010, for a similar approach).⁵ Similarly, we propose an information criterion for determining $\tau$ based on the behaviour of the eigenvalues of the spectral density matrix $\Sigma^{\Delta x}(\theta)$ only at zero-frequency, as suggested by part (v) of Lemma 7.⁶ In particular, consider the lag-window estimator of the spectral density matrix
\[
\widehat\Sigma^{\Delta x}(\theta) = \frac{1}{2\pi} \sum_{k=-B_T}^{B_T} \left[ \frac{1}{T} \sum_{t=1}^{T-k} \Delta x_t \Delta x_{t+k}' \right] e^{-ik\theta}\, w(B_T^{-1} k),
\]
where $B_T$ is a suitable bandwidth and $w(\cdot)$ is a positive even weight function. We define the estimators for $q$ and $\tau$ as
\[
\widehat q = \underset{k = 0, \ldots, q_{\max}}{\operatorname{argmin}} \left[ \log\left( \frac{1}{n(2B_T + 1)} \sum_{h=-B_T}^{B_T} \sum_{j=k+1}^{n} \widehat\mu_j^{\Delta x}(\theta_h) \right) + k\, s(n,T) \right], \tag{29}
\]
\[
\widehat\tau = \underset{k = 0, \ldots, \tau_{\max}}{\operatorname{argmin}} \left[ \log\left( \frac{1}{n} \sum_{j=k+1}^{n} \widehat\mu_j^{\Delta x}(0) \right) + k\, p(n,T) \right], \tag{30}
\]
where $s(n,T)$ and $p(n,T)$ are some suitable penalty functions, $q_{\max}$ and $\tau_{\max}$ are given maximum numbers of common shocks and trends, and $\widehat\mu_j^{\Delta x}(\theta)$ are the eigenvalues of $\widehat\Sigma^{\Delta x}(\theta)$.

Hallin and Liška (2007) show that under suitable asymptotic conditions on $B_T$ and $s(n,T)$, the number of common shocks is consistently selected, as $n, T \to \infty$. Analogously, we have sufficient conditions for consistency in the selection of the number of common trends.

Proposition 3 (Number of common trends) Define $\rho_T = (B_T \log B_T\, T^{-1})^{-1/2}$ and assume that
(i) as $T \to \infty$, $\rho_T \to \infty$ and $\rho_T / T \to 0$;
(ii) as $n, T \to \infty$, $p(n,T) \to 0$ and $(n \rho_T^{-1})\, p(n,T) \to \infty$.
Then, under Assumptions 1 through 4, as $n, T \to \infty$, $|\widehat\tau - \tau| = o_p(1)$.

Finally, notice that by definition we have $\tau = r - c$ which is the number of unit roots driving the dynamics of the common factors. Therefore, by virtue of Proposition 3, once we determine $\tau$, $q$, and $r$, we immediately have an estimate for both the number of transitory shocks $d = q - \tau$ and the cointegration rank $c = r - q + d = r - \tau$.

⁵ Other methods for determining $q$, not considered in this paper, are proposed by Amengual and Watson (2007) and Bai and Ng (2007). Both require knowing $r$ before determining $q$.

⁶ An alternative approach not considered here is represented by the tests for cointegration in panels with a factor structure, as for example those proposed by Bai and Ng (2004) and Gengenbach et al. (2015). On the other hand, applying the classical methods to determine the cointegration rank or the number of common trends might be problematic due to the use of estimated factors as inputs (see e.g. Stock and Watson, 1988, Phillips and Ouliaris, 1988, Johansen, 1991 and Hallin et al., 2016).

5 Simulations

We simulate data from the Non-Stationary DFM with $r = 4$ common factors, $q = 3$ common shocks, and $\tau = 1$ common trend, thus $d = q - \tau = 2$ and the number of cointegration relations among the common factors is $c = r - q + d = 3$. More precisely, for given values of $n$ and $T$, each time series follows the data generating process:
\[
\begin{aligned}
x_{it} &= \lambda_i' F_t + \xi_{it}, \qquad i = 1, 2, \ldots, n, \quad t = 1, 2, \ldots, T,\\
A(L) F_t &= K R u_t, \qquad u_t \overset{w.n.}{\sim} N(0, I_q),
\end{aligned}
\]
where $\lambda_i$ is $r \times 1$ with entries $\lambda_{ij} \sim N(0,1)$, $A(L)$ is $r \times r$ with $\tau = r - c = 1$ unit root, $K$ is $r \times q$, and $R$, which is necessary for identification of the IRFs, is $q \times q$.

In practice, to generate $A(L)$, we exploit a particular Smith-McMillan factorization (see Watson, 1994) according to which $A(L) = U(L) M(L) V(L)$, where $U(L)$ and $V(L)$ are $r \times r$ polynomial matrices with all of their roots outside the unit circle, and $M(L) = \mathrm{diag}((1-L) I_{r-c}, I_c)$. In particular, we set $U(L) = (I_r - U_1 L)$, and $V(L) = I_r$, so that $F_t$ follow a VAR(2) with $r - c$ unit roots, or, equivalently, a VECM(1) with $c$ cointegration relations. The diagonal elements of the matrix $U_1$ are drawn from a uniform distribution on $[0.5, 0.8]$, while the off-diagonal elements from a uniform distribution on $[0, 0.3]$. The matrix $U_1$ is then standardized to ensure that its largest eigenvalue is 0.6. The matrix $K$ is generated as in Bai and Ng (2007): let $\widetilde K$ be an $r \times r$ diagonal matrix of rank $q$ with entries drawn from a uniform distribution on $[0.8, 1.2]$, and let $\check K$ be an $r \times r$ orthogonal matrix, then $K$ is equal to the first $q$ columns of the matrix $\check K \widetilde K^{1/2}$. Finally, the matrix $R$ is calibrated such that the following restrictions hold for all the simulated IRFs: $\phi_{12}(0) = \phi_{13}(0) = \phi_{23}(0) = 0$.

The idiosyncratic components are generated according to the ARMA model (with a possible unit root)
\[
(1 - \rho_i L)\, \xi_{it} = \sum_{k=0}^{\infty} d_i^k \varepsilon_{i,t-k}, \qquad \varepsilon_{it} \sim N(0,1), \qquad E[\varepsilon_{it}\varepsilon_{jt}] = 0.5^{|i-j|},
\]
where $\rho_i = 1$ for $i = 1, 2, \ldots, m$ and $\rho_i = 0$ for $i = m+1, m+2, \ldots, n$, so that $m$ idiosyncratic

components are non-stationary, while the coefficients $d_i$ are drawn from a uniform distribution on $[0, 0.5]$. Each idiosyncratic component $\xi_{it}$ is rescaled so that it accounts for a third of the variance of the corresponding $x_{it}$.

The matrices $\Lambda$, $U_1$, $G$ and $H$ are simulated only once so that the set of IRFs to be estimated is always the same, while the vectors of shocks $u_t$ and $\varepsilon_t = (\varepsilon_{1t} \cdots \varepsilon_{nt})'$, and all the idiosyncratic coefficients $d_i$ are drawn at each replication. Results are based on 1000 MonteCarlo replications and the goal is to study the finite sample properties of the two estimators of the IRFs discussed in the previous section, for different cross-sectional and sample sizes ($n$ and $T$) and for different numbers ($m$) of non-stationary idiosyncratic components.

Tables 1 and 2 show Mean Squared Errors (MSE) for the estimated IRFs simulated with different parameter configurations. Estimation is carried out as explained in Section 3. The loadings' and factors' estimators, $\widehat\Lambda$ and $\widehat F_t$, are always computed as in (15). Then on $\widehat F_t$ we fit either a VECM as in (16) or an unrestricted VAR as in (23) (in both cases a constant is also included in the estimation). The numbers $r$, $q$, and $\tau$ are assumed to be known.

Let $\widehat\phi_{ijk}^{(h)}$ be the $k$-th coefficient of the estimated IRF of the $i$-th variable to the $j$-th shock at the $h$-th replication when using a VECM or a VAR and let $\phi_{ijk}$ be the corresponding coefficient of the true simulated IRF defined in (18), then MSEs are computed with respect to all replications, all variables, and all shocks:
\[
\mathrm{MSE}(k) = \frac{1}{1000\, n q} \sum_{i=1}^{n} \sum_{j=1}^{q} \sum_{h=1}^{1000} \left( \widehat\phi_{ijk}^{(h)} - \phi_{ijk} \right)^2.
\]

From Table 1 we can see that in the VECM case the estimation error decreases monotonically as $n$ and $T$ grow, while it is larger at higher horizons.
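The MSE at horizon $k$ can be computed from stored replications as in the minimal sketch below. The nested-list layout (`estimates[h][i][j][k]`, `truth[i][j][k]`), the function name `irf_mse` and the toy numbers are assumptions made for illustration; the number of replications is inferred from the input rather than fixed at 1000.

```python
from typing import List

IRFArray = List[List[List[float]]]  # indexed as [variable i][shock j][horizon k]


def irf_mse(estimates: List[IRFArray], truth: IRFArray, k: int) -> float:
    """Average squared IRF error at horizon k over replications, variables and shocks."""
    n_rep, n, q = len(estimates), len(truth), len(truth[0])
    total = 0.0
    for est in estimates:                # replications h
        for i in range(n):               # variables
            for j in range(q):           # shocks
                total += (est[i][j][k] - truth[i][j][k]) ** 2
    return total / (n_rep * n * q)


if __name__ == "__main__":
    # Toy input: one variable, one shock, two horizons, two replications.
    truth = [[[1.0, 2.0]]]
    estimates = [[[[1.5, 2.0]]], [[[0.5, 3.0]]]]
    print(irf_mse(estimates, truth, k=0), irf_mse(estimates, truth, k=1))
```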
Notice that, in accordance with Proposition 1, which implies that the estimation error increases with the number of non-stationary idiosyncratic components, for every pair of $n$ and $T$ the MSE decreases for smaller values of $m$.

The picture offered by Table 2 is slightly different from the one offered by Table 1. On the one hand, at short horizons the MSE when considering a VAR is comparable to, or slightly smaller than, the MSE when considering a VECM. This is in accordance with the result of Propositions 1 and 2 according to which the convergence rate for VAR is faster than for VECM. On the other hand, at longer horizons, the MSE for the VAR case is always larger than the MSE for the VECM case. Again this is in accordance with the fact that long run IRFs estimated with an unrestricted VAR in levels are known to be asymptotically biased.

Table 1: MonteCarlo Simulations - Impulse Responses
Mean Squared Errors, VECM Estimation

  T    n    m    k=0    k=1    k=4    k=8    k=12   k=16   k=20
 100  100   25  0.080  0.113  0.249  0.350  0.380  0.387  0.389
 100  100   50  0.078  0.115  0.276  0.425  0.490  0.513  0.521
 100  100   75  0.079  0.125  0.316  0.518  0.624  0.671  0.691
 100  100  100  0.074  0.129  0.344  0.575  0.706  0.765  0.792
 200  200   50  0.037  0.050  0.114  0.166  0.190  0.201  0.207
 200  200  100  0.035  0.053  0.132  0.211  0.267  0.306  0.332
 200  200  150  0.035  0.058  0.152  0.253  0.331  0.389  0.429
 200  200  200  0.034  0.064  0.169  0.269  0.352  0.419  0.469
 300  300   75  0.024  0.033  0.076  0.111  0.130  0.140  0.146
 300  300  150  0.023  0.037  0.093  0.136  0.166  0.189  0.206
 300  300  225  0.022  0.041  0.108  0.159  0.201  0.238  0.270
 300  300  300  0.021  0.044  0.121  0.183  0.238  0.291  0.338

MSE for the estimated IRFs by fitting a VECM on $\widehat F_t$ as in (16). $T$ is the number of observations, $n$ is the number of variables, and $m$ is the number of I(1) idiosyncratic components.

Table 2: MonteCarlo Simulations - Impulse Responses
Mean Squared Errors, Unrestricted VAR Estimation

  T    n    m    k=0    k=1    k=4    k=8    k=12   k=16   k=20
 100  100   25  0.081  0.110  0.267  0.527  0.747  0.904  1.013
 100  100   50  0.076  0.112  0.287  0.552  0.772  0.930  1.043
 100  100   75  0.078  0.123  0.313  0.596  0.822  0.979  1.088
 100  100  100  0.072  0.122  0.333  0.624  0.858  1.018  1.123
 200  200   50  0.038  0.050  0.125  0.250  0.384  0.511  0.625
 200  200  100  0.036  0.053  0.142  0.275  0.415  0.548  0.667
 200  200  150  0.034  0.057  0.157  0.285  0.419  0.549  0.667
 200  200  200  0.033  0.064  0.173  0.308  0.449  0.587  0.710
 300  300   75  0.023  0.032  0.083  0.165  0.257  0.352  0.444
 300  300  150  0.023  0.037  0.102  0.185  0.278  0.377  0.474
 300  300  225  0.022  0.041  0.114  0.195  0.287  0.387  0.486
 300  300  300  0.022  0.046  0.128  0.210  0.300  0.398  0.495

MSE for the estimated IRFs by fitting a VAR on $\widehat F_t$ as in (23). $T$ is the number of observations, $n$ is the number of variables, and $m$ is the number of I(1) idiosyncratic components.

Finally, for the same data generating process considered above, we study the performance of the information criterion (30), proposed in Section 4 for determining $\tau$. Table 3 shows the percentage of times in which we estimate correctly the number of common trends $\tau = 1$. For the sake of comparison, we also report results for the information criterion (29) by Hallin and Liška (2007) for estimating $q = 3$. It has to be noticed that the actual implementation of these criteria requires a procedure of fine tuning of the penalty. Indeed, according to the asymptotic results in Hallin and Liška (2007) and in Proposition 3, for any constant $c > 0$, the functions $c\,s(n,T)$ and $c\,p(n,T)$ are also admissible penalties, and, therefore, a whole range of values of $c$ should be explored. For this reason, numerical studies about the performance of these methods are computationally intensive, thus we limit ourselves to a small scale study and we leave to further research a thorough comparison of the estimator proposed in (30) with other possible methods. Still our results are promising, since our criterion seems to work fairly well by giving the correct answer more than 95% of the times.

Table 3: MonteCarlo Simulations - Number of Common Trends and Shocks
Percentage of Correct Answer

  T    n    m   τ̂ = τ   q̂ = q
 100   50   25   98.6    96.5
 100   50   50   99.2    99.8
 100  100   50   98.7   100
 100  100  100   99.8   100
 100  200  100   96.5   100
 100  200  200   99.9   100
 200   50   25   99.6   100
 200   50   50  100     100
 200  100   50   99.9   100
 200  100  100  100     100
 200  200  100   99.7   100
 200  200  200  100     100

Percentage of cases in which the information criteria (29) and (30) returned the correct number of common shocks ($\widehat q = q$) and of common trends ($\widehat\tau = \tau$). $T$ is the number of observations, $n$ is the number of variables, and $m$ is the number of I(1) idiosyncratic components.

6 Empirical application

In this Section we estimate the Non-Stationary DFM to study the effects of monetary policy shocks and of supply shocks. We consider a large macroeconomic dataset comprising 101 quarterly series from 1960Q3 to 2012Q4 describing the US economy, where the complete list of variables and transformations is reported in Appendix B. All variables that are I(1) are not transformed, while we take first differences of those that are I(2). We then remove

deterministic component as described at the end of Section 3, therefore the IRFs presented in this section have to be interpreted as out of trend deviations.

The model is estimated as explained in Section 3. We find evidence of $r = 7$ common factors as suggested both by the criteria in Alessi et al. (2010) and in Bai and Ng (2002), and of $q = 3$ common shocks as given by the criterion in Hallin and Liška (2007). Finally, using the information criterion described in Section 4, we find evidence of just one common stochastic trend, $\tau = 1$, thus $d = 2$ shocks have no long-run effect but the cointegration rank for the common factors is $c = 6$, due to singularity of the common factors ($r > q$).

We then consider two different identification schemes. First, we study the effects of a monetary policy shock, which is identified by using a standard recursive identification scheme, according to which GDP and CPI do not react contemporaneously to the monetary policy shock (see e.g. Forni and Gambetti, 2010). Second, we study the effects of a supply shock, which is identified as the only shock having a permanent effect on the system (see e.g. King et al., 1991; Forni et al., 2009). Results of these two exercises are presented in Figures 1 and 2, respectively. In both figures the black lines are the IRFs obtained by fitting a VECM on $\widehat F_t$ and the grey lines are the IRFs obtained by fitting an unrestricted VAR on $\widehat F_t$. The dotted black lines and the grey shaded areas are the respective 68% bootstrap confidence bands.

Figure 1 shows the IRFs to a monetary policy shock normalized so that at impact it raises the Federal Funds rate by 50 basis points. GDP and Residential Investments respond negatively to a contractionary monetary policy shock, and then they revert to the baseline. Similarly, consumer prices, which are modeled as I(2), stabilize, meaning that inflation reverts to zero.
These IRFs, and in particular their long-run behaviour, are consistent with economic theory according to which a monetary policy shock has only a transitory effect on the economy. On the contrary, the IRFs estimated with a stationary DFM, i.e. with data in first differences, display implausible permanent effects of monetary policy shocks on all variables (not shown here). Notice also that there is no significant difference between estimates obtained using a VECM or an unrestricted VAR for the factors. Finally, the IRFs in Figure 1 are very similar, both in terms of shape and in terms of size, to those obtained with Large Bayesian VARs estimated in levels (see e.g. Giannone et al., 2015).

Figure 2 shows the IRFs to a supply shock normalized so that at impact it increases GDP by 0.25%. All variables have a hump shaped response, with a maximum between six and seven quarters after the shock. The deviation from the trend estimated by fitting a VECM is 0.23% after ten years, and 0.12% after twenty years and onwards.

Figure 1: Impulse Response Functions to a Monetary Policy Shock Gross Domestic Product Consumer Price Index 0 0 −0.1 −0.4 −0.2 −0.8 −0.3 −1.2 −0.4 −1.6 −0.5 −2 −0.6 −2.4 −0.7 −2.8 −0.8 −3.2 −0.9 −3.6 −1 −4 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20 Years After the shock Years After the shock Federal Funds Rate Residential Investments 0.6 0 0.5 −0.5 0.4 −1 0.3 −1.5 0.2 −2 −2.5 0.1 −3 0 −3.5 −0.1 −4 −0.2 2 4 6 8 10 12 14 16 18 20 2 4 6 8 10 12 14 16 18 20 Years After the shock Years After the shock SolidblacklinesaretheIRFsobtainedfromtheNon-StationaryDFMbyestimatingaVECMonF(cid:98)twith68%bootstrap confidencebands(dashed). SolidgreylinesaretheIRFsobtainedfromtheNon-StationaryDFMbyestimatingaVAR on F(cid:98)t with 68% confidence bands (shaded areas). The monetary policy shock is normalized so that at impact it increasestheFederalFundsrateof50basispoints. Differently from the results in Figure 1, while the IRFs obtained using a VECM or an unrestricted VAR show no difference in the short-run, at very long horizons significant differences appear. Notably, the IRFs estimated by fitting an unrestricted VAR tend to diverge. This result is consistent with lack of consistency of long-run IRFs obtained without imposing the presence of unit roots (see Proposition 2). Indeed, when, as in this case, we fit an unrestricted VAR on F(cid:98) and we impose long-run identifying restrictions, we are actually t imposing constraints on a matrix which is not consistently estimated. This unavoidably compromises the estimated structural responses. Differently from the case of a monetary policy shock, economic theory does not tell us neither what should be the long-run effect of a supply shock, besides being permanent, nor what should be the shape of the induced dynamic response. Hence, we cannot say a priori whether the effect found is realistic or not. While with our approach, we find that a supply 29

shock induces on GDP a permanent deviation of about 0.12% from its historical trend, with a stationary DFM we find a deviation of about 0.67% (not shown here). Finally, notice that similar IRFs are found also in Dedola and Neri (2007) and Smets and Wouters (2007), when employing other estimation techniques.

To summarize, the empirical analysis of this section shows that the proposed Non-Stationary DFM is able to reproduce the main features of the dynamic effects of both temporary and permanent shocks postulated by macroeconomic theory.

7 Conclusions

In this paper, we propose a Non-Stationary Dynamic Factor Model (DFM) for large datasets. The natural use of this class of models in a macroeconomic context motivates the main assumptions upon which the present theory is built. This paper is complementary to another one where we address representation theory (Barigozzi et al., 2016).

Estimation of impulse response functions (IRFs) is obtained with a two-step estimator based on approximate principal components, and on a VECM (or an unrestricted VAR model) for the latent I(1) common factors. This estimator is consistent when both the cross-sectional dimension n and the sample size T of the dataset grow to infinity. Furthermore, we also propose an information criterion to determine the number of common trends in a large dimensional setting. A numerical study and an empirical study show the validity and usefulness of our approach.

The results of this paper are useful beyond estimation of IRFs in Non-Stationary DFMs. First, our estimation approach could also be used for estimating and validating Dynamic Stochastic General Equilibrium models in a data-rich environment (see Boivin and Giannoni, 2006, for the stationary case). Second, with such a goal in mind and given the state-space form of our model, we could think of Quasi Maximum Likelihood estimation (see Doz et al., 2012, for the stationary case), thus allowing us to impose economically relevant restrictions on the model parameters. Third, our asymptotic results could be straightforwardly extended to estimation of IRFs in a non-stationary Factor Augmented VAR setting (see Bai and Ng, 2006, for the stationary case) and form the theoretical foundation of the existing empirical studies based on non-stationary factor models (see e.g. Eickmeier, 2009; Banerjee et al., 2017). Last, our approach could be generalized to build an unrestricted Non-Stationary DFM, similar to the one proposed by Forni et al. (2015, 2017) for stationary data.
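The second step of the two-step estimator recalled above, in its unrestricted-VAR variant, can be sketched as follows. This is only a minimal illustration under stated assumptions and not the code used in the paper: the function names `fit_var_levels` and `irf_from_var`, the lag order `p = 1`, and the simulated random walks standing in for the estimated factors are all hypothetical, and both shock identification and the VECM alternative are left out.

```python
import numpy as np

def fit_var_levels(F, p):
    """OLS fit of F_t = A_1 F_{t-1} + ... + A_p F_{t-p} + e_t, no intercept.

    F has shape (T, r); returns a list [A_1, ..., A_p] of (r, r) arrays."""
    T, r = F.shape
    Y = F[p:]
    # Regressor block k holds F_{t-k} for t = p, ..., T-1.
    X = np.hstack([F[p - k:T - k] for k in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return [B[k * r:(k + 1) * r].T for k in range(p)]

def irf_from_var(A, horizon):
    """Reduced-form responses Psi_0, ..., Psi_horizon of the levels,
    from the recursion Psi_h = sum_{k=1}^{min(p,h)} A_k Psi_{h-k}."""
    r, p = A[0].shape[0], len(A)
    Psi = [np.eye(r)]
    for h in range(1, horizon + 1):
        Psi.append(sum(A[k] @ Psi[h - 1 - k] for k in range(min(p, h))))
    return np.stack(Psi)

# Hypothetical stand-in for the estimated factors: two independent random walks.
rng = np.random.default_rng(0)
F_sim = np.cumsum(rng.standard_normal((300, 2)), axis=0)
A_sim = fit_var_levels(F_sim, p=1)
Psi_sim = irf_from_var(A_sim, horizon=40)
print(Psi_sim[[0, 10, 40]])  # responses at horizons 0, 10 and 40
```

In a fit of this kind the unit roots are estimated rather than imposed, so small estimation errors in the autoregressive matrices compound at long horizons; this is the mechanism behind the long-run differences between the VAR and VECM responses discussed in the empirical section.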

Figure 2: Impulse Response Functions to a Supply Shock. Panels: Gross Domestic Product; Consumption: Services; Consumption: Nondurable Goods; Consumption: Durable Goods; Residential Investment; Nonresidential Investment. Horizontal axis: years after the shock. Solid black lines are the IRFs obtained from the Non-Stationary DFM by estimating a VECM on $\widehat{F}_t$ with 68% bootstrap confidence bands (dashed). Solid grey lines are the IRFs obtained from the Non-Stationary DFM by estimating a VAR on $\widehat{F}_t$ with 68% confidence bands (shaded areas). The supply shock is normalized so that at impact it increases GDP by 0.25%.

References

Ahn, S. C. and A. R. Horenstein (2013). Eigenvalue ratio test for the number of factors. Econometrica 81, 1203–1227.

Alessi, L., M. Barigozzi, and M. Capasso (2010). Improved penalization for determining the number of factors in approximate static factor models. Statistics and Probability Letters 80, 1806–1813.
Altissimo, F., R. Cristadoro, M. Forni, M. Lippi, and G. Veronese (2010). New eurocoin: Tracking economic growth in real time. The Review of Economics and Statistics 92, 1024–1034.
Amengual, D. and M. W. Watson (2007). Consistent estimation of the number of dynamic factors in a large N and T panel. Journal of Business and Economic Statistics 25, 91–96.
Anderson, B. D. and M. Deistler (2008a). Generalized linear dynamic factor models–A structure theory. IEEE Conference on Decision and Control.
Anderson, B. D. and M. Deistler (2008b). Properties of zero-free transfer function matrices. SICE Journal of Control, Measurement and System Integration 1, 284–292.
Bai, J. (2004). Estimating cross-section common stochastic trends in nonstationary panel data. Journal of Econometrics 122, 137–183.
Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.
Bai, J. and S. Ng (2004). A PANIC attack on unit roots and cointegration. Econometrica 72, 1127–1177.
Bai, J. and S. Ng (2006). Confidence intervals for diffusion index forecasts and inference for factor augmented regressions. Econometrica 74, 1133–1150.
Bai, J. and S. Ng (2007). Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60.
Bai, J. and S. Ng (2008). Forecasting economic time series using targeted predictors. Journal of Econometrics 146(2), 304–317.
Banerjee, A., M. Marcellino, and I. Masten (2017). Structural FECM: Cointegration in large-scale structural FAVAR models. Journal of Applied Econometrics. https://doi.org/10.1002/jae.2570.
Barigozzi, M., A. M. Conti, and M. Luciani (2014). Do euro area countries respond asymmetrically to the common monetary policy? Oxford Bulletin of Economics and Statistics 76, 693–714.
Barigozzi, M., M. Lippi, and M. Luciani (2016). Dynamic factor models, cointegration, and error correction mechanisms. http://arxiv.org/abs/1510.02399.
Boivin, J. and M. Giannoni (2006). DSGE models in a data-rich environment. Technical Report Working Paper No. 12772, National Bureau of Economic Research.
Boivin, J. and S. Ng (2006). Are more data always better for factor analysis? Journal of Econometrics 127, 169–194.

Cristadoro, R., M. Forni, L. Reichlin, and G. Veronese (2005). A core inflation indicator for the euro area. Journal of Money Credit and Banking 37(3), 539–560.
Davis, C. and W. M. Kahan (1970). The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis 7, 1–46.
Dedola, L. and S. Neri (2007). What does a technology shock do? A VAR analysis with model-based sign restrictions. Journal of Monetary Economics 54, 512–549.
Doz, C., D. Giannone, and L. Reichlin (2012). A quasi maximum likelihood approach for large approximate dynamic factor models. The Review of Economics and Statistics 94(4), 1014–1024.
Eickmeier, S. (2009). Comovements and heterogeneity in the euro area analyzed in a non-stationary dynamic factor model. Journal of Applied Econometrics 24, 933–959.
Engle, R. F. and C. W. J. Granger (1987). Cointegration and error correction: Representation, estimation, and testing. Econometrica 55, 251–76.
Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 603–680.
Forni, M. and L. Gambetti (2010). The dynamic effects of monetary policy: A structural factor model approach. Journal of Monetary Economics 57, 203–216.
Forni, M., D. Giannone, M. Lippi, and L. Reichlin (2009). Opening the black box: Structural factor models versus structural VARs. Econometric Theory 25, 1319–1347.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000). The Generalized Dynamic Factor Model: Identification and estimation. The Review of Economics and Statistics 82, 540–554.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2005). The Generalized Dynamic Factor Model: One sided estimation and forecasting. Journal of the American Statistical Association 100, 830–840.
Forni, M., M. Hallin, M. Lippi, and P. Zaffaroni (2015). Dynamic factor models with infinite-dimensional factor spaces: One-sided representations. Journal of Econometrics 185, 359–371.
Forni, M., M. Hallin, M. Lippi, and P. Zaffaroni (2017). Dynamic factor models with infinite dimensional factor space: Asymptotic analysis. Journal of Econometrics 199, 74–92.
Forni, M. and M. Lippi (2001). The Generalized Dynamic Factor Model: Representation theory. Econometric Theory 17, 1113–1141.
Gengenbach, C., J.-P. Urbain, and J. Westerlund (2015). Error correction testing in panels with common stochastic trends. Journal of Applied Econometrics 31, 982–1004.
Giannone, D., M. Lenza, and G. E. Primiceri (2015). Prior selection for vector autoregressions. The Review of Economics and Statistics 97, 436–451.
Giannone, D., L. Reichlin, and L. Sala (2005). Monetary policy in real time. In M. Gertler and K. Rogoff (Eds.), NBER Macroeconomics Annual 2004. MIT Press.
Giannone, D., L. Reichlin, and D. Small (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics 55, 665–676.
Gonzalo, J. (1994). Five alternative methods of estimating long-run equilibrium relationships. Journal of Econometrics 60, 203–233.
Hallin, M. and R. Liška (2007). Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association 102, 603–617.
Hallin, M., R. Van den Akker, and B. J. Werker (2016). Semiparametric error-correction models for cointegration with trends: pseudo-Gaussian and optimal rank-based tests of the cointegration rank. Journal of Econometrics 190, 46–61.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, New Jersey: Princeton University Press.
Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551–80.
Johansen, S. (1995). Likelihood-based inference in cointegrated vector autoregressive models (First ed.). Oxford: Oxford University Press.
King, R., C. Plosser, J. H. Stock, and M. W. Watson (1991). Stochastic trends and economic fluctuations. The American Economic Review 81, 819–840.
Luciani, M. (2014). Forecasting with approximate dynamic factor models: The role of non-pervasive shocks. International Journal of Forecasting 30, 20–29.
Luciani, M. (2015). Monetary policy and the housing market: A structural factor analysis. Journal of Applied Econometrics 30, 199–218.
Lütkepohl, H. (2006). Structural vector autoregressive analysis for cointegrated variables. In O. Hübler and J. Frohn (Eds.), Modern Econometric Analysis. Surveys on Recent Developments. Springer Berlin Heidelberg.
Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica 77, 1447–1479.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics 92, 1004–1016.
Paruolo, P. (1997). Asymptotic inference on the moving average impact matrix in cointegrated I(1) VAR systems. Econometric Theory 13, 79–118.
Peña, D. and P. Poncela (2006). Nonstationary dynamic factor analysis. Journal of Statistical Planning and Inference 136, 1237–1257.
Phillips, P. C. (1991). Optimal inference in cointegrated systems. Econometrica 59, 238–306.
Phillips, P. C. (1995). Fully modified least squares and vector autoregression. Econometrica 63, 1023–1078.
Phillips, P. C. (1998). Impulse response and forecast error variance asymptotics in nonstationary VARs. Journal of Econometrics 83, 21–56.
Phillips, P. C. and S. N. Durlauf (1986). Multiple time series regression with integrated processes. The Review of Economic Studies 53, 473–495.
Phillips, P. C. and B. E. Hansen (1990). Statistical inference in instrumental variables regressions with I(1) processes. The Review of Economic Studies 57, 99–125.
Phillips, P. C. and S. Ouliaris (1988). Testing for cointegration using principal components methods. Journal of Economic Dynamics and Control 12, 205–230.
Phillips, P. C. and V. Solo (1992). Asymptotics for linear processes. The Annals of Statistics, 971–1001.
Sims, C., J. H. Stock, and M. W. Watson (1990). Inference in linear time series models with some unit roots. Econometrica 58, 113–144.
Smets, F. and R. Wouters (2007). Shocks and frictions in US business cycles: A Bayesian DSGE approach. The American Economic Review 97, 586–606.
Stock, J. H. (1987). Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica 55, 1035–1056.
Stock, J. H. and M. W. Watson (1988). Testing for common trends. Journal of the American Statistical Association 83, 1097–1107.
Stock, J. H. and M. W. Watson (1993). A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica 61, 783–820.
Stock, J. H. and M. W. Watson (2002a). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.
Stock, J. H. and M. W. Watson (2002b). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–162.
Stock, J. H. and M. W. Watson (2005). Implications of dynamic factor models for VAR analysis. Working Paper 11467, NBER.
Velu, R. P., G. C. Reinsel, and D. W. Wichern (1986). Reduced rank models for multiple time series. Biometrika 73, 105–118.
Watson, M. W. (1994). Vector autoregressions and cointegration. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume IV. Elsevier Science.
Yu, Y., T. Wang, and R. J. Samworth (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102, 315–323.

A Technical appendix

Preliminary definitions and notation

Norms. For any $m\times p$ matrix $B$ with generic element $b_{ij}$, we denote its spectral norm as $\|B\| = \sqrt{\mu_1^{B'B}}$, where $\mu_1^{B'B}$ is the largest eigenvalue of $B'B$, the Frobenius norm as $\|B\|_F = \sqrt{\mathrm{tr}(B'B)} = \sqrt{\sum_i\sum_j b_{ij}^2}$, and the column and row norm as $\|B\|_1 = \max_j \sum_i |b_{ij}|$ and $\|B\|_\infty = \max_i \sum_j |b_{ij}|$, respectively. We use the following properties.

1. Submultiplicativity of the norm, for an $m\times p$ matrix $A$ and a $p\times s$ matrix $B$:
$$\|AB\| \le \|A\|\,\|B\|. \tag{A1}$$

2. Norm inequalities, for an $n\times n$ symmetric matrix $A$:
$$\mu_1^{A} = \|A\| \le \sqrt{\|A\|_1\,\|A\|_\infty} = \|A\|_1, \qquad \|A\| \le \|A\|_F. \tag{A2}$$

3. Weyl's inequality, for two $n\times n$ symmetric matrices $A$ and $B$, with eigenvalues $\mu_j^A$ and $\mu_j^B$:
$$|\mu_j^A - \mu_j^B| \le \|A-B\|, \qquad j = 1,\ldots,n. \tag{A3}$$

Factors' dynamics. It is convenient to write the equations governing the dynamics of the factors, (8), as
$$\Delta F_{jt} = c_j'(L)u_t = \sum_{l=1}^{q} c_{jl}(L)u_{lt}, \qquad j = 1,\ldots,r, \tag{A4}$$
where $c_j(L)$ is a $q\times 1$ infinite rational polynomial matrix with entries $c_{jl}(L)$. Due to rationality, there exists a positive real $K_1$ such that
$$\sup_{\substack{j=1,\ldots,r\\ l=1,\ldots,q}} \sum_{k=0}^{\infty} c_{jlk}^2 \le K_1. \tag{A5}$$
From Assumption 5 we also have $F_{jt} = \sum_{s=1}^{t} c_j'(L)u_s$.

Idiosyncratic dynamics. Likewise, for the idiosyncratic components it is convenient to write (10) as
$$\Delta\xi_{it} = \check{d}_i(L)\varepsilon_{it}, \qquad i = 1,\ldots,n,$$
where $\check{d}_i(L)$ are infinite polynomials defined as $\check{d}_i(L) = (1-L)(1-\rho_i L)^{-1}d_i(L)$, with $d_i(L)$ also infinite polynomials. Because of Assumption 3(b) there exists a positive real $K_2$ such that
$$\sup_{i=1,\ldots,n} \sum_{k=0}^{\infty} \check{d}_{ik}^2 \le K_2. \tag{A6}$$

Moreover, with reference to Assumption 6(a) we have $\rho_i = 1$ if $i\in I_1$ and $|\rho_i| < 1$ if $i\in I_1^c$. Hence, by Assumption 5, we have also $\xi_{it} = \sum_{s=1}^{t}\check{d}_i(L)\varepsilon_{is}$, which is non-stationary if and only if $i\in I_1$.

Rates. We define $\zeta_{nT,\delta} = \max\left(T^{1/2}n^{-(2-\delta)/2},\, n^{-(1-\delta)/2}\right)$, with $\delta\ge 0$, and $\vartheta_{nT,\delta} = \max\left(\zeta_{nT,\delta},\, T^{-1/2}\right)$. Under Assumptions 6(a) and 6(b), we have $\zeta_{nT,\delta}\to 0$ and $\vartheta_{nT,\delta}\to 0$, as $n,T\to\infty$.

A.1 Proof of Lemma 1

First notice that, from Assumption 4(e), we have
$$\frac{1}{n}\sum_{i,j=1}^{n} |E[\varepsilon_{it}\varepsilon_{jt}]| \le \max_{i=1,\ldots,n}\sum_{j=1}^{n} |E[\varepsilon_{it}\varepsilon_{jt}]| \le M_4.$$
Define $\Gamma_0^{\varepsilon} = E[\varepsilon_t\varepsilon_t']$, then Assumption 4(e) reads $\|\Gamma_0^{\varepsilon}\|_1 \le M_4$, thus, from (A2), we have $\mu_1^{\varepsilon} = \|\Gamma_0^{\varepsilon}\| \le \|\Gamma_0^{\varepsilon}\|_1 \le M_4$. By setting $M_5 = M_4$, we complete the proof. □

A.2 Proof of Lemma 2

Define $\Gamma_0^{\Delta F} = E[\Delta F_t\Delta F_t']$. Then, we can write $\Gamma_0^{\Delta F} = W^{\Delta F}M^{\Delta F}W^{\Delta F\prime}$, where $W^{\Delta F}$ is the $r\times r$ matrix of normalized eigenvectors and $M^{\Delta F}$ the corresponding diagonal matrix of eigenvalues. Now, define a new $n\times r$ loadings matrix $L = \Lambda W^{\Delta F}(M^{\Delta F})^{1/2}$. Under (14), this matrix satisfies Assumption 2(a) since
$$\frac{L'L}{n} = M^{\Delta F}, \tag{A7}$$
and by Assumption 1(b) and square summability of the coefficients given in (A5), all eigenvalues of $\Gamma_0^{\Delta F}$ are positive and finite, i.e. there exist positive reals $\underline{M}_6$ and $\overline{M}_6$ such that
$$\underline{M}_6 \le \mu_j^{\Delta F} \le \overline{M}_6, \qquad j = 1,\ldots,r. \tag{A8}$$
Then, the covariance matrix of the first differences of the common component is given by
$$\frac{\Gamma_0^{\Delta\chi}}{n} = \frac{\Lambda W^{\Delta F}M^{\Delta F}W^{\Delta F\prime}\Lambda'}{n} = \frac{LL'}{n}.$$
Therefore, the non-zero eigenvalues of $\Gamma_0^{\Delta\chi}$ are the same as those of $L'L$, and from (A7), we have for any $n$, $n^{-1}\mu_j^{\Delta\chi} = \mu_j^{\Delta F}$, for any $j = 1,\ldots,r$. Part (i) then follows from (A8).
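The algebraic step behind part (i), namely that the non-zero eigenvalues of $LL'/n$ coincide with those of $L'L/n = M^{\Delta F}$, can be checked numerically. The sketch below is only an illustration under assumed inputs: the dimensions `n` and `r`, the random loadings normalized so that $\Lambda'\Lambda/n = I_r$, and the positive definite matrix `Gamma_dF` are hypothetical choices, not quantities taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 3

# Hypothetical loadings with Lambda' Lambda / n = I_r:
# sqrt(n) times a matrix with orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
Lam = np.sqrt(n) * Q

# A hypothetical positive definite covariance for the differenced factors.
B = rng.standard_normal((r, r))
Gamma_dF = B @ B.T + np.eye(r)

# Implied covariance of the differenced common component.
Gamma_dchi = Lam @ Gamma_dF @ Lam.T

# Eigenvalues sorted in decreasing order.
mu_dF = np.sort(np.linalg.eigvalsh(Gamma_dF))[::-1]
mu_dchi = np.sort(np.linalg.eigvalsh(Gamma_dchi))[::-1]

print(mu_dchi[:r] / n)            # r largest eigenvalues, scaled by 1/n
print(mu_dF)                      # eigenvalues of Gamma_dF
print(np.abs(mu_dchi[r:]).max())  # remaining n - r eigenvalues, numerically zero
```

The first two printed vectors agree up to rounding, which is the statement $n^{-1}\mu_j^{\Delta\chi} = \mu_j^{\Delta F}$ in this toy configuration.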
As for part (ii), we have
$$\mu_1^{\Delta\xi} = \left\|\Gamma_0^{\Delta\xi}\right\| \le \sum_{k=0}^{\infty}\left\|\check{D}_k\right\|^2\left\|\Gamma_0^{\varepsilon}\right\| \le K_2M_4 = M_7, \tag{A9}$$
because of square summability of the coefficients, with $K_2$ defined in (A6), and from Lemma 1.

Finally, parts (iii) and (iv) are immediate consequences of Assumption 3(d) of uncorrelated common and idiosyncratic shocks, which implies that $\Gamma_0^{\Delta x} = \Gamma_0^{\Delta\chi}+\Gamma_0^{\Delta\xi}$, and of Weyl's inequality

(A3). So, because of parts (i) and (ii), there exist positive reals $\underline{M}_8$ and $\overline{M}_8$, such that, for $j = 1,\ldots,r$, and for any $n\in\mathbb{N}$,
$$\frac{\mu_j^{\Delta x}}{n} \le \frac{\mu_j^{\Delta\chi}}{n} + \frac{\mu_1^{\Delta\xi}}{n} \le \overline{M}_6 + \frac{\mu_1^{\Delta\xi}}{n} \le \overline{M}_6 + \frac{M_7}{n} = \overline{M}_8,$$
$$\frac{\mu_j^{\Delta x}}{n} \ge \frac{\mu_j^{\Delta\chi}}{n} + \frac{\mu_n^{\Delta\xi}}{n} \ge \underline{M}_6 + \frac{\mu_n^{\Delta\xi}}{n} = \underline{M}_8.$$
This proves part (iii). When $j = r+1$, using parts (i) and (ii), and since $\mathrm{rk}(\Gamma_0^{\Delta\chi}) = r$, we have $\mu_{r+1}^{\Delta x} \le \mu_{r+1}^{\Delta\chi}+\mu_1^{\Delta\xi} = \mu_1^{\Delta\xi} \le M_7$, thus proving part (iv). This completes the proof. □

A.3 Proof of Lemma 3

Intermediate results

Lemma A1 Define the covariance matrix $\Gamma_0^{\Delta x} = E[\Delta x_t\Delta x_t']$ with generic $(i,j)$-th element $\gamma_{ij}^{\Delta x} = E[\Delta x_{it}\Delta x_{jt}]$. Then, under Assumptions 1 through 5, as $T\to\infty$, $\left|T^{-1}\sum_{t=1}^{T}\Delta x_{it}\Delta x_{jt} - \gamma_{ij}^{\Delta x}\right| = O_p(T^{-1/2})$, for any $i,j = 1,\ldots,n$.

Proof. First notice that $\gamma_{ij}^{\Delta x} = \lambda_i'\Gamma_0^{\Delta F}\lambda_j + \gamma_{ij}^{\Delta\xi}$, where $\lambda_i'$ is the $i$-th row of $\Lambda$, $\Gamma_0^{\Delta F} = E[\Delta F_t\Delta F_t']$, and $\gamma_{ij}^{\Delta\xi} = E[\Delta\xi_{it}\Delta\xi_{jt}]$. Then, we also have
$$E\left[\frac{1}{T}\sum_{t=1}^{T}\Delta F_t\Delta F_t'\right] = \frac{1}{T}\sum_{t=1}^{T}E\left[\left(\sum_{k=0}^{\infty}C_ku_{t-k}\right)\left(\sum_{k'=0}^{\infty}C_{k'}u_{t-k'}\right)'\right] = \sum_{k=0}^{\infty}C_kC_k' = \Gamma_0^{\Delta F}, \tag{A10}$$
where we used Assumption 1(a), which implies that $u_t$ is a white noise. Moreover, $\mathrm{rk}(\Gamma_0^{\Delta F}) = r$ because of Assumption 1(b), and $\|\Gamma_0^{\Delta F}\| = O(1)$ because of square summability of the coefficients given in (A5). Hence, $\Gamma_0^{\Delta F}$ is well defined. For the idiosyncratic component we trivially have $E[T^{-1}\sum_{t=1}^{T}\Delta\xi_{it}\Delta\xi_{jt}] = \gamma_{ij}^{\Delta\xi}$, therefore by Assumption 3(d) of uncorrelated common and idiosyncratic shocks, $E[T^{-1}\sum_{t=1}^{T}\Delta x_{it}\Delta x_{jt}] = \gamma_{ij}^{\Delta x}$.

Now, denote as $\gamma_{ij}^{\Delta F}$ the generic $(i,j)$-th element of $\Gamma_0^{\Delta F}$. Then, from (A2) and (A4),
$$
\begin{aligned}
E\left[\left\|\frac{1}{T}\sum_{t=1}^{T}\Delta F_t\Delta F_t' - \Gamma_0^{\Delta F}\right\|^2\right]
&\le \sum_{i,j=1}^{r}\frac{1}{T^2}E\left[\sum_{t,s=1}^{T}\left(\Delta F_{it}\Delta F_{jt}-\gamma_{ij}^{\Delta F}\right)\left(\Delta F_{is}\Delta F_{js}-\gamma_{ij}^{\Delta F}\right)\right] \\
&= \sum_{i,j=1}^{r}\frac{1}{T^2}\sum_{t,s=1}^{T}\left(E\left[\Delta F_{it}\Delta F_{jt}\Delta F_{is}\Delta F_{js}\right]-(\gamma_{ij}^{\Delta F})^2\right) \\
&\le \frac{r^2K_1^4q^4}{T^2}\sum_{t,s=1}^{T}E[u_{lt}u_{l't}u_{hs}u_{h's}] - \frac{r^2K_1^4q^4}{T^2}\sum_{t,s=1}^{T}\left(E[u_{lt}u_{l't}]\right)^2 \\
&= \frac{r^2K_1^4q^4}{T^2}\sum_{t,s=1}^{T}E[u_{lt}^2]E[u_{hs}^2] + \frac{r^2K_1^4q^4}{T^2}\sum_{t=1}^{T}E[u_{lt}^2u_{ht}^2] + \frac{r^2K_1^4q^4}{T^2}\sum_{t=1}^{T}E[u_{lt}^4] - \frac{r^2K_1^4q^4}{T^2}\sum_{t,s=1}^{T}\left(E[u_{lt}^2]\right)^2 \\
&= \frac{r^2K_1^4q^4}{T^2}\sum_{t=1}^{T}E[u_{lt}^2]E[u_{ht}^2] = \frac{r^2K_1^4q^4}{T} = O\left(\frac{1}{T}\right),
\end{aligned}
\tag{A11}
$$
where we used Assumption 4(a) of independence of $u_t$ and Assumption 4(b) of existence of fourth moments, plus square summability of the coefficients, with $K_1$ defined in (A5). Therefore, from (A11), we have
$$\left\|\frac{1}{T}\sum_{t=1}^{T}\Delta F_t\Delta F_t' - \Gamma_0^{\Delta F}\right\| = O_p\left(\frac{1}{\sqrt{T}}\right). \tag{A12}$$
In the same way, for the idiosyncratic component we have
$$E\left[\left\|\frac{1}{T}\sum_{t=1}^{T}\Delta\xi_{it}\Delta\xi_{jt} - \gamma_{ij}^{\Delta\xi}\right\|^2\right] \le \frac{1}{T^2}\sum_{t,s=1}^{T}\left(E\left[\Delta\xi_{it}\Delta\xi_{jt}\Delta\xi_{is}\Delta\xi_{js}\right]-(\gamma_{ij}^{\Delta\xi})^2\right) \le \frac{K_2^4}{T^2}\sum_{t=1}^{T}E[\varepsilon_{it}^2\varepsilon_{jt}^2] \le \frac{K_2^4M_3}{T} = O\left(\frac{1}{T}\right), \tag{A13}$$
where we used Assumption 4(c) of independence of $\varepsilon_t$ and Assumption 4(d) of existence of fourth moments and square summability of the coefficients, with $K_2$ defined in (A6). Therefore, from (A13), we have
$$\left\|\frac{1}{T}\sum_{t=1}^{T}\Delta\xi_{it}\Delta\xi_{jt} - \gamma_{ij}^{\Delta\xi}\right\| = O_p\left(\frac{1}{\sqrt{T}}\right). \tag{A14}$$
By combining (A12) and (A14) and Assumption 2(b) of bounded loadings we complete the proof. □

Lemma A2 For any given $t$, under Assumptions 1 through 6 and as $n,T\to\infty$,
(i) $\|\Delta F_t\| = O_p(1)$;
(ii) $\|T^{-1/2}F_t\| = O_p(1)$;
(iii) $\|n^{-1/2}\Delta\xi_t\| = O_p(1)$;
(iv) $\|(nT)^{-1/2}\xi_t\| = O_p(1)$;
(v) $\|n^{-1/2}\Lambda'\Delta\xi_t\| = O_p(1)$;

(vi) $\|(nT)^{-1/2}\Lambda'\xi_t\| = O_p(1)$;
(vii) $\|n^{-1/2}\xi_t\| = O_p(T^{1/2}n^{-(1-\delta)/2})$;
(viii) $\|n^{-1/2}\Lambda'\xi_t\| = O_p(T^{1/2}n^{-(1-\delta)/2})$.

Proof. For part (i), just notice that, since by Assumption 1(a) $\Delta F_{jt}\sim I(0)$ for any $j = 1,\ldots,r$, then they have finite variance. This proves part (i) by Chebychev's inequality.

For part (ii), we have
$$
\begin{aligned}
E\left[\left\|\frac{F_t}{\sqrt{T}}\right\|^2\right] &= \frac{1}{T}\sum_{j=1}^{r}E\left[F_{jt}^2\right] = \frac{1}{T}\sum_{j=1}^{r}E\left[\left(\sum_{s=1}^{t}\sum_{l=1}^{q}c_{jl}(L)u_{ls}\right)^2\right] \\
&= \frac{1}{T}\sum_{j=1}^{r}\sum_{s,s'=1}^{t}\sum_{l,l'=1}^{q}\sum_{k,k'=0}^{\infty}c_{jlk}c_{jl'k'}E[u_{l,s-k}u_{l',s'-k'}] \le \frac{rqK_1t}{T} \le rqK_1,
\end{aligned}
\tag{A15}
$$
since $t\le T$ and where we used the fact that $u_t$ is a white noise because of Assumption 1(a) and we used square summability of the coefficients, with $K_1$ defined in (A5). This proves part (ii).

For part (iii), for any $n\in\mathbb{N}$, we have
$$
\begin{aligned}
E\left[\left\|\frac{\Delta\xi_t}{\sqrt{n}}\right\|^2\right] &= \frac{1}{n}\sum_{i=1}^{n}E\left[\Delta\xi_{it}^2\right] = \frac{1}{n}\sum_{i=1}^{n}E\left[(\check{d}_i(L)\varepsilon_{it})^2\right] \\
&= \frac{1}{n}\sum_{i=1}^{n}\sum_{k,k'=0}^{\infty}\check{d}_{ik}\check{d}_{ik'}E[\varepsilon_{i,t-k}\varepsilon_{i,t-k'}] \le K_2\max_i E[\varepsilon_{it}^2],
\end{aligned}
\tag{A16}
$$
where we used Assumption 3(a) of serially uncorrelated $\varepsilon_t$ and square summability of the coefficients, with $K_2$ defined in (A6). Also because of the existence of fourth moments in Assumption 4(d) the variance of $\varepsilon_{it}$ is finite for any $i$. This proves part (iii).

Similarly, for part (iv), for any $n\in\mathbb{N}$, we have
$$
\begin{aligned}
E\left[\left\|\frac{\xi_t}{\sqrt{nT}}\right\|^2\right] &= \frac{1}{nT}\sum_{i=1}^{n}E\left[\xi_{it}^2\right] = \frac{1}{nT}\sum_{i=1}^{n}E\left[\left(\sum_{s=1}^{t}\check{d}_i(L)\varepsilon_{is}\right)^2\right] \\
&= \frac{1}{nT}\sum_{i=1}^{n}\sum_{s,s'=1}^{t}\sum_{k,k'=0}^{\infty}\check{d}_{ik}\check{d}_{ik'}E[\varepsilon_{i,s-k}\varepsilon_{i,s'-k'}] \le \frac{K_2t}{T}\max_i E[\varepsilon_{it}^2] \le K_2\max_i E[\varepsilon_{it}^2],
\end{aligned}
\tag{A17}
$$
since $t\le T$ and where we used the same assumptions as in (A16). This proves part (iv).

As for part (v), for any $n\in\mathbb{N}$, we have
$$
\begin{aligned}
E\left[\left\|\frac{\Lambda'\Delta\xi_t}{\sqrt{n}}\right\|^2\right] &= \frac{1}{n}\sum_{j=1}^{r}E\left[\left(\sum_{i=1}^{n}\lambda_{ij}\Delta\xi_{it}\right)^2\right] = \frac{1}{n}\sum_{j=1}^{r}\sum_{i,l=1}^{n}E\left[\lambda_{ij}\Delta\xi_{it}\lambda_{lj}\Delta\xi_{lt}\right] \\
&\le \frac{rC^2}{n}\sum_{i,l=1}^{n}\sum_{k,k'=0}^{\infty}\check{d}_{ik}\check{d}_{lk'}E[\varepsilon_{i,t-k}\varepsilon_{l,t-k'}] \le \frac{rC^2K_2}{n}\sum_{i,l=1}^{n}\left|E[\varepsilon_{it}\varepsilon_{lt}]\right| \le rC^2K_2M_4,
\end{aligned}
\tag{A18}
$$
where we used the same assumptions as in (A16), Assumption 2(b) of bounded loadings, and Lemma 1. This proves part (v).

Similarly for part (vi), for any $n\in\mathbb{N}$, we have
$$
\begin{aligned}
E\left[\left\|\frac{\Lambda'\xi_t}{\sqrt{nT}}\right\|^2\right] &= \frac{1}{nT}\sum_{j=1}^{r}E\left[\left(\sum_{i=1}^{n}\lambda_{ij}\xi_{it}\right)^2\right] = \frac{1}{nT}\sum_{j=1}^{r}\sum_{i,l=1}^{n}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right] \\
&\le \frac{rC^2}{nT}\sum_{i,l=1}^{n}\sum_{s,s'=1}^{t}\sum_{k,k'=0}^{\infty}\check{d}_{ik}\check{d}_{lk'}E[\varepsilon_{i,s-k}\varepsilon_{l,s'-k'}] \le \frac{rC^2K_2t}{nT}\sum_{i,l=1}^{n}\left|E[\varepsilon_{it}\varepsilon_{lt}]\right| \le rC^2K_2M_4,
\end{aligned}
\tag{A19}
$$
where we used the same assumptions as in (A18). This proves part (vi).

Now consider part (vii). Using Assumption 6(a), for any $n\in\mathbb{N}$, we can write
$$E\left[\left\|\frac{\xi_t}{\sqrt{n}}\right\|^2\right] = \frac{1}{n}\sum_{i\in I_1}E\left[\xi_{it}^2\right] + \frac{1}{n}\sum_{i\in I_1^c}E\left[\xi_{it}^2\right]. \tag{A20}$$
The second term on the rhs is bounded for any $n\in\mathbb{N}$ because it is a sum of stationary components and we can use the same reasoning as for part (iii). For the first term on the rhs, using Assumption 6(a) and part (iv), we have (multiply and divide by $m$)
$$\frac{1}{n}\sum_{i\in I_1}E\left[\xi_{it}^2\right] \le \frac{K_2Tm}{n}\max_i E[\varepsilon_{it}^2] = O\left(\frac{T}{n^{1-\delta}}\right), \tag{A21}$$
which proves part (vii).

Finally, for part (viii), using the same reasoning as for part (vii), we can write
$$
\begin{aligned}
E\left[\left\|\frac{\Lambda'\xi_t}{\sqrt{n}}\right\|^2\right] &= \frac{1}{n}\sum_{j=1}^{r}\sum_{i,l=1}^{n}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right] \\
&= \frac{1}{n}\sum_{j=1}^{r}\sum_{i,l\in I_1}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right] + \frac{1}{n}\sum_{j=1}^{r}\sum_{i,l\in I_1^c}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right] + \frac{2}{n}\sum_{j=1}^{r}\sum_{i\in I_1}\sum_{l\in I_1^c}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right].
\end{aligned}
\tag{A22}
$$
The second term on the rhs is bounded because it is a sum of products of stationary components as in (A18) and therefore it behaves as part (v). For the first term on the rhs, using Assumption 6(a) and part (iv), we have (multiply and divide by $m$)
$$\frac{1}{n}\sum_{j=1}^{r}\sum_{i,l\in I_1}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right] \le \frac{rC^2K_2T}{n}\sum_{i,l\in I_1}\left|E[\varepsilon_{it}\varepsilon_{lt}]\right| \le \frac{rC^2K_2M_4Tm}{n} = O\left(\frac{T}{n^{1-\delta}}\right). \tag{A23}$$
Similarly, the third term on the rhs of (A22) is bounded as follows
$$\frac{1}{n}\sum_{j=1}^{r}\sum_{i\in I_1}\sum_{l\in I_1^c}E\left[\lambda_{ij}\xi_{it}\lambda_{lj}\xi_{lt}\right] \le \frac{rC^2K_2T}{n}\sum_{i\in I_1}\sum_{l\in I_1^c}\left|E[\varepsilon_{it}\varepsilon_{lt}]\right| \le \frac{rC^2K_2M_9Tn^{\gamma}}{n} = O\left(\frac{T}{n^{1-\gamma}}\right). \tag{A24}$$
We prove part (viii) by substituting (A23) and (A24) into (A22), and by noticing that (A23) converges to zero slower than (A24) because $\gamma < \delta$ by Assumption 6(c). This completes the proof. □
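The bounds in (A15) and (A17) rest on the fact that the second moment of an I(1) process grows linearly in $t$, so that scaling by $\sqrt{T}$ keeps it bounded. A small Monte Carlo sketch for the simplest possible case makes this concrete. Everything in it is an assumption chosen for illustration: a scalar random walk with unit-variance Gaussian shocks (that is, $r = q = 1$ and $c(L) = 1$), the sample sizes, the number of replications, and the helper name `mean_square_scaled`.

```python
import numpy as np

def mean_square_scaled(T, n_rep=4000, seed=0):
    """Monte Carlo estimate of E[(F_T / sqrt(T))^2] for a scalar random walk
    F_T = u_1 + ... + u_T driven by i.i.d. standard normal shocks."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_rep, T))  # one row of shocks per replication
    F_T = u.sum(axis=1)                  # terminal value of each random walk
    return np.mean(F_T ** 2) / T

# The scaled second moment should stay roughly constant as T grows.
for T in (50, 200, 800):
    print(T, round(mean_square_scaled(T), 3))
```

In this special case the population value of the scaled second moment is the shock variance, so the printed estimates should hover around one for every $T$, in line with the $O_p(1)$ statement of part (ii).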

Proof of Lemma 3

The sample covariance of $\Delta x_t$ is given by $\widehat{\Gamma}_0^{\Delta x} = T^{-1}\sum_{t=1}^{T}\Delta x_t\Delta x_t'$ and from Assumption 3(d) of uncorrelated common and idiosyncratic components, we have $\Gamma_0^{\Delta x} = \Gamma_0^{\Delta\chi}+\Gamma_0^{\Delta\xi}$. Moreover, from Lemma A1, we have
$$\left\|\frac{\widehat{\Gamma}_0^{\Delta x}}{n} - \frac{\Gamma_0^{\Delta x}}{n}\right\| = O_p\left(\frac{1}{\sqrt{T}}\right). \tag{A25}$$
From (A25), Lemma 2(ii) and Assumption 3(d) we also have
$$
\begin{aligned}
\left\|\frac{\widehat{\Gamma}_0^{\Delta x}}{n} - \frac{\Gamma_0^{\Delta\chi}}{n}\right\| &\le \left\|\frac{\widehat{\Gamma}_0^{\Delta x}}{n} - \frac{\Gamma_0^{\Delta x}}{n}\right\| + \left\|\frac{\Gamma_0^{\Delta x}}{n} - \frac{\Gamma_0^{\Delta\chi}}{n}\right\| = \left\|\frac{\widehat{\Gamma}_0^{\Delta x}}{n} - \frac{\Gamma_0^{\Delta x}}{n}\right\| + \left\|\frac{\Gamma_0^{\Delta\xi}}{n}\right\| \\
&= O_p\left(\frac{1}{\sqrt{T}}\right) + \frac{\mu_1^{\Delta\xi}}{n} \le O_p\left(\frac{1}{\sqrt{T}}\right) + \frac{M_7}{n} = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right).
\end{aligned}
\tag{A26}
$$
Now, define the $n\times r$ matrices $W^{\Delta\chi}$ and $\widehat{W}^{\Delta x}$, having as columns the normalized eigenvectors corresponding to the $j$-th largest eigenvalues of $\Gamma_0^{\Delta\chi}$ and $\widehat{\Gamma}_0^{\Delta x}$, respectively. From Theorem 2 in Yu et al. (2015), which is a consequence of the "sin θ" Theorem in Davis and Kahan (1970), we have
$$\left\|\widehat{W}^{\Delta x} - W^{\Delta\chi}J\right\| \le \frac{2^{3/2}\sqrt{r}\,\left\|\widehat{\Gamma}_0^{\Delta x} - \Gamma_0^{\Delta\chi}\right\|}{\min\left(\mu_0^{\Delta\chi}-\mu_1^{\Delta\chi},\ \mu_r^{\Delta\chi}-\mu_{r+1}^{\Delta\chi}\right)}, \tag{A27}$$
where $J$ is a diagonal $r\times r$ matrix with entries $\pm 1$ and we define $\mu_0^{\Delta\chi} = \infty$ for any $n\in\mathbb{N}$. Since $\mu_{r+1}^{\Delta\chi} = 0$ then, from Lemma 2(i) and (A26), we have
$$\left\|\widehat{W}^{\Delta x} - W^{\Delta\chi}J\right\| \le \frac{2^{3/2}\sqrt{r}\,\left\|\widehat{\Gamma}_0^{\Delta x} - \Gamma_0^{\Delta\chi}\right\|}{n\underline{M}_6} = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right). \tag{A28}$$
Moreover, given the identification constraint (13), we identify the first difference of the factors (up to a sign) as the $r$ (non-normalized) principal components of the common component vector:
$$\Delta F_t = \frac{1}{\sqrt{n}}W^{\Delta\chi\prime}\Delta\chi_t = \frac{1}{\sqrt{n}}W^{\Delta\chi\prime}\Lambda\Delta F_t. \tag{A29}$$
Since the eigenvectors are normalized, (A29) implies $\Lambda = n^{1/2}W^{\Delta\chi}$, such that (14) is satisfied for any $n\in\mathbb{N}$, while the loadings estimator is defined as $\widehat{\Lambda} = n^{1/2}\widehat{W}^{\Delta x}$, and therefore $n^{-1}\widehat{\Lambda}'\widehat{\Lambda} = I_r$. By substituting these expressions for $\Lambda$ and $\widehat{\Lambda}$ in (A28), we have
$$\left\|\widehat{W}^{\Delta x} - W^{\Delta\chi}J\right\| = \left\|\frac{\widehat{\Lambda}-\Lambda J}{\sqrt{n}}\right\| = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right), \tag{A30}$$
and also
$$\left\|\frac{\widehat{\Lambda}'\Lambda}{n} - J\right\| = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right). \tag{A31}$$
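The loadings estimator $\widehat{\Lambda} = n^{1/2}\widehat{W}^{\Delta x}$ and the sign indeterminacy captured by $J$ can be illustrated on simulated data. The following is a sketch under assumptions that are not part of the paper: random-walk factors with distinct innovation variances, I(0) Gaussian idiosyncratic components, true loadings built so that $\Lambda'\Lambda/n = I_r$, and a hypothetical helper named `estimate_loadings`.

```python
import numpy as np

def estimate_loadings(x, r):
    """Principal components on first differences.

    x has shape (T, n). Returns Lambda_hat = sqrt(n) * (r leading eigenvectors
    of the sample covariance of Delta x_t) and F_hat_t = Lambda_hat' x_t / n."""
    dx = np.diff(x, axis=0)
    T1, n = dx.shape
    Gamma_hat = dx.T @ dx / T1                  # sample covariance, no demeaning
    eigval, eigvec = np.linalg.eigh(Gamma_hat)  # eigenvalues in ascending order
    W_hat = eigvec[:, ::-1][:, :r]              # eigenvectors of the r largest
    Lam_hat = np.sqrt(n) * W_hat
    F_hat = x @ Lam_hat / n
    return Lam_hat, F_hat

# Hypothetical data-generating process for the illustration.
rng = np.random.default_rng(1)
T, n, r = 400, 100, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
Lam = np.sqrt(n) * Q                                      # Lambda' Lambda / n = I_r
dF = rng.standard_normal((T, r)) * np.array([2.0, 1.0])   # distinct variances
F = np.cumsum(dF, axis=0)                                 # I(1) factors
xi = rng.standard_normal((T, n))                          # I(0) idiosyncratic part
x = F @ Lam.T + xi

Lam_hat, F_hat = estimate_loadings(x, r)
C = Lam_hat.T @ Lam / n
J = np.diag(np.sign(np.diag(C)))                          # diagonal matrix of +/- 1

print(np.linalg.norm(Lam_hat.T @ Lam_hat / n - np.eye(r), 2))  # normalization
print(np.linalg.norm(C - J, 2))                                # analogue of (A31)
```

The first printed number is zero up to rounding by construction, while the second is the finite-sample counterpart of the left-hand side of (A31) in this toy design and should be small for the chosen $n$ and $T$.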

In order to prove part (i), we need some other intermediate results. We denote as $\epsilon_i$ an $n$-dimensional vector with 1 as $i$-th entry and all other entries equal to zero. Then,
$$
\begin{aligned}
\left\|\frac{\epsilon_i'}{\sqrt{n}}\left(\widehat{\Gamma}_0^{\Delta x}-\Gamma_0^{\Delta\chi}\right)\right\| &\le \left\|\frac{\epsilon_i'}{\sqrt{n}}\left(\widehat{\Gamma}_0^{\Delta x}-\Gamma_0^{\Delta x}\right)\right\| + \left\|\frac{\epsilon_i'\Gamma_0^{\Delta\xi}}{\sqrt{n}}\right\| \le \left\|\frac{\epsilon_i'}{\sqrt{n}}\left(\widehat{\Gamma}_0^{\Delta x}-\Gamma_0^{\Delta x}\right)\right\|_F + \left\|\frac{\epsilon_i'\Gamma_0^{\Delta\xi}}{\sqrt{n}}\right\| \\
&\le \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\widehat{\gamma}_{ij}^{\Delta x}-\gamma_{ij}^{\Delta x}\right)^2} + \frac{\mu_1^{\Delta\xi}}{\sqrt{n}} \le O_p\left(\frac{1}{\sqrt{T}}\right) + \frac{M_7}{\sqrt{n}} = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{\sqrt{n}}\right)\right),
\end{aligned}
\tag{A32}
$$
where we used Lemmas A1 and 2(ii). Similarly, we can show that
$$\left\|\frac{\epsilon_i'\Gamma_0^{\Delta\chi}}{\sqrt{n}}\right\| = O(1). \tag{A33}$$
For the eigenvalues $\mu_j^{\Delta\chi}$ of $\Gamma_0^{\Delta\chi}$ and $\widehat{\mu}_j^{\Delta x}$ of $\widehat{\Gamma}_0^{\Delta x}$, and using Weyl's inequality (A3), we have
$$\left|\frac{\widehat{\mu}_j^{\Delta x}}{n} - \frac{\mu_j^{\Delta\chi}}{n}\right| \le \left\|\frac{\widehat{\Gamma}_0^{\Delta x}}{n} - \frac{\Gamma_0^{\Delta\chi}}{n}\right\| = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right), \qquad j = 1,\ldots,r. \tag{A34}$$
From Lemma 2(i) and (A34), we also have
$$\frac{\mu_r^{\Delta\chi}}{n} \ge \underline{M}_6 > 0, \qquad \frac{\widehat{\mu}_r^{\Delta x}}{n} \ge \underline{M}_6 + O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right). \tag{A35}$$
Define as $M^{\Delta\chi}$ and $\widehat{M}^{\Delta x}$ the diagonal $r\times r$ matrices with diagonal elements $\mu_j^{\Delta\chi}$ and $\widehat{\mu}_j^{\Delta x}$, respectively. Therefore, from (A35), the matrix $n^{-1}M^{\Delta\chi}$ is invertible, the inverse of $n^{-1}\widehat{M}^{\Delta x}$ exists with probability tending to one as $n,T\to\infty$, and (see also Lemma 2 in Forni et al., 2009)
$$\left\|\left(\frac{M^{\Delta\chi}}{n}\right)^{-1}\right\| = \frac{n}{\mu_r^{\Delta\chi}} = O(1). \tag{A36}$$
Moreover, from (A34) and (A35), we have
$$
\begin{aligned}
\left\|\left(\frac{\widehat{M}^{\Delta x}}{n}\right)^{-1} - \left(\frac{M^{\Delta\chi}}{n}\right)^{-1}\right\| &\le \left\|\left(\frac{\widehat{M}^{\Delta x}}{n}\right)^{-1} - \left(\frac{M^{\Delta\chi}}{n}\right)^{-1}\right\|_F = \sqrt{\sum_{j=1}^{r}\left(\frac{n}{\widehat{\mu}_j^{\Delta x}} - \frac{n}{\mu_j^{\Delta\chi}}\right)^2} \\
&\le \sum_{j=1}^{r}\frac{n\left|\widehat{\mu}_j^{\Delta x}-\mu_j^{\Delta\chi}\right|}{\left|\widehat{\mu}_j^{\Delta x}\mu_j^{\Delta\chi}\right|} \le \frac{r\max_{1\le j\le r}\left|\widehat{\mu}_j^{\Delta x}-\mu_j^{\Delta\chi}\right|}{n\underline{M}_6^2 + O_p\left(\max\left(\frac{n}{\sqrt{T}},1\right)\right)} = O_p\left(\max\left(\frac{1}{\sqrt{T}},\frac{1}{n}\right)\right).
\end{aligned}
\tag{A37}
$$
Finally, notice that the columns of $W^{\Delta\chi}J$ are also normalised eigenvectors of $\Gamma_0^{\Delta\chi}$, that is $\Gamma_0^{\Delta\chi}W^{\Delta\chi}J = W^{\Delta\chi}JM^{\Delta\chi}$. Therefore, using (A28), (A32), (A33), (A36), and (A37), for a given $i$

we have (cid:13) (cid:13) √ n(cid:15)(cid:48)W(cid:99) ∆x− √ n(cid:15)(cid:48)W∆χJ (cid:13) (cid:13) = (cid:13) (cid:13) (cid:13)√ (cid:15)(cid:48) i (cid:20) Γ(cid:98) ∆xW(cid:99) ∆x (cid:18) M(cid:99) ∆x(cid:19)−1 −Γ∆χW∆χJ (cid:18) M∆χ(cid:19)−1(cid:21)(cid:13) (cid:13) (cid:13) i i (cid:13) n 0 n 0 n (cid:13) ≤ (cid:13) (cid:13) (cid:13)√ (cid:15)(cid:48) i (cid:0) Γ(cid:98) ∆x−Γ∆χ(cid:1) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:18) M∆χ(cid:19)−1(cid:13) (cid:13) (cid:13)+ (cid:13) (cid:13) (cid:13) (cid:15)(cid:48) i√ Γ∆ 0 χ(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:18) M(cid:99) ∆x(cid:19)−1 − (cid:18) M∆χ(cid:19)−1(cid:13) (cid:13) (cid:13) (cid:13) n 0 0 (cid:13) (cid:13) n (cid:13) (cid:13) n (cid:13) (cid:13) n n (cid:13) + (cid:13) (cid:13)W(cid:99) ∆x−W∆χJ (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:15)(cid:48) i√ Γ∆ 0 χ(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:18) M∆χ(cid:19)−1(cid:13) (cid:13) (cid:13)+o p (cid:18) max (cid:18) √ 1 , √ 1 (cid:19)(cid:19) (cid:13) n (cid:13) (cid:13) n (cid:13) T n (cid:18) (cid:18) (cid:19)(cid:19) 1 1 = O max √ , √ . p T n √ √ By noticing that λ(cid:48) = n(cid:15)(cid:48)W∆χ and λ(cid:98) (cid:48) = n(cid:15)(cid:48)W(cid:99) ∆x, we complete the proof of part (i). i i i i Given the loadings estimator Λ(cid:98), the factors are estimated as F(cid:98)t = n−1Λ(cid:98) (cid:48)x t and therefore ∆F(cid:98)t = n−1Λ(cid:98) (cid:48)∆x t . 
Then, for a given $t$,
\[
\begin{aligned}
\big\|\Delta\widehat F_t-J\Delta F_t\big\|
&= \Big\|\frac{\widehat\Lambda'\Delta x_t}{n}-J\Delta F_t\Big\|
\le \Big\|\frac{\widehat\Lambda'\Lambda\Delta F_t}{n}-J\Delta F_t\Big\| + \Big\|\frac{\widehat\Lambda'\Delta\xi_t}{n}\Big\| \\
&\le \Big\|\frac{\widehat\Lambda'\Lambda}{n}-J\Big\|\,\|\Delta F_t\| + \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|\,\Big\|\frac{\Delta\xi_t}{\sqrt n}\Big\| + \Big\|\frac{\Lambda'\Delta\xi_t}{n}\Big\|\,\|J\|
= O_p\Big(\max\Big(\frac{1}{\sqrt T},\frac{1}{\sqrt n}\Big)\Big),
\end{aligned}
\]
where we used (A30), (A31), and Lemma A2(i), A2(iii) and A2(v). Obviously $\|J\| = 1$. This proves part (ii). Similarly, for part (iii), for a given $t$ we have
\[
\frac{1}{\sqrt T}\big\|\widehat F_t-JF_t\big\|
\le \Big\|\frac{\widehat\Lambda'\Lambda}{n}-J\Big\|\,\Big\|\frac{F_t}{\sqrt T}\Big\| + \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt{nT}}\Big\| + \Big\|\frac{\Lambda'\xi_t}{n\sqrt T}\Big\|\,\|J\|
= O_p\Big(\max\Big(\frac{1}{\sqrt T},\frac{1}{\sqrt n}\Big)\Big),
\]
where we used (A30), (A31), and Lemma A2(ii), A2(iv) and A2(vi). This completes the proof. □

A.4 Proof of Lemma 4

Intermediate results

Lemma A3 Under Assumptions 1 through 6, as $n,T\to\infty$,
\[
\Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\| = O_p\Big(\frac{1}{\sqrt T}\Big) \quad\text{and}\quad \Big\|\frac{\widehat\Lambda'\Lambda}{n}-J\Big\| = O_p\Big(\frac{1}{\sqrt T}\Big).
\]
Proof. Under Assumption 6(b), we have $n > T^{1/(2-\delta)}$ with $\delta\ge 0$, the lower bound for $n$ being $n > T^{1/2}$, and, therefore, (A30) and (A31) are both $O_p(T^{-1/2})$. This completes the proof. □
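For illustration, the estimators used in the proofs above, $\widehat\Lambda = \sqrt n\,\widehat W^{\Delta x}$ and $\widehat F_t = n^{-1}\widehat\Lambda'x_t$, can be sketched in Python with NumPy. This is a minimal sketch and not the authors' code: the function name, the array layout (rows are time periods), and the use of an uncentred sample covariance for $\widehat\Gamma_0^{\Delta x}$ are our assumptions.

```python
import numpy as np

def estimate_loadings_and_factors(x, r):
    """Hypothetical helper. x: (T+1, n) array of levels x_t; r: number of factors."""
    n = x.shape[1]
    dx = np.diff(x, axis=0)                   # first differences, shape (T, n)
    gamma0 = dx.T @ dx / dx.shape[0]          # sample covariance of dx (uncentred: an assumption)
    _, eigvec = np.linalg.eigh(gamma0)        # eigh returns eigenvalues in ascending order
    w_hat = eigvec[:, ::-1][:, :r]            # r leading normalised eigenvectors
    lam_hat = np.sqrt(n) * w_hat              # Lambda_hat = sqrt(n) * W_hat
    f_hat = x @ lam_hat / n                   # row t is F_hat_t' = x_t' Lambda_hat / n
    return lam_hat, f_hat
```

By construction $\widehat\Lambda'\widehat\Lambda/n = I_r$, and differencing the estimated factors gives $n^{-1}\widehat\Lambda'\Delta x_t$, as stated in the text. As usual for principal components, the output is identified only up to the sign matrix $J$.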

Lemma A4 Under Assumptions 1 and 5

(i) The factors admit the common trends decomposition
\[
F_t = C(1)\sum_{s=1}^t u_s + \check C(L)u_t = \psi\eta'\sum_{s=1}^t u_s + \check C(L)u_t,
\]
where $\psi$ is $r\times(q-d)$, $\eta$ is $q\times(q-d)$, and $\check C(L)$ is an $r\times q$ infinite rational polynomial matrix also with square summable coefficients. The $r\times c$ cointegration matrix $\beta$ is such that $\beta'C(1) = 0_{c\times q}$.

(ii) For a given $t$, as $n,T\to\infty$, $\|\beta'F_t\| = O_p(1)$.

Proof. The proof follows Lemma 2.1 in Phillips and Solo (1992). Using the Beveridge-Nelson decomposition of $C(L)$ in (8), we can write
\[
\Delta F_t = C(1)u_t + \check C(L)(u_t-u_{t-1}),
\]
where $\check C(L) = \sum_{k=0}^\infty\check C_kL^k$ with $\check C_k = -\sum_{h=k+1}^\infty C_h$. Then,
\[
F_t = C(1)\sum_{s=1}^t u_s + \omega_t, \tag{A38}
\]
where $\omega_t = \check C(L)(u_t-u_0) = \check C(L)u_t$, since $u_t = 0$ when $t\le 0$ by Assumption 5, and $\omega_t\sim I(0)$, because of square summability of the coefficients of $\check C(L)$. Moreover, from Assumption 1(a) of cointegration, we have $C(1) = \psi\eta'$, where $\psi$ is $r\times(r-c)$ and $\eta$ is $q\times(r-c)$. Since $\beta$ is a cointegrating vector for $F_t$, we have $\beta = \psi_\perp$ and therefore $\beta'C(1) = 0_{c\times q}$. This proves part (i).

For part (ii), from (A38), we have
\[
\beta'F_t = \beta'\omega_t = \beta'\check C(L)u_t.
\]
Define $C^*(L) = \beta'\check C(L)$ and notice that it has square summable coefficients because of square summability of the coefficients of $C(L)$, then
\[
\begin{aligned}
E\big[\|\beta'F_t\|^2\big] &= \sum_{j=1}^r E\big[(c_j^{*\prime}(L)u_t)^2\big] = \sum_{j=1}^r E\Big[\Big(\sum_{l=1}^q c^*_{jl}(L)u_{lt}\Big)^2\Big] \\
&= \sum_{j=1}^r\sum_{l,l'=1}^q\sum_{k,k'=0}^\infty c^*_{jlk}\,c^*_{jl'k'}\,E[u_{l,t-k}u_{l',t-k'}] \le rqK_1,
\end{aligned}
\tag{A39}
\]
where we used the fact that $u_t$ is a white noise because of Assumption 1(a) and we used square summability of the coefficients, with $K_1$ defined in (A5). Part (ii) is proved by means of Chebychev's inequality. This completes the proof. □
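The Beveridge-Nelson step behind (A38) can be checked numerically for a finite-order $C(L)$. The sketch below is only an illustration under our own assumptions: the lag order `K`, the dimensions, the randomly drawn coefficients, and the helper names `bn_coefficients` and `ma_filter` are hypothetical, and no cointegration restriction is imposed on $C(1)$.

```python
import numpy as np

def bn_coefficients(C):
    """C: (K+1, r, q) array holding C_0..C_K. Returns C(1) and C_check_0..C_check_{K-1}."""
    C1 = C.sum(axis=0)
    tail = np.cumsum(C[::-1], axis=0)[::-1]   # tail[k] = C_k + ... + C_K
    C_check = -tail[1:]                       # C_check_k = -(C_{k+1} + ... + C_K)
    return C1, C_check

def ma_filter(coefs, u):
    """Compute sum_k coefs[k] u_{t-k}, with u_t = 0 for t <= 0. u: (T, q), rows u_1..u_T."""
    T = u.shape[0]
    out = np.zeros((T, coefs.shape[1]))
    for k in range(min(coefs.shape[0], T)):
        out[k:] += u[:T - k] @ coefs[k].T
    return out

rng = np.random.default_rng(0)
r, q, K, T = 3, 2, 4, 200                     # singular case q < r
C = rng.standard_normal((K + 1, r, q)) * 0.5 ** np.arange(K + 1)[:, None, None]
u = rng.standard_normal((T, q))

F = np.cumsum(ma_filter(C, u), axis=0)        # F_t = sum of Delta F_s, with F_0 = 0
C1, C_check = bn_coefficients(C)
trend = np.cumsum(u, axis=0) @ C1.T           # C(1) * sum_{s<=t} u_s
omega = ma_filter(C_check, u)                 # stationary part C_check(L) u_t
```

Here `F` and `trend + omega` coincide up to floating-point error, which is the content of (A38) when $u_t = 0$ for $t\le 0$.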
Lemma A5 Define the autocovariance matrices $\Gamma_k^{\Delta F} = E[\Delta F_t\Delta F_{t-k}']$, with $k\in\mathbb Z$, and the long-run autocovariance matrices $\Gamma_{L0}^{\Delta F} = \Gamma_0^{\Delta F}+2\sum_{h=1}^\infty\Gamma_h^{\Delta F}$ and $\Gamma_{L1}^{\Delta F} = \sum_{h=1}^\infty\Gamma_h^{\Delta F}$. Denote as $W_r(\cdot)$ an $r$-dimensional Brownian motion with finite covariance of rank $q-d$ and analogously define $W_q(\cdot)$ having finite covariance of rank $q$. Define the autocovariances of $\omega_t$ in (A38) as $\Gamma_h^\omega$ and long-run covariance $\Gamma_{L0}^\omega = \Gamma_0^\omega+2\sum_{h=1}^\infty\Gamma_h^\omega$. Under Assumptions 1, 4 and 5, as $T\to\infty$,

(i) $\big\|T^{-1}\sum_{t=k+1}^T\Delta F_t\Delta F_{t-k}'-\Gamma_k^{\Delta F}\big\| = O_p(T^{-1/2})$;

(ii) $T^{-2}\sum_{t=1}^TF_tF_t' \stackrel{d}{\to} (\Gamma_{L0}^{\Delta F})^{1/2}\big(\int_0^1W_r(\tau)W_r'(\tau)\,d\tau\big)(\Gamma_{L0}^{\Delta F})^{1/2}$;

(iii) $T^{-1}\sum_{t=1}^TF_{t-1}\Delta F_t' \stackrel{d}{\to} (\Gamma_{L0}^{\Delta F})^{1/2}\big(\int_0^1W_r(\tau)\,dW_r'(\tau)\big)(\Gamma_{L0}^{\Delta F})^{1/2}+\Gamma_{L1}^{\Delta F}$;

(iv) $T^{-1}\sum_{t=1}^TF_tF_t'\beta \stackrel{d}{\to} C(1)\big(\int_0^1W_q(\tau)\,dW_r'(\tau)\big)(\Gamma_{L0}^\omega)^{1/2}\beta+\Gamma_0^\omega\beta$;

(v) $\big\|T^{-1}\sum_{t=1}^T\beta'F_tF_t'\beta-\beta'\Gamma_0^\omega\beta\big\| = \big\|T^{-1}\sum_{t=1}^T\beta'F_tF_t'\beta-E[\beta'F_tF_t'\beta]\big\| = O_p(T^{-1/2})$;

(vi) $\big\|T^{-1}\sum_{t=1}^T\Delta F_tF_{t-1}'\beta-(\Gamma_1^\omega-\Gamma_0^\omega)\beta\big\| = \big\|T^{-1}\sum_{t=1}^T\Delta F_tF_{t-1}'\beta-E[\Delta F_tF_{t-1}'\beta]\big\| = O_p(T^{-1/2})$.

Proof. For part (i), the case $k = 0$ is proved in (A12) in the proof of Lemma A1. The proof for the autocovariances, i.e. when $k\ne 0$, is analogous. For parts (ii) and (iii), first notice that, by Assumption 1,
\[
\Gamma_{L0}^{\Delta F} = \sum_{k=0}^\infty C_kC_k' + \sum_{h=1}^\infty\sum_{k=h}^\infty\big(C_kC_{k+h}'+C_{k+h}C_k'\big), \tag{A40}
\]
which is positive definite, and by square summability of the coefficients this matrix is also finite. Moreover, by Assumptions 4(a) and 4(b) the vector $u_t$ satisfies the assumptions of Corollary 2.2 in Phillips and Durlauf (1986), so parts (ii) and (iii) are direct consequences of Lemma 3.1 in Phillips and Durlauf (1986).

Turning to part (iv), since $\beta'F_t = \beta'\omega_t$, because of Lemma A4(i), then,
\[
\frac1T\sum_{t=1}^TF_tF_t'\beta = C(1)\Big[\frac1T\sum_{t=1}^T\Big(\sum_{s=1}^tu_s\Big)\omega_t'\Big]\beta + \Big[\frac1T\sum_{t=1}^T\omega_t\omega_t'\Big]\beta. \tag{A41}
\]
Define $t = \lfloor T\tau\rfloor$ for $\tau\in[0,1]$ and the functionals
\[
X_{u,T}(\tau) = \frac{1}{\sqrt T}\sum_{s=1}^{\lfloor T\tau\rfloor}u_s, \qquad X_{\omega,T}(\tau) = \frac{1}{\sqrt T}\big(\Gamma_{L0}^\omega\big)^{-1/2}\sum_{s=1}^{\lfloor T\tau\rfloor}\omega_s,
\]
where as for (A40) we can show that $\Gamma_{L0}^\omega = \Gamma_0^\omega+2\sum_{h=1}^\infty\Gamma_h^\omega$ is positive definite. Moreover, we can write $\omega_t = \sqrt T(\Gamma_{L0}^\omega)^{1/2}[X_{\omega,T}(t/T)-X_{\omega,T}((t-1)/T)]$. As proved in Theorem 3.4 in Phillips and Solo (1992) and Corollary 2.2 in Phillips and Durlauf (1986), for any $\tau\in[0,1]$, we have, as $T\to\infty$,
\[
X_{u,T}(\tau) \stackrel{d}{\to} W_q(\tau), \qquad X_{\omega,T}(\tau) \stackrel{d}{\to} W_r(\tau), \tag{A42}
\]
where $W_q(\cdot)$ is a $q$-dimensional Brownian motion with covariance $I_q$ and $W_r(\cdot)$ is an $r$-dimensional Brownian motion with covariance $I_r$. Then consider the first term in parenthesis on the rhs of (A41): as $T\to\infty$, using (A42), we have
\[
\begin{aligned}
\frac1T\sum_{t=1}^T\Big(\sum_{s=1}^tu_s\Big)\omega_t'
&= \sum_{t=1}^TX_{u,T}\Big(\frac tT\Big)\Big(X_{\omega,T}\Big(\frac tT\Big)-X_{\omega,T}\Big(\frac{t-1}{T}\Big)\Big)'\big(\Gamma_{L0}^\omega\big)^{1/2} \\
&\stackrel{d}{\to} \Big(\int_0^1W_q(\tau)\frac{(W_r(\tau)-W_r(\tau-d\tau))'}{d\tau}\,d\tau\Big)\big(\Gamma_{L0}^\omega\big)^{1/2}
= \Big(\int_0^1W_q(\tau)\,dW_r'(\tau)\Big)\big(\Gamma_{L0}^\omega\big)^{1/2}.
\end{aligned}
\tag{A43}
\]
As for the second term on the rhs of (A41), we have, using the same approach as for part (i), as $T\to\infty$,
\[
\Big\|\frac1T\sum_{t=1}^T\omega_t\omega_t'-\Gamma_0^\omega\Big\| = O_p\Big(\frac{1}{\sqrt T}\Big). \tag{A44}
\]
By substituting (A43) and (A44) in (A41), and by Slutsky's theorem, we complete the proof of part (iv). Part (v) is proved analogously just by multiplying (A41) also on the left by $\beta'$. Finally, for part (vi), using the same approach as in the proof of part (i), we have
\[
\frac1T\sum_{t=1}^T\Delta F_tF_{t-1}'\beta = \Big(\frac1T\sum_{t=1}^TC(1)u_t\omega_{t-1}'+\frac1T\sum_{t=1}^T\Delta\omega_t\omega_{t-1}'\Big)\beta = \big(\Gamma_1^\omega-\Gamma_0^\omega\big)\beta+O_p\Big(\frac{1}{\sqrt T}\Big). \tag{A45}
\]
This completes the proof. □

Lemma A6 Define $\check F_t = JF_t$ and $\check\beta = J\beta$. For any given $t$, under Assumptions 1 through 6, as $n,T\to\infty$,

(i) $\|(Tn)^{-1}\widehat\Lambda'\xi_t\check F_t'\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(ii) $\|n^{-1}\widehat\Lambda'\Delta\xi_t\Delta\check F_t'\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(iii) $\|n^{-1}\widehat\Lambda'\Delta\xi_t\check F_t'\check\beta\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(iv) $\|(T^{1/2}n)^{-1}\widehat\Lambda'\Delta\xi_t\check F_t'\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(v) $\|(T^{1/2}n)^{-1}\widehat\Lambda'\xi_t\check F_t'\check\beta\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(vi) $\|n^{-1}\widehat\Lambda'\xi_t\Delta\check F_t'\| = O_p(\zeta_{nT,\delta})$;

(vii) $\|(T^{1/2}n)^{-1}\widehat\Lambda'\xi_t\check F_t'\| = O_p(\zeta_{nT,\delta})$;

(viii) $\|n^{-1}\widehat\Lambda'\xi_t\check F_t'\check\beta\| = O_p(\zeta_{nT,\delta})$.

Proof. Throughout, we use $\|\beta\| = O(1)$ and obviously $\|J\| = 1$, and subadditivity of the norm (A1).
Start with part (i):
\[
\begin{aligned}
\Big\|\frac{\widehat\Lambda'\xi_t\check F_t'}{nT}\Big\|
&\le \Big\|\frac{J\Lambda'\xi_tF_t'J}{nT}\Big\| + \Big\|\frac{(\widehat\Lambda'-J\Lambda')\xi_tF_t'J}{nT}\Big\| \\
&\le \|J\|^2\,\Big\|\frac{\Lambda'\xi_t}{n\sqrt T}\Big\|\,\Big\|\frac{F_t}{\sqrt T}\Big\| + \Big\|\frac{\widehat\Lambda'-J\Lambda'}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt{nT}}\Big\|\,\Big\|\frac{F_t}{\sqrt T}\Big\|\,\|J\|.
\end{aligned}
\]
Then, because of Lemma A2(ii) and A2(vi), the first term on the rhs is $O_p(n^{-1/2})$. Because of Lemma A2(ii) and A2(iv) and Lemma A3, the second term is $O_p(T^{-1/2})$. This proves part (i).

For part (ii) we can repeat the same reasoning as for part (i), but using Lemma A2(i), A2(iii) and A2(v), and Lemma A3. Part (iii) is proved by noticing that $\check F_t'\check\beta = F_t'\beta$ and by following again the same reasoning as for part (i), but using Lemma A2(iii) and A2(iv), and Lemmas A3 and A4(ii). Part (iv) is also proved as part (i), but using Lemma A2(ii), A2(iii) and A2(v), and Lemma A3. Part (v) is proved as part (i), but using Lemma A2(iv) and A2(vi), and Lemmas A3 and A4(ii).

For part (vi), we have
\[
\begin{aligned}
\Big\|\frac{\widehat\Lambda'\xi_t\Delta\check F_t'}{n}\Big\|
&\le \Big\|\frac{J\Lambda'\xi_t\Delta F_t'J}{n}\Big\| + \Big\|\frac{(\widehat\Lambda'-J\Lambda')\xi_t\Delta F_t'J}{n}\Big\| \\
&\le \|J\|^2\,\Big\|\frac{\Lambda'\xi_t}{n}\Big\|\,\|\Delta F_t\| + \Big\|\frac{\widehat\Lambda'-J\Lambda'}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt n}\Big\|\,\|\Delta F_t\|\,\|J\|.
\end{aligned}
\]
From Lemma A2(i) and A2(viii), the first term on the rhs is $O_p(T^{1/2}n^{-(2-\delta)/2})$. From Lemma A2(i) and A2(vii) and Lemma A3, the second term on the rhs is $O_p(n^{-(1-\delta)/2})$. This proves part (vi). Parts (vii) and (viii) are proved similarly to part (vi) using Lemma A2(ii), A2(vii) and A2(viii), and Lemmas A3 and A4(ii). This completes the proof. □

Lemma A7 For any given $t$, under Assumptions 1 through 6, as $n,T\to\infty$,

(i) $\|(Tn^2)^{-1}\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda\| = O_p(\max(n^{-1},T^{-1}))$;

(ii) $\|n^{-2}\widehat\Lambda'\Delta\xi_t\Delta\xi_t'\widehat\Lambda\| = O_p(\max(n^{-1},T^{-1}))$;

(iii) $\|n^{-2}\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda\| = O_p(\zeta^2_{nT,\delta})$;

(iv) $\|(T^{1/2}n^2)^{-1}\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda\| = O_p(\zeta^2_{nT,\delta}T^{-1/2})$;

(v) $\|n^{-2}\widehat\Lambda'\Delta\xi_t\xi_t'\widehat\Lambda\| = O_p(\zeta_{nT,\delta}\max(n^{-1/2},T^{-1/2}))$.

Proof. Throughout, we use subadditivity of the norm (A1).
Start with part (i):
\[
\Big\|\frac{\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda}{n^2T}\Big\|
\le \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|^2\Big\|\frac{\xi_t}{\sqrt{nT}}\Big\|^2
+ 2\,\Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt{nT}}\Big\|\,\Big\|\frac{\Lambda'\xi_t}{n\sqrt T}\Big\|
+ \Big\|\frac{\Lambda'\xi_t}{n\sqrt T}\Big\|^2.
\]
Because of Lemma A2(iv) and Lemma A3, the first term on the rhs is $O_p(T^{-1})$. Because of Lemma A2(iv) and A2(vi), and Lemma A3, the second term is $O_p(T^{-1/2}n^{-1/2})$. The third term is $O_p(n^{-1})$ because of Lemma A2(vi). This proves part (i). Part (ii) is proved in the same way, but using Lemma A2(iii) and A2(v), and Lemma A3.

Now consider part (iii):
\[
\Big\|\frac{\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda}{n^2}\Big\|
\le \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|^2\Big\|\frac{\xi_t}{\sqrt n}\Big\|^2
+ 2\,\Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt n}\Big\|\,\Big\|\frac{\Lambda'\xi_t}{n}\Big\|
+ \Big\|\frac{\Lambda'\xi_t}{n}\Big\|^2. \tag{A46}
\]
Because of Lemma A2(vii) and Lemma A3, the first term on the rhs is $O_p(n^{-(1-\delta)})$. Because of Lemma A2(vii) and A2(viii), and Lemma A3, the second term is $O_p(T^{1/2}n^{-(3/2-\delta)})$. The third term is $O_p(Tn^{-(2-\delta)})$ because of Lemma A2(viii).
Summing up, for (A46), we have
\[
\Big\|\frac{\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda}{n^2}\Big\| \le O_p\Big(\frac{1}{n^{1-\delta}}\Big) + O_p\Big(\frac{\sqrt T}{n^{3/2-\delta}}\Big) + O_p\Big(\frac{T}{n^{2-\delta}}\Big).
\]
In order to compare the rates of the three terms assume $n = O(T^\alpha)$; then, according to Assumption 6(b), we must have at least $\alpha > 1/2$. Now, when $1/2 < \alpha < 1$, the third term dominates over the first one (see also (17)), but the second would dominate over the third if and only if $\alpha > 1$, which cannot be. When $\alpha\ge 1$, the first term dominates over the third one, and the second would dominate over the first if and only if $\alpha < 1$, which cannot be. Hence, the second one is always dominated by the other two and we proved part (iii). Part (iv) is proved by multiplying everything in part (iii) by $T^{-1/2}$.

For part (v), we have
\[
\begin{aligned}
\Big\|\frac{\widehat\Lambda'\Delta\xi_t\xi_t'\widehat\Lambda}{n^2}\Big\|
&\le \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|^2\Big\|\frac{\Delta\xi_t}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt n}\Big\|
+ \Big\|\frac{\Lambda'\Delta\xi_t}{n}\Big\|\,\Big\|\frac{\Lambda'\xi_t}{n}\Big\| \\
&\quad + \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|\,\Big\|\frac{\Delta\xi_t}{\sqrt n}\Big\|\,\Big\|\frac{\Lambda'\xi_t}{n}\Big\|
+ \Big\|\frac{\widehat\Lambda-\Lambda J}{\sqrt n}\Big\|\,\Big\|\frac{\xi_t}{\sqrt n}\Big\|\,\Big\|\frac{\Lambda'\Delta\xi_t}{n}\Big\|.
\end{aligned}
\]
Because of Lemma A2(iii) and A2(vii), and Lemma A3, the first term on the rhs is $O_p(T^{-1/2}n^{-(1-\delta)/2})$. Because of Lemma A2(v) and A2(viii), and Lemma A3, the second term is $O_p(T^{1/2}n^{-(3-\delta)/2})$. Hence, using (17), the first two terms are $O_p(\zeta_{nT,\delta}\max(n^{-1/2},T^{-1/2}))$. Using the same results as for the first two terms, we have that the third and fourth terms are both $O_p(n^{-(2-\delta)/2})$ and they are both dominated by the first two, and part (v) is proved. This completes the proof. □
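The rate comparison in the proof of part (iii) can also be read as follows: the second rate, $T^{1/2}n^{-(3/2-\delta)}$, is the geometric mean of the first, $n^{-(1-\delta)}$, and the third, $Tn^{-(2-\delta)}$, so it can never exceed the larger of the two. The short Python check below illustrates this on a few values; the grid of $T$, $\alpha$ and $\delta$ and the function name `rates` are our own choices for illustration, with $n = T^\alpha$ taken literally.

```python
import math

def rates(T, alpha, delta):
    """Return the three rates bounding (A46) when n = T**alpha (illustrative)."""
    n = T ** alpha
    first = n ** -(1 - delta)                       # n^{-(1-delta)}
    second = math.sqrt(T) * n ** -(1.5 - delta)     # T^{1/2} n^{-(3/2-delta)}
    third = T * n ** -(2 - delta)                   # T n^{-(2-delta)}
    return first, second, third

for T in (100, 1000, 10000):
    for alpha in (0.6, 1.0, 1.5):
        for delta in (0.0, 0.25, 0.5):
            first, second, third = rates(T, alpha, delta)
            # second rate equals the geometric mean of the other two
            assert math.isclose(second, math.sqrt(first * third), rel_tol=1e-9)
            # hence it is never larger than the bigger of the two
            assert second <= max(first, third) * (1 + 1e-9)
```

Since the ratio of the third to the first rate is $T/n = T^{1-\alpha}$, the third rate is the larger one when $\alpha < 1$ and the first when $\alpha > 1$, in line with the case distinction in the proof.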
Lemma A8 Define the matrices
\[
\begin{aligned}
\widehat M_{00} &= \frac1T\sum_{t=1}^T\Delta\widehat F_t\Delta\widehat F_t', &
\widehat M_{01} &= \frac1T\sum_{t=1}^T\Delta\widehat F_t\widehat F_{t-1}', &
\widehat M_{02} &= \frac1T\sum_{t=1}^T\Delta\widehat F_t\Delta\widehat F_{t-1}', \\
\widehat M_{11} &= \frac1T\sum_{t=1}^T\widehat F_t\widehat F_t', &
\widehat M_{21} &= \frac1T\sum_{t=1}^T\Delta\widehat F_{t-1}\widehat F_{t-1}', &
\widehat M_{22} &= \frac1T\sum_{t=1}^T\Delta\widehat F_{t-1}\Delta\widehat F_{t-1}',
\end{aligned}
\]
and denote by $M_{ij}$, for $i,j = 0,1,2$, the analogous ones but computed by using $\check F_t = JF_t$. Define also $\check\beta = J\beta$. Under Assumptions 1 through 6, as $n,T\to\infty$,

(i) $\|T^{-1}\widehat M_{11}-T^{-1}M_{11}\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(ii) $\|\widehat M_{00}-M_{00}\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(iii) $\|\widehat M_{02}-M_{02}\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(iv) $\|\widehat M_{22}-M_{22}\| = O_p(\max(n^{-1/2},T^{-1/2}))$;

(v) $\|\widehat M_{01}\check\beta-M_{01}\check\beta\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;

(vi) $\|\check\beta'\widehat M_{11}\check\beta-\check\beta'M_{11}\check\beta\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;

(vii) $\|\widehat M_{21}\check\beta-M_{21}\check\beta\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;

(viii) $\|T^{-1/2}\widehat M_{01}-T^{-1/2}M_{01}\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;

(ix) $\|T^{-1/2}\widehat M_{21}-T^{-1/2}M_{21}\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$.

Proof. Throughout, we use $\|\beta\| = O(1)$ and obviously $\|J\| = 1$ and the fact that, from Lemma A3, we also have $\|n^{-1}\widehat\Lambda'\Lambda\| = O_p(1)$. Start with part (i).
By adding and subtracting $JF_t$ from $\widehat F_t$, we have
\[
\Big\|\frac{1}{T^2}\sum_{t=1}^T\widehat F_t\widehat F_t'-\frac{1}{T^2}\sum_{t=1}^T\check F_t\check F_t'\Big\|
\le \Big\|\frac{1}{T^2}\sum_{t=1}^T\big(\widehat F_t-JF_t\big)\big(\widehat F_t-JF_t\big)'\Big\|
+ 2\,\Big\|\frac{1}{T^2}\sum_{t=1}^T\big(\widehat F_t-JF_t\big)\big(JF_t\big)'\Big\|. \tag{A47}
\]

Using (6) and (15), the first term on the rhs of (A47) gives
\[
\begin{aligned}
\Big\|\frac{1}{T^2}\sum_{t=1}^T\big(\widehat F_t-JF_t\big)\big(\widehat F_t-JF_t\big)'\Big\|
&= \Big\|\frac{1}{T^2}\sum_{t=1}^T\Big(\frac{\widehat\Lambda'x_t}{n}-JF_t\Big)\Big(\frac{\widehat\Lambda'x_t}{n}-JF_t\Big)'\Big\| \\
&= \Big\|\frac{1}{T^2}\sum_{t=1}^T\Big(\frac{\widehat\Lambda'\Lambda F_t}{n}+\frac{\widehat\Lambda'\xi_t}{n}-JF_t\Big)\Big(\frac{\widehat\Lambda'\Lambda F_t}{n}+\frac{\widehat\Lambda'\xi_t}{n}-JF_t\Big)'\Big\| \\
&\le \underbrace{\Big\|\frac{1}{T^2}\sum_{t=1}^T\Big[\frac{\widehat\Lambda'\Lambda}{n}F_tF_t'\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)+JF_tF_t'\Big(J-\frac{\Lambda'\widehat\Lambda}{n}\Big)\Big]\Big\|}_{A_1}
+ \underbrace{2\,\Big\|\frac{1}{T^2}\sum_{t=1}^T\frac{\widehat\Lambda'\Lambda F_t\xi_t'\widehat\Lambda}{n^2}\Big\|}_{B_1} \\
&\quad + \underbrace{2\,\Big\|\frac{1}{T^2}\sum_{t=1}^T\frac{\widehat\Lambda'\xi_tF_t'J}{n}\Big\|}_{C_1}
+ \underbrace{\Big\|\frac{1}{T^2}\sum_{t=1}^T\frac{\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda}{n^2}\Big\|}_{D_1}.
\end{aligned}
\tag{A48}
\]
Let us consider each term of (A48) separately:
\[
\begin{aligned}
A_1 &\le \Big\|\frac{\Lambda'\widehat\Lambda}{n}-J\Big\|\,\Big\|\frac{1}{T^2}\sum_{t=1}^TF_tF_t'\Big\|\,\Big\{\Big\|\frac{\widehat\Lambda'\Lambda}{n}\Big\|+\|J\|\Big\} = O_p\Big(\frac{1}{\sqrt T}\Big), \\
B_1 &\le \frac2T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\xi_tF_t'}{nT}\Big\|\,\Big\|\frac{\widehat\Lambda'\Lambda}{n}\Big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big), \\
C_1 &\le \frac2T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\xi_tF_t'}{nT}\Big\|\,\|J\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big), \\
D_1 &\le \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda}{n^2T}\Big\| = O_p\Big(\max\Big(\frac1n,\frac1T\Big)\Big).
\end{aligned}
\]
Above we used Lemma A3 and Lemma A5(ii) for $A_1$, Lemma A6(i) for $B_1$ and $C_1$, and Lemma A7(i) for $D_1$. Thus, the first term on the rhs of (A47) is $O_p(\max(n^{-1/2},T^{-1/2}))$.
The second term on the rhs of (A47) is such that
\[
\begin{aligned}
\Big\|\frac{1}{T^2}\sum_{t=1}^T\big(\widehat F_t-JF_t\big)\big(JF_t\big)'\Big\|
&= \Big\|\frac{1}{T^2}\sum_{t=1}^T\Big(\frac{\widehat\Lambda'x_t}{n}-JF_t\Big)\big(JF_t\big)'\Big\| \\
&\le \Big\|\frac{1}{T^2}\sum_{t=1}^T\Big(\frac{\widehat\Lambda'\Lambda}{n}-J\Big)F_tF_t'J\Big\| + \Big\|\frac{1}{T^2}\sum_{t=1}^T\frac{\widehat\Lambda'\xi_tF_t'J}{n}\Big\| \\
&\le \Big\|\frac{\widehat\Lambda'\Lambda}{n}-J\Big\|\,\Big\|\frac{1}{T^2}\sum_{t=1}^TF_tF_t'\Big\|\,\|J\| + \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\xi_tF_t'J}{nT}\Big\|
= O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big),
\end{aligned}
\tag{A49}
\]
where we used Lemmas A3, A5(ii) and A6(i). By combining (A48) and (A49) we prove part (i).

Parts (ii), (iii) and (iv) are proved in the same way as part (i), using Lemma A3 and the results for the stationary process $\Delta F_t$ in Lemmas A5(i), A6(ii), and A7(ii).

Now, consider part (v):
\[
\begin{aligned}
\Big\|\frac1T\sum_{t=1}^T\Delta\widehat F_t\widehat F_{t-1}'\check\beta-\frac1T\sum_{t=1}^T\Delta\check F_t\check F_{t-1}'\check\beta\Big\|
&\le \Big\|\frac1T\sum_{t=1}^T\big(\Delta\widehat F_t-J\Delta F_t\big)\big(\widehat F_{t-1}-JF_{t-1}\big)'\check\beta\Big\| \\
&\quad + \Big\|\frac1T\sum_{t=1}^T\big(\Delta\widehat F_t-J\Delta F_t\big)\big(\check\beta'JF_{t-1}\big)'\Big\|
+ \Big\|\frac1T\sum_{t=1}^T\big(J\Delta F_t\big)\big(\widehat F_{t-1}-JF_{t-1}\big)'\check\beta\Big\|.
\end{aligned}
\tag{A50}
\]
Similarly to (A48), from (6) and (15), the first term on the rhs of (A50) is such that
\[
\begin{aligned}
\Big\|\frac1T\sum_{t=1}^T\big(\Delta\widehat F_t-J\Delta F_t\big)\big(\check\beta'\widehat F_{t-1}-\check\beta'JF_{t-1}\big)'\Big\|
&= \Big\|\frac1T\sum_{t=1}^T\Big(\frac{\widehat\Lambda'\Delta x_t}{n}-J\Delta F_t\Big)\Big(\frac{\widehat\Lambda'x_{t-1}}{n}-JF_{t-1}\Big)'\check\beta\Big\| \\
&\le \underbrace{\Big\|\frac1T\sum_{t=1}^T\Big[\frac{\widehat\Lambda'\Lambda\Delta F_tF_{t-1}'}{n}\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)\check\beta+J\Delta F_tF_{t-1}'\Big(J-\frac{\Lambda'\widehat\Lambda}{n}\Big)\check\beta\Big]\Big\|}_{A_2}
+ \underbrace{\Big\|\frac1T\sum_{t=1}^T\frac{\widehat\Lambda'\Lambda\Delta F_t\xi_{t-1}'\widehat\Lambda\check\beta}{n^2}\Big\|}_{B_2} \\
&\quad + \underbrace{\Big\|\frac1T\sum_{t=1}^T\frac{\widehat\Lambda'\Delta\xi_tF_{t-1}'\Lambda'\widehat\Lambda\check\beta}{n^2}\Big\|}_{C_2}
+ \underbrace{\Big\|\frac1T\sum_{t=1}^T\frac{J\Delta F_t\xi_{t-1}'\widehat\Lambda\check\beta}{n}\Big\|}_{D_2}
+ \underbrace{\Big\|\frac1T\sum_{t=1}^T\frac{\widehat\Lambda'\Delta\xi_tF_{t-1}'J\check\beta}{n}\Big\|}_{E_2} \\
&\quad + \underbrace{\Big\|\frac1T\sum_{t=1}^T\frac{\widehat\Lambda'\Delta\xi_t\xi_{t-1}'\widehat\Lambda\check\beta}{n^2}\Big\|}_{F_2}.
\end{aligned}
\tag{A51}
\]
Let us consider first the terms:
\[
\begin{aligned}
A_2 &\le \Big\|\frac{\Lambda'\widehat\Lambda}{n}-J\Big\|\,\Big\|\frac1T\sum_{t=1}^T\Delta F_tF_{t-1}'\Big\|\,\Big\{\Big\|\frac{\widehat\Lambda'\Lambda}{n}\Big\|+\|J\|\Big\}\,\|\check\beta\| = O_p\Big(\frac{1}{\sqrt T}\Big), \\
B_2 &\le \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\xi_{t-1}\Delta F_t'}{n}\Big\|\,\Big\|\frac{\widehat\Lambda'\Lambda}{n}\Big\|\,\|\check\beta\| = O_p(\zeta_{nT,\delta}), \\
F_2 &\le \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\Delta\xi_t\xi_{t-1}'\widehat\Lambda}{n^2}\Big\|\,\|\check\beta\| = O_p\Big(\zeta_{nT,\delta}\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big).
\end{aligned}
\]
Above we used Lemmas A3 and A5(iii) for $A_2$, Lemma A6(vi) for $B_2$, and Lemma A7(v) for $F_2$. The term $D_2$ behaves exactly as $B_2$, while $E_2$ is $O_p(\max(n^{-1/2},T^{-1/2}))$ because of Lemma A6(iii). Finally, recall that from Lemma A3, we also have
\[
\frac{\Lambda'\widehat\Lambda}{n} = J+O_p\Big(\frac{1}{\sqrt T}\Big). \tag{A52}
\]

Hence, from (A52),
\[
C_2 \le \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\Delta\xi_tF_{t-1}'J\check\beta}{n}\Big\| + \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\Delta\xi_tF_{t-1}'}{n}\Big\|\,O_p\Big(\frac{1}{\sqrt T}\Big) = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big).
\]
Indeed, the first term on the rhs of $C_2$ is $O_p(\max(n^{-1/2},T^{-1/2}))$ because of Lemma A6(iii), while the second term is $O_p(\max(n^{-1/2},T^{-1/2}))$ because of Lemma A6(iv). Therefore, the first term on the rhs of (A50) is $O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$.

As for the second term on the rhs of (A50), since $\check\beta'JF_{t-1} = \beta'F_{t-1}$, we have
\[
\begin{aligned}
\Big\|\frac1T\sum_{t=1}^T\big(\Delta\widehat F_t-J\Delta F_t\big)\big(\check\beta'JF_{t-1}\big)'\Big\|
&= \Big\|\frac1T\sum_{t=1}^T\Big(\frac{\widehat\Lambda'\Delta x_t}{n}-J\Delta F_t\Big)\big(\beta'F_{t-1}\big)'\Big\| \\
&\le \Big\|\frac1T\sum_{t=1}^T\Big(\frac{\widehat\Lambda'\Lambda}{n}-J\Big)\Delta F_tF_{t-1}'\beta\Big\| + \Big\|\frac1T\sum_{t=1}^T\frac{\widehat\Lambda'\Delta\xi_t\check F_{t-1}'\check\beta}{n}\Big\|
= O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big),
\end{aligned}
\tag{A53}
\]
where we used Lemmas A3 and A5(vi) for the first term on the rhs and Lemma A6(iii) for the second.
The third term on the rhs of (A50) is such that
\[
\begin{aligned}
\Big\|\frac1T\sum_{t=1}^T\big(J\Delta F_t\big)\big(\widehat F_{t-1}-JF_{t-1}\big)'\check\beta\Big\|
&= \Big\|\frac1T\sum_{t=1}^T\big(J\Delta F_t\big)\Big(\frac{\widehat\Lambda'x_{t-1}}{n}-JF_{t-1}\Big)'\check\beta\Big\| \\
&\le \Big\|\frac1T\sum_{t=1}^TJ\Delta F_tF_{t-1}'\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)\check\beta\Big\| + \Big\|\frac1T\sum_{t=1}^T\frac{J\Delta F_t\xi_{t-1}'\widehat\Lambda\check\beta}{n}\Big\|
= O_p(\zeta_{nT,\delta}),
\end{aligned}
\tag{A54}
\]
since the first term on the rhs behaves exactly as $A_2$ above, while the second term is $O_p(\zeta_{nT,\delta})$ as in $B_2$. By combining (A51), (A53), and (A54) we prove part (v).

Then consider part (vi):
\[
\Big\|\frac1T\sum_{t=1}^T\check\beta'\widehat F_t\widehat F_t'\check\beta-\frac1T\sum_{t=1}^T\check\beta'\check F_t\check F_t'\check\beta\Big\|
\le \Big\|\frac1T\sum_{t=1}^T\check\beta'\big(\widehat F_t-JF_t\big)\big(\widehat F_t-JF_t\big)'\check\beta\Big\|
+ 2\,\Big\|\frac1T\sum_{t=1}^T\check\beta'\big(\widehat F_t-JF_t\big)\big(\check\beta'JF_t\big)'\Big\|. \tag{A55}
\]

As before, from (6) and (15), the first term on the rhs of (A55) is such that
\[
\begin{aligned}
\Big\|\frac1T\sum_{t=1}^T\check\beta'\big(\widehat F_t-JF_t\big)\big(\widehat F_t-JF_t\big)'\check\beta\Big\|
&= \Big\|\frac1T\sum_{t=1}^T\check\beta'\Big(\frac{\widehat\Lambda'x_t}{n}-JF_t\Big)\Big(\frac{\widehat\Lambda'x_t}{n}-JF_t\Big)'\check\beta\Big\| \\
&\le \underbrace{\Big\|\frac1T\sum_{t=1}^T\Big[\check\beta'\frac{\widehat\Lambda'\Lambda}{n}F_tF_t'\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)\check\beta+\check\beta'JF_tF_t'\Big(J-\frac{\Lambda'\widehat\Lambda}{n}\Big)\check\beta\Big]\Big\|}_{A_3}
+ \underbrace{2\,\Big\|\frac1T\sum_{t=1}^T\frac{\check\beta'\widehat\Lambda'\Lambda F_t\xi_t'\widehat\Lambda\check\beta}{n^2}\Big\|}_{B_3} \\
&\quad + \underbrace{2\,\Big\|\frac1T\sum_{t=1}^T\frac{\check\beta'JF_t\xi_t'\widehat\Lambda\check\beta}{n}\Big\|}_{C_3}
+ \underbrace{\Big\|\frac1T\sum_{t=1}^T\frac{\check\beta'\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda\check\beta}{n^2}\Big\|}_{D_3}.
\end{aligned}
\tag{A56}
\]
Since $\check\beta'JF_t = \beta'F_t$ and using (A52), we have
\[
\begin{aligned}
A_3 &\le \Big\|\frac1T\sum_{t=1}^T\beta'F_tF_t'\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)\check\beta\Big\|
+ \Big\|\frac1T\sum_{t=1}^T\check\beta'F_tF_t'\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)\check\beta\Big\|\,O_p\Big(\frac{1}{\sqrt T}\Big) \\
&\quad + \Big\|\frac1T\sum_{t=1}^T\beta'F_tF_t'\Big(J-\frac{\Lambda'\widehat\Lambda}{n}\Big)\check\beta\Big\|
= O_p\Big(\frac{1}{\sqrt T}\Big).
\end{aligned}
\]
Indeed, the first and third terms on the rhs are $O_p(T^{-1/2})$ because of Lemmas A3 and A5(v), while using the same results and (A52), the second term is
\[
\begin{aligned}
\Big\|\frac1T\sum_{t=1}^T\check\beta'F_tF_t'\Big(\frac{\Lambda'\widehat\Lambda}{n}-J\Big)\check\beta\Big\|\,O_p\Big(\frac{1}{\sqrt T}\Big)
&= \Big\|\frac1T\sum_{t=1}^T\check\beta'F_tF_t'\Big(\frac{\Lambda'\widehat\Lambda J}{n}-JJ\Big)J\check\beta\Big\|\,O_p\Big(\frac{1}{\sqrt T}\Big) \\
&= \Big\|\frac1T\sum_{t=1}^T\check\beta'F_tF_t'\beta\Big\|\,O_p\Big(\frac1T\Big) = O_p\Big(\frac1T\Big).
\end{aligned}
\]
In the same way we have
\[
B_3 \le 2\,\Big\|\frac1T\sum_{t=1}^T\frac{\check\beta'JF_t\xi_t'\widehat\Lambda\check\beta}{n}\Big\| + 2\,\Big\|\frac1T\sum_{t=1}^T\frac{\check\beta'F_t\xi_t'\widehat\Lambda\check\beta}{n}\Big\|\,O_p\Big(\frac{1}{\sqrt T}\Big) = O_p(\zeta_{nT,\delta}),
\]
because of Lemma A6(vii) and A6(viii). Then,
\[
\begin{aligned}
C_3 &\le \frac2T\sum_{t=1}^T\Big\|\frac{\check\beta'JF_t\xi_t'\widehat\Lambda}{n}\Big\|\,\|\check\beta\| = O_p(\zeta_{nT,\delta}), \\
D_3 &\le \frac1T\sum_{t=1}^T\Big\|\frac{\widehat\Lambda'\xi_t\xi_t'\widehat\Lambda}{n^2}\Big\|\,\|\check\beta\|^2 = O_p(\zeta^2_{nT,\delta}),
\end{aligned}
\]
because of Lemmas A6(viii) and A7(iii). Therefore, since from Assumption 6(b), $\zeta^2_{nT,\delta} < \zeta_{nT,\delta}$ as $n,T\to\infty$, the first term on the rhs of (A55) is $O_p(\zeta_{nT,\delta})$.

The second term on the rhs of (A55) is such that
\[
\begin{aligned}
2\,\Big\| \frac{1}{T}\sum_{t=1}^{T} \check\beta'\big(\widehat F_t-JF_t\big)\big(\check\beta'JF_t\big)' \Big\|
&= 2\,\Big\| \frac{1}{T}\sum_{t=1}^{T} \check\beta'\Big(\frac{\widehat\Lambda' x_t}{n}-JF_t\Big)\big(\check\beta'JF_t\big)' \Big\| \\
&\le 2\,\Big\| \frac{1}{T}\sum_{t=1}^{T} \Big(\frac{\widehat\Lambda'\Lambda}{n}-J\Big)F_tF_t'J\check\beta \Big\|
+ 2\,\Big\| \frac{1}{T}\sum_{t=1}^{T} \frac{\widehat\Lambda'\xi_t}{n}F_t'J\check\beta \Big\|
= O_p\Big(\max\Big(\zeta_{nT,\delta},\frac{1}{\sqrt T}\Big)\Big),
\end{aligned} \tag{A57}
\]
because of Lemmas A3, A5(iv) and A6(viii). By combining (A56) and (A57), we prove part (vi).

Finally, parts (vii), (viii) and (ix) are proved as part (v), by noticing that $\|T^{-1/2}F_t\| = O_p(1)$, because of Lemma A2(ii). This completes the proof. □

Lemma A9 Define the matrices
\[
\widehat S_{00} = \widehat M_{00}-\widehat M_{02}\widehat M_{22}^{-1}\widehat M_{20},\qquad
\widehat S_{01} = \widehat M_{01}-\widehat M_{02}\widehat M_{22}^{-1}\widehat M_{21},\qquad
\widehat S_{11} = \widehat M_{11}-\widehat M_{12}\widehat M_{22}^{-1}\widehat M_{21},
\]
where $\widehat M_{10}=\widehat M_{01}'$, $\widehat M_{20}=\widehat M_{02}'$, and $\widehat M_{12}=\widehat M_{21}'$. Denote by $S_{ij}$, for $i,j=0,1$, the analogous ones but computed by using $\check F_t = JF_t$. Define also $\check\beta = J\beta$ and $\check\beta_{\perp*} = \check\beta_\perp(\check\beta_\perp'\check\beta_\perp)^{-1}$, where $\check\beta_\perp = J\beta_\perp$ is such that $\check\beta_\perp'\check\beta = 0_{(r-c)\times c}$. Under Assumptions 1 through 6, as $n,T\to\infty$,
(i) $\|\widehat S_{00}-S_{00}\| = O_p(\max(n^{-1/2},T^{-1/2}))$;
(ii) $\|\check\beta'\widehat S_{11}\check\beta-\check\beta'S_{11}\check\beta\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;
(iii) $\|T^{-1/2}\check\beta'\widehat S_{11}\check\beta_{\perp*}-T^{-1/2}\check\beta'S_{11}\check\beta_{\perp*}\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;
(iv) $\|T^{-1/2}\check\beta'\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\check\beta_{\perp*}-T^{-1/2}\check\beta'S_{10}S_{00}^{-1}S_{01}\check\beta_{\perp*}\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;
(v) $\|T^{-1}\check\beta_{\perp*}'\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\check\beta_{\perp*}-T^{-1}\check\beta_{\perp*}'S_{10}S_{00}^{-1}S_{01}\check\beta_{\perp*}\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$;
(vi) $\|T^{-1}\check\beta_{\perp*}'\widehat S_{11}\check\beta_{\perp*}-T^{-1}\check\beta_{\perp*}'S_{11}\check\beta_{\perp*}\| = O_p(\max(\zeta_{nT,\delta},T^{-1/2}))$.

Proof. Throughout we use the fact that $\|\check\beta_{\perp*}\| = O(1)$. Part (i) is proved using Lemma A8(ii), A8(iii) and A8(iv). For proving part (ii) we use Lemma A8(iv), A8(v) and A8(vi). Part (iii) is proved by combining part (ii) with Lemma A8(v) and A8(vi), and by noticing that $\|T^{-1/2}F_t\| = O_p(1)$ from Lemma A2(ii). For proving part (iv) we combine part (i) with Lemma A8(v), A8(viii) and A8(ix). Part (v) is proved by combining part (i) with Lemma A8(viii) and A8(ix). Finally, part (vi) follows from Lemma A8(i) and A8(ix). This completes the proof. □

Lemma A10 Consider the matrices $S_{ij}$ defined in Lemma A9, with $i,j=0,1$. Define $\check F_t = JF_t$, $\check\beta = J\beta$ and the conditional covariance matrices
\[
\check\Omega_{00} = E[\Delta\check F_t\Delta\check F_t'\,|\,\Delta\check F_{t-1}],\qquad
\check\Omega_{\check\beta\check\beta} = E[\check\beta'\check F_{t-1}\check F_{t-1}'\check\beta\,|\,\Delta\check F_{t-1}],\qquad
\check\Omega_{0\check\beta} = E[\Delta\check F_t\check F_{t-1}'\check\beta\,|\,\Delta\check F_{t-1}].
\]
Under Assumptions 1, 4 and 5, as $T\to\infty$,
(i) $\|S_{00}-\check\Omega_{00}\| = O_p(T^{-1/2})$;
(ii) $\|\check\beta'S_{11}\check\beta-\check\Omega_{\check\beta\check\beta}\| = O_p(T^{-1/2})$;
(iii) $\|S_{01}\check\beta-\check\Omega_{0\check\beta}\| = O_p(T^{-1/2})$.
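The matrices $S_{ij}$ of Lemma A9 are second moments from which the lagged differences have been partialled out. The sketch below is only an illustration on simulated data: it assumes the index convention 0 for $\Delta F_t$, 1 for $F_{t-1}$ and 2 for $\Delta F_{t-1}$ (inferred from the lemma, since the $M_{ij}$ are defined in Lemma A8), uses one common sample $t=2,\dots,T$ for all moments, and all function names are hypothetical. It checks that the formula $M_{ij}-M_{i2}M_{22}^{-1}M_{2j}$ coincides with the moment matrix of the regression residuals.

```python
import numpy as np

def moment_matrices(F):
    """Second-moment matrices M_ij of Z0 = dF_t, Z1 = F_{t-1}, Z2 = dF_{t-1}.

    F is a (T+1) x r array holding the levels F_0, ..., F_T. The index
    convention (0, 1, 2) is an assumption made for this illustration.
    """
    dF = np.diff(F, axis=0)            # row k holds F_{k+1} - F_k
    Z = (dF[1:], F[1:-1], dF[:-1])     # common sample t = 2, ..., T
    n_obs = Z[0].shape[0]
    M = {(i, j): Z[i].T @ Z[j] / n_obs for i in range(3) for j in range(3)}
    return M, Z

def partial_out(M):
    """S_ij = M_ij - M_i2 M_22^{-1} M_2j for i, j in {0, 1}."""
    M22_inv = np.linalg.inv(M[2, 2])
    return {(i, j): M[i, j] - M[i, 2] @ M22_inv @ M[2, j]
            for i in (0, 1) for j in (0, 1)}

def residual_moments(Z):
    """The same S_ij, computed from residuals of regressions on Z2."""
    Z0, Z1, Z2 = Z
    resid = [Zi - Z2 @ np.linalg.lstsq(Z2, Zi, rcond=None)[0] for Zi in (Z0, Z1)]
    n_obs = Z0.shape[0]
    return {(i, j): resid[i].T @ resid[j] / n_obs for i in (0, 1) for j in (0, 1)}

# Toy I(1) factors: three independent random walks (simulated, not the paper's data).
rng = np.random.default_rng(0)
F = np.cumsum(rng.standard_normal((201, 3)), axis=0)
M, Z = moment_matrices(F)
S = partial_out(M)
S_res = residual_moments(Z)
```

Both routes give the same matrices up to rounding, which is the Frisch-Waugh identity behind the residual representation of $\widehat S_{ij}$ used in the proof of Lemma 4.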

Proof. For part (i), notice that
\[
\check\Omega_{00} = E[\Delta\check F_t\Delta\check F_t'] - E[\Delta\check F_t\Delta\check F_{t-1}']\big(E[\Delta\check F_{t-1}\Delta\check F_{t-1}']\big)^{-1}E[\Delta\check F_{t-1}\Delta\check F_t']
= \Gamma_0^{\Delta F} - \Gamma_1^{\Delta F}\big(\Gamma_0^{\Delta F}\big)^{-1}\Gamma_1^{\Delta F},
\]
and
\[
S_{00} = \frac{1}{T}\sum_{t=1}^{T}\Delta\check F_t\Delta\check F_t' - \Big(\frac{1}{T}\sum_{t=2}^{T}\Delta\check F_t\Delta\check F_{t-1}'\Big)\Big(\frac{1}{T}\sum_{t=2}^{T}\Delta\check F_{t-1}\Delta\check F_{t-1}'\Big)^{-1}\frac{1}{T}\sum_{t=2}^{T}\Delta\check F_{t-1}\Delta\check F_t'
= M_{00}-M_{02}M_{22}^{-1}M_{20}.
\]
Using Lemma A5(i), we have the result. Parts (ii) and (iii) are proved in the same way, but using Lemma A5(v) and A5(vi), respectively. This completes the proof. □

Proof of Lemma 4
Throughout we make use of the matrices $\widehat M_{ij}$ and $M_{ij}$, with $i,j=0,1,2$, defined in Lemma A8, $\widehat S_{ij}$ and $S_{ij}$, with $i,j=0,1$, defined in Lemma A9, and the conditional covariances $\check\Omega_{00}$, $\check\Omega_{\check\beta\check\beta}$ and $\check\Omega_{0\check\beta}$ defined in Lemma A10. Define also $\check\Omega_{\check\beta 0} = \check\Omega_{0\check\beta}'$. Finally, we denote as $\check\beta = J\beta$ the matrix of cointegration vectors of $\check F_t = JF_t$ and its orthogonal complement as $\check\beta_\perp$, such that $\check\beta_\perp'\check\beta = 0_{(r-c)\times c}$.

Let us start from part (i). Notice that if we denote the residuals of the regression of $\Delta\widehat F_t$ and of $\widehat F_{t-1}$ on $\Delta\widehat F_{t-1}$ as $\widehat e_{0t}$ and $\widehat e_{1t}$, respectively, then $\widehat S_{ij} = T^{-1}\sum_{t=1}^{T}\widehat e_{it}\widehat e_{jt}'$, with $i,j=0,1$. Consider the generalized eigenvalues problem
\[
\det\big(\widehat\mu_j\widehat S_{11}-\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\big) = 0,\qquad j=1,\dots,r. \tag{A58}
\]
If $\widehat U$ are the normalized eigenvectors of $\widehat S_{11}^{-1/2}\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\widehat S_{11}^{-1/2}$, then $\widehat P = \widehat S_{11}^{-1/2}\widehat U$ are eigenvectors of $\widehat S_{11}-\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}$ with eigenvalues $\widehat\mu_j$. Then, the estimator $\widehat\beta$ proposed by Johansen (1991, 1995) is given by the $c$ columns of $\widehat P$ corresponding to the $c$ largest eigenvalues.
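The construction around (A58) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `johansen_beta` and the simulated inputs are hypothetical, and the symmetric inverse square root of $S_{11}$ is obtained from its eigendecomposition. The function solves $S_{10}S_{00}^{-1}S_{01}p_j = \mu_j S_{11}p_j$ and returns the $c$ columns associated with the $c$ largest $\mu_j$, normalized so that $\beta'S_{11}\beta = I_c$.

```python
import numpy as np

def johansen_beta(S00, S01, S11, c):
    """Solve det(mu * S11 - S10 S00^{-1} S01) = 0 and keep the top-c vectors.

    Returns (mu, P, beta): eigenvalues in decreasing order, the matrix P of
    generalized eigenvectors with P' S11 P = I, and beta = first c columns of P.
    """
    w, V = np.linalg.eigh(S11)
    S11_inv_half = V @ np.diag(w ** -0.5) @ V.T      # symmetric S11^{-1/2}
    S10 = S01.T
    C = S11_inv_half @ S10 @ np.linalg.solve(S00, S01) @ S11_inv_half
    C = (C + C.T) / 2                                # remove rounding asymmetry
    mu, U = np.linalg.eigh(C)                        # ascending order
    order = np.argsort(mu)[::-1]
    mu, U = mu[order], U[:, order]
    P = S11_inv_half @ U
    return mu, P, P[:, :c]

# Simulated moment matrices (hypothetical inputs, only to exercise the function).
rng = np.random.default_rng(0)
Z0 = rng.standard_normal((300, 4))
Z1 = Z0 @ rng.standard_normal((4, 4)) + rng.standard_normal((300, 4))
S00, S01, S11 = Z0.T @ Z0 / 300, Z0.T @ Z1 / 300, Z1.T @ Z1 / 300
mu, P, beta_hat = johansen_beta(S00, S01, S11, c=2)
```

The eigenvalues returned are squared sample canonical correlations, so they lie in $[0,1]$; working with the symmetrized matrix keeps the eigenvectors orthonormal before they are mapped back through $S_{11}^{-1/2}$.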
Analogously define $\widehat U^0$ as the normalized eigenvectors of $S_{11}^{-1/2}S_{10}S_{00}^{-1}S_{01}S_{11}^{-1/2}$ and define $\widehat P^0 = S_{11}^{-1/2}\widehat U^0$. Then the estimator $\widehat\beta^0$ that we would obtain if estimating a VECM on $\check F_t$ is the matrix of the $c$ columns of $\widehat P^0$ corresponding to the $c$ largest eigenvalues $\widehat\mu_j^0$ of $S_{11}-S_{10}S_{00}^{-1}S_{01}$, and such that
\[
\det\big(\widehat\mu_j^0S_{11}-S_{10}S_{00}^{-1}S_{01}\big) = 0,\qquad j=1,\dots,r. \tag{A59}
\]
Notice that by definition the two estimators $\widehat\beta$ and $\widehat\beta^0$ are normalized in such a way that $\widehat\beta'\widehat S_{11}\widehat\beta = I_c$ and $\widehat\beta^{0\prime}S_{11}\widehat\beta^0 = I_c$. Consider then the $r\times r$ matrix
\[
A_T = \Big(\check\beta \quad \frac{\check\beta_{\perp*}}{\sqrt T}\Big),
\]

where $\check\beta_{\perp*} = \check\beta_\perp(\check\beta_\perp'\check\beta_\perp)^{-1}$, and consider the equations
\[
\det\big[A_T'\big(\widehat\mu_j\widehat S_{11}-\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\big)A_T\big] = 0,\qquad j=1,\dots,r, \tag{A60}
\]
\[
\det\big[A_T'\big(\widehat\mu_j^0S_{11}-S_{10}S_{00}^{-1}S_{01}\big)A_T\big] = 0,\qquad j=1,\dots,r. \tag{A61}
\]
Clearly (A60) has the same solutions as (A58), but its eigenvectors are now given by $A_T^{-1}\widehat P$ and those corresponding to the largest $c$ eigenvalues are $A_T^{-1}\widehat\beta$. Analogously for (A61) we have the eigenvectors $A_T^{-1}\widehat P^0$ and the $c$ largest are given by $A_T^{-1}\widehat\beta^0$. Moreover,
\[
\begin{aligned}
A_T'\big(\widehat S_{11}-\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\big)A_T
&= \begin{bmatrix} \check\beta'\widehat S_{11}\check\beta & T^{-1/2}\check\beta'\widehat S_{11}\check\beta_{\perp*} \\ T^{-1/2}\check\beta_{\perp*}'\widehat S_{11}\check\beta & T^{-1}\check\beta_{\perp*}'\widehat S_{11}\check\beta_{\perp*} \end{bmatrix}
- \begin{bmatrix} \check\beta'\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\check\beta & T^{-1/2}\check\beta'\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\check\beta_{\perp*} \\ T^{-1/2}\check\beta_{\perp*}'\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\check\beta & T^{-1}\check\beta_{\perp*}'\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\check\beta_{\perp*} \end{bmatrix} \\
&= \begin{bmatrix} \check\beta'S_{11}\check\beta & T^{-1/2}\check\beta'S_{11}\check\beta_{\perp*} \\ T^{-1/2}\check\beta_{\perp*}'S_{11}\check\beta & T^{-1}\check\beta_{\perp*}'S_{11}\check\beta_{\perp*} \end{bmatrix}
- \begin{bmatrix} \check\beta'S_{10}S_{00}^{-1}S_{01}\check\beta & T^{-1/2}\check\beta'S_{10}S_{00}^{-1}S_{01}\check\beta_{\perp*} \\ T^{-1/2}\check\beta_{\perp*}'S_{10}S_{00}^{-1}S_{01}\check\beta & T^{-1}\check\beta_{\perp*}'S_{10}S_{00}^{-1}S_{01}\check\beta_{\perp*} \end{bmatrix}
+ O_p(\vartheta_{nT,\delta}) \\
&= A_T'\big(S_{11}-S_{10}S_{00}^{-1}S_{01}\big)A_T + O_p(\vartheta_{nT,\delta}).
\end{aligned} \tag{A62}
\]
This result is proved by using Lemma A9(ii), A9(iii) and A9(vi) for the first term on the rhs, and by using Lemma A9(i), A9(iv) and A9(v) for the second term. Thus, from (A62), for any $j=1,\dots,r$, from Weyl's inequality (A3), we have
\[
\big|\widehat\mu_j-\widehat\mu_j^0\big| \le \big\|A_T'\big(\widehat S_{11}-\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\big)A_T - A_T'\big(S_{11}-S_{10}S_{00}^{-1}S_{01}\big)A_T\big\| = O_p(\vartheta_{nT,\delta}). \tag{A63}
\]
Moreover, always from (A62) and similarly to (A27), it can be shown that, by Theorem 2 in Yu et al. (2015), we have (notice that the $\widehat\mu_j^0$ are all positive since they are eigenvalues of a positive definite matrix)
\[
\big\|A_T^{-1}\widehat P - A_T^{-1}\widehat P^0J_r\big\| = O_p(\vartheta_{nT,\delta}), \tag{A64}
\]
where $J_r$ is a diagonal $r\times r$ matrix with entries 1 or $-1$, different from $J$.

Then, from Lemmas A5(ii) and A10, (A62), and Slutsky's theorem, as $n,T\to\infty$, we have (see also Lemma 13.1 in Johansen, 1995)
\[
\begin{aligned}
\det\big[A_T'\big(\widehat\mu_j\widehat S_{11}-\widehat S_{10}\widehat S_{00}^{-1}\widehat S_{01}\big)A_T\big]
&= \det\big[A_T'\big(\widehat\mu_j^0S_{11}-S_{10}S_{00}^{-1}S_{01}\big)A_T\big] + O_p(\vartheta_{nT,\delta}) \\
&\overset{d}{\to} \det\big(\widehat\mu_j^0\check\Omega_{\check\beta\check\beta}-\check\Omega_{\check\beta 0}\check\Omega_{00}^{-1}\check\Omega_{0\check\beta}\big)\,
\det\Big[\widehat\mu_j^0\,\check\beta_{\perp*}'\big(\Gamma_{L0}^{\Delta F}\big)^{1/2}\Big(\int_0^1 W_r(\tau)W_r'(\tau)\,d\tau\Big)\big(\Gamma_{L0}^{\Delta F}\big)^{1/2}\check\beta_{\perp*}\Big],
\end{aligned} \tag{A65}
\]
where $W_r(\cdot)$ is an $r$-dimensional Brownian motion with covariance of rank $q-d = r-c$. The first term on the rhs of (A65) has only $c$ solutions different from zero (the matrix is positive definite) while the remaining $r-c$ solutions come from the second term and are all zero. Therefore, as $n,T\to\infty$ both $A_T^{-1}\widehat P$ and $A_T^{-1}\widehat P^0$ span a space of dimension $c$ given by their first $c$ eigenvectors.

This, jointly with (A64), implies that the two spaces coincide asymptotically:
\[
\big\|A_T^{-1}\widehat\beta - A_T^{-1}\widehat\beta^0J_c\big\| = O_p(\vartheta_{nT,\delta}), \tag{A66}
\]
where $J_c$ is a $c\times c$ diagonal matrix with entries 1 or $-1$, different from $J$ and $J_r$.

Now, by projecting $\widehat\beta$ onto the space spanned by $(\check\beta,\check\beta_\perp)$, we can write
\[
\widehat\beta = \check\beta(\check\beta'\check\beta)^{-1}\check\beta'\widehat\beta + \check\beta_\perp(\check\beta_\perp'\check\beta_\perp)^{-1}\check\beta_\perp'\widehat\beta
= \check\beta\check\beta_*'\widehat\beta + \check\beta_{\perp*}\check\beta_\perp'\widehat\beta,
\]
where $\check\beta_* = \check\beta(\check\beta'\check\beta)^{-1}$ and $\check\beta_{\perp*} = \check\beta_\perp(\check\beta_\perp'\check\beta_\perp)^{-1}$. Analogously we have a similar projection for $\widehat\beta^0$ and we define the transformed estimators
\[
\widetilde\beta = \widehat\beta(\check\beta_*'\widehat\beta)^{-1} = \check\beta + \check\beta_{\perp*}\check\beta_\perp'\widetilde\beta,\qquad
\widetilde\beta^0 = \widehat\beta^0(\check\beta_*'\widehat\beta^0)^{-1} = \check\beta + \check\beta_{\perp*}\check\beta_\perp'\widetilde\beta^0. \tag{A67}
\]
From Lemma 13.1 in Johansen (1995), we have (recall that $\check\beta_\perp'\check\beta = 0_{(r-c)\times c}$)
\[
A_T^{-1}\widetilde\beta^0 = A_T^{-1}\big(\check\beta + \check\beta_{\perp*}\check\beta_\perp'\widetilde\beta^0\big)
= \begin{pmatrix} I_c \\ \sqrt T\,\check\beta_\perp'\widetilde\beta^0 \end{pmatrix}
= \begin{pmatrix} I_c \\ \sqrt T\,\check\beta_\perp'(\widetilde\beta^0-\check\beta) \end{pmatrix}
= \begin{pmatrix} I_c \\ o_p(1) \end{pmatrix}, \tag{A68}
\]
since $A_T^{-1}\widetilde\beta^0$ spans a space of dimension $c$. In the same way, we have
\[
A_T^{-1}\widetilde\beta
= \begin{pmatrix} I_c \\ \sqrt T\,\check\beta_\perp'\widetilde\beta \end{pmatrix}
= \begin{pmatrix} I_c \\ \sqrt T\,\check\beta_\perp'(\widetilde\beta-\check\beta) \end{pmatrix}
= \begin{pmatrix} I_c \\ \sqrt T\,\check\beta_\perp'(\widetilde\beta^0-\check\beta) + \sqrt T\,\check\beta_\perp'(\widetilde\beta-\widetilde\beta^0) \end{pmatrix}. \tag{A69}
\]
Now since $\mathrm{sp}(A_T^{-1}\widetilde\beta) = \mathrm{sp}(A_T^{-1}\widehat\beta)$, also (A69) spans a space of dimension $c$. Then by comparing (A68) and (A69), using (A66), and since also $\mathrm{sp}(A_T^{-1}\widetilde\beta^0) = \mathrm{sp}(A_T^{-1}\widehat\beta^0)$, we have
\[
\big\|\sqrt T\,\check\beta_\perp'(\widetilde\beta-\widetilde\beta^0)\big\| = \big\|A_T^{-1}\widetilde\beta - A_T^{-1}\widetilde\beta^0\big\| = O_p(\vartheta_{nT,\delta}). \tag{A70}
\]
Therefore, given that $\|\check\beta_\perp'\| = O(1)$ and given (A68) and (A70), we have
\[
\big\|\widetilde\beta-\check\beta\big\| \le \big\|\widetilde\beta^0-\check\beta\big\| + \big\|\widetilde\beta^0-\widetilde\beta\big\| = o_p\Big(\frac{1}{\sqrt T}\Big) + O_p\Big(\frac{\vartheta_{nT,\delta}}{\sqrt T}\Big). \tag{A71}
\]
From (A67), we can always define a $c\times c$ orthogonal matrix $Q$ such that $\widetilde\beta Q = \widehat\beta$ (see also pp. 179-180 in Johansen, 1995, for a discussion about the choice of the identification matrix $Q$). Therefore, we have
\[
\big\|\widehat\beta-\check\beta Q\big\| = O_p\Big(\frac{\vartheta_{nT,\delta}}{\sqrt T}\Big),
\]
which completes the proof of part (i).

Once we have $\widehat\beta$, the other parameters are estimated by linear regression as
\[
\widehat\alpha = \widehat S_{01}\widehat\beta\big(\widehat\beta'\widehat S_{11}\widehat\beta\big)^{-1},\qquad
\widehat G_1 = \big(\widehat M_{02}-\widehat\alpha\widehat\beta'\widehat M_{12}\big)\widehat M_{22}^{-1}. \tag{A72}
\]
For part (ii), first notice that, by definition, from a VECM for $F_t$ we have
\[
\alpha = E[\Delta F_tF_{t-1}'\beta\,|\,\Delta F_{t-1}]\big(E[\beta'F_{t-1}F_{t-1}'\beta\,|\,\Delta F_{t-1}]\big)^{-1}.
\]

Therefore, since conditioning on $\Delta F_{t-1}$ is equivalent to conditioning on $J\Delta F_{t-1} = \Delta\check F_{t-1}$, and $\beta'F_t = \check\beta'\check F_t$, we immediately have
\[
\begin{aligned}
\check\alpha = H\alpha &= H\,E[\Delta F_t\check F_{t-1}'\check\beta\,|\,\Delta\check F_{t-1}]\big(E[\check\beta'\check F_{t-1}\check F_{t-1}'\check\beta\,|\,\Delta\check F_{t-1}]\big)^{-1} \\
&= E[\Delta\check F_t\check F_{t-1}'\check\beta\,|\,\Delta\check F_{t-1}]\big(E[\check\beta'\check F_{t-1}\check F_{t-1}'\check\beta\,|\,\Delta\check F_{t-1}]\big)^{-1}
= \check\Omega_{0\check\beta}\check\Omega_{\check\beta\check\beta}^{-1}.
\end{aligned}
\]
Then,
\[
\big\|\widehat S_{01}\widehat\beta-\check\Omega_{0\check\beta}Q\big\| \le \big\|\widehat S_{01}(\widehat\beta-\check\beta Q)\big\| + \big\|\widehat S_{01}\check\beta Q-S_{01}\check\beta Q\big\| + \big\|S_{01}\check\beta Q-\check\Omega_{0\check\beta}Q\big\| = O_p(\vartheta_{nT,\delta}), \tag{A73}
\]
using part (i) and the fact that $\|\widehat S_{01}\| = O_p(T^{1/2})$ for the first term on the rhs, Lemma A9(iv) for the second term, and Lemma A10(iii) for the third term. Analogously we have
\[
\begin{aligned}
\big\|\widehat\beta'\widehat S_{11}\widehat\beta-Q'\check\Omega_{\check\beta\check\beta}Q\big\|
&\le \big\|(\widehat\beta'-Q'\check\beta')\widehat S_{11}(\widehat\beta-\check\beta Q)\big\| + \big\|Q'\check\beta'\widehat S_{11}\check\beta Q-Q'\check\beta'S_{11}\check\beta Q\big\| \\
&\quad + \big\|Q'\check\beta'S_{11}\check\beta Q-Q'\check\Omega_{\check\beta\check\beta}Q\big\| = O_p(\vartheta_{nT,\delta}),
\end{aligned} \tag{A74}
\]
using part (i) and the fact that $\|\widehat S_{11}\| = O_p(T)$ for the first term, Lemma A9(ii) for the second term, and Lemma A10(ii) for the third term. Therefore, from (A72), (A73), and (A74), and since $Q$ is orthogonal, we have
\[
\big\|\widehat\alpha-\check\alpha Q\big\| = O_p(\vartheta_{nT,\delta}),
\]
which proves part (ii).

For part (iii), notice that, by definition, we have
\[
\check G_1 = HG_1H' = \big(\Gamma_1^{\Delta\check F}-\check\alpha E[\check\beta'\check F_{t-1}\Delta\check F_{t-1}']\big)\big(\Gamma_0^{\Delta\check F}\big)^{-1}. \tag{A75}
\]
Then, from (A72),
\[
\begin{aligned}
\big\|\widehat G_1-\check G_1\big\|
&\le \big\|\big(\widehat M_{02}-\widehat\alpha\widehat\beta'\widehat M_{12}\big)\widehat M_{22}^{-1} - \big(\widehat M_{02}-\check\alpha\check\beta'\widehat M_{12}\big)\widehat M_{22}^{-1}\big\| \\
&\quad + \big\|\big(\widehat M_{02}-\check\alpha\check\beta'\widehat M_{12}\big)\widehat M_{22}^{-1} - \big(M_{02}-\check\alpha\check\beta'M_{12}\big)M_{22}^{-1}\big\| \\
&\quad + \big\|\big(M_{02}-\check\alpha\check\beta'M_{12}\big)M_{22}^{-1} - \big(\Gamma_1^{\Delta\check F}-\check\alpha E[\check\beta'\check F_{t-1}\Delta\check F_{t-1}']\big)\big(\Gamma_0^{\Delta\check F}\big)^{-1}\big\| = O_p(\vartheta_{nT,\delta}),
\end{aligned}
\]
since the first term on the rhs is $O_p(\vartheta_{nT,\delta})$ by parts (i) and (ii) and since $\check\alpha QQ'\check\beta' = \check\alpha\check\beta'$, the second term is $O_p(\vartheta_{nT,\delta})$ by Lemma A8(i), A8(iv) and A8(vii), and the third term is $O_p(T^{-1/2})$ by Lemmas A1 and A5(vi), and in particular by (A12) and (A45). This, together with (A75), proves part (iii).

Finally, for part (iv), first notice that the sample covariance of the VECM residuals $\widehat w_t = \Delta\widehat F_t-\widehat\alpha\widehat\beta'\widehat F_{t-1}-\widehat G_1\Delta\widehat F_{t-1}$ can also be written as
\[
\begin{aligned}
\widehat\Gamma_0^w = \frac{1}{T}\sum_{t=1}^{T}\widehat w_t\widehat w_t'
&= \frac{1}{T}\sum_{t=1}^{T}\big(\Delta\widehat F_t-\widehat\alpha\widehat\beta'\widehat F_{t-1}-\widehat G_1\Delta\widehat F_{t-1}\big)\big(\Delta\widehat F_t-\widehat\alpha\widehat\beta'\widehat F_{t-1}-\widehat G_1\Delta\widehat F_{t-1}\big)' \\
&= \widehat M_{00} + \widehat\alpha\widehat\beta'\widehat M_{11}\widehat\beta\widehat\alpha' + \widehat G_1\widehat M_{22}\widehat G_1'
- \widehat M_{01}\widehat\beta\widehat\alpha' - \widehat\alpha\widehat\beta'\widehat M_{10}
- \widehat M_{02}\widehat G_1' - \widehat G_1\widehat M_{20}
+ \widehat\alpha\widehat\beta'\widehat M_{12}\widehat G_1' + \widehat G_1\widehat M_{21}\widehat\beta\widehat\alpha'.
\end{aligned}
\]

Then from parts (i), (ii) and (iii), Lemma A8(ii) through A8(vii), and Lemma A5(i) and A5(vi), we can prove that
\[
\big\|\widehat\Gamma_0^w-J\Gamma_0^wJ\big\| = O_p(\vartheta_{nT,\delta}), \tag{A76}
\]
where
\[
\Gamma_0^w = E[w_tw_t'] = E\big[(\Delta F_t-\alpha\beta'F_{t-1}-G_1\Delta F_{t-1})(\Delta F_t-\alpha\beta'F_{t-1}-G_1\Delta F_{t-1})'\big].
\]
Notice that by (16), we have $w_t = Ku_t$, therefore, since the shocks $u_t$ are orthonormal by Assumption 4, we have $\Gamma_0^w = KK'$. Moreover, from Assumption 1 and the model given in (8), $K = C(0) = Q(0)$ has rank $q$ and so $\Gamma_0^w$ has also rank $q$. We denote as $\mu_j^w$ the eigenvalues of $\Gamma_0^w$, thus $\mu_j^w = 0$ if and only if $j>q$. These are also eigenvalues of $J\Gamma_0^wJ$. As a consequence, having defined as $\widehat\mu_j^w$ the eigenvalues of $\widehat\Gamma_0^w$, from (A76) and Weyl's inequality (A3), we have
\[
\big|\widehat\mu_j^w-\mu_j^w\big| \le \big\|\widehat\Gamma_0^w-J\Gamma_0^wJ\big\| = O_p(\vartheta_{nT,\delta}),\qquad j=1,\dots,q. \tag{A77}
\]
If we denote by $W_q$ the $r\times q$ matrix of normalised eigenvectors of $\Gamma_0^w$ corresponding to its non-zero eigenvalues, then $JW_q$ are the normalised eigenvectors of $J\Gamma_0^wJ$. We denote as $\widehat W_q$ the $r\times q$ matrix of normalised eigenvectors of $\widehat\Gamma_0^w$. Then, from (A76), by Theorem 2 in Yu et al. (2015), we can prove that
\[
\big\|\widehat W_q-JW_qJ_q\big\| = O_p(\vartheta_{nT,\delta}), \tag{A78}
\]
where $J_q$ is a diagonal $q\times q$ matrix with entries 1 or $-1$, different from $J$. Notice that $JW_qJ_q$ are also normalised eigenvectors of $J\Gamma_0^wJ$. From the definition of $\widehat K = \widehat W_q\widehat D_q^{-1/2}$ and (A77) and (A78), we have
\[
\big\|\widehat K-JW_qJ_qD_q^{-1/2}\big\| = O_p(\vartheta_{nT,\delta}), \tag{A79}
\]
where $D_q$ is a diagonal matrix with entries $\mu_j^w$ for $j=1,\dots,q$ and $W_q$ contains the corresponding eigenvectors. For any $q\times q$ orthogonal matrix $R$ such that $K = W_qJ_qD_q^{-1/2}R$, by substituting in (A79), we have the result. Notice that $K'\Gamma_0^wK = I_q$ as requested by Assumption 1(a) of orthonormality of the shocks.
This completes the proof. □

A.5 Proof of Proposition 1
The estimated VECM with $p=1$ can always be written as a VAR(2) with estimated matrix polynomial
\[
\widehat A^{\mathrm{VECM}}(L) = I_r-\widehat A_1^{\mathrm{VECM}}L-\widehat A_2^{\mathrm{VECM}}L^2,
\]
where $\widehat A_1^{\mathrm{VECM}} = \widehat G_1+\widehat\alpha\widehat\beta'+I_r$, and $\widehat A_2^{\mathrm{VECM}} = -\widehat G_1$. Then, from Lemma 4(i), 4(ii) and 4(iii), we have, for $k=1,2$,
\[
\big\|\widehat A_k^{\mathrm{VECM}}-JA_kJ\big\| = O_p(\vartheta_{nT,\delta}). \tag{A80}
\]
Define the infinite matrix polynomial
\[
\widehat B(L) = \big[\widehat A^{\mathrm{VECM}}(L)\big]^{-1} = \big(I_r-\widehat A_1^{\mathrm{VECM}}L-\widehat A_2^{\mathrm{VECM}}L^2\big)^{-1} = \sum_{k=0}^{\infty}\widehat B_kL^k,
\]
such that $\widehat B(0) = I_r$, $\widehat B_1 = \widehat A_1^{\mathrm{VECM}}$, $\widehat B_2 = \widehat A_1^{\mathrm{VECM}}\widehat B_1+\widehat A_2^{\mathrm{VECM}}$, $\widehat B_3 = \widehat A_1^{\mathrm{VECM}}\widehat B_2+\widehat A_2^{\mathrm{VECM}}\widehat B_1$, and so on. Then, from (A80), we have, for any $k\ge 0$,
\[
\big\|\widehat B_k-JB_kJ\big\| = O_p(\vartheta_{nT,\delta}). \tag{A81}
\]
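The mapping from the VECM coefficients to the VAR(2) form and the recursion for the coefficients $\widehat B_k$ can be sketched as below. The parameter values are arbitrary illustrative numbers, not estimates from the paper, and the function names are hypothetical. The recursion $B_k = A_1B_{k-1}+A_2B_{k-2}$ is cross-checked against powers of the companion matrix.

```python
import numpy as np

def vecm_to_var2(alpha, beta, G1):
    """VAR(2) matrices implied by dF_t = alpha beta' F_{t-1} + G1 dF_{t-1} + w_t."""
    r = G1.shape[0]
    A1 = G1 + alpha @ beta.T + np.eye(r)
    A2 = -G1
    return A1, A2

def ma_coefficients(A1, A2, horizon):
    """B_0, ..., B_horizon with B_0 = I, B_1 = A1, B_k = A1 B_{k-1} + A2 B_{k-2}."""
    r = A1.shape[0]
    B = [np.eye(r), A1.copy()]
    for k in range(2, horizon + 1):
        B.append(A1 @ B[k - 1] + A2 @ B[k - 2])
    return B[:horizon + 1]

# Illustrative values: r = 3 factors, c = 1 cointegration relation.
alpha = np.array([[-0.3], [0.1], [0.0]])
beta = np.array([[1.0], [-1.0], [0.0]])
G1 = 0.2 * np.eye(3)
A1, A2 = vecm_to_var2(alpha, beta, G1)
B = ma_coefficients(A1, A2, horizon=10)

# Companion form of the VAR(2): the top-left block of its k-th power is B_k.
r = 3
companion = np.block([[A1, A2], [np.eye(r), np.zeros((r, r))]])
```

Because the system has unit roots, the $B_k$ do not decay to zero; the recursion is nevertheless well defined at every finite horizon, which is the case covered by (A81).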

The estimated impulse response of variable $i$ is then a $q$-dimensional row vector defined as (see (21))
\[
\widehat\phi_i^{\mathrm{VECM}\prime}(L) = \widehat\lambda_i'\widehat B(L)\widehat K\widehat R',
\]
where $\widehat\lambda_i'$ is the $i$-th row of $\widehat\Lambda$. The matrix $R$ is estimated by $\widehat R \equiv \widehat R(\widehat\Lambda,\widehat A^{\mathrm{VECM}}(L),\widehat K)$. To estimate this mapping we have to impose $q(q+1)/2$ restrictions on the IRFs, i.e. at most only on $q(q+1)/2$ variables. So $\widehat R$ depends only on $q(q+1)/2$ rows of $\widehat\Lambda$ and for regular identification schemes, such that this mapping is analytical, using Lemmas 3(i) and 4(iv), and (A80), we have (see Forni et al., 2009)
\[
\big\|\widehat R-R\big\| = O_p(\vartheta_{nT,\delta}). \tag{A82}
\]
Finally, from Lemma 3(i), we have, for any $i\in\mathbb N$,
\[
\big\|\widehat\lambda_i'-\lambda_i'H'\big\| = O_p\Big(\max\Big(\frac{1}{\sqrt T},\frac{1}{\sqrt n}\Big)\Big). \tag{A83}
\]
Therefore, for any $i\in\mathbb N$ and $k\ge 0$, we have
\[
\begin{aligned}
\big\|\widehat\phi_{ik}^{\mathrm{VECM}\prime}-\phi_{ik}^{\mathrm{VECM}\prime}\big\|
&= \big\|\widehat\lambda_i'\widehat B_k\widehat K\widehat R'-\lambda_i'B_kK\big\| \\
&= \big\|(\widehat\lambda_i'-\lambda_i'J+\lambda_i'J)(\widehat B_k-JB_kJ+HB_kJ)(\widehat K-JKR+JKR)(\widehat R'-R'+R')-\lambda_i'B_kK\big\| \\
&\le \big\|\widehat\lambda_i'-\lambda_i'J\big\|\,\big\|HB_kJJKRR'\big\| + \big\|\lambda_i'J\big\|\,\big\|\widehat B_k-JB_kJ\big\|\,\big\|JKRR'\big\| \\
&\quad + \big\|\lambda_i'JJB_kJ\big\|\,\big\|\widehat K-JKR\big\|\,\big\|R'\big\| + \big\|\lambda_i'JJB_kJJKR\big\|\,\big\|\widehat R'-R'\big\| \\
&\quad + \big\|\lambda_i'JJB_kJJKRR'-\lambda_i'B_kK\big\| + o_p(\vartheta_{nT,\delta})
= O_p\Big(\max\Big(\frac{1}{\sqrt T},\frac{1}{\sqrt n}\Big)\Big) + O_p(\vartheta_{nT,\delta}),
\end{aligned}
\]
where we used (A81), (A82), and (A83), orthogonality of $R$, $JJ = I_r$, and the fact that $R$, $K$, $B_k$, $\lambda_i$ are all finite dimensional matrices with norm that does not depend on $n$ or $T$. By (17) it is clear that the upper bound on the rate of convergence is $\vartheta_{nT,\delta}$. This completes the proof. □

A.6 Proof of Lemma 5
Define the $r\times r$ transformation $D = (\beta\ \ \beta_\perp)'$, where $\beta$ is the $r\times c$ matrix of cointegration vectors of $F_t$, and $\beta_\perp$ is such that $\beta_\perp'\beta = 0_{(r-c)\times c}$. Then, the vector process $Z_t = DF_t$ is partitioned into an $I(0)$ vector $Z_{0t} = \beta'F_t$ and an $I(1)$ vector $Z_{1t} = \beta_\perp'F_t$. The vectors $Z_{0t}$ and $Z_{1t}$ are orthogonal.

Now consider the models for $F_t$, $Z_{0t}$, and $Z_{1t}$:
\[
F_t = A_1F_{t-1}+w_t,\qquad Z_{0t} = Q_0F_{t-1}+\beta'w_t,\qquad Z_{1t} = Q_1F_{t-1}+\beta_\perp'w_t,
\]
where $Q_0$ is $c\times r$ and $Q_1$ is $(r-c)\times r$, and $w_t = Ku_t$. Denote the ordinary least squares estimators of the above models, when using $F_t$, as $\widehat A_1^{1\mathrm{VAR}}$, $\widehat Q_0$, and $\widehat Q_1$. Then,
\[
\big\|\widehat Q_0-Q_0\big\| = \Big\|\Big(\frac{1}{T}\sum_{t=1}^{T}\beta'F_{t-1}u_t'K'\beta\Big)\Big(\frac{1}{T}\sum_{t=1}^{T}\beta'F_{t-1}F_{t-1}'\beta\Big)^{-1}\Big\| = O_p\Big(\frac{1}{\sqrt T}\Big). \tag{A84}
\]
Indeed, the first term on the rhs is $O_p(T^{-1/2})$ from (A38) and by independence of $u_t$ in Assumption

4(a), while the second term is $O_p(1)$ by Lemma A5(v). Similarly,
\[
\big\|\widehat Q_1-Q_1\big\| = \Big\|\Big(\frac{1}{T^2}\sum_{t=1}^{T}\beta_\perp'F_{t-1}u_t'K'\beta_\perp\Big)\Big(\frac{1}{T^2}\sum_{t=1}^{T}\beta_\perp'F_{t-1}F_{t-1}'\beta_\perp\Big)^{-1}\Big\| = O_p\Big(\frac{1}{T}\Big). \tag{A85}
\]
Indeed, the first term on the rhs is $O_p(T^{-1})$ from (A38) and by independence of $u_t$ in Assumption 4(a), while the second term is $O_p(1)$ by Lemma A5(ii). Moreover,
\[
\mathrm{vec}\big(\widehat A_1^{1\mathrm{VAR}}\big) = (I_r\otimes D')\begin{pmatrix}\mathrm{vec}(\widehat Q_0)\\ \mathrm{vec}(\widehat Q_1)\end{pmatrix}. \tag{A86}
\]
Analogous formulas to (A84)-(A86) are in Theorem 1 by Sims et al. (1990) and, by combining them,
\[
\big\|\widehat A_1^{1\mathrm{VAR}}-A_1\big\| = O_p\Big(\frac{1}{\sqrt T}\Big). \tag{A87}
\]
Notice that of the $r^2$ parameters in $A_1$, the $cr$ in $Q_0$ are estimated consistently with rate $O_p(T^{-1/2})$, while the $(r-c)r$ in $Q_1$ with rate $O_p(T^{-1})$.

If we now denote as $\widehat A_1^{0\mathrm{VAR}}$ the ordinary least squares estimator for the VAR when using $JF_t$, then $\widehat A_1^{0\mathrm{VAR}} = J\widehat A_1^{1\mathrm{VAR}}J$, and from (A87)
\[
\big\|\widehat A_1^{0\mathrm{VAR}}-JA_1J\big\| = O_p\Big(\frac{1}{\sqrt T}\Big). \tag{A88}
\]
Define
\[
\widehat M_{1L} = \frac{1}{T}\sum_{t=1}^{T}\widehat F_t\widehat F_{t-1}',\qquad \widehat M_{LL} = \frac{1}{T}\sum_{t=1}^{T}\widehat F_{t-1}\widehat F_{t-1}'. \tag{A89}
\]
Then, we can write the VAR estimators as
\[
\widehat A_1^{\mathrm{VAR}} = \frac{\widehat M_{1L}}{T}\Big(\frac{\widehat M_{LL}}{T}\Big)^{-1},\qquad
\widehat A_1^{0\mathrm{VAR}} = \frac{M_{1L}}{T}\Big(\frac{M_{LL}}{T}\Big)^{-1}, \tag{A90}
\]
where $M_{1L}$ and $M_{LL}$ are defined as in (A89), but when using $JF_t$. Because of Lemma A8(i), we have
\[
\Big\|\frac{\widehat M_{1L}}{T}-\frac{M_{1L}}{T}\Big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big),\qquad
\Big\|\frac{\widehat M_{LL}}{T}-\frac{M_{LL}}{T}\Big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big),
\]
thus
\[
\big\|\widehat A_1^{\mathrm{VAR}}-\widehat A_1^{0\mathrm{VAR}}\big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big). \tag{A91}
\]
By combining (A91) with (A88)
\[
\big\|\widehat A_1^{\mathrm{VAR}}-JA_1J\big\| \le \big\|\widehat A_1^{\mathrm{VAR}}-\widehat A_1^{0\mathrm{VAR}}\big\| + \big\|\widehat A_1^{0\mathrm{VAR}}-JA_1J\big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big), \tag{A92}
\]

which completes the proof of part (i). By noticing that, as a consequence of part (i), (A76) holds also in this case, but with the rate given in (A92), we prove part (iii) exactly as in Lemma 4(iv). This completes the proof. □

A.7 Proof of Proposition 2
Define
\[
\widehat B(L) = \big[\widehat A^{\mathrm{VAR}}(L)\big]^{-1} = \big(I_r-\widehat A_1^{\mathrm{VAR}}L\big)^{-1} = \sum_{k=0}^{\infty}\widehat B_kL^k,
\]
such that $\widehat B_k = (\widehat A_1^{\mathrm{VAR}})^k$. Then, from Lemma 5(i), we have, for any finite $k\ge 0$,
\[
\big\|\widehat B_k-JB_kJ\big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big). \tag{A93}
\]
If instead $k\to\infty$, then $\widehat B_k$ has as limit for $n,T\to\infty$ a random variable rather than $B_k$ (see Theorem 3.2 in Phillips, 1998), hence
\[
\lim_{k\to\infty}\big\|\widehat B_k-B_k\big\| = O_p(1). \tag{A94}
\]
The estimated impulse response of variable $i$ is then the $q$-dimensional row vector (see (24))
\[
\widehat\phi_i^{\mathrm{VAR}\prime}(L) = \widehat\lambda_i'\widehat B(L)\widehat K\widehat R', \tag{A95}
\]
where $\widehat\lambda_i'$ is the $i$-th row of $\widehat\Lambda$ and $\widehat R \equiv \widehat R(\widehat\Lambda,\widehat A^{\mathrm{VAR}}(L),\widehat K)$ is a consistent estimator of the matrix $R$, such that, because of Lemmas 3(i) and 5(i) (see also the proof of Proposition 1),
\[
\big\|\widehat R-R\big\| = O_p\Big(\max\Big(\frac{1}{\sqrt n},\frac{1}{\sqrt T}\Big)\Big). \tag{A96}
\]
Notice that (A96) is true provided that we do not consider long-run restrictions as identification schemes, since in that case $R$ would be a function of $A^{\mathrm{VAR}}(1)$ and it is not consistently estimated because of (A94). Consistency of the estimated IRFs (A95), at each finite lag $k$, is then proved exactly as in the proof of Proposition 1. □

A.8 Proof of Lemma 6
For any $i=1,\dots,n$, recall that we defined $x_{it} = a_i+\lambda_i'F_t+\xi_{it}$ so that $y_{it} = b_it+x_{it}$. The proof of part (i) is straightforward since it amounts to using the sample mean as an estimator of the mean of the stationary and ergodic process $\Delta y_{it}$.

For part (ii), define $\bar y_i = (T+1)^{-1}\sum_{t=0}^{T}y_{it}$ and $\bar x_i = (T+1)^{-1}\sum_{t=0}^{T}x_{it}$, then $\bar y_i = \bar x_i+b_iT/2$. From the least squares trend slope estimator, $\widehat b_i$, in (27) we have
\[
\widehat b_i-b_i = \frac{\sum_{t=0}^{T}(t-\frac{T}{2})(y_{it}-\bar y_i)}{\sum_{t=0}^{T}(t-\frac{T}{2})^2}-b_i
= \frac{\sum_{t=0}^{T}(t-\frac{T}{2})(x_{it}-\bar x_i)}{\sum_{t=0}^{T}(t-\frac{T}{2})^2}
= \frac{\sum_{t=0}^{T}tx_{it}-\frac{T}{2}\sum_{t=0}^{T}x_{it}}{\sum_{t=0}^{T}t^2-\frac{T^2(T+1)}{4}}. \tag{A97}
\]
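The identity in (A97) can be checked numerically. The following sketch simulates a single series with an I(1) component (the values of $a_i$, $b_i$ and $T$ are arbitrary, and the function names are hypothetical) and compares the estimation error of the least squares slope with the closed-form ratio on the rhs of (A97).

```python
import numpy as np

def ols_trend_slope(y):
    """Least squares slope of y_t on (1, t), t = 0, ..., T."""
    T = y.shape[0] - 1
    tc = np.arange(T + 1) - T / 2          # demeaned time index
    return tc @ (y - y.mean()) / (tc @ tc)

def slope_error_formula(x):
    """Right-hand side of (A97): (sum t x_t - T/2 sum x_t) / (sum t^2 - T^2 (T+1) / 4)."""
    T = x.shape[0] - 1
    t = np.arange(T + 1)
    numerator = t @ x - (T / 2) * x.sum()
    denominator = (t ** 2).sum() - T ** 2 * (T + 1) / 4
    return numerator / denominator

rng = np.random.default_rng(0)
T, a, b = 200, 2.0, 0.5                                # illustrative values
x = a + np.cumsum(rng.standard_normal(T + 1))          # I(1) component plus constant
t = np.arange(T + 1)
y = b * t + x
error_direct = ols_trend_slope(y) - b
error_formula = slope_error_formula(x)
```

The two quantities agree up to rounding, and the slope also matches the one returned by `np.polyfit(t, y, 1)[0]`.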

The denominator of (A97) is $O(T^3)$. For the numerator, consider first the case in which $x_{it}\sim I(1)$, then under Assumptions 4(a) and 4(c) of serial independence of the shocks, by Proposition 17.1 parts d and f in Hamilton (1994) we have, as $T\to\infty$,
\[
\frac{1}{T^{3/2}}\sum_{t=0}^{T}x_{it} = O_p(1),\qquad \frac{1}{T^{5/2}}\sum_{t=0}^{T}tx_{it} = O_p(1).
\]
When $x_{it}\sim I(0)$, then, by Proposition 17.1 parts a and c in Hamilton (1994) we have, as $T\to\infty$,
\[
\frac{1}{T^{1/2}}\sum_{t=0}^{T}x_{it} = O_p(1),\qquad \frac{1}{T^{3/2}}\sum_{t=0}^{T}tx_{it} = O_p(1).
\]
Therefore, by multiplying and dividing (A97) by $T^3$ we have the result both for $x_{it}\sim I(1)$ and for $x_{it}\sim I(0)$. This completes the proof. □

A.9 Proof of Lemma 7
For part (i) we can follow a reasoning similar to Lemma 2(i). The spectral density matrix of the first difference of the common factors can be written as $\Sigma^{\Delta F}(\theta) = (2\pi)^{-1}C(e^{-i\theta})C'(e^{-i\theta})$ and, since $\mathrm{rk}(C(e^{-i\theta})) = q$ a.e. in $[-\pi,\pi]$, it has $q$ non-zero real eigenvalues and $r-q$ zero eigenvalues. Notice also that we have $\mathrm{rk}(C(e^{-i\theta}))\le q$ for any $\theta\in[-\pi,\pi]$. Moreover, given square summability of the coefficients of $C(L)$ as a consequence of Assumption 1(a), the non-zero eigenvalues are also finite for any $\theta\in[-\pi,\pi]$. Thus, by denoting as $\mu_j^{\Delta F}(\theta)$ such eigenvalues, there exist positive reals $\underline M_{10}$ and $\overline M_{10}$ such that a.e. in $[-\pi,\pi]$
\[
\underline M_{10}\le\mu_j^{\Delta F}(\theta)\le\overline M_{10},\qquad j=1,\dots,q. \tag{A98}
\]
Therefore, we can write $\Sigma^{\Delta F}(\theta) = W^{\Delta F}(\theta)M^{\Delta F}(\theta)W^{\Delta F\prime}(\theta)$, where $W^{\Delta F}(\theta)$ is the $r\times q$ matrix of normalized eigenvectors, i.e. such that $W^{\Delta F\prime}(\theta)W^{\Delta F}(\theta) = I_q$ for any $\theta\in[-\pi,\pi]$, and $M^{\Delta F}(\theta)$ is the corresponding $q\times q$ diagonal matrix of eigenvalues. Define $L(\theta) = \Lambda W^{\Delta F}(\theta)(M^{\Delta F}(\theta))^{1/2}$ for any $\theta\in[-\pi,\pi]$. Then the spectral density matrix of the first differences of the common component is given by
\[
\frac{\Sigma^{\Delta\chi}(\theta)}{n} = \frac{1}{n}\Lambda\Sigma^{\Delta F}(\theta)\Lambda' = \frac{1}{n}\Lambda W^{\Delta F}(\theta)M^{\Delta F}(\theta)W^{\Delta F\prime}(\theta)\Lambda' = \frac{L(\theta)L'(\theta)}{n},\qquad \theta\in[-\pi,\pi].
\]
Moreover, since because of (14), $n^{-1}\Lambda'\Lambda = I_r$,
\[
\frac{L'(\theta)L(\theta)}{n} = M^{\Delta F}(\theta),\qquad \theta\in[-\pi,\pi]. \tag{A99}
\]
Therefore, a.e. in $[-\pi,\pi]$ the non-zero dynamic eigenvalues of $\Sigma^{\Delta\chi}(\theta)$ are the same as those of $L'(\theta)L(\theta)$, and from (A99), we have for any $n$ and a.e. in $[-\pi,\pi]$, $n^{-1}\mu_j^{\Delta\chi}(\theta) = \mu_j^{\Delta F}(\theta)$, for any $j=1,\dots,r$. Part (i) then follows from (A98).

As for part (ii), from Assumption 3(b), for any $\theta\in[-\pi,\pi]$, there exists a positive real $M_1$

such that (cid:12) ∞ (cid:12) ∞ sup (cid:12) (cid:12)dˇ i (e−iθ) (cid:12) (cid:12) ≤ sup (cid:12) (cid:12) (cid:88) dˇ ik e−ikθ(cid:12) (cid:12) ≤ sup (cid:88)(cid:12) (cid:12)dˇ ik (cid:12) (cid:12) ≤ M 1 . (A100) i∈N i∈N(cid:12) (cid:12) i∈N k=0 k=0 Define as σ (θ) the generic (i,j)-th entry of Σ∆ξ(θ). Then, for any n ∈ N, ij n n θ∈ s [− u π p ,π] (cid:13) (cid:13)Σ∆ξ(θ) (cid:13) (cid:13) 1 = θ∈ s [− u π p ,π]i= m 1, a .. x .,n (cid:88) |σ ij (θ)| = θ∈ s [− u π p ,π]i= m 1, a .. x .,n 2 1 π (cid:88)(cid:12) (cid:12)dˇ i (e−iθ)E[ε it ε jt ] dˇ j (eiθ) (cid:12) (cid:12) j=1 j=1 ≤ M 1 2 max (cid:88) n |E[ε ε ]| ≤ M 1 2M 4 , (A101) it jt 2π i=1,...,n 2π j=1 where we used (A100) and Assumption 4(e). From (A2) and (A101), we have, for any n ∈ N, sup µ∆ξ(θ) = sup (cid:13) (cid:13)Σ∆ξ(θ) (cid:13) (cid:13) ≤ sup (cid:13) (cid:13)Σ∆ξ(θ) (cid:13) (cid:13) ≤ M 1 2M 4 , (A102) 1 1 2π θ∈[−π,π] θ∈[−π,π] θ∈[−π,π] and part (ii) is proved by defining M = M2M (2π)−1. 11 1 4 Finally, parts (iii) and (iv), are immediate consequences of Assumption 3(d), which implies that Σ∆x(θ) = Σ∆χ(θ) + Σ∆ξ(θ), for any θ ∈ [−π,π], and of Weyl’s inequality (A3). So, for j = 1,...,q, and for any n ∈ N and a.e. in [−π,π], there exist positive reals M and M such 12 12 that µ∆x(θ) µ∆χ(θ) µ∆ξ(θ) µ∆ξ(θ) M j ≤ j + 1 ≤ M + sup 1 ≤ M + 11 = M , 10 10 12 n n n n n θ∈[−π,π] µ∆x(θ) µ∆χ(θ) µ∆ξ(θ) µ∆ξ(θ) j j n n ≥ + ≥ M + inf = M . n n n 10 θ∈[−π,π] n 12 because of parts (i) and (ii). This proves part (iii). When j = q+1, using parts (i) and (ii), and since rk(Σ∆χ(θ)) ≤ q, for any θ ∈ [−π,π], we have µ∆x (θ) ≤ µ∆χ (θ)+µ ∆ξ(θ) = µ ∆ξ(θ) ≤ M , q+1 q+1 1 1 11 thus proving part (iv). Finally, for part (v) consider parts (iii) and (iv) but when θ = 0. Then, rk(Σ∆χ(0)) = τ ≤ q which implies M ≤ n−1µ∆χ(0) ≤ M , but µ∆χ (0) = 0. Using again parts (i) and (ii) and 10 τ 10 τ+1 Weyl’s inequality (A3), prove part (v). This completes the proof. 
A.10 Proof of Proposition 3

The proof follows Proposition 2 in Hallin and Liška (2007) when fixing θ = 0, combined with Lemma 7 and the results about spectral density estimation in ?. □

B Data Description and Data Treatment

| No. | Series ID | Definition | Unit | F. | Source | SA | T |
|---|---|---|---|---|---|---|---|
| 1 | INDPRO | Industrial Production Index | 2007=100 | M | FED | 1 | 2 |
| 2 | IPBUSEQ | IP: Business Equipment | 2007=100 | M | FED | 1 | 2 |
| 3 | IPDCONGD | IP: Durable Consumer Goods | 2007=100 | M | FED | 1 | 2 |
| 4 | IPDMAT | IP: Durable Materials | 2007=100 | M | FED | 1 | 2 |
| 5 | IPNCONGD | IP: Nondurable Consumer Goods | 2007=100 | M | FED | 1 | 2 |
| 6 | IPNMAT | IP: Nondurable Materials | 2007=100 | M | FED | 1 | 2 |
| 7 | CPIAUCSL | CPI: All Items | 1982-84=100 | M | BLS | 1 | 3 |
| 8 | CPIENGSL | CPI: Energy | 1982-84=100 | M | BLS | 1 | 3 |
| 9 | CPILEGSL | CPI: All Items Less Energy | 1982-84=100 | M | BLS | 1 | 3 |
| 10 | CPILFESL | CPI: All Items Less Food & Energy | 1982-84=100 | M | BLS | 1 | 3 |
| 11 | CPIUFDSL | CPI: Food | 1982-84=100 | M | BLS | 1 | 3 |
| 12 | CPIULFSL | CPI: All Items Less Food | 1982-84=100 | M | BLS | 1 | 3 |
| 13 | PPICRM | PPI: Crude Materials for Further Processing | 1982=100 | M | BLS | 1 | 3 |
| 14 | PPIENG | PPI: Fuels & Related Products & Power | 1982=100 | M | BLS | 0 | 3 |
| 15 | PPIFGS | PPI: Finished Goods | 1982=100 | M | BLS | 1 | 3 |
| 16 | PPIIDC | PPI: Industrial Commodities | 1982=100 | M | BLS | 0 | 3 |
| 17 | PPICPE | PPI: Finished Goods: Capital Equipment | 1982=100 | M | BLS | 1 | 3 |
| 18 | PPIACO | PPI: All Commodities | 1982=100 | M | BLS | 0 | 3 |
| 19 | PPIITM | PPI: Intermediate Materials | 1982=100 | M | BLS | 1 | 3 |
| 20 | AMBSL | St. Louis Adjusted Monetary Base | Bil. of $ | M | StL | 1 | 3 |
| 21 | ADJRESSL | St. Louis Adjusted Reserves | Bil. of $ | M | StL | 1 | 3 |
| 22 | CURRSL | Currency Component of M1 | Bil. of $ | M | FED | 1 | 3 |
| 23 | M1SL | M1 Money Stock | Bil. of $ | M | FED | 1 | 3 |
| 24 | M2SL | M2 Money Stock | Bil. of $ | M | FED | 1 | 3 |
| 25 | BUSLOANS | Commercial and Industrial Loans | Bil. of $ | M | FED | 1 | 2 |
| 26 | CONSUMER | Consumer Loans | Bil. of $ | M | FED | 1 | 2 |
| 27 | LOANINV | Bank Credit | Bil. of $ | M | FED | 1 | 2 |
| 28 | LOANS | Loans and Leases in Bank Credit | Bil. of $ | M | FED | 1 | 2 |
| 29 | REALLN | Real Estate Loans | Bil. of $ | M | FED | 1 | 2 |
| 30 | TOTALSL | Tot. Cons. Credit Owned and Securitized | Bil. of $ | M | FED | 1 | 2 |
| 31 | GDPC1 | Gross Domestic Product | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 32 | FINSLC1 | Final Sales of Domestic Product | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 33 | SLCEC1 | State & Local CE & GI | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 34 | PRFIC1 | Private Residential Fixed Investment | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 35 | PNFIC1 | Private Nonresidential Fixed Investment | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 36 | IMPGSC1 | Imports of Goods & Services | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 37 | GCEC1 | Government CE & GI | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 38 | EXPGSC1 | Exports of Goods & Services | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 39 | CBIC1 | Change in Private Inventories | Bil. of Ch. 2005$ | Q | BEA | 1 | 1 |
| 40 | PCNDGC96 | PCE: Nondurable Goods | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 41 | PCESVC96 | PCE: Services | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 42 | PCDGCC96 | PCE: Durable Goods | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 43 | DGIC96 | National Defense Gross Investment | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 44 | NDGIC96 | Federal Nondefense Gross Investment | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 45 | DPIC96 | Disposable Personal Income | Bil. of Ch. 2005$ | Q | BEA | 1 | 2 |
| 46 | PCECTPI | PPCE: Chain-type Price Index | 2005=100 | Q | BEA | 1 | 3 |
| 47 | GPDICTPI | GPDI: Chain-type Price Index | 2005=100 | Q | BEA | 1 | 3 |
| 48 | GDPCTPI | GDP: Chain-type Price Index | 2005=100 | Q | BEA | 1 | 3 |
| 49 | HOUSTMW | Housing Starts in Midwest | Thous. of Units | M | Census | 1 | 2 |
| 50 | HOUSTNE | Housing Starts in Northeast | Thous. of Units | M | Census | 1 | 2 |
| 51 | HOUSTS | Housing Starts in South | Thous. of Units | M | Census | 1 | 2 |
| 52 | HOUSTW | Housing Starts in West | Thous. of Units | M | Census | 1 | 2 |
| 53 | PERMIT | Building Permits | Thous. of Units | M | Census | 1 | 2 |
| 54 | ULCMFG | Manuf. S.: Unit Labor Cost | 2005=100 | Q | BLS | 1 | 2 |
| 55 | COMPRMS | Manuf. S.: Real Compensation Per Hour | 2005=100 | Q | BLS | 1 | 2 |
| 56 | COMPMS | Manuf. S.: Compensation Per Hour | 2005=100 | Q | BLS | 1 | 2 |
| 57 | HOAMS | Manuf. S.: Hours of All Persons | 2005=100 | Q | BLS | 1 | 2 |
| 58 | OPHMFG | Manuf. S.: Output Per Hour of All Persons | 2005=100 | Q | BLS | 1 | 2 |
| 59 | ULCBS | Business S.: Unit Labor Cost | 2005=100 | Q | BLS | 1 | 2 |
| 60 | RCPHBS | Business S.: Real Compensation Per Hour | 2005=100 | Q | BLS | 1 | 2 |
| 61 | HCOMPBS | Business S.: Compensation Per Hour | 2005=100 | Q | BLS | 1 | 2 |
| 62 | HOABS | Business S.: Hours of All Persons | 2005=100 | Q | BLS | 1 | 2 |
| 63 | OPHPBS | Business S.: Output Per Hour of All Persons | 2005=100 | Q | BLS | 1 | 2 |

| No. | Series ID | Definition | Unit | F. | Source | SA | T |
|---|---|---|---|---|---|---|---|
| 64 | MPRIME | Bank Prime Loan Rate | % | M | FED | 0 | 1 |
| 65 | FEDFUNDS | Effective Federal Funds Rate | % | M | FED | 0 | 1 |
| 66 | TB3MS | 3-Month T. Bill: Secondary Market Rate | % | M | FED | 0 | 1 |
| 67 | GS1 | 1-Year Treasury Constant Maturity Rate | % | M | FED | 0 | 1 |
| 68 | GS3 | 3-Year Treasury Constant Maturity Rate | % | M | FED | 0 | 1 |
| 69 | GS10 | 10-Year Treasury Constant Maturity Rate | % | M | FED | 0 | 1 |
| 70 | EMRATIO | Civilian Employment-Population Ratio | % | M | BLS | 1 | 1 |
| 71 | CE16OV | Civilian Employment | Thous. of Persons | M | BLS | 1 | 2 |
| 72 | UNRATE | Civilian Unemployment Rate | % | M | BLS | 1 | 1 |
| 73 | UEMPLT5 | Civilians Unemployed - Less Than 5 Weeks | Thous. of Persons | M | BLS | 1 | 2 |
| 74 | UEMP5TO14 | Civilians Unemployed for 5-14 Weeks | Thous. of Persons | M | BLS | 1 | 2 |
| 75 | UEMP15T26 | Civilians Unemployed for 15-26 Weeks | Thous. of Persons | M | BLS | 1 | 2 |
| 76 | UEMP27OV | Civilians Unemployed for 27 Weeks and Over | Thous. of Persons | M | BLS | 1 | 2 |
| 77 | UEMPMEAN | Average (Mean) Duration of Unemployment | Weeks | M | BLS | 1 | 2 |
| 78 | UNEMPLOY | Unemployed | Thous. of Persons | M | BLS | 1 | 2 |
| 79 | DMANEMP | All Employees: Durable goods | Thous. of Persons | M | BLS | 1 | 2 |
| 80 | NDMANEMP | All Employees: Nondurable goods | Thous. of Persons | M | BLS | 1 | 2 |
| 81 | SRVPRD | All Employees: Service-Providing Industries | Thous. of Persons | M | BLS | 1 | 2 |
| 82 | USCONS | All Employees: Construction | Thous. of Persons | M | BLS | 1 | 2 |
| 83 | USEHS | All Employees: Education & Health Services | Thous. of Persons | M | BLS | 1 | 2 |
| 84 | USFIRE | All Employees: Financial Activities | Thous. of Persons | M | BLS | 1 | 2 |
| 85 | USGOOD | All Employees: Goods-Producing Industries | Thous. of Persons | M | BLS | 1 | 2 |
| 86 | USGOVT | All Employees: Government | Thous. of Persons | M | BLS | 1 | 2 |
| 87 | USINFO | All Employees: Information Services | Thous. of Persons | M | BLS | 1 | 2 |
| 88 | USLAH | All Employees: Leisure & Hospitality | Thous. of Persons | M | BLS | 1 | 2 |
| 89 | USMINE | All Employees: Mining and logging | Thous. of Persons | M | BLS | 1 | 2 |
| 90 | USPBS | All Employees: Prof. & Business Services | Thous. of Persons | M | BLS | 1 | 2 |
| 91 | USPRIV | All Employees: Total Private Industries | Thous. of Persons | M | BLS | 1 | 2 |
| 92 | USSERV | All Employees: Other Services | Thous. of Persons | M | BLS | 1 | 2 |
| 93 | USTPU | All Employees: Trade, Trans. & Ut. | Thous. of Persons | M | BLS | 1 | 2 |
| 94 | USWTRADE | All Employees: Wholesale Trade | Thous. of Persons | M | BLS | 1 | 2 |
| 95 | OILPRICE | Spot Oil Price: West Texas Intermediate | $ per Barrel | M | DJ | 0 | 3 |
| 96 | NAPMNOI | ISM Manuf.: New Orders Index | Index | M | ISM | 1 | 1 |
| 97 | NAPMPI | ISM Manuf.: Production Index | Index | M | ISM | 1 | 1 |
| 98 | NAPMEI | ISM Manuf.: Employment Index | Index | M | ISM | 1 | 1 |
| 99 | NAPMSDI | ISM Manuf.: Supplier Deliveries Index | Index | M | ISM | 1 | 1 |
| 100 | NAPMII | ISM Manuf.: Inventories Index | Index | M | ISM | 1 | 1 |
| 101 | SP500 | S&P 500 Stock Price Index | Index | D | S&P | 0 | 2 |

Abbreviations

Source:
BLS = U.S. Department of Labor: Bureau of Labor Statistics
BEA = U.S. Department of Commerce: Bureau of Economic Analysis
ISM = Institute for Supply Management
Census = U.S. Department of Commerce: Census Bureau
FED = Board of Governors of the Federal Reserve System
StL = Federal Reserve Bank of St. Louis

Freq. (F.): Q = Quarterly, M = Monthly, D = Daily
Trans. (T): 1 = None, 2 = log, 3 = ∆log
SA: 0 = no, 1 = yes

Note: All monthly and daily series are transformed into quarterly observations by simple averages.
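The data treatment described in the table and note above (quarterly aggregation by simple averages, followed by the transformation codes in column T) can be sketched in a few lines of Python. This is a hedged illustration and not the authors' code: the function names `to_quarterly` and `transform` are hypothetical, the monthly series is assumed to start in the first month of a quarter, and applying the averaging before the transformation is an assumption, since the source does not state the order.

```python
import math

def to_quarterly(monthly):
    """Average consecutive blocks of three monthly values (simple averages).

    Assumes the series starts in the first month of a quarter; an
    incomplete trailing quarter is dropped.
    """
    n_quarters = len(monthly) // 3
    return [sum(monthly[3 * k:3 * k + 3]) / 3.0 for k in range(n_quarters)]

def transform(series, code):
    """Apply a transformation code: 1 = none, 2 = log, 3 = first difference of log."""
    if code == 1:
        return list(series)
    logs = [math.log(v) for v in series]
    if code == 2:
        return logs
    if code == 3:
        # The differenced series is one observation shorter than the input.
        return [b - a for a, b in zip(logs, logs[1:])]
    raise ValueError(f"unknown transformation code: {code}")

# Made-up monthly index values, for illustration only (not actual data).
monthly_index = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
quarterly = to_quarterly(monthly_index)
print(quarterly)
print(transform(quarterly, 3))
```

Daily series would be handled analogously, with the block length replaced by the number of daily observations falling in each quarter.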

Cite this document
APA
Matteo Barigozzi, Marco Lippi, & Matteo Luciani (2017). Non-Stationary Dynamic Factor Models for Large Datasets (FEDS 2016-024). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2016-024
BibTeX
@techreport{wtfs_feds_2016_024,
  author = {Matteo Barigozzi and Marco Lippi and Matteo Luciani},
  title = {Non-Stationary Dynamic Factor Models for Large Datasets},
  type = {Finance and Economics Discussion Series},
  number = {2016-024},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2017},
  url = {https://whenthefedspeaks.com/doc/feds_2016-024},
  abstract = {We study a Large-Dimensional Non-Stationary Dynamic Factor Model where (1) the factors $F_t$ are I(1) and singular, that is, $F_t$ has dimension r and is driven by q dynamic shocks with q < r, and (2) the idiosyncratic components are either I(0) or I(1). Under these assumptions the factors $F_t$ are cointegrated and modeled by a singular Error Correction Model. We provide conditions for consistent estimation, as both the cross-sectional size n and the time dimension T go to infinity, of the factors, the loadings, the shocks, the ECM coefficients and therefore the Impulse Response Functions. Finally, the numerical properties of our estimator are explored by means of a Monte Carlo exercise and of a real-data application, in which we study the effects of monetary policy and supply shocks on the US economy.},
}