feds · February 29, 2016

Dynamic Factor Models, Cointegration, and Error Correction Mechanisms

Abstract

The paper studies Non-Stationary Dynamic Factor Models such that: (1) the factors F are I(1) and singular, i.e. F has dimension r and is driven by a q-dimensional white noise, the common shocks, with q < r, and (2) the idiosyncratic components are I(1). We show that F is driven by r-c permanent shocks, where c is the cointegration rank of F, and q-(r-c) < c transitory shocks, thus the same result as in the non-singular case for the permanent shocks but not for the transitory shocks. Our main result is obtained by combining the classic Granger Representation Theorem with recent results by Anderson and Deistler on singular stochastic vectors: if (1-L)F is singular and has rational spectral density then, for generic values of the parameters, F has an autoregressive representation with a finite-degree matrix polynomial fulfilling the restrictions of a Vector Error Correction Mechanism with c error terms. This result is the basis for consistent estimation of Non-Stationary Dynamic Factor Models. The relationship between cointegration of the factors and cointegration of the observable variables is also discussed.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Matteo Barigozzi, Marco Lippi, and Matteo Luciani

2016-018

Please cite this paper as: Barigozzi, Matteo, Marco Lippi, and Matteo Luciani (2016). “Dynamic Factor Models, Cointegration, and Error Correction Mechanisms,” Finance and Economics Discussion Series 2016-018. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2016.018.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Matteo Barigozzi^1, Marco Lippi^2, Matteo Luciani^3

February 16, 2016

JEL subject classification: C0, C01, E0.

Key words and phrases: Dynamic Factor Models for I(1) variables, Cointegration for singular vectors, Granger Representation Theorem for singular vectors.

^1 m.barigozzi@lse.ac.uk – London School of Economics and Political Science, UK.
^2 ml@lippi.ws – Einaudi Institute for Economics and Finance, Roma, Italy.
^3 matteo.luciani@frb.gov – Federal Reserve Board of Governors, Washington DC, USA.

Massimo Franchi, and Rocco Mosconi read previous versions of the paper and gave suggestions for improvements. We also thank the participants to the Workshop on Estimation and Inference Theory for Cointegrated Processes in the State Space Representation, Technische Universität Dortmund, January 2016. Of course we are responsible for any remaining errors. The paper was written while Matteo Luciani was chargé de recherches F.R.S.-F.N.R.S., whose financial support he gratefully acknowledges. The views expressed here are those of the authors and do not necessarily reflect those of the Board of Governors or the Federal Reserve System.

1 Introduction

In the last fifteen years Large-Dimensional Dynamic Factor Models (DFM) have become increasingly popular in the economic and econometric literature, and they are nowadays commonly used by policy institutions such as Central Banks and Ministries. These models are based on the idea that all the variables in an economic system are driven by a few common (macroeconomic) shocks, their residual dynamics being explained by idiosyncratic components which may result from measurement errors and sectoral or regional shocks.

Formally, each variable $x_{it}$, $i = 1, 2, \ldots, n$, in the $n$-dimensional dataset can be decomposed into the sum of a common component $\chi_{it}$ and an idiosyncratic component $\epsilon_{it}$: $x_{it} = \chi_{it} + \epsilon_{it}$ (Forni et al., 2000; Forni and Lippi, 2001; Stock and Watson, 2002a,b). In the standard version of the DFM, which is adopted here, the common components are linear combinations of an $r$-dimensional vector of common factors $F_t = (F_{1t}\ F_{2t}\ \cdots\ F_{rt})'$,

$$\chi_{it} = \lambda_{i1}F_{1t} + \lambda_{i2}F_{2t} + \cdots + \lambda_{ir}F_{rt} = \boldsymbol{\lambda}_i F_t. \tag{1}$$

The vector $F_t$ is dynamically driven by the $q$-dimensional non-singular white-noise vector^1 $u_t = (u_{1t}\ u_{2t}\ \cdots\ u_{qt})'$, the common shocks:

$$F_t = U(L)u_t, \tag{2}$$

where $U(L)$ is an $r \times q$ matrix (Stock and Watson, 2005; Bai and Ng, 2007; Forni et al., 2009). The dimension $n$ of the dataset is assumed to be large as compared to $r$ and $q$, which are independent of $n$, with $q \le r$. More precisely, all assumptions and results are formulated assuming that both $T$, the number of observations for each $x_{it}$, and $n$, the number of variables, tend to infinity.

^1 Usually orthonormality is assumed. This is convenient but not necessary in the present paper.

The assumption that the vector $F_t$ is singular, i.e. $r > q$, has received sound empirical support in a number of papers, see Giannone et al. (2005), Amengual and Watson (2007), Forni and Gambetti (2010), and Luciani (2015) for US macroeconomic databases and Barigozzi et al. (2014) for the Euro area. Such results can be easily understood observing that the static equation (1) is just a convenient representation derived from a “primitive” set of dynamic equations linking the common components $\chi_{it}$ to the common shocks $u_t$. As a simple example, suppose that the variables $x_{it}$ belong to a macroeconomic dataset and are driven by a common one-dimensional cyclical process $f_t$, such that $(1-\alpha L)f_t = u_t$, where $u_t$ is scalar white noise, and that the variables $x_{it}$ load $f_t$ dynamically:

$$x_{it} = a_{i0}f_t + a_{i1}f_{t-1} + \epsilon_{it}. \tag{3}$$

In this case representation (1) is obtained by setting $F_{1t} = f_t$, $F_{2t} = f_{t-1}$, $\lambda_{i1} = a_{i0}$, $\lambda_{i2} = a_{i1}$, while equation (2) takes the form

$$\begin{pmatrix} F_{1t} \\ F_{2t} \end{pmatrix} = \begin{pmatrix} (1-\alpha L)^{-1} \\ (1-\alpha L)^{-1}L \end{pmatrix} u_t,$$

so that $r = 2$, $q = 1$ and the dynamic equation (3) is replaced by the static representation $x_{it} = \lambda_{i1}F_{1t} + \lambda_{i2}F_{2t} + \epsilon_{it}$. For a general analysis of the relationship between representation (1) and “deeper” dynamic representations like (3), see e.g. Stock and Watson (2005), Forni et al.
(2009), see also Section 2.3 below. Singularity of $F_t$, i.e. $r > q$, will be assumed throughout the present paper.

If the factors $F_t$ and the idiosyncratic terms are stationary, and hence the data $x_{it}$ are stationary as well, the factors $F_t$ and the loadings $\boldsymbol{\lambda}_i$ can be consistently estimated using the first $r$ principal components (Stock and Watson, 2002a,b). The common shocks $u_t$ and the function $U(L)$ can then be estimated by a singular VAR for $F_t$. Lastly, identification restrictions can be applied to the shocks $u_t$ and the function $U(L)$ to obtain structural common shocks and impulse response functions, see Stock and Watson (2005), Forni et al. (2009).

In the present paper and in Barigozzi et al. (2016) we study the Large-Dimensional Non-Stationary Dynamic Factor Model under the assumption that the factors and the idiosyncratic components are I(1), so that the observable variables $x_{it}$ are I(1) as well. Equation (1) remains unchanged while (2) is replaced by

$$(1-L)F_t = U(L)u_t. \tag{4}$$

In this case, the principal components of the stationary series $(1-L)x_{it}$ provide an estimate of the differenced factors $(1-L)F_t$ and the loadings $\boldsymbol{\lambda}_i$. The factors $F_t$ can then be recovered by integration, see e.g. Bai and Ng (2004). Exceptions to this common practice are Bai (2004), Peña and Poncela (2004), in which the factors $F_t$ are directly estimated using the I(1) variables $x_{it}$.

The main difference with respect to the stationary case arises with the estimation of $u_t$ and $U(L)$, or structural common shocks and impulse-response functions. This requires the estimation of a VAR for the I(1) factors $F_t$. On the other hand, because $F_t$ is singular, $F_t$ is quite obviously cointegrated (the spectral density of $(1-L)F_t$ is singular at all frequencies and therefore at frequency zero). Here we study the autoregressive representations of the singular cointegrated vector $F_t$, while estimation is studied in Barigozzi et al. (2016).^2

The paper is organized as follows.
In Section 2 we recall recent results for singular stochastic vectors with rational spectral density, see Anderson and Deistler (2008a,b), and we discuss cointegration and the cointegration rank for I(1) singular stochastic vectors: $c$, the cointegration rank, is equal to $r-q$, the minimum due to singularity, plus $d$, with $0 \le d < q$.

In Section 3 we obtain the permanent-transitory shock representation in the singular case: $F_t$ is driven by $r-c = q-d$ permanent and $d = c-(r-q)$ transitory shocks, the same result as in the non-singular case for the permanent shocks but not for the transitory. Then we prove our main results. Assuming rational spectral density for the vector $(1-L)F_t$, and therefore that the entries of $U(L)$ in (4) are rational functions of $L$, for generic values of the parameters of the matrix $U(L)$, $F_t$ has an autoregressive representation fulfilling the restrictions of a Vector Error Correction Mechanism (VECM) with $c$ error terms:

$$A(L)F_t = A^*(L)(1-L)F_t + \boldsymbol{\alpha}\boldsymbol{\beta}'F_{t-1} = h + Ru_t, \tag{5}$$

where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are both $r \times c$ and full rank, $R$ is $r \times q$, and $A(L)$ and $A^*(L)$ are finite-degree matrix polynomials. These results are obtained by combining the Granger Representation Theorem (Engle and Granger, 1987) with Anderson and Deistler’s results.

^2 To our knowledge, the present paper is the first to study cointegration and Error Correction representations for the singular factors of I(1) Dynamic Factor Models. An Error Correction model in the DFM framework is studied in Banerjee et al. (2014a,b). However, their focus is on the relationship between the observable variables and the factors. Their Error Correction term is a linear combination of the variables $x_{it}$ and the factors $F_t$, which is stationary if the idiosyncratic components are stationary (so that the $x$’s and the factors are cointegrated). Because of this and other important differences their results are not directly comparable to those in the present paper.

Section 3 also contains an exercise carried out with simulated singular I(1) vectors. We compare the results obtained by estimating an unrestricted VAR in the levels and a VECM. Though limited to a simple example, the results confirm what has been found for non-singular vectors, namely that under cointegration the long-run features of impulse-response functions are better estimated using a VECM rather than an unrestricted VAR in the levels (Phillips, 1998).

In Section 4 we analyse cointegration of the observable variables $x_{it}$. Our results on cointegration of the factors $F_t$ have the obvious implication that $p$-dimensional subvectors of the $n$-dimensional common-component vector $\boldsymbol{\chi}_t$, with $p > q-d$, are cointegrated. Stationarity of the idiosyncratic components would imply that all $p$-dimensional subvectors of the $n$-dimensional dataset $x_t$ are cointegrated. For example, if $q = 3$ and $d = 1$, then all 3-dimensional subvectors in the dataset are cointegrated, a kind of regularity that we do not observe in actual large macroeconomic datasets. This motivates our assumption that the idiosyncratic components are I(1) (some of the variables $\epsilon_{it}$ are I(1)). Section 5 concludes. Some long proofs, a discussion of some non-uniqueness problems arising with singularity and details on the simulations are collected in the Appendix.

2 Stationary and non-stationary Dynamic Factor Models

2.1 The Factors and Idiosyncratic Components are Stationary

Consider the Dynamic Factor Model

$$x_t = \boldsymbol{\chi}_t + \boldsymbol{\epsilon}_t, \qquad \boldsymbol{\chi}_t = \boldsymbol{\Lambda}F_t, \tag{6}$$

where: (1) the observables $x_t$, the common components $\boldsymbol{\chi}_t$, and the idiosyncratic components $\boldsymbol{\epsilon}_t$ are $n$-dimensional vectors, (2) $F_t$ is an $r$-dimensional vector of common factors, with $r$ independent of $n$, (3) $\boldsymbol{\Lambda}$ is an $n \times r$ matrix, (4) $F_t$ is driven by a $q$-dimensional zero-mean white noise vector process $u_t$, the common shocks, with $q < r$, (5) $\epsilon_{it}$ and $u_{js}$ are orthogonal for all $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, q$, $t, s \in \mathbb{Z}$. Other assumptions concerning the asymptotic properties of the model, for $n \to \infty$, will not be used here and are therefore not reported (see the literature mentioned in the Introduction). The results in the present and the next section, though stated for the vector of the factors $F_t$ and the common shocks $u_t$, hold for any singular stochastic vector under the assumptions specified below.

As observed in the Introduction, with some exceptions, the theory of model (6) has been developed under the assumption that $x_t$, $\boldsymbol{\chi}_t$, $\boldsymbol{\epsilon}_t$ and $F_t$ are stationary. In addition to (6), it is often assumed that $F_t$ has a reduced-rank VAR representation:

$$A(L)F_t = Ru_t, \tag{7}$$

where $A(L)$ is a finite-degree $r \times r$ matrix polynomial and $R$ is $r \times q$. Moreover, it is well known that $\boldsymbol{\Lambda}$ and $F_t$ can be estimated by principal components, while an estimate of $A(L)$, $R$ and $u_t$ can be obtained by standard techniques. Inversion of $A(L)$ provides an estimate of the impulse-response functions of the observables to the common shocks:

$$x_t = \boldsymbol{\Lambda}A(L)^{-1}Ru_t + \boldsymbol{\epsilon}_t.$$

Structural shocks and structural impulse-response functions can then be obtained, respectively, as $w_t = Qu_t$ and $\boldsymbol{\Lambda}A(L)^{-1}RQ^{-1}$, where the $q \times q$ matrix $Q$ is determined in the same way as in Structural VARs (Stock and Watson, 2005; Forni et al., 2009).
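As a minimal sketch of how the impulse-response functions $\boldsymbol{\Lambda}A(L)^{-1}R$ follow from the reduced-rank VAR (7), consider the introductory example (3) with $F_t = (f_t\ f_{t-1})'$ and $(1-\alpha L)f_t = u_t$, so that $A(L) = I - A_1L$ has degree one. The value $\alpha = 0.5$, the loadings in `Lam` and the function name `impulse_responses` are assumptions made only for illustration; they do not come from the paper.

```python
import numpy as np

def impulse_responses(Lam, A1, R, horizon):
    """Return Lam @ A1^k @ R for k = 0..horizon, i.e. the coefficients of
    Lam (I - A1 L)^{-1} R when A(L) = I - A1 L is a VAR(1) polynomial."""
    responses = []
    Ak = np.eye(A1.shape[0])            # A1^0
    for _ in range(horizon + 1):
        responses.append(Lam @ Ak @ R)
        Ak = Ak @ A1                    # next power of A1
    return np.array(responses)

alpha = 0.5                             # illustrative AR coefficient of f_t
A1 = np.array([[alpha, 0.0],            # first row:  f_t = alpha f_{t-1} + u_t
               [1.0,   0.0]])           # second row: F_{2t} = F_{1,t-1}, no shock (singular VAR)
R = np.array([[1.0], [0.0]])            # r x q, with r = 2 and q = 1
Lam = np.array([[1.0, 0.5],             # hypothetical loadings (a_i0, a_i1), one row per variable
                [2.0, -1.0]])

irf = impulse_responses(Lam, A1, R, horizon=4)
print(irf[:, :, 0])                     # row k: responses of x_1 and x_2 after k periods
```

For $k \ge 1$ the response of $x_{it}$ is $a_{i0}\alpha^k + a_{i1}\alpha^{k-1}$, which is what the dynamic loading in (3) implies.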

The VAR representation (7) has a standard motivation as an approximation to an infinite autoregression with exponentially declining coefficients. However, as stated above, $F_t$ has reduced rank. Under reduced rank and rational spectral density for $F_t$, Anderson and Deistler (2008a,b) prove that generically $F_t$ has a finite-degree autoregressive representation, so that no approximation argument is needed to motivate (7). A formal statement of this result requires the following definitions.

Definition 1. (Rational reduced-rank family) Assume that $r > q > 0$ and let $G$ be a set of ordered couples $(S(L), C(L))$, where:

(i) $C(L)$ is an $r \times q$ polynomial matrix of degree $s_1 \ge 0$.

(ii) $S(L)$ is an $r \times r$ polynomial matrix of degree $s_2 \ge 0$. $S(0) = I_r$.

(iii) Denoting by $p$ the vector containing the $\lambda = rq(s_1+1) + r^2s_2$ coefficients of the entries of $C(L)$ and $S(L)$, we assume that $p \in \Pi$, where $\Pi$ is an open subset of $\mathbb{R}^\lambda$, and that for $p \in \Pi$, if $\det(S(z)) = 0$, then $|z| > 1$.

We say that the family of weakly stationary stochastic processes

$$F_t = S(L)^{-1}C(L)u_t, \tag{8}$$

where $u_t$ is a $q$-dimensional white noise with non-singular variance-covariance matrix and $(S(L), C(L))$ belongs to $G$, is a rational reduced-rank family.

The notation $F^p_t$, $C^p(L)$, etc., though more rigorous, would be heavy and not really necessary. We use it only once in the proof of Proposition 2. Note that (8) is the unique stationary solution of the ARMA equation

$$S(L)F_t = C(L)u_t. \tag{9}$$

Definition 2. (Genericity) Suppose that a statement $Q(p)$ depends on $p \in A$, where $A$ is an open subset of $\mathbb{R}^\lambda$. Then $Q(p)$ holds generically in $A$ if the subset $N$ of $A$ where it does not hold is nowhere dense in $A$, i.e. the closure of $N$ has no internal points.

Proposition 1. (Anderson and Deistler)

(I) Suppose that $V(L)$ is an $r \times q$ matrix whose entries are rational functions of $L$, with $r > q$. If $V(L)$ is zeroless, i.e. has rank $q$ for all complex numbers $z$, then $V(L)$ has a finite-degree stable left inverse, i.e. there exists a finite-degree polynomial $r \times r$ matrix $W(L)$, such that (a) $\det(W(z)) = 0$ implies $|z| > 1$, (b) $W(L)V(L) = V(0)$.

(II) Let $F_t = S(L)^{-1}C(L)u_t$ be a rational reduced-rank family with parameter set $\Pi$. For generic values of the parameters $p \in \Pi$, $S(z)^{-1}C(z)$ is zeroless. In particular, generically $S(1)^{-1}C(1)$ and $S(0)^{-1}C(0) = C(0)$ have full rank $q$.

For statement (I) see Deistler et al. (2010), Theorem 3. Statement (II) is a modified version of their Theorem 2. They obtain genericity with respect to the parameters of the state-space representation of (9), whereas in statement (II) we refer to the original parameters of the matrix polynomials in (9) (see Forni et al. (2015) for a proof). Another version of Anderson and Deistler’s Theorem 2 is Proposition 2 in the present paper. Both statement (I) and our Proposition 2 are crucial for the proof of our main results in Propositions 3 and 4.

2.2 Non-Stationary factors

Suppose now that the vector $F_t$ and some of the idiosyncratic components $\epsilon_{it}$ are non-stationary while $(1-L)F_t$ and all the variables $(1-L)\epsilon_{it}$ are stationary. The common
practice in this case consists in reducing the data $x_{it}$ to stationarity by taking first differences and estimating the differenced factors $(1-L)F_t$ by means of the principal components of the variables $(1-L)x_{it}$. Then usually impulse-response functions are obtained by estimating a VAR for $(1-L)F_t$. Of course this implies possible misspecification if $F_t$ is cointegrated, which is always the case when $F_t$ is singular.

To analyse cointegration and the autoregressive representations of the singular non-stationary vector $F_t$ let us firstly recall the definitions of I(0), I(1) and cointegrated vectors. In the present paper we only consider stochastic vectors that are either weakly stationary with a rational spectral density matrix or such that their first difference is weakly stationary with rational spectral density. Assume that the $n$-dimensional vector $y_t$ is weakly stationary with rational spectral density, denoted by $\boldsymbol{\Sigma}_y(\theta)$. The matrix $\boldsymbol{\Sigma}_y(\theta)$ has constant rank $\rho$, with $\rho \le n$, i.e. has the same rank $\rho$ for $\theta$ almost everywhere in $[-\pi, \pi]$. We say that $\rho$ is the rank of $y_t$. Moreover, $y_t$ has moving average representations

$$y_t = V(L)v_t, \tag{10}$$

where $v_t$ is a non-singular $\rho$-dimensional white noise and $V(L)$ is an $n \times \rho$ matrix whose entries are rational functions of $L$ with no poles of modulus less than or equal to unity. If $\operatorname{rank}(V(z)) = \rho$ for $|z| < 1$, then $v_t$ belongs to the space spanned by $y_\tau$, with $\tau \le t$, and representation (10), as well as $v_t$, is called fundamental (see Rozanov (1967), pp. 43–7)^3.

Let $z_t$ be an $r$-dimensional weakly stationary process with rational spectral density and assume that $z_t \in L_2(\Omega, \mathcal{F}, P)$. Consider the difference equation

$$(1-L)\boldsymbol{\zeta}_t = z_t, \tag{11}$$

in the unknown process $\boldsymbol{\zeta}_t$. A solution of (11) is

$$\tilde{y}_t = \begin{cases} z_1 + z_2 + \cdots + z_t, & \text{for } t > 0 \\ 0, & \text{for } t = 0 \\ -(z_0 + z_{-1} + \cdots + z_{t+1}), & \text{for } t < 0. \end{cases} \tag{12}$$

All the solutions of (11) are $y_t = \tilde{y}_t + W$, where $W$ is any $r$-dimensional stochastic vector belonging to $L_2(\Omega, \mathcal{F}, P)$, that is the particular solution $\tilde{y}_t$ plus any solution of $(1-L)\boldsymbol{\zeta}_t = 0$.

Definition 3. (I(0), I(1) and cointegrated vectors)

I(0). The $n$-dimensional vector stochastic process $y_t$ is I(0) if it is weakly stationary with rational spectral density $\boldsymbol{\Sigma}_y(\theta)$ and $\boldsymbol{\Sigma}_y(0) \ne 0$.

Integrated process of order 1, I(1). The $n$-dimensional vector stochastic process $y_t$ is I(1) if there exists an $n$-dimensional I(0) process $z_t$ such that $y_t$ is a solution of the equation $(1-L)\boldsymbol{\zeta}_t = z_t$. The rank of $y_t$ is defined as the rank of $z_t$.

Cointegration. Assume that the $n$-dimensional stochastic vector $y_t$ is I(1) and denote by $\boldsymbol{\Sigma}_{\Delta y}(\theta)$ the spectral density of $(1-L)y_t$. The vector $y_t$ is cointegrated with cointegration rank $c$, with $0 < c < n$, if $\operatorname{rank}(\boldsymbol{\Sigma}_{\Delta y}(0)) = n-c$. If $\rho$ is the rank of $y_t$, then $c \ge n-\rho$.

^3 When $n = \rho$, the condition that $\operatorname{rank}(V(z)) = \rho$ for $|z| < 1$ becomes $\det(V(z)) \ne 0$ for $|z| < 1$.
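The particular solution (12) can be checked numerically. The sketch below reads the third branch of (12) as applying to $t < 0$ and verifies that the resulting $\tilde{y}_t$ satisfies $(1-L)\tilde{y}_t = z_t$ on both sides of $t = 0$. The scalar Gaussian input and the name `particular_solution` are illustrative assumptions, not part of the paper.

```python
import random

def particular_solution(z, t):
    """Particular solution of (1 - L) y_t = z_t normalised so that y_0 = 0.
    `z` maps integer dates to values (scalars here, for simplicity)."""
    if t > 0:
        return sum(z[s] for s in range(1, t + 1))     # z_1 + ... + z_t
    if t < 0:
        return -sum(z[s] for s in range(t + 1, 1))    # -(z_0 + z_{-1} + ... + z_{t+1})
    return 0.0

random.seed(0)
z = {s: random.gauss(0.0, 1.0) for s in range(-6, 7)}  # illustrative stationary input

for t in range(-5, 7):
    first_difference = particular_solution(z, t) - particular_solution(z, t - 1)
    print(t, abs(first_difference - z[t]) < 1e-12)
```

Adding any time-invariant $W$ to `particular_solution` leaves the first difference unchanged, which is the sense in which all solutions of (11) are $\tilde{y}_t + W$.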

Some comments are in order.

C1. If $y_t$ has representation (10), because $\boldsymbol{\Sigma}_y(0) = (2\pi)^{-1}V(1)\boldsymbol{\Gamma}_vV(1)'$, where $\boldsymbol{\Gamma}_v$ is the covariance matrix of $v_t$, $y_t$ is I(0) if and only if $V(1) \ne 0$.

C2. Under the parameterization in Definition 1, for generic values of the parameters in $\Pi$, we have $\operatorname{rank}\left(S(1)^{-1}C(1)\right) = q$ (see Proposition 1(II)), so that

$$\operatorname{rank}(\boldsymbol{\Sigma}_F(0)) = \operatorname{rank}\left((2\pi)^{-1}S(1)^{-1}C(1)\boldsymbol{\Gamma}_uC(1)'[S(1)']^{-1}\right) = q,$$

where $\boldsymbol{\Gamma}_u$ is the covariance matrix of $u_t$. Thus, generically, $F_t$ has rank $q$ and is I(0).

C3. Assume that $y_t$ is such that $(1-L)y_t$ is weakly stationary with rational spectral density and that

$$(1-L)y_t = V(L)v_t \tag{13}$$

is one of its moving average representations. The process $y_t$ is I(1) if and only if $V(1) \ne 0$. Thus our definitions of I(0) and I(1) processes are equivalent to Definitions 3.2 and 3.3 in Johansen (1995), p. 35, with two minor differences: our assumption of rational spectral density and the time span of stochastic processes, $t \in \mathbb{Z}$ in the present paper, $t = 0, 1, \ldots$ in Johansen’s book.

C4. Note that $y_t$ can be I(1) even though some of its coordinate processes are I(0). In particular, assuming that the idiosyncratic vector $\boldsymbol{\epsilon}_t$ is I(1) does not prevent some of the coordinates $\epsilon_{it}$ from being I(0).

C5. If $y_t$ is I(1) and cointegrated with cointegration rank $c$, there exist $c$ linearly independent $n \times 1$ vectors $c_j$, $j = 1, \ldots, c$, such that the spectral density of $c_j'(1-L)y_t$ vanishes at frequency zero. The vectors $c_j$ are called cointegration vectors. Of course a set of cointegration vectors $c_j$, $j = 1, \ldots, c$, can be replaced by the set $d_j$, $j = 1, \ldots, c$, where the vectors $d_j$ are $c$ independent linear combinations of the vectors $c_j$.

C6. In the literature on integrated and cointegrated vectors, the expression “I(1) process” is often ambiguous: sometimes it refers to a specific process $y_t$ which solves $(1-L)\boldsymbol{\zeta}_t = z_t$, sometimes to the whole class $y_t = \tilde{y}_t + W$. This minor abuse of language is very convenient and usually does not cause misunderstandings, see comment C7 below, Section 3.3, Proposition 3 in particular.

C7. If $y_t$ is I(1), cointegrated and has representation (13), the cointegration rank of $y_t$ is $c$ if and only if the rank of $V(1)$ is $n-c$. Moreover $c$ is a cointegration vector for $y_t$ if and only if $c'V(1) = 0$. In Appendix A.1 we show that if $c$ is a cointegration vector for $y_t$, then $y_t$ can be determined (that is, a member of the class containing $y_t$ can be determined) such that $c'y_t$ is weakly stationary with rational spectral density. Thus our definition of cointegration is equivalent to that in Johansen (1995), p. 37.

C8. Let $y_t$ be $n$-dimensional, I(1) with rank $\rho < n$. The spectral density $\boldsymbol{\Sigma}_{\Delta y}(\theta)$ has rank $\rho$ almost everywhere in $[-\pi, \pi]$, so that there exist vectors $d_j(e^{-i\theta})$, $j = 1, 2, \ldots, n-\rho$, such that $d_j(e^{-i\theta})\boldsymbol{\Sigma}_{\Delta y}(\theta) = 0$ for all $\theta$. In the time domain, $d_j(L)(1-L)y_t = 0$. Such exact linear dynamic relationships between the coordinates of $y_t$ (not possible when $\rho = n$) are closely linked to the non-uniqueness of autoregressive representations for singular vectors (see below in this section and Appendix B).
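Comment C7 suggests a direct computation: given the long-run matrix $V(1)$ in (13), the cointegration rank is $n - \operatorname{rank}(V(1))$ and the cointegration vectors span the left null space of $V(1)$. The sketch below obtains both from a singular value decomposition. The function name `cointegration_from_V1`, the tolerance and the example matrix are assumptions chosen for illustration.

```python
import numpy as np

def cointegration_from_V1(V1, tol=1e-10):
    """Given the n x rho long-run matrix V(1), return the cointegration rank
    c = n - rank(V(1)) and an n x c matrix B whose columns satisfy B' V(1) = 0."""
    V1 = np.atleast_2d(np.asarray(V1, dtype=float))
    n = V1.shape[0]
    U, s, _ = np.linalg.svd(V1)          # U is n x n (full_matrices=True by default)
    rank = int(np.sum(s > tol))
    return n - rank, U[:, rank:]         # trailing left singular vectors span the left null space

# Illustrative example: n = 3, rho = 2; the first two coordinates share one trend.
V1 = np.array([[1.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])
c, B = cointegration_from_V1(V1)
print(c)
print(B.T @ V1)                          # numerically zero
```

As noted in C5, the basis `B` is not unique: any set of $c$ independent linear combinations of its columns is an equally valid set of cointegration vectors.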

C9. Definition 3, Cointegration, does not rule out that an eigenvector of $\boldsymbol{\Sigma}_{\Delta y}(\theta)$ is constant, i.e. does not depend on $\theta$. If $d$ is such an eigenvector, $d(1-L)y_t = 0$, which is a degenerate case of cointegration (again, not possible when $\rho = n$). More on this in comment C3 on Definition 4.

Now, suppose that $F_t$ is I(1), that

$$(1-L)F_t = S(L)^{-1}C(L)u_t = U(L)u_t$$

and that the rank of the $r \times q$ matrix $U(z)$ is $q$ for almost all $z \in \mathbb{C}$, i.e. that the rank of $F_t$ is $q$. Then:

(i) Obviously, as already observed in the definition of cointegration, $F_t$ has at least $r-q$ cointegration vectors: $c \ge r-q$.

(ii) If we assume that the couple $(S(L), C(L))$ is parameterized as in Definition 1, then, by Proposition 1(II), we can argue that generically $S(z)^{-1}C(z)$ has full rank $q$ for all $z$ and therefore the cointegration rank of $F_t$ is generically $r-q$. However, the rank of $S(z)^{-1}C(z)$ at $z = 1$ has a special interpretation as the number of long-run equilibrium relationships between the processes $F_t$. Such a number usually has a theoretical or behavioral motivation, so that it cannot be modified by any genericity argument. As a consequence we adopt a different parameterization for families of I(1) vectors in which the cointegration rank $c$ is fixed, with $r > c \ge r-q$, see Definition 4.

(iii) If the family has $c = r-q$, then generically $\operatorname{rank}(S(1)^{-1}C(1)) = q$ and Proposition 1(I) can be applied. In spite of cointegration, generically $(1-L)F_t$ has a finite-degree autoregressive representation

$$A(L)(1-L)F_t = C(0)u_t. \tag{14}$$

(iv) If the family has $c > r-q$, then no autoregressive representation exists for $(1-L)F_t$, finite or infinite. However, we prove that, if $c$ is equal to or greater than $r-q$, generically $F_t$ has a representation as a VECM:

$$A(L)F_t = A^*(L)(1-L)F_t + A(1)F_{t-1} = h + C(0)u_t, \tag{15}$$

where the rank of $A(1)$ is $c$, and $A(L)$ and $A^*(L)$ are finite-degree matrix polynomials.

(v) In the singular case the autoregressive representation of $F_t$ is in general not unique (unlike the full-rank case, in which it is unique). For example, when $c = r-q$, (14) and (15) are two different autoregressive representations for $F_t$, the first with no error terms, the second with $r-q$ error terms. However, as we show in Appendix B, different autoregressive representations of $F_t$ produce the same impulse-response functions. Proposition 3 in the next section proves the existence of representation (15), which has the maximum number of error terms.

The existence of representation (15) for reduced-rank I(1) vectors, where $A^*(L)$ is of finite degree, is our main result. Its proof combines the Granger Representation Theorem with the results summarized in Proposition 1.
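Point (v) can be illustrated on a small simulated case. The sketch below assumes $r = 2$, $q = 1$ and $U(L) = (1\ \ L)'$, i.e. $F_t = (f_t\ f_{t-1})'$ with $f_t$ a random walk, so that $c = r-q = 1$. For this assumed example a representation of type (14) uses the left inverse $W(L) = \begin{pmatrix} 1 & 0 \\ -L & 1 \end{pmatrix}$, while a representation of type (15) has $A^*(L) = I$, $h = 0$ and $A(1) = \boldsymbol{\alpha}\boldsymbol{\beta}'$ with $\boldsymbol{\alpha} = (0\ {-1})'$, $\boldsymbol{\beta} = (1\ {-1})'$. These matrices are worked out by hand for the illustration and are not taken from the paper.

```python
import numpy as np

def simulate_stacked_factor(T, seed=0):
    """Simulate F_t = (f_t, f_{t-1})' with (1 - L) f_t = u_t; returns F and the aligned shocks."""
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(T + 2)
    f = np.cumsum(shocks)                       # random walk driven by the scalar shock
    F = np.column_stack([f[1:], f[:-1]])        # row s holds (f_{s+1}, f_s)
    return F, shocks[1:]                        # u[s] is the shock entering row s of F

F, u = simulate_stacked_factor(T=200)
dF = np.diff(F, axis=0)                         # (1 - L) F_t
R = np.array([1.0, 0.0])                        # C(0) = U(0)

# Type (14): W(L) (1 - L) F_t = C(0) u_t, no error-correction term.
res14 = dF[1:].copy()
res14[:, 1] -= dF[:-1, 0]                       # second row of W(L): -L on the first coordinate

# Type (15): (1 - L) F_t + A(1) F_{t-1} = C(0) u_t, with A(1) = alpha beta'.
alpha = np.array([[0.0], [-1.0]])
beta = np.array([[1.0], [-1.0]])
A_at_1 = alpha @ beta.T                         # rank 1 = c
res15 = dF + F[:-1] @ A_at_1.T

print(np.allclose(res14, np.outer(u[2:], R)), np.allclose(res15, np.outer(u[1:], R)))
```

Both residual series reduce to $C(0)u_t$, so the two autoregressive forms describe the same process and, consistently with the statement in (v), the same impulse responses.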

2.3 “Trivial” and “primitive” cointegration vectors of $F_t$

Denote by $d$ the number of cointegrating vectors exceeding the minimum $r-q$, so that $c = r-q+d$. Of course $q > d \ge 0$. There is an interesting class of vectors $F_t$ for which it is possible to distinguish between $r-q$ cointegrating vectors that merely arise in the construction of $F_t$, and $d$ additional cointegrating vectors with a possible structural interpretation.

Let $f_t$ be a non-singular $q$-dimensional vector of primitive factors with representation

$$(1-L)f_t = U_f(L)u_t, \tag{16}$$

and assume that the variables $x_{it}$ load $f_t$ and its lags up to the $p$-th:

$$x_{it} = a_{i0}f_t + a_{i1}f_{t-1} + \cdots + a_{ip}f_{t-p} + \epsilon_{it}. \tag{17}$$

This is a generalization of the example used in the Introduction to motivate the singularity of the vector $F_t$, see equation (3). Model (16)–(17) is transformed into the standard form (1) by introducing the $r$-dimensional vector

$$F_t = \left(f_t'\ \ f_{t-1}'\ \ \cdots\ \ f_{t-p}'\right)',$$

where $r = q(p+1)$. We have

$$x_t = \boldsymbol{\Lambda}F_t + \boldsymbol{\epsilon}_t, \qquad \boldsymbol{\Lambda} = \begin{pmatrix} a_{10} & a_{11} & \cdots & a_{1p} \\ a_{20} & a_{21} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n0} & a_{n1} & \cdots & a_{np} \end{pmatrix}, \qquad (1-L)F_t = \begin{pmatrix} U_f(L) \\ LU_f(L) \\ \vdots \\ L^pU_f(L) \end{pmatrix} u_t = U(L)u_t. \tag{18}$$

It is immediately seen that $F_t$ has $r-q = qp$ cointegrating vectors

$$t^{h,k} = \left(t^{h,k}_1\ \ t^{h,k}_2\ \ \cdots\ \ t^{h,k}_r\right),$$

$h = 1, \ldots, q$, $k = 1, \ldots, p$, where

$$t^{h,k}_j = \begin{cases} 1 & \text{if } j = h \\ -1 & \text{if } j = kq+h \\ 0 & \text{otherwise.} \end{cases}$$

These can be called trivial cointegration vectors, as they merely result from the construction of $F_t$ by stacking the vectors $f_{t-k}$, $k = 0, \ldots, p$.

On the other hand, if $f_t$ is cointegrated, with cointegration rank $c_f$, $q > c_f > 0$, and cointegrating vectors $s^m$, $m = 1, \ldots, c_f$, then the cointegration rank of $F_t$ is $c = r-q+c_f$, with the additional $c_f$ cointegrating vectors obtained by augmenting each $s^m$ with $qp$ zeros. Thus cointegration of the primitive factors $f_t$ naturally translates into cointegration of $F_t$.

Two observations are in order. Firstly, the above distinction between primitive and trivial cointegration has heuristic interest but is limited to representation (18), the latter being only one among infinitely many equivalent standard representations of (16)–(17). If $H$ is an $r \times r$ invertible matrix,

$$x_t = \boldsymbol{\Lambda}^*F^*_t + \boldsymbol{\epsilon}_t, \qquad (1-L)F^*_t = U^*(L)u_t,$$

where $F^*_t = H^{-1}F_t$, $\boldsymbol{\Lambda}^* = \boldsymbol{\Lambda}H$, $U^*(L) = H^{-1}U(L)$, is another factor representation for the variables $x_{it}$. The cointegrating vectors of $F^*_t$ are linear combinations of the vectors $t^{h,k}$ and $s^m$, so that primitive and trivial cointegration get mixed. In particular, if the model is estimated by principal components of the variables $x_{it}$, in general the estimated factors $\hat{F}_t$ approximate the space spanned by $F_t$, not $F_t$ itself, so that no distinction between trivial and primitive cointegrating vectors of $\hat{F}_t$ is possible.

Secondly, as the elementary example below shows, not all vectors $F_t$ can be put in the form (16)–(17). Let $r = 2$, $q = 1$ and

$$U^*(L) = \begin{pmatrix} 1+L \\ L^2 \end{pmatrix}.$$

Suppose that there exists an invertible matrix

$$H = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

such that

$$U(L) = HU^*(L) = \begin{pmatrix} \alpha + \alpha L + \beta L^2 \\ \gamma + \gamma L + \delta L^2 \end{pmatrix}$$

has the form in (18), third equation. The second row of $U(L)$ would equal the first multiplied by $L$:

$$\gamma + \gamma L + \delta L^2 = (\alpha + \alpha L + \beta L^2)L = \alpha L + \alpha L^2 + \beta L^3.$$

This implies that $\gamma = \beta = \alpha = \delta = 0$, which contradicts the invertibility of $H$. Thus no representation (16)–(17) exists for $U^*(L)$.

3 Representation theory for reduced rank I(1) vectors

3.1 Families of cointegrated vectors

Consider the equation

$$(1-L)\boldsymbol{\zeta}_t = S(L)^{-1}C(L)u_t = U(L)u_t, \tag{19}$$

where $\det S(L)$ has no roots of modulus less than or equal to unity and $C(1) \ne 0$, so that $U(1) \ne 0$. Suppose that $u_{jt} \in L_2(\Omega, \mathcal{F}, P)$, for $j = 1, \ldots, q$, where $(\Omega, \mathcal{F}, P)$ is a probability space. It is easily seen that all the solutions of (19) are the processes

$$F_t = \tilde{F}_t + W, \qquad t \in \mathbb{Z}, \tag{20}$$

where $W$ is an $r$-dimensional stochastic vector with $W_k \in L_2(\Omega, \mathcal{F}, P)$, $k = 1, \ldots, r$, and

$$\tilde{F}_t = U(L)\boldsymbol{\nu}_t = S(L)^{-1}C(L)\boldsymbol{\nu}_t, \quad \text{where } \boldsymbol{\nu}_t = \begin{cases} u_1 + u_2 + \cdots + u_t, & \text{for } t > 0 \\ 0, & \text{for } t = 0 \\ -(u_0 + u_{-1} + \cdots + u_{t+1}), & \text{for } t < 0. \end{cases} \tag{21}$$

Because $F_t$ is a solution of (19), $(1-L)F_t = U(L)u_t$ is stationary with rational spectral density. Moreover, as we assume $C(1) \ne 0$, so that $U(1) = S(1)^{-1}C(1) \ne 0$, $F_t$ is I(1).

Now, because $S(1)^{-1}$ is a non-singular $r \times r$ matrix, the cointegration rank of $F_t$ only depends on the rank of $C(1)$. Precisely, if $c$ is the cointegration rank of $F_t$, then $c = r - \operatorname{rank}(C(1))$, so that $r > c \ge r-q$. Moreover, there exist an $r \times (r-c)$ matrix $\boldsymbol{\xi}$ and a $q \times (r-c)$ matrix $\boldsymbol{\eta}$, both of full rank $r-c \le q$, such that

$$C(1) = \boldsymbol{\xi}\boldsymbol{\eta}', \tag{22}$$

see Lancaster and Tismenetsky (1985, p. 97, Proposition 3). The matrix $C(L)$ has the (finite) Taylor expansion

$$C(L) = C(1) - (1-L)C'(1) + \frac{1}{2}(1-L)^2C''(1) - \cdots$$

Gathering all terms after the second and using (22),

$$C(L) = \boldsymbol{\xi}\boldsymbol{\eta}' - (1-L)C'(1) + (1-L)^2C_1(L), \tag{23}$$

where $C_1(L)$ is a polynomial matrix. Representation (23) can be used for a very convenient parameterization of $C(L)$.

Definition 4. (Rational reduced-rank I(1) family with cointegration rank $c$) Assume that $r > q > 0$, $r > c \ge r-q$ and let $G$ be a set of couples $(S(L), C(L))$, where:

(i) The matrix $C(L)$ has the parameterization

$$C(L) = \boldsymbol{\xi}\boldsymbol{\eta}' + (1-L)D + (1-L)^2E(L), \tag{24}$$

where $\boldsymbol{\xi}$ and $\boldsymbol{\eta}$ are $r \times (r-c)$ and $q \times (r-c)$ respectively, $D$ is an $r \times q$ matrix and $E(L)$ is an $r \times q$ matrix polynomial of degree $s_1 \ge 0$.

(ii) $S(L)$ is an $r \times r$ polynomial matrix of degree $s_2 \ge 0$. $S(0) = I_r$.

(iii) Denoting by $p$ the vector containing the $\lambda = (r-c)(r+q) + rq(1+s_1) + r^2s_2$ coefficients of the matrices in (24) and in $S(L)$, we assume that $p \in \Pi$, where $\Pi$ is an open subset of $\mathbb{R}^\lambda$, and that for $p \in \Pi$, if $\det(S(z)) = 0$ then $|z| > 1$.

We say that the family of processes $F_t$, such that $(1-L)F_t = S(L)^{-1}C(L)u_t$, where $u_t$ is a $q$-dimensional non-singular white noise and $(S(L), C(L))$ belongs to $G$, is a rational reduced-rank I(1) family with cointegration rank $c$.

Three comments are in order.

C1. Generically the matrices $\boldsymbol{\xi}$ and $\boldsymbol{\eta}$ have full rank $r-c$, so that $F_t$ is I(1) with cointegration rank $c$.

C2. In Remark 2 in Appendix A.2, we show that generically $F_t$ has rank $q$.

C3. Suppose that $r = 4$, $q = 1$, $s_1 = s_2 = 0$. For all $p \in \Pi$, there exists a vector $d$ orthogonal to the 4-dimensional columns $\boldsymbol{\xi}\boldsymbol{\eta}'$, $D$, $E_0$. Thus $d'C(L) = 0$ and therefore $d'(1-L)F_t = 0$, the degenerate case of cointegration mentioned in comment C9 on Definition 3.
However, if s > 0 or s is big enough as compared to r, degenerate 2 1 contegration can be ruled out generically. Denoting by ξ an r×c matrix whose columns are linearly independent and orthogonal to ⊥ allcolumnsofξ,thecolumnsofξ andξξξˇˇˇ = S(cid:48)(1)ξξξ arefullsetsofindependentcointegrating ⊥ ⊥ ⊥ vectors for S(L)F and F respectively. t t 11
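The rank factorization (22) and the role of $\xi_\perp$ can be checked numerically. The following is a minimal sketch, assuming illustrative dimensions and randomly drawn matrices (the sizes, the seed and the matrix `S1` standing in for $S(1)$ are assumptions made for illustration, not values from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
r, q, c = 4, 3, 2              # illustrative sizes with r > q and r > c >= r - q
k = r - c                      # rank of C(1)

xi = rng.standard_normal((r, k))     # r x (r - c)
eta = rng.standard_normal((q, k))    # q x (r - c)
C1 = xi @ eta.T                      # C(1) = xi eta'

# xi_perp: orthonormal basis of the orthogonal complement of span(xi), r x c
U, _, _ = np.linalg.svd(xi, full_matrices=True)
xi_perp = U[:, k:]

# Hypothetical non-singular S(1); beta = S(1)' xi_perp as in the text
S1 = np.eye(r) + 0.1 * rng.standard_normal((r, r))
beta = S1.T @ xi_perp

rank_C1 = np.linalg.matrix_rank(C1)
coint_rank = r - rank_C1
# Long-run matrix of (1 - L) F_t is S(1)^{-1} C(1); beta annihilates it
long_run = np.linalg.solve(S1, C1)
print(rank_C1, coint_rank, np.abs(beta.T @ long_run).max())
```

In this sketch the columns of `beta` play the role of the cointegrating vectors of $F_t$, since they annihilate the long-run matrix $S(1)^{-1}C(1)$.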

3.2 Permanent and transitory shocks

Let $F_t$ be a rational reduced-rank I(1) family with cointegration rank $c = r - q + d$, $q > d \ge 0$. Let $\eta_\perp$ be a $q \times d$ matrix whose columns are independent and orthogonal to the columns of $\eta$, and let

$$\bar{\eta} = \eta(\eta'\eta)^{-1}, \qquad \bar{\eta}_\perp = \eta_\perp(\eta_\perp'\eta_\perp)^{-1}.$$

Defining $v_{1t} = \eta_\perp' u_t$ and $v_{2t} = \eta' u_t$, we have

$$u_t = \bar{\eta}_\perp v_{1t} + \bar{\eta} v_{2t} = \begin{pmatrix} \bar{\eta}_\perp & \bar{\eta} \end{pmatrix} \begin{pmatrix} v_{1t} \\ v_{2t} \end{pmatrix}.$$

We have

$$C(L)u_t = \left[ C(L) \begin{pmatrix} \bar{\eta}_\perp & \bar{\eta} \end{pmatrix} \right] \begin{pmatrix} v_{1t} \\ v_{2t} \end{pmatrix} = (1-L)G_1(L)v_{1t} + \left(\xi + (1-L)G_2(L)\right)v_{2t}, \tag{25}$$

where $G_1(L) = (D + (1-L)E(L))\bar{\eta}_\perp$ and $G_2(L) = (D + (1-L)E(L))\bar{\eta}$. Using (25), we can write all the solutions of the difference equation $(1-L)F_t = S(L)^{-1}C(L)u_t$ as

$$F_t = S(L)^{-1}\left[ G_1(L)v_{1t} + G_2(L)v_{2t} + T_t \right] + W, \tag{26}$$

where $W \in L_2(\Omega, \mathcal{F}, P)$, and

$$T_t = \begin{cases} \xi(v_{21} + v_{22} + \cdots + v_{2t}), & \text{for } t > 0, \\ 0, & \text{for } t = 0, \\ -\xi(v_{20} + v_{2,-1} + \cdots + v_{2,t+1}), & \text{for } t < 0. \end{cases}$$

As $\xi$ is full rank, we see that $F_t$ is driven by the $q - d = r - c$ permanent shocks $v_{2t}$, and by the $d$ transitory shocks $v_{1t}$. In representation (26), the component $T_t$ is the common trend of Stock and Watson (1988). Note that the number of permanent shocks is obtained as $r$ minus the cointegration rank, as usual. However, the number of transitory shocks is obtained as the complement of the number of permanent shocks to $q$, not to $r$, as though $r - q$ transitory shocks had a zero coefficient.

3.3 Error Correction representations

We now prove the Granger Representation Theorem for singular stochastic vectors $F_t$ belonging to a rational reduced-rank I(1) family with cointegration rank $c$ and parameters in the open set $\Pi \subset \mathbb{R}^\lambda$. Our line of reasoning combines arguments used in the proof of Granger's Theorem in the non-singular case (see e.g. Johansen (1995), Theorem 4.5, p. 55-57) and the results in Proposition 1 on singular stochastic vectors.

From Definition 4, $F_t$ is a solution of the equation

$$S(L)(1-L)\boldsymbol{\zeta}_t = C(L)u_t = \left( \xi\eta' + (1-L)D + (1-L)^2 E(L) \right) u_t,$$

and has the representation $F_t = \tilde{F}_t + W$, where $\tilde{F}_t$ is defined in (21) and $W$ is an $r$-dimensional stochastic vector.
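The change of basis from $u_t$ to the transitory and permanent shocks $(v_{1t}, v_{2t})$ used in Section 3.2 can be verified with a few lines of linear algebra. This is a minimal sketch under assumed dimensions and random draws (all numerical values are illustrative assumptions); it checks that $u_t$ is recovered from $(v_{1t}, v_{2t})$ and that only $v_{2t}$ has a non-zero long-run loading, equal to $\xi$.

```python
import numpy as np

rng = np.random.default_rng(1)
r, q, d = 4, 3, 2
c = r - q + d                  # cointegration rank
k = q - d                      # number of permanent shocks, equal to r - c

xi = rng.standard_normal((r, k))
eta = rng.standard_normal((q, k))

# eta_perp: q x d basis of the orthogonal complement of span(eta)
Uq, _, _ = np.linalg.svd(eta, full_matrices=True)
eta_perp = Uq[:, k:]

eta_bar = eta @ np.linalg.inv(eta.T @ eta)
eta_perp_bar = eta_perp @ np.linalg.inv(eta_perp.T @ eta_perp)

u = rng.standard_normal((q, 500))    # simulated common shocks, one column per t
v1 = eta_perp.T @ u                  # d transitory shocks
v2 = eta.T @ u                       # q - d permanent shocks
u_rebuilt = eta_perp_bar @ v1 + eta_bar @ v2

C1 = xi @ eta.T
long_run_v1 = C1 @ eta_perp_bar      # long-run loading of v1: zero
long_run_v2 = C1 @ eta_bar           # long-run loading of v2: xi
print(np.abs(u - u_rebuilt).max(), np.abs(long_run_v1).max())
```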

Generically, the matrix $\zeta = \begin{pmatrix} \xi_\perp' \\ \xi' \end{pmatrix}$ is $r \times r$ and invertible (see comment C1 on Definition 4). We have

$$\begin{aligned} (1-L)\zeta S(L)F_t = \zeta C(L)u_t &= \left\{ \begin{pmatrix} 0_{c \times q} \\ \xi'\xi\eta' \end{pmatrix} + (1-L) \begin{pmatrix} \xi_\perp' D \\ \xi' D \end{pmatrix} + (1-L)^2 \begin{pmatrix} \xi_\perp' E(L) \\ \xi' E(L) \end{pmatrix} \right\} u_t \\ &= \begin{pmatrix} (1-L)I_c & 0 \\ 0 & I_{r-c} \end{pmatrix} \left\{ \begin{pmatrix} \xi_\perp' D \\ \xi'\xi\eta' \end{pmatrix} + (1-L) \begin{pmatrix} \xi_\perp' E(L) \\ \xi' D \end{pmatrix} + (1-L)^2 \begin{pmatrix} 0_{c \times q} \\ \xi' E(L) \end{pmatrix} \right\} u_t. \end{aligned} \tag{27}$$

Taking the first $c$ rows,

$$(1-L)\xi_\perp' S(L)F_t = (1-L)\left( \xi_\perp' D + (1-L)\xi_\perp' E(L) \right) u_t. \tag{28}$$

In Appendix A.1 we prove that if $F_t$ is such that (28) holds and $\xi_\perp' S(L)F_t$ is weakly stationary with rational spectral density, then, in $F_t = \tilde{F}_t + W$, $W$ must be chosen such that

$$\xi_\perp' S(L)F_t = k + \left( \xi_\perp' D + (1-L)\xi_\perp' E(L) \right) u_t, \tag{29}$$

where $k$ is a $c$-dimensional constant vector and $\xi_\perp' S(1)W = k$. Now,

$$\xi_\perp' S(1)F_t = \xi_\perp' S(L)F_t - \xi_\perp' S^*(L)(1-L)F_t, \tag{30}$$

where $S^*(L)$ is the polynomial $(S(L) - S(1))/(1-L)$. Obviously $\xi_\perp' S^*(L)(1-L)F_t$ is weakly stationary with rational spectral density. As a consequence, $\xi_\perp' S(L)F_t$ is weakly stationary with rational spectral density if and only if $\xi_\perp' S(1)F_t$ is weakly stationary with rational spectral density. Moreover, equation (30), replacing $\xi_\perp' S(L)F_t$ with the right-hand side of (29) and $(1-L)F_t$ with $S(L)^{-1}(\xi\eta' + (1-L)D + (1-L)^2 E(L))u_t$, becomes

$$\xi_\perp' S(1)F_t = k + \left\{ \left( \xi_\perp' D - \xi_\perp' S^*(1)S(1)^{-1}\xi\eta' \right) + (1-L)\mathbf{M}(L) \right\} u_t.$$

As $\xi_\perp' D - \xi_\perp' S^*(1)S(1)^{-1}\xi\eta' \ne 0$ generically, $\xi_\perp' S(1)F_t$ is generically I(0).

In conclusion, assuming that $F_t$ is such that $\xi_\perp' S(L)F_t$ is weakly stationary with rational spectral density and mean $k$:

$$\begin{pmatrix} I_c & 0 \\ 0 & (1-L)I_{r-c} \end{pmatrix} \zeta S(L)F_t = \begin{pmatrix} k \\ 0_{(r-c) \times 1} \end{pmatrix} + \left\{ \begin{pmatrix} \xi_\perp' D \\ \xi'\xi\eta' \end{pmatrix} + (1-L) \begin{pmatrix} \xi_\perp' E(L) \\ \xi' D \end{pmatrix} + (1-L)^2 \begin{pmatrix} 0_{c \times q} \\ \xi' E(L) \end{pmatrix} \right\} u_t.$$

Denote by $M(L)$ the matrix between curly brackets. The following statement is proved in Appendix A.2.

Proposition 2. Assume that the family of I(1) processes $F_t$, such that $(1-L)F_t = S(L)^{-1}C(L)u_t$, is a rational reduced-rank I(1) family with cointegration rank $c$ and parameter set $\Pi$. Then, for generic values of the parameters in $\Pi$, the $r \times q$ matrix $M(z)$ is zeroless. In particular, generically,

$$\operatorname{rank}(M(1)) = \operatorname{rank}\begin{pmatrix} \xi_\perp' D \\ \xi'\xi\eta' \end{pmatrix} = q. \tag{31}$$
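The factorization in (27) and the generic rank condition (31) can be illustrated numerically. The sketch below assumes $s_1 = 0$ (so $E(L)$ is a constant matrix, called `E0` here) and randomly drawn parameters; these choices, the seed and the evaluation point `z0` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
r, q, d = 4, 3, 2
c = r - q + d
k = r - c

xi = rng.standard_normal((r, k))
eta = rng.standard_normal((q, k))
D = rng.standard_normal((r, q))
E0 = rng.standard_normal((r, q))      # E(L) = E0, i.e. degree s_1 = 0

U, _, _ = np.linalg.svd(xi, full_matrices=True)
xi_perp = U[:, k:]
zeta = np.vstack([xi_perp.T, xi.T])   # r x r, generically invertible

def C(z):
    return xi @ eta.T + (1 - z) * D + (1 - z) ** 2 * E0

def M(z):
    # The matrix between curly brackets in the second line of (27)
    top = xi_perp.T @ D + (1 - z) * (xi_perp.T @ E0)
    bottom = xi.T @ xi @ eta.T + (1 - z) * (xi.T @ D) + (1 - z) ** 2 * (xi.T @ E0)
    return np.vstack([top, bottom])

def Delta(z):
    # diag((1 - z) I_c, I_{r-c})
    return np.diag(np.r_[np.full(c, 1 - z), np.ones(r - c)])

z0 = 0.3 + 0.4j                        # arbitrary evaluation point
factorization_ok = np.allclose(zeta @ C(z0), Delta(z0) @ M(z0))
rank_C1 = np.linalg.matrix_rank(C(1.0))
rank_M1 = np.linalg.matrix_rank(M(1.0))
print(factorization_ok, rank_C1, rank_M1)
```

With random draws, $C(1)$ has rank $r - c < q$ while $M(1)$ has full column rank $q$, which is the content of (31) for this draw.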

For $r = q$, (31) is equivalent to the condition that $\xi_\perp' C^* \eta_\perp$ has full rank in Johansen (1995), Theorem 5.4, p. 55 (Johansen's matrix $C^*$ is equal to our $D$). To see this, observe that if $r = q$ then $\begin{pmatrix} \eta_\perp & \eta \end{pmatrix}$ is $r \times r$ and invertible, and that

$$M(1) = \begin{pmatrix} \xi_\perp' D \eta_\perp & \xi_\perp' D \eta \\ 0_{(r-c) \times c} & \xi'\xi\eta'\eta \end{pmatrix} \begin{pmatrix} \eta_\perp & \eta \end{pmatrix}^{-1}.$$

As $\xi'\xi\eta'\eta$ is non-singular, the determinant of $M(1)$ vanishes if and only if the determinant of $\xi_\perp' D \eta_\perp$ vanishes.

A consequence of Proposition 2 and Proposition 1(I) is that generically there exists a finite-degree $r \times r$ polynomial matrix

$$N(L) = I_r + N_1 L + \cdots + N_p L^p,$$

for some $p$, such that: (i) $N(L)M(L) = M(0)$, i.e. $N(L)$ is a left inverse of $M(L)$; (ii) all the roots of $\det(N(L))$ lie outside the unit circle, so that $N(1)$ has full rank.

In conclusion, for generic values of the parameters in $\Pi$,

$$A(L)F_t = h + C(0)u_t,$$

where

$$A(L) = I_r + A_1 L + \cdots + A_P L^P = \zeta^{-1}N(L) \begin{pmatrix} I_c & 0 \\ 0 & (1-L)I_{r-c} \end{pmatrix} \zeta S(L) = \zeta^{-1}N(L) \begin{pmatrix} \xi_\perp' \\ (1-L)\xi' \end{pmatrix} S(L), \tag{32}$$

with $P = p + 1 + s_2$, and

$$h = A(1) \begin{pmatrix} k \\ 0_{(r-c) \times 1} \end{pmatrix}. \tag{33}$$

Defining

$$\alpha = \zeta^{-1}N(1) \begin{pmatrix} I_c \\ 0_{(r-c) \times c} \end{pmatrix}, \qquad \beta = S(1)'\xi_\perp, \tag{34}$$

both $\alpha$ and $\beta$ have generically rank $c$ (regarding $\alpha$, remember that $N(1)$ has full rank) and $A(1) = \alpha\beta'$. Lastly, define

$$A^*(L) = (1-L)^{-1}(A(L) - A(1)L). \tag{35}$$

We have proved the following statement.

Proposition 3. (Granger Representation Theorem for reduced-rank I(1) vectors) Let $F_t$ be a rational reduced-rank I(1) family with cointegration rank $c$ and parameter set $\Pi$, so that $(1-L)F_t = S(L)^{-1}C(L)u_t$. For generic values of the parameters in $\Pi$, (i) $F_t$ can be determined such that $\beta' F_t = \xi_\perp' S(1)F_t$ is weakly stationary with rational spectral density and mean $k$, (ii) $F_t$ has the Error Correction representation

$$A(L)F_t = A^*(L)(1-L)F_t + \alpha\beta' F_{t-1} = h + C(0)u_t, \tag{36}$$

where the $r \times r$ finite-degree polynomial matrices $A(L)$ and $A^*(L)$, the full-rank $r \times c$ matrices $\alpha$ and $\beta$, and the $r$-dimensional constant vector $h$ have been defined above in (32), (35), (34), (33), respectively. Generically, $\beta' F_t$ is I(0).
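The construction of $A(L)$, $\alpha$ and $\beta$ can be made concrete in a special case. The sketch below assumes $S(L) = I_r$, $E(L) = 0$ and $\xi' D = 0$, so that $M(L)$ is constant and the left inverse can be taken as $N(L) = I_r$; these restrictions and all numerical values are assumptions chosen only to keep the example short, not the generic case treated in Proposition 3. Under them, $A(z) = \zeta^{-1}\,\mathrm{diag}(I_c, (1-z)I_{r-c})\,\zeta$ should satisfy $A(z)C(z) = (1-z)C(0)$, which is the polynomial identity behind (36).

```python
import numpy as np

rng = np.random.default_rng(6)
r, q, d = 4, 3, 2
c = r - q + d
k = r - c

xi = rng.standard_normal((r, k))
eta = rng.standard_normal((q, k))
U, _, _ = np.linalg.svd(xi, full_matrices=True)
xi_perp = U[:, k:]

# Assumed special case: D lies in span(xi_perp), hence xi' D = 0, and E(L) = 0
D = xi_perp @ rng.standard_normal((c, q))
zeta = np.vstack([xi_perp.T, xi.T])
zeta_inv = np.linalg.inv(zeta)

def C(z):
    return xi @ eta.T + (1 - z) * D

def A(z):
    # A(z) from (32) with N(L) = I_r and S(L) = I_r
    sel = np.diag(np.r_[np.ones(c), np.full(r - c, 1 - z)])
    return zeta_inv @ sel @ zeta

alpha = zeta_inv[:, :c]       # zeta^{-1} N(1) (I_c ; 0) with N(1) = I_r
beta = xi_perp                # S(1)' xi_perp with S(1) = I_r

z0 = -0.2 + 0.7j              # arbitrary evaluation point
identity_ok = np.allclose(A(z0) @ C(z0), (1 - z0) * C(0.0))
rank_A1 = np.linalg.matrix_rank(A(1.0))
print(identity_ok, rank_A1)
```

In this special case $A(1) = \alpha\beta'$ has rank $c$, in line with (34).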

In Definition 4 we have not assumed that $u_t$ is fundamental for $(1-L)F_t$. However,

$$C(L) = \zeta^{-1} \begin{pmatrix} (1-L)I_c & 0 \\ 0 & I_{r-c} \end{pmatrix} M(L). \tag{37}$$

Therefore, by Proposition 2, generically the matrix $C(z)$ has full rank $q$ for $|z| < 1$. Thus, see Rozanov (1967), pp. 43–7:

Proposition 4. Assume that the family of I(1) processes $F_t$, such that $(1-L)F_t = S(L)^{-1}C(L)u_t$, is a rational reduced-rank I(1) family with cointegration rank $c$ and parameter set $\Pi$. For generic values of the parameters the vector $u_t$ is fundamental for the vector $(1-L)F_t$.

As recalled in Section 2.2, if $(1-L)F_t = S(L)^{-1}C(L)u_t$, then $u_t$ is fundamental for $(1-L)F_t$ if it belongs to the space spanned by $(1-L)F_\tau$, $\tau \le t$. This is not inconsistent with the fact that when $c > r - q$, so that $C(z)$ is not full rank for $z = 1$, there are no autoregressive representations for $(1-L)F_t$, either finite or infinite, and therefore no representations of the form

$$u_t = (A_0 + A_1 L + A_2 L^2 + \cdots)(1-L)F_t = \lim_{n \to \infty} \sum_{k=0}^{n} A_k (1-L)F_{t-k},$$

where the matrices $A_j$ are $q \times r$. Indeed, fundamentalness of $u_t$ only implies that $u_t$ can be obtained as the limit of linear combinations of the vectors $(1-L)F_\tau$, $\tau \le t$, i.e.

$$u_t = \lim_{n \to \infty} \sum_{k=0}^{\infty} B_k^{(n)} (1-L)F_{t-k},$$

where the coefficient matrices depend both on $k$ and $n$.$^{4}$

Lastly, let us recall that neither $F_t$ nor $u_t$ are identified in the factor model

$$x_t = \Lambda F_t + \epsilon_t, \qquad (1-L)F_t = S(L)^{-1}C(L)u_t.$$

In particular, if $F_t = HF_t^*$, where $H$ is $r \times r$ and invertible,

$$x_t = \Lambda^* F_t^* + \epsilon_t, \qquad (1-L)F_t^* = S^*(L)^{-1}C^*(L)u_t,$$

where $\Lambda^* = \Lambda H$, $S^*(L) = H^{-1}S(L)H$, $C^*(L) = H^{-1}C(L)$ (see also Section 2.3). In particular, if $H^{-1} = \zeta S(1)$, the first $c$ coordinates of $F_t^*$ are I(0) and the remaining $r - c = q - d$ are I(1).$^{5}$ Moreover, the $c$ coordinates of the error vector $\beta^{*\prime}F_{t-1}^*$ in representation (36) for $F_t^*$ are linear combinations of the I(0) factors alone.
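The distinction between fundamentalness and the existence of an autoregressive representation can be illustrated with the univariate example $x_t = (1-L)u_t$. The sketch below uses one particular choice of weights, $B_k^{(n)} = 1 - k/(n+1)$ for $k \le n$ (an assumption made for illustration; the text does not specify the weights). With this choice $\sum_k B_k^{(n)} x_{t-k} = u_t - \frac{1}{n+1}\sum_{j=1}^{n+1} u_{t-j}$, so the mean squared error is $\sigma^2/(n+1)$ and vanishes as $n$ grows, even though the weights change with $n$ and no fixed AR($\infty$) filter exists.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200_000
u = rng.standard_normal(T)            # unit-variance white noise
x = np.diff(u, prepend=0.0)           # x_t = u_t - u_{t-1}, taking u_{-1} = 0

def approx_u(x, n):
    """Weighted sum of x_t, ..., x_{t-n} with weights that depend on n."""
    w = 1.0 - np.arange(n + 1) / (n + 1)
    return np.convolve(x, w)[: len(x)]

mse = {}
for n in (0, 9, 99):
    u_hat = approx_u(x, n)
    err = u_hat[n + 1:] - u[n + 1:]   # drop start-up observations
    mse[n] = float(np.mean(err ** 2))

print(mse)                            # roughly 1/(n + 1) for each n
```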
$^{4}$ For example, if $x_t = (1-L)u_t$, where $u_t$ is a univariate white noise, then $u_t$ is fundamental for $x_t$ although no autoregressive representation $x_t + a_1 x_{t-1} + a_2 x_{t-2} + \cdots = u_t$ exists. See Brockwell and Davis (1991), p. 111, Problem 3.8.

$^{5}$ More precisely, the first $c$ equations in $(1-L)F_t^* = S^*(L)^{-1}C^*(L)u_t$ have an I(0) solution (the argument goes as in the discussion of equation (27)). This is consistent with Definition 3(ii): some of the coordinates of an I(1) vector can be I(0).

3.4 VECMs and unrestricted VARs in the levels

Several papers have addressed the issue whether and when an Error Correction model or an unrestricted VAR in the levels should be used for estimation in the case of non-singular cointegrated vectors: Sims et al. (1990) have shown that the parameters of a cointegrated VAR are consistently estimated using an unrestricted VAR in the levels; on the other hand, Phillips (1998) shows that if the variables are cointegrated, the long-run features of the impulse-response functions are consistently estimated only if the unit roots are explicitly taken into account, that is within a VECM specification. The simulation exercise described below provides some evidence in favour of the VECM specification in the singular case.

We generate $F_t$ using a specification of (36) with $r = 4$, $q = 3$, $d = 2$, so that $c = r - q + d = 3$. The $4 \times 4$ matrix $A(L)$ is of degree 2. Moreover, the upper $3 \times 3$ submatrix of $C(0)$ is lower triangular (see Appendix C for details). We estimate a VECM as in Johansen (1988, 1991), assuming that $c$, the degree of $A(L)$ and the identification restrictions are known. We replicate the generation 1000 times for $T = 100, 500, 1000, 5000$. For each replication, we estimate a (misspecified) VAR in differences, a VAR in the levels and a VECM, assuming known $c$ and the degree of $A(L)$ and $A^*(L)$. The Root Mean Square Error between estimated and actual impulse-response functions is computed for each replication using all 12 responses and averaged over all replications. The results are shown in Table 1. We see that the RMSE of both the VECM and the LVAR decreases as $T$ increases. However, for all values of $T$, the RMSE of the VECM stabilizes as the lag increases, whereas it deteriorates for the LVAR, in line with the claim that the long-run responses of the variables are better estimated with the VECM.

For the sake of simplicity, in the simulation exercise the "structural shocks" are determined by imposing restrictions on the response of $F_t$ to the shocks. Precisely, the upper $3 \times 3$ submatrix of $C(0)$ is lower triangular. However, as we recalled in the Introduction and Section 2.3, neither the factors $F_t$ nor their responses to the shocks have a direct economic interpretation. In empirical work with actual data, identification of the structural shocks $v_t$, which result as a linear transformation of $u_t$, is usually obtained by imposing restrictions on the impulse-response functions of the variables $x_{it}$ with respect to $v_t$.

Table 1: Monte Carlo Simulations. VECM: r = 4, q = 3, c = 3.

T       lags    DVAR    LVAR    VECM
100     0       0.06    0.05    0.05
100     4       0.26    0.18    0.17
100     20      0.30    0.37    0.22
100     40      0.30    0.45    0.22
100     80      0.30    0.57    0.22
500     0       0.02    0.02    0.02
500     4       0.23    0.07    0.07
500     20      0.25    0.14    0.09
500     40      0.25    0.21    0.09
500     80      0.25    0.32    0.09
1000    0       0.02    0.02    0.02
1000    4       0.23    0.05    0.05
1000    20      0.25    0.09    0.07
1000    40      0.25    0.13    0.07
1000    80      0.25    0.22    0.07
5000    0       0.01    0.01    0.01
5000    4       0.22    0.02    0.02
5000    20      0.25    0.03    0.03
5000    40      0.25    0.04    0.03
5000    80      0.25    0.06    0.03

Root Mean Squared Errors at different lags, when estimating the impulse response functions of the simulated variables $F_t$ to the common shocks $u_t$. Estimation is carried out using three different autoregressive representations: a VAR for $(1-L)F_t$ (DVAR), a VAR for $F_t$ (LVAR), and a VECM with $c = r - q + d$ error correction terms (VECM). Results are based on 1000 replications. For the data generating process see Appendix C. The RMSEs are obtained averaging over all replications and all $4 \times 3$ impulse responses.
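One mechanism that may lie behind the pattern in Table 1 can be shown with a stylized, deterministic example. This is not the paper's Monte Carlo design (Appendix C is not reproduced here): the bivariate VAR(1), the perturbation sizes and the matrices `A_levels` and `A_vecm` are all assumptions made for illustration. A levels VAR whose unit root is slightly mis-estimated (0.98 instead of 1) gives impulse-response errors that grow with the horizon, whereas an estimate that keeps the reduced-rank structure $I + \alpha\beta'$ preserves the unit root and its long-run response.

```python
import numpy as np

# True cointegrated VAR(1): F_t = (I + alpha beta') F_{t-1} + u_t
alpha = np.array([[-0.5], [0.2]])
beta = np.array([[1.0], [-1.0]])
A_true = np.eye(2) + alpha @ beta.T            # eigenvalues 1 and 0.3

# Assumed estimation errors of comparable size:
A_levels = A_true - 0.02 * np.eye(2)           # unit root estimated as 0.98
A_vecm = np.eye(2) + (1.05 * alpha) @ beta.T   # rank restriction kept, alpha off by 5%

def irf(A, h):
    """Impulse response at horizon h of a VAR(1) with coefficient matrix A."""
    return np.linalg.matrix_power(A, h)

def rmse(A_hat, h):
    return float(np.sqrt(np.mean((irf(A_hat, h) - irf(A_true, h)) ** 2)))

lags = [0, 4, 20, 40, 80]
err_levels = [rmse(A_levels, h) for h in lags]
err_vecm = [rmse(A_vecm, h) for h in lags]
for h, e_l, e_v in zip(lags, err_levels, err_vecm):
    print(h, round(e_l, 4), round(e_v, 4))
```

The example isolates only the effect of losing the exact unit root; sampling error in $\beta$ and in the short-run dynamics, which matters in the actual simulations, is ignored.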

4 Cointegration of the variables $x_{it}$

The relationship between cointegration of the factors $F_t$ and cointegration of the variables $x_{it}$ is now considered. Let us firstly observe that, regarding model (6), neither the assumptions (1) through (5) listed in Section 2, nor the asymptotic conditions (see e.g. Forni et al. (2009)) say much on the matrix $\Lambda$ and the vector $\epsilon_t$ for a given finite $n$. In particular, the first $r$ eigenvalues of the matrix $\Lambda\Lambda'$ must diverge as $n \to \infty$, but this has no implications on the rank of the matrix $\Lambda$ corresponding to, say, $n = 10$. Moreover, as we see in Proposition 5(iii), if the idiosyncratic components are I(0), then all $p$-dimensional subvectors of $x_t$ are cointegrated for $p > q - d$, which is at odds with what is observed in the macroeconomic datasets analysed in the empirical Dynamic Factor Model literature. This motivates assuming that $\epsilon_t$ is I(1). In that case, see Proposition 5(i), cointegration of $x_t$ requires that both the common and the idiosyncratic components are cointegrated. Some results are collected in the statement below.

Proposition 5. Let $x_t^{(p)} = \chi_t^{(p)} + \epsilon_t^{(p)} = \Lambda^{(p)}F_t + \epsilon_t^{(p)}$ be a $p$-dimensional subvector of $x_t$, $p \le n$. Denote by $c_\chi^p$ and $c_\epsilon^p$ the cointegration rank of $\chi_t^{(p)}$ and $\epsilon_t^{(p)}$ respectively. Both range from $p$, stationarity, to 0, no cointegration.

(i) $x_t^{(p)}$ is cointegrated only if $\chi_t^{(p)}$ and $\epsilon_t^{(p)}$ are both cointegrated.

(ii) If $p > q - d$ then $\chi_t^{(p)}$ is cointegrated. If $p \le q - d$ and $\Lambda^{(p)}$ is full rank then $\chi_t^{(p)}$ is not cointegrated. If $p \le q - d$ and $\operatorname{rank}(\Lambda^{(p)}) < p$ then $\chi_t^{(p)}$ is cointegrated.

(iii) Let $V^\chi \subseteq \mathbb{R}^p$ and $V^\epsilon \subseteq \mathbb{R}^p$ be the cointegration spaces of $\chi_t^{(p)}$ and $\epsilon_t^{(p)}$ respectively. The vector $x_t^{(p)}$ is cointegrated if and only if the intersection of $V^\chi$ and $V^\epsilon$ contains non-zero vectors. In particular, if $p > q - d$ and $c_\epsilon^p > q - d$ then $x_t^{(p)}$ is cointegrated.

Proof. Because $\chi_{it}$ and $\epsilon_{js}$ are orthogonal for all $i, j, t, s$, see Assumption (5) for model (6), the spectral densities of $(1-L)x_t^{(p)}$, $(1-L)\chi_t^{(p)}$, $(1-L)\epsilon_t^{(p)}$ fulfill:

$$\Sigma_{\Delta x}^{(p)}(\theta) = \Sigma_{\Delta\chi}^{(p)}(\theta) + \Sigma_{\Delta\epsilon}^{(p)}(\theta), \qquad \theta \in [-\pi, \pi]. \tag{38}$$

Now, (38) implies that

$$\lambda_p\left( \Sigma_{\Delta x}^{(p)}(0) \right) \ge \lambda_p\left( \Sigma_{\Delta\chi}^{(p)}(0) \right) + \lambda_p\left( \Sigma_{\Delta\epsilon}^{(p)}(0) \right), \tag{39}$$

where $\lambda_p(A)$ denotes the smallest eigenvalue of the Hermitian matrix $A$; this is one of Weyl's inequalities, see Franklin (2000), p. 157, Theorem 1. Because spectral density matrices are non-negative definite, the right-hand side in (39) vanishes if and only if both terms on the right-hand side vanish. Hence the spectral density of $\Delta x_t^{(p)}$ is singular at zero only if the spectral densities of $\Delta\chi_t^{(p)}$ and $\Delta\epsilon_t^{(p)}$ are both singular at zero. By Definition 3, (i) is proved.

Without loss of generality we can assume that $S(L) = I_r$. By substituting (26) in (6), we obtain

$$x_t = \Lambda\left[ (G_1(L)v_{1t} + G_2(L)v_{2t} + T_t) + W \right] + \epsilon_t, \tag{40}$$

where on the right-hand side the only non-stationary terms are $T_t$ and (possibly) $\epsilon_t$. By recalling that $T_t = \xi\sum_{s=1}^{t} v_{2s}$, where $\xi$ is of dimension $r \times (q-d)$ and rank $q - d$, and by defining $\mathcal{G}_t = \Lambda[G_1(L)v_{1t} + G_2(L)v_{2t} + W]$ and $\mathcal{T}_t = \sum_{s=1}^{t} v_{2s}$, we can rewrite (40) as:

$$x_t = \Lambda\xi\mathcal{T}_t + \mathcal{G}_t + \epsilon_t.$$
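Statement (ii) of Proposition 5 can be illustrated numerically. Since the $q - d$ common trends enter a $p$-dimensional block of common components through the $p \times (q-d)$ matrix $\Lambda^{(p)}\xi$, the number of independent cointegrating relations among those common components is $p - \operatorname{rank}(\Lambda^{(p)}\xi)$. The sketch below assumes randomly drawn loadings (an assumption; with random draws $\Lambda^{(p)}$ has full rank almost surely) and illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(4)
r, q, d = 4, 3, 1
k = q - d                              # number of common trends
xi = rng.standard_normal((r, k))       # r x (q - d), full column rank

def common_coint_rank(p):
    """Cointegration rank of a p-dimensional block of common components."""
    lam_p = rng.standard_normal((p, r))          # hypothetical loadings
    return p - np.linalg.matrix_rank(lam_p @ xi)

ranks = {p: common_coint_rank(p) for p in (1, 2, 3, 5)}
print(ranks)   # no cointegration for p <= q - d, rank p - (q - d) for p > q - d
```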

For $x_t^{(p)}$:

$$x_t^{(p)} = \chi_t^{(p)} + \epsilon_t^{(p)} = \Lambda^{(p)}\xi\mathcal{T}_t + \mathcal{G}_t^{(p)} + \epsilon_t^{(p)},$$

where $\Lambda^{(p)}$ and $\mathcal{G}_t^{(p)}$ have an obvious definition. Of course cointegration of the common components $\chi_t^{(p)}$ is equivalent to cointegration of $\Lambda^{(p)}\xi\mathcal{T}_t$, which in turn is equivalent to $\operatorname{rank}(\Lambda^{(p)}\xi) < p$. Statement (ii) follows from

$$\operatorname{rank}\left( \Lambda^{(p)}\xi \right) \le \min\left( \operatorname{rank}(\Lambda^{(p)}), \operatorname{rank}(\xi) \right).$$

The first part of (iii) is obvious. Assume now that $p > q - d$. If $c_\chi^p + c_\epsilon^p = \dim(V^\chi) + \dim(V^\epsilon) = p - (q-d) + c_\epsilon^p > p$, i.e. if $c_\epsilon^p > q - d$, then the intersection between $V^\chi$ and $V^\epsilon$ is non-trivial, so that $x_t^{(p)}$ is cointegrated.

5 Summary and conclusions

The paper studies representation theory for Dynamic Factor Models when the factors are I(1) and singular, and the idiosyncratic components are I(1). Singular I(1) vectors are cointegrated, with cointegration rank $c$ equal to $r - q$, the dimension of $F_t$ minus its rank, plus $d$, with $0 \le d < q$. We prove that if $(1-L)F_t$ has rational spectral density, then generically $F_t$ has an Error Correction representation with $c$ error terms and a finite autoregressive matrix polynomial. Moreover, $F_t$ is driven by $r - c$ permanent shocks and $d$ transitory shocks, with $r - c + d = q$, not $r$ as in the non-singular case. These results are obtained by combining the standard results on cointegration with recent results on singular stochastic vectors.

A simulation exercise using data generated by a simple singular VECM confirms previous results, obtained for non-singular vectors, showing that under cointegration the long-run features of impulse-response functions are better estimated using a VECM rather than a VAR in the levels.

In Section 4 we argue that stationarity of the idiosyncratic components would produce an amount of cointegration for the observable variables $x_{it}$ that is not observed in the datasets that are standard in the Dynamic Factor Model literature, see e.g. Stock and Watson (2002a,b, 2005), Forni et al. (2009). Thus the idiosyncratic vector in those datasets is likely to be I(1).

The results in this paper are the basis for estimation of I(1) Dynamic Factor Models with cointegrated factors, see the companion paper (Barigozzi et al., 2016).

References

Amengual, D. and M. W. Watson (2007). Consistent estimation of the number of dynamic factors in a large N and T panel. Journal of Business and Economic Statistics 25, 91–96.

Anderson, B. D. and M. Deistler (2008a). Generalized linear dynamic factor models: a structure theory. IEEE Conference on Decision and Control.

Anderson, B. D. and M. Deistler (2008b). Properties of zero-free transfer function matrices. SICE Journal of Control, Measurement and System Integration 1, 284–292.

Bai, J. (2004). Estimating cross-section common stochastic trends in nonstationary panel data. Journal of Econometrics 122, 137–183.

Bai, J. and S. Ng (2004). A PANIC attack on unit roots and cointegration. Econometrica 72, 1127–1177.

Bai, J. and S. Ng (2007). Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60.

Banerjee, A., M. Marcellino, and I. Masten (2014a). Forecasting with factor-augmented error correction models. International Journal of Forecasting 30, 589–612.

Banerjee, A., M. Marcellino, and I. Masten (2014b). Structural FECM: Cointegration in large-scale structural FAVAR models. Working Paper, http://www.econlab.si/igor.masten.

Barigozzi, M., A. M. Conti, and M. Luciani (2014). Do euro area countries respond asymmetrically to the common monetary policy? Oxford Bulletin of Economics and Statistics 76, 693–714.

Barigozzi, M., M. Lippi, and M. Luciani (2016). Non-stationary dynamic factor models for large datasets. http://arxiv.org/abs/1602.02398.

Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (Second ed.). New York: Springer.

Deistler, M., B. D. Anderson, A. Filler, C. Zinner, and W. Chen (2010). Generalized linear dynamic factor models: An approach via singular autoregressions. European Journal of Control, 211–224.

Engle, R. F. and C. W. J. Granger (1987). Co-integration and error correction: Representation, estimation, and testing. Econometrica 55, 251–76.

Forni, M. and L. Gambetti (2010). The dynamic effects of monetary policy: A structural factor model approach. Journal of Monetary Economics 57, 203–216.

Forni, M., D. Giannone, M. Lippi, and L. Reichlin (2009). Opening the Black Box: Structural Factor Models versus Structural VARs. Econometric Theory 25, 1319–1347.

Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000). The Generalized Dynamic Factor Model: Identification and Estimation. The Review of Economics and Statistics 82, 540–554.

Forni, M., M. Hallin, M. Lippi, and P. Zaffaroni (2015). Dynamic factor models with infinite-dimensional factor spaces: one-sided representations. Journal of Econometrics 185, 359–371.

Forni, M. and M. Lippi (2001). The Generalized Dynamic Factor Model: Representation Theory. Econometric Theory 17, 1113–1141.

Forni, M. and M. Lippi (2010). The Unrestricted Dynamic Factor Model: One-sided Representation Results. Journal of Econometrics 163, 23–28.

Franklin, J. N. (2000). Matrix Theory (Second ed.). New York: Dover Publications.

Giannone, D., L. Reichlin, and L. Sala (2005). Monetary policy in real time. In M. Gertler and K. Rogoff (Eds.), NBER Macroeconomics Annual 2004. MIT Press.

Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231–254.

Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551–80.

Johansen, S. (1995). Likelihood-based inference in cointegrated vector autoregressive models (First ed.). Oxford: Oxford University Press.

Lancaster, P. and M. Tismenetsky (1985). The theory of matrices (Second ed.). New York: Academic Press.

Luciani, M. (2015). Monetary policy and the housing market: A structural factor analysis. Journal of Applied Econometrics 30, 199–218.

Peña, D. and P. Poncela (2004). Nonstationary dynamic factor analysis. Journal of Statistical Planning and Inference 136, 1237–1257.

Phillips, P. C. (1998). Impulse response and forecast error variance asymptotics in nonstationary VARs. Journal of Econometrics 83, 21–56.

Rozanov, Y. A. (1967). Stationary Random Processes. San Francisco: Holden-Day.

Sims, C., J. H. Stock, and M. W. Watson (1990). Inference in linear time series models with some unit roots. Econometrica 58, 113–144.

Stock, J. H. and M. W. Watson (1988). Testing for common trends. Journal of the American Statistical Association 83, 1097–1107.

Stock, J. H. and M. W. Watson (2002a). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.

Stock, J. H. and M. W. Watson (2002b). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20, 147–162.

Stock, J. H. and M. W. Watson (2005). Implications of dynamic factor models for VAR analysis. Working Paper 11467, NBER.

van der Waerden, B. L. (1953). Modern Algebra (Second ed.), Volume I. New York: Frederick Ungar.

Watson, M. W. (1994). Vector autoregressions and cointegration. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume IV. Elsevier Science.

Appendix A Proofs

Appendix A.1 Stationary solutions of $(1-L)y_t = (1-L)\boldsymbol{\zeta}_t$

Consider firstly the difference equation in the unknown $g$-dimensional vector process $\boldsymbol{\zeta}_t$:

$$(1-L)\boldsymbol{\zeta}_t = w_t, \qquad t \in \mathbb{Z}, \tag{A1}$$

where $w_t$ is a $g$-dimensional stochastic process with $w_{jt} \in L_2(\Omega, \mathcal{F}, P)$. Define a solution of (A1) as a process $y_t$, $t \in \mathbb{Z}$, $y_{jt} \in L_2(\Omega, \mathcal{F}, P)$, such that $(1-L)y_t = w_t$. If $\tilde{y}_t$ is a solution, then all the solutions of (A1) are $\tilde{y}_t + W$, where $W$ is a $g$-dimensional stochastic variable with $W_j \in L_2(\Omega, \mathcal{F}, P)$, i.e. a particular solution plus a constant stochastic process (a solution of the homogeneous equation $(1-L)y_t = 0$).

Assume that $z_t$ is a $g$-dimensional, weakly stationary process with a moving-average representation $z_t = Z(L)v_t$, where:

(i) $v_t$ is an $s$-dimensional non-singular white noise, with $s \le g$, belonging to $L_2(\Omega, \mathcal{F}, P)$,

(ii) $Z(L)$ is a $g \times s$ square-summable matrix such that $z_{kt}$ and $v_{jt}$, for $t \in \mathbb{Z}$, $k = 1, 2, \ldots, g$, $j = 1, 2, \ldots, s$, span the same subspace of $L_2(\Omega, \mathcal{F}, P)$.$^{6}$

$^{6}$ A very weak condition. However, if $Z(L)$ is a band-pass filter, (ii) does not hold.

Now consider the equation

$$(1-L)\boldsymbol{\zeta}_t = (1-L)z_t. \tag{A2}$$

Because $z_t$ trivially fulfills (A2), all the solutions of (A2) are

$$y_t = z_t + K,$$

where $K$ is a $g$-dimensional stochastic vector belonging to $L_2(\Omega, \mathcal{F}, P)$. We want to determine the conditions such that the solution $y_t$ is weakly stationary with a spectral density. Let

$$K = \sum_{k=-\infty}^{\infty} P_k v_k + H,$$

where the sum is the orthogonal projection of $K$ on the space spanned by $v_k$, $k \in \mathbb{Z}$, and $H$ is the residual. Setting $V = \sum_{k=-\infty}^{\infty} P_k v_k$, because $H$ is orthogonal to $V$ and $z_\tau$, $\tau \in \mathbb{Z}$,

$$E(y_t y_{t-k}') = E(z_t z_{t-k}') + E(VV') + E(HH') + E(z_t V') + E(V z_{t-k}').$$

Given $k$, the last two terms tend to zero when $t$ tends to infinity (by the same argument used to prove that the autocovariances of a moving average tend to zero as the lag tends to infinity). Weak stationarity of $y_t$ implies that

$$E(z_t V') + E(V z_{t-k}') = 0,$$

for all $t \in \mathbb{Z}$. On the other hand, given $t$, $E(V z_{t-k}')$ tends to zero as $k$ tends to infinity (again the argument on autocovariances), so that $E(z_t V') = 0$ for all $t$. Orthogonality of $V$ to all $z_t$ implies orthogonality to all $v_t$, see assumption (ii) above. As $V$ is an average of $v_k$, $k \in \mathbb{Z}$, then $V = 0$. In conclusion, all the stationary solutions of (A2) are $y_t = z_t + K$ with $K$ orthogonal to $z_t$ for all $t \in \mathbb{Z}$.

Lastly, the spectral measure of $y_t$ has a jump at frequency zero unless the variance-covariance matrix of $K$ is zero. Thus $y_t$ has a spectral density if and only if $K$ is a constant vector (i.e. $K(\omega) = k$ almost surely in $\Omega$). In that case the spectral densities of $y_t$ and $z_t$ coincide.

Using the results above we can prove the statement in comment C7 on Definition 3, Cointegration. If $y_t$ is such that $(1-L)y_t = V(L)v_t$, then $y_t = \tilde{y}_t + W$, where

$$\tilde{y}_t = V(L)\mu_t, \qquad \mu_t = \begin{cases} v_1 + v_2 + \cdots + v_t, & \text{for } t > 0, \\ 0, & \text{for } t = 0, \\ -(v_0 + v_{-1} + \cdots + v_{t+1}), & \text{for } t < 0, \end{cases}$$

and $W$ is an $r$-dimensional stochastic vector. If $c$ is a cointegration vector, then

$$(1-L)c'y_t = c'V(L)v_t = c'\left[ V(1) + (1-L)\frac{V(L) - V(1)}{1-L} \right] v_t = (1-L)c'V^*(L)v_t,$$

where trivially the entries of $V^*(L) = (V(L) - V(1))/(1-L)$ are rational functions of $L$ with no poles of modulus less than or equal to 1. From the last equation we obtain

$$c'y_t = c'\tilde{y}_t + c'W = c'V^*(L)v_t + w,$$

where $w$ is a stochastic variable. The process $c'y_t$ is weakly stationary with rational spectral density if and only if $w$ is a constant with probability one. On the other hand,

$$c'\tilde{y}_t = c'V(L)\mu_t = (1-L)c'V^*(L)\mu_t = c'V^*(L)v_t,$$

so that $c'W = w$. In conclusion, $c'y_t$ is weakly stationary with rational spectral density if and only if, in the solution $y_t = \tilde{y}_t + W$, the stochastic vector $W$ is chosen such that $c'W$ is constant with probability one.

The same reasoning applies to equation (28), to prove that $\xi_\perp' S(L)F_t$ is weakly stationary with rational spectral density if and only if (29) holds and $\xi_\perp' S(1)W = k$.

Appendix A.2 Proof of Proposition 2

With one exception at the end of the proof, we keep using the notation $C(z)$, $M(z)$, etc., avoiding explicit dependence on $p \in \Pi$ (see Definition 1).

Remark 1. Suppose that the statement $S(p)$, depending on a vector $p \in \Pi$, is equivalent to a set of polynomial equations for the parameters, for example the statement that all the $q \times q$ minors of $M(1)$ vanish, i.e. that $\operatorname{rank}(M(1)) < q$.
Statement $S(p)$ is true either for a nowhere dense subset of $\Pi$ or for the whole $\Pi$. Thus, if the statement is false for one point in $\Pi$, it is generically false in $\Pi$. Moreover, $S(p)$ can be obviously extended to any $p \in \mathbb{R}^\lambda$ and, as $\Pi$ is an open subset of $\mathbb{R}^\lambda$, if the statement $S$ is false for one point in $\mathbb{R}^\lambda$, then it is generically false in $\mathbb{R}^\lambda$ and therefore in $\Pi$.

Remark 2. Remark 1 can be used to show that generically the stochastic vectors of a reduced-rank I(1) family, see Definition 4, are of rank $q$. Let $p^*$ be a point in $\mathbb{R}^\lambda$ such that $\xi = 0$, $\eta = 0$, $S(L) = I_r$, $E(L) = 0$ and let $D$ be of rank $q$ (note that $p^*$ does not necessarily belong to $\Pi$). Given $\theta^* \in [-\pi, \pi]$, $\theta^* \ne 0$, the matrix $S(e^{-i\theta^*})^{-1}C(e^{-i\theta^*})$ is equal to $(1 - e^{-i\theta^*})D$ at $p^*$ and has therefore rank $q$, so that, by Remark 1, it has rank $q$ generically in $\Pi$. Thus, generically in $\Pi$, the spectral density of $(1-L)F_t$, which is a rational function of $e^{-i\theta}$, has rank $q$ except for a finite subset of $[-\pi, \pi]$ (depending on $p$).

Remark 3. Consider the polynomials A(z) = a zn+a zn−1+···+a , B(z) = b zm+b zm−1+···+a 0 1 n 0 1 m and let α , i = 1,...,n and β , j = 1,...,m, be the roots of A and B respectively. Suppose i j that a (cid:54)= 0 and b (cid:54)= 0. Then, see van der Waerden (1953, pp. 83-8), 0 0 (cid:89) ambn (α −β ) = R(a ,a ,...,a ;b ,b ,...,b ), 0 0 i j 0 1 n 0 1 m i,j where R is a polynomial function. The function R is called the resultant of A and B. The resultantvanishesifandonlyifAandB haveacommonroot. Nowsupposethatthecoefficients a and b are polynomial functions of p ∈ Π. Then, by Remark 1, if there exists a point p˜ ∈ Π i j (or p˜ ∈ Rλ) such that a (p˜) (cid:54)= 0, b (p˜) (cid:54)= 0, and R(p˜) (cid:54)= 0, then generically A and B have no 0 0 common roots. Remark 4. Recall that a zero of M(z) is a complex number z∗ such that rank(M(z∗) < q (see Proposition 1). If M(z) has two q×q submatrices whose determinants have no common roots, then M(z) is zeroless. Starting with C(z) = ξη(cid:48)+(1−z)D+(1−z)2E(z), we obtain, see Section 3.3, (cid:18) (1−z)I 0 (cid:19)(cid:26)(cid:18) ξ(cid:48) D (cid:19) (cid:18) ξ(cid:48) E(z) (cid:19) (cid:18) 0 (cid:19)(cid:27) ζC(z) = c ⊥ +(1−z) ⊥ +(1−z)2 c×q 0 I ξ(cid:48)ξη(cid:48) ξ(cid:48)D ξ(cid:48)E(z) r−c (cid:18) (cid:19) (1−z)I 0 = c M(z). 0 I r−c With no loss of generality we can assume that r = q + 1, see Remark 4. We denote by M (z) and M (z) the q × q matrices obtained by dropping the first and the last row 1 2 of M(z) respectively. The degrees of the polynomials det(M (z)) and det(M (z)) are d = 1 2 1 (q−d)(s +2)+d(s +1) and d = (q−d−1)(s +2)+(d+1)(s +1). 
1 1 2 1 1 Let us now define a subfamily of M(z), denoted by M(z), obtained by specifyingηηη,ξξξ,ξξξ(cid:48) , ⊥ D and E(L) in the following way:   η(cid:48) = (cid:0) 0 (q−d)×d I q−d (cid:1) , ξ = (cid:18) 0 I q−d (cid:19) , ξ(cid:48) ⊥ = (cid:18) K H (cid:19) , D = (cid:0) H(cid:48) 0 (q+1)×(q−d) (cid:1) , E(z) = E E 2 1 ( ( z z ) ), c×(q−d) E (z) 3 where (cid:0) (cid:1) (cid:0) (cid:1) K = 0 1 0 , H = 0 I , 1×(q−d) 1×d d×(q+1−d) d 23

$$
E_1(z) = \begin{pmatrix}
 & k_1(z) & h_1(z) & \cdots & 0 \\
0_{(q-d)\times d} & & \ddots & \ddots & \vdots \\
 & & & \ddots & h_{q-d-1}(z) \\
 & 0 & \cdots & & k_{q-d}(z)
\end{pmatrix},
$$

$$
E_2(z) = \begin{pmatrix} e(z) & 0_{1\times(q-1)} \end{pmatrix},
$$

$$
E_3(z) = \begin{pmatrix}
f_1(z) & g_1(z) & \cdots & 0 & \\
 & \ddots & \ddots & & 0_{d\times(q-d-1)} \\
0 & \cdots & f_d(z) & g_d(z) &
\end{pmatrix},
$$

the polynomial entries $e$, $k_i$, $h_i$, $f_i$ and $g_i$ being of degree $s_1$. We have:

$$
M(z) = \begin{pmatrix} 0_{1\times d} & 0_{1\times(q-d)} \\ I_d & 0_{d\times(q-d)} \\ 0_{(q-d)\times d} & I_{q-d} \end{pmatrix}
+ (1-z)\begin{pmatrix} E_2(z) \\ E_3(z) \\ 0_{(q-d)\times q} \end{pmatrix}
+ (1-z)^2\begin{pmatrix} 0_{1\times q} \\ 0_{d\times q} \\ E_1(z) \end{pmatrix}.
$$

Notice that $M(z)$ has zero entries except for the diagonal joining the positions $(1,1)$ and $(q,q)$, and the diagonal joining $(2,1)$ and $(q+1,q)$. The matrices $M_1(z)$ and $M_2(z)$, obtained by dropping the first and the last row of $M(z)$, respectively, are upper- and lower-triangular, respectively. Moreover,

$$
\det(M_1(z)) = [1+(1-z)f_1(z)]\cdots[1+(1-z)f_d(z)] \times [1+(1-z)^2k_1(z)]\cdots[1+(1-z)^2k_{q-d}(z)],
$$

$$
\det(M_2(z)) = (1-z)^{2q-d-1}\, e(z)\, [g_1(z)\cdots g_d(z)]\,[h_1(z)\cdots h_{q-d-1}(z)].
$$

Now:

(i) The leading coefficient of $\det(M_1(z))$, call it $Q_1$, corresponding to $z^{d_1}$, is the product of the leading coefficients of the polynomials $k_j(z)$, $j = 1,\ldots,(q-d)$ and $f_i(z)$, $i = 1,\ldots,d$. Trivially, there exist values for the parameters of the polynomials $k_j$ and $f_i$ such that $Q_1 \neq 0$. Let $\boldsymbol{\omega}_1$ be the vector of such parameters and $M_1^{\omega_1}(z)$ the matrix $M_1(z)$ corresponding to the parameters in $\boldsymbol{\omega}_1$.

(ii) Now observe, firstly, that the polynomials $\det(M_1(z))$ and $\det(M_2(z))$ have no parameters in common, and, secondly, that the parameters of $\det(M_2(z))$ vary in an open set (each one is a subvector of the parameters vector of $M(z)$, which varies in the open set $\Pi$). As a consequence, there exist parameters for the polynomials $e$, $g_j$ and $h_i$, call $\boldsymbol{\omega}_2$ the vector of such parameters, such that (1) the leading coefficient of $\det(M_2^{\omega_2}(z))$ does not vanish, (2) $\det(M_1^{\omega_1}(z))$ and $\det(M_2^{\omega_2}(z))$ have no roots in common. This implies that, as the leading coefficient of $\det(M_1^{\omega_1}(z))$ does not vanish as well, the resultant of $\det(M_1^{\omega_1}(z))$ and $\det(M_2^{\omega_2}(z))$ does not vanish, see Remark 2.

(iii) Combining the parameters in $\boldsymbol{\omega}_1$ and $\boldsymbol{\omega}_2$, we determine a point $p \in \Pi$ such that, at $p$, the leading coefficients of $\det(M_1(z))$ and $\det(M_2(z))$ and their resultant do not vanish. As the leading coefficients and the resultant of $\det(M_1(z))$ and $\det(M_2(z))$ are polynomial functions of the parameters in $p$, then by Remarks 2 and 3, $M(z)$ is generically zeroless.
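As a numerical illustration of the argument above, the following minimal sketch builds one member of the subfamily for the smallest non-trivial case, $q = 2$ and $d = 1$ (so that $M(z)$ is $3\times 2$ and there are no $h$ polynomials). The coefficient values and the helper name `m_of_z` are hypothetical, chosen only for illustration. Instead of computing the resultant, the sketch checks the equivalent condition directly: the two determinants share no root, so $M(z)$ keeps rank $q$ at every root of $\det(M_2(z))$.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Hypothetical degree-1 entries (s_1 = 1) for the case q = 2, d = 1.
e, f1, g1, k1 = P([1.0, 0.5]), P([0.4, 0.2]), P([1.0, -0.25]), P([0.3, 0.1])
one_minus_z = P([1.0, -1.0])

# Determinants of the two square submatrices, following the displayed formulas.
det_m1 = (1 + one_minus_z * f1) * (1 + one_minus_z**2 * k1)
det_m2 = one_minus_z**2 * e * g1  # exponent 2q - d - 1 = 2


def m_of_z(z):
    """The 3 x 2 matrix M(z) of the subfamily for q = 2, d = 1."""
    return np.array(
        [
            [(1 - z) * e(z), 0.0],
            [1 + (1 - z) * f1(z), (1 - z) * g1(z)],
            [0.0, 1 + (1 - z) ** 2 * k1(z)],
        ],
        dtype=complex,
    )


# Leading coefficient of det(M_1): must be non-zero.
lead_m1 = det_m1.coef[-1]

# det(M_1) evaluated at every root of det(M_2): no common root if all are non-zero.
roots_m2 = det_m2.roots()
min_abs_det_m1 = min(abs(det_m1(z)) for z in roots_m2)

# Rank of M(z) at the roots of det(M_2); full column rank (2) means no zero there.
ranks = [int(np.linalg.matrix_rank(m_of_z(z))) for z in roots_m2]
```

At any $z$ that is not a root of $\det(M_2(z))$ the rank is $q$ by construction, so checking the roots of $\det(M_2(z))$ is enough for this particular parameter point.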

Appendix B Non uniqueness

In Proposition 3 we prove that a singular I(1) vector has a finite Error Correction representation with $c$ error correction terms. However, as anticipated in Section 2, this representation is not unique since: (i) different Error Correction representations can be obtained in which the number of error terms varies between $d$ and $r-q+d$, (ii) the left inverse of the matrix $M(L)$ may not be unique. We discuss these two causes of non-uniqueness for representation (36) below. In Appendix B.3 we show that all such representations produce the same impulse-response functions.

Appendix B.1 Alternative representations with different numbers of error correction terms

Let, for simplicity, $S(L) = I_r$ and consider the following example, with $r = 3$, $q = 2$, $c = 2$, so that $d = 1$:

$$
\xi' = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}, \quad
\eta' = \begin{pmatrix} 1 & 2 \end{pmatrix}, \quad
\xi_\perp' = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}.
$$

We have,

$$
(1-L)\begin{pmatrix} \xi_\perp' \\ \xi' \end{pmatrix} F_t
= \begin{pmatrix} 1-L & 0 & 0 \\ 0 & 1-L & 0 \\ 0 & 0 & 1 \end{pmatrix}
\left\{ \begin{pmatrix} d_{11}-d_{21} & d_{12}-d_{22} \\ d_{21}-d_{31} & d_{22}-d_{32} \\ 3 & 6 \end{pmatrix} + (1-L)G(L) \right\} u_t,
$$

where $(1-L)G(L)$ gathers the second and third terms within curly brackets in the second line of (27). If the first matrix within the curly brackets has full rank, we can proceed as in Proposition 3 and obtain an Error Correction representation with error terms

$$
\xi_\perp' F_t = \begin{pmatrix} F_{1t} - F_{2t} \\ F_{2t} - F_{3t} \end{pmatrix}.
$$

However, we also have

$$
(1-L)\begin{pmatrix} \xi_\perp' \\ \xi' \end{pmatrix} F_t
= \begin{pmatrix} 1-L & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\left\{ \begin{pmatrix} d_{11}-d_{21} & d_{12}-d_{22} \\ (1-L)(d_{21}-d_{31}) & (1-L)(d_{22}-d_{32}) \\ 3 & 6 \end{pmatrix} + (1-L)\tilde{G}(L) \right\} u_t
= \begin{pmatrix} 1-L & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tilde{M}(L) u_t.
$$

Assuming that the matrix

$$
\begin{pmatrix} d_{11}-d_{21} & d_{12}-d_{22} \\ 3 & 6 \end{pmatrix}
$$

is non-singular, the matrix $\tilde{M}(L)$ is zeroless and has therefore a finite-degree left inverse. Proceeding as in Proposition 3, we obtain an alternative Error Correction representation with just one error term, namely $F_{1t} - F_{2t}$.

This example can be generalized to show that generically $F_t$ admits Error Correction representations with a minimum of $d$ and a maximum of $r-q+d$ error terms. In particular, if $d = 0$, in addition to an Error Correction representation, $F_t$ generically has a finite-degree autoregressive representation with no error terms (i.e. a VAR), consistently with the fact that in this case $C(L)$ is generically zeroless. Experiments with simulated and actual data suggest that the best results in estimation of singular VECMs are obtained using $c$ (the maximum number of) error correction terms.

Appendix B.2 The left inverse of M(L) is not necessarily unique

In the proof of Proposition 3 we have used the fact that generically the matrix $M(L)$ has a finite-degree left inverse $N(L)$. We now give some examples in which $N(L)$ is not unique. This is a well known fact, see also Forni and Lippi (2010), Forni et al. (2015). Consider

$$
(1-L)F_t = \begin{pmatrix} 1+aL \\ 1+bL \end{pmatrix} u_t, \tag{B3}
$$

with $r = 2$, $q = 1$, $d = 0$, $c = 1$, with $a \neq b$. In this case $A(L)$ is zeroless. An autoregressive representation can be obtained by elementary manipulations. Rewrite (B3) as

$$
\begin{aligned}
(1-L)F_{1t} &= u_t + a u_{t-1} \\
(1-L)F_{2t} &= u_t + b u_{t-1}
\end{aligned} \tag{B4}
$$

Premultiplying $C(L)u_t$ by the row vector $(b \;\; {-a})$, we get

$$
u_t = \frac{b(1-L)F_{1t} - a(1-L)F_{2t}}{b-a}.
$$

This can be used to get rid of $u_{t-1}$ in (B4) and obtain

$$
\left[ I_2 - \begin{pmatrix} \dfrac{ab}{b-a} & -\dfrac{a^2}{b-a} \\[2mm] \dfrac{b^2}{b-a} & -\dfrac{ab}{b-a} \end{pmatrix} L \right] (1-L)F_t = \begin{pmatrix} 1 \\ 1 \end{pmatrix} u_t, \tag{B5}
$$

which is an autoregressive representation in first differences.

Model (B4), slightly modified, can be used to illustrate non-uniqueness in the left inversion of $M(L)$. Consider

$$
\begin{aligned}
(1-L)F_{1t} &= u_t + a u_{t-1} \\
(1-L)F_{2t} &= u_t + b u_{t-1} \\
(1-L)F_{3t} &= u_t + c u_{t-1}.
\end{aligned} \tag{B6}
$$

Taking any vector $h = (h_1 \; h_2 \; h_3)$, orthogonal to $(a \; b \; c)$, we get rid of $u_{t-1}$ in (B6) and obtain an autoregressive representation in the differences. However, unlike in (B4), here the vectors $h$ span a 2-dimensional space, thus producing an infinite set of autoregressive representations.
In the example just above non-uniqueness can also be seen as the consequence of the fact that the three stochastic variables $F_{j,t-1}$, $j = 1,2,3$, are linearly dependent. Therefore, projecting each of the $F_{jt}$ onto the space spanned by $F_{j,t-1}$, $j = 1,2,3$, one would find a non-invertible covariance matrix, thus a unique projection of course but many representations of it as linear combinations of $F_{j,t-1}$, $j = 1,2,3$.

We do not address this problem systematically in the present paper. However, in the empirical analysis of Barigozzi et al. (2016) we find no hint of singular covariance matrices.
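The non-uniqueness in the three-variable example can be checked numerically. The sketch below is only an illustration under stated assumptions: the values of $(a, b, c)$, the two vectors `h_first` and `h_second`, and the helper names `ar_matrix` and `residual_lags` are hypothetical. For a vector $h$ orthogonal to $(a\; b\; c)$ with $h_1+h_2+h_3 \neq 0$, eliminating $u_{t-1}$ gives a lag-one matrix $A_h = (a\; b\; c)'\,h / (h_1+h_2+h_3)$, and $(I - A_h L)$ is a left inverse of the moving average in the sense that $(I - A_h L)(\iota + (a\; b\; c)' L) = \iota$, with $\iota$ a vector of ones.

```python
import numpy as np


def ar_matrix(c1, h):
    """Lag-one AR matrix A_h = c1 h' / (h' 1) obtained by eliminating u_{t-1} with h."""
    ones = np.ones_like(c1)
    return np.outer(c1, h) / (h @ ones)


def residual_lags(c1, A):
    """Coefficients on L and L^2 of (I - A L)(1 + c1 L); both vanish for a left inverse."""
    ones = np.ones_like(c1)
    return c1 - A @ ones, -A @ c1


c1 = np.array([0.5, -0.3, 0.2])       # hypothetical values of (a, b, c)
h_first = np.array([0.3, 0.5, 0.0])   # orthogonal to (a, b, c)
h_second = np.array([0.0, 0.2, 0.3])  # a second, linearly independent choice

A_first = ar_matrix(c1, h_first)
A_second = ar_matrix(c1, h_second)

lags_first = residual_lags(c1, A_first)
lags_second = residual_lags(c1, A_second)
same_representation = np.allclose(A_first, A_second)
```

Both choices of $h$ remove the moving-average terms, yet the resulting autoregressive matrices differ, which is the non-uniqueness described in the text.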

Appendix B.3 Uniqueness of impulse-response functions

Start with representation (19)

$$
(1-L)F_t = S(L)^{-1}C(L)u_t = U(L)u_t = U_0u_t + U_1u_{t-1} + \cdots = C(0)u_t + U_1u_{t-1} + \cdots \tag{B7}
$$

We assume that $u_t$ is fundamental for $(1-L)F_t$, see Proposition 4. The impulse response function of $F_t$ to $u_t$ is

$$
H_j = U_0 + \cdots + U_j, \quad j = 0,1,\ldots
$$

Now suppose that $F_t$ fulfills the autoregressive equation

$$
B(L)F_t = (I_r + B_1L + \ldots + B_mL^m)F_t = \tilde{m} + \tilde{R}\tilde{u}_t \tag{B8}
$$

where: (i) $\tilde{R}$ is a full-rank $r\times q$ matrix, (ii) $\tilde{u}_t$ is $q$-dimensional white noise, (iii) $\tilde{u}_t$ is orthogonal to $(1-L)F_\tau$, for $\tau \geq 0$. Applying $(1-L)$ to both sides of (B8) we obtain

$$
B(L)U(L)u_t = (1-L)\tilde{R}\tilde{u}_t. \tag{B9}
$$

Assumption (i) and the argument mentioned in footnote 4 imply that $\tilde{u}_t$ belongs to the space spanned by $u_\tau$, for $\tau \geq 0$, call it $H_{u,t}$. Now consider the projection

$$
\tilde{u}_t = G_0u_t + G_1u_{t-1} + \cdots
$$

Multiplying both sides by $u_{t-k}'$ and taking expected values:

$$
\mathrm{E}\,\tilde{u}_tu_{t-k}' = G_k\boldsymbol{\Gamma}_u.
$$

By assumption (iii), $\tilde{u}_t$ is orthogonal to $H_{(1-L)F,t-k}$, for $k > 0$, which is equal to $H_{u,t-k}$, for $k > 0$ (a consequence of the fundamentalness of $u_t$ in (B7)). Thus $G_k = 0$, for $k > 0$ and

$$
\tilde{u}_t = G_0u_t,
$$

where $G_0$ is a non-singular $q\times q$ matrix. Therefore, with no loss of generality, equation (B8) can be rewritten with $Ru_t$ instead of $\tilde{R}\tilde{u}_t$ and (B9) becomes

$$
B(L)U(L)u_t = (1-L)Ru_t.
$$

As $u_t$ is a non-singular $q$-dimensional white noise, this implies

$$
B(L)U(L) = (1-L)R,
$$

so that:

$$
\begin{aligned}
&U_0 = R, \\
&B_1R + U_1 = -R, & U_1 &= -(I_r + B_1)R, \\
&B_2R + B_1U_1 + U_2 = 0, & U_2 &= (B_1 + B_1^2 - B_2)R, \\
&\quad\vdots
\end{aligned}
$$

and therefore

$$
\begin{aligned}
H_0 &= U_0 = R, \\
H_1 &= U_0 + U_1 = -B_1R, \\
H_2 &= U_0 + U_1 + U_2 = (B_1^2 - B_2)R, \\
&\;\;\vdots
\end{aligned}
$$
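The recursion implied by $B(L)U(L) = (1-L)R$ can be verified numerically. The sketch below is a minimal check, assuming randomly drawn matrices $B_1$, $B_2$ and $R$ with $r = 3$, $q = 2$ and $m = 2$; these values and the helper name `ma_coefficients` are hypothetical and serve only to confirm the closed forms for $U_1$, $U_2$, $H_1$ and $H_2$ displayed above.

```python
import numpy as np

rng = np.random.default_rng(0)
r, q = 3, 2
B1 = 0.3 * rng.standard_normal((r, r))  # hypothetical AR coefficient matrices
B2 = 0.2 * rng.standard_normal((r, r))
R = rng.standard_normal((r, q))         # hypothetical full-rank r x q loading

B = [np.eye(r), B1, B2]  # B(L) = I + B_1 L + B_2 L^2
rhs = [R, -R]            # (1 - L) R


def ma_coefficients(B, rhs, n):
    """Solve B(L) U(L) = rhs(L) for U_0, ..., U_{n-1} by matching powers of L."""
    U = []
    for j in range(n):
        total = rhs[j] if j < len(rhs) else np.zeros_like(rhs[0])
        for i in range(1, min(j, len(B) - 1) + 1):
            total = total - B[i] @ U[j - i]
        U.append(total)
    return U


U = ma_coefficients(B, rhs, 3)
H = np.cumsum(U, axis=0)  # H_j = U_0 + ... + U_j
```

Comparing `U[1]`, `U[2]`, `H[1]` and `H[2]` with the closed-form expressions reproduces the displayed identities up to floating-point error.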

On the other hand, the impulse-response function implicit in (B8) is given by the coefficient matrices of $K(L)R$, where

$$
K(L)B(L) = (I_r + K_1L + \cdots)B(L) = I_r.
$$

It is easily seen that $K_jR = H_j$.

Note that we are not making assumptions on $B(1)$ in equation (B8). When $d = 0$, equation (B8) can be the autoregressive model in differences that results from left-inverting $U(L)$ (no error correction term):

$$
\tilde{B}(L)(1-L)F_t = B(L)F_t = C(0)u_t.
$$

Replacing $u_t$ with any other white noise vector $w_t = Qu_t$, as we do when the shocks are identified according to restrictions based on economic theory, produces different impulse-response functions that are however independent of the autoregressive representation of $F_t$.

Appendix C Data Generating Process for the Simulations

The simulation results of Section 3.3 are obtained using the following specification of (36):

$$
A(L)F_t = A^*(L)(1-L)F_t + \alpha\beta'F_{t-1} = C(0)u_t = GHu_t, \tag{C10}
$$

where $r = 4$, $q = 3$, $c = 3$, the degree of $A(L)$ is 2, so that the degree of $A^*(L)$ is 1. $A(L)$ is generated using the factorization

$$
A(L) = U(L)M(L)V(L),
$$

where $U(L)$ and $V(L)$ are $r\times r$ matrix polynomials with all their roots outside the unit circle, and

$$
M(L) = \begin{pmatrix} (1-L)I_{r-c} & 0 \\ 0 & I_c \end{pmatrix}
$$

(see Watson, 1994). To get a VAR(2) we set $U(L) = I_r - U_1L$, and $V(L) = I_r$, and then, by rewriting $M(L) = I_r - M_1L$ and writing $A(L) = I_r - A_1L - A_2L^2$, we get $A_1 = M_1 + U_1$, and $A_2 = -U_1M_1$.

The data are then generated as follows. The diagonal elements of the matrix $U_1$ are drawn from a uniform distribution between 0.5 and 0.8, while the off-diagonal elements are drawn from a uniform distribution between 0 and 0.3. $U_1$ is then standardized to ensure that its largest eigenvalue is 0.6. The matrix $G$ is generated as in Bai and Ng (2007). Let $\tilde{G}$ be an $r\times r$ diagonal matrix of rank $q$ with non-zero entries $\tilde{g}_{ii}$ drawn from the uniform distribution between 0.8 and 1.2, and let $\check{G}$ be a random $r\times r$ orthogonal matrix. Then, $G$ is equal to the first $q$ columns of the matrix $\check{G}\tilde{G}^{1/2}$.
Lastly, the matrix $H$ is such that the upper $3\times 3$ submatrix of $GH$ is lower triangular.

Results are based on 1000 replications. The matrices $U_1$, $G$ and $H$ are simulated only once so that the set of impulse responses to be estimated is the same for all replications, whereas the vector $u_t$ is redrawn from $N(0, I_4)$ at each replication.
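The data generating process described in this appendix can be sketched in a few lines. The following is a minimal illustration, not the authors' code, and it rests on several assumptions that are not in the text: the function name `simulate_factors`, the sample length `T`, zero initial conditions, a $q$-dimensional shock vector, and the choice of an orthogonal $H$ obtained from a QR decomposition (one convenient way to make the upper $q\times q$ block of $GH$ lower triangular). The coefficient $A_2$ is obtained by multiplying out $(I_r - U_1L)(I_r - M_1L)$ in the order of the stated factorization.

```python
import numpy as np


def simulate_factors(T=200, r=4, q=3, c=3, seed=0):
    """Simulate F_t from A(L) F_t = G H u_t with A(L) = (I - U_1 L)(I - M_1 L)."""
    rng = np.random.default_rng(seed)

    # U_1: diagonal ~ U(0.5, 0.8), off-diagonal ~ U(0, 0.3), rescaled so that
    # its largest eigenvalue (in modulus) equals 0.6.
    U1 = rng.uniform(0.0, 0.3, size=(r, r))
    U1[np.diag_indices(r)] = rng.uniform(0.5, 0.8, size=r)
    U1 *= 0.6 / np.max(np.abs(np.linalg.eigvals(U1)))

    # M(L) = I - M_1 L with M_1 = diag(I_{r-c}, 0_c).
    M1 = np.diag(np.r_[np.ones(r - c), np.zeros(c)])

    # (I - U_1 L)(I - M_1 L) = I - (M_1 + U_1) L + U_1 M_1 L^2 = I - A_1 L - A_2 L^2.
    A1, A2 = M1 + U1, -U1 @ M1

    # G: first q columns of G_check @ G_tilde^{1/2}, with G_check random orthogonal.
    g_tilde = np.r_[rng.uniform(0.8, 1.2, size=q), np.zeros(r - q)]
    G_check, _ = np.linalg.qr(rng.standard_normal((r, r)))
    G = (G_check @ np.diag(np.sqrt(g_tilde)))[:, :q]

    # Assumption: H orthogonal. QR of the transposed upper block gives
    # G[:q] = R' Q', hence G[:q] @ Q = R' is lower triangular.
    H, _ = np.linalg.qr(G[:q].T)
    C0 = G @ H

    # Assumption: q-dimensional standard normal shocks, zero initial conditions.
    u = rng.standard_normal((T, q))
    F = np.zeros((T + 2, r))
    for t in range(2, T + 2):
        F[t] = A1 @ F[t - 1] + A2 @ F[t - 2] + C0 @ u[t - 2]
    return F[2:], A1, A2, U1, C0


F, A1, A2, U1, C0 = simulate_factors()
```

In a Monte Carlo exercise along the lines described above, the matrices would be drawn once and only the shocks redrawn across replications; the sketch draws everything from a single seeded generator for brevity.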

Cite this document

APA

Matteo Barigozzi, Marco Lippi, & Matteo Luciani (2016). Dynamic Factor Models, Cointegration, and Error Correction Mechanisms (FEDS 2016-018). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2016-018

BibTeX

@techreport{wtfs_feds_2016_018,
  author = {Matteo Barigozzi and Marco Lippi and Matteo Luciani},
  title = {Dynamic Factor Models, Cointegration, and Error Correction Mechanisms},
  type = {Finance and Economics Discussion Series},
  number = {2016-018},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2016},
  url = {https://whenthefedspeaks.com/doc/feds_2016-018},
  abstract = {The paper studies Non-Stationary Dynamic Factor Models such that: (1) the factors F are I(1) and singular, i.e. F has dimension r and is driven by a q-dimensional white noise, the common shocks, with q < r, and (2) the idiosyncratic components are I(1). We show that F is driven by r-c permanent shocks, where c is the cointegration rank of F, and q-(r-c) < c transitory shocks, thus the same result as in the non-singular case for the permanent shocks but not for the transitory shocks. Our main result is obtained by combining the classic Granger Representation Theorem with recent results by Anderson and Deistler on singular stochastic vectors: if (1-L)F is singular and has rational spectral density then, for generic values of the parameters, F has an autoregressive representation with a finite-degree matrix polynomial fulfilling the restrictions of a Vector Error Correction Mechanism with c error terms. This result is the basis for consistent estimation of Non-Stationary Dynamic Factor Models. The relationship between cointegration of the factors and cointegration of the observable variables is also discussed.},
}