feds · October 23, 2024

Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm

Abstract

We study estimation of large Dynamic Factor models implemented through the Expectation Maximization (EM) algorithm, jointly with the Kalman smoother. We prove that as both the cross-sectional dimension, $n$, and the sample size, $T$, diverge to infinity: (i) the estimated loadings are $\sqrt{T}$-consistent, asymptotically normal and equivalent to their Quasi Maximum Likelihood estimates; (ii) the estimated factors are $\sqrt{n}$-consistent, asymptotically normal and equivalent to their Weighted Least Squares estimates. Moreover, the estimated loadings are asymptotically as efficient as those obtained by Principal Components analysis, while the estimated factors are more efficient if the idiosyncratic covariance is sparse enough.

Finance and Economics Discussion Series, Federal Reserve Board, Washington, D.C. ISSN 1936-2854 (Print), ISSN 2767-3898 (Online).

Matteo Barigozzi and Matteo Luciani. FEDS 2024-086.

Please cite this paper as: Barigozzi, Matteo, and Matteo Luciani (2024). “Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm,” Finance and Economics Discussion Series 2024-086. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2024.086.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Matteo Barigozzi, Università di Bologna, matteo.barigozzi@unibo.it
Matteo Luciani, Federal Reserve Board, matteo.luciani@frb.gov

This version: October 23, 2024. First draft: November 8, 2017.∗

Keywords: Approximate Dynamic Factor Model; Expectation Maximization Algorithm; Kalman Smoother; Quasi Maximum Likelihood.

1 Introduction

Factor analysis can be considered a pioneering technique in unsupervised statistical learning (Ghahramani and Hinton, 1996). It originally gained popularity in the early decades of the twentieth century as a dimension-reduction technique used in psychometrics (Spearman, 1904). Since then, it has become a classical method used for the statistical analysis of complex datasets in many human, natural, and social sciences (see, e.g., Lawley and Maxwell, 1971, Chapter 1, and references therein). In the last thirty years, factor analysis has seen significant success in financial and macroeconometrics because it allows one to analyze and predict economic activity by summarizing large panels of economic time series in a simple and effective way (see, e.g., the survey by Stock and Watson, 2016, and references therein).
∗ A previous version of some results in this paper appeared in the paper “Common factors, trends, and cycles in large datasets”, available at arXiv:1709.01445 or at FEDS.2017.111. M. Barigozzi gratefully acknowledges financial support from MIUR (PRIN 2020, Grant 2020N9YFFE). Disclaimer: the views expressed in this paper are those of the authors and do not necessarily reflect the views and policies of the Board of Governors or the Federal Reserve System. We thank for helpful comments: Majid Al-Sadoon, Leopoldo Catania, Giuseppe Cavaliere, Manfred Deistler, Massimo Franchi, Ivan Petrella, Esther Ruiz, Haihan Tang, Lorenzo Trapani, and Luca Trapin.

An $r$-factor model is defined by

$$x_{it} = \mu_i + \lambda_i' F_t + \xi_{it}, \qquad i=1,\ldots,n, \quad t=1,\ldots,T, \tag{1}$$

where $x_{it}$ is the observation for the $i$th cross-section at time $t$, $\mu_i$ is a constant, and $F_t$ and $\lambda_i$ are $r$-dimensional latent column vectors of factors and factor loadings, with $r \ll n$. We call $\lambda_i' F_t$ the common component and $\xi_{it}$ the idiosyncratic component. Throughout, we consider the standard case in which all $\{x_{it}\}$ are zero-mean weakly stationary processes or are the result of a transformation to stationarity.

Furthermore, in the case of time series, the factors are likely to be autocorrelated. For example, we can assume simple first-order autoregressive dynamics:

$$F_t = A F_{t-1} + v_t, \qquad t=1,\ldots,T, \tag{2}$$

with $v_t$ being an $r$-dimensional vector of innovations. Likewise, the idiosyncratic components might be autocorrelated. The measurement equation (1) and the state equation (2) form a state-space model, or, equivalently, a Dynamic Factor Model (DFM) (this is a restricted version of the more general model by Forni et al., 2000, where factors can also be loaded with lags). Thanks to its simplicity and empirical success, the DFM is the most common approach to factor analysis of high-dimensional time series.

In large dimensional macroeconomic and financial datasets, the idiosyncratic components are likely to be also cross-correlated. Indeed, although macroeconomic or financial market dynamics are the main drivers of the comovement in these datasets, sectoral and local comovements are non-negligible sources of fluctuations. In the case of correlated idiosyncratic components the factor model is called approximate, as opposed to an exact factor model, which has uncorrelated idiosyncratic components.

In an exact factor model a small number of variables is enough to estimate the loadings by Quasi Maximum Likelihood (QML), but we cannot consistently estimate the factors (Lawley and Maxwell, 1971). In an approximate factor model, we can disentangle the common and idiosyncratic components only in the extreme case when $n \to \infty$ (Chamberlain and Rothschild, 1983); in other words, we do not suffer the typical “curse of dimensionality” but rather benefit from the “blessing of dimensionality”. In particular, when $n \to \infty$, we can consistently estimate the factors by some form of linear projection onto the estimated loadings.
However, QML estimation of the loadings is now unfeasible since, in principle, it requires jointly estimating all the $T$ idiosyncratic (auto)covariance matrices, each of size $n\times n$, as well as the $T$ (auto)covariance matrices of the factors. Thus, we need to explore alternative approaches.

There are three main solutions to this problem. The first is Principal Component (PC) analysis, which delivers the optimal non-parametric estimator of a large approximate factor model, see, e.g., Stock and Watson (2002) and Bai (2003). The second is QML estimation based on a mis-specified exact model with no autocorrelations in the idiosyncratic components and the factors, and a diagonal or sparse idiosyncratic covariance matrix, see, e.g., Bai and Li (2016) and Bai and Liao (2016). These two solutions only consider equation (1), thus effectively estimating a static factor model, not a DFM. There is a third solution, which is the focus of this paper, that considers joint estimation of the approximate DFM defined in (1)-(2): the Expectation Maximization (EM) algorithm (Quah and Sargent, 1993; Doz et al., 2012).

The EM algorithm is fully parametric and based on an iteration of two steps: (E-step) given the loadings and all the model’s parameters, the factors and their second moments are estimated via the Kalman smoother and are used to compute the expected log-likelihood conditional on the observed data; (M-step) the expected log-likelihood is maximized to obtain a new estimate of the loadings and all the model’s parameters. These steps are always implemented by considering a mis-specified likelihood with uncorrelated idiosyncratic components, thus making estimation feasible and producing estimators with closed-form expressions. As such, the EM algorithm should be regarded as an approximation of the QML estimation method because it maximizes a mis-specified likelihood of an exact DFM using an iterative procedure. This paper focuses on the theoretical properties of the EM and Kalman smoother estimators.
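To make the model (1)-(2) and the first of the three solutions concrete, the following is a minimal sketch, not the authors' code. It simulates a DFM with VAR(1) factors and applies a PC estimator. The function names `simulate_dfm` and `pc_estimator`, the choice $A = a I_r$, the i.i.d. standard normal idiosyncratic components, and the normalization in which the estimated factors have identity sample covariance are all illustrative assumptions.

```python
import numpy as np

def simulate_dfm(n, T, r, a=0.5, seed=0):
    """Simulate x_it = lambda_i' F_t + xi_it with F_t = A F_{t-1} + v_t.

    Assumptions (illustrative only): A = a * I_r with |a| < 1, Gaussian
    innovations, and i.i.d. N(0, 1) idiosyncratic components (exact model).
    """
    rng = np.random.default_rng(seed)
    A = a * np.eye(r)
    Lam = rng.standard_normal((n, r))          # n x r loadings
    F = np.zeros((T, r))
    f = np.zeros(r)
    for t in range(-100, T):                   # 100 burn-in periods
        f = A @ f + rng.standard_normal(r)     # state equation (2)
        if t >= 0:
            F[t] = f
    X = F @ Lam.T + rng.standard_normal((T, n))  # measurement equation (1)
    return X, F, Lam

def pc_estimator(X, r):
    """PC estimator from the r leading eigenpairs of the sample covariance.

    Normalization (an assumption): loadings V D^{1/2}, factors D^{-1/2} V' x_t,
    so the estimated factors have identity sample covariance.
    """
    Xc = X - X.mean(axis=0)                    # center the data
    T = Xc.shape[0]
    S = Xc.T @ Xc / T                          # n x n sample covariance
    eigval, eigvec = np.linalg.eigh(S)         # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:r]         # indices of the r largest
    V, d = eigvec[:, idx], eigval[idx]
    Lam_hat = V * np.sqrt(d)                   # n x r
    F_hat = Xc @ V / np.sqrt(d)                # T x r
    return Lam_hat, F_hat

X, F, Lam = simulate_dfm(n=60, T=300, r=2)
Lam_hat, F_hat = pc_estimator(X, r=2)
chi_hat = F_hat @ Lam_hat.T                    # estimated common component
```

Loadings and factors are identified only up to an invertible transformation, so a comparison with the truth is more meaningful for the common component `chi_hat` than for `Lam_hat` or `F_hat` separately.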
In a couple of breakthrough studies, Doz et al. (2011, 2012) provided the first fundamental theoretical treatment of these estimators, which quickly became popular in empirical macroeconomic research. Indeed, this approach allows the user to easily deal with data irregularities and missing values and impose restrictions that reflect any prior knowledge about the data on the model.¹

We make three main contributions. First, we prove that, as $n,T \to \infty$, the estimator of the loadings obtained via the EM algorithm converges asymptotically to a unique maximum of the likelihood. It is well known that in the Gaussian quasi-likelihood case, the EM algorithm produces approximate QML estimators (Wu, 1983; Balakrishnan et al., 2017). In this paper, we refine this result by showing that, as $n,T \to \infty$, the approximation error not only depends on the number of iterations but also becomes negligible. Moreover, the EM estimator of the loadings is asymptotically equivalent to the unfeasible Ordinary Least Squares estimator we would obtain if we had observed the factors. This result holds for any reasonable, but not necessarily consistent, pre-estimator of the loadings used to initialize the EM algorithm. We derive similar results for all the estimated parameters, namely, the idiosyncratic variances, the VAR coefficients, and the covariance matrix of the VAR residuals in equation (2).

Second, we prove that, as $n,T \to \infty$, the estimator of the factors obtained via the Kalman smoother computed using the parameters estimated via the EM algorithm is equivalent to the unfeasible Weighted Least Squares estimator we would obtain if we had observed the loadings and idiosyncratic variances. As a by-product of this result, we also show that the Kalman smoother and filter estimators are asymptotically equivalent.

Third, we show that the EM estimator of the loadings is asymptotically equivalent to the PC estimator; hence, it has the same consistency rate ($\min(n, \sqrt{T})$), is asymptotically normal and equally efficient. Likewise, the Kalman smoother has the same consistency rate ($\min(\sqrt{n}, T)$) as the PC estimator, it is asymptotically normal, and if the idiosyncratic covariance matrix is sufficiently sparse, it is more efficient than the PC estimator.

This paper is the first to fully characterize the asymptotic properties of the EM and Kalman smoother estimators.
Other papers provide results that are close to ours, using more restrictive approaches or deriving only partial asymptotic results. Bai and Li (2016) considered QML estimation of the loadings for the static factor model (1) only and did not study the convergence of the employed maximization algorithm. Doz et al. (2011) considered the Kalman smoother obtained using the PC estimator of the loadings but did not derive its asymptotic distribution and obtained a slower consistency rate. Last, Doz et al. (2012) proved consistency of the Kalman smoother obtained from the EM algorithm but derived a slower rate and did not prove its asymptotic normality, nor did they prove consistency of the EM estimator of the loadings.

Our results lay the theoretical foundations for the wide empirical success of the EM algorithm for estimating large dimensional DFMs (see the next section for a list of applications) and answer two long-standing critiques. First, by showing the equivalence of the consistency rates of the EM and PC estimators, we reverse the belief that PC is a superior approach. Second, by providing the asymptotic distributions, we answer the call by Tanner and Wong (1987) and Geweke (1993), who advocated a Bayesian approach based on the Gibbs sampler because the EM algorithm provides only point estimates.

The paper is organized as follows. In Section 2, we briefly review the main applications of the EM algorithm and Kalman smoother (KS) in factor analysis and alternative methods proposed to estimate DFMs. In Section 3, we describe the estimation and give a guide for implementing it. All assumptions are in Section 4. The asymptotic results are in Section 5. In Section 6, we discuss efficiency of the EM estimator and the KS and compare them with the PC estimator. In Section 7, we propose estimators of the asymptotic covariance matrices. In Section 8, we present an extensive Monte Carlo study, and in Section 9, we apply the EM algorithm to US macroeconomic data. Section 10 concludes.
The proofs of all theoretical results are in the Appendix.

¹ PC analysis in presence of missing data and parameter constraints has been studied by, e.g., Bai and Ng (2021), Fan et al. (2022), Xiong and Pelger (2023), among others.

2 Related literature

The EM approach is arguably the most popular for conducting QML estimation of high-dimensional DFMs. This approach dates back to the 1970s when it was introduced in a low-dimensional setting by, e.g., Sargent and Sims (1977), Shumway and Stoffer (1982), Watson and Engle (1983), and Harvey and Peters (1990), while its use in a high-dimensional setting was first suggested by Quah and Sargent (1993) and then formalized by Doz et al. (2012).

The EM approach for high-dimensional DFMs has been extensively employed by empirical macroeconomic researchers, particularly those in central banks. Its most successful applications include (see also the survey by Poncela et al., 2021): (i) counterfactual analysis (Harvey, 1996; Giannone et al., 2006, 2019); (ii) conditional forecasts (Bańbura et al., 2015); (iii) nowcasting (Giannone et al., 2008; Bańbura et al., 2011; Bańbura et al., 2013; Kim and Swanson, 2018; Cascaldi-Garcia et al., 2023); (iv) dealing with irregularly spaced data (Mariano and Murasawa, 2003; Jungbacker et al., 2011; Bańbura and Modugno, 2014; Marcellino and Sivec, 2016); (v) imposing constraints on the loadings to account for smooth cross-sectional dependence in the case of ordered units (Koopman and van der Wel, 2013; Jungbacker et al., 2014) or a block-specific factor structure (Coroneo et al., 2016; Altavilla et al., 2017); (vi) building indicators of economic activity (Reis and Watson, 2010; Barigozzi and Luciani, 2023; Ng and Scanlan, 2024; Ahn and Luciani, 2024); (vii) impulse response analysis (Juvenal and Petrella, 2015; Luciani, 2015); (viii) modeling international stock market dynamics (Linton et al., 2022); (ix) extracting trends from micro-panels (Barigozzi et al., 2024).

In addition to the EM approach, the literature has proposed several multi-step approaches to estimate the DFM in (1)-(2). (a) Bai and Ng (2007) and Forni et al. (2009) employ PC analysis followed by VAR estimation. (b) Doz et al. (2011) employ PC analysis followed by VAR estimation and the Kalman smoother. (c) Ng et al. (2015) consider QML estimation of the loadings based on a matrix decomposition technique that allows the use of the Newton-Raphson method, then followed by the Kalman filter.
(d) Bai and Li (2016) consider QML estimation of the loadings based on the EM algorithm by Rubin and Thayer (1982), followed by Weighted Least Squares to estimate the factors, VAR estimation, and, finally, the Kalman smoother. (e) Jungbacker and Koopman (2015) consider QML estimation of the loadings for a low-dimensional projection of the data based on the prediction error likelihood obtained from the Kalman filter. (f) Lin and Michailidis (2020) propose an alternating minimization algorithm based on a penalized loss accounting for cross-autocorrelation in the idiosyncratic components, followed by Generalized Least Squares to estimate the factors. (g) Kapetanios and Marcellino (2009) consider estimation using sub-space methods. (h) Mosley et al. (2024) propose a modified version of the EM algorithm used in this paper where the M-step allows for sparsity in the loadings.

Approaches (a)-(d) consider estimation of the loadings, either via PC or QML, based only on (1), while estimation of (2) is in a second step. Approaches (e)-(h) consider joint estimation of (1)-(2). However, none of these fully develops an asymptotic theory for the proposed estimators. Moreover, (f) and (h) do not study the convergence of their numerical algorithms.

Last, an alternative to the EM algorithm is represented by Bayesian estimation of large DFMs using Gibbs sampling, see, e.g., Kose et al. (2003), Luciani and Ricci (2014), Bai and Wang (2015), D’Agostino et al. (2016), and Koopman and Mesters (2017), among many others.

To conclude, classical references for QML estimation of an exact factor model with no autocorrelations are, e.g., Anderson and Rubin (1956) and Amemiya et al. (1987), while Tipping and Bishop (1999) suggest simplifying the maximization problem by considering a mis-specified likelihood with homoskedastic idiosyncratic components. Bai and Li (2012) extend classical QML estimation to the high-dimensional case. Alternatively, Stock and Watson (1989) combine QML estimation with the Kalman filter by using the prediction error likelihood.
A review of these methods is in Barigozzi (2024).

Notation

An $m\times m$ identity matrix is denoted as $I_m$. Vectors are always considered as one-column matrices. An $m$-dimensional vector of ones is denoted as $\iota_m$. An $m$-dimensional vector of zeros is denoted as $0_m$, and an $m\times p$ matrix of zeros is denoted as $0_{m\times p}$.

The generic $(i,j)$ entry of a matrix $A$ is denoted as $[A]_{ij}$. Unless otherwise specified, we denote as $\nu^{(k)}(A)$ the $k$-th largest eigenvalue of a generic square matrix $A$. The spectral norm for a real $p\times m$ matrix $A$ is defined by $\|A\| = (\nu^{(1)}(AA'))^{1/2}$. The Frobenius norm is defined by $\|A\|_F = (\mathrm{tr}(AA'))^{1/2}$. For a generic $p$-dimensional vector $v = (v_1 \cdots v_p)'$, we consider the norms $\|v\| = (\sum_{j=1}^p v_j^2)^{1/2}$ and $\|v\|_{\max} = \max_{j=1,\ldots,p} |v_j|$.

The indicator function on the event $A$ is denoted as $\mathbb{I}(A)$, i.e., $\mathbb{I}(A) = 1$ if $A$ is true and 0 otherwise.

All random variables (scalars, vectors and matrices) are assumed to belong to $L_2((\Omega, \mathcal{F}, P))$, where $(\Omega, \mathcal{F}, P)$ is a common probability space. For a generic $p$-dimensional process $\{y_t\}$ we adopt the following definitions. (i) Expectations are computed using the true values of the underlying distribution unless otherwise indicated, so we write $E[y_t] = \int_{\mathbb{R}^p} y \, dF_{y_t}(y, \varphi_n)$, where $F_{y_t}(y, \varphi_n)$ is the cumulative distribution function (cdf) of $y_t$ computed when using as parameters the true ones $\varphi_n$, and we write $E_{\hat\varphi_n}[y_t]$ when using as parameters $\hat\varphi_n$ to compute the cdf. (ii) For any $t \in \mathbb{Z}$, given the $pT$-dimensional vector $Y_t = (y_1' \cdots y_t')'$ we denote conditioning on $Y_t$ as an abbreviation for conditioning on the $\sigma$-algebra generated by $\{y_{t-k}, k \ge 0\}$.

Limits are always taken as $\min(n,T) \to \infty$ unless otherwise specified. We adopt the Landau $O(\cdot)$ and $o(\cdot)$ notation and the “in probability” $O_p(\cdot)$ and $o_p(\cdot)$ analogues. We denote convergence in probability and in distribution by $\stackrel{p}{\to}$ and $\stackrel{d}{\to}$, respectively.

For all quantities having a dimension growing with $n$ and/or $T$, we highlight such dependence. The true scalars, vectors or matrices are denoted as, e.g., $\sigma_i^2$, $\Lambda_n$, $\phi_n$, $\theta$, $\boldsymbol{F}_T$. The corresponding scalars, vectors or matrices containing generic values of parameters are underlined, so: $\underline{\sigma}_i^2$, $\underline{\Lambda}_n$, $\underline{\phi}_n$, $\underline{\theta}$, $\underline{\boldsymbol{F}}_T$.

For a generic process $\{y_t\}$ and any $T \in \mathbb{N}$, letting $\mathcal{F}_{-\infty}^0$ and $\mathcal{F}_T^{\infty}$ be the $\sigma$-algebras generated by $\{y_t, t \le 0\}$ and $\{y_t, t \ge T\}$, respectively, we define the strong mixing coefficients of $\{y_t\}$ as

$$\alpha_y(T) = \sup_{A \in \mathcal{F}_{-\infty}^0,\, B \in \mathcal{F}_T^{\infty}} |P(A)P(B) - P(AB)|.$$

3 Estimation via the EM algorithm

In this section, we present the EM algorithm described by Shumway and Stoffer (1982) and implemented in the codes by Doz et al. (2012), which are available from their or our webpages.² This is the approach typically followed by applied researchers.

² See the replication codes available at: https://sites.uw.edu/dgiannon/domenico-giannone-s-homepage/, or http://www.barigozzi.eu/codes.html, or https://sites.google.com/site/lucianimatteo/matlab-codes

Let us assume that we observe an $n$-dimensional stochastic process $x_{nt} = (x_{1t} \cdots x_{nt})'$ over $T$ periods. The DFM (1)-(2) reads:

$$x_{nt} = \mu_n + \Lambda_n F_t + \xi_{nt}, \tag{3}$$

$$F_t = \sum_{h=1}^{p_F} A_h F_{t-h} + v_t, \tag{4}$$

where $\Lambda_n = (\lambda_1 \cdots \lambda_n)'$ is the $n\times r$ matrix of factor loadings, $v_t$ is an $r$-dimensional vector of factor innovations with covariance $\Gamma^v = E[v_t v_t']$, and $p_F$ is a finite integer such that $p_F \ge 1$. Throughout, we assume that the $r$-dimensional process of factors, $\{F_t\}$, and the $n$-dimensional process of idiosyncratic components, $\{\xi_{nt}\}$, are zero-mean covariance stationary processes, so that $\{x_{nt}\}$ is also covariance stationary and $E[x_{nt}] = \mu_n$, with $\mu_n = (\mu_1 \cdots \mu_n)'$.

Let us introduce the following notation. Define the $nT$-dimensional vectors $X_{nT} = (x_{n1}' \cdots x_{nT}')'$ and $\Xi_{nT} = (\xi_{n1}' \cdots \xi_{nT}')'$, and the $rT$-dimensional vector $\boldsymbol{F}_T = (F_1' \cdots F_T')'$. Moreover, let $\Sigma_n^{\xi}$ be the diagonal matrix containing the $n$ diagonal terms of $E[\xi_{nt}\xi_{nt}']$, which we denote as $\sigma_i^2$, $i=1,\ldots,n$, and let $\Omega_T^F = E[\boldsymbol{F}_T \boldsymbol{F}_T']$.

In principle, to achieve QML, we should estimate (i) $nr$ loadings, (ii) $\simeq n^2T^2$ elements of the covariance matrix of $\Xi_{nT}$, (iii) $\simeq r^2T^2$ elements of the covariance matrix of $\boldsymbol{F}_T$, and (iv) the $n$ constants in $\mu_n$. Clearly the QML estimator of $\mu_n$ is the sample mean $\bar{x}_n = T^{-1}\sum_{t=1}^T x_{nt}$ and we can then work with centered data: $x_{nt} - \bar{x}_n$ (Bai and Li, 2012, p. 440). However, all other parameters depend on each other, so they must be estimated jointly; see, e.g., the first-order conditions in Bai and Li (2012, 2016) when the autocorrelation of the factors is not modeled. This is an unfeasible task since we have only $nT$ data points; therefore, we need some regularization to reduce the number of parameters to estimate.

To this end, first, we adopt the most extreme form of regularization possible for the full covariance matrix of $\Xi_{nT}$ by replacing it with the diagonal matrix $I_T \otimes \Sigma_n^{\xi}$. Second, we assume that the parametric model (4) describes the dynamics of the factors; thus, we write $\Omega_T^F \equiv \Omega_T^F(A, \Gamma^v)$, with $A = (A_1 \cdots A_{p_F})$, in order to highlight its dependence on the VAR parameters. Thanks to these assumptions, we reduce the number of unknown parameters that need to be estimated to $Q_n = nr + n + r^2 p_F + r(r+1)/2$, the elements of the vector $\varphi_n = (\phi_n' \ \theta')'$, where $\phi_n = (\mathrm{vec}(\Lambda_n)' \ \sigma_1^2 \cdots \sigma_n^2)'$ and $\theta = (\mathrm{vec}(A)' \ \mathrm{vech}(\Gamma^v)')'$.

Our starting point is then the following log-likelihood, computed in the generic values of the parameters, $\underline{\phi}_n$ and $\underline{\theta}$:

$$\ell(X_{nT}; \underline{\phi}_n, \underline{\theta}) = -\frac{1}{2} \log\det\Big( \{I_T \otimes \underline{\Lambda}_n\}\, \Omega_T^F(\underline{A}, \underline{\Gamma}^v)\, \{I_T \otimes \underline{\Lambda}_n'\} + \{I_T \otimes \underline{\Sigma}_n^{\xi}\} \Big) - \frac{1}{2} \Big[ (X_{nT} - \iota_T \otimes \bar{x}_n)' \Big( \{I_T \otimes \underline{\Lambda}_n\}\, \Omega_T^F(\underline{A}, \underline{\Gamma}^v)\, \{I_T \otimes \underline{\Lambda}_n\}' + \{I_T \otimes \underline{\Sigma}_n^{\xi}\} \Big)^{-1} (X_{nT} - \iota_T \otimes \bar{x}_n) \Big], \tag{5}$$

where we removed the constant terms to simplify the notation. Since the assumptions in Section 4 allow for correlations among idiosyncratic components, the expression in (5) is a mis-specified or quasi log-likelihood. Such mis-specification is appealing because it coincides with the classical factor analysis under the exact factor structure, and is standard in DFM estimation (Doz et al., 2012; Bai and Li, 2016). Thus, the maximization of (5) is QML estimation rather than ML estimation.
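As a quick illustration of how much the two regularizations shrink the problem, the count $Q_n = nr + n + r^2 p_F + r(r+1)/2$ can be compared with the $nT$ available observations. The helper name `n_params` is hypothetical and the panel dimensions below are arbitrary example values, not taken from the paper.

```python
def n_params(n, r, p_F):
    """Q_n = n*r loadings + n idiosyncratic variances
    + r^2 * p_F VAR coefficients + r(r+1)/2 free elements of Gamma^v."""
    return n * r + n + r * r * p_F + r * (r + 1) // 2

# Example dimensions (illustrative only).
n, T, r, p_F = 100, 200, 2, 1
Q_n = n_params(n, r, p_F)     # number of parameters after regularization
n_obs = n * T                 # number of data points
```

With these example values `Q_n` is far smaller than `n_obs`, whereas the unrestricted problem would involve on the order of $n^2T^2$ covariance terms.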
As long as the idiosyncratic components are weakly correlated, the mis-specification introduced by maximizing (5) has no effect on consistency but only on efficiency of the estimators (see Section 6). Hereafter, we denote the vector of QML estimators, which are the maximizers of (5), as $\hat\varphi_n^{*} = (\hat\phi_n^{*\prime} \ \hat\theta^{*\prime})'$.

Despite reducing the number of parameters to be estimated by introducing mis-specifications, direct maximization of (5) is still unfeasible because $\Omega_T^F$ is a full matrix, and to estimate its entries, we need to estimate the factors as well. Therefore, we still face a curse of dimensionality problem because we need to jointly estimate the $rT$ values of the factors and the $Q_n$ parameters using just the $nT$ observations of $X_{nT}$.

There are three solutions to this problem. The first consists of rewriting the log-likelihood (5) using its prediction error formulation, where the prediction errors and their covariance are obtained via the Kalman filter (see, e.g., Harvey, 1990, Chapter 3.4 or Durbin and Koopman, 2012, Chapter 7). This is the standard practice in low-dimensional DFMs (Stock and Watson, 1989). However, since there is no closed-form solution for the QML estimator of the parameters obtained in this way, numerical maximization is required and this approach quickly becomes unfeasible even for moderate values of $n$. In high dimensions, Jungbacker and Koopman (2015) propose to follow this approach subject to a preliminary step in which the data are projected onto a lower dimensional space but do not derive the theoretical properties of this estimator. In particular, it is not clear what the effects of this first step on the asymptotic properties of the final estimator are.

The second solution, proposed by Bai and Li (2016), further mis-specifies the model by treating the factors as if they were serially uncorrelated.
Thus, by imposing the standard identifying constraint $E[F_t F_t'] = I_r$ and by replacing in (5) the full matrix $\Omega_T^F(A, \Gamma^v)$ with just $I_T \otimes I_r$, the log-likelihood is considerably simplified, and its maximization becomes feasible. However, because no closed-form solution exists, we still need a numerical approach to estimate the model, e.g., the iterative algorithm proposed by Rubin and Thayer (1982) and reintroduced by Bai and Li (2012). To our knowledge, the convergence of the Rubin and Thayer (1982) algorithm to the QML estimator has never been formally proved.

A third solution consists of computing an approximation of the QML estimator using the EM algorithm, as formalized by Doz et al. (2012). We consider this approach in this paper, and we refer to Table 1 for its implementation.

Table 1: Expectation Maximization algorithm

Input:
  $n$-dimensional vector of data $\{x_{nt}\}_{t=1}^T$
  number of common factors $r$
  VAR order for the factors $p_F$
  maximum number of iterations $k_{\max}$
  threshold for convergence $\varepsilon$
  initial estimators $\{\hat\lambda_i^{(0)}\}_{i=1}^n$, $\{\hat\sigma_i^{2(0)}\}_{i=1}^n$, $\{\hat A_j^{(0)}\}_{j=1}^{p_F}$, $\hat\Gamma^{v(0)}$, see Appendix A.1
Output:
  estimated loadings $\{\hat\lambda_i\}_{i=1}^n$
  estimated factors $\{\hat F_t\}_{t=1}^T$
for $k=0$ to $k=k_{\max}$ do
  {E-STEP}
  run Kalman Smoother with $\{x_{nt}\}_{t=1}^T$, $\{\hat\lambda_i^{(k)}\}_{i=1}^n$, $\{\hat\sigma_i^{2(k)}\}_{i=1}^n$, $\{\hat A_j^{(k)}\}_{j=1}^{p_F}$, $\hat\Gamma^{v(k)}$
    → $\{F_{t|T}^{(k)}\}_{t=1}^T$, $\{P_{t|T}^{(k)}\}_{t=1}^T$, $\{\{C_{t,t-j|T}^{(k)}\}_{t=j+1}^T\}_{j=1}^{p_F}$, see Appendix A.2
  compute expected log-likelihood and sufficient statistics
    → $Q\big(\{\underline\lambda_i\}_{i=1}^n, \{\underline\sigma_i^2\}_{i=1}^n, \{\underline A_j\}_{j=1}^{p_F}, \underline\Gamma^v; \{\hat\lambda_i^{(k)}\}_{i=1}^n, \{\hat\sigma_i^{2(k)}\}_{i=1}^n, \{\hat A_j^{(k)}\}_{j=1}^{p_F}, \hat\Gamma^{v(k)}\big)$,
    → $\big\{\sum_{t=1}^T F_{t|T}^{(k)} x_{it}\big\}_{i=1}^n$, $\sum_{t=1}^T \big(F_{t|T}^{(k)} F_{t|T}^{(k)\prime} + P_{t|T}^{(k)}\big)$, $\big\{\sum_{t=j+1}^T \big(F_{t|T}^{(k)} F_{t-j|T}^{(k)\prime} + C_{t,t-j|T}^{(k)}\big)\big\}_{j=1}^{p_F}$, see (8), (9), (10), (11), (12)
  {M-STEP}
  maximize expected log-likelihood
    → $\{\hat\lambda_i^{(k+1)}\}_{i=1}^n$, $\{\hat\sigma_i^{2(k+1)}\}_{i=1}^n$, $\{\hat A_j^{(k+1)}\}_{j=1}^{p_F}$, $\hat\Gamma^{v(k+1)}$, see (13), (14), (15), (16)
  {CONVERGENCE}
  if $k < k_{\max}$ AND $\Delta\ell_k < \varepsilon$, see (A.14), then
    $k^* \leftarrow k$
    run Kalman Smoother with $\{x_{nt}\}_{t=1}^T$, $\{\hat\lambda_i^{(k^*+1)}\}_{i=1}^n$, $\{\hat\sigma_i^{2(k^*+1)}\}_{i=1}^n$, $\{\hat A_j^{(k^*+1)}\}_{j=1}^{p_F}$, $\hat\Gamma^{v(k^*+1)}$
      → $\{F_{t|T}^{(k^*+1)}\}_{t=1}^T$, see Appendix A.2
    $\hat\lambda_i \leftarrow \hat\lambda_i^{(k^*+1)}$, for all $i=1,\ldots,n$
    $\hat F_t \leftarrow F_{t|T}^{(k^*+1)}$, for all $t=1,\ldots,T$
    return $\{\hat\lambda_i\}_{i=1}^n$, $\{\hat F_t\}_{t=1}^T$
    break
  end if
  if $k = k_{\max}$ then
    print “algorithm did not converge”
    break
  end if
  $k \leftarrow k+1$
end for

The EM algorithm is an iterative procedure which allows for QML estimation in presence of missing data (Dempster et al., 1977). In a nutshell, consider a given iteration $k \ge 0$ and assume that we have an estimate of the parameters $\hat\varphi_n^{(k)}$.
By taking expectations of the log-likelihood (5) with respect to the conditional distribution of $\boldsymbol{F}_T$ given $X_{nT}$ and computed using $\hat\varphi_n^{(k)}$, we get:

$$\ell(X_{nT}; \underline\varphi_n) = E_{\hat\varphi_n^{(k)}}[\ell(X_{nT}, \boldsymbol{F}_T; \underline\varphi_n) \,|\, X_{nT}] - E_{\hat\varphi_n^{(k)}}[\ell(\boldsymbol{F}_T | X_{nT}; \underline\varphi_n) \,|\, X_{nT}] = Q(\underline\varphi_n, \hat\varphi_n^{(k)}) - H(\underline\varphi_n, \hat\varphi_n^{(k)}), \text{ say}. \tag{6}$$

A maximum of the log-likelihood is then a maximum of the right-hand side of (6). Now, by definition of Kullback-Leibler divergence, for any $k \ge 0$ it holds that

$$H(\hat\varphi_n^{(k+1)}; \hat\varphi_n^{(k)}) \le H(\hat\varphi_n^{(k)}; \hat\varphi_n^{(k)}), \tag{7}$$

i.e., $H(\underline\varphi_n, \hat\varphi_n^{(k)})$ is maximum at $\underline\varphi_n = \hat\varphi_n^{(k)}$. It is then enough to look for the maximum only of the expected full-information log-likelihood $Q(\underline\varphi_n, \hat\varphi_n^{(k)})$. This is accomplished in two steps: in the first step, for given estimated parameters $\hat\varphi_n^{(k)}$, we compute $Q(\underline\varphi_n, \hat\varphi_n^{(k)})$ using an estimate of the factors with their associated MSE; in the second step, for a given estimate of the factors, we maximize such log-likelihood to compute a new estimate of the parameters $\hat\varphi_n^{(k+1)}$. As shown below, this approach solves the curse of dimensionality problem, and its computational burden is minimal because all estimates have an explicit expression. Below we detail the main features of the two steps.

3.1 E-step and Kalman smoother

We obtain an initial estimate of the loadings and the idiosyncratic variances using the PC estimator, and of the VAR parameters by fitting a VAR on the PC estimator of the factors (see Appendix A.1). Then, for any iteration $k \ge 0$, in the E-step, we compute the expected full-information log-likelihood, which we can decompose as:

$$Q(\underline\varphi_n, \hat\varphi_n^{(k)}) = E_{\hat\varphi_n^{(k)}}[\ell(X_{nT} | \boldsymbol{F}_T; \underline\phi_n) \,|\, X_{nT}] + E_{\hat\varphi_n^{(k)}}[\ell(\boldsymbol{F}_T; \underline\theta) \,|\, X_{nT}]. \tag{8}$$

Consistently with the mis-specified log-likelihood (5), the first term on the right-hand side of (8) is:

$$\ell(X_{nT} | \boldsymbol{F}_T; \underline\phi_n) = -\frac{T}{2}\log\det(\underline\Sigma_n^{\xi}) - \frac{1}{2}\sum_{t=1}^T (x_{nt} - \bar{x}_n - \underline\Lambda_n F_t)'(\underline\Sigma_n^{\xi})^{-1}(x_{nt} - \bar{x}_n - \underline\Lambda_n F_t) = \sum_{i=1}^n \left\{ -\frac{T}{2}\log(\underline\sigma_i^2) - \frac{1}{2}\sum_{t=1}^T \frac{(x_{it} - \bar{x}_i - \underline\lambda_i' F_t)^2}{\underline\sigma_i^2} \right\}, \tag{9}$$

which depends only on $\underline\phi_n$ and is well defined as long as all the idiosyncratic components have finite positive variances. As for the second term on the right-hand side of (8), we assume that $F_t = 0_r$ for $t \le 0$ and consider $\ell(\boldsymbol{F}_T; \underline\theta) = \sum_{t=1}^T \ell(F_t | F_{t-1}, \ldots, F_{t-p_F}; \underline\theta)$, which depends only on $\underline\theta$. Then we have

$$\ell(\boldsymbol{F}_T; \underline\theta) = -\frac{T}{2}\log\det(\underline\Gamma^v) - \frac{1}{2}\sum_{t=1}^T \left(F_t - \sum_{h=1}^{p_F} \underline A_h F_{t-h}\right)' (\underline\Gamma^v)^{-1} \left(F_t - \sum_{h=1}^{p_F} \underline A_h F_{t-h}\right), \tag{10}$$

which is well defined provided that the VAR innovations $\{v_t\}$ have a finite full-rank covariance matrix.

Given (9) and (10), in order to compute the expected log-likelihood (8) we need to compute the sufficient statistics:

$$E_{\hat\varphi_n^{(k)}}[F_t | X_{nT}], \qquad E_{\hat\varphi_n^{(k)}}[F_t F_t' | X_{nT}], \qquad E_{\hat\varphi_n^{(k)}}[F_t F_{t-h}' | X_{nT}], \quad h=1,\ldots,p_F. \tag{11}$$

Although exact expressions for the quantities in (11) might be hard to compute, we can approximate them by using the output of the Kalman smoother, which gives the linear projection $F_{t|T}^{(k)} = \mathrm{Proj}_{\hat\varphi_n^{(k)}}[F_t | X_{nT}]$ and the associated conditional covariance and lag-$h$ autocovariance matrices $P_{t|T}^{(k)}$ and $C_{t,t-h|T}^{(k)}$, respectively (see Appendix A.2 for the explicit expressions). This approximation does not affect consistency of our estimators (see Proposition 1). Hence, in the EM algorithm we let:

$$E_{\hat\varphi_n^{(k)}}[F_t | X_{nT}] = F_{t|T}^{(k)}, \qquad E_{\hat\varphi_n^{(k)}}[F_t F_t' | X_{nT}] = F_{t|T}^{(k)} F_{t|T}^{(k)\prime} + P_{t|T}^{(k)},$$
$$E_{\hat\varphi_n^{(k)}}[F_t F_{t-h}' | X_{nT}] = F_{t|T}^{(k)} F_{t-h|T}^{(k)\prime} + C_{t,t-h|T}^{(k)}, \quad h=1,\ldots,p_F. \tag{12}$$

Summing up, in the E-step, we compute $rT$ values of the factors for a given value, $\hat\varphi_n^{(k)}$, of the parameters.
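The smoothed moments in (12) can be illustrated with a short sketch for the case $p_F = 1$. This is not the recursion of Appendix A.2 (which is not reproduced here) but a standard Kalman filter followed by a Rauch-Tung-Striebel backward pass. The function name `kalman_smoother`, the initial prior $F_1 \sim (0, P_0)$ with $P_0 = I_r$ by default, and the information-form update, which exploits the diagonal idiosyncratic covariance so that only $r \times r$ matrices are inverted, are all assumptions of this illustration.

```python
import numpy as np

def kalman_smoother(X, Lam, sig2, A, Q, P0=None):
    """Smoothed factor moments for x_t = Lam F_t + xi_t, F_t = A F_{t-1} + v_t.

    X: T x n centered data; Lam: n x r; sig2: (n,) idiosyncratic variances;
    A, Q: r x r VAR(1) matrix and innovation covariance.
    Returns F_s (T x r), P_s (T x r x r) and C_s (T x r x r), where
    C_s[t] = Cov(F_t, F_{t-1} | X) and C_s[0] is left at zero.
    """
    T, n = X.shape
    r = Lam.shape[1]
    P0 = np.eye(r) if P0 is None else P0       # prior covariance (assumption)
    LtRi = Lam.T / sig2                        # Lam' Sigma^{-1}, r x n
    LtRiL = LtRi @ Lam                         # Lam' Sigma^{-1} Lam, r x r
    F_p = np.zeros((T, r)); P_p = np.zeros((T, r, r))   # predicted moments
    F_f = np.zeros((T, r)); P_f = np.zeros((T, r, r))   # filtered moments
    for t in range(T):
        if t == 0:
            F_p[t], P_p[t] = np.zeros(r), P0
        else:
            F_p[t] = A @ F_f[t - 1]
            P_p[t] = A @ P_f[t - 1] @ A.T + Q
        # Information-form update: only r x r inverses are needed.
        P_f[t] = np.linalg.inv(np.linalg.inv(P_p[t]) + LtRiL)
        F_f[t] = F_p[t] + P_f[t] @ (LtRi @ (X[t] - Lam @ F_p[t]))
    F_s, P_s = F_f.copy(), P_f.copy()
    C_s = np.zeros((T, r, r))
    for t in range(T - 2, -1, -1):             # backward (RTS) recursion
        J = P_f[t] @ A.T @ np.linalg.inv(P_p[t + 1])
        F_s[t] = F_f[t] + J @ (F_s[t + 1] - F_p[t + 1])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T
        C_s[t + 1] = P_s[t + 1] @ J.T          # lag-one smoothed covariance
    return F_s, P_s, C_s
```

The second moments in (12) then follow as `np.outer(F_s[t], F_s[t]) + P_s[t]` and `np.outer(F_s[t], F_s[t-1]) + C_s[t]`.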

In general, this step is feasible because we have T(n−r) ≫ 0 degrees of freedom. Moreover, since the Kalman filter and smoother are linear procedures entailing the inversion of a positive definite r×r matrix, we just have to compute ≃T recursions. Remark 1. The Kalman smoother requires first running the forward iterations of the Kalman filter, which in turn requires either inverting the n×n full covariance matrix of the data or inverting the full n×n idiosyncratic covariance. This task might be challenging, if not impossible, in a high-dimensional setting. To overcome this problem, we implement the Kalman filter using an estimator of the diagonal matrix Σξ instead of the full n idiosyncraticcovariancematrix,ensuringalsopositivedefinitenessofthedatacovariancematrix,andthusmaking also its inversion feasible. This simplification is consistent with the mis-specified log-likelihood (9) because, to guarantee that (7) holds, we must take expectations with respect to the same distribution as the one used to compute the log-likelihood. 3.2 M-step In the M-step, we have to maximize (8) with respect to φ to obtain a new estimate of the parameters φ(k+1). n (cid:98)n This maximization has a closed form solution for all elements of φ(k+1). Specifically, at a given iteration k ≥0, (cid:98)n by using (12), we obtain the loadings estimators as (cid:40) T (cid:41)−1(cid:32) T (cid:33) λ(cid:98) ( i k+1) = (cid:88)(cid:16) F( t| k T )F( t| k T )′+P( t| k T ) (cid:17) (cid:88) F( t| k T )(x it −x¯ i ) , for i=1,...,n (13) t=1 t=1 and Λ(cid:98) ( n k+1) =(λ(cid:98) ( 1 k+1)···λ(cid:98) ( n k+1))′. Similarly, the estimator of the idiosyncratic variances is: T σ (cid:98)i 2(k+1) = T 1 (cid:88)(cid:110) x2 it +λ(cid:98) i (k+1)′ (cid:16) F( t| k T )F( t| k T )′+P( t| k T ) (cid:17) λ(cid:98) ( i k+1)−2x it F t ( | k T )′λ(cid:98) ( i k+1) (cid:111) for i=1,...,n. 
(14) t=1 Because we consider a mis-specified log-likelihood, we do not estimate the out-of-diagonal terms of the idiosyncratic covariance matrix Γξ, and we act as if those terms are equal to zero. n For simplicity, let p =1 and denote A≡A . The estimator of A is then given by: F 1 (cid:40) T (cid:41)(cid:40) T (cid:41)−1 A(cid:98) (k+1) = (cid:88)(cid:16) F(k)F(k)′ +C(k) (cid:17) (cid:88)(cid:16) F(k) F(k)′ +P(k) (cid:17) . (15) t|T t−1|T t,t−1|T t−1|T t−1|T t−1|T t=2 t=2 For p >1 we can simply write the VAR in companion form and derive the analogous of (15). F Finally, the estimator of the covariance matrix of the VAR innovations {v } is: t T Γ(cid:98) v(k+1) = 1 (cid:88)(cid:110) F(k)F(k)′+P(k) −A(cid:98) (k+1) (cid:16) F(k) F(k)′ +P(k) (cid:17) A(cid:98) (k+1)′ (16) T t|T t|T t|T t−1|T t−1|T t−1|T t=2 (cid:16) (cid:17) (cid:16) (cid:17)(cid:111) − F(k)F(k)′ +C(k) A(cid:98) (k+1)′−A(cid:98) (k+1) F(k) F(k)′+C(k)′ . t|T t−1|T t,t−1|T t−1|T t|T t,t−1|T Summing up, in the M-step, we need to compute Q values of the parameters for a given estimator of the n factors, F(k), t = 1,...,T. This step is feasible because we used the mis-specified log-likelihood (9) to estimate t|T ϕ . Therefore, wedecomposedourestimationproblemintonseparatemaximizations, eachrequiringestimating n r+1parametersusingT observations. Inthisway,estimatingthehigh-dimensionalparametervectorϕ becomes n straightforward, and estimating θ poses no problem because it is a low-dimensional problem with a closed-form solution. 9
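The closed-form updates of the M-step can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `m_step` and the array layout are hypothetical, the data are assumed to be already centered, the VAR order is $p_F=1$, and the innovation covariance is computed as the smoothed second moment of the VAR residual $F_t - A F_{t-1}$, which is what maximizing (10) over $\Gamma^v$ yields.

```python
import numpy as np

def m_step(X, F, P, C):
    """One M-step given Kalman-smoother output (hypothetical helper).

    X : (T, n) data, assumed already centered (x_it - xbar_i).
    F : (T, r) smoothed factors F_{t|T}.
    P : (T, r, r) smoothed covariances P_{t|T}.
    C : (T, r, r) lag-one smoothed covariances; C[t] plays the role of
        C_{t,t-1|T}, and C[0] is ignored.
    """
    T = X.shape[0]
    # Smoothed second moments, as in (12).
    EFF = F[:, :, None] * F[:, None, :] + P           # E[F_t F_t' | X]
    EFF1 = F[1:, :, None] * F[:-1, None, :] + C[1:]   # E[F_t F_{t-1}' | X]

    # Loadings, eq. (13): one r-dimensional regression per series.
    S_ff = EFF.sum(axis=0)                            # (r, r)
    S_fx = F.T @ X                                    # (r, n)
    Lam = np.linalg.solve(S_ff, S_fx).T               # (n, r)

    # Idiosyncratic variances, eq. (14): diagonal terms only.
    quad = np.einsum("ij,jk,ik->i", Lam, S_ff, Lam)
    cross = np.sum(Lam * S_fx.T, axis=1)
    sigma2 = (np.sum(X**2, axis=0) + quad - 2.0 * cross) / T

    # VAR(1) matrix, eq. (15): A = S_10 S_00^{-1}, sums over t = 2..T.
    S_10 = EFF1.sum(axis=0)
    S_00 = EFF[:-1].sum(axis=0)
    S_11 = EFF[1:].sum(axis=0)
    A = np.linalg.solve(S_00.T, S_10.T).T

    # Innovation covariance: smoothed second moment of F_t - A F_{t-1}.
    Gv = (S_11 - S_10 @ A.T - A @ S_10.T + A @ S_00 @ A.T) / T
    return Lam, sigma2, A, Gv
```

When `P` and `C` are set to zero, the updates collapse to ordinary least squares of the data on the factors and of the factors on their own lag, which gives a simple sanity check. Note that the loadings and variances are computed series by series, which mirrors the decomposition into $n$ separate low-dimensional problems described above.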

3.3 Convergence of the EM algorithm and final estimators

Given the Gaussian quasi-likelihoods (9) and (10), it is easy to show that, for any fixed $n$, there exists an $\omega>0$ such that for any $k\ge 0$

$$\Big\{\ell(X_{nT};\widehat{\varphi}_n^{(k+1)})-\ell(X_{nT};\widehat{\varphi}_n^{(k)})\Big\} \ge \Big\{Q(\widehat{\varphi}_n^{(k+1)},\widehat{\varphi}_n^{(k)})-Q(\widehat{\varphi}_n^{(k)},\widehat{\varphi}_n^{(k)})\Big\} \ge \omega\,\|\widehat{\varphi}_n^{(k+1)}-\widehat{\varphi}_n^{(k)}\|^2, \tag{17}$$

where the first inequality follows from Dempster et al. (1977, Lemma 1) and the second is due to strong concavity of $Q(\cdot,\varphi_n)$ for any $\varphi_n$ (Wu, 1983, Condition 1). Moreover, the left-hand side of (17) tends to zero as $k\to\infty$ (Wu, 1983, Theorem 3). Therefore, the EM algorithm defines a contractive map. Consequently, the sequence $\{\widehat{\varphi}_n^{(k)}\}$ will converge to a maximum of the log-likelihood, as $k\to\infty$ (see Lemma E.21 for a formal proof when $n\to\infty$).

In practice, we stop the EM algorithm when the log-likelihood shows no further appreciable increase. This is ensured according to a standard convergence rule (see Appendix A.3), depending on a pre-specified threshold $\varepsilon$.

We denote the last iteration of the EM algorithm as $k^*$. Upon convergence, the EM estimator of the parameters is $\widehat{\varphi}_n\equiv\widehat{\varphi}_n^{(k^*+1)}$. By running the Kalman smoother one last time using $\widehat{\varphi}_n$, we have the estimator of the factors $\widehat{F}_t = F_{t|T}^{(k^*+1)}$, $t=1,\ldots,T$. Finally, we estimate the common components as $\widehat{\chi}_{it}=\widehat{\lambda}_i'\widehat{F}_t$, where $\widehat{\lambda}_i\equiv\widehat{\lambda}_i^{(k^*+1)}$, $i=1,\ldots,n$.

4 The Large Approximate Dynamic Factor Model

4.1 Main assumptions

This section presents the assumptions under which we can consistently estimate the DFM given in (3)-(4). These assumptions are stated for an infinite-dimensional stochastic process $\{x_{it},\ i\in\mathbb{N},\ t\in\mathbb{Z}\}$ of which $\{x_{nt}=(x_{1t}\cdots x_{nt})',\ t\in\mathbb{Z}\}$ is an $n$-dimensional subprocess and the $T\times n$ matrix $(x_{n1}\cdots x_{nT})'$ is an observed realization. Likewise, $\{\xi_{nt}=(\xi_{1t}\cdots\xi_{nt})',\ t\in\mathbb{Z}\}$ is an $n$-dimensional sub-process of the infinite-dimensional stochastic process of idiosyncratic components $\{\xi_{it},\ i\in\mathbb{N},\ t\in\mathbb{Z}\}$, and $\{F_t=(F_{1t}\cdots F_{rt})',\ t\in\mathbb{Z}\}$ and $\{v_t=(v_{1t}\cdots v_{rt})',\ t\in\mathbb{Z}\}$ are the $r$-dimensional processes of common factors and the corresponding VAR innovations, respectively. The $n\times r$ matrix of factor loadings $\Lambda_n=(\lambda_1\cdots\lambda_n)'$ forms a nested sequence as $n$ increases. Finally, as already noticed, the QML estimator of $\mu_n$ is immediately obtained as the sample mean $\bar{x}_n$. Thus, hereafter, for simplicity, we consider the DFM for pre-centered data, or, equivalently, we set $\mu_n=0_n$.

Assumption 1 (loadings and factors).

(a) There exists an integer $N_0$ such that for all $n>N_0$, $\|n^{-1}\Lambda_n'\Lambda_n-\Sigma_\Lambda\|=0$, where $\Sigma_\Lambda$ is $r\times r$ and positive definite; moreover, for all $n\in\mathbb{N}$, $m_\lambda\le\max_{i=1,\ldots,n}\|\lambda_i\|\le M_\lambda$ for some finite positive reals $M_\lambda$ and $m_\lambda$ independent of $n$.

(b) For all $t\in\mathbb{Z}$, $\Gamma^F=\mathrm{E}[F_tF_t']$ is $r\times r$ and positive definite, and $\|\Gamma^F\|\le M_F$ for some finite positive real $M_F$.

(c) There exists an integer $N_1$ such that for all $n>N_1$, $r$ is a finite positive integer, independent of $n$, and such that $r\le N_1$.

(d) $A(z)=\sum_{k=1}^{p_F}A_kz^{k-1}$, such that $p_F$ is a finite positive integer, $A_k$ are $r\times r$, and $\det(I_r-A(z))\ne 0$ for all $z\in\mathbb{C}$ such that $|z|\le M_A$ for some finite positive real $M_A<1$.

(e) For all $t\in\mathbb{Z}$, $\mathrm{E}[v_t]=0_r$, $\Gamma^v=\mathrm{E}[v_tv_t']$ is $r\times r$ positive definite, and $\|\Gamma^v\|\le M_v$ for some finite positive real $M_v$.

(f) For all $t\in\mathbb{Z}$ and all $k\in\mathbb{Z}$ with $k\ne 0$, $v_t$ and $v_{t-k}$ are independent.

(g) For all $j_1,j_2,j_3,j_4=1,\ldots,r$, all $t=1,\ldots,T$, and all $T\in\mathbb{N}$,

$$\frac{1}{T}\sum_{s_1,s_2=1}^{T}\big|\mathrm{E}[v_{j_1s_1}v_{j_2t}v_{j_3s_2}v_{j_4t}]\big|\le K_v,\qquad \frac{1}{T}\sum_{s_1,s_2=1}^{T}\big|\mathrm{E}[v_{j_1s_1}v_{j_2t}]\big|\,\big|\mathrm{E}[v_{j_3s_2}v_{j_4t}]\big|\le K_v,$$

for some finite positive real $K_v$ independent of $j_1,j_2,j_3,j_4$, and $T$.

(h) For all $t\in\mathbb{Z}$, $v_t$ has pdf $f_{v_t}(u)$ such that $\int_{\mathbb{R}^r}|f_{v_t}(u+v)-f_{v_t}(v)|\,\mathrm{d}u\le C_f\|v\|$ for any $v\in\mathbb{R}^r$ and for some finite positive real $C_f$ independent of $t$.

(i) For all $t\le 0$, $v_t=0_r$.

Parts (a) and (b) imply that the loadings matrix has asymptotically maximum column rank $r$ (part (a)), and the factors have a finite full-rank covariance matrix (part (b)). These assumptions are similar to the requirements in Bai and Li (2016, Assumptions A and B) and Bai (2003, Assumptions A and B). Moreover, because of part (a), for any given $n\in\mathbb{N}$, all the factors have a finite contribution to each series (upper bound on $\max_{i=1,\ldots,n}\|\lambda_i\|$), and there is at least one factor that contributes to at least one series (lower bound on $\max_{i=1,\ldots,n}\|\lambda_i\|$). While the former condition is common, the latter is less standard but very mild as it simply guarantees that at least one loading is non-zero for any fixed $n\in\mathbb{N}$.

Part (c) implies the existence of a finite number of common factors $r$. In particular, $r$ is identified only for $n\to\infty$ (see also the next section). Hereafter, in parts (a) and (c), we can assume $N_0=N_1=N$, say, without loss of generality.

The remaining conditions of Assumption 1 characterize the VAR for the factors in (4). Part (d) implies that $\{F_t\}$ is a weakly stationary process with a causal autoregressive representation. In parts (e), (f), and (g), we assume that $\{v_t\}$ is a zero-mean $r$-dimensional independent process with finite positive definite covariance matrix and finite summable fourth-order cumulants. Parts (d) and (e) imply also that $\Gamma^F$ is finite, as required in part (b).

Part (h) is an integral Lipschitz condition, which is satisfied by most continuous densities.
This assumption guarantees that $\{F_t\}$ is a strong mixing, or equivalently $\alpha$-mixing, process with mixing coefficients $\alpha_F(T)\le\exp\{-c_FT^{\gamma_F}\}$, for all $T\in\mathbb{N}$ and some finite positive reals $c_F$ and $\gamma_F$ independent of $T$ (Pham and Tran, 1985, Theorem 3.1).$^{3}$ Strongly mixing factors with exponentially decaying mixing coefficients are directly assumed by Fan et al. (2013, Assumption 2c).

Part (i) implies $F_t=0_r$ for $t\le 0$. This assumption is standard and it fixes the initial conditions for the solution of the VAR in (4).

$^{3}$ Independence in part (f) is not strictly necessary for having $\{F_t\}$ strong mixing, as we could allow for GARCH effects by assuming geometric ergodicity of $\{v_t\}$ instead (Francq and Zakoïan, 2006). Indeed, geometric ergodicity implies $\beta$-mixing, which implies strong mixing.

Assumption 2 (idiosyncratic component).

(a) For all $i\in\mathbb{N}$ and all $t\in\mathbb{Z}$, $\mathrm{E}[\xi_{it}]=0$ and $\sigma_i^2=\mathrm{E}[\xi_{it}^2]$ is such that $C_\xi^{-1}\le\sigma_i^2\le C_\xi$, for some finite positive real $C_\xi$ independent of $i$.

(b) For all $i,j\in\mathbb{N}$, all $t\in\mathbb{Z}$, and all $k\in\mathbb{Z}$, $|\mathrm{E}[\xi_{it}\xi_{j,t-k}]|\le\rho^{|k|}M_{ij}$, where $\rho$ and $M_{ij}$ are finite positive reals independent of $t$ such that $0\le\rho<1$, $M_{ii}=\sigma_i^2$, $\sum_{j=1,j\ne i}^{n}M_{ij}\le M_\xi$, and $\sum_{i=1,i\ne j}^{n}M_{ij}\le M_\xi$ for some finite positive real $M_\xi$ independent of $n$.

(c) For all $i\in\mathbb{N}$, $\{\xi_{it}\}$ is a strong mixing process with mixing coefficients such that $\alpha_{\xi_i}(T)\le\exp(-c_\xi T^{\gamma_\xi})$, for all $T\in\mathbb{N}$, and for some finite positive reals $c_\xi$ and $\gamma_\xi$ independent of $T$ and $i$.

(d) For all $j=1,\ldots,n$ and all $n,T\in\mathbb{N}$,

$$\frac{1}{nT}\sum_{t,s=1}^{T}\sum_{i_1,i_2=1}^{n}\big|\mathrm{E}[\xi_{i_1t}\xi_{jt}\xi_{i_2s}\xi_{js}]\big|\le K_\xi,\qquad \frac{1}{nT}\sum_{t,s=1}^{T}\sum_{i_1,i_2=1}^{n}\big|\mathrm{E}[\xi_{i_1t}\xi_{jt}]\big|\,\big|\mathrm{E}[\xi_{i_2s}\xi_{js}]\big|\le K_\xi,$$

and for all $t=1,\ldots,T$ and all $n,T\in\mathbb{N}$,

$$\frac{1}{nT}\sum_{s_1,s_2=1}^{T}\sum_{i,j=1}^{n}\big|\mathrm{E}[\xi_{is_1}\xi_{it}\xi_{js_2}\xi_{jt}]\big|\le K_\xi,\qquad \frac{1}{nT}\sum_{s_1,s_2=1}^{T}\sum_{i,j=1}^{n}\big|\mathrm{E}[\xi_{is_1}\xi_{it}]\big|\,\big|\mathrm{E}[\xi_{js_2}\xi_{jt}]\big|\le K_\xi,$$

for some finite positive real $K_\xi$ independent of $n$ and $T$.

(e) For all $t\in\mathbb{Z}$, as $n\to\infty$,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\lambda_i\xi_{it}}{\sigma_i^2}\to_d\mathcal{N}\Big(0_r,\ \lim_{n\to\infty}\frac{1}{n}\sum_{i,j=1}^{n}\frac{\lambda_i\lambda_j'\,\mathrm{E}[\xi_{it}\xi_{jt}]}{\sigma_i^2\sigma_j^2}\Big).$$

(f) For all $n\in\mathbb{N}$, $\Gamma_n^\xi=\mathrm{E}[\xi_{nt}\xi_{nt}']$ is such that $\nu^{(n)}(\Gamma_n^\xi)\ge L_\xi$ for some finite positive real $L_\xi$ independent of $n$.

Part (a) imposes that the idiosyncratic components have zero mean, and the idiosyncratic variances are positive and finite. We strengthen this assumption in part (f) by requiring that the whole idiosyncratic covariance matrix is positive definite.

Part (b) has a twofold purpose. First, it limits the degree of serial correlation of the idiosyncratic components by assuming standard geometric decay of the autocovariances. Second, it limits the degree of cross-sectional correlation between idiosyncratic components, a standard assumption for approximate DFMs. This assumption implies the usual conditions required by Bai (2003, Assumptions C2, C3, and C4), Fan et al. (2013, Assumption 2b), and Bai and Li (2016, Assumptions C3, C4, and E1) (see Lemma C.1).

In part (c), we assume that each idiosyncratic component is strongly mixing with exponentially decaying coefficients. This requirement is quite general since it allows the idiosyncratic components to be non-linear processes; Fan et al. (2013, Assumption 2c) make the same assumption.

In part (d), we require finite summable fourth-order cumulants, a standard requirement found in the literature (see, e.g., Bai, 2003, Assumption F1 for the first condition, and Bai and Li, 2016, Assumption E2 for the second one), which, jointly with the mixing assumption in part (c), allows for consistent estimation of covariances by means of the sample covariances (Hannan, 1970, Theorem 6, Chapter IV, p. 210).

In part (e), we assume a Central Limit Theorem, a standard assumption in the literature (e.g., Bai, 2003, Assumption F3, and Bai and Li, 2016, Assumption F3).
This is a high-level requirement and, in order to properly derive it, we should introduce some notion of dependence for random fields, which typically requires some ordering of the cross-sectional units. Now, in most applications, there is no natural ordering of the variables. For this reason, we refrain from spelling out primitive conditions that guarantee part (e) to hold.

We then add the natural requirement of independence between common shocks and idiosyncratic components.

Assumption 3 (Independence between common shocks and idiosyncratic components). The processes $\{\xi_{it},\ i\in\mathbb{N},\ t\in\mathbb{Z}\}$ and $\{v_{jt},\ j=1,\ldots,r,\ t\in\mathbb{Z}\}$ are mutually independent.

Assumption 3 implies that the factors and the common components are independent of the idiosyncratic components at all leads and lags and across all units. This assumption is compatible with the idea that the structural macroeconomic shocks driving the common component are independent of the idiosyncratic components representing measurement errors or local dynamics.$^{4}$

Last, Assumption 3 jointly with Assumptions 1(f), 1(g), 1(h), 2(c), and 2(d), implies that the process $\{F_t\xi_{it}\}$ is strongly mixing with finite fourth moments (Bradley, 2005, Theorem 5.1.a). Then, from Ibragimov (1962, Theorem 1.7), we have the following Central Limit Theorem, for all $i\in\mathbb{N}$, as $T\to\infty$,

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}F_t\xi_{it}\to_d\mathcal{N}\Big(0_r,\ \lim_{T\to\infty}\frac{1}{T}\sum_{s,t=1}^{T}\mathrm{E}[F_tF_s'\xi_{it}\xi_{is}]\Big).$$

This last result is typically assumed in the literature; see, e.g., Bai, 2003, Assumption F4, and Bai and Li, 2016, Assumption F1.

$^{4}$ In principle, we could relax this requirement to allow for weak dependence as in, e.g., Bai (2003, Assumption D).

4.2 Additional assumptions

We now state two more assumptions. These are needed only to derive some of our asymptotic results.

Assumption 4 (Linearity). For all $j=1,\ldots,r$, $t=1,\ldots,T$, and $n,T\in\mathbb{N}$, $\mathrm{E}[F_{jt}\mid X_{nT}]$ is a linear function of $X_{nT}$.

Assumption 4 would hold if we directly assume joint Gaussianity of $X_{nT}$ and $F_T$. However, assuming Gaussianity might be too stringent, while Assumption 4 is more general. For example, consider the case $r=1$, let $f_t$ and $\mathcal{X}_{nT}$ be realizations of the factor and the data, and let $f:\mathbb{R}^{nT+1}\to\mathbb{R}$ be the joint pdf of $F_t$ and $X_{nT}$. Then, from Steyn (1960) and Kotz et al. (2004, Chapter 44.3), we see that if $f$ belongs to the family of multivariate Pearson-type distributions, and $\lim_{f_t\to\pm\infty}f_t^2f(f_t,\mathcal{X}_{nT})=0$, then $\mathrm{E}[F_t\mid X_{nT}=\mathcal{X}_{nT}]$ is linear in $\mathcal{X}_{nT}$. Quah and Sargent (1993, p. 292) and Doz et al. (2012, Assumption R) made similar assumptions.

Assumption 5 (Tails).

(a) For all $t\in\mathbb{Z}$, all $j=1,\ldots,r$, and all $s>0$, $\mathrm{P}(|v_{jt}|\ge s)\le\exp\{-K_vs^{\delta_v}\}$ for some finite positive reals $K_v$ and $\delta_v\le 2$ independent of $t$ and $j$.

(b) For all $t\in\mathbb{Z}$, all $n\in\mathbb{N}$, all $s>0$,

$$\mathrm{P}\Big(\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\lambda_i\xi_{it}}{\sigma_i^2}\Big\|\ge s\Big)\le r\exp\{-\kappa_1s^2\}+rn\exp\big\{-\kappa_2(s\sqrt{n})^{\alpha}\big\},$$

for some finite positive reals $\kappa_1$, $\kappa_2$, and $\alpha\le 2$ independent of $t$ and $n$.

In part (a), we assume an exponential-type tail inequality for the common shocks, which implies that for all $s>0$, $t\in\mathbb{Z}$, and $j=1,\ldots,r$, $\mathrm{P}(|F_{jt}|>s)\le\exp\{-K_Fs^{\delta_v}\}$, for some finite positive reals $K_F$ and $\delta_v\le 2$ independent of $t$ and $j$ (see Lemma E.1 and Bakhshizadeh et al., 2023, Corollary 4). The same applies to part (b), which, by setting $n=1$, implies that for all $s>0$, $t\in\mathbb{Z}$, and $i\in\mathbb{N}$, $\mathrm{P}(|\xi_{it}|\ge s)\le\exp\{-K_\xi s^{\delta_\xi}\}$, for some finite positive reals $K_\xi$ and $\delta_\xi\le 2$ independent of $t$ and $i$.
This is because $\|\lambda_i\|\le M_\lambda$ and $\sigma_i^2\ge C_\xi^{-1}$, for all $i\in\mathbb{N}$, by Assumptions 1(a) and 2(a), respectively. Fan et al. (2013, Assumption 2c) also assume the factors and idiosyncratic components to belong to a distribution having exponentially decaying tails.

Depending on the values of $\delta_v$ and $\delta_\xi$, we are able to consider not only distributions with sub-Gaussian ($\delta_v,\delta_\xi=2$) or sub-exponential tails ($\delta_v,\delta_\xi=1$), which include the Laplace and the Generalized Error distribution (Vershynin, 2018, Chapter 2), but also distributions with sub-Weibull tails ($\delta_v,\delta_\xi<1$), which can mimic a heavy tail behavior even if all moments exist (Mikosch and Nagaev, 1998; Kuchibhotla and Chakrabortty, 2022; Vladimirova et al., 2020). This is clarified in Remark 2 below.

In general, part (b) is a Bernstein-type inequality implying that the weighted sums of the idiosyncratic components have Gaussian tails, as $n\to\infty$. This is a high-level requirement and, as for Assumption 2(e), in order to properly derive it, we should introduce some notion of dependence for random fields. Here, instead, we just notice that in the simplest case of cross-sectionally independent idiosyncratic components, this condition would be a direct consequence of Bakhshizadeh et al. (2023, Corollary 4) (see also Vladimirova et al., 2020, Corollary 1 for a similar result, and Vershynin, 2018, Theorems 2.8.2 and 2.6.3, for the cases $\delta_\xi=1$ and $\delta_\xi=2$, respectively).

Finally, as a consequence of parts (a) and (b), jointly with Assumptions 1(f), 1(h), 2(c), and 3, we can show that not only is $\{F_t\xi_{it}\}$ a strongly mixing process but its components also have exponentially decaying tails. Therefore, for all $T\in\mathbb{N}$ and all $i\in\mathbb{N}$, the following Bernstein-type inequality holds:

$$\mathrm{P}\Big(\Big\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}F_t\xi_{it}\Big\|\ge s\Big)\le r\exp\{-\kappa_3s^2\}+rT\exp\big\{-\kappa_4(s\sqrt{T})^{\beta}\big\},$$

for some finite positive reals $\kappa_3$, $\kappa_4$, and $\beta<1$ independent of $i$ and $T$ (Merlevède et al., 2011, Theorem 1, Bosq, 2012, Theorem 1.4, p. 31, and Lemma E.1).

Remark 2. Following Kuchibhotla and Chakrabortty (2022) and Kuchibhotla et al. (2023), we say that a given random variable $y$ is sub-Weibull with exponent $\delta$ if $\mathrm{P}(|y|\ge s)\le\exp\{-K_ys^{\delta}\}$ for any $s>0$ and some positive real $K_y$. This is equivalent to requiring the following Cramér-type condition to hold: $\sup_{m\ge 1}m^{-1/\delta}(\mathrm{E}[|y|^m])^{1/m}\le M_y$ for some positive real $M_y$.$^{5}$ This shows that although all moments exist, they can be quite large for small values of $\delta$. For example, the $m$th moment of a Weibull is given by $\mathrm{E}[y^m]=\Gamma(1+\frac{m}{\delta})$, which is rapidly increasing as $\delta$ decreases (see, e.g., Lehman, 1963, for tabulated values of the first four moments as functions of $\delta$).

$^{5}$ Another necessary and sufficient condition for a random variable $y$ to be sub-Weibull is given by the following condition on its Orlicz norm: $\|y\|_{\psi_\delta}=\inf\{\eta>0:\mathrm{E}[\exp((|y|/\eta)^{\delta})]\le 2\}\le M_y'$ for some finite positive real $M_y'$.

4.3 Identification conditions

To identify the DFM, we need to address four issues: first, we need to identify the number of factors; second, we need to identify the true parameters of the model; third, we need to ensure that the linear system is identified; and fourth, we need to guarantee the existence of the maxima of the log-likelihood.

4.3.1 Identification of the number of factors and of the common component

Starting with the number of factors $r$, let the covariance matrix of $\{\chi_{nt}\}$ be $\Gamma_n^\chi=\Lambda_n\Gamma^F\Lambda_n'$ and denote as $\mu_{jn}^\chi$ the $j$-th largest eigenvalue of $\Gamma_n^\chi$. Then Assumptions 1(a) and 1(b) imply that, for any $j=1,\ldots,r$,

$$\underline{C}_j\le\liminf_{n\to\infty}\frac{\mu_{jn}^\chi}{n}\le\limsup_{n\to\infty}\frac{\mu_{jn}^\chi}{n}\le\overline{C}_j, \tag{18}$$

for some finite positive reals $\underline{C}_j$ and $\overline{C}_j$. Furthermore, from Assumption 2(b) it follows that the largest eigenvalue of the idiosyncratic covariance matrix, $\Gamma_n^\xi$, denoted as $\mu_{1n}^\xi$, is such that

$$\mu_{1n}^\xi=\|\Gamma_n^\xi\|\le M_\xi. \tag{19}$$

From conditions (18) and (19) and Weyl's inequality, it follows that the $r$ largest eigenvalues of the covariance matrix of $\{x_{nt}\}$ diverge linearly in $n$, whereas all remaining eigenvalues stay bounded for all $n\in\mathbb{N}$. Lemma C.1 proves this result, condition (18), and condition (19); these conditions are directly imposed by Doz et al. (2012, Assumptions A1 and A2, respectively).

The asymptotic behavior of the eigenvalues of the covariance matrix allows for the identification of the common and idiosyncratic component when $n\to\infty$ and it is the basis for all existing methods for determining $r$ (see, e.g., Bai and Ng, 2002).

4.3.2 Identification of the loadings, the factors, and the VAR parameters

All the structures equivalent to (3)-(4) can be obtained through an $r\times r$ invertible matrix $R$, as follows:

$$F_t^o=R^{-1}F_t,\quad \Lambda_n^o=\Lambda_nR,\quad A^o(L)=R^{-1}A(L)R,\quad \Gamma^{vo}=R^{-1}\Gamma^v,\quad \Gamma_n^{\xi o}=\Gamma_n^\xi.$$

Under such relationships, using only first- and second-moment information we cannot distinguish the model specified by $\Lambda_n^o$, $A^o(L)$, $\Gamma^{vo}$, and $\Gamma_n^{\xi o}$, from the one given by $\Lambda_n$, $A(L)$, $\Gamma^v$, and $\Gamma_n^\xi$. Once the loadings and the factors are identified, then $A(L)$ and $\Gamma^v$ are also identified; $\Gamma_n^\xi$ is always identified.

To identify the model, we need enough a priori structure to preclude any but the trivial transformation $R=I_r$. This can be achieved by imposing $r^2$ additional identifying constraints. Let $M_n^\chi$ be the $r\times r$ diagonal matrix whose elements are the eigenvalues $\mu_{jn}^\chi$, $j=1,\ldots,r$, of the covariance matrix of the common component, $\Gamma_n^\chi$, sorted in descending order. Let $V_n^\chi$ be the $n\times r$ matrix with the corresponding normalized eigenvectors as columns. Then, we assume the following identifying constraints.

Assumption 6 (Identification).

(a) The eigenvalues of $\Sigma_\Lambda\Gamma^F$ are distinct.

(b) $\Sigma_\Lambda$ is diagonal and $\Gamma^F=I_r$.

(c) For all $j=1,\ldots,r$, $[\Lambda_n]_{1j}\ge 0$.

Part (a) is standard. Since the eigenvalues of $\Sigma_\Lambda\Gamma^F$ are equal to the $r$ non-zero eigenvalues of $\lim_{n\to\infty}n^{-1}\Gamma_n^\chi$, given by $\lim_{n\to\infty}n^{-1}M_n^\chi$, it implies that in (18) we have $\overline{C}_j<\underline{C}_{j-1}$ for any $j=2,\ldots,r$, and, thus, it avoids the uninteresting difficulties related to asymptotically multiple eigenvalues, which would require more restrictions to identify the space spanned by the columns of $\Lambda_n$.

Part (b) is similar to what is usually imposed in PC estimation (see Remark 3 below). It implies that $\Gamma_n^\chi=\Lambda_n\Lambda_n'=V_n^\chi M_n^\chi V_n^{\chi\prime}$. Hence, the $r$ non-zero eigenvalues of $\lim_{n\to\infty}n^{-1}\Gamma_n^\chi$, given by $\lim_{n\to\infty}n^{-1}M_n^\chi$, coincide with the diagonal entries of $\Sigma_\Lambda$, which are then distinct because of part (a). Since part (b) concerns second moments and sums of squares, it allows us to identify $\Lambda_n$ only up to a sign. To achieve global identification we must also fix the column sign of $\Lambda_n$, which is done through part (c) (Bai and Ng, 2013, Remark 1).

Summing up, in part (b), we are imposing $r^2$ restrictions: $r(r+1)/2$ by requiring orthonormality of the factors, and $r(r-1)/2$ by requiring that $\Sigma_\Lambda$ is diagonal. Consistently with the fact that in typical empirical applications the focus is on the common component only, these assumed identification conditions do not provide economic meaning to the factors; in this sense ours is an exploratory rather than confirmatory factor analysis. Moreover, and most importantly, the restrictions are imposed only in the limit $n,T\to\infty$. Thus, in our setting, the model is only asymptotically identified. This is enough to derive our asymptotic theory.

Remark 3.
Typically, in PC estimation it is required that: (i) for all $n\in\mathbb{N}$, $n^{-1}\Lambda_n'\Lambda_n$ is a diagonal matrix with finite distinct elements, and (ii) for all $T\in\mathbb{N}$, $T^{-1}\sum_{t=1}^{T}F_tF_t'=I_r$ (Forni et al., 2009, Doz et al., 2011, 2012, and Bai and Ng, 2013). In classical QML estimation constraint (ii) is the same as in PC estimation, while (i) is replaced with the requirement that for all $n\in\mathbb{N}$, $n^{-1}\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n$ is a diagonal matrix with finite distinct elements (Bai and Li, 2012, 2016, constraint IC3 therein). Differently from the present setting, these constraints are assumed to hold for any given $n,T\in\mathbb{N}$.

Remark 4. By letting $S$ be a diagonal $r\times r$ matrix with entries $\mathbb{I}([V_n^\chi]_{1j}\ge 0)-\mathbb{I}([V_n^\chi]_{1j}<0)$, $j=1,\ldots,r$, we can show that, as $n\to\infty$, $\lambda_i'$ coincides with $v_i^{\chi\prime}S(M_n^\chi)^{1/2}$, where $v_i^{\chi\prime}$ is the $i$th row of $V_n^\chi$ (see Lemma C.9). It also follows that, by linear projection of $\chi_{nt}$ onto $\Lambda_n$ at each given $t$, the true factors $F_t$ are identified, as $n\to\infty$, as the first $r$ normalized PCs of the common component given by $(M_n^\chi)^{-1/2}SV_n^{\chi\prime}\chi_{nt}$, which are clearly orthonormal; indeed, $\mathrm{E}[(M_n^\chi)^{-1/2}SV_n^{\chi\prime}\chi_{nt}\chi_{nt}'V_n^\chi S(M_n^\chi)^{-1/2}]=I_r=\Gamma^F$, as requested.

4.3.3 Identification of the linear system

From our Assumptions 1(a), 1(d), and 1(e), we can show that the state space formulation (3)-(4) is minimal and stable, for all $n>N$, where $N=N_0$ in Assumption 1(a). It is convenient to consider here the factorization $\Gamma^v=HH'$ for some $r\times r$ matrix $H$ having full rank. Consider again for simplicity the case $p_F=1$ with $A\equiv A_1$. Then, stability, which requires $|\nu^{(1)}(A)|<1$, is a direct consequence of stationarity in Assumption 1(d). Minimality, in turn, holds because the couple $(A,H)$ is controllable due to Assumption 1(e) and the couple $(A,\Lambda_n)$ is observable for all $n>N$ because Assumption 6(b) implies that $\mathrm{rk}(\Lambda_n)=\mathrm{rk}(V_n^\chi)=r$, for all $n>N$.$^{6}$ This implies that, for all $n>N$, the linear system satisfies the mini-phase condition:

$$\mathrm{rk}\begin{pmatrix}I_r-Az & -H\\ \Lambda_n & 0_{n\times r}\end{pmatrix}=2r,\qquad\text{for }|z|\ge 1.$$

$^{6}$ A linear system with $p_F=1$ is controllable if and only if $\mathrm{rk}[H\ (AH)\cdots(A^{r-1}H)]=r$ and it is observable if and only if $\mathrm{rk}[\Lambda_n'\ (\Lambda_nA)'\cdots(\Lambda_nA^{r-1})']=r$ (Anderson and Moore, 1979, Appendix C, pp. 341-342).

In other words, the linear system (3)-(4) is the minimal state-space representation of a DFM having as McMillan degree the number of factors $r$ (Anderson and Deistler, 2008, Section II).

This result, together with Assumption 6, guarantees that the transfer function matrix $W(z)=\Lambda_n(I_r-Az)^{-1}H$ is identified for all $z\in\mathbb{C}$, with the exception of a zero-measure set (see also Lippi et al., 2021, Section 4.3, for a proof, and Heaton and Solo, 2004, for similar results). This implies generic identifiability of the linear system (3)-(4).

4.3.4 Existence of the EM and QML estimators

Finally, we consider the issue of identification of the maxima of the log-likelihood. For any given $i\in\mathbb{N}$, define

$$\mathcal{O}_{\lambda_i}=\{\lambda_i\in\mathbb{R}^r,\ \lambda_i\in[-M_\lambda,M_\lambda]^r\}\quad\text{and}\quad\mathcal{O}_{\sigma_i^2}=\{\sigma_i^2\in\mathbb{R},\ \sigma_i^2\in[C_\xi^{-1},C_\xi]\},$$

where $M_\lambda$ and $C_\xi$ are defined in Assumptions 1(a) and 2(a), respectively. Likewise, define

$$\mathcal{O}_A=\{\mathrm{vec}(A)\in\mathbb{R}^{r^2},\ \mathrm{vec}(A)\in[-M_A,M_A]^{r^2p_F}\},$$

$$\mathcal{O}_{\Gamma^v}=\{\mathrm{vech}(\Gamma^v)\in\mathbb{R}^{r(r+1)/2},\ \nu^{(j)}(\Gamma^v)\in[M_v^{-1},M_v],\ j=1,\ldots,r\},$$

where $M_A$ and $M_v$ are defined in Assumptions 1(d) and 1(e), respectively. Moreover, for any given $n\in\mathbb{N}$, let also

$$\mathcal{E}_{\Lambda_n}=\{\mathrm{vec}(\Lambda_n)'\in\mathbb{R}^{nr},\ \underline{C}_r\le\nu^{(r)}(n^{-1}\Lambda_n'\Lambda_n)\le\nu^{(1)}(n^{-1}\Lambda_n'\Lambda_n)\le\overline{C}_1,\ \Lambda_n'\Lambda_n\ \text{diagonal}\},$$

$$\mathcal{E}_{\Gamma_n^\xi}=\{\mathrm{vech}(\Gamma_n^\xi)'\in\mathbb{R}^{n(n+1)/2},\ L_\xi\le\nu^{(n)}(\Gamma_n^\xi)\le\nu^{(1)}(\Gamma_n^\xi)\le M_\xi\},$$

where $\underline{C}_r$ and $\overline{C}_1$ are defined in (18), while $M_\xi$ and $L_\xi$ are defined in Assumptions 2(b) and 2(f), respectively.

Then, the search for the maximum of the expected log-likelihood in the EM algorithm and the one of the log-likelihood takes place on the set $\mathcal{O}_n=\{\mathcal{O}_{\lambda_i}^n\cap\mathcal{E}_{\Lambda_n}\}\times\{\mathcal{O}_{\sigma_i^2}^n\cap\mathcal{E}_{\Gamma_n^\xi}\}\times\mathcal{O}_A\times\mathcal{O}_{\Gamma^v}$, which has dimension $Q_n=n(r+1)+r^2p_F+r(r+1)/2$ growing with $n$.

Because of Assumptions 1(a) and 2(a), the rows of the loadings matrix $\lambda_i$ and the idiosyncratic variances $\sigma_i^2$ belong to $\mathcal{O}_{\lambda_i}\times\mathcal{O}_{\sigma_i^2}\subset\mathbb{R}^{r+1}$, which is a compact set for any given $i\in\mathbb{N}$. Similarly, because of Assumptions 1(d) and 1(e), the entries of $A_k$, $k=1,\ldots,p_F$, and $H$ belong to $\mathcal{O}_A\times\mathcal{O}_{\Gamma^v}\subset\mathbb{R}^{r^2p_F+r(r+1)/2}$, which is also a compact set.
These properties are crucial as they ensure the existence of a maximum which is a solution of the EM algorithm. Indeed, for any iteration $k\ge 0$ the expected log-likelihood $Q(\varphi_n,\widehat{\varphi}_n^{(k)})$ in (8), which we maximize in the M-step, is made of two terms: a term in (10) associated with the VAR for the factors (4), which is defined on the finite-dimensional set $\mathcal{O}_A\times\mathcal{O}_{\Gamma^v}$, and a term (9) associated with the factor equation (3) and thus defined over a set $\mathcal{O}_{\lambda_i}^n\times\mathcal{O}_{\sigma_i^2}^n$ of dimension growing with $n$. Now, the former term poses no problem because we can use compactness of $\mathcal{O}_A\times\mathcal{O}_{\Gamma^v}$ to show that the maxima $\widehat{\theta}^{(k+1)}$ exist. Regarding the latter term, in order to prove the existence of $\widehat{\phi}_n^{(k+1)}$ we can still use the same compactness argument by noticing that maximizing this term amounts to separately maximizing $n$ terms, each depending only on $\lambda_i$ and $\sigma_i^2$ for given $i\in\mathbb{N}$, and thus defined on the finite-dimensional compact set $\mathcal{O}_{\lambda_i}\times\mathcal{O}_{\sigma_i^2}$. It is also easy to show that the elements of $\widehat{\varphi}_n^{(k+1)}$ are unique because the M-step gives a closed form solution for each of them. This reasoning guarantees the existence and uniqueness of the EM estimators (see Lemma E.20).

Let us turn to the QML estimator that maximizes the full log-likelihood (5). The existence of the QML estimator $\widehat{\theta}^*$ poses no problem because, $\theta$ being a finite-dimensional vector, we can use the compactness argument. Direct proof of the existence of the QML estimator $\widehat{\phi}_n^*$ is instead more challenging, as in this case, we cannot rely on compactness because the full log-likelihood is defined on a set of increasing dimension $n$.

Nevertheless, we give an indirect proof of the existence of $\widehat{\phi}_n^*$ by noticing that, under our identification Assumptions 6(a)-6(c), as $n,T\to\infty$, the elements of $\widehat{\phi}_n^*$ are asymptotically equivalent to the unfeasible Ordinary Least Squares estimators we would obtain if the factors were observed (see Lemma E.11). Therefore, this argument ensures, at least asymptotically, both the existence and the uniqueness of the QML estimators. We also refer to the next section for more details.

Remark 5. Typically, this literature assumes that the QML estimators $\widehat{\sigma}_i^{2*}$ belong to a compact set (Bai and Li, 2016, Assumption D). In our set-up, this assumption is implied (Gao et al., 2021, Theorem 3.1, and Mao et al., 2024, Theorem 1), and not needed for proving our results (Barigozzi, 2023).

5 Asymptotic properties

This section presents the asymptotic properties of the EM estimator of the parameters (i.e., of the factor loadings, idiosyncratic variances, VAR coefficients, and the covariance matrix of the VAR innovations) and of the Kalman smoother estimator of the factors. We assume that $r$, the number of common factors, is known; without loss of generality, we fix the VAR order in (4) to $p_F=1$, and we let $A\equiv A_1$ so that $A(L)\equiv AL$. We briefly discuss the estimation of $r$ and $p_F$ at the end of the section.

5.1 Consistency under basic assumptions

We start by proving the consistency of the EM algorithm and the Kalman smoother under the most general case where we neither impose linearity of the conditional mean (Assumption 4) nor exponentially decaying tails (Assumption 5).

Proposition 1. Consider the EM estimators of the parameters $\widehat{\Lambda}_n=(\widehat{\lambda}_1\cdots\widehat{\lambda}_n)'$ with $\widehat{\lambda}_i\equiv\widehat{\lambda}_i^{(k^*+1)}$, $\widehat{\sigma}_i^2\equiv\widehat{\sigma}_i^{2(k^*+1)}$, $i=1,\ldots,n$, $\widehat{A}\equiv\widehat{A}^{(k^*+1)}$, and $\widehat{\Gamma}^v\equiv\widehat{\Gamma}^{v(k^*+1)}$, and the Kalman smoother estimator of the factors, $\widehat{F}_t\equiv F_{t|T}^{(k^*+1)}$, $t=1,\ldots,T$, $k^*\ge 0$.
Then, under Assumptions 1, 2, 3, and 6:

(a) for all $\epsilon>0$, there exist a positive real $\eta(\epsilon)$, and integers $n^*(\epsilon)$ and $T^*(\epsilon)$, all independent of $i$, such that, for all $n\ge n^*(\epsilon)$ and all $T\ge T^*(\epsilon)$,

(a.1) $\mathrm{P}\big(\min(n,\sqrt{T})\,\|\widehat{\lambda}_i-\lambda_i\|\ge\eta(\epsilon)\big)<\epsilon$, for any given $i=1,\ldots,n$,

(a.2) $\mathrm{P}\big(\min(n,\sqrt{T})\,n^{-1/2}\|\widehat{\Lambda}_n-\Lambda_n\|\ge\eta(\epsilon)\big)<\epsilon$,

(a.3) $\mathrm{P}\big(\min(n,\sqrt{T})\,|\widehat{\sigma}_i^2-\sigma_i^2|\ge\eta(\epsilon)\big)<\epsilon$, for any given $i=1,\ldots,n$,

(a.4) $\mathrm{P}\big(\min(n,\sqrt{T})\,\|\widehat{A}-A\|\ge\eta(\epsilon)\big)<\epsilon$,

(a.5) $\mathrm{P}\big(\min(n,\sqrt{T})\,\|\widehat{\Gamma}^v-\Gamma^v\|\ge\eta(\epsilon)\big)<\epsilon$;

(b) for all $\epsilon>0$, there exist a positive real $\eta(\epsilon)$, and integers $n^{**}(\epsilon)$ and $T^{**}(\epsilon)$, all independent of $t$, such that, for all $n\ge n^{**}(\epsilon)$ and all $T\ge T^{**}(\epsilon)$,

$$\mathrm{P}\big(\min(\sqrt{n},\sqrt{T})\,\|\widehat{F}_t-F_t\|\ge\eta(\epsilon)\big)<\epsilon,\qquad\text{for any given }t=1,\ldots,T.$$

The consistency rate of the estimated parameters is the same as that of the PC estimator (Bai, 2003, Theorem 2). This is because by initializing the algorithm with the consistent PC estimator, we can consider the EM estimator a "one-step" estimator (Lehmann and Casella, 2006, Theorem 4.3). As for the estimated factors, they converge at a slower rate than the PC estimator, due to the $\sqrt{T}$ term (Bai, 2003, Theorem 1).

Remark 6. The estimation error for the factors and the one for the loadings both depend on the estimation error of the diagonal idiosyncratic covariance matrix $\|\widehat{\Sigma}_n^\xi-\Sigma_n^\xi\|=\max_{i=1,\ldots,n}|\widehat{\sigma}_i^2-\sigma_i^2|$. The latter produces a term in the estimation error of the factors which is $O_p(T^{-1/2})$ (term D.2 in the proof of Lemma D.17) and a term in the estimation error of the loadings which is also $O_p(T^{-1/2})$ (term $III_d$ in the proof of Proposition 1). These two terms have non-standard asymptotic distributions and are non-negligible. Thus, we cannot prove asymptotic normality of our estimators without additional assumptions.

Remark 7. The results of Proposition 1 apply to the case in which the observed data, $y_{nt}=(y_{1t}\cdots y_{nt})'$, are such that $y_{it}=y_{i0}+\mu_it+\lambda_i'G_t+z_{it}$. In this case, if we write $F_t=\Delta G_t$ and $\xi_{it}=\Delta z_{it}$, then Proposition 1 holds for $x_{it}=\Delta y_{it}$. This strategy always works for the loadings (see Barigozzi et al., 2021, for details) so part (a) still holds, but estimation of the factors must be modified if $G_t$ is a cointegrated vector. First, we must model $G_t$ as a VAR in levels. Second, we estimate $G_t$ running the Kalman smoother in levels, and whenever $z_{it}\sim I(1)$ for some $i$, we add a latent state. If $z_{it}\sim I(0)$ for all $i$, then part (b) stands; if $z_{it}\sim I(1)$ for some $i$, we conjecture that part (b) would remain unchanged. We leave the derivation of the asymptotic properties of the estimated factors in this last case for further research.

5.2 Consistency and asymptotic normality

If we also assume that Assumptions 4 and 5 hold, we can refine the previous result because we can now guarantee that the EM algorithm converges to the QML estimator.

Proposition 2 (Loadings). Consider the EM estimators of the loadings $\widehat{\Lambda}_n=(\widehat{\lambda}_1\cdots\widehat{\lambda}_n)'$ with $\widehat{\lambda}_i\equiv\widehat{\lambda}_i^{(k^*+1)}$, $k^*\ge 0$.
Then, under Assumptions 1, 2, 3, 4, 5, and 6:

(a) for all $\epsilon>0$, there exist a positive real $\eta(\epsilon)$, and integers $n^*(\epsilon)$ and $T^*(\epsilon)$, all independent of $i$, such that, for all $n\ge n^*(\epsilon)$ and all $T\ge T^*(\epsilon)$, and some $0<\delta_v\le 2$,

(a.1) $\mathrm{P}\big(\min(n/\log^{2/\delta_v}T,\sqrt{T})\,\|\widehat{\lambda}_i-\lambda_i\|\ge\eta(\epsilon)\big)<\epsilon$, for any given $i=1,\ldots,n$,

(a.2) $\mathrm{P}\big(\min(n/\log^{2/\delta_v}T,\sqrt{T})\,n^{-1/2}\|\widehat{\Lambda}_n-\Lambda_n\|\ge\eta(\epsilon)\big)<\epsilon$;

(b) for any given $i=1,\ldots,n$, as $n,T\to\infty$, if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$,

$$\sqrt{T}(\widehat{\lambda}_i-\lambda_i)\to_d\mathcal{N}(0_r,V_i),$$

where

$$V_i=(\Gamma^F)^{-1}\Big(\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}\mathrm{E}[\xi_{it}\xi_{is}]\,\mathrm{E}[F_tF_s']\Big)(\Gamma^F)^{-1},$$

with $\Gamma^F=\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}F_tF_t'=I_r$, because of Assumption 6(b);

(c) for any given $i=1,\ldots,n$, if $\mathrm{E}[\xi_{it}\xi_{is}]=0$ for all $t,s=1,\ldots,T$ with $t\ne s$, then $V_i=\sigma_i^2(\Gamma^F)^{-1}$, with $\Gamma^F=I_r$, because of Assumption 6(b).

The rate of consistency of the estimated loadings, $\min(n/\log^{2/\delta_v}T,\sqrt{T})$, given in Proposition 2, is new to the EM literature. This rate is the same, up to a logarithmic factor, as the one of the PC estimator (Bai, 2003, Theorem 2), which, in turn, is equivalent to the unfeasible Ordinary Least Squares (OLS) estimator we would obtain if the factors were observed. The EM estimator is also asymptotically equivalent to the QML estimator considered by Bai and Li (2016, Theorem 1) when imposing no autocorrelation for the factors. For an explanation of the logarithmic term we refer to Remark 9 below. Efficiency is discussed in Section 6.

It is important to stress that Proposition 2 requires not only $T\to\infty$, as in classical QML estimation theory, but also $n\to\infty$; otherwise no consistency can be proved. In particular, as $n\to\infty$ the factors can be treated as observed; therefore, there is no longer an issue of missing information and the QML estimator of the loadings must coincide with the unfeasible OLS. This is a manifestation of the blessing of dimensionality, which is a fundamental feature of approximate factor models.
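As a purely illustrative sketch of how part (c) of Proposition 2 could be used in practice, the Python snippet below builds approximate pointwise confidence intervals for one row of loadings. It assumes the special case $V_i=\sigma_i^2 I_r$ (no serial correlation in the idiosyncratic component and $\Gamma^F=I_r$), plugs in an estimate of $\sigma_i^2$, and presumes that $n$ and $T$ are large enough for the normal approximation to be reasonable. The function name and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def loading_confidence_interval(lam_hat, sigma2_hat, T, z=1.96):
    """Approximate pointwise intervals for one row of loadings.

    Assumes the case of Proposition 2(c), i.e.
    sqrt(T) * (lam_hat - lam) is approximately N(0, sigma2 * I_r).

    lam_hat    : (r,) estimated loadings of series i.
    sigma2_hat : estimated idiosyncratic variance of series i.
    T          : sample size.
    z          : normal quantile (1.96 gives roughly 95% coverage).
    """
    lam_hat = np.asarray(lam_hat, dtype=float)
    # Under V_i = sigma2 * I_r each coordinate has standard error
    # sqrt(sigma2 / T), identical across the r coordinates.
    half_width = z * np.sqrt(sigma2_hat / T)
    return lam_hat - half_width, lam_hat + half_width

# Example call with made-up numbers.
lower, upper = loading_confidence_interval([0.8, -0.3], 0.5, T=200)
```

If the idiosyncratic component is serially correlated, the general expression for $V_i$ in part (b) applies instead and the middle term would need a long-run variance estimator, which this sketch does not attempt.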

The proof of Proposition 2 is based on the following decomposition of the estimation error into four terms:

$$\sqrt{T}(\widehat{\lambda}_i-\lambda_i)=\underbrace{\sqrt{T}(\lambda_i^{OLS}-\lambda_i)}_{O_p(1)}+\underbrace{\sqrt{T}(\widehat{\lambda}_i^{*}-\lambda_i^{OLS})}_{O_p(n^{-1}\sqrt{T}\log^{2/\delta_v}T)}+\underbrace{\sqrt{T}(\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{*})}_{O_p(n^{-1}\sqrt{T}\log^{2/\delta_v}T)}+\underbrace{\sqrt{T}(\widehat{\lambda}_i-\widehat{\lambda}_i^{**})}_{o_p(n^{-1}\sqrt{T}\log^{2/\delta_v}T)},\tag{20}$$

which shows that the EM estimator $\widehat{\lambda}_i$ is asymptotically equivalent to the OLS estimator we would obtain had we observed the factors.

To prove our result, we first show that the QML estimator of the loadings, $\widehat{\lambda}_i^{*}$, is asymptotically equivalent to the PC estimator, which, in turn, is asymptotically equivalent to the unfeasible $\sqrt{T}$-consistent OLS estimator $\lambda_i^{OLS}$ (second term on the rhs of (20), proved in Lemma E.11; see also Barigozzi, 2023, Theorem 3 and Corollary 1). In particular, both approximation errors are $o_p(T^{-1/2})$, whenever $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$. This result extends to the DFM the result by Bai and Li (2016, Theorem 1) obtained for QML estimation of a static factor model, i.e., when we replace the full matrix $\Omega_T^F(A,H)$ with just $I_{rT}$ in the log-likelihood (5). Notice that, unlike the proofs in Bai and Li (2016), and as mentioned in Remark 5, this result does not depend on the QML estimator $\widehat{\sigma}_i^{*2}$.

Next, we show that the EM estimator converges to a global maximum of the likelihood (third term on the rhs of (20)). As we discuss in Section 5.3, we know that the sequence of estimators $\{\widehat{\lambda}_i^{(k)},\,k\ge 0\}$ converges to a local maximum of the likelihood, say $\widehat{\lambda}_i^{**}$, as $k\to\infty$ (see Lemma E.21). However, in general, the likelihood might have many maxima due to the identification indeterminacy of the loadings.
Nevertheless, once we make the identifying Assumptions 6(b) and 6(c), there is only a unique maximum, $\widehat{\lambda}_i^{*}$, which is asymptotically equivalent to the unique OLS estimator, whenever $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$ (see Lemma E.22 and Ruud, 1991, Section 4, for a similar result in the case of one-to-one mapping from the factors to the data, corresponding to the case of no idiosyncratic component). This is also clear from the asymptotic expansions of $\widehat{\lambda}_i^{*}-\lambda_i$ obtained by Bai and Li (2012, 2016) for the static model (i.e., the factors have no dynamics) and using identification schemes different from the one we use here.

Third, we show that asymptotically the error coming from running the EM a finite number of times vanishes (fourth term on the rhs of (20)). Indeed, due to the finite number of iterations, $k^*$, the EM algorithm delivers an estimator $\widehat{\lambda}_i\equiv\widehat{\lambda}_i^{(k^*+1)}$, which is just an approximation of the local maximum $\widehat{\lambda}_i^{**}$ that we would attain after an infinite number of iterations. We show that the error entailed by such approximation depends on the ratio of the Hessians of the complete and incomplete log-likelihoods, i.e., on how much information is missing because the factors are not observed (Meng and Rubin, 1994; McLachlan and Krishnan, 2007, Chapter 3.9; and Sundberg, 2019, Chapter 8). In this case the approximation error is $o_p(T^{-1/2})$, provided $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$ (see Lemma E.23). This last result is a refinement of the results by Balakrishnan et al. (2017, Theorem 2) on the convergence of the EM algorithm. We refer to Section 5.3 below for more details.

Last, $\lambda_i^{OLS}$ is a $\sqrt{T}$-consistent estimator of the factor loadings $\lambda_i$ (first term on the rhs of (20)).

Proposition 3 (Factors). Consider the Kalman smoother estimator of the factors $\widehat{\boldsymbol{F}}_T=(\widehat{F}_1\cdots\widehat{F}_T)'$, with $\widehat{F}_t\equiv F_{t|T}^{(k^*+1)}$, $t=1,\ldots,T$, $k^*\ge 0$.
Then, under Assumptions 1, 2, 3, 4, 5, and 6:

(a) for all $\epsilon>0$, there exist a positive real $\eta(\epsilon)$, and integers $n^{**}(\epsilon)$ and $T^{**}(\epsilon)$, all independent of $t$, such that, for all $n\ge n^{**}(\epsilon)$ and all $T\ge T^{**}(\epsilon)$,

(a.1) $P\big(\min(\sqrt{n},T/\sqrt{\log n})\,\|\widehat{F}_t-F_t\|\ge\eta(\epsilon)\big)<\epsilon$, for any given $t=1,\ldots,T$,

(a.2) $P\big(\min(\sqrt{n},T/\sqrt{\log n})\,T^{-1/2}\|\widehat{\boldsymbol{F}}_T-\boldsymbol{F}_T\|\ge\eta(\epsilon)\big)<\epsilon$;

(b) as $n,T\to\infty$, if $T^{-1}\sqrt{n\log n}\to 0$,

$$\sqrt{n}(\widehat{F}_t-F_t)\stackrel{d}{\to}N(0_r,W_t),$$

for any given $t=1,\ldots,T$, where

$$W_t=(\Sigma_{\Lambda\Sigma\Lambda})^{-1}\left\{\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{E[\xi_{it}\xi_{jt}]}{\sigma_i^2\sigma_j^2}\,\lambda_i\lambda_j'\right\}(\Sigma_{\Lambda\Sigma\Lambda})^{-1},$$

with $\Sigma_{\Lambda\Sigma\Lambda}=\lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\lambda_i(\sigma_i^2)^{-1}\lambda_i'$;

(c) for any given $t=1,\ldots,T$, if $E[\xi_{it}\xi_{jt}]=0$ for all $i,j=1,\ldots,n$ with $i\ne j$, then $W_t=(\Sigma_{\Lambda\Sigma\Lambda})^{-1}$.

The rate of consistency of the estimated factors given in Proposition 3 is faster than the rate originally derived by Doz et al. (2012, Proposition 1) for the same estimator: $\min(\sqrt{n},T/\sqrt{\log n})$ vs. $\min(\sqrt{n},T^{1/4}/\sqrt{\log n})$. Moreover, this consistency rate is the same (up to a logarithmic factor) as that of the PC estimator (Bai, 2003, Theorem 1). However, while the PC estimator is equivalent to the unfeasible Ordinary Least Squares (OLS) we would obtain if the loadings were observed, the Kalman smoother estimator we are considering is equivalent to the unfeasible Weighted Least Squares (WLS) we would obtain if the loadings were observed and we knew the idiosyncratic variances. As such, the Kalman smoother is also equivalent to the feasible WLS studied by Bai and Li (2016, Theorem 2) and computed using the QML estimator of the loadings for a static factor model. For an explanation of the logarithmic term we refer to Remark 9 below. Efficiency is discussed in Section 6.

The proof of Proposition 3 is based on the following decomposition of the estimation error into four terms:

$$\sqrt{n}(\widehat{F}_t-F_t)=\underbrace{\sqrt{n}(F_t^{WLS}-F_t)}_{O_p(1)}+\underbrace{\sqrt{n}(\widehat{F}_t^{WLS}-F_t^{WLS})}_{O_p(T^{-1}\sqrt{n\log n})}+\underbrace{\sqrt{n}(\widehat{F}_{t|t}-\widehat{F}_t^{WLS})}_{O_p(n^{-1/2})}+\underbrace{\sqrt{n}(\widehat{F}_t-\widehat{F}_{t|t})}_{O_p(n^{-1/2})},\tag{21}$$

which shows that the Kalman smoother estimator $\widehat{F}_t$ is asymptotically equivalent to the WLS estimator we would obtain had we observed the loadings and had we known the idiosyncratic variances.
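As an illustration of the unfeasible WLS benchmark appearing in the first term of (21), the following is a minimal sketch that assumes the loadings and the idiosyncratic variances are known. The function name `wls_factors` and the simulated design are hypothetical and only serve the example.

```python
import numpy as np

def wls_factors(X, Lam, sigma2):
    """Unfeasible WLS factors with known loadings and idiosyncratic variances.

    X: (T, n) data, Lam: (n, r) loadings, sigma2: (n,) variances.
    Row t of the result is (Lam' S^{-1} Lam)^{-1} Lam' S^{-1} x_t,
    with S = diag(sigma2).
    """
    Lw = Lam / sigma2[:, None]           # S^{-1} Lam, shape (n, r)
    A = Lam.T @ Lw                       # Lam' S^{-1} Lam, shape (r, r)
    B = X @ Lw                           # rows are x_t' S^{-1} Lam, shape (T, r)
    return np.linalg.solve(A, B.T).T     # (T, r)

# Hypothetical design, for illustration only.
rng = np.random.default_rng(1)
n, T, r = 500, 50, 2
Lam = rng.standard_normal((n, r))
sigma2 = rng.uniform(0.5, 1.5, size=n)
F = rng.standard_normal((T, r))
X = F @ Lam.T + rng.standard_normal((T, n)) * np.sqrt(sigma2)

F_wls = wls_factors(X, Lam, sigma2)
max_err = np.abs(F_wls - F).max()        # shrinks at rate sqrt(n)
```

Weighting each series by the inverse of its idiosyncratic variance is what distinguishes this benchmark from the OLS benchmark associated with the PC estimator.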
To prove our result, we first show that our estimator of the factors, which is obtained via the Kalman smoother computed using the EM estimators of the parameters, is asymptotically equivalent to the Kalman filter, $\widehat{F}_{t|t}$, which in turn is asymptotically equivalent to the WLS estimator $\widehat{F}_t^{WLS}$ (fourth and third term on the rhs of (21), respectively). Both approximation errors are $O_p(n^{-1})$ (see Lemma F.4 and Ruiz and Poncela, 2022, Section 2.3, for the one-factor case).

Then, we take into account the estimation error of the parameters given in Proposition 2, which implies that the WLS estimator $\widehat{F}_t^{WLS}$ converges to its unfeasible counterpart computed using the true value of the parameters, $F_t^{WLS}$, with a rate which is $o_p(n^{-1/2})$, provided $T^{-1}\sqrt{n\log n}\to 0$ (second term on the rhs of (21)). This result refines the result of Proposition 1 because, thanks to Assumption 5, we are now able to derive a tighter bound for $\|\widehat{\Sigma}_n^{\xi}-\Sigma_n^{\xi}\|=\max_{i=1,\ldots,n}|\widehat{\sigma}_i^2-\sigma_i^2|$, as shown in Proposition 5 below.

Last, $F_t^{WLS}$ is a $\sqrt{n}$-consistent estimator of the realizations of the factors $F_t$ (first term on the rhs of (21)).

Proposition 4 (Common component). Consider the EM plus the Kalman smoother estimator of the common component $\widehat{\chi}_{it}\equiv\widehat{\lambda}_i^{(k^*+1)\prime}F_{t|T}^{(k^*+1)}$, $i=1,\ldots,n$, $t=1,\ldots,T$, with $k^*\ge 0$. Then, under Assumptions 1, 2, 3, 4, 5, and 6:

(a) for all $\epsilon>0$, there exist a positive real $\eta(\epsilon)$, and integers $n^{\circ}(\epsilon)$ and $T^{\circ}(\epsilon)$, all independent of $i$ and $t$, such that, for all $n\ge n^{\circ}(\epsilon)$ and $T\ge T^{\circ}(\epsilon)$,

$$P\big(\min(\sqrt{n},\sqrt{T})\,|\widehat{\chi}_{it}-\chi_{it}|\ge\eta(\epsilon)\big)<\epsilon,$$

for any given $i=1,\ldots,n$, $t=1,\ldots,T$;

(b) as $n,T\to\infty$,

$$(T^{-1}C_{it}^{\lambda}+n^{-1}C_{it}^{F})^{-1/2}(\widehat{\chi}_{it}-\chi_{it})\stackrel{d}{\to}N(0,1),$$

for any given $i=1,\ldots,n$ and $t=1,\ldots,T$, where $C_{it}^{\lambda}=F_t'V_iF_t$ and $C_{it}^{F}=\lambda_i'W_t\lambda_i$, with $V_i$ defined in Proposition 2(b), and $W_t$ in Proposition 3(b).

Proposition 4 does not require a limit for $T/n$ or $n/T$, so it holds without any constraint between the rates of divergence of $n$ and $T$. That said, Proposition 4 has two special cases: (a) if $n/T\to 0$, then $\sqrt{n}(\widehat{\chi}_{it}-\chi_{it})\stackrel{d}{\to}N(0,C_{it}^{F})$; and (b) if $T/n\to 0$, then $\sqrt{T}(\widehat{\chi}_{it}-\chi_{it})\stackrel{d}{\to}N(0,C_{it}^{\lambda})$. This is the same rate of consistency we obtain for the PC estimator of the common component (Bai, 2003, Theorem 3).

All other parameters are also consistently estimated.

Proposition 5 (Idiosyncratic variances and VAR parameters). Consider the EM estimators of the parameters $\widehat{\Sigma}_n^{\xi}=\mathrm{diag}(\widehat{\sigma}_1^2\cdots\widehat{\sigma}_n^2)$, with $\widehat{\sigma}_i^2\equiv\widehat{\sigma}_i^{2(k^*+1)}$, $i=1,\ldots,n$, $\widehat{A}\equiv\widehat{A}^{(k^*+1)}$, and $\widehat{\Gamma}^{v}\equiv\widehat{\Gamma}^{v(k^*+1)}$, $k^*\ge 0$. Then, under Assumptions 1, 2, 3, 4, 5, and 6:

(a) for all $\epsilon>0$, there exist a positive real $\eta(\epsilon)$, and integers $n^*(\epsilon)$ and $T^*(\epsilon)$, all independent of $i$, such that, for all $n\ge n^*(\epsilon)$ and all $T\ge T^*(\epsilon)$, and some $0<\delta_v\le 2$,

(a.1) $P\big(\min(n/\log^{2/\delta_v}T,\sqrt{T})\,|\widehat{\sigma}_i^2-\sigma_i^2|\ge\eta(\epsilon)\big)<\epsilon$, for any given $i=1,\ldots,n$,

(a.2) $P\big(\min(n/\log^{2/\delta_v}T,\sqrt{T/\log n})\,\|\widehat{\Sigma}_n^{\xi}-\Sigma_n^{\xi}\|\ge\eta(\epsilon)\big)<\epsilon$,

(a.3) $P\big(\min(n/\log^{2/\delta_v}T,\sqrt{T})\,\|\widehat{A}-A\|\ge\eta(\epsilon)\big)<\epsilon$,

(a.4) $P\big(\min(n/\log^{2/\delta_v}T,\sqrt{T})\,\|\widehat{\Gamma}^{v}-\Gamma^{v}\|\ge\eta(\epsilon)\big)<\epsilon$;

(b) as $n,T\to\infty$, if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$,

(b.1) $\sqrt{T}(\widehat{\sigma}_i^2-\sigma_i^2)\stackrel{d}{\to}N(0,\sigma_i^4(\kappa_i+2))$, for any given $i=1,\ldots,n$,

(b.2) $\sqrt{T}(\mathrm{vec}(\widehat{A})-\mathrm{vec}(A))\stackrel{d}{\to}N(0_{r^2},\Gamma^{v}\otimes(\Gamma^F)^{-1})$,

(b.3) $\sqrt{T}(\mathrm{vech}(\widehat{\Gamma}^{v})-\mathrm{vech}(\Gamma^{v}))\stackrel{d}{\to}N(0_{r(r+1)/2},2D^{\dagger}(\Gamma^{v}\otimes\Gamma^{v})(D^{\dagger})')$,

with $\Gamma^F=I_r$, because of Assumption 6(b), and where $\kappa_i=E[\xi_{it}^4]/\sigma_i^4-3$ and $D^{\dagger}=(D'D)^{-1}D'$ with $D$ being $r^2\times r(r+1)/2$ such that $D\,\mathrm{vech}(\Gamma^{v})=\mathrm{vec}(\Gamma^{v})$.

We conclude with a series of general remarks.

Remark 8.
From Propositions 2 and 3, we immediately have that, as $n,T\to\infty$,

$$\left\|\frac{\widehat{\Lambda}_n'\widehat{\Lambda}_n}{n}-\Sigma_{\Lambda}\right\|=o_p(1),\qquad\left\|\frac{\widehat{\boldsymbol{F}}_T'\widehat{\boldsymbol{F}}_T}{T}-\Gamma^F\right\|=o_p(1),$$

by Assumption 1(a), which defines $\Sigma_{\Lambda}$, and because the sample covariance matrix $T^{-1}\boldsymbol{F}_T'\boldsymbol{F}_T$ is a consistent estimator of $\Gamma^F$ (see Lemma C.12). Moreover, by Assumption 6(b), $\Sigma_{\Lambda}=\lim_{n\to\infty}n^{-1}M_n^{\chi}$, which is a positive definite diagonal matrix, and $\Gamma^F=I_r$. Therefore, the EM estimator of the loadings and the related Kalman smoother estimator of the factors satisfy the identifying constraints asymptotically, as $n,T\to\infty$. We verify this result numerically in Section 8. This is in agreement with Assumption 6(b), which imposes the identifying constraints only in the limit $n,T\to\infty$. This approach differs from other works on factor models (see, e.g., Bai and Li, 2012, 2016, or Bai and Ng, 2013) where the identifying constraints are assumed to hold for any given $n$ and $T$ (see Remark 3). In principle, we could obtain estimators satisfying the identifying constraints in finite samples by imposing them ex-post in an additional step, as Bai and Li (2012, Section 8) suggested in QML estimation of a static factor model. However, empirical works using the EM algorithm rarely apply this additional step.

Remark 9. The constraints $T^{-1}\sqrt{n\log n}\to 0$ and $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$ are common (up to the presence of logarithmic terms) in the factor model literature (see, e.g., Bai, 2003, Theorems 1 and 2, for PC estimation) and are compatible. Indeed, they are simultaneously fulfilled if we assume that there exist some finite positive reals $\underline{\gamma}>1/2$ and $\overline{\gamma}<2$ such that $T^{\underline{\gamma}}<n<T^{\overline{\gamma}}$, as $T\to\infty$. When $n$ and $T$ have the same order of magnitude, as in many macroeconomic and financial datasets, these assumptions on the relative rates of divergence of $n$ and $T$ are very mild.

In particular, the logarithmic term in the consistency rates of Propositions 2 and 3 comes from Assumption 5 of sub-Weibull tails, which is slightly more general than the typical assumption of sub-Gaussianity made when studying estimators of high-dimensional models. This is, however, a modest price to pay. In particular, under Gaussianity, $\delta_v=2$, while under distributions with sub-exponential tails, $\delta_v=1$. These logarithmic terms come essentially from two errors. The first, which is $O_p(n^{-1}\log^{2/\delta_v}T)$, is due to the necessity of finding a uniform bound over $t$ for the difference between the log-likelihood of the DFM in (5) and the static factor model log-likelihood considered in Bai and Li (2016). This term involves the sum of squared factors (see Lemma E.9). The second one, which is $O_p(T^{-1/2}\sqrt{\log n})$, is due to the uniform bound for $\max_{i=1,\ldots,n}|\widehat{\sigma}_i^2-\sigma_i^2|$ obtained when estimating the factors (see Lemma F.1 and also Fan et al., 2011, Lemmas A3 and B1).

Remark 10. The asymptotic properties of the estimators are unaffected if the number of factors $r$ is estimated. Indeed, consider a consistent estimator $\widehat{r}$, i.e., such that $P(\widehat{r}=r)\to 1$, as $n,T\to\infty$, as for example the one in Bai and Ng (2002). Then, for any $z\in\mathbb{R}$ and any $i=1,\ldots,n$ and $t=1,\ldots,T$, it is easy to prove that $P(\widehat{\chi}_{it}\le z)=P(\{\widehat{\chi}_{it}\le z\}\,|\,\{\widehat{r}=r\})+o_p(1)$ (see Bai, 2003, footnote 5).

Similarly, the asymptotic properties of the estimators are unaffected if the order $p_F$ of the VAR for the factors is estimated.
This approach is asymptotically equivalent to computing the BIC using the true factors $F_t$ because we could estimate $p_F$ through the consistent PC estimator of the factors $\widetilde{F}_t$ (see Lemma H.1 and Bai, 2003, Theorem 1). And, in turn, the BIC is known to select the true lag order consistently (Hannan, 1980, Theorem 1). Therefore, $P(\widehat{p}_F=p_F)\to 1$, as $n,T\to\infty$. Following the same reasoning of the previous remark, it is easy to show that, for any $z\in\mathbb{R}$ and any $i=1,\ldots,n$ and $t=1,\ldots,T$, $P(\widehat{\chi}_{it}\le z)=P(\{\widehat{\chi}_{it}\le z\}\,|\,\{\widehat{p}_F=p_F\})+o_p(1)$.

Remark 11. Given the asymptotic equivalence of Kalman filter, smoother, and WLS estimators of the factors, we might expect that the MSE obtained from either the Kalman filter or the smoother, i.e., $P_{t|t}$ or $P_{t|T}$, respectively, asymptotically coincide (inflated by $n$) with the asymptotic covariance matrix $W_t$ of $\widehat{F}_t$ defined in Proposition 3. However, this is not the case since we estimate a mis-specified model. Indeed, as $n\to\infty$, we can show that both $nP_{t|t}$ and $nP_{t|T}$ are asymptotically equivalent to $(\Sigma_{\Lambda\Sigma\Lambda})^{-1}$ (see Lemma E.8), which, as shown in Proposition 3(c), is the asymptotic covariance matrix of $\widehat{F}_t$ only if the model is correctly specified. In other words, the mis-specified Kalman filter and smoother do not estimate the true MSE. Although this has no effect on our asymptotic results (see Remark 1), we still cannot use the estimated MSEs $P_{t|t}^{(k^*+1)}$ or $P_{t|T}^{(k^*+1)}$ for making inference. The true Kalman filter MSE, accounting for the model mis-specification, is derived by Harvey and Delle Monache (2009, Section 2.1) and, for any $t=1,\ldots,T$, it is given by the recursions

$$\begin{aligned}\Pi_{t|t}={}&\Pi_{t|t-1}+P_{t|t-1}\Lambda_n'(\Lambda_nP_{t|t-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}(\Lambda_n\Pi_{t|t-1}\Lambda_n'+\Gamma_n^{\xi})(\Lambda_nP_{t|t-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_nP_{t|t-1}\\&-P_{t|t-1}\Lambda_n'(\Lambda_nP_{t|t-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n\Pi_{t|t-1}-\Pi_{t|t-1}\Lambda_n'(\Lambda_nP_{t|t-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_nP_{t|t-1},\\ \Pi_{t|t-1}={}&A\Pi_{t-1|t-1}A'+\Gamma^{v},\end{aligned}\tag{22}$$

where $P_{t|t-1}$ is the one-step-ahead Kalman filter MSE (see (A.2)).
As expected, $n\Pi_{t|t}$ is asymptotically equivalent, as $n\to\infty$, to the asymptotic covariance matrix $W_t$ of $\widehat{F}_t$ (see Lemma I.5). However, since both $P_{t|t-1}$ and $\Pi_{t|t-1}$ depend on $A$ and $\Gamma^{v}$, for finite $n$ this MSE accounts explicitly also for the autocorrelation of the factors.

Remark 12. The results of Propositions 1, 2, 3, 4, and 5 would also hold if we had allowed for serial heteroskedasticity of the idiosyncratic components, i.e., for time-varying second moments so that $E[\xi_{nt}\xi_{ns}']=\Gamma_{n,ts}^{\xi}$. For this case, which we do not consider explicitly, we refer to Bai and Li (2016), who show that the estimators of the idiosyncratic variances, $\widehat{\sigma}_i^2$, $i=1,\ldots,n$, have to be considered as estimators of the average variances $\bar{\sigma}_i^2=T^{-1}\sum_{t=1}^{T}\sigma_{i,t}^2$; hence, in all above results, we should replace $\sigma_i^2$ with $\bar{\sigma}_i^2$. This approach amounts to maximizing a log-likelihood that has an additional degree of mis-specification because we use the time average idiosyncratic variances rather than the true time-varying variances. In this case, the asymptotic covariance matrix $W_t$ of the estimated factors in Proposition 3 becomes effectively a time-varying matrix.

5.3 Convergence of the EM algorithm under generic initialization

In this section, we discuss how our asymptotic results change if we initialize the EM algorithm with a generic initial estimator of the parameters, say $\check{\varphi}_n^{(0)}=(\mathrm{vec}(\check{\Lambda}_n^{(0)})'\ \check{\sigma}_1^{2(0)}\cdots\check{\sigma}_n^{2(0)}\ \mathrm{vec}(\check{A}^{(0)})'\ \mathrm{vech}(\check{\Gamma}^{v(0)})')'$, having still elements belonging to $\mathcal{O}_n$ as defined in Section 4.3.4, and, thus, satisfying Assumptions 1, 2, and 6.

For fixed $n$ and in a general setting, Balakrishnan et al. (2017) prove that the EM algorithm defines a contraction path towards a local maximum of the likelihood, $\widehat{\varphi}_n^{**}$. To give more details and understand the relation with our results, we need to introduce some general definitions. First, consider any initial estimator belonging to a closed neighborhood of the local maximum of given Euclidean radius $\varrho>0$, i.e., $\check{\varphi}_n^{(0)}\in B(\varrho;\widehat{\varphi}_n^{**})\subset\mathbb{R}^{Q}$. In our setting, we can think of $B(\varrho;\widehat{\varphi}_n^{**})\equiv\mathcal{O}_n$. Then, define the EM operator $M_T:\mathbb{R}^{Q}\to\mathbb{R}^{Q}$ such that $M_T(\widehat{\varphi}_n^{(k)})=\widehat{\varphi}_n^{(k+1)}$. We have $\|E[M_T(\varphi_n)]-\widehat{\varphi}_n^{**}\|\le\beta\|\varphi_n-\widehat{\varphi}_n^{**}\|$ for some $\beta\in(0,1)$ and all $\varphi_n\in B(\varrho;\widehat{\varphi}_n^{**})$ (Balakrishnan et al., 2017, Theorem 1). Second, for any given $T$ and $\delta\in(0,1)$, let $\varepsilon_{T,\delta}$ be the smallest scalar such that

$$P\Big(\sup_{\varphi_n\in B(\varrho;\widehat{\varphi}_n^{**})}\|M_T(\varphi_n)-E[M_T(\varphi_n)]\|\le\varepsilon_{T,\delta}\Big)\ge 1-\delta.$$
It follows that, if $T$ is large enough such that $\varepsilon_{T,\delta}\le(1-\beta)\varrho$, then, for any $k\ge 0$, the EM operator defines a contraction towards the maximum of the likelihood with high probability (Balakrishnan et al., 2017, Theorem 2):

$$P\left(\|\widehat{\varphi}_n^{(k+1)}-\widehat{\varphi}_n^{**}\|\le\beta^{k+1}\|\check{\varphi}_n^{(0)}-\widehat{\varphi}_n^{**}\|+\frac{\varepsilon_{T,\delta}}{1-\beta}\right)\ge 1-\delta.\tag{23}$$

Now, under our mixing Assumptions 1(d)-1(h) and 2(c), $\varepsilon_{T,\delta}\to 0$, as $T\to\infty$. This, jointly with (23), has two implications, as $T\to\infty$:

(a) for all $k\ge 0$, $\|\widehat{\varphi}_n^{(k+1)}-\widehat{\varphi}_n^{**}\|\le\beta^{k+1}\|\check{\varphi}_n^{(0)}-\widehat{\varphi}_n^{**}\|+o_p(1)$;

(b) if $k\ge\log_{1/\beta}\varepsilon_{T,\delta}^{-1}\equiv k_T$, then $\|\widehat{\varphi}_n^{(k+1)}-\widehat{\varphi}_n^{**}\|=o_p(1)$.

These two results apply if $n$ is fixed. But when $n\to\infty$, we might conjecture that those results will hold for each component of $\varphi_n$ separately. And this is, indeed, what we verify in this paper. Consider the loadings $\lambda_i$. As mentioned in the previous section, as $n\to\infty$ the likelihood has a unique global maximum, $\widehat{\lambda}_i^{*}$, which is consistent because it is the QML estimator. Therefore, $\|\widehat{\lambda}_i^{**}-\lambda_i\|\le\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{*}\|+\|\widehat{\lambda}_i^{*}-\lambda_i\|=o_p(1)$, as $n,T\to\infty$. If we initialize the EM algorithm with the consistent PC estimator $\widehat{\lambda}_i^{(0)}$, we prove that, as $n,T\to\infty$, result (a) above still applies, i.e., there exists a $\beta_{\lambda}\in(0,1)$ such that, for all $k\ge 0$,

$$\|\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{**}\|\le\Big\{\|\widehat{\lambda}_i^{(0)}-\lambda_i\|+\|\widehat{\lambda}_i^{**}-\lambda_i\|\Big\}\beta_{\lambda}^{k+1}+o_p(1)=o_p(1).\tag{24}$$

Actually, we are also able to show that the contraction factor is such that, as $n,T\to\infty$:

$$\beta_{\lambda}=\left\|I_r-\left(\sum_{t=1}^{T}\Big\{F_{t|T}^{*}F_{t|T}^{*\prime}+P_{t|T}^{*}\Big\}\right)^{-1}\left(\sum_{t=1}^{T}F_tF_t'\right)\right\|+o_p(1)=o_p(1),\tag{25}$$

with $F_{t|T}^{*}$ and $P_{t|T}^{*}$ being the factors and their associated MSEs obtained from the Kalman smoother when using the QML estimator of the parameters; we refer to Lemma E.23 for a proof of (24) and (25). It follows that the convergence rate in (24) is faster than the one of the initial PC estimator, and we can treat the EM estimator as if it were the QML estimator.

Moreover, (25) means that (24) holds for any initial estimator (see Lemma E.26). The intuition for this result is that, provided $n^{-1}\check{\Lambda}_n^{(0)\prime}\check{\Lambda}_n^{(0)}$ has full rank, any cross-sectional averaging

is likely to recover in high dimensions a factor space not too different from the true one, an argument often used when considering estimation methods based on aggregation schemes alternative to PCs (Westerlund and Urbain, 2015; Fan and Liao, 2022).

For all other parameters, a relation like (24) holds as well, with possibly different contraction factors. Hence, result (a) still applies (see Lemma E.24). However, we could not derive a result like (25) in this case because computing analytic expressions of all those contraction factors is difficult. Thus, when we initialize the algorithm with any generic initial estimator, we have to rely on result (b); that is, we can guarantee convergence of the EM algorithm to the QML estimator, as $n,T\to\infty$, only if we let the algorithm run for a number of iterations $k\ge k_T$, where the larger $k$ is, the faster is the convergence rate.

6 Efficiency and comparison with PC analysis

In this section, we compare the asymptotic covariances of the EM and Kalman smoother with those of the PC estimators, which are the optimal non-parametric estimators.

From Propositions 2(b) and 3(b), we see that consistency is not affected by estimating a mis-specified model with uncorrelated idiosyncratic components, but there is an efficiency loss due to this mis-specification, as shown by the sandwich forms of the asymptotic covariance matrices.

In the case of uncorrelated idiosyncratic components, the log-likelihood we maximize is correctly specified. If the model is correctly specified, the EM estimator is the most efficient one because it is asymptotically equivalent to the Maximum Likelihood estimator. Thus, its asymptotic covariance attains the classical lower bound of the OLS estimator (see Proposition 2(c)). Likewise, the asymptotic covariance of the factors attains the WLS lower bound (see Proposition 3(c)).

Although, in general, the EM algorithm and the Kalman smoother do not provide the most efficient estimators, they can provide advantages with respect to the PC estimators.
Proposition 6 (Efficiency). Let $V_i^{PC}$ and $W_t^{PC}$ be the asymptotic covariance matrices of the loadings and factors estimated via PC analysis. Then, under Assumptions 1, 2, 3, 4, 5, and 6:

(a) if $\sqrt{T}\log^{1/\delta_v}T/n\to 0$, as $n,T\to\infty$, then $V_i^{PC}=V_i$ for any $i=1,\ldots,n$;

(b) if $\sqrt{n\log n}/T\to 0$ and $n^{-1}\sum_{i,j=1,\,i\ne j}^{n}|E[\xi_{it}\xi_{jt}]|\to 0$, as $n,T\to\infty$, then $(W_t^{PC}-W_t)$ is a positive definite matrix.

Part (a) follows immediately once we impose the identifying Assumption 6 on the results about PC estimation of the loadings (see Barigozzi, 2023, Theorem 1, and Bai, 2003, Theorem 2). Therefore, although we estimate a mis-specified model, the EM estimator of the loadings is as efficient as the PC estimator.

Turning to part (b), by imposing the identifying Assumption 6, the asymptotic covariance of the PC estimator of the factors is $W_t^{PC}=(\Sigma_{\Lambda})^{-1}\{\lim_{n\to\infty}n^{-1}\Lambda_n'\Gamma_n^{\xi}\Lambda_n\}(\Sigma_{\Lambda})^{-1}$ (see Lemma H.1, and Bai, 2003, Theorem 1), and if the true model were an exact factor model, i.e., $E[\xi_{it}\xi_{jt}]=0$ if $i\ne j$ so that $\Gamma_n^{\xi}=\Sigma_n^{\xi}$, then we would have $W_t^{PC}=(\Sigma_{\Lambda})^{-1}\{\lim_{n\to\infty}n^{-1}\Lambda_n'\Sigma_n^{\xi}\Lambda_n\}(\Sigma_{\Lambda})^{-1}$, which is the unfeasible OLS asymptotic covariance matrix in the presence of heteroskedastic errors. For the Kalman smoother, from Proposition 3 we have $W_t=(\Sigma_{\Lambda\Sigma\Lambda})^{-1}\{\lim_{n\to\infty}n^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n\}(\Sigma_{\Lambda\Sigma\Lambda})^{-1}$, and if the true model were an exact factor model, this would reduce to $W_t=(\Sigma_{\Lambda\Sigma\Lambda})^{-1}$, which is the unfeasible WLS asymptotic covariance matrix. In this case, the Kalman smoother estimator is more efficient because it takes into account heteroskedasticity, whereas the PC estimator ignores the possibility of individual-specific variances.

Here, we go one step further and show that if the idiosyncratic covariance matrix $\Gamma_n^{\xi}$ is sparse enough, we can still have efficiency gains compared to PC analysis.
Indeed, if the total contribution of the off-diagonal elements of $\Gamma_n^{\xi}$ is negligible compared to $n$, we can expect the asymptotic covariance of the estimated factors to be close to the one we would have for an exact factor model, which we know is smaller than the PC asymptotic covariance. The sparsity condition we assume is the same as the one in Bai and Liao (2016, Assumption 3.1). Although this sparsity condition is hard to verify in practice and is seldom exactly satisfied by the data, in our Monte Carlo results in Section 8, we show that the Kalman smoother estimator tends to perform better than the PC estimator even under more general idiosyncratic covariance structures like banded matrices.

Remark 13. If we assume uncorrelated and homoskedastic idiosyncratic components, i.e., such that $\Gamma_n^{\xi}=\psi I_n$ for some $\psi>0$, then it is easy to see that $V_i=V_i^{PC}=\psi(\Gamma^F)^{-1}=\psi I_r$, by Assumption 6(b), and $W_t=W_t^{PC}=\psi(\Sigma_{\Lambda})^{-1}$. In this case, the EM and Kalman smoother estimators are as efficient as the PC estimators.

Remark 14. There are other estimators that could be more efficient. First, Bai and Liao (2016), Wang et al. (2019), and Poignard and Terada (2020) propose penalized QML-type estimators of the loadings and of the idiosyncratic covariance matrix. These estimators are used to build a GLS estimator of the factors. This approach addresses cross-sectional idiosyncratic correlations and heteroskedasticity, but not serial idiosyncratic correlations. Second, Breitung and Tenhofen (2011) propose a GLS estimator of the loadings, based on the classical Cochrane and Orcutt (1949) approach, and a WLS estimator for the factors based on that loadings estimator and estimates of the idiosyncratic variances. This approach addresses cross-sectional idiosyncratic heteroskedasticity and serial idiosyncratic correlations, but not cross-sectional idiosyncratic correlations. Finally, Lin and Michailidis (2020) address all idiosyncratic cross-autocorrelations by assuming a sparse VAR model for the idiosyncratic components. To this end, they apply the Cochrane and Orcutt (1949) approach and embed a penalty into an alternating minimization algorithm. They recover the factors via GLS. Their theoretical analysis applies only to the finite-dimensional case. None of these three approaches models the factors' dynamics.

7 Inference

To conduct inference, we need asymptotic covariances of the loadings and factors matrices and their estimators.

Corollary 1.
Under the same assumptions of Propositions 2 and 3, as $n,T\to\infty$:

(a.1) for any finite $\bar{n}$ and any given sequence of integers $\{s(1),\ldots,s(\bar{n})\}\subset\{1,\ldots,n\}$, let $\mathrm{vec}(\widehat{\Lambda}_{\bar{n}})=(\widehat{\lambda}_{s(1)}'\cdots\widehat{\lambda}_{s(\bar{n})}')'$ and $\mathrm{vec}(\Lambda_{\bar{n}})=(\lambda_{s(1)}'\cdots\lambda_{s(\bar{n})}')'$; then, as $n,T\to\infty$, if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$,

$$\sqrt{T}\big\{\mathrm{vec}(\widehat{\Lambda}_{\bar{n}})-\mathrm{vec}(\Lambda_{\bar{n}})\big\}\stackrel{d}{\to}N(0_{r\bar{n}},V_{\bar{n}}),$$

where

$$V_{\bar{n}}=(I_{\bar{n}}\otimes\Gamma^F)^{-1}\left(\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}E[\xi_{\bar{n}t}\xi_{\bar{n}s}']\otimes E[F_tF_s']\right)(I_{\bar{n}}\otimes\Gamma^F)^{-1},$$

with $\xi_{\bar{n}t}=(\xi_{s(1)t}\cdots\xi_{s(\bar{n})t})'$, and with $\Gamma^F=I_r$, because of Assumption 6(b);

(a.2) if $E[(\xi_{n1}'\cdots\xi_{nT}')'(\xi_{n1}'\cdots\xi_{nT}')]=I_T\otimes\Sigma_n^{\xi}$ for all $n,T\in\mathbb{N}$, then $V_{\bar{n}}=\Sigma_{\bar{n}}^{\xi}\otimes(\Gamma^F)^{-1}$, with $\Gamma^F=I_r$, because of Assumption 6(b);

(b.1) for any finite $\bar{T}$ and any given sequence of integers $\{s(1),\ldots,s(\bar{T})\}\subset\{1,\ldots,T\}$, let $\mathrm{vec}(\widehat{\boldsymbol{F}}_{\bar{T}})=(\widehat{F}_{s(1)}'\cdots\widehat{F}_{s(\bar{T})}')'$ and $\mathrm{vec}(\boldsymbol{F}_{\bar{T}})=(F_{s(1)}'\cdots F_{s(\bar{T})}')'$; then, as $n,T\to\infty$, if $T^{-1}\sqrt{n\log n}\to 0$,

$$\sqrt{n}\big\{\mathrm{vec}(\widehat{\boldsymbol{F}}_{\bar{T}})-\mathrm{vec}(\boldsymbol{F}_{\bar{T}})\big\}\stackrel{d}{\to}N(0_{r\bar{T}},W_{\bar{T}}),$$

where

$$W_{\bar{T}}=(I_{\bar{T}}\otimes\Sigma_{\Lambda\Sigma\Lambda})^{-1}\left\{\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{E[\zeta_{i\bar{T}}\zeta_{j\bar{T}}']}{\sigma_i^2\sigma_j^2}\otimes\lambda_i\lambda_j'\right\}(I_{\bar{T}}\otimes\Sigma_{\Lambda\Sigma\Lambda})^{-1},$$

with $\zeta_{i\bar{T}}=(\xi_{is(1)}\cdots\xi_{is(\bar{T})})'$;

(b.2) if $E[(\zeta_{1T}'\cdots\zeta_{nT}')'(\zeta_{1T}'\cdots\zeta_{nT}')]=\Sigma_n^{\xi}\otimes I_T$ for all $n,T\in\mathbb{N}$, then $W_{\bar{T}}=I_{\bar{T}}\otimes(\Sigma_{\Lambda\Sigma\Lambda})^{-1}$.

For any given $i,j=1,\ldots,n$, the most general estimator of the asymptotic covariance between $\widehat{\lambda}_i$ and $\widehat{\lambda}_j$ is given by:$^{7}$

$$\widehat{V}_{i,j}^{(HAC)}=\left(\frac{1}{T}\sum_{t=1}^{T}\widehat{F}_t\widehat{F}_t'\right)^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}K(t,s)\Big\{\widehat{F}_t\widehat{F}_s'\,\widehat{\xi}_{it}\widehat{\xi}_{js}\Big\}\right)\left(\frac{1}{T}\sum_{t=1}^{T}\widehat{F}_t\widehat{F}_t'\right)^{-1},\tag{26}$$

where $\widehat{\xi}_{it}=x_{it}-\widehat{\chi}_{it}$ is the estimated idiosyncratic component of the $i$th variable at time $t$, and $K(t,s)=1-\frac{|t-s|}{M_T+1}$ if $|t-s|\le M_T$ and zero otherwise, with $M_T$ such that $M_T\to\infty$ and $M_T/T\to 0$, as $T\to\infty$. Consistency of (26), as $n,T\to\infty$, follows from Bai (2003, Theorem 6) combined with Propositions 3 and 4.

For any given $t=1,\ldots,T$ and $k=0,\ldots,M_T$, with $M_T$ defined above, the most general estimator of the asymptotic covariance between $\widehat{F}_t$ and $\widehat{F}_{t-k}$ is given by:

$$\widehat{W}_{t,t-k}^{(HAC)}=\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\widehat{\lambda}_i\widehat{\lambda}_i'}{\widehat{\sigma}_i^2}\right)^{-1}\left\{\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}K(i,j)\left\{\frac{\widehat{\lambda}_i\widehat{\lambda}_j'\,\widehat{\gamma}_{ij,k}^{\xi}}{\widehat{\sigma}_i^2\widehat{\sigma}_j^2}\right\}\right\}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\widehat{\lambda}_i\widehat{\lambda}_i'}{\widehat{\sigma}_i^2}\right)^{-1},\tag{27}$$

where $\widehat{\gamma}_{ij,k}^{\xi}=T^{-1}\sum_{t=k+1}^{T}\widehat{\xi}_{it}\widehat{\xi}_{j,t-k}$ and $K(i,j)=1$ if $1\le i,j\le m_{n,T}$ and zero otherwise, with $m_{n,T}\to\infty$ and $m_{n,T}/\min(n,T)\to 0$, as $n,T\to\infty$. Consistency of (27), as $n,T\to\infty$, follows from Bai and Ng (2006, Theorem 4) combined with Propositions 2, 4, and 5. For larger values of $k$, $E[\xi_{it}\xi_{j,t-k}]$ is likely to be small due to our Assumption 2(b), and, thus, we can consider $\widehat{F}_t$ and $\widehat{F}_{t-k}$ as asymptotically uncorrelated. An alternative kernel function, based on the correlation distance between $i$ and $j$, is considered by Kim (2022), who also considers averages of (27) computed over different random permutations of the selected $m_{n,T}$ cross-sectional units. Alternatively, rather than smoothing via the use of a kernel, Fresoli et al.
(2024) consider an estimator based on thresholding of the idiosyncratic sample covariance matrix (see also Fan et al., 2013).

Finally, we can obtain an estimator of $W_t$ that accounts also for the autocorrelation of the factors from the true MSE of the Kalman filter given in (22) when using the estimated parameters. Once again this requires an estimator of the idiosyncratic covariance matrix like the estimators discussed above. The asymptotic covariance between $\widehat{F}_t$ and $\widehat{F}_{t-k}$ can be obtained by considering the Kalman filter with the augmented state vector $(F_t'\cdots F_{t-k}')'$ and by then considering the corresponding $r\times r$ off-diagonal block of the resulting $r(k+1)\times r(k+1)$ MSE.

Having the estimated loadings $\widehat{\lambda}_i$, the estimated factors $\widehat{F}_t$, and any of the above estimators of $V_{i,j}$ and $W_{t,t-k}$, we can estimate the variance of the estimated common component by plugging these quantities into the expression in Proposition 4(b).

To conclude, the covariance estimators defined in this section can be used, together with the asymptotic distributions derived in Propositions 2 and 3, for inferential purposes. Examples are in Section 9.

Remark 15. From parts (a.2) and (b.2) we see that, under a correctly specified model, each estimated row $i$ of the loadings matrix is asymptotically uncorrelated with the other rows, and, likewise, each estimated realization of the factors at a given point in time $t$ is asymptotically uncorrelated with the other time periods. Estimators of the asymptotic covariances are easily built in this case as: $\widehat{V}_i^{(0)}=\widehat{\sigma}_i^2\big(T^{-1}\sum_{t=1}^{T}\widehat{F}_t\widehat{F}_t'\big)^{-1}$, and $\widehat{W}_t^{(0)}=\big(n^{-1}\sum_{i=1}^{n}(\widehat{\sigma}_i^2)^{-1}\widehat{\lambda}_i\widehat{\lambda}_i'\big)^{-1}$, while $\widehat{V}_{i,j}^{(0)}=0_{r\times r}$ if $i\ne j$, and $\widehat{W}_{t,t-k}^{(0)}=0_{r\times r}$ if $k\ne 0$. Consistency of these two estimators follows directly from Propositions 2, 3, 4, and 5.
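The covariance estimators of this section can be sketched numerically as follows. This is a minimal illustration, assuming the estimated factors, loadings, idiosyncratic variances, and residuals are already available as arrays; the function names `v0`, `w0`, and `v_hac`, and the simulated inputs, are hypothetical. The function `v_hac` follows the form of (26) with the kernel $K(t,s)=1-|t-s|/(M_T+1)$.

```python
import numpy as np

def v0(F_hat, sigma2_i):
    """V_i^(0) = sigma_i^2 (T^{-1} sum_t F_t F_t')^{-1} (no serial correlation)."""
    T = F_hat.shape[0]
    return sigma2_i * np.linalg.inv(F_hat.T @ F_hat / T)

def w0(Lam_hat, sigma2):
    """W_t^(0) = (n^{-1} sum_i lambda_i lambda_i' / sigma_i^2)^{-1}."""
    n = Lam_hat.shape[0]
    return np.linalg.inv((Lam_hat / sigma2[:, None]).T @ Lam_hat / n)

def v_hac(F_hat, xi_i, xi_j, M):
    """Kernel-weighted sandwich estimator in the form of (26)."""
    T = F_hat.shape[0]
    S_inv = np.linalg.inv(F_hat.T @ F_hat / T)
    gi = F_hat * xi_i[:, None]            # rows F_t * xi_it
    gj = F_hat * xi_j[:, None]            # rows F_s * xi_js
    mid = gi.T @ gj / T                   # terms with t = s
    for k in range(1, M + 1):
        w = 1.0 - k / (M + 1.0)           # kernel weight at distance k
        # terms with t - s = k and with s - t = k
        mid += w * (gi[k:].T @ gj[:-k] + gi[:-k].T @ gj[k:]) / T
    return S_inv @ mid @ S_inv

# Hypothetical inputs, for illustration only.
rng = np.random.default_rng(2)
T, n, r = 200, 30, 2
F_hat = rng.standard_normal((T, r))
Lam_hat = rng.standard_normal((n, r))
sigma2_hat = rng.uniform(0.5, 1.5, size=n)
xi_hat = rng.standard_normal((T, n)) * np.sqrt(sigma2_hat)

V_11 = v_hac(F_hat, xi_hat[:, 0], xi_hat[:, 0], M=4)
V_1_simple = v0(F_hat, sigma2_hat[0])
W_simple = w0(Lam_hat, sigma2_hat)
```

In applications the bandwidth `M` would need to grow with $T$ at a slower rate, in line with the conditions on $M_T$ stated above; the value used here is arbitrary.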
7 As discussed in Remark 8, the sample covariance of $\widehat{F}_t$ is equal to the $r$-dimensional identity matrix only asymptotically; hence, it must be included in the estimator of the asymptotic covariance.

8 Monte Carlo study

Throughout, we consider a model with $r=4$ factors, and we simulate the data according to

$$x_{it}=\ell_i'f_t+\phi_i\xi_{it},\qquad f_t=Af_{t-1}+u_t,\qquad \xi_{it}=\alpha_i\xi_{i,t-1}+e_{it},\tag{28}$$

where $\ell_i$ has entries $\ell_{ij}\stackrel{iid}{\sim}N(1,1)$, $i=1,\ldots,n$, $j=1,\ldots,r$; $A=\mu\check{A}\{\nu^{(1)}(\check{A})\}^{-1}$, where $[\check{A}]_{jj}\sim U[0.5,0.8]$, $[\check{A}]_{jk}\sim U[0,0.3]$, $j,k=1,\ldots,r$, and $\mu=0.7$; $u_{jt}\stackrel{iid}{\sim}(0,1)$, $j=1,\ldots,r$, following either a Gaussian, an Asymmetric Laplace, or a Skew-t distribution, and with $\mathrm{Cov}(u_{it},u_{jt})=0$, for $i\ne j$; $\alpha_i=\{0,\delta_i\}$, $i=1,\ldots,n$, where $\delta_i\stackrel{iid}{\sim}U(0,\delta)$, and $\delta\in\{0,0.5\}$; $e_{it}\stackrel{iid}{\sim}(0,\sigma_{ei}^2)$, $i=1,\ldots,n$, following either a Gaussian distribution with $\sigma_{ei}^2\sim U[0.5,1.5]$, or an Asymmetric Laplace distribution with $\sigma_{ei}^2=1$, or a Skew-t distribution with $\sigma_{ei}^2=1$; $\mathrm{Cov}(e_{it},e_{jt})=\tau^{|i-j|}$, $i,j=1,\ldots,n$, with $\tau\in\{0,0.5\}$ if $|i-j|\le 10$, and $\mathrm{Cov}(e_{it},e_{jt})=0$ otherwise; and, last, $\phi_i=\sqrt{\theta_i\,\widehat{\mathrm{Var}}(\chi_{it})(\widehat{\mathrm{Var}}(\xi_{it}))^{-1}}$, $i=1,\ldots,n$, where $\widehat{\mathrm{Var}}(\cdot)$ denotes the sample variance, and $\theta_i\stackrel{iid}{\sim}U(\bar{\theta}-0.25,\bar{\theta})$, and $\bar{\theta}=0.5$. The parameters $\mu$, $\tau$, $\delta$, and $\bar{\theta}$ are crucial and control: the persistence of the factors, the degrees of cross-sectional and serial correlation in the idiosyncratic components, and the noise-to-signal ratio, respectively.$^{8}$

Finally, in order to satisfy Assumptions 6(b) and 6(c), after we have generated the common component $\chi_{nt}$ as in (28), we construct the factors as $F_t=(M_n^{\chi})^{-1/2}V_n^{\chi\prime}\chi_{nt}$ and the loadings as $\Lambda_n=V_n^{\chi}(M_n^{\chi})^{1/2}$, where $M_n^{\chi}$ is the $r\times r$ diagonal matrix containing the eigenvalues of $T^{-1}\sum_{t=1}^{T}\chi_{nt}\chi_{nt}'$, and $V_n^{\chi}$ is the $n\times r$ matrix having as columns the corresponding normalized eigenvectors and with sign fixed such that it has non-negative entries in the first row.

We consider $n\in\{100,200,300,500\}$, $T\in\{100,200,300,500\}$ and $B=1000$ replications.
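A simplified sketch of the data generating process in (28) is given below, restricted to the Gaussian case. Several choices are assumptions made only for this illustration and are not taken from the paper: $\nu^{(1)}(\check{A})$ is read as the largest eigenvalue modulus of $\check{A}$, the banded cross-sectional correlation is combined with the heterogeneous variances as $D^{1/2}RD^{1/2}$, a burn-in period is discarded, and the final rotation enforcing Assumptions 6(b) and 6(c) is omitted. The function name `simulate_dfm` and its arguments are hypothetical.

```python
import numpy as np

def simulate_dfm(n, T, r=4, mu=0.7, tau=0.5, delta=0.5,
                 theta_bar=0.5, burn=100, seed=0):
    """Gaussian-only sketch of the DGP in (28); see the stated assumptions."""
    rng = np.random.default_rng(seed)
    ell = rng.normal(1.0, 1.0, size=(n, r))                # loadings ell_ij ~ N(1,1)

    # VAR(1) matrix, rescaled so that its spectral radius equals mu (assumption).
    A_chk = rng.uniform(0.0, 0.3, size=(r, r))
    A_chk[np.diag_indices(r)] = rng.uniform(0.5, 0.8, size=r)
    A = mu * A_chk / np.abs(np.linalg.eigvals(A_chk)).max()

    # AR(1) coefficients of the idiosyncratic components.
    alpha = rng.uniform(0.0, delta, size=n) if delta > 0 else np.zeros(n)

    # Banded Toeplitz correlation tau^{|i-j|} for |i-j| <= 10, scaled by variances.
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    R = np.where(dist <= 10, float(tau) ** dist, 0.0)
    sd_e = np.sqrt(rng.uniform(0.5, 1.5, size=n))
    C = np.linalg.cholesky(R) * sd_e[:, None]              # C C' = D^{1/2} R D^{1/2}

    f = np.zeros((T + burn, r))
    xi = np.zeros((T + burn, n))
    for t in range(1, T + burn):
        f[t] = A @ f[t - 1] + rng.standard_normal(r)
        xi[t] = alpha * xi[t - 1] + C @ rng.standard_normal(n)
    f, xi = f[burn:], xi[burn:]

    chi = f @ ell.T                                         # common component
    theta = rng.uniform(theta_bar - 0.25, theta_bar, size=n)
    phi = np.sqrt(theta * chi.var(axis=0) / xi.var(axis=0)) # noise-to-signal scaling
    x = chi + phi * xi
    return x, chi, A

x, chi, A = simulate_dfm(n=100, T=100)
```

By construction, the sample variance ratio of the scaled idiosyncratic component to the common component equals $\theta_i$ for each series, which is how $\bar{\theta}$ controls the noise-to-signal ratio in this sketch.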
At each replication $b$, we run the EM algorithm as described in Algorithm 1, thus obtaining an estimate of the loadings $\widehat{\lambda}_i^{(b)}$, the factors $\widehat{F}_t^{(b)}$, and the common component $\widehat{\chi}_{it}^{(b)}=\widehat{\lambda}_i^{(b)\prime}\widehat{F}_t^{(b)}$. We initialize the EM algorithm either through the PC estimators as explained in Appendix A.1, or through a contaminated version of the PC estimator obtained using contaminated eigenvectors of the data. In particular, let $\widehat{V}^x_n$ be the $n\times r$ matrix of the $r$ leading eigenvectors of $T^{-1}\sum_{t=1}^T x_{nt}x_{nt}'$; then the contaminated eigenvector matrix is $\widehat{V}^{x,c}_n=\widehat{V}^x_n+Z\Upsilon^{1/2}$, where $[Z]_{ij}\overset{iid}{\sim}N(0,1)$, $i=1,\ldots,n$, $j=1,\ldots,r$, and $[\Upsilon]_{ij}=\iota^{|i-j|}$, $i,j=1,\ldots,r$, with $\iota\in(0,1)$. The bigger $\iota$ is, the stronger is the contamination.

The upper plots of Figure 1 show the log-likelihood $\ell(X_{nT};\widehat{\varphi}_n^{(k)})$ (blue line, left scale) as a function of the iteration $k$ of the EM algorithm, and the convergence criterion $\Delta\ell_k$ defined in (A.14) (red line, right scale) when we initialize the EM algorithm with the PC estimator. The results in Figure 1 show that the log-likelihood is an increasing function of the number of iterations $k$ (as it should be). Moreover, the algorithm converges very fast: within two iterations $\Delta\ell_k\le 10^{-3}$. This is to be expected since we initialize the algorithm with the consistent PC estimator.

The lower plots of Figure 1 show the percentage deviation of the log-likelihood of the algorithm initialized with the contaminated estimator ($\ell^c_k$) from the one initialized with the PC estimator ($\ell_k$), that is, $\ell^c_k/\ell_k-1$. Contaminating the initialization implies starting from a much lower likelihood. However, after just a few iterations, the log-likelihood initialized with the contaminated estimator is the same as the one properly initialized. This result shows that initializing the EM algorithm with a non-consistent estimator is not harmful: it only requires running the algorithm for a few more iterations.
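The contaminated initialization described above can be sketched as follows. This is a minimal illustration, assuming that the data matrix is stored as a $T\times n$ array and that $\Upsilon^{1/2}$ is the symmetric square root; the paper does not say which square root is used, and the function name `contaminated_eigenvectors` is hypothetical.

```python
import numpy as np

def contaminated_eigenvectors(x, r, iota, seed=0):
    """x is T x n; returns the r leading eigenvectors and a contaminated copy."""
    rng = np.random.default_rng(seed)
    T, n = x.shape

    # r leading eigenvectors of T^{-1} sum_t x_t x_t'.
    S = x.T @ x / T
    _, eigvec = np.linalg.eigh(S)          # eigenvalues come in ascending order
    V = eigvec[:, ::-1][:, :r]

    # [Upsilon]_ij = iota^|i-j|; symmetric square root (an assumption).
    idx = np.arange(r)
    Upsilon = float(iota) ** np.abs(np.subtract.outer(idx, idx))
    w, Q = np.linalg.eigh(Upsilon)
    Upsilon_half = (Q * np.sqrt(w)) @ Q.T

    # V^{x,c} = V^x + Z Upsilon^{1/2}, with Z having iid N(0, 1) entries.
    Z = rng.standard_normal((n, r))
    return V, V + Z @ Upsilon_half
```

A larger `iota` makes the columns of the perturbation more strongly correlated with one another, which is how the sketch mirrors the statement that a bigger $\iota$ gives a stronger contamination.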
Therefore, hereafter, we focus on the case in which we initialize the EM algorithm with the PC estimator.

8 In the case of the Asymmetric Laplace distribution, all the innovations have location 0, scale index $\lambda$, and asymmetry index $\kappa$, with $\kappa\sim U(.9,1.1)$ and $\lambda=\sqrt{(1+\kappa^4)\kappa^{-2}}$, so that all the shocks have variance 1. In the case of the Skew-t distribution, all the shocks have location 0, dispersion 1, skewness index $\gamma$, and tail index $\nu$, with $\nu_u\sim U(4,12)$, $\gamma_u\sim U(-.1,.1)$, $\nu_e\sim U(3,13)$, and $\gamma_e\sim U(-.15,.15)$.

Figure 1: Simulation results - Convergence of the EM algorithm. [Panels: Gaussian, Asymmetric Laplace, Skew-t; horizontal axis: iteration ($k$); lower-panel legend: $\iota=0.3$, $\iota=0.6$, $\iota=0.9$.] The upper plots show the log-likelihood $\ell(X_{nT};\widehat{\varphi}_n^{(k)})$ (blue line, left scale), and the convergence criterion $\Delta\ell_k$ defined in (A.14) (red line, right scale) when we initialized the EM algorithm with the PC estimator. The lower plots show the percentage deviation of the log-likelihood of the algorithm initialized with the contaminated estimator ($\ell^c_k$) from the one initialized with the PC estimator ($\ell_k$), that is, $\ell^c_k/\ell_k-1$. The bigger $\iota$ is, the stronger is the contamination. The log-likelihoods in this figure were obtained from a single simulation where $T=100$, $n=100$, $\mu=0.7$, $\delta=0.5$, $\tau=0.5$, and $\theta=0.5$.

Moving to the performance of our estimators, we begin with Proposition 4 part (a), which gives consistency and rates for the common component's estimator. The left plot in Figure 2 shows the root mean squared error:

$$\mathrm{RMSE}=\sqrt{\frac{1}{nTB}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{b=1}^{B}\left(\widehat{\chi}_{it}^{(b)}-\chi_{it}^{(b)}\right)^2}.$$

Instead, the right plot shows the relative RMSE of our estimator over the RMSE of the PC estimator; values smaller than one indicate a better performance of our estimators.
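For concreteness, the RMSE above and its ratio to the PC benchmark can be computed as in the following sketch. The array layout (replications by time by series) and the function names are assumptions made for illustration.

```python
import numpy as np

def rmse(chi_hat, chi):
    """Root mean squared error over arrays of shape (B, T, n)."""
    return np.sqrt(np.mean((chi_hat - chi) ** 2))

def relative_rmse(chi_hat_em, chi_hat_pc, chi):
    """RMSE of the EM estimate divided by the RMSE of the PC estimate."""
    return rmse(chi_hat_em, chi) / rmse(chi_hat_pc, chi)
```

A value of `relative_rmse` below one means that the first estimate is closer to the simulated common component than the second.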
We show results for several DGPs, which allow us to disentangle the effects of each single mis-specification on the performance of our estimator.

Two main results emerge from Figure 2. First, as $n$ and $T$ grow, the RMSE of all DGPs converges toward zero, thus indicating that the mis-specification introduced by estimating a model with uncorrelated and possibly non-Gaussian idiosyncratic components, even when that is not the case, does not affect the consistency of our estimator. In particular, between serial correlation (the orange line) and cross-sectional correlation (the yellow line), the mis-specification that hurts the most is cross-sectional correlation. This is good news for practitioners because the cross-sectional correlation between the idiosyncratic components can be limited, to some extent, by not including in the dataset variables that are too similar to one another (see, e.g., the discussions in Boivin and Ng, 2006 and Luciani, 2014). Moreover, the model is consistently estimated when the shocks are asymmetric and have heavy tails, even when they come from a distribution that does not meet Assumption 5 of exponentially decaying tails of the distribution.9 This is also good news for practitioners because this result tells us that we can use this model in settings likely to be non-Gaussian, as is the case in datasets of disaggregated inflation rates (see, e.g., Reis and Watson, 2010, and Ahn and Luciani, 2024), which are notoriously skewed and fat-tailed.

Second, overall, our estimator behaves very similarly to, but slightly better than, the PC estimator, despite the latter being non-parametric and, thus, not affected by mis-specifications. This result confirms the conjectures based on extensive numerical studies made by Doz et al. (2012) and Bai and Li (2016), showing that, in a high-dimensional setting, the EM estimator is "as good as" the PC estimator.
9 The dark gray line for the Skew-t distribution cannot be seen in the left plot because it lies underneath, and thus coincides with, the red line.

Figure 2: Simulation results - Common components. Root Mean Squared Errors. [Panels: Absolute; Relative to PC.]

Figure 3: Simulation results - Factors and loadings. Trace statistics relative to PC. [Panels: Factors; Loadings.]

To evaluate the estimates of the factors and the loadings, at each replication $b$, we consider a multivariate version of the $R^2$ (see also Doz et al., 2012):

$$\mathrm{TR}_F^{(b)}=\frac{\mathrm{tr}\left((F_T^{(b)\prime}\widehat{F}_T^{(b)})(\widehat{F}_T^{(b)\prime}\widehat{F}_T^{(b)})^{-1}(\widehat{F}_T^{(b)\prime}F_T^{(b)})\right)}{\mathrm{tr}\left(F_T^{(b)\prime}F_T^{(b)}\right)},\qquad \mathrm{TR}_\Lambda^{(b)}=\frac{\mathrm{tr}\left((\Lambda_n^{(b)\prime}\widehat{\Lambda}_n^{(b)})(\widehat{\Lambda}_n^{(b)\prime}\widehat{\Lambda}_n^{(b)})^{-1}(\widehat{\Lambda}_n^{(b)\prime}\Lambda_n^{(b)})\right)}{\mathrm{tr}\left(\Lambda_n^{(b)\prime}\Lambda_n^{(b)}\right)},$$

where $\widehat{F}_T^{(b)}$ and $\widehat{\Lambda}_n^{(b)}$ are the $T\times r$ and $n\times r$ matrices of the estimated factors and loadings, respectively. These trace statistics are smaller than one, and they tend to one when the empirical canonical correlations between the true quantities and their estimates tend to one.

Figure 3 reports the values of $\mathrm{TR}_F^{(b)}$ and $\mathrm{TR}_\Lambda^{(b)}$, relative to the same measures computed for the PC estimator, averaged over all $B$ replications (values larger than one indicate a better performance of our estimators). The results in Figure 3 mirror those of Figure 2: our estimator behaves very similarly to, but slightly better than, the PC estimator.

To derive our asymptotic results, we assumed that the true factors and the loadings are asymptotically identified as in Assumption 6(b). As explained in Remark 8, the estimated factors and loadings satisfy this assumption asymptotically. To verify that this is the case, in Figure 4, we show the two identification errors:

$$I_F^{(b)}=\frac{1}{r}\sum_{j=1}^{r}\left(\nu^{(j)}(\widehat{\Gamma}_F^{(b)})-1\right)^2,\qquad I_\Lambda^{(b)}=\frac{1}{r}\sum_{j=1}^{r}\left(\nu^{(j)}(\widehat{\Sigma}_\Lambda^{(b)})-[\widehat{\Sigma}_\Lambda^{(b)}]_{jj}\right)^2,$$

where $\widehat{\Gamma}_F^{(b)}=T^{-1}\widehat{F}_T^{(b)\prime}\widehat{F}_T^{(b)}$, and $\widehat{\Sigma}_\Lambda^{(b)}=n^{-1}\widehat{\Lambda}_n^{(b)\prime}\widehat{\Lambda}_n^{(b)}$. According to our asymptotic results, both these quantities should tend to zero as $n$ and $T$ increase because, given our simulation design, the true factors are orthonormal and the true loadings are orthogonal, in agreement with Assumption 6(b).
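The trace statistics defined above take only a few lines to compute. The sketch below is illustrative: the function name `trace_statistic` is hypothetical, and both arguments are assumed to be matrices with the same number of rows ($T$ for the factors, $n$ for the loadings).

```python
import numpy as np

def trace_statistic(true, est):
    """tr(true' est (est' est)^{-1} est' true) / tr(true' true)."""
    cross = true.T @ est                                   # r x r
    projected = cross @ np.linalg.solve(est.T @ est, cross.T)
    return np.trace(projected) / np.trace(true.T @ true)
```

Because the numerator is the trace of the true matrix projected onto the column space of the estimate, the statistic equals one whenever the estimate spans the same space as the truth, for instance when it is an invertible rotation of it.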
Figure 4 shows $I_F^{(b)}$ and $I_\Lambda^{(b)}$ averaged over all $B$ replications. The identifying constraints are more and more precisely satisfied as $n$ and $T$ grow.
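As a rough sketch, the two identification errors can be evaluated as below. It assumes that $\nu^{(j)}(\cdot)$ denotes the $j$-th largest eigenvalue, so eigenvalues are sorted in decreasing order before being compared with the diagonal entries; the function name is hypothetical.

```python
import numpy as np

def identification_errors(F_hat, L_hat):
    """Return (I_F, I_Lambda) for T x r factors and n x r loadings."""
    T = F_hat.shape[0]
    n = L_hat.shape[0]
    gamma_F = F_hat.T @ F_hat / T      # should be close to the identity
    sigma_L = L_hat.T @ L_hat / n      # should be close to diagonal

    # Eigenvalues in decreasing order (assumed convention for nu^(j)).
    ev_F = np.linalg.eigvalsh(gamma_F)[::-1]
    ev_L = np.linalg.eigvalsh(sigma_L)[::-1]

    I_F = np.mean((ev_F - 1.0) ** 2)
    I_L = np.mean((ev_L - np.diag(sigma_L)) ** 2)
    return I_F, I_L
```

Both errors are zero when the factors have identity sample covariance and the loadings have a diagonal sample second-moment matrix with decreasing diagonal entries.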

Figure 4: Simulation results - Factors and loadings. Identification criteria. [Panels: Factors, $B^{-1}\sum_{b=1}^{B}I_F^{(b)}$; Loadings, $\log(B^{-1}\sum_{b=1}^{B}I_\Lambda^{(b)})$.]

Next, we move to the asymptotic distribution of the common component. To this end, for each replication $b$ and any $i,t$, we compute

$$Z_{it}^{(b)}=\left(\frac{1}{T}\widehat{F}_t^{(b)\prime}\widehat{V}_i^{(HAC,b)}\widehat{F}_t^{(b)}+\frac{1}{n}\widehat{\lambda}_i^{(b)\prime}\widehat{W}_t^{(HAC,b)}\widehat{\lambda}_i^{(b)}\right)^{-1/2}\left(\widehat{\chi}_{it}^{(b)}-\chi_{it}\right),\tag{29}$$

where we use the robust estimators of the asymptotic covariance matrices defined in (26) and (27), respectively. For comparison, we also consider $Z_{it}^{(0,b)}$ defined as in (29), but using estimators of the asymptotic covariance matrices for the case of non-correlated idiosyncratic components (henceforth, non-robust covariance matrices).

According to Proposition 4, $Z_{it}^{(b)}\to_d N(0,1)$ as $n,T\to\infty$. To evaluate the goodness of our theoretical results, we compute the average coverage

$$C(1-\alpha)=\frac{1}{nTB}\sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{b=1}^{B}\mathbb{I}\left(Z_{\alpha/2}\le Z_{it}^{(b)}\le Z_{1-\alpha/2}\right),$$

where $Z_\alpha$ is the $\alpha$-quantile of the standard normal distribution. In Table 2, we report the coverage $C(1-\alpha)$ for selected values of $\alpha\in(0,1)$, while for illustration purposes, in Figure 5, we show histograms of $\{Z_{it}^{(b)}: i=1,\ldots,n,\ t=1,\ldots,T,\ b=1,\ldots,B\}$ for some of the cases considered in Table 2. We stress that throughout this exercise the chosen bandwidths for all robust covariance estimators are not data driven, but rather fixed a priori.10

Results confirm the derived asymptotic distribution. When the idiosyncratic components are uncorrelated, the non-robust estimators of the covariance matrices offer almost perfect coverage, while the robust estimators give a slight over-coverage. In the relevant cases of serially and cross-correlated idiosyncratic components, the considered robust estimators work very well.
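The standardization in (29) and the coverage measure can be sketched as follows. The covariance matrices are taken as given inputs (the estimators in (26) and (27) are not reproduced here), and all function and argument names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def standardized_error(chi_hat_it, chi_it, F_t, V_i, lam_i, W_t, T, n):
    """Z_it as in (29), given covariance estimates V_i and W_t (both r x r)."""
    variance = F_t @ V_i @ F_t / T + lam_i @ W_t @ lam_i / n
    return (chi_hat_it - chi_it) / np.sqrt(variance)

def average_coverage(z, alpha):
    """Share of standardized errors inside the central (1 - alpha) normal interval."""
    lower = NormalDist().inv_cdf(alpha / 2)
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    z = np.asarray(z)
    return np.mean((z >= lower) & (z <= upper))
```

If the standardized errors are close to standard normal, `average_coverage(z, 0.10)` should be close to 0.90, which is the comparison reported in Table 2.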
For comparison, we also show results for $Z_{it}^{(b)}$ when $\chi_{it}$ is estimated with the PC estimator, and the asymptotic covariances are estimated using the estimators in Bai and Ng (2006). In this case, deviations from Gaussianity seem to lead to serious under-coverage.

10 To estimate $\widehat{V}_i^{(HAC)}$ we set $M_T=\lfloor T^{1/4}\rfloor$ and to estimate $\widehat{W}_t^{(HAC)}$ we set $m=\lfloor n^{4/5}\rfloor$.

Table 2: Simulation results - Common components - Coverage, $C(1-\alpha)$

                                                                 (1−α)=0.90     (1−α)=0.95
DGP and covariance estimator                           n    T    EM     PC      EM     PC
Gaussian, τ=0, δ=0, non-robust covariances            100  100   0.89   0.88    0.94   0.93
                                                      200  200   0.89   0.89    0.95   0.94
                                                      300  300   0.90   0.89    0.95   0.95
                                                      500  500   0.90   0.90    0.95   0.95
Gaussian, τ=0, δ=0, robust covariances                100  100   0.91   0.89    0.95   0.94
                                                      200  200   0.92   0.90    0.96   0.95
                                                      300  300   0.92   0.91    0.96   0.95
                                                      500  500   0.93   0.91    0.96   0.95
Gaussian, τ=0.5, δ=0.5, robust covariances            100  100   0.86   0.84    0.92   0.90
                                                      200  200   0.88   0.86    0.93   0.92
                                                      300  300   0.89   0.87    0.94   0.93
                                                      500  500   0.89   0.88    0.94   0.93
Asymmetric Laplace, τ=0.5, δ=0.5, robust covariances  100  100   0.86   0.81    0.92   0.88
                                                      200  200   0.88   0.83    0.93   0.89
                                                      300  300   0.89   0.83    0.94   0.90
                                                      500  500   0.89   0.84    0.94   0.90
Skew-t, τ=0.5, δ=0.5, robust covariances              100  100   0.86   0.81    0.92   0.88
                                                      200  200   0.88   0.83    0.93   0.89
                                                      300  300   0.89   0.83    0.94   0.90
                                                      500  500   0.89   0.84    0.94   0.90

Figure 5: Simulation results - Histograms of $Z_{it}^{(b)}$. Serially and cross-correlated idiosyncratic components ($\tau=0.5$, $\delta=0.5$) - Robust covariances. [Panels: Gaussian with $n=T=100$, $n=T=200$, and $n=T=300$; Asymmetric Laplace and Skew-t with $n=T=200$ and $n=T=300$.]

Table 3: Percentage of simulations in which $(W_t^{PC}-W_t)$ is positive semidefinite

   n      T     τ=0      τ=0.5*   τ=0.5
  100    100   100.00    97.74    50.68
  200    200   100.00    99.96    85.84
  300    300   100.00   100.00    96.82
  400    400   100.00   100.00    99.10
  500    500   100.00   100.00    99.60
 1000   1000   100.00   100.00   100.00

Last, we move to Proposition 6, which states that under a sparsity condition on the covariance matrix of the idiosyncratic components, the Kalman smoother estimator of the factors is more efficient than the PC estimator. To verify this result, we compare the theoretical asymptotic covariances of the factors, $W_t$ and $W_t^{PC}$. Specifically, for each simulation, we look at the sign of the smallest eigenvalue of the matrix $(W_t^{PC}-W_t)$, computed using the true simulated parameters, which should always be positive. Table 3 reports the percentage of times out of 5000 simulations in which $(W_t^{PC}-W_t)$ is positive definite. The DGP used for this exercise is the one described at the beginning of this section, except for the case indicated as $\tau=0.5^*$, in which we set $\mathrm{Cov}(e_{it},e_{jt})=0.5^{|i-j|}$ if $i,j\le\lfloor\sqrt{n}\rfloor$, and $\mathrm{Cov}(e_{it},e_{jt})=0$ otherwise, in order to better proxy the assumed sparsity condition.

The results in Table 3 confirm those in Proposition 6. When the sparsity condition is verified (this is the case of columns $\tau=0$ and $\tau=0.5^*$), the Kalman smoother is more efficient than the PC estimator. That said, even when the sparsity condition is not verified (column $\tau=0.5$), the Kalman smoother tends to be more efficient than the PC estimator. We also note that the two largest eigenvalues (not shown) are always positive, meaning that, in our simulations, the PC estimator is never more efficient than the EM estimator.

9 Empirical application

In this section, we consider the dataset used by Barigozzi and Luciani (2023), a typical panel of $n=103$ US macroeconomic quarterly indicators observed from 1960:Q1 to 2018:Q4, thus $T=236$. All variables are transformed to stationarity. In particular, we follow the common approach of taking first differences of price indexes and keeping interest rates in levels (see, e.g., Bernanke et al., 2005). The information criterion by Bai and Ng (2002) indicates $\widehat{r}=6$ common factors. We estimate that the order of the VAR for the factors is $\widehat{p}_F=2$. As shown in Figure 6, the EM algorithm converges in 22 iterations when setting the threshold in (A.14) to $\varepsilon=10^{-4}$, and the log-likelihood increases monotonically.

Figure 6: US data - Convergence of the EM algorithm. [Horizontal axis: iteration ($k$).] The plot shows the log-likelihood $\ell(X_{nT};\widehat{\varphi}_n^{(k)})$ (blue line, left scale), and the convergence criterion $\Delta\ell_k$ defined in (A.14) (red line, right scale) when we run the EM algorithm on US data and we initialize it with the PC estimator.

As we said in the Introduction, there is extensive literature showing the effectiveness of the EM algorithm in estimating large DFMs. Therefore, the purpose of this section is not to show that this method works or that it is superior to the PC estimator. Rather, we concentrate on the innovations brought about in this paper that have an impact on empirical applications, namely: the confidence bands for the common components and the factors, and a test of hypothesis on the factor loadings.

Throughout, we compute the asymptotic covariances by using $\widehat{V}_i^{HAC}$ for the loadings, as given in (26), with bandwidth $M_T=\lceil T^{1/4}\rceil$, and $\widehat{W}_t^{KF}=n\widehat{\Pi}_{t|t}$ for the factors, computed as in (22) by using the estimated parameters and applying local thresholding to the sample idiosyncratic covariance matrix (Fan et al., 2013; Fresoli et al., 2024).

The first row of Figure 7 shows the common components of a few variables of interest estimated with the EM algorithm (the red line) with their 95% confidence bands, together with the observed data (the black line).
Specifically, for any given $i=1,\ldots,n$ and all $t=1,\ldots,T$, a $100(1-\alpha)\%$ confidence interval for the estimated common component is given by

$$I_{\widehat{\chi}_{it}}(\alpha)=\left[\widehat{\chi}_{it}-z_{(1-\alpha/(2T))}\sqrt{\frac{\widehat{F}_t'\widehat{V}_i^{HAC}\widehat{F}_t}{T}+\frac{\widehat{\lambda}_i'\widehat{W}_t^{KF}\widehat{\lambda}_i}{n}},\ \ \widehat{\chi}_{it}+z_{(1-\alpha/(2T))}\sqrt{\frac{\widehat{F}_t'\widehat{V}_i^{HAC}\widehat{F}_t}{T}+\frac{\widehat{\lambda}_i'\widehat{W}_t^{KF}\widehat{\lambda}_i}{n}}\right].$$

The critical value $z_{(1-\alpha/(2T))}$ embeds a Bonferroni correction, which we apply in order to have confidence bands valid for all $T$ observations. For comparison, in the second row of Figure 7, we report the estimated common components obtained with the PC estimator (the blue line) with their 95% confidence bands computed using the HAC estimators in Bai and Ng (2006).

The common components of core CPI inflation and the Fed funds rate estimated with the EM algorithm track the observed series better than those estimated with the PC estimator. This result is possibly due to local departures from stationarity in those series, which create problems for PC analysis but not for the EM algorithm, as the Kalman smoother is able to track changes in the dynamics due to its recursive character. In particular, the confidence band of the common component of the Fed funds rate for the PC estimator is much wider than that of the EM estimator because the variance of the idiosyncratic component obtained with the PC estimator is nearly eight times larger than that of the EM estimator, and it is also far more persistent.

Figure 7: US data - Estimated common components. [Columns: GDP growth rate, Core CPI inflation, Fed funds rate; rows: EM, PC.] The shaded area is the 95% confidence band.

In addition to computing confidence bands for the in-sample estimate of the common component, we can also do so for the unconditional and conditional forecasts obtained from the model. In other words, our results open the possibility of computing the uncertainty around a GDP nowcast, which is nothing else than a short-term conditional forecast, and around scenario analysis performed with DFMs. Here we give a couple of simplified examples.

The left chart of Figure 8 shows the 1-step ahead forecast of 2018:Q1. That is, we estimate the model up to 2017:Q4, and then we produce four different forecasts of GDP growth conditioning on the observations of an increasing number of variables for 2018:Q1. The conditional forecasts are obtained using the Kalman filter estimator of the factors, $F_{t|t}^{(k^*+1)}$, while the unconditional forecast is obtained using their one-step-ahead prediction, $F_{t|t-1}^{(k^*+1)}=\widehat{A}F_{t-1|t-1}^{(k^*+1)}$. This exercise mimics a simplified nowcasting setting (we are omitting the aspect of mixed frequency), as the variables we are conditioning on are published earlier than GDP, and the sequence of conditioning mimics the calendar of data releases. As shown in the left chart of Figure 8, the model adjusts the forecast in the right direction as more hard data becomes available.

In the second exercise, we produce forecasts for GDP growth for each quarter of 2018 (blue line).
Then, we adjust them based on different scenarios for payroll employment. We have two scenarios: one where employment grows at the same pace as the previous year (red line), which is a slightly lower average pace than the model expected; another where employment grows at half the previous year's pace, resulting in a more pessimistic forecast (green line). As shown in the right chart of Figure 8, one and two quarters ahead, the forecasts from these scenarios differ significantly.

Figure 8: US data - GDP growth forecast. [Panels: Forecast of 2018:Q1; Forecast through 2018:Q4. Legend: 95% CI, Common, Data.] In the left chart, "Unc." is the unconditional forecast given data up to 2017:Q4; "Cond. 1" is the forecast conditioning on the value in 2018:Q1 of all the high-frequency variables in our dataset (stock prices, oil prices, surveys, and interest rates); "Cond. 2" is the forecast conditioning on the value in 2018:Q1 of all the high-frequency indicators and the labor market indicators published in the BLS employment report; "Cond. 3" is the forecast conditional also on the value in 2018:Q1 of the CPI and PPI data, housing market indicators, and industrial production; "Cond. 4" is the common component estimated given all the data up to 2018:Q1; and, lastly, "Actual" is the actual value of GDP growth published by the BEA. All these indicators are released prior to GDP, which is released at the end of the following month: high-frequency indicators are available almost in real time, the labor report is published in the first week of the following month, and the CPI, PPI, Industrial Production, and housing indicators are published around the middle of the following month. In the right chart, "Unc." is the unconditional forecast given data up to 2017:Q4; "Cond. 1" is the forecast conditional on payroll employment growing at the same average pace as in 2017; "Cond. 2" is the pessimistic scenario assuming that payroll employment grows at half the average pace as in 2017.

Finally, we can test for linear restrictions on the loadings. Consider testing for $s$ linear restrictions $H_0: R'\mathrm{vec}(\Lambda_n)=q$, against the alternative $H_1: R'\mathrm{vec}(\Lambda_n)\neq q$, where $R$ is $nr\times s$ and $q$ is $s\times 1$. Then, the usual Wald-type test statistic is computed as

$$W_{\widehat{\Lambda}_n}=T\left(R'\mathrm{vec}(\widehat{\Lambda}_n)-q\right)'\left(R'\widehat{V}_n^{HAC}R\right)^{-1}\left(R'\mathrm{vec}(\widehat{\Lambda}_n)-q\right),\tag{30}$$

where $R$ selects only $sr$ rows of $\widehat{V}_n^{HAC}$. Under $H_0$, from Proposition 2 it follows that $W_{\widehat{\Lambda}_n}\to_d\chi^2_{(r)}$, as $n,T\to\infty$. This test is the analogue of the test derived in the case of PC estimation by Bai and Ng (2013). Testing for equal loadings is equivalent to testing for equal common components for all $t=1,\ldots,T$.

Table 4: US data - Testing hypothesis on the loadings

                (A)       (B)       (C)       (D)       (E)        (F)
r=1   W       0.1947    0.0087    0.0736    0.1358    1.7421     9.122
      p-value   0.66      0.93      0.79      0.71      0.19       0.00
r=2   W       0.2203    0.3708    0.5392    0.6124    3.0266    11.7768
      p-value   0.90      0.83      0.76      0.74      0.22       0.00
r=3   W       0.2462    4.6652    1.5638    0.8066    5.3210    13.6319
      p-value   0.97      0.20      0.67      0.85      0.15       0.00
r=4   W       0.3803    9.040     7.8699    1.2168    5.1899   110.8475
      p-value   0.98      0.06      0.10      0.88      0.27       0.00
r=5   W       1.8934   14.7642    9.7824    1.6959    5.4872   128.9513
      p-value   0.86      0.01      0.08      0.89      0.36       0.00
r=6   W       1.2704   28.6640   14.4703    4.7851   11.9874   126.7982
      p-value   0.97      0.00      0.02      0.57      0.06       0.00

W denotes the statistic $W_{\widehat{\Lambda}_n}$. The null hypotheses are: (A) $\lambda_{GDP}=\lambda_{GDI}$; (B) $\lambda_{CPI}=\lambda_{PCE}$; (C) $\lambda_{CPI\,core}=\lambda_{PCE\,core}$; (D) $\lambda_{CPI\,energy}=\lambda_{PCE\,energy}$; (E) $\lambda_{CPI\,food}=\lambda_{PCE\,food}$; and, (F) $\lambda_{GDP}=\lambda_{Payroll}$.

Table 4 shows the result of the test of five different null hypotheses. The first hypothesis we test (column (A)) is whether GDP and GDI, both measures of US aggregate output, have equal loadings and, as a consequence, have the same common component. Our test does not reject the hypothesis of equal loadings. This result supports the idea recently explored in the literature that combining GDP and GDI can better estimate aggregate output (e.g., Aruoba et al., 2016; Barigozzi and Luciani, 2018).

The second hypothesis (column (B)) that we test is whether CPI inflation and PCE price inflation, which are two alternative measures of consumer price inflation, have equal loadings. These indexes usually differ because they are constructed differently.11 The test does not reject the null whenever $r<6$, which we read as signaling that, indeed, CPI inflation and PCE price inflation respond in the same way to the common factors, and thus, their difference is just idiosyncratic or, perhaps, due to weak/local factors. Columns (C), (D), and (E) in Table 4 test whether the differences between the PCE and CPI core sub-index, energy sub-index, and food sub-index are just idiosyncratic. The test suggests this to be the case.

Finally, in column (F) of Table 4, we verify that if we test a non-sense hypothesis, the test rejects it. Specifically, we test whether the loadings of GDP growth and the growth in total non-farm employment are the same, which should not be the case. The test unequivocally reaches the same conclusion.
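To make the Wald statistic in (30) concrete, the sketch below specializes it to the null of equal loadings for two series, with $q=0$. It assumes that a joint $2r\times 2r$ covariance estimate for the two stacked loading vectors is available as an input; that input, the function names, and the one-degree-of-freedom p-value helper are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np

def wald_equal_loadings(lam_i, lam_j, V_ij, T):
    """Wald statistic for H0: lam_i = lam_j.

    V_ij is an assumed 2r x 2r estimate of the asymptotic covariance of
    sqrt(T) times the stacked vector (lam_i_hat, lam_j_hat).
    """
    r = lam_i.shape[0]
    R = np.vstack([np.eye(r), -np.eye(r)])        # R' stacked = lam_i - lam_j
    d = R.T @ np.concatenate([lam_i, lam_j])      # restriction gap, q = 0
    return T * d @ np.linalg.solve(R.T @ V_ij @ R, d)

def p_value_one_df(w):
    """Upper-tail probability of a chi-square with ONE degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))
```

As a sanity check on the r=1 row of Table 4, `p_value_one_df(0.1947)` is approximately 0.66, in line with the reported p-value for column (A). For more than one restriction, a general chi-square tail function is needed instead of the one-degree-of-freedom helper.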
11 The CPI, which captures the headlines in newspapers, determines the return on Treasury Inflation-Protected Securities, or TIPS, while the inflation objective of the Federal Reserve is specified in terms of PCE price inflation.

10 Concluding remarks

This paper provides the asymptotic properties of Quasi Maximum Likelihood (QML) estimation for large approximate dynamic factor models, implemented via the Kalman smoother and the Expectation Maximization (EM) algorithm. Our results provide the statistical foundations of one of the most popular and successful methods for estimating high-dimensional factor models for time series, commonly used in many public and private institutions to track and predict economic activity.

From a technical point of view, we show and prove that the EM approach is feasible even in the high-dimensional case, i.e., when the cross-sectional size $n$ can be much larger than the sample size $T$, a point also made by Doz et al. (2012). Then, we show that the EM estimator converges at the same rate as the Principal Components (PC) estimator. Moreover, we show that the EM estimator of the loadings is always as efficient as the PC estimator, while the Kalman smoother estimator of the factors is more efficient than the PC estimator if the idiosyncratic covariance is sparse enough.

Compared to the standard PC estimator, the EM approach has the main advantage of allowing the user to easily impose on the model restrictions that reflect any prior knowledge about the data. The user can impose these restrictions because the state-space formulation and the Kalman smoother allow explicit modeling and estimation of the dynamic evolution of the latent factors and deal with data irregularly spaced in time. Moreover, in the M-step, the user can impose restrictions on the parameters, thus allowing for constrained QML estimation. In contrast, the user cannot use the PC estimator to model the latent factors' dynamics.

In an application on a dataset of US macroeconomic time series, we show that the EM algorithm can produce estimates of the common component that track the dynamics of the observed series better than the PC estimator, especially for those series displaying periods of high persistence and regime changes, like inflation and interest rates. This result suggests that the Kalman smoother might be more robust to local deviations from stationarity, a feature already highlighted by Kálman (1960) and Kálman and Bucy (1961).

References

Ahn, H. J. and M. Luciani (2024). Common and idiosyncratic inflation. Finance and Economics Discussion Series 2020-024r1, Board of Governors of the Federal Reserve System.

Altavilla, C., R. Giacomini, and G. Ragusa (2017). Anchoring the yield curve using survey expectations. Journal of Applied Econometrics 32, 1055–1068.

Amemiya, Y., W. A. Fuller, and S. G. Pantula (1987). The asymptotic distributions of some estimators for a factor analysis model. Journal of Multivariate Analysis 22, 51–64.

Anderson, B. D. O. and M. Deistler (2008). Generalized linear dynamic factor models - A structure theory. In Proceedings of the 47th IEEE Conference on Decision and Control, pp. 1980–1985.

Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Dover Publications, Inc.

Anderson, T. W. and H. Rubin (1956). Statistical inference in factor analysis. In Proceedings of the third Berkeley symposium on mathematical statistics and probability, Volume 5, pp. 111–150.

Aruoba, S. B., F. X. Diebold, J. Nalewaik, F. Schorfheide, and D. Song (2016). Improving GDP measurement: A measurement-error perspective. Journal of Econometrics 191, 384–397.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.

Bai, J. and K. Li (2012). Statistical analysis of factor models of high dimension. The Annals of Statistics 40, 436–465.

Bai, J. and K. Li (2016). Maximum likelihood estimation and inference for approximate factor models of high dimension. The Review of Economics and Statistics 98, 298–309.

Bai, J. and Y. Liao (2016). Efficient estimation of approximate factor models via penalized maximum likelihood. Journal of Econometrics 191, 1–18.

Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models. Econometrica 70, 191–221.

Bai, J. and S. Ng (2006). Confidence intervals for diffusion index forecasts and inference for factor augmented regressions. Econometrica 74, 1133–1150.

Bai, J. and S. Ng (2007). Determining the number of primitive shocks in factor models. Journal of Business and Economic Statistics 25, 52–60.

Bai, J. and S. Ng (2013). Principal components estimation and identification of static factors. Journal of Econometrics 176, 18–29.

Bai, J. and S. Ng (2021). Matrix completion, counterfactuals, and factor analysis of missing data. Journal of the American Statistical Association, 1–18. Available online.

Bai, J. and P. Wang (2015). Identification and Bayesian estimation of dynamic factor models. Journal of Business & Economic Statistics 33, 221–240.

Bakhshizadeh, M., A. Maleki, and V. H. de la Pena (2023). Sharp concentration results for heavy-tailed distributions. Information and Inference: A Journal of the IMA 12, 1655–1685.

Balakrishnan, S., M. J. Wainwright, and B. Yu (2017). Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics 45, 77–120.

Bańbura, M., D. Giannone, and M. Lenza (2015). Conditional forecasts and scenario analysis with vector autoregressions for large cross-sections. International Journal of Forecasting 31, 739–756.

Bańbura, M., D. Giannone, M. Modugno, and L. Reichlin (2013). Now-casting and the real-time data flow. In Handbook of economic forecasting, Volume 2, pp. 195–237. Elsevier.

Bańbura, M., D. Giannone, and L. Reichlin (2011). Nowcasting. In M. P. Clements and D. F. Hendry (Eds.), Oxford Handbook on Economic Forecasting. New York: Oxford University Press.
Bańbura, M. and M. Modugno (2014). Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. Journal of Applied Econometrics 29, 133–160.

Barigozzi, M. (2023). Asymptotic equivalence of principal component and quasi maximum likelihood estimators in large approximate factor models. Technical Report arXiv:2307.09864.

Barigozzi, M. (2024). Quasi maximum likelihood estimation of high-dimensional factor models. In Oxford Research Encyclopedia of Economics and Finance. Oxford University Press.

Barigozzi, M., A. Cuzzola, M. Grazzi, and D. Moschella (2024). Factoring in the micro: A transaction-level dynamic factor approach to the decomposition of export volatility. Oxford Bulletin of Economics and Statistics. Forthcoming.

Barigozzi, M., M. Lippi, and M. Luciani (2021). Large-dimensional dynamic factor models: Estimation of impulse-response functions with I(1) cointegrated factors. Journal of Econometrics 221, 455–482.

Barigozzi, M. and M. Luciani (2018). Do National Account statistics underestimate US real output growth? FEDS Notes 2018-01-09, Board of Governors of the Federal Reserve System.

Barigozzi, M. and M. Luciani (2023). Measuring the output gap using large datasets. The Review of Economics and Statistics 105, 1500–1514.
Bernanke, B. S., J. Boivin, and P. S. Eliasz (2005). Measuring the effects of monetary policy: A Factor-Augmented Vector Autoregressive (FAVAR) approach. The Quarterly Journal of Economics 120, 387–422.
Boivin, J. and S. Ng (2006). Are more data always better for factor analysis? Journal of Econometrics 132, 169–194.
Bosq, D. (2012). Nonparametric statistics for stochastic processes: estimation and prediction. Springer Science & Business Media.
Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys 2, 107–144.
Breitung, J. and J. Tenhofen (2011). GLS estimation of dynamic factor models. Journal of the American Statistical Association 106, 1150–1166.
Cascaldi-Garcia, D., M. Luciani, and M. Modugno (2023). Lessons from nowcasting GDP across the world. International Finance Discussion Papers 1385, Board of Governors of the Federal Reserve System.
Chamberlain, G. and M. Rothschild (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51, 1281–1304.
Cochrane, D. and G. H. Orcutt (1949). Application of least squares regression to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32–61.
Coroneo, L., D. Giannone, and M. Modugno (2016). Unspanned macroeconomic factors in the yield curve. Journal of Business and Economic Statistics 34, 472–485.
D’Agostino, A., D. Giannone, M. Lenza, and M. Modugno (2016). Nowcasting business cycles: A Bayesian approach to dynamic heterogeneous factor models. In S. Koopman and E. Hillebrand (Eds.), Dynamic Factor Models, Volume 35 of Advances in Econometrics, pp. 569–594. Emerald Publishing Ltd.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 39, 1–38.
Doz, C., D. Giannone, and L. Reichlin (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics 164, 188–205.
Doz, C., D. Giannone, and L. Reichlin (2012). A quasi maximum likelihood approach for large approximate dynamic factor models. The Review of Economics and Statistics 94(4), 1014–1024.
Durbin, J. and S. J. Koopman (2012). Time Series Analysis by State Space Methods. Oxford University Press.
Fan, J. and Y. Liao (2022). Learning latent factors from diversified projections and its applications to overestimated and weak factors. Journal of the American Statistical Association 117, 909–924.
Fan, J., Y. Liao, and M. Mincheva (2011). High dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics 39, 3320.
Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 603–680.
Fan, J., R. Masini, and M. C. Medeiros (2022). Do we exploit all information for counterfactual analysis? Benefits of factor models and idiosyncratic correction. Journal of the American Statistical Association 117, 574–590.

Forni, M., D. Giannone, M. Lippi, and L. Reichlin (2009). Opening the black box: Structural factor models versus structural VARs. Econometric Theory 25, 1319–1347.
Forni, M., M. Hallin, M. Lippi, and L. Reichlin (2000). The Generalized Dynamic Factor Model: Identification and estimation. The Review of Economics and Statistics 82, 540–554.
Francq, C. and J.-M. Zakoïan (2006). Mixing properties of a general class of GARCH(1,1) models without moment assumptions on the observed process. Econometric Theory 22, 815–834.
Fresoli, D., P. Poncela, and E. Ruiz (2024). Dealing with idiosyncratic cross-correlation when constructing confidence regions for PC factors. Technical Report arXiv:2407.06883.
Gao, Z., J. Guo, and Y. Ma (2021). A note on statistical analysis of factor models of high dimension. Science China Mathematics 64, 1905–1916.
Geweke, J. F. (1993). A dynamic index model for large cross sections. Comment. In Business cycles, indicators and forecasting. University of Chicago Press.
Ghahramani, Z. and G. E. Hinton (1996). Parameter estimation for linear dynamical systems. Technical report, Cambridge University. mimeo.
Giannone, D., M. Lenza, and L. Reichlin (2019). Money, credit, monetary policy and the business cycle in the euro area: what has changed since the crisis? International Journal of Central Banking 15, 137–173.
Giannone, D., L. Reichlin, and L. Sala (2006). Tracking Greenspan: Systematic and nonsystematic monetary policy revisited. Discussion papers 3550, CEPR.
Giannone, D., L. Reichlin, and D. Small (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics 55, 665–676.
Hannan, E. J. (1970). Multiple time series. John Wiley & Sons.
Hannan, E. J. (1980). The estimation of the order of an ARMA process. The Annals of Statistics 8, 1071–1081.
Harvey, A. C. (1990). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Harvey, A. C. (1996). Intervention analysis with control groups. International Statistical Review 64, 313–328.
Harvey, A. C. and D. Delle Monache (2009). Computing the mean square error of unobserved components extracted by misspecified time series models. Journal of Economic Dynamics and Control 33, 283–295.
Harvey, A. C. and S. Peters (1990). Estimation procedures for structural time series models. Journal of Forecasting 9, 89–108.
Heaton, C. and V. Solo (2004). Identification of causal factor models of stationary time series. The Econometrics Journal 7, 618–627.
Ibragimov, I. A. (1962). Some limit theorems for stationary processes. Theory of Probability and its Applications 7, 349–382.
Jungbacker, B. and S. J. Koopman (2015). Likelihood-based dynamic factor analysis for measurement and forecasting. Econometrics Journal 18, C1–C21.
Jungbacker, B., S. J. Koopman, and M. Van der Wel (2011). Maximum likelihood estimation for dynamic factor models with missing data. Journal of Economic Dynamics and Control 35, 1358–1368.

Jungbacker, B., S. J. Koopman, and M. Van der Wel (2014). Smooth dynamic factor analysis with application to the US term structure of interest rates. Journal of Applied Econometrics 29, 65–90.
Juvenal, L. and I. Petrella (2015). Speculation in the oil market. Journal of Applied Econometrics 30, 1099–1255.
Kálman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82, 35–45.
Kálman, R. E. and R. S. Bucy (1961). New Results in Linear Filtering and Prediction Theory. Journal of Basic Engineering 83, 95–108.
Kapetanios, G. and M. Marcellino (2009). A parametric estimation method for dynamic factor models of large dimensions. Journal of Time Series Analysis 30, 208–238.
Kim, H. H. and N. R. Swanson (2018). Methods for backcasting, nowcasting and forecasting using factor-MIDAS: With an application to Korean GDP. Journal of Forecasting 37, 281–302.
Kim, M. S. (2022). Robust inference for diffusion-index forecasts with cross-sectionally dependent data. Journal of Business & Economic Statistics 40, 1153–1167.
Koopman, S. J. and G. Mesters (2017). Empirical Bayes methods for dynamic factor models. Review of Economics and Statistics 99, 486–498.
Koopman, S. J. and M. van der Wel (2013). Forecasting the US term structure of interest rates using a macroeconomic smooth dynamic factor model. International Journal of Forecasting 29, 676–694.
Kose, M. A., C. Otrok, and C. H. Whiteman (2003). International business cycles: World, region, and country-specific factors. The American Economic Review 93, 1216–1239.
Kotz, S., N. Balakrishnan, and N. L. Johnson (2004). Continuous multivariate distributions. Volume 1: Models and applications. John Wiley & Sons.
Kuchibhotla, A. K., L. D. Brown, A. Buja, E. I. George, and L. Zhao (2023). Uniform-in-submodel bounds for linear regression in a model-free framework. Econometric Theory 39, 1202–1248.
Kuchibhotla, A. K. and A. Chakrabortty (2022). Moving beyond sub-Gaussianity in high-dimensional statistics: Applications in covariance estimation and linear regression. Information and Inference: A Journal of the IMA 11, 1389–1456.
Lawley, D. N. and A. E. Maxwell (1971). Factor Analysis as a Statistical Method. Butterworths, London.
Lehman, E. H. (1963). Shapes, moments and estimators of the Weibull distribution. IEEE Transactions on Reliability 12, 32–38.
Lehmann, E. L. and G. Casella (2006). Theory of point estimation. Springer Science & Business Media.
Lin, J. and G. Michailidis (2020). System identification of high-dimensional linear dynamical systems with serially correlated output noise components. IEEE Transactions on Signal Processing 68, 5573–5587.
Linton, O. B., H. Tang, and J. Wu (2022). A structural dynamic factor model for daily global stock market returns. Technical Report arXiv:2202.03638.
Lippi, M., M. Deistler, and B. D. O. Anderson (2021). High-dimensional dynamic factor models: A selective survey and lines of future research. mimeo, EIEF.

Luciani, M. (2014). Forecasting with Approximate Dynamic Factor Models: The role of non-pervasive shocks. International Journal of Forecasting 30, 20–29.
Luciani, M. (2015). Monetary policy and the housing market: A structural factor analysis. Journal of Applied Econometrics 30, 199–218.
Luciani, M. and L. Ricci (2014). Nowcasting Norway. International Journal of Central Banking 10, 215–248.
Mao, J., Z. Gao, B.-Y. Jing, and J. Guo (2024). On the statistical analysis of high-dimensional factor models. Statistical Papers, 1–29. available online.
Marcellino, M. and V. Sivec (2016). Monetary, fiscal and oil shocks: Evidence based on mixed frequency structural FAVARs. Journal of Econometrics 193, 335–348.
Mariano, R. S. and Y. Murasawa (2003). A new coincident index of business cycles based on monthly and quarterly series. Journal of Applied Econometrics 18, 427–443.
McLachlan, G. and T. Krishnan (2007). The EM algorithm and extensions, Volume 382. John Wiley & Sons.
Meng, X.-L. and D. B. Rubin (1994). On the global and componentwise rates of convergence of the EM algorithm. Linear Algebra and its Applications 199, 413–425.
Merlevède, F., M. Peligrad, and E. Rio (2011). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields 151, 435–474.
Mikosch, T. and A. V. Nagaev (1998). Large deviations of heavy-tailed sums with applications in insurance. Extremes 1, 81–110.
Mosley, L., T. T. Chan, and A. Gibberd (2024). The sparse dynamic factor model: a regularised quasi-maximum likelihood approach. Statistics and Computing 34, 1–19.
Ng, C. T., C. Y. Yau, and N. H. Chan (2015). Likelihood inferences for high-dimensional factor analysis of time series with applications in finance. Journal of Computational and Graphical Statistics 24(3), 866–884.
Ng, S. and S. Scanlan (2024). Constructing high frequency economic indicators by imputation. Econometrics Journal 27, C1–C30.
Pham, T. D. and L. T. Tran (1985). Some mixing properties of time series models. Stochastic Processes and their Applications 19, 297–303.
Poignard, B. and Y. Terada (2020). Statistical analysis of sparse approximate factor models. Electronic Journal of Statistics 14, 3315–3365.
Poncela, P., E. Ruiz, and K. Miranda (2021). Factor extraction using Kalman filter and smoothing: This is not just another survey. International Journal of Forecasting 37, 1399–1425.
Quah, D. and T. J. Sargent (1993). A dynamic index model for large cross sections. In Business cycles, indicators and forecasting. University of Chicago Press.
Reis, R. and M. W. Watson (2010). Relative goods’ prices, pure inflation, and the Phillips correlation. American Economic Journal Macroeconomics 2, 128–157.
Rubin, D. B. and D. T. Thayer (1982). EM algorithms for ML factor analysis. Psychometrika 47, 69–76.
Ruiz, E. and P. Poncela (2022). Factor extraction in Dynamic Factor Models: Using Kalman Filter and Principal Components in practice. Foundations and Trends in Econometrics 12, 121–231.

Ruud, P. A. (1991). Extensions of estimation methods using the EM algorithm. Journal of Econometrics 49(3), 305–341.
Sargent, T. J. and C. A. Sims (1977). Business cycle modeling without pretending to have too much a priori economic theory. In New methods in business cycle research. Federal Reserve Bank of Minneapolis.
Shumway, R. H. and D. S. Stoffer (1982). An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3, 253–264.
Spearman, C. (1904). General intelligence objectively determined and measured. American Journal of Psychology 15, 201–293.
Steyn, H. S. (1960). On regression properties of multivariate probability functions of Pearson’s types. In Indagationes Mathematicae (Proceedings), Volume 63, pp. 302–311. Elsevier.
Stock, J. H. and M. W. Watson (1989). New indexes of coincident and leading economic indicators. In O. J. Blanchard and S. Fischer (Eds.), NBER Macroeconomics Annual 1989. MIT press.
Stock, J. H. and M. W. Watson (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97, 1167–1179.
Stock, J. H. and M. W. Watson (2016). Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics. In J. B. Taylor and H. Uhlig (Eds.), Handbook of Macroeconomics, Volume 2, pp. 415–525. Elsevier.
Sundberg, R. (2019). Statistical modelling by exponential families. Cambridge University Press.
Tanner, M. A. and W. H. Wong (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82, 528–540.
Tipping, M. E. and C. M. Bishop (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61, 611–622.
Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science, Volume 47. Cambridge University Press.
Vladimirova, M., S. Girard, H. Nguyen, and J. Arbel (2020). Sub-Weibull distributions: Generalizing sub-Gaussian and sub-Exponential properties to heavier tailed distributions. Stat 9(1), e318.
Wang, S., H. Yang, and C. Yao (2019). On the penalized maximum likelihood estimation of high-dimensional approximate factor model. Computational Statistics 34, 819–846.
Watson, M. W. and R. F. Engle (1983). Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficients regression models. Journal of Econometrics 23, 385–400.
Westerlund, J. and J.-P. Urbain (2015). Cross-sectional averages versus principal components. Journal of Econometrics 185, 372–377.
Wu, J. C. F. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11, 95–103.
Xiong, R. and M. Pelger (2023). Large dimensional latent factor modeling with missing observations and applications to causal inference. Journal of Econometrics 233, 271–301.

Supplementary material for the paper: QMLE of Large Approximate DFM via the EM algorithm

Supplementary material for the paper
Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm

Matteo Barigozzi, Università di Bologna, matteo.barigozzi@unibo.it
Matteo Luciani, Federal Reserve Board, matteo.luciani@frb.gov

Table of contents
A Further details on estimation
  A.1 Principal Component estimators
  A.2 Kalman filter and smoother
  A.3 Stopping rule for the EM algorithm
B Proof of main results
  B.1 Proof of Proposition 1
  B.2 Proof of Proposition 2
  B.3 Proof of Proposition 3
  B.4 Proof of Proposition 4
  B.5 Proof of Proposition 5
  B.6 Proof of Corollary 1
  B.7 Proof of Proposition 6
C General lemmas
D Lemmas necessary for proving Proposition 1
E Lemmas necessary for proving Proposition 2
F Lemmas necessary for proving Proposition 3
G Lemmas necessary for proving Proposition 5
H Lemmas necessary for proving Proposition 6
I Derivation of the Kalman filter MSE
J Data description and data treatment

M. Barigozzi gratefully acknowledges financial support from MIUR (PRIN 2020, Grant 2020N9YFFE).
Disclaimer: the views expressed in this paper are those of the authors and do not necessarily reflect the views and policies of the Board of Governors or the Federal Reserve System.

Notation

Parameters
$\varphi_n$: true value
$\underline{\varphi}_n$: generic value
$\widehat{\varphi}_n^{*}$: QML estimator maximizing the log-likelihood (5)
$\widehat{\varphi}_n^{\dagger}$: QML estimator maximizing the log-likelihood in Bai and Li (2016, Eq. 3)
$\widehat{\varphi}_n^{(0)}$: PC estimator used in E-step at iteration 0
$\widehat{\varphi}_n^{(k)}$: estimator used in E-step at iteration $k>0$
$\widehat{\varphi}_n^{(k+1)}$: estimator computed in M-step at iteration $k\ge 0$
$\widehat{\varphi}_n \equiv \widehat{\varphi}_n^{(k^*+1)}$: final estimator computed in M-step at iteration $k^*$
An analogous notation is used for the sub-vectors of parameters $\phi_n$ and $\theta$ and all their elements.

Factors
$F_t$: true value
$\widetilde{F}_t$: PC estimator
$F_{t|s}$ and $P_{t|s}$ ($s=t-1,t,T$): estimator and its pseudo-MSE computed via KF-KS using $\varphi_n$
$F_{0,t|s}$ and $P_{0,t|s}$ ($s=t-1,t,T$): estimator and its MSE computed via KF-KS using $\varphi_n$ but with $\Gamma_n^{\xi}$
$F^{*}_{t|s}$ and $P^{*}_{t|s}$ ($s=t-1,t,T$): estimator and its pseudo-MSE computed via KF-KS using $\widehat{\varphi}_n^{*}$
$F^{(k)}_{t|s}$ and $P^{(k)}_{t|s}$ ($s=t-1,t,T$): estimator and its pseudo-MSE computed via KF-KS using $\widehat{\varphi}_n^{(k)}$, $k\ge 0$
$\widehat{F}_{t|t} \equiv F^{(k^*+1)}_{t|t}$: final estimator computed via KF at iteration $(k^*+1)$ using $\widehat{\varphi}_n$
$\widehat{P}_{t|t} \equiv P^{(k^*+1)}_{t|t}$: pseudo-MSE computed via KF at iteration $(k^*+1)$ using $\widehat{\varphi}_n$
$\widehat{F}_t \equiv \widehat{F}_{t|T} \equiv F^{(k^*+1)}_{t|T}$: final estimator computed via KS at iteration $(k^*+1)$ using $\widehat{\varphi}_n$
$\widehat{P}_{t|T} \equiv P^{(k^*+1)}_{t|T}$: pseudo-MSE computed via KS at iteration $(k^*+1)$ using $\widehat{\varphi}_n$
$\widehat{F}^{\mathrm{WLS}}_t$: WLS estimator computed using $\widehat{\varphi}_n$

A Further details on estimation

Hereafter, we assume without loss of generality that $\mu_n = 0_n$ and $p_F = 1$ with $A \equiv A_1$. When $p_F > 1$, it is enough to write the VAR in (4) in companion form and to modify the estimation accordingly, using the augmented state vector $(F_t' \cdots F_{t-p_F+1}')'$.

A.1 Principal Component estimators

Let $\widehat{\Gamma}_n^x$ be the sample covariance matrix of the data and denote as $\widehat{M}_n^x$ the diagonal matrix with entries the $r$ largest eigenvalues of $\widehat{\Gamma}_n^x$, and as $\widehat{V}_n^x$ the $n\times r$ matrix of the corresponding normalized eigenvectors. Moreover, let $\widehat{S}^{(0)}$ be an $r\times r$

diagonal matrix with entries $I([\widehat{V}_n^x]_{1j}\ge 0) - I([\widehat{V}_n^x]_{1j}<0)$, $j=1,\ldots,r$. Then,
\[ \widehat{\Lambda}_n^{(0)} = \widehat{V}_n^x \widehat{S}^{(0)} (\widehat{M}_n^x)^{1/2}, \]
\[ \widetilde{F}_t = (\widehat{\Lambda}_n^{(0)\prime}\widehat{\Lambda}_n^{(0)})^{-1}\widehat{\Lambda}_n^{(0)\prime} x_{nt} = (\widehat{M}_n^x)^{-1/2}\widehat{S}^{(0)}\widehat{V}_n^{x\prime} x_{nt}, \qquad \widetilde{\xi}_{nt} = x_{nt} - \widehat{\Lambda}_n^{(0)}\widetilde{F}_t, \qquad t=1,\ldots,T, \]
\[ \widehat{A}^{(0)} = \left(\sum_{t=2}^T \widetilde{F}_t\widetilde{F}_{t-1}'\right)\left(\sum_{t=2}^T \widetilde{F}_{t-1}\widetilde{F}_{t-1}'\right)^{-1}, \]
\[ \widetilde{v}_t = \widetilde{F}_t - \widehat{A}^{(0)}\widetilde{F}_{t-1}, \quad t=1,\ldots,T, \qquad \widehat{\Gamma}^{v(0)} = T^{-1}\sum_{t=1}^T \widetilde{v}_t\widetilde{v}_t'. \]
Finally, letting $\widehat{\lambda}_i^{(0)\prime}$ be the $i$-th row of $\widehat{\Lambda}_n^{(0)}$, and $\widetilde{\xi}_{it}$ the $i$-th component of $\widetilde{\xi}_{nt}$, then
\[ \widehat{\sigma}_i^{2(0)} = T^{-1}\sum_{t=1}^T \widetilde{\xi}_{it}^2, \qquad i=1,\ldots,n. \]
The vector of initial estimates of parameters is then:
\[ \widehat{\varphi}_n^{(0)} = \left( \mathrm{vec}(\widehat{\Lambda}_n^{(0)})' \;\; \widehat{\sigma}_1^{2(0)} \cdots \widehat{\sigma}_n^{2(0)} \;\; \mathrm{vec}(\widehat{A}^{(0)})' \;\; \mathrm{vech}(\widehat{\Gamma}^{v(0)})' \right)', \]
and it is used to run the first iteration of the EM algorithm.

A.2 Kalman filter and smoother

The following iterations are stated for given initial conditions $F_{0|0}$ and $P_{0|0}$ and given the true parameters $\varphi_n$.

Forward iterations - Filtering
The Kalman filter is based on the forward iterations for $t=1,\ldots,T$:
\[ F_{t|t-1} = A F_{t-1|t-1}, \tag{A.1} \]
\[ P_{t|t-1} = A P_{t-1|t-1} A' + \Gamma^v, \tag{A.2} \]
\[ F_{t|t} = F_{t|t-1} + P_{t|t-1}\Lambda_n'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}(x_{nt} - \Lambda_n F_{t|t-1}), \tag{A.3} \]
\[ P_{t|t} = P_{t|t-1} - P_{t|t-1}\Lambda_n'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P_{t|t-1}. \tag{A.4} \]
Moreover, by combining (A.2) and (A.4), we obtain the Riccati difference equation:
\[ P_{t+1|t} - A P_{t|t-1} A' + A P_{t|t-1}\Lambda_n'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P_{t|t-1} A' = \Gamma^v. \tag{A.5} \]

Backward iterations - Smoothing
The Kalman smoother is then based on the backward iterations for $t=T,\ldots,1$:
\[ F_{t|T} = F_{t|t} + P_{t|t} A' P_{t+1|t}^{-1}(F_{t+1|T} - F_{t+1|t}), \tag{A.6} \]
\[ P_{t|T} = P_{t|t} + P_{t|t} A' P_{t+1|t}^{-1}(P_{t+1|T} - P_{t+1|t}) P_{t+1|t}^{-1} A P_{t|t}. \tag{A.7} \]
Finally, $C_{t,t-1|T}$ can be obtained from a state space model with an augmented state vector containing both $F_t$ and $F_{t-1}$, by taking the $r\times r$ off-diagonal block of the $2r\times 2r$ matrix defined in (A.7) but for the augmented model.

An equivalent way of implementing (A.6), which does not require matrix inversion, is in Durbin and Koopman (2012,

Chapter 4.4, pp. 87–91), which is defined by the backward iterations for $t=T,\ldots,1$
\[ F_{t|T} = F_{t|t-1} + P_{t|t-1} r_{t-1}, \tag{A.8} \]
\[ r_{t-1} = \Lambda_n'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}(x_{nt} - \Lambda_n F_{t|t-1}) + L_t' r_t, \tag{A.9} \]
\[ P_{t|T} = P_{t|t-1}(I_r - N_{t-1} P_{t|t-1}), \tag{A.10} \]
\[ N_{t-1} = \Lambda_n'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n + L_t' N_t L_t, \tag{A.11} \]
\[ L_t = A - A P_{t|t-1}\Lambda_n'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n, \tag{A.12} \]
\[ C_{t,t+1|T} = P_{t|t-1} L_t'(I_r - N_t P_{t+1|t}), \qquad C_{t,t-1|T} = C_{t-1,t|T}', \tag{A.13} \]
where $r_T = 0_r$, $N_T = 0_r$ and by construction $A P_{t|t} = L_t P_{t|t-1}$.

Initialization of Kalman filter and smoother
The Kalman filter is initialized as follows. At the first iteration of the EM algorithm, i.e., when $k=0$, we set $F^{(0)}_{0|0} = 0_r$ and $P^{(0)}_{0|0} = I_r$, consistently with Assumption 6(b). Other initializations such as $P^{(0)}_{0|0} = \kappa_0 I_r$ for some finite real $\kappa_0 > 0$ are also possible. At any successive iteration of the EM algorithm, i.e., when $k>0$, we set $F^{(k)}_{0|0} = F^{(k-1)}_{0|T}$ and $P^{(k)}_{0|0} = I_r$.

To run the Kalman smoother we start using the last predictions of the Kalman filter. Thus, for any $k\ge 0$, we set $F^{(k)}_{T+1|T} = \widehat{A}^{(k)} F^{(k)}_{T|T}$, where $F^{(k)}_{T|T}$ is obtained from (A.3), and we set $P^{(k)}_{T+1|T} = \widehat{A}^{(k)} P^{(k)}_{T|T}\widehat{A}^{(k)\prime} + \widehat{\Gamma}^{v(k)}$, where $P^{(k)}_{T|T}$ is obtained from (A.4).

A.3 Stopping rule for the EM algorithm

To stop the EM algorithm we adopt the following convergence rule. We fix a maximum finite number of iterations $k_{\max}$, and we stop it at the first iteration $k^* \le k_{\max}$ such that:
\[ \Delta\ell_{k^*} = \frac{\left|\ell(X_{nT};\widehat{\varphi}_n^{(k^*+1)}) - \ell(X_{nT};\widehat{\varphi}_n^{(k^*)})\right|}{\frac{1}{2}\left|\ell(X_{nT};\widehat{\varphi}_n^{(k^*+1)}) + \ell(X_{nT};\widehat{\varphi}_n^{(k^*)})\right|} < \varepsilon, \tag{A.14} \]
where $\varepsilon$ is a pre-specified tolerance level.
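As an illustration only, the PC initialization of Section A.1, the forward recursions (A.1)-(A.4), and the relative-change criterion (A.14) might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names (`pc_init`, `kalman_filter`, `rel_change`) and the simulated design are hypothetical, the data are assumed to be demeaned, the residual covariance of the VAR uses only $t=2,\ldots,T$ because $\widetilde{F}_0$ is not available, and the log-likelihood is the Gaussian prediction-error form without its additive constant.

```python
import numpy as np

def pc_init(X, r):
    """PC initial estimates in the spirit of Section A.1; X is T x n, assumed demeaned."""
    T, n = X.shape
    Gamma_x = X.T @ X / T                              # sample covariance matrix
    evals, evecs = np.linalg.eigh(Gamma_x)             # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:r]                  # indices of the r largest
    M, V = evals[idx], evecs[:, idx]
    S = np.where(V[0, :] >= 0, 1.0, -1.0)              # sign normalization on first row
    Lam = V * S * np.sqrt(M)                           # V S M^{1/2}
    F = X @ V * S / np.sqrt(M)                         # row t is (M^{-1/2} S V' x_t)'
    xi = X - F @ Lam.T                                 # idiosyncratic residuals
    sigma2 = (xi ** 2).mean(axis=0)
    F_lag, F_cur = F[:-1], F[1:]
    A = (F_cur.T @ F_lag) @ np.linalg.inv(F_lag.T @ F_lag)
    v = F_cur - F_lag @ A.T                            # assumption: t = 2..T only
    Gamma_v = v.T @ v / T
    return Lam, sigma2, A, Gamma_v, F

def kalman_filter(X, Lam, Sigma_xi, A, Gamma_v):
    """Forward recursions (A.1)-(A.4) with F_{0|0} = 0_r and P_{0|0} = I_r."""
    T, n = X.shape
    r = Lam.shape[1]
    F, P = np.zeros(r), np.eye(r)
    F_filt, P_filt = np.zeros((T, r)), np.zeros((T, r, r))
    loglik = 0.0
    for t in range(T):
        F_pred = A @ F                                 # (A.1)
        P_pred = A @ P @ A.T + Gamma_v                 # (A.2)
        S = Lam @ P_pred @ Lam.T + Sigma_xi            # prediction-error covariance
        err = X[t] - Lam @ F_pred                      # prediction error
        K = np.linalg.solve(S, Lam @ P_pred).T         # P_pred Lam' S^{-1}
        F = F_pred + K @ err                           # (A.3)
        P = P_pred - K @ Lam @ P_pred                  # (A.4)
        loglik -= 0.5 * (np.linalg.slogdet(S)[1] + err @ np.linalg.solve(S, err))
        F_filt[t], P_filt[t] = F, P
    return F_filt, P_filt, loglik

def rel_change(ll_new, ll_old):
    """Relative log-likelihood change as in (A.14)."""
    return abs(ll_new - ll_old) / (0.5 * abs(ll_new + ll_old))

# Hypothetical simulated example: r = 2 factors following a VAR(1), n = 30 series.
rng = np.random.default_rng(0)
T, n, r = 200, 30, 2
F_true = np.zeros((T, r))
for t in range(1, T):
    F_true[t] = 0.5 * F_true[t - 1] + rng.standard_normal(r)
X = F_true @ rng.standard_normal((n, r)).T + rng.standard_normal((T, n))

Lam0, sigma2_0, A0, Gamma_v0, F_pc = pc_init(X, r)
F_filt, P_filt, loglik = kalman_filter(X, Lam0, np.diag(sigma2_0), A0, Gamma_v0)
```

In an EM loop one would recompute `loglik` after each M-step and stop once `rel_change` falls below the chosen tolerance or the iteration count reaches the maximum; the smoother and the M-step are omitted here for brevity.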
In this case, the log-likelihood is computed using its prediction error formulation obtained from the Kalman filter:
\[ \ell(X_{nT};\underline{\varphi}_n) = -\frac{1}{2}\sum_{t=1}^T \log\det(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi}) - \frac{1}{2}\sum_{t=1}^T (x_{nt} - \Lambda_n F_{t|t-1})'(\Lambda_n P_{t|t-1}\Lambda_n' + \Sigma_n^{\xi})^{-1}(x_{nt} - \Lambda_n F_{t|t-1}), \]
where $F_{t|t-1}$ and $P_{t|t-1}$ are computed using (A.1) and (A.2), respectively, when using generic values of the parameters. Similar convergence criteria can be found in Booth and Hobert (1999) and McLachlan and Krishnan (2007, Chapter 4.9).

B Proof of main results

Hereafter, we assume without loss of generality that $\mu_n = 0_n$ and $p_F = 1$ with $A \equiv A_1$.

B.1 Proof of Proposition 1

Consider the EM algorithm initialized using the PC estimators of the parameters as defined in Section A.1. At $k^* = 0$, from (13), we have
\[ \widehat{\lambda}_i^{(1)} = \left(T^{-1}\sum_{t=1}^T \left\{F^{(0)}_{t|T}F^{(0)\prime}_{t|T} + P^{(0)}_{t|T}\right\}\right)^{-1}\left(T^{-1}\sum_{t=1}^T F^{(0)}_{t|T}x_{it}\right). \tag{B.1} \]
Now,
\[ \left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t|T}F^{(0)\prime}_{t|T} - T^{-1}\sum_{t=1}^T F_tF_t'\right\| \le 2\left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)F_t'\right\| + \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)(F^{(0)}_{t|T} - F_t)'\right\|, \tag{B.2} \]

and
\[ \left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t|T}x_{it} - T^{-1}\sum_{t=1}^T F_t x_{it}\right\| \le \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)x_{it}\right\|. \tag{B.3} \]
Throughout, let $y_t = F_t$ or $y_t = x_{it}$. Then, we have to consider
\[ \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)y_t'\right\| \le \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F^{(0)}_{t|t})y_t'\right\| + \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|t} - \widehat{F}^{\mathrm{WLS}(0)}_t)y_t'\right\| + \left\|T^{-1}\sum_{t=1}^T (\widehat{F}^{\mathrm{WLS}(0)}_t - F_t)y_t'\right\| = I + II + III, \text{ say.} \tag{B.4} \]
Let us consider each term in (B.4). First,
\[ I \le \max_{t=1,\ldots,T}\|P^{(0)}_{t|t}\|\,\|\widehat{A}^{(0)}\|\max_{t=1,\ldots,T}\|(P^{(0)}_{t+1|t})^{-1}\|\left\{\left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t+1|T}y_t'\right\| + \|\widehat{A}^{(0)}\|\left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t+1|t+1}y_t'\right\|\right\} = O_p(n^{-1}), \tag{B.5} \]
by Lemmas D.8, D.11, and D.18, and since $\|\widehat{A}^{(0)}\| \le \|A\| + \|\widehat{A}^{(0)} - A\| = O_p(1)$, by Assumption 1(d) and Lemma D.3(i).
Second, from (D.46) and (D.50) in the proof of Lemma D.16
\[ II \le O_p(n^{-1})\left\{\left\|n^{-1/2}T^{-1}\sum_{t=1}^T x_{nt}y_t'\right\| + \|\widehat{A}^{(0)}\|\left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t-1|t-1}y_t'\right\|\right\} = O_p(n^{-1}), \tag{B.6} \]
because of Lemma D.18 and since
\[ \left\|n^{-1/2}T^{-1}\sum_{t=1}^T \Lambda_n F_tF_t'\lambda_i\right\| \le n^{-1/2}\|\Lambda_n\|\left\|T^{-1}\sum_{t=1}^T F_tF_t'\right\| M_\lambda = O_p(1), \tag{B.7} \]
\[ \left\|n^{-1/2}T^{-1}\sum_{t=1}^T \Lambda_n F_t\xi_{it}\right\| \le n^{-1/2}\|\Lambda_n\|\left\|T^{-1}\sum_{t=1}^T F_t\xi_{it}\right\| = O_p(T^{-1/2}), \]
\[ \left\|n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}F_t'\lambda_i\right\| \le \left\|n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}F_t'\right\| M_\lambda = O_p(T^{-1/2}), \]
\[ \left\|n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}\xi_{it}\right\| \le \left\|n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}\xi_{it} - n^{-1/2}E[\xi_{nt}\xi_{it}]\right\| + n^{-1/2}\|E[\xi_{nt}\xi_{it}]\| = O_p(T^{-1/2}) + O(1), \]
by Assumption 1(a), Lemmas C.2, C.12(i), C.12(ii), C.12(iii), and C.12(iv), and because $n^{-1}\|E[\xi_{nt}\xi_{it}]\|^2 \le n^{-1}\sum_{j=1}^n |E[\xi_{jt}^2\xi_{it}^2]| \le K_\xi$, by Assumption 2(d). Note that the first and third relations in (B.7) cover also the case $y_t = F_t$.
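For reference, the four bounds collected in (B.7) appear to correspond to the four terms obtained by substituting the factor model into the first term of (B.6) when $y_t = x_{it}$. The expansion below is a reading aid only; it assumes the representations $x_{nt} = \Lambda_n F_t + \xi_{nt}$ and $x_{it} = \lambda_i'F_t + \xi_{it}$ with $\mu_n = 0_n$, as used throughout this section:
\[ n^{-1/2}T^{-1}\sum_{t=1}^T x_{nt}x_{it} = n^{-1/2}T^{-1}\sum_{t=1}^T \Lambda_n F_tF_t'\lambda_i + n^{-1/2}T^{-1}\sum_{t=1}^T \Lambda_n F_t\xi_{it} + n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}F_t'\lambda_i + n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}\xi_{it}. \]
Applying the triangle inequality to the right-hand side and bounding each term as in (B.7) gives the $O_p(1)$ order used in (B.6).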

Finally, let us consider the last term in (B.4). From (D.52) in the proof of Lemma D.17
\[ III \le \left\|T^{-1}\sum_{t=1}^T (\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}(\Lambda_n - \widehat{\Lambda}_n^{(0)})F_ty_t'\right\| + \left\|T^{-1}\sum_{t=1}^T (\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\xi_{nt}y_t'\right\| \]
\[ \le \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n - \widehat{\Lambda}_n^{(0)})\|\left\|T^{-1}\sum_{t=1}^T F_ty_t'\right\| \]
\[ + \left\|n(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}n^{-1/2}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1} - n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}n^{-1/2}\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\| \cdot n^{-1/2}\|\Lambda_n - \widehat{\Lambda}_n^{(0)}\|\left\|T^{-1}\sum_{t=1}^T F_ty_t'\right\| \]
\[ + n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^T \Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}y_t'\right\| \]
\[ + n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^T \{\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1} - \Lambda_n'(\Sigma_n^{\xi})^{-1}\}\xi_{nt}y_t'\right\| \]
\[ = III_a + III_b + III_c + III_d, \text{ say.} \tag{B.8} \]
Then,
\[ III_a = O_p(n^{-1/2}T^{-1/2}), \tag{B.9} \]
by (B.7) and (D.54)-(D.56) in the proof of Lemma D.17. Moreover,
\[ III_b = O_p(\max(n^{-2},T^{-1})), \tag{B.10} \]
by (B.7) and Lemmas D.1(ii) and D.5(v).
Regarding $III_c$, if $y_t = F_t$, we have
\[ III_c = n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^T \Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}F_t'\right\| = O_p(n^{-1/2}T^{-1/2}), \tag{B.11} \]
by Lemmas D.5(iii) and C.8(iv). If $y_t = x_{it}$, we have
\[ III_c \le n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^T \Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}F_t'\right\|\,\|\lambda_i\| + n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^T \Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\xi_{it}\right\| = O_p(n^{-1/2}T^{-1/2}), \tag{B.12} \]
by Assumption 1(a), and Lemmas D.5(iii), C.8(iv), and C.8(v). Last,
\[ III_d = O_p(\max(n^{-1},T^{-1/2})), \tag{B.13} \]
by (B.7) and Lemma D.5(v). From (B.8), (B.9), (B.10), (B.11), (B.12), and (B.13)
\[ III = O_p(\max(n^{-1},T^{-1/2})). \tag{B.14} \]
Combining (B.5), (B.6), and (B.14) we have
\[ \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)F_t'\right\| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.15} \]
\[ \left\|T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)x_{it}\right\| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.16} \]

which once substituted into (B.2) and (B.3), jointly with Lemma D.12 give
\[ \left\|T^{-1}\sum_{t=1}^T \{F^{(0)}_{t|T}F^{(0)\prime}_{t|T} + P^{(0)}_{t|T}\} - T^{-1}\sum_{t=1}^T F_tF_t'\right\| \le \left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t|T}F^{(0)\prime}_{t|T} - T^{-1}\sum_{t=1}^T F_tF_t'\right\| + \max_{t=1,\ldots,T}\|P^{(0)}_{t|T}\| = O_p(\max(n^{-1},T^{-1/2})) + O_p(n^{-1}), \]
and
\[ \left\|T^{-1}\sum_{t=1}^T F^{(0)}_{t|T}x_{it} - T^{-1}\sum_{t=1}^T F_tx_{it}\right\| = O_p(\max(n^{-1},T^{-1/2})). \]
Therefore, from (B.1)
\[ \|\widehat{\lambda}_i^{(1)} - \lambda_i^{\mathrm{OLS}}\| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.17} \]
with $\lambda_i^{\mathrm{OLS}} = (T^{-1}\sum_{t=1}^T F_tF_t')^{-1}(T^{-1}\sum_{t=1}^T F_tx_{it})$. And by Lemma D.19(i) we also have
\[ \|\lambda_i^{\mathrm{OLS}} - \lambda_i\| = O_p(T^{-1/2}), \tag{B.18} \]
indeed, recalling that $\Gamma^F = I_r$ by Assumption 6(b), by Lemma C.12(i) and Weyl's inequality (Merikoski and Kumar, 2004, Theorem 1) we have $|\nu^{(r)}(T^{-1}\sum_{t=1}^T F_tF_t') - 1| = O_p(T^{-1/2})$, which implies $\|(T^{-1}\sum_{t=1}^T F_tF_t')^{-1}\| = O_p(1)$. From (B.17) and (B.18)
\[ \|\widehat{\lambda}_i^{(1)} - \lambda_i\| \le \|\widehat{\lambda}_i^{(1)} - \lambda_i^{\mathrm{OLS}}\| + \|\lambda_i^{\mathrm{OLS}} - \lambda_i\| = O_p(\max(n^{-1},T^{-1/2})). \tag{B.19} \]
Moreover, by letting $y_t = n^{-1/2}x_{nt}$ the above proof leads to
\[ \left\|n^{-1/2}T^{-1}\sum_{t=1}^T (F^{(0)}_{t|T} - F_t)x_{nt}'\right\| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.20} \]
and, therefore, using also Lemma D.19(ii), we have
\[ n^{-1/2}\|\widehat{\Lambda}_n^{(1)} - \Lambda_n\| = O_p(\max(n^{-1},T^{-1/2})). \tag{B.21} \]
This proves parts (a.1) and (a.2) when $k^* = 0$.

Following the same reasoning leading to (B.19) and by Lemma D.19(iv) we can easily prove that
\[ \|\widehat{A}^{(1)} - A\| \le \|\widehat{A}^{(1)} - A^{\mathrm{OLS}}\| + \|A^{\mathrm{OLS}} - A\| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.22} \]
where $\widehat{A}^{(1)}$ is defined in (15) and $A^{\mathrm{OLS}} = (T^{-1}\sum_{t=1}^T F_tF_{t-1}')(T^{-1}\sum_{t=1}^T F_{t-1}F_{t-1}')^{-1}$ (recall that $F_0 = 0_r$ by Assumption 1(i)).
To prove (B.22) we also use the fact that $\max_{t=1,\ldots,T}\|C^{(0)}_{t,t-1|T}\| = O(n^{-1})$, since it can be obtained from the upper-right block of $P^{(0)}_{t|T}$ when this is computed from the Kalman smoother having the augmented state vector $(F_t'\ F_{t-1}')'$. This proves part (a.4) when $k^*=0$.

Likewise, using (B.22) and the same reasoning leading to (B.19), by Lemma D.19(v), we can easily prove also that

$$\|\widehat\Gamma^{v(1)}-\Gamma^v\| \le \|\widehat\Gamma^{v(1)}-\Gamma^{v\,\mathrm{OLS}}\| + \|\Gamma^{v\,\mathrm{OLS}}-\Gamma^v\| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.23}$$

where $\widehat\Gamma^{v(1)}$ is defined in (16) and $\Gamma^{v\,\mathrm{OLS}} = T^{-1}\sum_{t=1}^{T}(F_t-A^{\mathrm{OLS}}F_{t-1})(F_t-A^{\mathrm{OLS}}F_{t-1})'$. To prove (B.23) we need to use also the intermediate quantity $T^{-1}\sum_{t=1}^{T}(F_t-\widehat A^{(1)}F_{t-1})(F_t-\widehat A^{(1)}F_{t-1})'$, which is $\min(n,\sqrt{T})$-consistent because of (B.22). This proves part (a.5) when $k^*=0$.

Finally, using again the same reasoning leading to (B.19), by Lemma D.19(iii), we can prove that

$$|\widehat\sigma_i^{2(1)}-\sigma_i^2| \le |\widehat\sigma_i^{2(1)}-\sigma_i^{2\,\mathrm{OLS}}| + |\sigma_i^{2\,\mathrm{OLS}}-\sigma_i^2| = O_p(\max(n^{-1},T^{-1/2})), \tag{B.24}$$

where $\widehat\sigma_i^{2(1)}$ is defined in (14) and $\sigma_i^{2\,\mathrm{OLS}} = T^{-1}\sum_{t=1}^{T}(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t)^2$. To prove (B.24) we need to use also the intermediate quantity $T^{-1}\sum_{t=1}^{T}(x_{it}-\widehat\lambda_i^{(1)\prime}F_t)^2$, which is $\min(n,\sqrt{T})$-consistent because of (B.19). This proves part (a.3) when $k^*=0$.
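The OLS quantities $\lambda_i^{\mathrm{OLS}}$, $\sigma_i^{2\,\mathrm{OLS}}$, $A^{\mathrm{OLS}}$, and $\Gamma^{v\,\mathrm{OLS}}$ serve as infeasible benchmarks in the argument above, since they are built from the true factors. A minimal numerical sketch of how they could be computed is given below. It assumes simulated factors treated as observed, a VAR(1) with unit innovation variance, and i.i.d. Gaussian idiosyncratic terms; the function name `ols_benchmarks` and all simulation settings are illustrative choices, not part of the paper.

```python
import numpy as np

def ols_benchmarks(F, x):
    """OLS benchmarks computed from (hypothetically observed) factors.

    F : (T, r) array whose rows are F_t'; x : (T, n) array whose rows are x_nt'.
    Returns (lam, sigma2, A, gamma_v), mirroring lambda_i^OLS, sigma_i^{2 OLS},
    A^OLS and Gamma^{v OLS} in the text.
    """
    T = F.shape[0]
    # lambda_i^OLS = (sum F_t F_t')^{-1} (sum F_t x_it), stacked as rows: (n, r)
    lam = np.linalg.solve(F.T @ F, F.T @ x).T
    # sigma_i^{2 OLS} = T^{-1} sum_t (x_it - lambda_i^OLS' F_t)^2
    sigma2 = ((x - F @ lam.T) ** 2).mean(axis=0)
    # A^OLS = (sum F_t F_{t-1}') (sum F_{t-1} F_{t-1}')^{-1}; here the sums run
    # over t = 2..T (the text starts at t = 1 with F_0 = 0)
    F_lag, F_cur = F[:-1], F[1:]
    A = np.linalg.solve(F_lag.T @ F_lag, F_lag.T @ F_cur).T
    # Gamma^{v OLS} = T^{-1} sum_t (F_t - A^OLS F_{t-1})(F_t - A^OLS F_{t-1})'
    v = F_cur - F_lag @ A.T
    gamma_v = v.T @ v / T
    return lam, sigma2, A, gamma_v

# Simulated example (all settings are assumptions made for illustration)
rng = np.random.default_rng(0)
T, n, r = 2000, 30, 2
A0 = np.diag([0.5, -0.3])
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = A0 @ F[t - 1] + rng.normal(size=r)
lam0 = rng.normal(size=(n, r))
x = F @ lam0.T + rng.normal(size=(T, n))

lam, sigma2, A, gamma_v = ols_benchmarks(F, x)
print("max loading error:", np.abs(lam - lam0).max())
print("max VAR coefficient error:", np.abs(A - A0).max())
```

With $T$ in the thousands both errors should be small, in line with the $T^{-1/2}$ rate in (B.18); the sketch says nothing about the EM estimates themselves, which replace $F_t$ with Kalman smoother output.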

Now, from (B.19) and (B.24), using the same reasoning of the proof of Lemma D.4(ii), which in turn requires (B.15) and (B.16), again it follows that

$$\Big|n^{-1}\sum_{i=1}^{n}(\widehat\sigma_i^{2(1)}-\sigma_i^2)\Big| = O_p(\max(n^{-1},T^{-1/2})). \tag{B.25}$$

And, using the same reasoning as in the proof of Lemma D.5 but using now (B.21), (B.24), and (B.25),

$$\begin{aligned}
n^{-1}\|\widehat\Lambda_n^{(1)\prime}(\widehat\Sigma_n^{\xi(1)})^{-1}\widehat\Lambda_n^{(1)}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\| &= O_p(\max(n^{-1},T^{-1/2})), \\
n^{-1/2}\|\widehat\Lambda_n^{(1)\prime}(\widehat\Sigma_n^{\xi(1)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\| &= O_p(\max(n^{-1},T^{-1/2})), \\
n\|(\widehat\Lambda_n^{(1)\prime}(\widehat\Sigma_n^{\xi(1)})^{-1}\widehat\Lambda_n^{(1)})^{-1}\| &= O_p(1), \\
n\|(\widehat\Lambda_n^{(1)\prime}(\widehat\Sigma_n^{\xi(1)})^{-1}\widehat\Lambda_n^{(1)})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| &= O_p(\max(n^{-1},T^{-1/2})).
\end{aligned}\tag{B.26}$$

From (B.21), (B.22), (B.23), and the relations in (B.26), and following the same steps as in the proofs of Lemmas D.8, D.11, and D.12, we get

$$\begin{aligned}
\max_{t=1,\ldots,T}\|P^{(1)}_{t|t-1}\| &= O_p(1), & \max_{t=1,\ldots,T}\|(P^{(1)}_{t|t-1})^{-1}\| &= O_p(1), \\
\max_{t=1,\ldots,T}\|P^{(1)}_{t|t}\| &= O_p(n^{-1}), & \max_{t=1,\ldots,T}\|P^{(1)}_{t|T}\| &= O_p(n^{-1}),
\end{aligned}\tag{B.27}$$

and also $\|F^{(1)}_{t|t}\| = O_p(1)$ and $\|F^{(1)}_{t|T}\| = O_p(1)$ by the same arguments in Lemma D.14. It follows that we can apply the same steps as in the proofs of Lemmas D.15, D.16, and D.17 to get

$$\|F^{(1)}_{t|T}-F_t\| = O_p(\max(n^{-1/2},T^{-1/2})).$$

This proves part (b) when $k^*=0$.

Then we can show that (B.15) and (B.16) still hold when using $F^{(1)}_{t|T}$ in place of $F^{(0)}_{t|T}$ and, using also the last of (B.27), we prove $\min(n,\sqrt{T})$-consistency of $\widehat\lambda_i^{(2)}$. Similarly we can prove $\min(n,\sqrt{T})$-consistency of $n^{-1/2}\widehat\Lambda_n^{(2)}$, $\widehat A^{(2)}$, $\widehat\Gamma^{v(2)}$, and $\widehat\sigma_i^{2(2)}$. It is then clear that we can repeat the same reasoning leading to (B.25), (B.26), and (B.27) but when $k^*=1$. So these arguments hold for all $k^*\ge 0$. This completes the proof.
□

B.2 Proof of Proposition 2

For part (a.1), for any $k^*\ge 0$, we have (recall that $\widehat\lambda_i\equiv\widehat\lambda_i^{(k^*+1)}$)

$$(\widehat\lambda_i-\lambda_i) = (\widehat\lambda_i-\widehat\lambda_i^{**}) + (\widehat\lambda_i^{**}-\widehat\lambda_i^{*}) + (\widehat\lambda_i^{*}-\lambda_i^{\mathrm{OLS}}) + (\lambda_i^{\mathrm{OLS}}-\lambda_i) = L.1+L.2+L.3+L.4, \text{ say.} \tag{B.28}$$

From Lemma E.23,

$$\|L.1\| = O_p\big(\max(n^{-2}\log^{4/\delta_v}T,\; n^{-1}T^{-1}\log^{1/\delta_v}T\sqrt{\log n},\; T^{-3/2}\sqrt{\log n})\big). \tag{B.29}$$

From Lemma E.22(i),

$$\|L.2\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big). \tag{B.30}$$

From Lemma E.11(i),

$$\|L.3\| = O_p\big(\max(n^{-1},\; n^{-1/2}T^{-1/2},\; T^{-1})\big). \tag{B.31}$$

From Lemma D.19(i),

$$\|L.4\| = O_p(T^{-1/2}). \tag{B.32}$$

By using (B.29), (B.30), (B.31), and (B.32) into (B.28), we prove part (a.1).

For part (a.2), for any $k^*\ge 0$, we have (recall that $\widehat\Lambda_n\equiv\widehat\Lambda_n^{(k^*+1)}$)

$$(\widehat\Lambda_n-\Lambda_n) = (\widehat\Lambda_n-\widehat\Lambda_n^{**}) + (\widehat\Lambda_n^{**}-\widehat\Lambda_n^{*}) + (\widehat\Lambda_n^{*}-\Lambda_n^{\mathrm{OLS}}) + (\Lambda_n^{\mathrm{OLS}}-\Lambda_n),$$

and the proof follows from Lemmas D.19(ii), E.11(ii), and E.25(i), and since

$$\begin{aligned}
n^{-1}\|\widehat\Lambda_n^{**}-\widehat\Lambda_n^{*}\|^2 &= n^{-1}\sum_{i=1}^{n}\|\widehat\lambda_i^{**}-\widehat\lambda_i^{*}\|^2 \le \max_{i=1,\ldots,n}\|\widehat\lambda_i^{**}-\widehat\lambda_i^{*}\|^2 \\
&= O_p\big(\max(n^{-2}\log^{4/\delta_v}T,\; n^{-1}T^{-1}\log n,\; T^{-2})\big),
\end{aligned}\tag{B.33}$$

by Lemma E.22(i).

For part (b), from part (a.1) and (B.28), if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$, as $n,T\to\infty$, we have

$$\sqrt{T}(\widehat\lambda_i-\lambda_i) = \sqrt{T}(\lambda_i^{\mathrm{OLS}}-\lambda_i)+o_p(1) = \Big(T^{-1}\sum_{t=1}^{T}F_tF_t'\Big)^{-1}\Big(T^{-1/2}\sum_{t=1}^{T}F_t\xi_{it}\Big)+o_p(1). \tag{B.34}$$

Now, since $\{F_t\xi_{it}\}$ is strongly mixing with exponentially decaying coefficients by Bradley (2005, Theorem 5.1.a) (see also (E.5) in the proof of Lemma E.1), and given that by Assumption 5 the following Cramér condition holds

$$\sup_{m\ge 1} m^{-1/\delta}\big(E[|\xi_{it}F_{jt}|^m]\big)^{1/m}\le K,$$

for some finite positive reals $\delta\in\big(0,\frac{\delta_v\delta_\xi}{\delta_v+\delta_\xi}\big)$ and $K$ independent of $t$, $i$, and $j$ (Kuchibhotla and Chakrabortty, 2022, Section 2), then the Central Limit Theorem by Ibragimov (1962, Theorem 1.7) applies, i.e.,

$$T^{-1/2}\sum_{t=1}^{T}F_t\xi_{it}\to_d N\Big(0_r,\ \lim_{T\to\infty}T^{-1}\sum_{s,t=1}^{T}E\big[F_tF_s'\xi_{it}\xi_{is}\big]\Big). \tag{B.35}$$

Therefore, by Lemmas C.12(i) and C.13, and Assumption 1(b),

$$\Big\|\Big(T^{-1}\sum_{t=1}^{T}F_tF_t'\Big)^{-1}-(\Gamma^F)^{-1}\Big\| \le \Big\|\Big(T^{-1}\sum_{t=1}^{T}F_tF_t'\Big)^{-1}\Big\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'-\Gamma^F\Big\|\,\|(\Gamma^F)^{-1}\| = O_p(T^{-1/2}). \tag{B.36}$$

Thus from (B.34), (B.35), and (B.36), by Slutsky's Theorem, we have

$$\sqrt{T}(\widehat\lambda_i-\lambda_i)\to_d N(0_r,V_i),$$

where

$$V_i = (\Gamma^F)^{-1}\Big\{\lim_{T\to\infty}T^{-1}\sum_{s,t=1}^{T}E\big[F_t\xi_{it}\xi_{is}F_s'\big]\Big\}(\Gamma^F)^{-1} = \lim_{T\to\infty}T^{-1}\sum_{s,t=1}^{T}E[\xi_{it}\xi_{is}]\,E\big[F_tF_s'\big],$$

since $\Gamma^F=I_r$ because of Assumption 6(b) and $\{F_t\}$ and $\{\xi_{it}\}$ are independent processes because of Lemma C.11. This proves part (b). Part (c) is straightforward. This completes the proof. □

B.3 Proof of Proposition 3

Recall the definitions $\widehat F_t\equiv\widehat F_{t|T}\equiv F^{(k^*+1)}_{t|T}$, for any $k^*\ge 0$. From Lemmas F.4(ii) and F.4(iii),

$$\|\widehat F_{t|T}-F_t\| \le \|\widehat F_{t|T}-\widehat F_{t|t}\| + \|\widehat F_{t|t}-\widehat F_t^{\mathrm{WLS}}\| + \|\widehat F_t^{\mathrm{WLS}}-F_t\| = \|\widehat F_t^{\mathrm{WLS}}-F_t\| + O_p(n^{-1}), \tag{B.37}$$

where $\widehat F_t^{\mathrm{WLS}} = (\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}x_{nt}$.
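The Weighted Least Squares estimator $\widehat F_t^{\mathrm{WLS}}$ defined above is a cross-sectional regression of $x_{nt}$ on the loadings, weighted by the inverse idiosyncratic variances. The sketch below illustrates the formula numerically. It assumes a diagonal idiosyncratic covariance and, for simplicity, plugs in the true loadings and variances rather than EM estimates; the function name `wls_factors` and the simulation settings are hypothetical.

```python
import numpy as np

def wls_factors(x, lam, sigma2):
    """Cross-sectional WLS factor estimates.

    x : (T, n) data, lam : (n, r) loadings, sigma2 : (n,) idiosyncratic
    variances (diagonal covariance assumed). Returns a (T, r) array whose
    t-th row is (Lam' S^{-1} Lam)^{-1} Lam' S^{-1} x_nt.
    """
    w = lam / sigma2[:, None]        # S^{-1} Lam for diagonal S, shape (n, r)
    gram = lam.T @ w                 # Lam' S^{-1} Lam, shape (r, r)
    return np.linalg.solve(gram, w.T @ x.T).T

# Simulated example (settings are assumptions made for illustration)
rng = np.random.default_rng(0)
T, n, r = 50, 200, 2
lam = rng.normal(size=(n, r))
F = rng.normal(size=(T, r))
sigma2 = rng.uniform(0.5, 2.0, size=n)
x = F @ lam.T + rng.normal(size=(T, n)) * np.sqrt(sigma2)

F_wls = wls_factors(x, lam, sigma2)
print("max factor error:", np.abs(F_wls - F).max())
```

Because the error of each factor estimate averages $n$ weighted idiosyncratic terms, it shrinks as $n$ grows, which is the $n^{-1/2}$ behaviour that term C in the decomposition (B.38) captures.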

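Now,

$$\begin{aligned}
\|\widehat F_t^{\mathrm{WLS}}-F_t\| \le{}& \|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat\Lambda_n)\|\,\|F_t\| + \|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\xi_{nt}\| \\
\le{}& \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat\Lambda_n)\|\,\|F_t\| \\
&+ \|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\cdot\|\Lambda_n-\widehat\Lambda_n\|\,\|F_t\| \\
&+ \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| \\
&+ \|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\xi_{nt}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| \\
={}& A+B+C+D, \text{ say.}
\end{aligned}\tag{B.38}$$

Let us consider each term in (B.38). First, consider term A and notice that

$$\begin{aligned}
n^{-1/2}\|\widehat\Lambda_n-\Lambda_n^{\mathrm{OLS}}\| \le{}& n^{-1/2}\|\widehat\Lambda_n-\widehat\Lambda_n^{**}\| + n^{-1/2}\|\widehat\Lambda_n^{**}-\widehat\Lambda_n^{*}\| + n^{-1/2}\|\widehat\Lambda_n^{*}-\Lambda_n^{\mathrm{OLS}}\| \\
={}& O_p\big(\max(n^{-2}\log^{4/\delta_v}T,\; n^{-1}T^{-1}\log^{1/\delta_v}T\sqrt{\log n},\; T^{-3/2}\log n)\big) \\
&+ O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big) \\
&+ O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2},\; T^{-1})\big) \\
={}& O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big),
\end{aligned}\tag{B.39}$$

by Lemmas E.11(ii), E.22(i), and E.25(i) (see also (B.33) in the proof of Proposition 2). Therefore, from (B.39),

$$\begin{aligned}
A \le{}& \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\Lambda_n^{\mathrm{OLS}})\|\,\|F_t\| \\
&+ n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1/2}\|\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\|\widehat\Lambda_n-\Lambda_n^{\mathrm{OLS}}\|\,\|F_t\| \\
={}& \{A.1+A.2\}\|F_t\|, \text{ say.}
\end{aligned}\tag{B.40}$$

Then,

$$\begin{aligned}
A.1 &\le n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\Lambda_n^{\mathrm{OLS}})\| \\
&= n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}F_t'\Big\|\,\Big\|\Big(T^{-1}\sum_{t=1}^{T}F_tF_t'\Big)^{-1}\Big\| \\
&= O_p(n^{-1/2}T^{-1/2}),
\end{aligned}\tag{B.41}$$

by Lemmas C.3(iii), C.8(iv), and C.13. Moreover, $A.2 = O_p(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}))$, because of (B.39) and Lemmas C.2, C.3(iii), and Assumption 2(a), which implies $\|(\Sigma_n^{\xi})^{-1}\|\le C_\xi$. This, jointly with (B.40) and (B.41), implies that

$$A = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big), \tag{B.42}$$

since $\|F_t\| = O_p(1)$ because $E[F_{jt}^2]=1$, $j=1,\ldots,r$, by Assumption 6(b).

Second, by Proposition 2(a) and Lemma F.2(v),

$$\begin{aligned}
B &= \|n(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}n^{-1/2}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}n^{-1/2}\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\|\Lambda_n-\widehat\Lambda_n\|\,\|F_t\| \\
&= O_p\big(\max(n^{-2}\log^{4/\delta_v}T,\; T^{-1}\sqrt{\log n},\; n^{-1}T^{-1/2}\log^{2/\delta_v}T\sqrt{\log n})\big),
\end{aligned}\tag{B.43}$$

and again since $\|F_t\| = O_p(1)$ because $E[F_{jt}^2]=1$, $j=1,\ldots,r$, by Assumption 6(b).

Third,

$$C \le n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| = O_p(n^{-1/2}), \tag{B.44}$$

by Lemmas C.3(iii) and C.7(i).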

Fourth, and last,

$$\begin{aligned}
D \le{}& n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| \\
&+ n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1}\|\{\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\}\xi_{nt}\| \\
&+ n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,n^{-1}\|\{\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\}\xi_{nt}\| \\
={}& D.1+D.2+D.3, \text{ say.}
\end{aligned}\tag{B.45}$$

Then, $D.1 = O_p(\max(n^{-3/2}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n}))$, by Lemmas C.7(i) and F.2(iv). Moreover,

$$\begin{aligned}
D.2 ={}& n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| \\
&\cdot\Big\{n^{-1}\|(\widehat\Lambda_n-\Lambda_n)'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| + n^{-1}\|\Lambda_n'[(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}]\xi_{nt}\| + n^{-1}\|(\widehat\Lambda_n-\Lambda_n)'[(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}]\xi_{nt}\|\Big\} \\
={}& n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\{D.2.a+D.2.b+D.2.c\}, \text{ say.}
\end{aligned}$$

We then have the following results. First,

$$\begin{aligned}
D.2.a &= n^{-1}\Big\|\sum_{i=1}^{n}(\widehat\lambda_i-\lambda_i)(\sigma_i^2)^{-1}\xi_{it}\Big\| \le \max_{i=1,\ldots,n}\|\widehat\lambda_i-\lambda_i\|\,n^{-1}\|(\Sigma_n^{\xi})^{-1}\xi_{nt}\| \\
&= O_p\big(\max(n^{-3/2}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n})\big),
\end{aligned}\tag{B.46}$$

by Lemmas C.7(i) and F.1(i). Second,

$$\begin{aligned}
D.2.b &= n^{-1}\Big\|\sum_{i=1}^{n}\lambda_i\{(\widehat\sigma_i^2)^{-1}-(\sigma_i^2)^{-1}\}\xi_{it}\Big\| = n^{-1}\Big\|\sum_{i=1}^{n}\lambda_i(\sigma_i^2\widehat\sigma_i^2)^{-1}\{\widehat\sigma_i^2-\sigma_i^2\}\xi_{it}\Big\| \\
&\le \max_{i=1,\ldots,n}|\widehat\sigma_i^2-\sigma_i^2|\,\Big\{n^{-1}\Big\|\sum_{i=1}^{n}\lambda_i\{\sigma_i^2(\widehat\sigma_i^2-\sigma_i^2)\}^{-1}\xi_{it}\Big\| + n^{-1}\Big\|\sum_{i=1}^{n}\lambda_i(\sigma_i^4)^{-1}\xi_{it}\Big\|\Big\} \\
&= O_p\big(\max(n^{-3/2}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n})\big),
\end{aligned}\tag{B.47}$$

by Lemmas C.7(i) and F.1(ii). Third, clearly by (B.46) and (B.47),

$$D.2.c = o_p\big(\max(n^{-3/2}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n})\big). \tag{B.48}$$

By (B.46), (B.47), and (B.48), and Lemma C.3 we have $D.2 = O_p(\max(n^{-3/2}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n}))$. Last, $D.3 = O_p(\max(n^{-2}\log^{4/\delta_v}T,\; T^{-1}\log n))\,D.2$, by Lemma F.2(ii), thus it is dominated by D.2. Therefore,

$$D = O_p\big(\max(n^{-3/2}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n})\big). \tag{B.49}$$

By substituting (B.42), (B.43), (B.44), and (B.49) into (B.38) we have

$$\|\widehat F_t^{\mathrm{WLS}}-F_t\| = O_p\big(\max(n^{-1/2},\; T^{-1}\sqrt{\log n})\big), \tag{B.50}$$

which, once substituted into (B.37), proves part (a.1).

For part (a.2), let $\widehat{\boldsymbol F}_T^{\mathrm{KF}} = (\widehat F_{1|1}\cdots\widehat F_{T|T})'$ and $\widehat{\boldsymbol F}_T^{\mathrm{WLS}} = (\widehat F_1^{\mathrm{WLS}}\cdots\widehat F_T^{\mathrm{WLS}})'$, and recall that, by definition, $\widehat{\boldsymbol F}_T = (\widehat F_{1|T}\cdots\widehat F_{T|T})' = (\widehat F_1\cdots\widehat F_T)'$. From (B.38), we have

$$T^{-1/2}\|\widehat{\boldsymbol F}_T-\boldsymbol F_T\| \le T^{-1/2}\|\widehat{\boldsymbol F}_T-\widehat{\boldsymbol F}_T^{\mathrm{KF}}\| + T^{-1/2}\|\widehat{\boldsymbol F}_T^{\mathrm{KF}}-\widehat{\boldsymbol F}_T^{\mathrm{WLS}}\| + T^{-1/2}\|\widehat{\boldsymbol F}_T^{\mathrm{WLS}}-\boldsymbol F_T\|. \tag{B.51}$$

By Lemma F.5 the first two terms on the rhs of (B.51) are such that

$$\begin{aligned}
T^{-1/2}\|\widehat{\boldsymbol F}_T-\widehat{\boldsymbol F}_T^{\mathrm{KF}}\| &\le \max_{t=1,\ldots,T}\|\widehat F_t-\widehat F_{t|t}\| = O_p(n^{-1}\log^{1/\delta_v}T), \\
T^{-1/2}\|\widehat{\boldsymbol F}_T^{\mathrm{KF}}-\widehat{\boldsymbol F}_T^{\mathrm{WLS}}\| &\le \max_{t=1,\ldots,T}\|\widehat F_{t|t}-\widehat F_t^{\mathrm{WLS}}\| = O_p(n^{-1}\log^{1/\delta_v}T).
\end{aligned}\tag{B.52}$$

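While, for the last term on the rhs of (B.51), letting $\boldsymbol E_{nT} = (\xi_{n1}\cdots\xi_{nT})'$, we have

$$\begin{aligned}
T^{-1/2}\|\widehat{\boldsymbol F}_T^{\mathrm{WLS}}-\boldsymbol F_T\| \le{}& n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\| \\
&\cdot n^{-1}T^{-1/2}\Big\{\|\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat\Lambda_n)\boldsymbol F_T'\| + \|\{\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\}\boldsymbol E_{nT}'\| + \|\Lambda_n'(\Sigma_n^{\xi})^{-1}\boldsymbol E_{nT}'\|\Big\} \\
={}& n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\|\{A+B+C\}, \text{ say.}
\end{aligned}\tag{B.53}$$

For the first term on the rhs of (B.53) we have

$$\begin{aligned}
A &\le n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat\Lambda_n)\|\,T^{-1/2}\|\boldsymbol F_T\| + n^{-1/2}\|\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\|\widehat\Lambda_n-\Lambda_n\|\,T^{-1/2}\|\boldsymbol F_T\| \\
&= \{A_1+A_2\}\,T^{-1/2}\|\boldsymbol F_T\|, \text{ say,}
\end{aligned}\tag{B.54}$$

where

$$\begin{aligned}
A_1 &\le n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\Lambda_n^{\mathrm{OLS}})\| + n^{-1/2}\|\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\|\Lambda_n^{\mathrm{OLS}}-\widehat\Lambda_n\| \\
&= O_p(n^{-1/2}T^{-1/2}) + O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big) \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big),
\end{aligned}\tag{B.55}$$

by (B.39), (B.41), Lemma C.2, and Assumption 2(a), while

$$A_2 = O_p\big(\max(T^{-1/2}\sqrt{\log n},\; n^{-1}\log^{2/\delta_v}T)\big)\,O_p\big(\max(T^{-1/2},\; n^{-1}\log^{2/\delta_v}T)\big), \tag{B.56}$$

by Proposition 2(a) and Lemma F.2(ii). By substituting (B.55) and (B.56) into (B.54),

$$A = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; T^{-1}\sqrt{\log n},\; n^{-1/2}T^{-1/2}\sqrt{\log n})\big). \tag{B.57}$$

Moving to the second term on the rhs of (B.53), we have

$$\begin{aligned}
B \le{}& n^{-1}T^{-1/2}\|(\widehat\Lambda_n-\Lambda_n)'(\Sigma_n^{\xi})^{-1}\boldsymbol E_{nT}'\| + n^{-1}T^{-1/2}\|\Lambda_n'\{(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\}\boldsymbol E_{nT}'\| \\
&+ n^{-1}T^{-1/2}\|(\widehat\Lambda_n-\Lambda_n)'\{(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\}\boldsymbol E_{nT}'\| \\
={}& B_1+B_2+B_3, \text{ say.}
\end{aligned}\tag{B.58}$$

Then, considering each term on the rhs of (B.58),

$$\begin{aligned}
B_1 &\le n^{-1}T^{-1/2}\|(\Lambda_n^{\mathrm{OLS}}-\Lambda_n)'(\Sigma_n^{\xi})^{-1}\boldsymbol E_{nT}'\| + n^{-1/2}\|\widehat\Lambda_n-\Lambda_n^{\mathrm{OLS}}\|\,\|(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}T^{-1/2}\|\boldsymbol E_{nT}'\| \\
&= B_{1.a}+B_{1.b}, \text{ say,}
\end{aligned}$$

where

$$B_{1.a} \le n^{-1}T^{-1/2}\Big\|T^{-1}\sum_{t=1}^{T}F_t\xi_{nt}'(\Sigma_n^{\xi})^{-1}\boldsymbol E_{nT}'\Big\|\,\Big\|\Big(T^{-1}\sum_{t=1}^{T}F_tF_t'\Big)^{-1}\Big\| = O_p(n^{-1/2}T^{-1/2}),$$

by Lemmas C.8(vi) and C.13. Whereas $B_{1.b} = O_p(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}))$ by (B.39) and Lemma C.7(vi). Therefore,

$$B_1 = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big). \tag{B.59}$$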

Moreover, letting $\zeta_i = (\xi_{i1}\cdots\xi_{iT})'$,

$$\begin{aligned}
B_2 &= n^{-1}T^{-1/2}\|\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\{\Sigma_n^{\xi}-\widehat\Sigma_n^{\xi}\}(\Sigma_n^{\xi})^{-1}\boldsymbol E_{nT}'\| \\
&= n^{-1}T^{-1/2}\Big\|\sum_{i=1}^{n}\lambda_i\zeta_i'(\sigma_i^2\widehat\sigma_i^2)^{-1}\{\sigma_i^2-\widehat\sigma_i^2\}\Big\| \\
&\le C_\xi\max_{i=1,\ldots,n}|\sigma_i^2-\widehat\sigma_i^2|\,\Big\{\min_{i=1,\ldots,n}\widehat\sigma_i^2\Big\}^{-1}n^{-1}T^{-1/2}\Big\|\sum_{i=1}^{n}\lambda_i\zeta_i'\Big\| \\
&\le C_\xi\max_{i=1,\ldots,n}|\sigma_i^2-\widehat\sigma_i^2|\,\|(\widehat\Sigma_n^{\xi})^{-1}\|\,n^{-1}T^{-1/2}\|\Lambda_n'\boldsymbol E_{nT}'\| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; T^{-1/2}\sqrt{\log n})\big)\,O_p(n^{-1/2}),
\end{aligned}\tag{B.60}$$

by Assumption 2(a) and Lemmas F.1(ii), F.1(ii), and

Last,

$$\begin{aligned}
B_3 &\le n^{-1/2}\|\widehat\Lambda_n-\Lambda_n\|\,\|(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}T^{-1/2}\|\boldsymbol E_{nT}\| \\
&= O_p\big(\max(T^{-1/2},\; n^{-1}\log^{2/\delta_v}T)\big)\cdot O_p\big(\max(T^{-1/2}\sqrt{\log n},\; n^{-1}\log^{2/\delta_v}T)\big),
\end{aligned}\tag{B.61}$$

by Proposition 2(a) and Lemmas C.7(vi) and F.1(v). By substituting (B.59), (B.60), and (B.61) into (B.58),

$$B = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big). \tag{B.62}$$

Finally, for the third term on the rhs of (B.53), we have

$$C = O_p(n^{-1/2}), \tag{B.63}$$

by Lemma C.7(v). By substituting (B.57), (B.62), and (B.63) into (B.53), and since $n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\| = O_p(1)$ by Lemma F.2(iii),

$$T^{-1/2}\|\widehat{\boldsymbol F}_T^{\mathrm{WLS}}-\boldsymbol F_T\| = O_p\big(\max(n^{-1/2},\; T^{-1}\sqrt{\log n})\big),$$

which, once substituted in (B.51) together with (B.52), proves part (a.2).

Turning to part (b), from (B.37), (B.38), (B.44) and (B.50), if $T^{-1}\sqrt{n\log n}\to 0$, as $n,T\to\infty$, we have

$$\sqrt{n}(\widehat F_t-F_t) = \sqrt{n}(\widehat F_t^{\mathrm{WLS}}-F_t)+o_p(1) = n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Big\{n^{-1/2}\sum_{i=1}^{n}\lambda_i\xi_{it}(\sigma_i^2)^{-1}\Big\}+o_p(1). \tag{B.64}$$

Then, by Assumption 2(e), as $n\to\infty$, it holds that

$$n^{-1/2}\sum_{i=1}^{n}\lambda_i\xi_{it}(\sigma_i^2)^{-1}\to_d N\Big(0_r,\ \lim_{n\to\infty}n^{-1}\sum_{i,j=1}^{n}\lambda_i\lambda_j'E[\xi_{it}\xi_{jt}](\sigma_i^2\sigma_j^2)^{-1}\Big). \tag{B.65}$$

Moreover, from Lemma C.3(iii) we have that

$$\|n^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n-\Sigma_{\Lambda\Sigma\Lambda}\| = o_p(1),$$

for some finite and positive definite $r\times r$ matrix $\Sigma_{\Lambda\Sigma\Lambda}$, which, jointly with Lemma C.3(v), implies

$$\|n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}-(\Sigma_{\Lambda\Sigma\Lambda})^{-1}\| = o_p(1). \tag{B.66}$$

From (B.64), (B.65), and (B.66), by Slutsky's Theorem, we have

$$\sqrt{n}(\widehat F_t-F_t)\to_d N(0_r,W_t),$$

where

$$W_t = (\Sigma_{\Lambda\Sigma\Lambda})^{-1}\Big\{\lim_{n\to\infty}n^{-1}\sum_{i,j=1}^{n}\lambda_i\lambda_j'E[\xi_{it}\xi_{jt}](\sigma_i^2\sigma_j^2)^{-1}\Big\}(\Sigma_{\Lambda\Sigma\Lambda})^{-1}.$$

This proves part (b). Part (c) is straightforward. This completes the proof. □

B.4 Proof of Proposition 4

First notice that

$$\widehat\chi_{it}-\chi_{it} = (\widehat F_t-F_t)'\lambda_i+\widehat F_t'(\widehat\lambda_i-\lambda_i) = (\widehat F_t-F_t)'\lambda_i+F_t'(\widehat\lambda_i-\lambda_i)+(\widehat F_t-F_t)'(\widehat\lambda_i-\lambda_i). \tag{B.67}$$

Then,

$$\begin{aligned}
|\widehat\chi_{it}-\chi_{it}| &\le \|\lambda_i\|\,\|\widehat F_t-F_t\| + \|\widehat\lambda_i-\lambda_i\|\,\|F_t\| + \|\widehat\lambda_i-\lambda_i\|\,\|\widehat F_t-F_t\| \\
&= O_p\big(\max(n^{-1/2},\; T^{-1}\sqrt{\log n})\big) + O_p\big(\max(T^{-1/2},\; n^{-1}\log^{2/\delta_v}T)\big) + o_p\big(\max(n^{-1/2},T^{-1/2})\big) \\
&= O_p\big(\max(n^{-1/2},T^{-1/2})\big),
\end{aligned}\tag{B.68}$$

by Propositions 2(a) and 3(a), Assumption 1(a), and since $\|F_t\| = O_p(1)$ because $E[F_{jt}^2]=1$, $j=1,\ldots,r$, by Assumption 6(b). This proves part (a).

For part (b), let us denote $\delta_{nT} = \min(\sqrt{n},\sqrt{T})$, for simplicity of notation. Consider the first term on the rhs of (B.67). Define $K_\Lambda = n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}$. Then, from (B.64) in the proof of Proposition 3,

$$\begin{aligned}
\delta_{nT}\lambda_i'(\widehat F_t-F_t) &= \delta_{nT}\,n^{-1}\lambda_i'K_\Lambda\sum_{j=1}^{n}\lambda_j(\sigma_j^2)^{-1}\xi_{jt} + O_p(\delta_{nT}T^{-1}\sqrt{\log n}) \\
&= \delta_{nT}\,n^{-1/2}A_{it}+o_p(1), \text{ say,}
\end{aligned}\tag{B.69}$$

since $\delta_{nT}T^{-1}\le\delta_{nT}\max(n^{-1},T^{-1}) = \delta_{nT}\delta_{nT}^{-2}\to 0$. Similarly, consider the second term on the rhs of (B.67) and define $K_F = (T^{-1}\sum_{t=1}^{T}F_tF_t')^{-1}$. Then, from (B.34) in the proof of Proposition 2,

$$\begin{aligned}
\delta_{nT}F_t'(\widehat\lambda_i-\lambda_i) &= \delta_{nT}\,T^{-1}F_t'K_F\sum_{s=1}^{T}F_s\xi_{is} + O_p(\delta_{nT}n^{-1}\log^{2/\delta_v}T) \\
&= \delta_{nT}\,T^{-1/2}B_{it}+o_p(1), \text{ say,}
\end{aligned}\tag{B.70}$$

since $\delta_{nT}n^{-1}\le\delta_{nT}\max(n^{-1},T^{-1}) = \delta_{nT}\delta_{nT}^{-2}\to 0$. From Propositions 2(b) and 3(b), as $n,T\to\infty$,

$$A_{it}\to_d N(0,C^F_{it}) \quad\text{and}\quad B_{it}\to_d N(0,C^\lambda_{it}), \tag{B.71}$$

where $C^F_{it} = \lambda_i'W_t\lambda_i$ and $C^\lambda_{it} = F_t'V_iF_t$. Moreover, $A_{it}$ and $B_{it}$ are asymptotically independent, since the former is a cross-sectional sum of random variables, while the latter is the sum of a given time series, and under Lemmas C.1(i)-C.1(iii) they are weakly serially and cross-sectionally correlated in the same sense as assumed by Bai (2003, Assumption C). Define $a_{nT} = \delta_{nT}n^{-1/2}$ and $b_{nT} = \delta_{nT}T^{-1/2}$. Then, substituting (B.69) and (B.70) into (B.67), we obtain

$$\begin{aligned}
\delta_{nT}(\widehat\chi_{it}-\chi_{it}) &= a_{nT}A_{it}+b_{nT}B_{it}+\delta_{nT}(\widehat F_t-F_t)'(\widehat\lambda_i-\lambda_i)+o_p(1) \\
&= a_{nT}A_{it}+b_{nT}B_{it}+o_p\big(\delta_{nT}\max(n^{-1/2},T^{-1/2})\big)+o_p(1) \\
&= a_{nT}A_{it}+b_{nT}B_{it}+o_p(1),
\end{aligned}\tag{B.72}$$

because of (B.68). From (B.71) and (B.72), by Slutsky's theorem and following the same reasoning as in Bai (2003, proof of Theorem 3), as $n,T\to\infty$, we have

$$\delta_{nT}\big(a_{nT}^2C^F_{it}+b_{nT}^2C^\lambda_{it}\big)^{-1/2}(\widehat\chi_{it}-\chi_{it}) = \big(n^{-1}C^F_{it}+T^{-1}C^\lambda_{it}\big)^{-1/2}(\widehat\chi_{it}-\chi_{it})\to_d N(0,1),$$

which completes the proof. □
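The limiting result above implies that $\widehat\chi_{it}$ has approximate standard error $(n^{-1}C^F_{it}+T^{-1}C^\lambda_{it})^{1/2}$ with $C^F_{it}=\lambda_i'W_t\lambda_i$ and $C^\lambda_{it}=F_t'V_iF_t$. The sketch below shows how that quantity and a normal-approximation interval could be evaluated. The numerical inputs are made up for illustration, the function name `common_component_se` is hypothetical, and in practice $\lambda_i$, $F_t$, $W_t$, and $V_i$ would have to be replaced by estimates, a step this sketch does not cover.

```python
import numpy as np

def common_component_se(lam_i, F_t, W_t, V_i, n, T):
    """Approximate standard error sqrt(C^F_it / n + C^lambda_it / T)."""
    c_F = lam_i @ W_t @ lam_i      # C^F_it = lambda_i' W_t lambda_i
    c_lam = F_t @ V_i @ F_t        # C^lambda_it = F_t' V_i F_t
    return float(np.sqrt(c_F / n + c_lam / T))

# Made-up inputs, for illustration only
lam_i = np.array([1.0, 0.0])
F_t = np.array([0.0, 2.0])
W_t = np.eye(2)
V_i = np.eye(2)

se = common_component_se(lam_i, F_t, W_t, V_i, n=100, T=400)
chi_hat = float(lam_i @ F_t)
z = 1.959963984540054              # 97.5% quantile of the standard normal
interval = (chi_hat - z * se, chi_hat + z * se)
print(se, interval)
```

Here $C^F_{it}=1$ and $C^\lambda_{it}=4$, so the standard error is $\sqrt{1/100+4/400}=\sqrt{0.02}$; whichever of $n$ and $T$ is smaller relative to its variance term dominates the interval width.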

B.5 Proof of Proposition 5

For part (a.1), for any $k^*\ge 0$, we have (recall that $\widehat\sigma_i^2\equiv\widehat\sigma_i^{2(k^*+1)}$)

$$\begin{aligned}
(\widehat\sigma_i^2-\sigma_i^2) ={}& (\widehat\sigma_i^2-\widehat\sigma_i^{2**}) + (\widehat\sigma_i^{2**}-\widehat\sigma_i^{2*}) + (\widehat\sigma_i^{2*}-\sigma_i^{2\,\mathrm{OLS}}) + (\sigma_i^{2\,\mathrm{OLS}}-\sigma_i^2) \\
={}& O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; T^{-1/2})\big) \\
&+ O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big) \\
&+ O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2},\; T^{-1})\big) + O_p(T^{-1/2}),
\end{aligned}$$

by Lemmas D.19(iii), E.11(iii), E.22(ii), and E.24(i).

For part (a.2), for any $k^*\ge 0$, we have (recall that $\widehat\Sigma_n^{\xi}\equiv\widehat\Sigma_n^{\xi(k^*+1)}$)

$$(\widehat\Sigma_n^{\xi}-\Sigma_n^{\xi}) = (\widehat\Sigma_n^{\xi}-\widehat\Sigma_n^{\xi**}) + (\widehat\Sigma_n^{\xi**}-\widehat\Sigma_n^{\xi*}) + (\widehat\Sigma_n^{\xi*}-\Sigma_n^{\xi}),$$

and the proof follows from Lemma E.14(iii) and since

$$\begin{aligned}
\|\widehat\Sigma_n^{\xi**}-\widehat\Sigma_n^{\xi*}\| &= \Big(\max_{i=1,\ldots,n}|\widehat\sigma_i^{2**}-\widehat\sigma_i^{2*}|^2\Big)^{1/2} \le \max_{i=1,\ldots,n}|\widehat\sigma_i^{2**}-\widehat\sigma_i^{2*}| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1})\big),
\end{aligned}$$

by Lemma E.22(i), and

$$\begin{aligned}
\|\widehat\Sigma_n^{\xi}-\widehat\Sigma_n^{\xi**}\| &= \Big(\max_{i=1,\ldots,n}|\widehat\sigma_i^{2}-\widehat\sigma_i^{2**}|^2\Big)^{1/2} \le \max_{i=1,\ldots,n}|\widehat\sigma_i^{2}-\widehat\sigma_i^{2**}| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; T^{-1/2}\sqrt{\log n})\big),
\end{aligned}$$

by Lemma E.25(ii).

For part (a.3), for any $k^*\ge 0$, we have (recall that $\widehat A\equiv\widehat A^{(k^*+1)}$)

$$\begin{aligned}
(\widehat A-A) ={}& (\widehat A-\widehat A^{**}) + (\widehat A^{**}-\widehat A^{*}) + (\widehat A^{*}-A^{\mathrm{OLS}}) + (A^{\mathrm{OLS}}-A) \\
={}& O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; T^{-1/2})\big) + O_p(n^{-1}\log^{2/\delta_v}T) + O_p(n^{-1}\log^{2/\delta_v}T) + O_p(T^{-1/2}),
\end{aligned}\tag{B.73}$$

by Lemmas D.19(iv), E.12(i), E.22(iii), and E.24(ii).

For part (a.4), for any $k^*\ge 0$, we have (recall that $\widehat\Gamma^v\equiv\widehat\Gamma^{v(k^*+1)}$)

$$\begin{aligned}
(\widehat\Gamma^v-\Gamma^v) ={}& (\widehat\Gamma^v-\widehat\Gamma^{v**}) + (\widehat\Gamma^{v**}-\widehat\Gamma^{v*}) + (\widehat\Gamma^{v*}-\Gamma^{v\,\mathrm{OLS}}) + (\Gamma^{v\,\mathrm{OLS}}-\Gamma^v) \\
={}& O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; T^{-1/2})\big) + O_p(n^{-1}\log^{2/\delta_v}T) + O_p(n^{-1}\log^{2/\delta_v}T) + O_p(T^{-1/2}),
\end{aligned}$$

by Lemmas D.19(v), E.12(ii), E.22(iv), and E.24(iii). This completes the proof of part (a).
To prove part (b), we need sharper rates and, to this end, we use the closed form expressions of the estimated parameters in (14), (15), and (16).

Start with part (b.1). From (14), for any $k^*\ge 0$, we have

$$\begin{aligned}
\widehat\sigma_i^2 ={}& T^{-1}\sum_{t=1}^{T}(x_{it}-\widehat\lambda_i'F_t)^2 \\
&+ \widehat\lambda_i'\Big\{2T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)F_t' + T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)(F^{(k^*)}_{t|T}-F_t)' + T^{-1}\sum_{t=1}^{T}P^{(k^*)}_{t|T}\Big\}\widehat\lambda_i \\
&- 2\widehat\lambda_i'T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)x_{it} \\
={}& T^{-1}\sum_{t=1}^{T}(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t)^2 + (\widehat\lambda_i-\lambda_i^{\mathrm{OLS}})'\Big\{T^{-1}\sum_{t=1}^{T}F_tF_t'\Big\}(\widehat\lambda_i-\lambda_i^{\mathrm{OLS}}) - 2(\widehat\lambda_i-\lambda_i^{\mathrm{OLS}})'T^{-1}\sum_{t=1}^{T}F_t(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t) \\
&+ \widehat\lambda_i'\Big\{2T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)F_t' + T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)(F^{(k^*)}_{t|T}-F_t)' + T^{-1}\sum_{t=1}^{T}P^{(k^*)}_{t|T}\Big\}\widehat\lambda_i \\
&- 2\widehat\lambda_i'T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)x_{it}.
\end{aligned}\tag{B.74}$$

Therefore, since by construction $T^{-1}\sum_{t=1}^{T}F_t(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t)=0$, from (B.74)

$$\begin{aligned}
\Big|\widehat\sigma_i^2-T^{-1}\sum_{t=1}^{T}(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t)^2\Big| \le{}& \|\widehat\lambda_i-\lambda_i^{\mathrm{OLS}}\|^2\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'\Big\| \\
&+ \|\widehat\lambda_i\|^2\Big\{2\Big\|T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)F_t'\Big\| + \Big\|T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)(F^{(k^*)}_{t|T}-F_t)'\Big\| + \Big\|T^{-1}\sum_{t=1}^{T}P^{(k^*)}_{t|T}\Big\|\Big\} \\
&+ 2\|\widehat\lambda_i\|\,\Big\|T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)x_{it}\Big\| \\
={}& \|\widehat\lambda_i-\lambda_i^{\mathrm{OLS}}\|^2\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'\Big\| + \|\widehat\lambda_i\|^2\{2A+B+C\} + 2\|\widehat\lambda_i\|D, \text{ say.}
\end{aligned}\tag{B.75}$$

For term A on the rhs of (B.75) we have

$$A = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big), \tag{B.76}$$

by Lemma G.2(i). Term B is dominated by term A. For term C on the rhs of (B.75) we have

$$C \le \max_{t=1,\ldots,T}\|P^{(k^*)}_{t|T}\| = O_p(n^{-1}), \tag{B.77}$$

by Lemma F.3(iv) when $k^*\ge 1$ and by Lemma D.12 when $k^*=0$. For term D on the rhs of (B.75) we have

$$\begin{aligned}
D &\le \Big\|T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)F_t'\Big\|\,\|\lambda_i\| + \Big\|T^{-1}\sum_{t=1}^{T}(F^{(k^*)}_{t|T}-F_t)\xi_{it}\Big\| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big),
\end{aligned}\tag{B.78}$$

by Lemmas G.2(i) and G.2(ii), and Assumption 1(a). Now by substituting (B.76), (B.77), and (B.78) into (B.75), we have

$$\begin{aligned}
\Big|\widehat\sigma_i^2-T^{-1}\sum_{t=1}^{T}(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t)^2\Big| = |\widehat\sigma_i^2-\sigma_i^{2\,\mathrm{OLS}}| ={}& \|\widehat\lambda_i-\lambda_i^{\mathrm{OLS}}\|^2\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'\Big\| \\
&+ O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big) \\
={}& O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big),
\end{aligned}\tag{B.79}$$

by (B.28)-(B.31) in the proof of Proposition 2, Lemma C.12(i) combined with Assumption 6(b), and since $\|\widehat\lambda_i\|\le\|\widehat\lambda_i-\lambda_i\|+\|\lambda_i\| = O_p(1)$ by Proposition 2 and Assumption 1(a).

Therefore, from (B.79) and Lemma D.19(i), if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$, as $n,T\to\infty$, we have

$$\begin{aligned}
\sqrt{T}(\widehat\sigma_i^2-\sigma_i^2) &= \sqrt{T}(\sigma_i^{2\,\mathrm{OLS}}-\sigma_i^2)+o_p(1) \\
&= T^{-1/2}\sum_{t=1}^{T}\xi_{it}^2 + \sqrt{T}(\lambda_i^{\mathrm{OLS}}-\lambda_i)'\Big\{T^{-1}\sum_{t=1}^{T}F_tF_t'\Big\}(\lambda_i^{\mathrm{OLS}}-\lambda_i) - \sqrt{T}\sigma_i^2+o_p(1) \\
&= T^{-1/2}\sum_{t=1}^{T}\{\xi_{it}^2-\sigma_i^2\}+o_p(1),
\end{aligned}\tag{B.80}$$

where we used also (D.62) in the proof of Lemma D.19. Now, by Assumption 2(c) and Davidson (1994, Theorem 14.1), we have that $\{\xi_{it}^2-\sigma_i^2\}$ is strongly mixing with exponentially decaying coefficients and such that by Assumption 5 $\sup_{m\ge 1}m^{-1/\delta_\xi}(E[|\xi_{it}^2-\sigma_i^2|^m])^{1/m}\le K_1$ for some finite positive real $K_1$ independent of $t$ and $i$ (Kuchibhotla and Chakrabortty, 2022, Section 2). Then, the Central Limit Theorem by Ibragimov (1962, Theorem 1.7) applies, i.e.,

$$T^{-1/2}\sum_{t=1}^{T}\{\xi_{it}^2-\sigma_i^2\}\to_d N\big(0,\ E[\xi_{it}^4]-\sigma_i^4\big). \tag{B.81}$$

From (B.80), (B.81), and Slutsky's Theorem, and by noticing that the excess kurtosis is given by $\kappa_i = E[\xi_{it}^4]/\sigma_i^4-3$, so that $E[\xi_{it}^4]=\sigma_i^4(\kappa_i+3)$, we prove part (b.1).
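Substituting $E[\xi_{it}^4]=\sigma_i^4(\kappa_i+3)$ into the variance in (B.81) gives $E[\xi_{it}^4]-\sigma_i^4=\sigma_i^4(\kappa_i+2)$. The following Monte Carlo sketch checks this value in the simplest possible case. It assumes i.i.d. Gaussian idiosyncratic terms (so $\kappa_i=0$), which is much stronger than the mixing conditions used in the proof; the helper name `asy_var_sigma2` and the simulation sizes are illustrative choices.

```python
import numpy as np

def asy_var_sigma2(sigma2, kappa):
    """Variance in (B.81): E[xi^4] - sigma^4 = sigma^4 * (kappa + 2)."""
    return sigma2 ** 2 * (kappa + 2.0)

rng = np.random.default_rng(1)
sigma2, T, reps = 1.5, 500, 4000

# i.i.d. Gaussian draws (an assumption of this sketch), excess kurtosis 0
xi = rng.normal(scale=np.sqrt(sigma2), size=(reps, T))

# T^{-1/2} sum_t (xi_t^2 - sigma^2), one value per replication
stat = np.sqrt(T) * ((xi ** 2).mean(axis=1) - sigma2)

print("simulated variance:", stat.var())
print("sigma^4 (kappa + 2):", asy_var_sigma2(sigma2, 0.0))
```

The two printed numbers should be close, up to Monte Carlo error; with heavier-tailed idiosyncratic terms $\kappa_i>0$ and the variance of $\widehat\sigma_i^2$ grows accordingly.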
For part (b.2), from (15), for any $k^*\ge 0$, we have

$$\begin{aligned}
\|\widehat A-A^{\mathrm{OLS}}\| \le{}& \Big\|T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t|T}F^{(k^*)\prime}_{t-1|T}+C^{(k^*)}_{t,t-1|T}\big)-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\|\,\Big\|\Big(T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big)^{-1}\Big\| \\
&+ \Big\|\Big\{T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t-1|T}F^{(k^*)\prime}_{t-1|T}+P^{(k^*)}_{t-1|T}\big)\Big\}^{-1}-\Big\{T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big\}^{-1}\Big\|\,\Big\|T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\| \\
&+ \Big\|T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t|T}F^{(k^*)\prime}_{t-1|T}+C^{(k^*)}_{t,t-1|T}\big)-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\| \\
&\quad\cdot\Big\|\Big\{T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t-1|T}F^{(k^*)\prime}_{t-1|T}+P^{(k^*)}_{t-1|T}\big)\Big\}^{-1}-\Big\{T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big\}^{-1}\Big\| \\
={}& A+B+C, \text{ say.}
\end{aligned}\tag{B.82}$$

Now,

$$\begin{aligned}
A &\le \Big\{\Big\|T^{-1}\sum_{t=2}^{T}F^{(k^*)}_{t|T}F^{(k^*)\prime}_{t-1|T}-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\| + \max_{t=2,\ldots,T}\|C^{(k^*)}_{t,t-1|T}\|\Big\}\,\Big\|\Big(T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big)^{-1}\Big\| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big),
\end{aligned}\tag{B.83}$$

and

$$\begin{aligned}
B \le{}& \Big\{\Big\|T^{-1}\sum_{t=2}^{T}F^{(k^*)}_{t-1|T}F^{(k^*)\prime}_{t-1|T}-T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big\| + \max_{t=2,\ldots,T}\|P^{(k^*)}_{t-1|T}\|\Big\} \\
&\cdot\Big\|\Big\{T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t-1|T}F^{(k^*)\prime}_{t-1|T}+P^{(k^*)}_{t-1|T}\big)\Big\}^{-1}\Big\|\,\Big\|\Big(T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big)^{-1}\Big\|\,\Big\|T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\| \\
={}& O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big),
\end{aligned}\tag{B.84}$$

where we used: Lemma G.2(i), Lemma F.3(iv) when $k^*\ge 1$ or Lemma D.12 when $k^*=0$, which can both be applied to $C^{(k^*)}_{t,t-1|T}$ since it can be obtained from the upper-right block of $P^{(k^*)}_{t|T}$ when this is computed from the Kalman smoother having the augmented state vector $(F_t'\ F_{t-1}')'$, and we also used the fact that $\|(T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}')^{-1}\| = O_p(1)$ by Lemma C.13. For B we also used Lemma C.12(i) combined with the fact that $\|\Gamma_1^F\|\le 1$ by the Cauchy-Schwarz inequality and Assumption 6(b). Clearly C is dominated by A and B.

By substituting (B.83) and (B.84) into (B.82), we have

$$\|\widehat A-A^{\mathrm{OLS}}\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big).$$

Therefore, if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$, as $n,T\to\infty$, we have

$$\sqrt{T}(\mathrm{vec}(\widehat A)-\mathrm{vec}(A)) = \sqrt{T}(\mathrm{vec}(A^{\mathrm{OLS}})-\mathrm{vec}(A))+o_p(1),$$

and the proof of part (b.2) follows directly from Hamilton (1994, Proposition 11.2) and Slutsky's Theorem.

For part (b.3), following a decomposition analogous to the one in (B.74), from (16), for any $k^*\ge 0$, we have

$$\begin{aligned}
\widehat\Gamma^v ={}& T^{-1}\sum_{t=2}^{T}(F_t-\widehat AF_{t-1})(F_t-A^{\mathrm{OLS}}F_{t-1})' + (\widehat A-A^{\mathrm{OLS}})\Big\{T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big\}(\widehat A-A^{\mathrm{OLS}})' \\
&- (\widehat A-A^{\mathrm{OLS}})T^{-1}\sum_{t=2}^{T}F_{t-1}(F_t-A^{\mathrm{OLS}\prime}F_{t-1})' - T^{-1}\sum_{t=2}^{T}(F_t-A^{\mathrm{OLS}\prime}F_{t-1})F_{t-1}'(\widehat A-A^{\mathrm{OLS}})' \\
&+ T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t|T}F^{(k^*)\prime}_{t|T}+P^{(k^*)}_{t|T}\big)-T^{-1}\sum_{t=2}^{T}F_tF_t' \\
&+ \widehat A\Big\{T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t-1|T}F^{(k^*)\prime}_{t-1|T}+P^{(k^*)}_{t-1|T}\big)-T^{-1}\sum_{t=2}^{T}F_{t-1}F_{t-1}'\Big\}\widehat A' \\
&- \widehat A\Big\{T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t-1|T}F^{(k^*)\prime}_{t|T}+C^{(k^*)\prime}_{t,t-1|T}\big)-T^{-1}\sum_{t=2}^{T}F_{t-1}F_t'\Big\} \\
&- \Big\{T^{-1}\sum_{t=2}^{T}\big(F^{(k^*)}_{t|T}F^{(k^*)\prime}_{t-1|T}+C^{(k^*)}_{t,t-1|T}\big)-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\}\widehat A'.
\end{aligned}$$

Therefore, since by construction $T^{-1}\sum_{t=2}^{T}F_{t-1}(F_t-A^{\mathrm{OLS}\prime}F_{t-1})' = T^{-1}\sum_{t=2}^{T}(F_t-A^{\mathrm{OLS}\prime}F_{t-1})F_{t-1}' = 0_{r\times r}$, by using again Lemma G.2 and Lemma F.3(iv) when $k^*\ge 1$ or Lemma D.12 when $k^*=0$, which can both be applied to $C^{(k^*)}_{t,t-1|T}$, as argued above, and using also (B.73), we obtain

$$\|\widehat\Gamma^v-\Gamma^{v\,\mathrm{OLS}}\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\; n^{-1/2}T^{-1/2}\sqrt{\log n},\; T^{-1}\sqrt{\log n})\big).$$

Therefore, if $n^{-1}\sqrt{T}\log^{2/\delta_v}T\to 0$, as $n,T\to\infty$, we have

$$\sqrt{T}(\mathrm{vech}(\widehat\Gamma^v)-\mathrm{vech}(\Gamma^v)) = \sqrt{T}(\mathrm{vech}(\Gamma^{v\,\mathrm{OLS}})-\mathrm{vech}(\Gamma^v))+o_p(1),$$

and the proof of part (b.3) follows directly from Hamilton (1994, Proposition 11.2) and Slutsky's Theorem. This completes the proof.
□

B.6 Proof of Corollary 1

For part (a), for ease of notation and without loss of generality let $s(j)=j$ for all $j=1,\ldots,\bar n$. Then, from (B.28), since $\bar n$ is finite,

$$\sqrt{T}(\mathrm{vec}(\widehat\Lambda_{\bar n})-\mathrm{vec}(\Lambda_{\bar n})) = \Big\{I_{\bar n}\otimes\Big(T^{-1}\sum_{t=1}^{T}F_tF_t'\Big)^{-1}\Big\}\Big(T^{-1/2}\sum_{t=1}^{T}\mathrm{vec}(F_t\xi_{\bar nt}')\Big)+o_p(1). \tag{B.85}$$

Moreover, since $\bar n$ is finite we can still apply the Central Limit Theorem by Ibragimov (1962, Theorem 1.4) so that, as $T\to\infty$,

$$T^{-1/2}\sum_{t=1}^{T}\mathrm{vec}(F_t\xi_{\bar nt}')\to_d N(0_{\bar nr},\Sigma_{\bar n}), \tag{B.86}$$

with
\[
\begin{aligned}
\Sigma_{\bar n} &= \lim_{T\to\infty} E\left[\left(T^{-1/2}\sum_{t=1}^T\begin{pmatrix}F_t\xi_{1t}\\ \vdots\\ F_t\xi_{\bar n t}\end{pmatrix}\right)\left(T^{-1/2}\sum_{t=1}^T\begin{pmatrix}F_t\xi_{1t}\\ \vdots\\ F_t\xi_{\bar n t}\end{pmatrix}\right)'\right]\\
&= \lim_{T\to\infty}T^{-1}\sum_{t,s=1}^T E\bigl[\xi_{\bar n t}\xi_{\bar n s}'\otimes F_tF_s'\bigr]
= \lim_{T\to\infty}T^{-1}\sum_{t,s=1}^T E\bigl[\xi_{\bar n t}\xi_{\bar n s}'\bigr]\otimes E\bigl[F_tF_s'\bigr],
\end{aligned} \tag{B.87}
\]
where we used the fact that $\{F_t\}$ and $\{\xi_{nt}\}$ are independent processes because of Lemma C.11. The proof of part (a.1) follows from Lemmas C.12(i) and C.13, (B.85), (B.86), and Slutsky's theorem.

For part (a.2) just notice that, if $E[(\xi_{n1}'\cdots\xi_{nT}')'(\xi_{n1}'\cdots\xi_{nT}')]=I_T\otimes\Sigma^\xi_n$ for all $n,T\in\mathbb N$, then $E[\xi_{\bar n t}\xi_{\bar n s}']=0_{\bar n\times\bar n}$ if $t\neq s$, while $E[\xi_{\bar n t}\xi_{\bar n t}']=\Sigma^\xi_{\bar n}$. By substituting into (B.87) we get $\Sigma_{\bar n}=\Sigma^\xi_{\bar n}\otimes\Gamma^F$, and $V_{\bar n}=(I_{\bar n}\otimes\Gamma^F)^{-1}(\Sigma^\xi_{\bar n}\otimes\Gamma^F)(I_{\bar n}\otimes\Gamma^F)^{-1}=\Sigma^\xi_{\bar n}\otimes(\Gamma^F)^{-1}$, thus proving part (a.2).

Turning to part (b), for ease of notation and without loss of generality, let $s(j)=j$ for all $j=1,\ldots,\bar T$. Then, from (B.64) in the proof of Proposition 3, since $\bar T$ is finite,
\[
\sqrt n\,(\widehat F_{\bar T}-F_{\bar T})=\bigl(I_{\bar T}\otimes n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\bigr)\bigl(I_{\bar T}\otimes n^{-1/2}\Lambda_n'(\Sigma^\xi_n)^{-1}\bigr)\Xi_{n\bar T}+o_p(1), \tag{B.88}
\]
where $\Xi_{n\bar T}=(\xi_{n1}'\cdots\xi_{n\bar T}')'$. Moreover, since $\bar T$ is finite, we can still apply the Central Limit Theorem in Assumption 2(e), so that, as $n\to\infty$,
\[
\bigl(I_{\bar T}\otimes n^{-1/2}\Lambda_n'(\Sigma^\xi_n)^{-1}\bigr)\Xi_{n\bar T}=n^{-1/2}\sum_{i=1}^n\mathrm{vec}\bigl(\lambda_i(\sigma_i^2)^{-1}(\xi_{i1}\cdots\xi_{i\bar T})\bigr)\to_d N(0_{\bar T r},\Omega_{\bar T}), \tag{B.89}
\]
with
\[
\begin{aligned}
\Omega_{\bar T} &= \lim_{n\to\infty}n^{-1}\bigl(I_{\bar T}\otimes\Lambda_n'(\Sigma^\xi_n)^{-1}\bigr)E[\Xi_{n\bar T}\Xi_{n\bar T}']\bigl(I_{\bar T}\otimes(\Sigma^\xi_n)^{-1}\Lambda_n\bigr)\\
&= \lim_{n\to\infty}E\left[\left(n^{-1/2}\sum_{i=1}^n\begin{pmatrix}\lambda_i(\sigma_i^2)^{-1}\xi_{i1}\\ \vdots\\ \lambda_i(\sigma_i^2)^{-1}\xi_{i\bar T}\end{pmatrix}\right)\left(n^{-1/2}\sum_{i=1}^n\begin{pmatrix}\lambda_i(\sigma_i^2)^{-1}\xi_{i1}\\ \vdots\\ \lambda_i(\sigma_i^2)^{-1}\xi_{i\bar T}\end{pmatrix}\right)'\right]\\
&= \lim_{n\to\infty}\frac1n\sum_{i,j=1}^n\bigl\{E[\zeta_{i\bar T}\zeta_{j\bar T}']\otimes\bigl(\lambda_i\lambda_j'(\sigma_i^2\sigma_j^2)^{-1}\bigr)\bigr\},
\end{aligned}
\]
and recall that $\zeta_{i\bar T}=(\xi_{i1}\cdots\xi_{i\bar T})'$. The proof of part (b.1) follows from (B.66) in the proof of Proposition 3, (B.88), (B.89), and Slutsky's theorem.

For part (b.2) just notice that, if $E[(\zeta_{1T}'\cdots\zeta_{nT}')'(\zeta_{1T}'\cdots\zeta_{nT}')]=\Sigma^\xi_n\otimes I_T$ for all $n,T\in\mathbb N$, then $E[\zeta_{i\bar T}\zeta_{j\bar T}']=0_{\bar T\times\bar T}$ if $i\neq j$, while $E[\zeta_{i\bar T}\zeta_{i\bar T}']=\sigma_i^2I_{\bar T}$. By substituting into the expression of $\Omega_{\bar T}$ given after (B.89) we get $\Omega_{\bar T}=I_{\bar T}\otimes\Sigma_{\Lambda\Sigma\Lambda}$, and $W_{\bar T}=(I_{\bar T}\otimes\Sigma_{\Lambda\Sigma\Lambda})^{-1}(I_{\bar T}\otimes\Sigma_{\Lambda\Sigma\Lambda})(I_{\bar T}\otimes\Sigma_{\Lambda\Sigma\Lambda})^{-1}=I_{\bar T}\otimes(\Sigma_{\Lambda\Sigma\Lambda})^{-1}$, thus proving part (b.2). This completes the proof. □

B.7 Proof of Proposition 6

Under Assumptions 1, 2, 3, and 6, from Barigozzi (2023, Theorem 1), the asymptotic covariance of the PC estimator of the loadings is
\[
V_i^{PC}=(\Gamma^F)^{-1}\left(\lim_{T\to\infty}\frac1T\sum_{t,s=1}^T E[\xi_{it}\xi_{is}]\,E[F_tF_s']\right)(\Gamma^F)^{-1}, \tag{B.90}
\]
where $\Gamma^F=I_r$, by Assumption 6(b). Therefore, the expression of $V_i^{PC}$ in (B.90) coincides with the expression of $V_i$ for the asymptotic covariance of the EM estimator given in Proposition 2(b). This proves part (a).
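As a purely illustrative sketch of the double sum appearing in (B.90), suppose that $r=1$ and that both the factor and the idiosyncratic component are stationary AR(1) processes, with a unit-variance factor. These parametric choices, the coefficient names `a` and `rho`, and the function names below are assumptions made only for this example and do not come from the paper. Under them, $E[\xi_{it}\xi_{is}]E[F_tF_s]=\sigma_i^2(a\rho)^{|t-s|}$, so the limit in (B.90) would equal $\sigma_i^2(1+a\rho)/(1-a\rho)$.

```python
def finite_sample_lrv(T, a, rho, sigma2):
    """T^{-1} * sum_{t,s=1}^T E[xi_t xi_s] E[F_t F_s] under the assumed AR(1) setup.

    Assumption (not from the paper): E[F_t F_s] = a**|t-s| and
    E[xi_t xi_s] = sigma2 * rho**|t-s|.
    """
    c = a * rho  # combined geometric decay rate of the product of autocovariances
    # Group the T^2 terms by lag k = t - s: each lag k != 0 occurs (T - |k|) times.
    total = 1.0  # lag 0: T pairs, each weighted 1/T
    for k in range(1, T):
        total += 2.0 * (1.0 - k / T) * c ** k
    return sigma2 * total


def limit_lrv(a, rho, sigma2):
    """Closed-form limit sigma2 * (1 + a*rho) / (1 - a*rho) as T grows."""
    c = a * rho
    return sigma2 * (1.0 + c) / (1.0 - c)


if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, finite_sample_lrv(T, 0.8, 0.5, 1.5), limit_lrv(0.8, 0.5, 1.5))
```

With positive `a * rho` the finite-sample sum approaches the limit from below, which mirrors the way the bounds in Lemma C.1 replace the weights $(1-|k|/T)$ by one.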

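Turning to part (b), since, because of Proposition 3, $\lim_{n,T\to\infty}\sqrt n\,E[(\widehat F_t-F_t)]=0_r$, then
\begin{align*}
W_t &= \lim_{n,T\to\infty}n\,\mathrm{Cov}(\widehat F_t-F_t,\widehat F_t-F_t)
= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widehat F_t-F_t)']-nE[(\widehat F_t-F_t)]E[(\widehat F_t-F_t)']\bigr\}\\
&= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widehat F_t-F_t)']\bigr\} \tag{B.91}\\
&= \lim_{n\to\infty}n\bigl\{(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma^\xi_n)^{-1}\Gamma^\xi_n(\Sigma^\xi_n)^{-1}\Lambda_n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\bigr\}.
\end{align*}
Similarly, because of Lemma H.1, $\lim_{n,T\to\infty}\sqrt n\,E[(\widetilde F_t-F_t)]=0_r$, which implies
\begin{align*}
W_t^{PC} &= \lim_{n,T\to\infty}n\,\mathrm{Cov}(\widetilde F_t-F_t,\widetilde F_t-F_t)
= \lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-F_t)(\widetilde F_t-F_t)']-nE[(\widetilde F_t-F_t)]E[(\widetilde F_t-F_t)']\bigr\}\\
&= \lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-F_t)(\widetilde F_t-F_t)']\bigr\} \tag{B.92}\\
&= \lim_{n\to\infty}n\bigl\{(M^\chi_n)^{-1}\Lambda_n'\Gamma^\xi_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\bigr\}\\
&= \lim_{n\to\infty}n\bigl\{(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\Gamma^\xi_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\bigr\},
\end{align*}
where we used the fact that $\lim_{n\to\infty}n^{-1}M^\chi_n=\lim_{n\to\infty}n^{-1}\Lambda_n'\Lambda_n$, because of Assumption 6(b).

Moreover, if $\sqrt n\log n/T\to0$ (which also implies $\sqrt n/T\to0$), Proposition 3 (see in particular (B.64) in its proof) and Lemma H.1 (see in particular (H.8) in its proof) jointly imply that
\[
\begin{pmatrix}\sqrt n(\widehat F_t-F_t)\\ \sqrt n(\widetilde F_t-F_t)\end{pmatrix}
=\begin{pmatrix}n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\,\dfrac{\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}}{\sqrt n}\\[2ex] n(M^\chi_n)^{-1}\,\dfrac{\Lambda_n'\xi_{nt}}{\sqrt n}\end{pmatrix}+o_p(1)\to_d N\left(0_{2r},\begin{pmatrix}W_t&U_t\\ U_t'&W_t^{PC}\end{pmatrix}\right),
\]
where
\begin{align*}
U_t &= \lim_{n,T\to\infty}n\,\mathrm{Cov}(\widehat F_t-F_t,\widetilde F_t-F_t)
= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widetilde F_t-F_t)']-nE[(\widehat F_t-F_t)]E[(\widetilde F_t-F_t)']\bigr\}\\
&= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widetilde F_t-F_t)']\bigr\} \tag{B.93}\\
&= \lim_{n\to\infty}n\bigl\{(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma^\xi_n)^{-1}\Gamma^\xi_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\bigr\}.
\end{align*}
From (B.91) and (B.92) it follows that
\begin{align*}
W_t^{PC}-W_t &= \lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-\widehat F_t)(\widetilde F_t-\widehat F_t)']\bigr\}\\
&\quad+\lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widetilde F_t-\widehat F_t)']\bigr\}+\lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-\widehat F_t)(\widehat F_t-F_t)']\bigr\}\\
&= C_1+C_2+C_2',\ \text{say}. \tag{B.94}
\end{align*}
Let us now define $O^\xi_n=\Gamma^\xi_n-\Sigma^\xi_n$, which is the $n\times n$ matrix of off-diagonal entries of the idiosyncratic covariance. Because of (B.91), (B.93), and Lemma C.9 we can write
\begin{align*}
C_2 &= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widetilde F_t-\widehat F_t)']\bigr\}\\
&= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widetilde F_t-F_t)']-nE[(\widehat F_t-F_t)(\widehat F_t-F_t)']\bigr\}=U_t-W_t \tag{B.95}\\
&= \lim_{n\to\infty}\Bigl\{n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma^\xi_n)^{-1}\Gamma^\xi_n\bigl[\Lambda_n(\Lambda_n'\Lambda_n)^{-1}-(\Sigma^\xi_n)^{-1}\Lambda_n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\bigr]\Bigr\}\\
&= \lim_{n\to\infty}\Bigl\{n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma^\xi_n)^{-1}O^\xi_n\bigl[\Lambda_n(\Lambda_n'\Lambda_n)^{-1}-(\Sigma^\xi_n)^{-1}\Lambda_n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\bigr]\Bigr\}\\
&= (\Sigma_{\Lambda\Sigma\Lambda})^{-1}\Sigma_\Lambda^{1/2}\lim_{n\to\infty}\bigl\{V^{\chi\prime}_n(\Sigma^\xi_n)^{-1}O^\xi_nV^\chi_n\bigr\}\Sigma_\Lambda^{-1/2}\\
&\quad-(\Sigma_{\Lambda\Sigma\Lambda})^{-1}\Sigma_\Lambda^{1/2}\lim_{n\to\infty}\bigl\{V^{\chi\prime}_n(\Sigma^\xi_n)^{-1}O^\xi_n(\Sigma^\xi_n)^{-1}V^\chi_n\bigr\}\Sigma_\Lambda^{1/2}(\Sigma_{\Lambda\Sigma\Lambda})^{-1}.
\end{align*}
The replacement of $\Gamma^\xi_n$ by $O^\xi_n$ in the fourth line is possible because the term in $\Sigma^\xi_n$ vanishes: $\Lambda_n'(\Sigma^\xi_n)^{-1}\Sigma^\xi_n[\Lambda_n(\Lambda_n'\Lambda_n)^{-1}-(\Sigma^\xi_n)^{-1}\Lambda_n(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}]=I_r-I_r=0_{r\times r}$.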

Now, let $\mathcal V_n=V^{\chi\prime}_n(\Sigma^\xi_n)^{-1}O^\xi_n(\Sigma^\xi_n)^{-1}V^\chi_n$, then, for any $h,k=1,\ldots,r$,
\begin{align*}
[\mathcal V_n]_{hk} &= \sum_{i,j,\ell,m=1}^n[V^{\chi\prime}_n]_{hi}[(\Sigma^\xi_n)^{-1}]_{ij}[O^\xi_n]_{j\ell}[(\Sigma^\xi_n)^{-1}]_{\ell m}[V^\chi_n]_{mk}\\
&= \sum_{\substack{i,\ell=1\\ i\neq\ell}}^n[V^\chi_n]_{ih}[(\Sigma^\xi_n)^{-1}]_{ii}[O^\xi_n]_{i\ell}[(\Sigma^\xi_n)^{-1}]_{\ell\ell}[V^\chi_n]_{\ell k}. \tag{B.96}
\end{align*}
Now, let $\iota_{ni}$ be the $n$-dimensional vector with $i$th entry equal to one and all other entries equal to zero, and let $s_j$ be the $r$-dimensional vector with $j$th entry equal to one and all other entries equal to zero. Then, since $V^\chi_n=\Gamma^\chi_nV^\chi_n(M^\chi_n)^{-1}$, there exists a finite positive integer $\bar n$ such that, for all $n\ge\bar n$,
\begin{align*}
\max_{i=1,\ldots,n}\max_{j=1,\ldots,r}|v^\chi_{ij}| &= \max_{i=1,\ldots,n}\max_{j=1,\ldots,r}|\iota_{ni}'V^\chi_ns_j|\\
&= \max_{i=1,\ldots,n}\max_{j=1,\ldots,r}|\iota_{ni}'\Gamma^\chi_nV^\chi_n(M^\chi_n)^{-1}s_j|\\
&\le \max_{i=1,\ldots,n}\|\iota_{ni}'\Gamma^\chi_n\|\,\|V^\chi_n\|\max_{j=1,\ldots,r}|(\mu^\chi_{jn})^{-1}|\\
&= \max_{i=1,\ldots,n}\|\lambda_i'\Gamma^F\Lambda_n'\|\,n^{-1}\underline C_r^{-1}\\
&\le \max_{i=1,\ldots,n}\|\lambda_i\|\,\|\Gamma^F\|\,\|\Lambda_n\|\,n^{-1}\underline C_r^{-1}\\
&\le M_\lambda^2M_Fn^{-1/2}\underline C_r^{-1}, \tag{B.97}
\end{align*}
where we used Assumptions 1(a) and 1(b), and Lemmas C.1(iv) and C.2. Therefore, from (B.96) and (B.97), and Assumption 2(a),
\begin{align*}
\max_{h,k=1,\ldots,r}|[\mathcal V_n]_{hk}| &\le \Bigl(\max_{h=1,\ldots,r}\max_{i=1,\ldots,n}|[V^\chi_n]_{ih}|\Bigr)^2\Bigl(\max_{i=1,\ldots,n}(\sigma_i^2)^{-1}\Bigr)^2\sum_{i,\ell=1}^n|[O^\xi_n]_{i\ell}|\\
&\le C_v^2\Bigl\{n\Bigl(\min_{i=1,\ldots,n}\sigma_i^2\Bigr)^2\Bigr\}^{-1}\sum_{i,\ell=1}^n|[O^\xi_n]_{i\ell}|\le n^{-1}C_v^2C_\xi^2\sum_{i,\ell=1}^n|[O^\xi_n]_{i\ell}|. \tag{B.98}
\end{align*}
Moreover, since by assumption we have
\[
\lim_{n\to\infty}n^{-1}\sum_{i,\ell=1}^n|[O^\xi_n]_{i\ell}|=\lim_{n\to\infty}n^{-1}\sum_{\substack{i,\ell=1\\ i\neq\ell}}^n|[\Gamma^\xi_n]_{i\ell}|=0,
\]
then, from (B.98), we have $\lim_{n\to\infty}\max_{h,k=1,\ldots,r}|[\mathcal V_n]_{hk}|=0$. Therefore,
\[
\lim_{n\to\infty}\bigl\{V^{\chi\prime}_n(\Sigma^\xi_n)^{-1}O^\xi_n(\Sigma^\xi_n)^{-1}V^\chi_n\bigr\}=0_{r\times r}. \tag{B.99}
\]
Following the same reasoning it also holds that
\[
\lim_{n\to\infty}\bigl\{V^{\chi\prime}_n(\Sigma^\xi_n)^{-1}O^\xi_nV^\chi_n\bigr\}=0_{r\times r}. \tag{B.100}
\]
By substituting (B.99) and (B.100) into (B.95), we obtain $C_2=0_{r\times r}$. Finally,
\begin{align*}
C_1 &= \lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-\widehat F_t)(\widetilde F_t-\widehat F_t)']\bigr\}\\
&= \lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widehat F_t-F_t)']\bigr\}+\lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-F_t)(\widetilde F_t-F_t)']\bigr\}\\
&\quad-\lim_{n,T\to\infty}\bigl\{nE[(\widetilde F_t-F_t)(\widehat F_t-F_t)']\bigr\}-\lim_{n,T\to\infty}\bigl\{nE[(\widehat F_t-F_t)(\widetilde F_t-F_t)']\bigr\}\\
&= W_t+W_t^{PC}-(U_t+U_t').
\end{align*}
Therefore, $C_1$ is positive definite since $W_t$ and $W_t^{PC}$ are positive definite by Assumptions 1(a), 2(a), and 2(f), and, from (B.94), we prove part (b). This completes the proof. □
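To make the comparison between $W_t$ and $W_t^{PC}$ concrete, the sketch below considers the simplest case, $r=1$ with a diagonal idiosyncratic covariance ($\Gamma^\xi_n=\Sigma^\xi_n$), in which the finite-$n$ analogues of the last lines of (B.91) and (B.92) reduce to the scalars $n/\sum_i\lambda_i^2\sigma_i^{-2}$ and $n\sum_i\lambda_i^2\sigma_i^2/(\sum_i\lambda_i^2)^2$. The loadings, variances, and function names are hypothetical values chosen only for illustration; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def wls_variance(loadings, variances):
    """Finite-n analogue of W_t for r = 1 and diagonal covariance:
    n * (Lambda' Sigma^{-1} Lambda)^{-1}."""
    n = len(loadings)
    precision_weighted = sum(Fraction(l) ** 2 / Fraction(v)
                             for l, v in zip(loadings, variances))
    return Fraction(n) / precision_weighted


def pc_variance(loadings, variances):
    """Finite-n analogue of W_t^PC for r = 1 and diagonal covariance:
    n * (Lambda'Lambda)^{-1} Lambda' Sigma Lambda (Lambda'Lambda)^{-1}."""
    n = len(loadings)
    numerator = sum(Fraction(l) ** 2 * Fraction(v)
                    for l, v in zip(loadings, variances))
    denominator = sum(Fraction(l) ** 2 for l in loadings) ** 2
    return n * numerator / denominator


if __name__ == "__main__":
    lam = [1, 2, 1, 3]      # hypothetical loadings
    sig2 = [1, 4, 2, 1]     # hypothetical heteroskedastic variances
    print(wls_variance(lam, sig2), pc_variance(lam, sig2))
    # With equal variances the two expressions coincide.
    print(wls_variance(lam, [2] * 4), pc_variance(lam, [2] * 4))
```

In this scalar setting the ordering between the two quantities is an instance of the Cauchy-Schwarz inequality, and the gap closes when all idiosyncratic variances are equal.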

C General lemmas

Lemma C.1. Under Assumptions 1 and 2:
(i) for all $n\in\mathbb N$ and $T\in\mathbb N$, $(nT)^{-1}\sum_{i,j=1}^n\sum_{t,s=1}^T|E[\xi_{it}\xi_{js}]|\le M_1$, for some finite positive real $M_1$ independent of $n$ and $T$;
(ii) for all $n\in\mathbb N$ and $t\in\mathbb Z$, $n^{-1}\sum_{i,j=1}^n|E[\xi_{it}\xi_{jt}]|\le M_2$, for some finite positive real $M_2$ independent of $n$ and $t$;
(iii) for all $i\in\mathbb N$ and $T\in\mathbb N$, $T^{-1}\sum_{t,s=1}^T|E[\xi_{it}\xi_{is}]|\le M_3$, for some finite positive real $M_3$ independent of $i$ and $T$;
(iv) for all $j=1,\ldots,r$, $\underline C_j\le\liminf_{n\to\infty}n^{-1}\mu^\chi_{jn}\le\limsup_{n\to\infty}n^{-1}\mu^\chi_{jn}\le\overline C_j$, for some finite positive reals $\underline C_j$ and $\overline C_j$;
(v) for all $n\in\mathbb N$, $\mu^\xi_{1n}=\|\Gamma^\xi_n\|\le M_2$, where $M_2$ is defined in part (ii);
(vi) for all $j=1,\ldots,r$, $\underline C_j\le\liminf_{n\to\infty}n^{-1}\mu^x_{jn}\le\limsup_{n\to\infty}n^{-1}\mu^x_{jn}\le\overline C_j$, and for all $n\in\mathbb N$, $\mu^x_{r+1,n}\le M_2$, where $M_2$ is defined in part (ii).

Proof. Using Assumptions 2(a) and 2(b), we have:
\begin{align*}
(nT)^{-1}\sum_{i,j=1}^n\sum_{t,s=1}^T|E[\xi_{it}\xi_{js}]| &= n^{-1}\sum_{i,j=1}^n\sum_{k=-(T-1)}^{T-1}\Bigl(1-\frac{|k|}{T}\Bigr)|E[\xi_{it}\xi_{j,t-k}]|\\
&\le n^{-1}\sum_{i,j=1}^n\sum_{k=-\infty}^{\infty}\rho^{|k|}M_{ij}\\
&= n^{-1}\sum_{i=1}^n\sum_{k=-\infty}^{\infty}\rho^{|k|}M_{ii}+n^{-1}\sum_{\substack{i,j=1\\ i\neq j}}^n\sum_{k=-\infty}^{\infty}\rho^{|k|}M_{ij}\\
&\le n^{-1}\sum_{i=1}^n\sum_{k=-\infty}^{\infty}\rho^{|k|}\sigma_i^2+\max_{i=1,\ldots,n}\sum_{\substack{j=1\\ j\neq i}}^n\sum_{k=-\infty}^{\infty}\rho^{|k|}M_{ij}\\
&\le \frac{C_\xi(1+\rho)}{1-\rho}+\frac{M_\xi(1+\rho)}{1-\rho}.
\end{align*}
Similarly,
\[
n^{-1}\sum_{i,j=1}^n|E[\xi_{it}\xi_{jt}]|\le n^{-1}\sum_{i=1}^n\sigma_i^2+\max_{i=1,\ldots,n}\sum_{\substack{j=1\\ j\neq i}}^nM_{ij}\le C_\xi+M_\xi,
\]
and
\begin{align*}
T^{-1}\sum_{t,s=1}^T|E[\xi_{it}\xi_{is}]| &= \sum_{k=-(T-1)}^{T-1}\Bigl(1-\frac{|k|}{T}\Bigr)|E[\xi_{it}\xi_{i,t-k}]|\\
&\le \sum_{k=-\infty}^{\infty}\rho^{|k|}M_{ii}\le\frac{1+\rho}{1-\rho}\,\sigma_i^2\le\frac{C_\xi(1+\rho)}{1-\rho}.
\end{align*}
Defining $M_1=\frac{(C_\xi+M_\xi)(1+\rho)}{1-\rho}$, $M_2=C_\xi+M_\xi$, and $M_3=\frac{C_\xi(1+\rho)}{1-\rho}$, we prove parts (i), (ii), and (iii).

For part (iv), first notice that, for all $n\in\mathbb N$, the $r$ non-zero eigenvalues of $n^{-1}\Gamma^\chi_n$ are also the $r$ eigenvalues of $n^{-1}\Lambda_n'\Lambda_n\Gamma^F$. Thus, by Merikoski and Kumar (2004, Theorem 7), for all $j=1,\ldots,r$ and all $n\in\mathbb N$, we have
\[
n^{-1}\nu^{(r)}(\Lambda_n'\Lambda_n)\,\nu^{(j)}(\Gamma^F)\le n^{-1}\mu^\chi_{jn}\le n^{-1}\nu^{(j)}(\Lambda_n'\Lambda_n)\,\nu^{(1)}(\Gamma^F).
\]
The proof then follows from Assumptions 1(a) and 1(b). Indeed, by continuity of eigenvalues, Assumption 1(a) implies that, for all $j=1,\ldots,r$ and all $n>N_0$,
\[
n^{-1}\nu^{(j)}(\Lambda_n'\Lambda_n)=\nu^{(j)}(\Sigma_\Lambda), \tag{C.1}
\]
and there exist finite positive reals $m_\lambda$ and $M_\lambda$ such that $0<m_\lambda^2\le\nu^{(r)}(\Sigma_\Lambda)\le\nu^{(1)}(\Sigma_\Lambda)\le M_\lambda^2<\infty$. Similarly, Assumption 1(b) implies that there exist finite positive reals $m_F$ and $M_F$ such that $0<m_F\le\nu^{(r)}(\Gamma^F)\le\nu^{(1)}(\Gamma^F)\le M_F<\infty$.

For part (v), by Assumptions 2(a) and 2(b):
\[
\|\Gamma^\xi_n\|\le\max_{i=1,\ldots,n}\sum_{j=1}^n|E[\xi_{it}\xi_{jt}]|\le\max_{i=1,\ldots,n}\sigma_i^2+\max_{i=1,\ldots,n}\sum_{\substack{j=1\\ j\neq i}}^nM_{ij}\le C_\xi+M_\xi=M_2.
\]

Part (vi) follows from parts (iv) and (v) and Weyl's inequality (Merikoski and Kumar, 2004, Theorem 1). This completes the proof. □

Lemma C.2. Under Assumption 1, as $n\to\infty$, $n^{-1/2}\|\Lambda_n\|=O(1)$.

Proof. By definition and using Assumption 1(a),
\[
\lim_{n\to\infty}n^{-1/2}\|\Lambda_n\|=\lim_{n\to\infty}n^{-1/2}\sqrt{\nu^{(1)}(\Lambda_n'\Lambda_n)}=\sqrt{\nu^{(1)}(\Sigma_\Lambda)}\le M_\lambda.
\]
This completes the proof. □

Lemma C.3. For any $r\times r$ symmetric and positive definite matrix $P$ with $\|P\|\le C_P$ for some finite positive real $C_P$, under Assumptions 1 and 2, as $n\to\infty$,
(i) $n\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}\|=O(1)$;
(ii) $n\|(\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}\|=O(1)$;
(iii) $n\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\|=O(1)$;
(iv) $n\|(\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n)^{-1}\|=O(1)$;
(v) $n^{-1}\|\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n\|=O(1)$;
(vi) $n^{-1}\|\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n\|=O(1)$;
(vii) $n^{-1/2}\|\Lambda_n'(\Sigma^\xi_n)^{-1}\|=O(1)$;
(viii) $n^{-1/2}\|\Lambda_n'(\Gamma^\xi_n)^{-1}\|=O(1)$.

Proof. By Merikoski and Kumar (2004, Theorem 1, which is Weyl's inequality, and Theorem 7),
\begin{align*}
n\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}\| &= \frac{n}{\nu^{(r)}(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})}\le\frac{n}{\nu^{(r)}(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)+\nu^{(r)}(P^{-1})}\\
&\le \frac{n}{\nu^{(r)}(\Lambda_n\Lambda_n')\,\nu^{(n)}((\Sigma^\xi_n)^{-1})+[\nu^{(1)}(P)]^{-1}}\\
&\le \frac{n}{\nu^{(r)}(\Lambda_n'\Lambda_n)[\nu^{(1)}(\Sigma^\xi_n)]^{-1}+[\nu^{(1)}(P)]^{-1}}\\
&\le \frac{n}{\nu^{(r)}(\Lambda_n'\Lambda_n)C_\xi^{-1}+C_P^{-1}}
= \frac{1}{n^{-1}\nu^{(r)}(\Lambda_n'\Lambda_n)C_\xi^{-1}+n^{-1}C_P^{-1}}, \tag{C.2}
\end{align*}
by Assumption 2(a) and since $P$ is finite by assumption. Then, as shown in (C.1) in the proof of Lemma C.1(iv),
\[
\underline C_j\le\liminf_{n\to\infty}n^{-1}\nu^{(j)}(\Lambda_n'\Lambda_n)\le\limsup_{n\to\infty}n^{-1}\nu^{(j)}(\Lambda_n'\Lambda_n)\le\overline C_j,\qquad j=1,\ldots,r. \tag{C.3}
\]
Thus, we have $\lim_{n\to\infty}n^{-1}\nu^{(r)}(\Lambda_n'\Lambda_n)\ge\underline C_r$, which, once substituted in (C.2), proves part (i).

For part (ii) the proof is the same as for part (i), but in (C.2) we use Lemma C.1(v) instead of Assumption 2(a), thus replacing $C_\xi$ with $M_2$. Parts (iii) and (iv) follow by setting $P^{-1}=0_{r\times r}$ in the first step of (C.2).

For part (v) we have
\begin{align*}
n^{-1}\|\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n\| &= n^{-1}\nu^{(1)}(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)=n^{-1}\nu^{(1)}(\Lambda_n\Lambda_n'(\Sigma^\xi_n)^{-1})\\
&\le n^{-1}\nu^{(1)}(\Lambda_n\Lambda_n')\,\nu^{(1)}((\Sigma^\xi_n)^{-1})=\frac{\nu^{(1)}(\Lambda_n'\Lambda_n)}{n\,\nu^{(n)}(\Sigma^\xi_n)}
\le\frac{\nu^{(1)}(\Lambda_n'\Lambda_n)}{n\,C_\xi^{-1}}, \tag{C.4}
\end{align*}
by Assumption 2(a). Then, by (C.3), we have $\lim_{n\to\infty}n^{-1}\nu^{(1)}(\Lambda_n'\Lambda_n)\le\overline C_1$, which, once substituted in (C.4), proves part (v).

For part (vi) the proof is the same as for part (v), but in (C.4) we use Assumption 2(f) instead of Assumption 2(a), thus replacing $C_\xi^{-1}$ with $L_\xi$.

For part (vii),
\[
\|\Lambda_n'(\Sigma^\xi_n)^{-1}\|\le\|\Lambda_n\|\,\|(\Sigma^\xi_n)^{-1}\|\le\|\Lambda_n\|\max_{i=1,\ldots,n}(\sigma_i^2)^{-1}=O(\sqrt n),
\]
by Lemma C.2 and Assumption 2(a). For part (viii) the proof is the same as for part (vii) but using Assumption 2(f) instead of Assumption 2(a). This proves parts (vii) and (viii) and completes the proof. □

Lemma C.4. Given two invertible matrices $K$ and $H$ the following holds: $(H+K)^{-1}=K^{-1}-(H+K)^{-1}HK^{-1}$.

Proof. We have
\begin{align*}
(H+K)^{-1} &= (H+K)^{-1}-K^{-1}+K^{-1}\\
&= (H+K)^{-1}(K-(H+K))K^{-1}+K^{-1}\\
&= (H+K)^{-1}(-H)K^{-1}+K^{-1}\\
&= K^{-1}-(H+K)^{-1}HK^{-1}. \tag{C.5}
\end{align*}
This completes the proof. □

Lemma C.5. For $m<n$ with $m$ independent of $n$ and given
(a) an $m\times m$ matrix $A$ symmetric and positive definite with $\|A\|\le M_A$;
(b) an $n\times n$ matrix $B$ symmetric and positive definite with $\|B\|\le M_B$;
(c) an $n\times m$ matrix $C$ such that $C'C$ is positive definite with $\underline M_{C_j}\le\liminf_{n\to\infty}n^{-1}\nu^{(j)}(C'C)\le\limsup_{n\to\infty}n^{-1}\nu^{(j)}(C'C)\le\overline M_{C_j}$, $j=1,\ldots,m$;
where $M_A$, $M_B$, $\underline M_{C_j}$, and $\overline M_{C_j}$ are finite positive reals independent of $n$ and $m$, then the following holds:
\[
\|(A^{-1}+C'B^{-1}C)^{-1}C'B^{-1}C-I_m\|=O(n^{-1}).
\]

Proof. In Lemma C.4 set $K=C'B^{-1}C$ and $H=A^{-1}$, then
\[
(A^{-1}+C'B^{-1}C)^{-1}=(C'B^{-1}C)^{-1}-(A^{-1}+C'B^{-1}C)^{-1}A^{-1}(C'B^{-1}C)^{-1}, \tag{C.6}
\]
which, after right-multiplying by $C'B^{-1}C$, implies
\[
(A^{-1}+C'B^{-1}C)^{-1}C'B^{-1}C-I_m=-(A^{-1}+C'B^{-1}C)^{-1}A^{-1}. \tag{C.7}
\]
Then, by Weyl's inequality (Merikoski and Kumar, 2004, Theorem 1):
\[
\nu^{(m)}(A^{-1}+C'B^{-1}C)\ge\nu^{(m)}(A^{-1})+\nu^{(m)}(C'B^{-1}C)=\{\nu^{(1)}(A)\}^{-1}+\nu^{(m)}(C'B^{-1}C). \tag{C.8}
\]
From (C.8), we have
\begin{align*}
\|(A^{-1}+C'B^{-1}C)^{-1}A^{-1}\| &\le \|(A^{-1}+C'B^{-1}C)^{-1}\|\,\|A^{-1}\|\\
&= \bigl\{\nu^{(m)}(A^{-1}+C'B^{-1}C)\bigr\}^{-1}\bigl\{\nu^{(m)}(A)\bigr\}^{-1}\\
&\le \bigl\{\nu^{(m)}(C'B^{-1}C)+\{\nu^{(1)}(A)\}^{-1}\bigr\}^{-1}\bigl\{\nu^{(m)}(A)\bigr\}^{-1}. \tag{C.9}
\end{align*}
For the first term on the rhs of (C.9), the $m$ eigenvalues of $C'B^{-1}C$ are also the $m$ largest non-zero eigenvalues of $CC'B^{-1}$, and the $m$ largest non-zero eigenvalues of $CC'$ are also the $m$ eigenvalues of $C'C$. Therefore, because of Merikoski and Kumar (2004, Theorem 7):
\[
n^{-1}\nu^{(m)}(CC'B^{-1})\ge n^{-1}\nu^{(m)}(CC')\,\nu^{(n)}(B^{-1})=n^{-1}\nu^{(m)}(C'C)\bigl\{\nu^{(1)}(B)\bigr\}^{-1}.
\]
Thus, by conditions (b) and (c),
\[
n\bigl\{\nu^{(m)}(C'B^{-1}C)\bigr\}^{-1}\le n\bigl\{\nu^{(m)}(C'C)\bigr\}^{-1}\nu^{(1)}(B)\le\frac{M_B}{\underline M_{C_m}}.
\]

Moreover, by condition (a), $\nu^{(1)}(A)>0$ and $\nu^{(1)}(A)\le M_A$, i.e., $\{\nu^{(1)}(A)\}^{-1}\ge M_A^{-1}>0$, and therefore
\[
n\bigl\{\nu^{(m)}(C'B^{-1}C)+\{\nu^{(1)}(A)\}^{-1}\bigr\}^{-1}\le n\bigl\{\nu^{(m)}(C'B^{-1}C)\bigr\}^{-1}\le\frac{M_B}{\underline M_{C_m}}.
\]
For the second term on the rhs of (C.9), by condition (a),
\[
\bigl\{\nu^{(m)}(A)\bigr\}^{-1}\le\frac{1}{L_A},
\]
for some finite positive real $L_A$. Hence, from (C.9),
\[
n\|(A^{-1}+C'B^{-1}C)^{-1}A^{-1}\|\le\frac{M_B}{\underline M_{C_m}L_A},
\]
and by using it in (C.7) we complete the proof. □

Lemma C.6. For any $r\times r$ symmetric and positive definite matrix $P$ with $\|P\|\le M_P$ for some finite positive real $M_P$, under Assumptions 1 and 2, as $n\to\infty$,
(i) $n\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n-I_r\|=O(1)$;
(ii) $n\|(\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n-I_r\|=O(1)$;
(iii) $n^2\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}-(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\|=O(1)$;
(iv) $n^2\|(\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}-(\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n)^{-1}\|=O(1)$.

Proof. Parts (i) and (ii) follow from Lemma C.5 since: $P$ satisfies condition (a) by assumption; $\Sigma^\xi_n$ satisfies condition (b) because of Assumption 2(a), and $\Gamma^\xi_n$ satisfies condition (b) because of Lemma C.1(v) and Assumption 2(f); and $\Lambda_n$ satisfies condition (c) since $\Lambda_n'\Lambda_n$ is positive definite because of Assumption 1(a) and, moreover, its eigenvalues are such that $\underline C_j\le\liminf_{n\to\infty}n^{-1}\nu^{(j)}(\Lambda_n'\Lambda_n)\le\limsup_{n\to\infty}n^{-1}\nu^{(j)}(\Lambda_n'\Lambda_n)\le\overline C_j$, for $j=1,\ldots,r$, as shown in (C.3) in the proof of Lemma C.3.

Turning to part (iii),
\begin{align*}
&n^2\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}-(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\|\\
&\qquad\le n^2\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n+P^{-1})^{-1}(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)-I_r\|\,\|(\Lambda_n'(\Sigma^\xi_n)^{-1}\Lambda_n)^{-1}\|=O(1),
\end{align*}
by part (i) and Lemma C.3(iii). This proves part (iii). Part (iv) is proved in the same way but using part (ii) and Lemma C.3(iv) instead of part (i) and Lemma C.3(iii). This completes the proof. □

Lemma C.7. Under Assumptions 1 and 2, as $n,T\to\infty$,
(i) $n^{-1/2}\|\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}\|=O_p(1)$, uniformly in $t$;
(ii) $n^{-1/2}\|\Lambda_n'(\Gamma^\xi_n)^{-1}\xi_{nt}\|=O_p(1)$, uniformly in $t$;
(iii) $n^{-1/2}T^{-1/2}\|\sum_{t=1}^T\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}\|=O_p(1)$;
(iv) $n^{-1/2}T^{-1/2}\|\Lambda_n'E_{nT}'\|_F=O_p(1)$;
(v) $n^{-1/2}T^{-1/2}\|\Lambda_n'(\Sigma^\xi_n)^{-1}E_{nT}'\|_F=O_p(1)$;
(vi) $n^{-1/2}T^{-1/2}\|E_{nT}\|_F=O_p(1)$;
where $E_{nT}=(\xi_{n1}\cdots\xi_{nT})'$.

Proof. Throughout, let $\lambda_{ij}$ be the $(i,j)$th entry of $\Lambda_n$. For part (i), we have
\begin{align*}
E\bigl[\|n^{-1/2}\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}\|^2\bigr] &= \sum_{j=1}^rn^{-1}E\Bigl[\Bigl(\sum_{i=1}^n\frac{\lambda_{ij}\xi_{it}}{\sigma_i^2}\Bigr)^2\Bigr]\\
&\le r\max_{j=1,\ldots,r}n^{-1}\sum_{i=1}^n\sum_{k=1}^n\frac{|\lambda_{ij}||\lambda_{kj}|}{\sigma_i^2\sigma_k^2}\,|E[\xi_{it}\xi_{kt}]|\\
&\le rn^{-1}M_\lambda^2C_\xi^2\sum_{i=1}^n\sum_{k=1}^n|E[\xi_{it}\xi_{kt}]|\le rM_\lambda^2C_\xi^2M_2, \tag{C.10}
\end{align*}
where in the third step we used Assumption 1(a) (since $\max_{j=1,\ldots,r}|\lambda_{ij}|\le\|\lambda_i\|\le M_\lambda$, for all $i=1,\ldots,n$) and Assumption 2(a), and in the last step we used Lemma C.1(ii). By Chebychev's inequality and since the constants in (C.10) do not depend on $t$, we prove part (i).

For part (ii), we have
\begin{align*}
E\bigl[\|n^{-1/2}\Lambda_n'(\Gamma^\xi_n)^{-1}\xi_{nt}\|^2\bigr] &= E\bigl[n^{-1}\xi_{nt}'(\Gamma^\xi_n)^{-1}\Lambda_n\Lambda_n'(\Gamma^\xi_n)^{-1}\xi_{nt}\bigr]\\
&= E\bigl[n^{-1}\mathrm{tr}\bigl\{\Lambda_n'(\Gamma^\xi_n)^{-1}\xi_{nt}\xi_{nt}'(\Gamma^\xi_n)^{-1}\Lambda_n\bigr\}\bigr]\\
&= n^{-1}\mathrm{tr}\bigl\{\Lambda_n'(\Gamma^\xi_n)^{-1}E[\xi_{nt}\xi_{nt}'](\Gamma^\xi_n)^{-1}\Lambda_n\bigr\}\\
&= n^{-1}\mathrm{tr}\bigl\{\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n\bigr\}\\
&\le rn^{-1}\nu^{(1)}\bigl(\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n\bigr)
= rn^{-1}\|\Lambda_n'(\Gamma^\xi_n)^{-1}\Lambda_n\|=O(1), \tag{C.11}
\end{align*}
by Lemma C.3(vi). By Chebychev's inequality and since the constants in (C.11) do not depend on $t$, we complete the proof of part (ii).

For part (iii), we have
\begin{align*}
E\Bigl[\Bigl\|n^{-1/2}T^{-1/2}\sum_{t=1}^T\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}\Bigr\|^2\Bigr] &= n^{-1}T^{-1}\sum_{j=1}^rE\Bigl[\Bigl(\sum_{t=1}^T\sum_{i=1}^n\frac{\lambda_{ij}\xi_{it}}{\sigma_i^2}\Bigr)^2\Bigr]\\
&\le r\max_{j=1,\ldots,r}n^{-1}T^{-1}\sum_{t=1}^T\sum_{s=1}^T\sum_{i=1}^n\sum_{k=1}^n\frac{|\lambda_{ij}||\lambda_{kj}|}{\sigma_i^2\sigma_k^2}\,|E[\xi_{it}\xi_{ks}]|\\
&\le rn^{-1}T^{-1}M_\lambda^2C_\xi^2\sum_{t=1}^T\sum_{s=1}^T\sum_{i=1}^n\sum_{k=1}^n|E[\xi_{it}\xi_{ks}]|\le rM_\lambda^2C_\xi^2M_1,
\end{align*}
by Assumptions 1(a), 2(a), and Lemma C.1(i). By Chebychev's inequality we prove part (iii).

For part (iv), we have
\begin{align*}
E\bigl[\|n^{-1/2}T^{-1/2}\Lambda_n'E_{nT}'\|_F^2\bigr] &= n^{-1}T^{-1}\sum_{k=1}^r\sum_{t=1}^TE\Bigl[\Bigl(\sum_{i=1}^n\lambda_{ik}\xi_{it}\Bigr)^2\Bigr]\\
&\le n^{-1}rM_\lambda^2\max_{t=1,\ldots,T}\sum_{i,j=1}^n|E[\xi_{it}\xi_{jt}]|\le rM_\lambda^2M_2,
\end{align*}
by Assumption 1(a) and Lemma C.1(ii). By Chebychev's inequality we prove part (iv). Part (v) is proved as part (iv), but using also Assumption 2(a).

For part (vi), we have
\[
E\bigl[\|n^{-1/2}T^{-1/2}E_{nT}\|_F^2\bigr]=n^{-1}T^{-1}\sum_{t=1}^T\sum_{i=1}^nE[\xi_{it}^2]\le\max_{t=1,\ldots,T}\max_{i=1,\ldots,n}E[\xi_{it}^2]=\max_{i=1,\ldots,n}\sigma_i^2\le C_\xi,
\]
by Assumption 2(a). By Chebychev's inequality we prove part (vi). This completes the proof.
□

Lemma C.8. Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) $\sqrt{nT}\,\|n^{-1}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}F_t'\|_F=O_p(1)$;
(ii) $\sqrt{nT}\,\|n^{-3/2}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}\xi_{nt}'\|_F=O_p(1)$;
(iii) $\sqrt{nT}\,\|n^{-2}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}\xi_{nt}'\Lambda_n\|_F=O_p(1)$;
(iv) $\sqrt{nT}\,\|n^{-1}T^{-1}\sum_{t=1}^T\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}F_t'\|_F=O_p(1)$;
(v) $\sqrt{nT}\,\|n^{-3/2}T^{-1}\sum_{t=1}^T\Lambda_n'(\Sigma^\xi_n)^{-1}\xi_{nt}\xi_{nt}'\|_F=O_p(1)$;
(vi) $\sqrt{nT}\,\|n^{-1}T^{-3/2}\sum_{t=1}^TF_t\xi_{nt}'(\Sigma^\xi_n)^{-1}E_{nT}\|_F=O_p(1)$;
where $E_{nT}=(\xi_{n1}\cdots\xi_{nT})'$.

Proof. Throughout, let $\lambda_{ij}$ be the $(i,j)$th entry of $\Lambda_n$. For part (i),
\begin{align*}
E\Bigl[\Bigl\|n^{-1}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}F_t'\Bigr\|_F^2\Bigr] &= n^{-2}T^{-2}\sum_{t,s=1}^T\sum_{i,j=1}^n\sum_{k,h=1}^r\lambda_{ih}\lambda_{jh}E[\xi_{it}\xi_{js}F_{kt}F_{ks}]\\
&\le n^{-2}T^{-2}M_\lambda^2r^2\max_{k,h=1,\ldots,r}\sum_{t,s=1}^T\sum_{i,j=1}^n|E[\xi_{it}\xi_{js}]|\,|E[F_{kt}F_{ks}]|\\
&\le n^{-1}T^{-1}M_\lambda^2r^2M_1, \tag{C.12}
\end{align*}
by Lemma C.1(i) and because $|E[F_{kt}F_{ks}]|\le1$ by the Cauchy-Schwarz inequality and Assumption 6(b). By Chebychev's inequality we prove part (i).

For part (ii),
\begin{align*}
E\Bigl[\Bigl\|n^{-3/2}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}\xi_{nt}'\Bigr\|_F^2\Bigr] &= \sum_{k=1}^r\sum_{j=1}^nE\Bigl[\Bigl|n^{-3/2}T^{-1}\sum_{t=1}^T\sum_{i=1}^n\lambda_{ik}\xi_{it}\xi_{jt}\Bigr|^2\Bigr]\\
&\le rM_\lambda n^{-1}T^{-1}\max_{j=1,\ldots,n}E\Bigl[\Bigl|n^{-1/2}T^{-1/2}\sum_{t=1}^T\sum_{i=1}^n\xi_{it}\xi_{jt}\Bigr|^2\Bigr]\\
&\le rM_\lambda n^{-2}T^{-2}\max_{j=1,\ldots,n}\sum_{t,s=1}^T\sum_{i,\ell=1}^n|E[\xi_{it}\xi_{jt}\xi_{\ell s}\xi_{js}]|\\
&\le rM_\lambda n^{-1}T^{-1}K_\xi,
\end{align*}
by Assumptions 1(a) and 2(d). By Chebychev's inequality we prove part (ii).

For part (iii), following the proof of part (ii),
\begin{align*}
E\Bigl[\Bigl\|n^{-2}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}\xi_{nt}'\Lambda_n\Bigr\|_F^2\Bigr] &= \sum_{k,h=1}^rE\Bigl[\Bigl|n^{-2}T^{-1}\sum_{t=1}^T\sum_{i,j=1}^n\lambda_{ik}\lambda_{jh}\xi_{it}\xi_{jt}\Bigr|^2\Bigr]\\
&\le rM_\lambda n^{-4}T^{-2}\sum_{t,s=1}^T\sum_{i_1,j_1=1}^n\sum_{i_2,j_2=1}^n|E[\xi_{i_1t}\xi_{j_1t}\xi_{i_2s}\xi_{j_2s}]|\\
&\le rM_\lambda n^{-1}T^{-1}K_\xi,
\end{align*}
by Assumptions 1(a) and 2(d). By Chebychev's inequality we prove part (iii).

Parts (iv) and (v) are proved as parts (i) and (ii), respectively, but using also Assumption 2(a). For part (vi),
\begin{align*}
E\Bigl[\Bigl\|n^{-1}T^{-3/2}\sum_{t=1}^TF_t\xi_{nt}'(\Sigma^\xi_n)^{-1}E_{nT}\Bigr\|_F^2\Bigr] &= n^{-2}T^{-3}\sum_{k=1}^r\sum_{s=1}^TE\Bigl[\Bigl(\sum_{t=1}^T\sum_{i=1}^nF_{kt}\xi_{it}(\sigma_i^2)^{-1}\xi_{is}\Bigr)^2\Bigr]\\
&\le n^{-2}T^{-3}C_\xi^2\sum_{k=1}^r\sum_{s=1}^T\sum_{t_1,t_2=1}^T\sum_{i,j=1}^nE[F_{kt_1}F_{kt_2}\xi_{it_1}\xi_{jt_2}\xi_{is}\xi_{js}]\\
&\le n^{-2}T^{-2}rC_\xi^2\max_{k=1,\ldots,r}\max_{t_1,t_2=1,\ldots,T}|E[F_{kt_1}F_{kt_2}]|\max_{s=1,\ldots,T}\sum_{t_1,t_2=1}^T\sum_{i,j=1}^n|E[\xi_{it_1}\xi_{jt_2}\xi_{is}\xi_{js}]|\\
&\le n^{-1}T^{-1}rC_\xi^2\max_{k=1,\ldots,r}\max_{t=1,\ldots,T}E[F_{kt}^2]\,K_\xi\\
&\le n^{-1}T^{-1}rC_\xi^2K_\xi,
\end{align*}
by Assumptions 2(a) and 2(d), Lemma C.11, and the Cauchy-Schwarz inequality jointly with Assumption 6(b). By Chebychev's inequality we prove part (vi). This completes the proof. □

Lemma C.9. Under Assumptions 1 and 6, for all $n>N_0$: $\|n^{-1/2}\Lambda_n-n^{-1/2}V^\chi_nS(M^\chi_n)^{1/2}\|=0$, for some $r\times r$ positive diagonal matrix $S$ independent of $n$ with entries $I([V^\chi_n]_{1j}\ge0)-I([V^\chi_n]_{1j}<0)$, $j=1,\ldots,r$, and where $N_0$ is defined in Assumption 1(a).

Proof. By Assumption 6(b), for all $n\in\mathbb N$,
\[
n^{-1}\Gamma^\chi_n=n^{-1}\Lambda_n\Lambda_n'=n^{-1}V^\chi_nM^\chi_nV^{\chi\prime}_n. \tag{C.13}
\]
First notice that the $r$ non-zero eigenvalues of $n^{-1}\Gamma^\chi_n$ are the $r$ eigenvalues of $n^{-1}\Lambda_n'\Lambda_n$ and, for all $n>N_0$, $\|\Lambda_n'\Lambda_n-\Sigma_\Lambda\|=0$, by Assumption 1(a). Moreover, $\Sigma_\Lambda$ is diagonal and positive definite by Assumptions 6(b) and 1(a), respectively. Hence, for all $n>N_0$,
\[
n^{-1}M^\chi_n=\Sigma_\Lambda\quad\text{and}\quad n(M^\chi_n)^{-1}=\Sigma_\Lambda^{-1}. \tag{C.14}
\]
Furthermore, it must be that the columns of $\Lambda_n$ span the same space as the columns of $V^\chi_n$. Since the eigenvectors are normalized and, for all $n>N_0$, $\|n^{-1}M^\chi_n-\Sigma_\Lambda\|=0$ by (C.14), there exist two $r\times r$ matrices $K_{1n}$ and $K_{2n}$ such that, for all $n>N_0$,
\[
n^{-1/2}\Lambda_n=V^\chi_nK_{1n}\quad\text{and}\quad V^\chi_n=n^{-1/2}\Lambda_nK_{2n}. \tag{C.15}
\]
Let
\[
K_1=\lim_{n\to\infty}n^{-1/2}(V^{\chi\prime}_nV^\chi_n)^{-1}V^{\chi\prime}_n\Lambda_n=\lim_{n\to\infty}n^{-1/2}V^{\chi\prime}_n\Lambda_n
\]
and
\[
K_2=\lim_{n\to\infty}n(\Lambda_n'\Lambda_n)^{-1}n^{-1/2}\Lambda_n'V^\chi_n=\lim_{n\to\infty}\sqrt n\,(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'V^\chi_n.
\]
Then, by linear projection, from (C.15) we have that
\[
\lim_{n\to\infty}K_{1n}=K_1, \tag{C.16}
\]
which is positive definite since the columns of $V^\chi_n$ are linear combinations of the columns of $\Lambda_n$ so, for all $n>N_0$, $\mathrm{rk}(n^{-1/2}V^{\chi\prime}_n\Lambda_n)=\mathrm{rk}(n^{-1}\Lambda_n'\Lambda_n)=\mathrm{rk}(\Sigma_\Lambda)=r$ by Assumption 1(a). Moreover, for all $n>N_0$, $\|K_1\|\le n^{-1/2}\|\Lambda_n\|$, which is finite by Lemma C.2. Similarly, from (C.15) we also have that
\[
\lim_{n\to\infty}K_{2n}=K_2, \tag{C.17}
\]
which exists and is positive definite, since $\|n(\Lambda_n'\Lambda_n)^{-1}-n\Sigma_\Lambda^{-1}\|=0$, for all $n>N_0$, and $\Sigma_\Lambda$ is finite and positive definite by Assumption 1(a). Moreover, for all $n>N_0$, $\|K_2\|\le n\|(\Lambda_n'\Lambda_n)^{-1}\|\,n^{-1/2}\|\Lambda_n\|=\|\Sigma_\Lambda^{-1}\|\,n^{-1/2}\|\Lambda_n\|$, which is finite by Assumption 1(a) and Lemma C.2.

By using (C.15) into the rhs of (C.13), we get, for all $n>N_0$,
\[
n^{-2}\Lambda_nK_{2n}M^\chi_nK_{2n}'\Lambda_n'=V^\chi_nM^\chi_nV^{\chi\prime}_n,
\]
which, since eigenvectors are normalized, implies that, for all $n>N_0$, we can write
\[
n^{-2}V^{\chi\prime}_n\Lambda_nK_{2n}M^\chi_nK_{2n}'\Lambda_n'V^\chi_n=n^{-1}M^\chi_n. \tag{C.18}
\]
From (C.18) we must have $I_r=\lim_{n\to\infty}n^{-1/2}V^{\chi\prime}_n\Lambda_nK_{2n}=K_1K_2$, so, as expected, $K_1=K_2^{-1}$ and $K_2=K_1^{-1}$. It follows that, for all $n>N_0$,
\[
V^{\chi\prime}_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'V^\chi_n=I_r. \tag{C.19}
\]
Now, from (C.13), we can also write that, for all $n>N_0$,
\[
n^{-1/2}\Lambda_nR_n=n^{-1/2}V^\chi_n(M^\chi_n)^{1/2}, \tag{C.20}
\]
for some $r\times r$ matrix $R_n$. Let
\[
R=\lim_{n\to\infty}(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'V^\chi_n(M^\chi_n)^{1/2}=\lim_{n\to\infty}n^{-1/2}K_{2n}(M^\chi_n)^{1/2}=K_2\lim_{n\to\infty}n^{-1/2}(M^\chi_n)^{1/2}. \tag{C.21}
\]
Hence, for all $n>N_0$, $R=K_2(\Sigma_\Lambda)^{1/2}$ by (C.14). So $R$ is finite by Assumption 1(a) and since $K_2$ is finite.

Moreover, from (C.21),
\begin{align*}
R^{-1} &= \lim_{n\to\infty}\bigl\{\sqrt n\,(M^\chi_n)^{-1/2}\bigr\}K_2^{-1}=\lim_{n\to\infty}\bigl\{\sqrt n\,(M^\chi_n)^{-1/2}\bigr\}K_1\\
&= \lim_{n\to\infty}\sqrt n\,(M^\chi_n)^{-1/2}K_{1n}=\lim_{n\to\infty}(M^\chi_n)^{-1/2}V^{\chi\prime}_n\Lambda_n. \tag{C.22}
\end{align*}
From (C.21) and (C.22), and using Assumption 1(a), we also have
\begin{align*}
R^{-1} &= \lim_{n\to\infty}(M^\chi_n)^{-1/2}V^{\chi\prime}_n\Lambda_n=\lim_{n\to\infty}(M^\chi_n)^{-1/2}(M^\chi_n)^{-1/2}(M^\chi_n)^{1/2}V^{\chi\prime}_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\Lambda_n\\
&= \lim_{n\to\infty}\bigl\{n(M^\chi_n)^{-1}\bigr\}R'\Sigma_\Lambda. \tag{C.23}
\end{align*}
Hence, for all $n>N_0$, by (C.14), $R^{-1}=(\Sigma_\Lambda)^{-1}R'\Sigma_\Lambda$. Thus, $R^{-1}$ is finite by Assumption 1(a) and since $R$ is finite. It follows that $R$ is positive definite. Moreover, $R$ is orthogonal; indeed, from (C.13) and (C.21),
\begin{align*}
RR' &= \lim_{n\to\infty}(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'V^\chi_n(M^\chi_n)^{1/2}(M^\chi_n)^{1/2}V^{\chi\prime}_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\\
&= \lim_{n\to\infty}(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\Gamma^\chi_n\Lambda_n(\Lambda_n'\Lambda_n)^{-1}\\
&= \lim_{n\to\infty}(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\Lambda_n\Lambda_n'\Lambda_n(\Lambda_n'\Lambda_n)^{-1}=I_r. \tag{C.24}
\end{align*}
By substituting (C.24) into (C.23), for all $n>N_0$, by (C.14),
\[
R^{-1}=\Sigma_\Lambda^{-1}R^{-1}\Sigma_\Lambda. \tag{C.25}
\]
By right-multiplying (C.25) by $R$ and left-multiplying by $\Sigma_\Lambda$ we have
\[
\Sigma_\Lambda=R^{-1}\Sigma_\Lambda R,
\]
which implies $R=J$, where $J$ is an $r\times r$ diagonal matrix with entries $\pm1$ independent of $n$. Therefore, from (C.20), for all $n>N_0$,
\[
n^{-1/2}\Lambda_nJ=n^{-1/2}V^\chi_n(M^\chi_n)^{1/2}
\]
or, equivalently, for all $n>N_0$,
\[
n^{-1/2}\Lambda_n=n^{-1/2}V^\chi_nJ(M^\chi_n)^{1/2}.
\]
Finally, by Assumption 6(c) it must be that $J=S$. This completes the proof. □

Lemma C.10. Under Assumptions 1, 2, and 3, as $n\to\infty$, $n^{-1/2}\|x_{nt}\|=O_p(1)$, uniformly in $t$.

Proof. We have
\begin{align*}
E\bigl[\|n^{-1/2}x_{nt}\|^2\bigr] &= n^{-1}\sum_{i=1}^nE[x_{it}^2]=n^{-1}\sum_{i=1}^n\bigl\{\lambda_i'\Gamma^F\lambda_i+\sigma_i^2\bigr\}\\
&\le \max_{i=1,\ldots,n}\lambda_i'\Gamma^F\lambda_i+\max_{i=1,\ldots,n}\sigma_i^2\le M_\lambda^2\|\Gamma^F\|+C_\xi\le M_\lambda^2M_F+C_\xi, \tag{C.26}
\end{align*}
by Assumption 3, Assumptions 1(a) and 1(b), and Assumption 2(a). The result follows by Chebychev's inequality and by noticing that the bound in (C.26) does not depend on $t$. This completes the proof. □

Lemma C.11. Under Assumptions 1 and 3, the processes $\{\xi_{it},\ i\in\mathbb N,\ t\in\mathbb Z\}$ and $\{F_{jt},\ j=1,\ldots,r,\ t\in\mathbb Z\}$ are mutually independent.

Proof.
It is enough to notice that $F_t=\sum_{k=0}^\infty A^kHu_{t-k}$; then, by Assumption 3, we complete the proof. □

Lemma C.12. Under Assumptions 1, 2, and 3, as $n,T\to\infty$,
(i) for $k=0,1$, $T\,E[\|T^{-1}\sum_{t=1}^TF_tF_{t-k}'-\Gamma^F_k\|_F^2]=O(1)$, with $\Gamma^F_k=E[F_tF_{t-k}']$;
(ii) $T\max_{i=1,\ldots,n}E[\|T^{-1}\sum_{t=1}^TF_t\xi_{it}\|^2]=O(1)$;
(iii) $T\,E[\|n^{-1/2}T^{-1}\sum_{t=1}^TF_t\xi_{nt}'\|_F^2]=O(1)$;
(iv) $T\max_{i,j=1,\ldots,n}E[|T^{-1}\sum_{t=1}^T\xi_{it}\xi_{jt}-E[\xi_{it}\xi_{jt}]|^2]=O(1)$;
(v) $T\max_{i,j=1,\ldots,n}E[|T^{-1}\sum_{t=1}^Tx_{it}x_{jt}-E[x_{it}x_{jt}]|^2]=O(1)$;
(vi) $T\,E[\|n^{-1}T^{-1}\sum_{t=1}^T\xi_{nt}\xi_{nt}'-n^{-1}\Gamma^\xi_n\|_F^2]=O(1)$;
(vii) $T\,E[\|n^{-1}T^{-1}\sum_{t=1}^Tx_{nt}x_{nt}'-n^{-1}\Gamma^x_n\|_F^2]=O(1)$.

Proof. Part (i) follows since $\{F_t\}$ is ergodic, because of Assumption 1(d), which implies that $\{F_t\}$ has summable autocovariances, and therefore $\{F_tF_{t-k}'\}$ is also ergodic (White, 2001, Theorem 3.35, and Stout, 1974, pp. 170, 182). In particular, $\Gamma^F_k$ is finite because of Assumptions 1(b), 1(d), and 1(e), and $E[\|F_t\|^4]$ is also finite because of Assumptions 1(d)-1(g). See also Hamilton (1994, Proposition 11.1, pp. 298-299), which can be applied using the fact that $\{v_t\}$ is an independent process by Assumption 1(f) and thus it is a martingale difference process. This proves part (i).

For part (ii), as $T\to\infty$,
\begin{align*}
E\Bigl[\Bigl\|T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr\|^2\Bigr] &= T^{-2}\sum_{j=1}^rE\Bigl[\Bigl(\sum_{t=1}^TF_{jt}\xi_{it}\Bigr)^2\Bigr]=T^{-2}\sum_{j=1}^r\sum_{t,s=1}^TE[F_{jt}\xi_{it}F_{js}\xi_{is}] \tag{C.27}\\
&\le T^{-2}\sum_{j=1}^r\sum_{t,s=1}^T|E[F_{jt}F_{js}]|\,|E[\xi_{it}\xi_{is}]|\\
&\le T^{-2}\sum_{j=1}^r\sum_{t,s=1}^TE[F_{jt}^2]\,|E[\xi_{it}\xi_{is}]|\le T^{-1}rM_FM_3,
\end{align*}
because of Lemma C.1(iii), and where we also used Lemma C.11, the Cauchy-Schwarz inequality, and the fact that $F_{jt}$ is weakly stationary by Assumptions 1(b), 1(d), and 1(e). By noticing that the constants on the rhs of (C.27) do not depend on $i$ we prove part (ii).

Part (iii) follows directly from part (ii); indeed,
\begin{align*}
E\Bigl[\Bigl\|n^{-1/2}T^{-1}\sum_{t=1}^TF_t\xi_{nt}'\Bigr\|_F^2\Bigr] &= n^{-1}T^{-2}\sum_{j=1}^r\sum_{i=1}^nE\Bigl[\Bigl(\sum_{t=1}^TF_{jt}\xi_{it}\Bigr)^2\Bigr]\\
&\le T^{-2}\sum_{j=1}^r\max_{i=1,\ldots,n}E\Bigl[\Bigl(\sum_{t=1}^TF_{jt}\xi_{it}\Bigr)^2\Bigr]\le T^{-1}rM_FM_3.
\end{align*}
For part (iv), notice that, since $\{\xi_{it}\}$ is a strongly mixing process with exponentially decaying coefficients, because of Assumption 2(c), then it is also ergodic (White, 2001, Proposition 3.44, and Rosenblatt, 1972), so that $\{\xi_{it}\xi_{jt}\}$ is also ergodic (White, 2001, Theorem 3.35, and Stout, 1974, pp. 170, 182). In particular, $E[\xi_{it}\xi_{jt}]$ is finite by Assumption 2(a) and $E[|\xi_{it}\xi_{jt}\xi_{is}\xi_{js}|]$ is also finite by Assumption 2(d), and both are bounded by a constant independent of $i$ and $j$. This proves part (iv).

Part (v) follows directly from parts (i), (ii), and (iv). Part (vi) follows directly from part (iv), and part (vii) follows directly from part (v). This completes the proof. □

Lemma C.13. Under Assumptions 1, 2, and 3, as $n,T\to\infty$, $\|(T^{-1}\sum_{t=1}^TF_tF_t')^{-1}\|=O_p(1)$.

Proof. From Lemma C.12(i), and Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality,
\[
\Bigl|\nu^{(r)}\Bigl(T^{-1}\sum_{t=1}^TF_tF_t'\Bigr)-\nu^{(r)}(\Gamma^F)\Bigr|\le\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'-\Gamma^F\Bigr\|=O_p(T^{-1/2}).
\]
This implies (note that $x-y\ge-|x-y|$ for any $x,y\in\mathbb R$)
\begin{align*}
\det\Bigl(T^{-1}\sum_{t=1}^TF_tF_t'\Bigr) &= \prod_{j=1}^r\nu^{(j)}\Bigl(T^{-1}\sum_{t=1}^TF_tF_t'\Bigr)\ge\Bigl\{\nu^{(r)}\Bigl(T^{-1}\sum_{t=1}^TF_tF_t'\Bigr)\Bigr\}^r\\
&\ge \Bigl\{\nu^{(r)}(\Gamma^F)-\Bigl|\nu^{(r)}\Bigl(T^{-1}\sum_{t=1}^TF_tF_t'\Bigr)-\nu^{(r)}(\Gamma^F)\Bigr|\Bigr\}^r>0,
\end{align*}
by Assumption 1(b). Thus, $\|(T^{-1}\sum_{t=1}^TF_tF_t')^{-1}\|=O_p(1)$. This completes the proof. □
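Two of the purely algebraic results of this section, the identity of Lemma C.4 and the $O(n^{-1})$ rate of Lemma C.5, lend themselves to a quick check. The sketch below uses exact rational arithmetic; the $2\times2$ matrices `H` and `K`, the scalar choices $A=a$, $B=bI_n$, $C=(1,\ldots,1)'$ (so that $m=1$), and all helper names are hypothetical values picked for illustration and are not taken from the paper.

```python
from fractions import Fraction as Fr


def inv2(M):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mul2(X, Y):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def add2(X, Y):
    return tuple(tuple(X[i][j] + Y[i][j] for j in range(2)) for i in range(2))


def sub2(X, Y):
    return tuple(tuple(X[i][j] - Y[i][j] for j in range(2)) for i in range(2))


def lemma_c4_sides(H, K):
    """Return both sides of (H+K)^{-1} = K^{-1} - (H+K)^{-1} H K^{-1}."""
    lhs = inv2(add2(H, K))
    rhs = sub2(inv2(K), mul2(mul2(lhs, H), inv2(K)))
    return lhs, rhs


def lemma_c5_gap(n, a=Fr(2), b=Fr(3)):
    """|(A^{-1} + C'B^{-1}C)^{-1} C'B^{-1}C - 1| for m = 1,
    A = a, B = b * I_n and C a vector of n ones (illustrative choice)."""
    s = Fr(n) / b  # C'B^{-1}C = n / b
    return abs(s / (1 / a + s) - 1)


if __name__ == "__main__":
    H = ((Fr(2), Fr(1)), (Fr(1), Fr(3)))
    K = ((Fr(4), Fr(-1)), (Fr(0), Fr(5)))
    lhs, rhs = lemma_c4_sides(H, K)
    print(lhs == rhs)
    for n in (10, 100, 1000):
        # n times the gap stays bounded, consistent with an O(1/n) rate.
        print(n, n * lemma_c5_gap(n))
```

In this scalar configuration $n$ times the gap equals $3n/(3+2n)$, which increases towards $b/a=3/2$, the value of the bound $M_B/(\underline M_{C_m}L_A)$ obtained at the end of the proof of Lemma C.5 when $M_B=b$, $\underline M_{C_m}=1$, and $L_A=a$.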

D Lemmas necessary for proving Proposition 1

Lemma D.1. Consider the initial estimator of the loadings $\widehat\Lambda_n^{(0)}=(\widehat\lambda_1^{(0)}\cdots\widehat\lambda_n^{(0)})'$ defined in Section A.1, then, under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) $\min(n,\sqrt T)\,\|\widehat\lambda_i^{(0)}-\lambda_i\|=O_p(1)$, uniformly in $i$;
(ii) $\min(n,\sqrt T)\,n^{-1/2}\|\widehat\Lambda_n^{(0)}-\Lambda_n\|=O_p(1)$.

Proof. Both results are direct consequences of Barigozzi (2023, Theorem 1); see also Bai (2003, Theorem 2) under similar assumptions. This completes the proof. □

Lemma D.2. Consider the initial estimator of the factors $\widetilde F_t$ defined in Section A.1, then, under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) for $k=0,1$, $\min(n,\sqrt T)\,\|T^{-1}\sum_{t=k+1}^T(\widetilde F_t-F_t)F_{t-k}'\|=O_p(1)$;
(ii) $\min(n,\sqrt T)\,\|n^{-1/2}T^{-1}\sum_{t=1}^T(\widetilde F_t-F_t)\xi_{nt}'\|=O_p(1)$;
(iii) $\min(n,\sqrt T)\,\|n^{-1}T^{-1}\sum_{t=1}^T(\widetilde F_t-F_t)\xi_{nt}'\Lambda_n\|=O_p(1)$;
(iv) $\min(n,\sqrt T)\,\|T^{-1}\sum_{t=1}^T(\widetilde F_t-F_t)\xi_{it}\|=O_p(1)$, uniformly in $i$;
(v) $\min(n,\sqrt T)\,\|T^{-1}\sum_{t=1}^T(\widetilde F_t-F_t)\|=O_p(1)$.

Proof. For part (i), by definition
\begin{align*}
\widetilde F_t-F_t &= (\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\widehat\Lambda_n^{(0)\prime}x_{nt}-F_t\\
&= \bigl\{(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\widehat\Lambda_n^{(0)\prime}\Lambda_n-I_r\bigr\}F_t+\bigl\{(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\widehat\Lambda_n^{(0)\prime}-(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\bigr\}\xi_{nt}\\
&\quad+(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\xi_{nt}. \tag{D.1}
\end{align*}
Using (D.1), for $k=0$ we have
\begin{align*}
\Bigl\|T^{-1}\sum_{t=1}^T(\widetilde F_t-F_t)F_t'\Bigr\| &\le \Bigl\|T^{-1}\sum_{t=1}^T\bigl\{(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\widehat\Lambda_n^{(0)\prime}\Lambda_n-I_r\bigr\}F_tF_t'\Bigr\|\\
&\quad+\Bigl\|T^{-1}\sum_{t=1}^T\sqrt n\bigl\{(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\widehat\Lambda_n^{(0)\prime}-(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\bigr\}n^{-1/2}\xi_{nt}F_t'\Bigr\|\\
&\quad+\Bigl\|T^{-1}\sum_{t=1}^T(\Lambda_n'\Lambda_n)^{-1}\Lambda_n'\xi_{nt}F_t'\Bigr\|\\
&\le \Bigl\{\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\|+\Bigl\|n^{-1/2}T^{-1}\sum_{t=1}^T\xi_{nt}F_t'\Bigr\|\Bigr\}O_p(\max(n^{-1},T^{-1/2}))\\
&\quad+\|n(\Lambda_n'\Lambda_n)^{-1}\|\,\Bigl\|n^{-1}T^{-1}\sum_{t=1}^T\Lambda_n'\xi_{nt}F_t'\Bigr\|\\
&= O_p(\max(n^{-1},T^{-1/2}))+O_p(n^{-1/2}T^{-1/2}), \tag{D.2}
\end{align*}
by Lemma D.1(ii), which does not depend on $t$, and Lemmas C.2 (jointly with Assumption 1(a)), C.12(i), C.12(iii), and C.8(i). The case $k=1$ is proved in the same way and this proves part (i).
For part (ii), as in part (i), we have

$$
\begin{aligned}
\Big\|n^{-1/2}T^{-1}\sum_{t=1}^{T}(\widetilde{F}_t-F_t)\xi_{nt}'\Big\|
&\le \Big\{\Big\|n^{-1/2}T^{-1}\sum_{t=1}^{T}F_t\xi_{nt}'\Big\|+\Big\|n^{-1}T^{-1}\sum_{t=1}^{T}\xi_{nt}\xi_{nt}'\Big\|\Big\}\,O_p(\max(n^{-1},T^{-1/2}))\\
&\quad+\|n(\Lambda_n'\Lambda_n)^{-1}\|\,\Big\|n^{-3/2}T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}\xi_{nt}'\Big\|\\
&= O_p(\max(n^{-1},T^{-1/2}))+O_p(n^{-1/2}T^{-1/2}),
\end{aligned}
$$

by Lemma D.1(ii), which does not depend on $t$, and Lemmas C.2 (jointly with Assumption 1(a)), C.12(iii), and C.12(vi),

and Lemma C.8(ii). This proves part (ii).

Part (iii) follows directly from part (ii) but using Lemma C.8(iii).

For part (iv), as in part (i), we have

$$
\begin{aligned}
\Big\|T^{-1}\sum_{t=1}^{T}(\widetilde{F}_t-F_t)\xi_{it}\Big\|
&\le \Big\{\Big\|T^{-1}\sum_{t=1}^{T}F_t\xi_{it}\Big\|+\Big\|n^{-1/2}T^{-1}\sum_{t=1}^{T}\xi_{nt}\xi_{it}\Big\|\Big\}\,O_p(\max(n^{-1},T^{-1/2}))\\
&\quad+\|n(\Lambda_n'\Lambda_n)^{-1}\|\,\Big\|n^{-1}T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}\xi_{it}\Big\|\\
&= O_p(\max(n^{-1},T^{-1/2}))+O_p(n^{-1/2}T^{-1/2}),
\end{aligned}
$$

by the same arguments used to prove part (ii) and noticing that now $|\xi_{it}|=O_p(1)$ by Assumption 2(a). Part (v) is proved as part (iv) setting $\xi_{it}=1$. This completes the proof. □

Lemma D.3. Consider the initial estimator of the VAR parameters $\widehat{A}^{(0)}$ and $\widehat{\Gamma}^{v(0)}$ defined in Section A.1, then, under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) $\min(n,\sqrt{T})\,\|\widehat{A}^{(0)}-A\| = O_p(1)$;
(ii) $\min(n,\sqrt{T})\,\|\widehat{\Gamma}^{v(0)}-\Gamma^{v}\| = O_p(1)$.

Proof. For part (i), in agreement with Assumption 1(i), we can always set $\widetilde{F}_0=0_r$, so it follows that, by construction, $T^{-1}\sum_{t=2}^{T}\widetilde{F}_{t-1}\widetilde{F}_{t-1}' = T^{-1}\sum_{t=1}^{T}\widetilde{F}_{t-1}\widetilde{F}_{t-1}' = I_r$. Then, by Assumption 6(b), we have

$$
\begin{aligned}
\widehat{A}^{(0)}-A &= T^{-1}\sum_{t=2}^{T}\widetilde{F}_t\widetilde{F}_{t-1}'-\Gamma_1^{F}\\
&= \Big\{T^{-1}\sum_{t=2}^{T}\widetilde{F}_t\widetilde{F}_{t-1}'-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\}
+\Big\{T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'-\Gamma_1^{F}\Big\}.
\end{aligned}
\tag{D.3}
$$

Now,

$$
\begin{aligned}
T^{-1}\sum_{t=2}^{T}\widetilde{F}_t\widetilde{F}_{t-1}'-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'
&= T^{-1}\sum_{t=2}^{T}(\widetilde{F}_t-F_t)F_{t-1}'+T^{-1}\sum_{t=2}^{T}F_t(\widetilde{F}_{t-1}-F_{t-1})'\\
&\quad+T^{-1}\sum_{t=2}^{T}(\widetilde{F}_t-F_t)(\widetilde{F}_{t-1}-F_{t-1})'.
\end{aligned}
\tag{D.4}
$$

Now, by Lemma D.2(i) the first and second term in the rhs of (D.4) are $O_p(\max(n^{-1},T^{-1/2}))$, while the third term on the rhs is dominated by the first two. Hence, for the first term on the rhs of (D.3) we have

$$
\Big\|T^{-1}\sum_{t=2}^{T}\widetilde{F}_t\widetilde{F}_{t-1}'-T^{-1}\sum_{t=2}^{T}F_tF_{t-1}'\Big\| = O_p(\max(n^{-1},T^{-1/2})).
$$

The second term on the rhs of (D.3) is $O_p(T^{-1/2})$, by Lemma C.12(i). This proves part (i).

Part (ii) follows from the results in Forni et al. (2009, Proposition P) combined with part (i). This completes the proof. □

Lemma D.4. Consider the initial estimator of the idiosyncratic variances $\widehat{\sigma}_i^{2(0)}$, $i=1,\ldots,n$, defined in Section A.1, then, under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) $\min(n,\sqrt{T})\,|\widehat{\sigma}_i^{2(0)}-\sigma_i^2| = O_p(1)$, uniformly in $i$;
(ii) $\min(n,\sqrt{T})\,n^{-1}\big|\sum_{i=1}^{n}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)\big| = O_p(1)$.

Proof. Start from

$$
\begin{aligned}
(\widehat{\sigma}_i^{2(0)}-\sigma_i^2) &= T^{-1}\sum_{t=1}^{T}(x_{it}-\widehat{\lambda}_i^{(0)\prime}\widetilde{F}_t)^2-E[(x_{it}-\lambda_i'F_t)^2]\\
&= \Big\{T^{-1}\sum_{t=1}^{T}x_{it}^2-E[x_{it}^2]\Big\}+\Big\{\widehat{\lambda}_i^{(0)\prime}\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\Big\}\\
&\quad-2\Big\{T^{-1}\sum_{t=1}^{T}\lambda_i'F_t\widetilde{F}_t'\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\Big\}
-2\Big\{T^{-1}\sum_{t=1}^{T}\xi_{it}\widetilde{F}_t'\widehat{\lambda}_i^{(0)}\Big\},
\end{aligned}
\tag{D.5}
$$

since $\Gamma^{F}=I_r$ by Assumption 6(b), $E[F_t\xi_{it}]=0_r$ by Lemma C.11, and $T^{-1}\sum_{t=1}^{T}\widetilde{F}_t\widetilde{F}_t'=I_r$ by construction. For part (i), from (D.5) we have

$$
\begin{aligned}
|\widehat{\sigma}_i^{2(0)}-\sigma_i^2|
&\le \Big|T^{-1}\sum_{t=1}^{T}x_{it}^2-E[x_{it}^2]\Big|+\Big|\widehat{\lambda}_i^{(0)\prime}\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\Big|\\
&\quad+2\Big|T^{-1}\sum_{t=1}^{T}\lambda_i'F_tF_t'\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\Big|
+2\Big|T^{-1}\sum_{t=1}^{T}\lambda_i'F_t(\widetilde{F}_t-F_t)'\widehat{\lambda}_i^{(0)}\Big|\\
&\quad+2\Big|T^{-1}\sum_{t=1}^{T}\xi_{it}F_t'\widehat{\lambda}_i^{(0)}\Big|
+2\Big|T^{-1}\sum_{t=1}^{T}\xi_{it}(\widetilde{F}_t-F_t)'\widehat{\lambda}_i^{(0)}\Big|\\
&\le \Big|T^{-1}\sum_{t=1}^{T}x_{it}^2-E[x_{it}^2]\Big|+\Big|\widehat{\lambda}_i^{(0)\prime}\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\Big|\\
&\quad+2\|\lambda_i\|\,\|\widehat{\lambda}_i^{(0)}-\lambda_i\|
+2\|\lambda_i\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'-I_r\Big\|\,\big\{\|\widehat{\lambda}_i^{(0)}-\lambda_i\|+\|\lambda_i\|\big\}\\
&\quad+2\|\lambda_i\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_t(\widetilde{F}_t-F_t)'\Big\|\,\big\{\|\widehat{\lambda}_i^{(0)}-\lambda_i\|+\|\lambda_i\|\big\}\\
&\quad+2\Big\|T^{-1}\sum_{t=1}^{T}\xi_{it}F_t'\Big\|\,\|\lambda_i\|
+2\Big\|T^{-1}\sum_{t=1}^{T}\xi_{it}F_t'\Big\|\,\|\widehat{\lambda}_i^{(0)}-\lambda_i\|\\
&\quad+2\Big\|T^{-1}\sum_{t=1}^{T}\xi_{it}(\widetilde{F}_t-F_t)'\Big\|\,\big\{\|\widehat{\lambda}_i^{(0)}-\lambda_i\|+\|\lambda_i\|\big\}\\
&= O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
\tag{D.6}
$$

where we used multiple times Assumption 1(a) and Lemma D.1(i), and we also used: Lemma C.12(v) for the first term, Lemma C.12(i) for the fourth term, Lemma D.2(i) for the fifth term, Lemma C.12(ii) for the sixth and seventh term, and Lemma D.2(iv) for the last term. This proves part (i).

As for part (ii), from (D.5) we have

$$
\begin{aligned}
n^{-1}\Big|\sum_{i=1}^{n}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)\Big|
&\le n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}x_{it}^2-E[x_{it}^2]\Big\}\Big|
+n^{-1}\Big|\sum_{i=1}^{n}\big\{\widehat{\lambda}_i^{(0)\prime}\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\big\}\Big|\\
&\quad+2n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}\lambda_i'F_tF_t'\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\Big\}\Big|\\
&\quad+2n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}\lambda_i'F_t(\widetilde{F}_t-F_t)'\widehat{\lambda}_i^{(0)}\Big\}\Big|\\
&\quad+2n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}\xi_{it}F_t'\Big\}\widehat{\lambda}_i^{(0)}\Big|
+2n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}\xi_{it}(\widetilde{F}_t-F_t)'\Big\}\widehat{\lambda}_i^{(0)}\Big|\\
&= n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}x_{it}^2-E[x_{it}^2]\Big\}\Big|
+n^{-1}\Big|\sum_{i=1}^{n}\big\{\widehat{\lambda}_i^{(0)\prime}\widehat{\lambda}_i^{(0)}-\lambda_i'\lambda_i\big\}\Big|\\
&\quad+2n^{-1/2}\|\Lambda_n\|\,n^{-1/2}\big\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\big\|
+2n^{-1}\|\Lambda_n'\Lambda_n\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'-I_r\Big\|\\
&\quad+2n^{-1/2}\|\Lambda_n\|\,n^{-1/2}\big\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\big\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_tF_t'-I_r\Big\|\\
&\quad+2n^{-1}\|\Lambda_n'\Lambda_n\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_t(\widetilde{F}_t-F_t)'\Big\|\\
&\quad+2n^{-1/2}\|\Lambda_n\|\,n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,\Big\|T^{-1}\sum_{t=1}^{T}F_t(\widetilde{F}_t-F_t)'\Big\|\\
&\quad+2n^{-1}\Big|T^{-1}\sum_{t=1}^{T}F_t'\Lambda_n'\xi_{nt}\Big|
+2n^{-1}\Big|T^{-1}\sum_{t=1}^{T}F_t'(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}\Big|\\
&\quad+2n^{-1}\Big|T^{-1}\sum_{t=1}^{T}(\widetilde{F}_t-F_t)'\Lambda_n'\xi_{nt}\Big|\\
&\quad+2n^{-1}\Big|T^{-1}\sum_{t=1}^{T}(\widetilde{F}_t-F_t)'(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}\Big|\\
&= O_p(\max(n^{-1},T^{-1/2})).
\end{aligned}
\tag{D.7}
$$

The result in (D.7) follows from repeated use of Lemmas C.2, C.12(i), D.1(ii), and D.2(i) as well as the following results. First,

$$
n^{-1}\Big|\sum_{i=1}^{n}\Big\{T^{-1}\sum_{t=1}^{T}x_{it}^2-E[x_{it}^2]\Big\}\Big|
\le n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}x_{nt}x_{nt}'-E[x_{nt}x_{nt}']\Big\|_F = O_p(T^{-1/2}),
$$

by Lemma C.12(vii). Second,

$$
n^{-1}\Big|T^{-1}\sum_{t=1}^{T}F_t'\Lambda_n'\xi_{nt}\Big|
\le n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}F_t'\Big\|_F = O_p(n^{-1/2}T^{-1/2}),
$$

by Lemma C.8(i).
Third,

$$
\begin{aligned}
n^{-1}\Big|T^{-1}\sum_{t=1}^{T}(\widetilde{F}_t-F_t)'\Lambda_n'\xi_{nt}\Big|
&\le n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}(\widetilde{F}_t-F_t)'\Big\|_F\\
&\le \sqrt{r}\,n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}(\widetilde{F}_t-F_t)'\Big\|\\
&= O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
$$

by Lemma D.2(iii). Fourth,

$$
\begin{aligned}
n^{-1}\Big|T^{-1}\sum_{t=1}^{T}F_t'(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}\Big|
&\le n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}F_t'\Big\|_F\\
&\le \sqrt{r}\,n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}F_t'\Big\|\\
&\le \sqrt{r}\,n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,n^{-1/2}\Big\|T^{-1}\sum_{t=1}^{T}\xi_{nt}F_t'\Big\|\\
&= O_p(\max(n^{-1},T^{-1/2}))+O_p(T^{-1/2}),
\end{aligned}
$$

by Lemmas C.12(iii) and D.1(ii). And, last

$$
\begin{aligned}
n^{-1}\Big|T^{-1}\sum_{t=1}^{T}(\widetilde{F}_t-F_t)'(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}\Big|
&\le n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}(\widetilde{F}_t-F_t)'\Big\|_F\\
&\le \sqrt{r}\,n^{-1}\Big\|T^{-1}\sum_{t=1}^{T}(\widehat{\Lambda}_n^{(0)}-\Lambda_n)'\xi_{nt}(\widetilde{F}_t-F_t)'\Big\|\\
&\le \sqrt{r}\,n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,n^{-1/2}\Big\|T^{-1}\sum_{t=1}^{T}\xi_{nt}(\widetilde{F}_t-F_t)'\Big\|\\
&= O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
$$

by Lemmas C.12(iii) and D.2(ii). This completes the proof. □

Lemma D.5.
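Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) $\min(n,\sqrt{T})\,n^{-1}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\| = O_p(1)$;
(ii) $\min(n,\sqrt{T})\,n^{-1/2}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\| = O_p(1)$;
(iii) $n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\| = O_p(1)$;
(iv) $\min(n,\sqrt{T})\,n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| = O_p(1)$;
(v) $\min(n,\sqrt{T})\,\sqrt{n}\,\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\| = O_p(1)$.

Proof. Start with

$$
\begin{aligned}
n^{-1}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\|
&\le 2n^{-1}\|\{\widehat{\Lambda}_n^{(0)}-\Lambda_n\}'(\Sigma_n^{\xi})^{-1}\Lambda_n\|\\
&\quad+n^{-1}\|\Lambda_n'\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\Lambda_n\|\\
&\quad+2n^{-1}\|\{\widehat{\Lambda}_n^{(0)}-\Lambda_n\}'\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\Lambda_n\|\\
&\quad+n^{-1}\|\{\widehat{\Lambda}_n^{(0)}-\Lambda_n\}'\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\{\widehat{\Lambda}_n^{(0)}-\Lambda_n\}\|\\
&\le 2n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\|\Lambda_n\|\\
&\quad+n^{-1}\|\Lambda_n'\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\Lambda_n\|\\
&\quad+2n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,n^{-1/2}\|\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\Lambda_n\|\\
&\quad+n^{-1}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|^2\,\|(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\|.
\end{aligned}
\tag{D.8}
$$

Consider each term on the rhs of (D.8). The first term is

$$
n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\|\Lambda_n\| = O_p(\max(n^{-1},T^{-1/2})),
\tag{D.9}
$$

by Lemma D.1(ii), and also Assumption 2(a) for which $\|(\Sigma_n^{\xi})^{-1}\|\le C_\xi$, and Lemma C.2 for which $n^{-1/2}\|\Lambda_n\|=O(1)$. The third term is

$$
\begin{aligned}
n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,n^{-1/2}\|\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\Lambda_n\|
&\le n^{-1/2}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|\,\big\{\|(\widehat{\Sigma}_n^{\xi(0)})^{-1}\|+\|(\Sigma_n^{\xi})^{-1}\|\big\}\,n^{-1/2}\|\Lambda_n\|\\
&= O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
\tag{D.10}
$$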

by Lemmas C.2 and D.1(ii), Assumption 2(a), and since, by Lemma D.4(i), for all $j=1,\ldots,n$,

$$
\begin{aligned}
\|(\widehat{\Sigma}_n^{\xi(0)})^{-1}\| &= \big\{\nu^{(n)}(\widehat{\Sigma}_n^{\xi(0)})\big\}^{-1}
= \Big\{\min_{i=1,\ldots,n}\widehat{\sigma}_i^{2(0)}\Big\}^{-1}
= \Big\{\min_{i=1,\ldots,n}\big(\sigma_i^2+\widehat{\sigma}_i^{2(0)}-\sigma_i^2\big)\Big\}^{-1}\\
&\le \Big\{\min_{i=1,\ldots,n}\sigma_i^2+\min_{i=1,\ldots,n}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)\Big\}^{-1}\\
&\le \Big\{\min_{i=1,\ldots,n}\sigma_i^2-\min_{i=1,\ldots,n}|\widehat{\sigma}_i^{2(0)}-\sigma_i^2|\Big\}^{-1}\\
&\le \big\{C_\xi^{-1}-|\widehat{\sigma}_j^{2(0)}-\sigma_j^2|\big\}^{-1}
\le C_\xi+O_p(\max(n^{-1},T^{-1/2})).
\end{aligned}
\tag{D.11}
$$

By the same arguments, the fourth term is

$$
n^{-1}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|^2\,\big\{\|(\widehat{\Sigma}_n^{\xi(0)})^{-1}\|+\|(\Sigma_n^{\xi})^{-1}\|\big\} = o_p(\max(n^{-1},T^{-1/2})).
\tag{D.12}
$$

Finally, the second term on the rhs of (D.8) is

$$
\begin{aligned}
n^{-1}\|\Lambda_n'\{(\widehat{\Sigma}_n^{\xi(0)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\Lambda_n\|
&= n^{-1}\Big\|\sum_{i=1}^{n}\lambda_i\lambda_i'\{(\widehat{\sigma}_i^{2(0)})^{-1}-(\sigma_i^2)^{-1}\}\Big\|\\
&\le n^{-1}\Bigg\{\sum_{k,h=1}^{r}\Big[\sum_{i=1}^{n}\lambda_{ih}\lambda_{ik}\{(\widehat{\sigma}_i^{2(0)})^{-1}-(\sigma_i^2)^{-1}\}\Big]^2\Bigg\}^{1/2}\\
&\le rM_\lambda\,n^{-1}\Big|\sum_{i=1}^{n}\{(\widehat{\sigma}_i^{2(0)})^{-1}-(\sigma_i^2)^{-1}\}\Big|\\
&= rM_\lambda\,n^{-1}\Big|\sum_{i=1}^{n}(\widehat{\sigma}_i^{2(0)})^{-1}(\sigma_i^2)^{-1}\{\widehat{\sigma}_i^{2(0)}-\sigma_i^2\}\Big|\\
&= rM_\lambda C_\xi\,n^{-1}\Big|\Big\{\min_{i=1,\ldots,n}\widehat{\sigma}_i^{2(0)}\Big\}^{-1}\sum_{i=1}^{n}\{\widehat{\sigma}_i^{2(0)}-\sigma_i^2\}\Big|\\
&= rM_\lambda C_\xi^2\,n^{-1}\Big|\sum_{i=1}^{n}\{\widehat{\sigma}_i^{2(0)}-\sigma_i^2\}\Big|\\
&= O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
\tag{D.13}
$$

by Lemma D.4(ii), (D.11), and Assumptions 1(a) and 2(a). By substituting (D.9), (D.10), (D.12), and (D.13) into (D.8), we prove part (i).

For part (ii), we have

$$
\begin{aligned}
n^{-1}\|\widehat{\Lambda}_n^{(0)}(\widehat{\Sigma}_n^{\xi(0)})^{-1}-\Lambda_n(\Sigma_n^{\xi})^{-1}\|^2
&= n^{-1}\Big\|\sum_{i=1}^{n}\widehat{\lambda}_i^{(0)}(\widehat{\sigma}_i^{2(0)})^{-1}-\sum_{i=1}^{n}\lambda_i(\sigma_i^2)^{-1}\Big\|^2\\
&= n^{-1}\Big\|\sum_{i=1}^{n}\widehat{\lambda}_i^{(0)}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2+\sigma_i^2)^{-1}-\sum_{i=1}^{n}\lambda_i(\sigma_i^2)^{-1}\Big\|^2\\
&= n^{-1}\Big\|\sum_{i=1}^{n}\widehat{\lambda}_i^{(0)}\sigma_i^2\{(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)/\sigma_i^2+1\}^{-1}-\sum_{i=1}^{n}\lambda_i(\sigma_i^2)^{-1}\Big\|^2\\
&\le n^{-1}\Big\|\Big\{\min_{i=1,\ldots,n}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)/\sigma_i^2+1\Big\}^{-1}\sum_{i=1}^{n}\widehat{\lambda}_i^{(0)}(\sigma_i^2)^{-1}-\sum_{i=1}^{n}\lambda_i(\sigma_i^2)^{-1}\Big\|^2\\
&= n^{-1}\Big\|\Big\{1-\min_{i=1,\ldots,n}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)/\sigma_i^2+o\Big(\min_{i=1,\ldots,n}(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)/\sigma_i^2\Big)\Big\}\sum_{i=1}^{n}\widehat{\lambda}_i^{(0)}(\sigma_i^2)^{-1}-\sum_{i=1}^{n}\lambda_i(\sigma_i^2)^{-1}\Big\|^2\\
&\le n^{-1}\big\|\widehat{\Lambda}_n^{(0)}(\Sigma_n^{\xi})^{-1}-\Lambda_n(\Sigma_n^{\xi})^{-1}\big\|^2
+\Big\{\min_{i=1,\ldots,n}\big|(\widehat{\sigma}_i^{2(0)}-\sigma_i^2)/\sigma_i^2\big|\Big\}^2\big\|\widehat{\Lambda}_n^{(0)}(\Sigma_n^{\xi})^{-1}\big\|^2\\
&\le n^{-1}\big\|\widehat{\Lambda}_n^{(0)}(\Sigma_n^{\xi})^{-1}-\Lambda_n(\Sigma_n^{\xi})^{-1}\big\|^2
+\Big\{\min_{i=1,\ldots,n}|\widehat{\sigma}_i^{2(0)}-\sigma_i^2|\Big\}^2L_\xi^2\,n^{-1}\|\Lambda_n\|^2\,\|(\Sigma_n^{\xi})^{-1}\|^2\\
&\quad+\Big\{\min_{i=1,\ldots,n}\big|\widehat{\sigma}_i^{2(0)}-\sigma_i^2\big|\Big\}^2L_\xi^2\,n^{-1}\|\widehat{\Lambda}_n^{(0)}-\Lambda_n\|^2\,\|(\Sigma_n^{\xi})^{-1}\|^2\\
&= O_p(\max(n^{-2},T^{-1})),
\end{aligned}
$$

by Lemmas C.2, D.1(ii), D.4(ii), and Assumptions 2(a) and 2(f).
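This proves part (ii).

For part (iii), by part (ii) and Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality, we have

$$
\begin{aligned}
n^{-1}\big|\nu^{(r)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})-\nu^{(r)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)\big|
&\le n^{-1}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\|\\
&= O_p(\max(n^{-1},T^{-1/2})).
\end{aligned}
\tag{D.14}
$$

Moreover (note that $x-y\ge-|x-y|$ for any $x,y\in\mathbb{R}$),

$$
\begin{aligned}
\det(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})
&= \prod_{j=1}^{r}\nu^{(j)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})
\ge \Big\{\nu^{(r)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})\Big\}^{r}\\
&\ge \Big\{\nu^{(r)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)-\big|\nu^{(r)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})-\nu^{(r)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)\big|\Big\}^{r},
\end{aligned}
$$

thus, by Lemma C.3(iv), which implies $\lim_{n\to\infty}n^{-1}\nu^{(r)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)>0$, and (D.14), it follows that, with probability tending to one as $n,T\to\infty$, we have $\det(n^{-1}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})>0$, or, equivalently, $n^{-1}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}$ is positive definite, i.e., $n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|=O_p(1)$. This proves part (iii).

For part (iv), we have

$$
\begin{aligned}
&n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\\
&\quad\le n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\;n^{-1}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\|\;n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\\
&\quad= O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
$$

because of parts (i) and (iii) and Lemma C.3(iii).

Part (v) follows directly from parts (ii) and (iv). This completes the proof. □

Lemma D.6. For all $T\in\mathbb{N}$,
(i) $P_{t|t-1}$, $P_{t|t}$, and $P_{t|T}$ are deterministic $r\times r$ matrices, for all $t=1,\ldots,T$;
(ii) $\|P_{t+1|t}\|\le\|P_{t|t-1}\|$, for all $t=1,\ldots,T-1$;
(iii) $P_{0,t|t-1}$, $P_{0,t|t}$, and $P_{0,t|T}$ are deterministic $r\times r$ matrices, for all $t=1,\ldots,T$;
(iv) $\|P_{0,t+1|t}\|\le\|P_{0,t|t-1}\|$, for all $t=1,\ldots,T-1$.

Proof.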
Since $P_{0|0}=I_r$ is deterministic, then, for all $t=1,\ldots,T$, $P_{t|t-1}$, $P_{t|t}$, and $P_{t|T}$ do not depend on the actual

observations because of (A.2) and (A.4). This proves part (i).

As for part (ii), since $F_{t|t-1}$ is based on less information than $F_{t+1|t}$ and since $P_{t|t-1}$ and $P_{t+1|t}$ are deterministic, then $(P_{t|t-1}-P_{t+1|t})$ is a positive definite matrix for all $t=1,\ldots,T-1$ (see, e.g., Harvey, 1990, Chapter 3.3, p. 123). As a consequence, $\|P_{t+1|t}\|\le\|P_{t|t-1}\|$ for all $t=1,\ldots,T-1$ (see, e.g., Marshall et al., 2011, Proposition L1, p. 360).

The proof of parts (iii) and (iv) is identical to parts (i) and (ii), respectively, since also in this case $P_{0,0|0}=I_r$. This completes the proof. □

Lemma D.7. Under Assumptions 1 and 2, for all $T\in\mathbb{N}$,
(i) $\max_{t=1,\ldots,T}\|P_{t|t-1}\|\le M_P$ for some finite positive real $M_P$;
(ii) $\min_{t=1,\ldots,T}\nu^{(r)}(P_{t|t-1})\ge M_P$ for some finite positive real $M_P$;
(iii) $\max_{t=1,\ldots,T}\|P_{0,t|t-1}\|\le M_P$ for some finite positive real $M_P$;
(iv) $\min_{t=1,\ldots,T}\nu^{(r)}(P_{0,t|t-1})\ge M_P$ for some finite positive real $M_P$.

Proof. Given that $P_{0|0}=I_r$ is obviously positive definite, by (A.2) and Weyl's inequality (Merikoski and Kumar, 2004, Theorem 1) it follows that

$$
\nu^{(r)}(P_{1|0})\ge\nu^{(r)}(AA')+\nu^{(r)}(\Gamma^{v})\ge M_v^{-1},
\tag{D.15}
$$

for some finite positive real $M_v^{-1}$, since $\Gamma^{v}$ has full rank by Assumption 1(e) and $\nu^{(r)}(AA')$ is real and such that $\nu^{(r)}(AA')\ge(\nu^{(r)}(A))^2\ge0$. From $P_{0|0}=I_r$ and (A.2) it follows also that

$$
\|P_{1|0}\|\le\|A\|^2\|P_{0|0}\|+\|\Gamma^{v}\|^2\le M_A^2+M_v^2=M_P,\ \text{say}.
\tag{D.16}
$$

By Lemma D.6(ii) and (D.16), we have

$$
\max_{t=2,\ldots,T}\|P_{t|t-1}\|\le\|P_{1|0}\|\le M_P,
$$

since $M_P$ is independent of $t$ and $P_{t|t-1}$ is deterministic because of Lemma D.6(i). This proves part (i).

For part (ii), by Merikoski and Kumar (2004, Theorems 1 and 7) and (A.2)

$$
\nu^{(r)}(P_{t|t-1})\ge\nu^{(r)}(AA')\,\nu^{(r)}(P_{t|t})+\nu^{(r)}(\Gamma^{v})\ge M_v^{-1},
\tag{D.17}
$$

by the same arguments leading to (D.15) and since $\nu_{\min}(P_{t|t})\ge0$ because $P_{t|t}$ is at least positive semidefinite by construction. By letting $M_P=M_v^{-1}$, we prove part (ii).
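The bounds in parts (i) and (ii) of Lemma D.7 can be illustrated numerically. The minimal Python sketch below iterates the prediction and update covariance recursions in their textbook Kalman filter form, which is an assumption here because (A.2) and (A.4) are not reproduced in this section. All parameter values (`A`, `Gv`, `Lam`, `Sig`, `n`, `r`, `T`) are hypothetical and chosen only for illustration. It checks that the smallest eigenvalue of $P_{t|t-1}$ never falls below that of $\Gamma^{v}$, and that $\|P_{t|t-1}\|$ never exceeds $\|P_{1|0}\|$.

```python
import numpy as np

def prediction_covariances(A, Gv, Lam, Sig, T):
    """Return [P_{1|0}, ..., P_{T|T-1}] starting from P_{0|0} = I_r.

    Assumed textbook recursions (stand-ins for (A.2) and (A.4)):
      P_{t|t-1} = A P_{t-1|t-1} A' + Gv
      P_{t|t}   = P_{t|t-1} - P_{t|t-1} Lam' (Lam P_{t|t-1} Lam' + Sig)^{-1} Lam P_{t|t-1}
    """
    r = A.shape[0]
    P_filt = np.eye(r)
    preds = []
    for _ in range(T):
        P_pred = A @ P_filt @ A.T + Gv
        preds.append(P_pred)
        S = Lam @ P_pred @ Lam.T + Sig
        # K = P_pred Lam' S^{-1}, using symmetry of S and P_pred
        K = np.linalg.solve(S, Lam @ P_pred).T
        P_filt = P_pred - K @ Lam @ P_pred
        P_filt = 0.5 * (P_filt + P_filt.T)  # symmetrize against rounding error
    return preds

# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(0)
n, r, T = 50, 2, 25
A = np.array([[0.5, 0.1], [0.0, 0.3]])
Gv = np.array([[1.0, 0.2], [0.2, 0.5]])
Lam = rng.standard_normal((n, r))
Sig = np.diag(rng.uniform(0.5, 1.5, n))

preds = prediction_covariances(A, Gv, Lam, Sig, T)
min_eigs = [np.linalg.eigvalsh(P).min() for P in preds]
norms = [np.linalg.norm(P, 2) for P in preds]

# Lower bound as in (D.17): smallest eigenvalue is at least that of Gv.
lower_ok = min(min_eigs) >= np.linalg.eigvalsh(Gv).min() - 1e-10
# Upper bound as in part (i): no norm exceeds that of P_{1|0}.
upper_ok = max(norms) <= norms[0] + 1e-10
print(lower_ok, upper_ok)
```

The check is deterministic once the parameters are fixed, in line with Lemma D.6(i): the covariance recursions never involve the observations.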
Parts (iii) and (iv) are proved exactly as parts (i) and (ii), respectively, since also in this case $P_{0,0|0}=I_r$. This completes the proof. □

Lemma D.8. Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$:
(i) $\max_{t=1,\ldots,T}\|P_{t|t-1}^{(0)}\|=O_p(1)$;
(ii) $\max_{t=1,\ldots,T}\|(P_{t|t-1}^{(0)})^{-1}\|=O_p(1)$.

Proof. For part (i),

$$
\begin{aligned}
\max_{t=1,\ldots,T}\|P_{t|t-1}^{(0)}\|
&\le\max_{t=1,\ldots,T}\|P_{t|t-1}\|+\max_{t=1,\ldots,T}\|P_{t|t-1}^{(0)}-P_{t|t-1}\|\\
&= O_p(1)+O_p(\max(n^{-1},T^{-1/2})),
\end{aligned}
$$

by Lemma D.7(i) and since the second term on the rhs depends only on the estimation error of $\widehat{A}^{(0)}$, $\widehat{\Gamma}^{v(0)}$, $n^{-1/2}\widehat{\Lambda}_n^{(0)}$, $n^{-1/2}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}$ and $n^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}$, which are all bounded by Lemmas D.1(ii), D.3, D.5(ii), and D.5(iv). This proves part (i).

Part (ii) is proved in the same way as part (i) but using Lemma D.7(ii). This completes the proof. □

Lemma D.9. For $m<n$ with $m$ independent of $n$ and given
(a) an $n\times n$ matrix $A$ symmetric and positive definite with $\|A\|\le M_A$;
(b) an $m\times m$ matrix $B$ symmetric and positive definite with $\|B\|\le M_B$;
(c) an $n\times m$ matrix $U$ such that $\|n^{-1}U'U\|\le M_U$ and $\mathrm{rk}(U)=m$;
(d) an $m\times n$ matrix $V$ such that $\|n^{-1}VV'\|\le M_V$ and $\mathrm{rk}(V)=m$;
where $M_A$, $M_B$, $M_U$, and $M_V$ are finite positive reals independent of $n$ and $m$, then the following holds
(i) $(A+UBV)^{-1}=A^{-1}-A^{-1}UB(I_m+VA^{-1}UB)^{-1}VA^{-1}$;

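(ii) $(A+UBV)^{-1}=A^{-1}-A^{-1}UBV(I_m+VA^{-1}UBV)^{-1}A^{-1}$.

Proof. Both results are proved by Henderson and Searle (1981, eq. (24) and eq. (25), respectively). □

Lemma D.10. For any $r\times r$ symmetric and positive definite matrix $P$ with $\|P\|\le M_P$ for some finite positive real $M_P$, under Assumptions 1 and 2

$$
P\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n=P\big((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}+P\big)^{-1}.
$$

Proof. In Lemma D.9(i) set $A=\Sigma_n^{\xi}$, $B=I_r$, $U=\Lambda_nP$, and $V=\Lambda_n'$. Then, by noticing that the assumptions of Lemma D.9 are satisfied because of Assumptions 1(a) and 2(a), it follows that:

$$
(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}=(\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\Lambda_nP\big(I_r+\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP\big)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}.
$$

Therefore,

$$
\begin{aligned}
P\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n
&= P\Lambda_n'\Big\{(\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\Lambda_nP\big(I_r+\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP\big)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Big\}\Lambda_n\\
&= P\Big\{\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP\big(I_r+\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP\big)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\Big\}.
\end{aligned}
\tag{D.18}
$$

Now, in Lemma D.9(ii) set $A=(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}$, $B=P$, $U=I_r$, and $V=I_r$, and notice that the assumptions therein are satisfied because of Lemmas C.3(iii) and C.3(v). Then, for the last line of (D.18) we have

$$
\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP\big(I_r+\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP\big)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n
=\big((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}+P\big)^{-1}.
\tag{D.19}
$$

By substituting (D.19) into (D.18) we complete the proof. □

Lemma D.11. Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$, $\max_{t=1,\ldots,T}n\|P_{t|t}^{(0)}\|=O_p(1)$.

Proof. From (A.4) by using Lemma D.10, but with $\widehat{\Lambda}_n^{(0)}$ and $\widehat{\Sigma}_n^{\xi(0)}$ in place of $\Lambda_n$ and $\Sigma_n^{\xi}$, it holds that:

$$
\begin{aligned}
P_{t|t}^{(0)} &= P_{t|t-1}^{(0)}-P_{t|t-1}^{(0)}\widehat{\Lambda}_n^{(0)\prime}\big(\widehat{\Lambda}_n^{(0)}P_{t|t-1}^{(0)}\widehat{\Lambda}_n^{(0)\prime}+\widehat{\Sigma}_n^{\xi(0)}\big)^{-1}\widehat{\Lambda}_n^{(0)}P_{t|t-1}^{(0)}\\
&= \Big\{I_r-P_{t|t-1}^{(0)}\widehat{\Lambda}_n^{(0)\prime}\big(\widehat{\Lambda}_n^{(0)}P_{t|t-1}^{(0)}\widehat{\Lambda}_n^{(0)\prime}+\widehat{\Sigma}_n^{\xi(0)}\big)^{-1}\widehat{\Lambda}_n^{(0)}\Big\}P_{t|t-1}^{(0)}\\
&= \Big\{I_r-P_{t|t-1}^{(0)}\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}\Big\}P_{t|t-1}^{(0)}.
\end{aligned}
\tag{D.20}
$$

Then, by setting in Lemma C.4 $K=P_{t|t-1}^{(0)}$ and $H=(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}$, for the last line of (D.20) we have

$$
\begin{aligned}
&\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}=(P_{t|t-1}^{(0)})^{-1}\\
&\qquad-\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}(P_{t|t-1}^{(0)})^{-1}.
\end{aligned}
\tag{D.21}
$$

By substituting (D.21) into (D.20) we get

$$
P_{t|t}^{(0)}=P_{t|t-1}^{(0)}\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}.
\tag{D.22}
$$

Finally, by using again (D.21) into (D.22)

$$
\begin{aligned}
P_{t|t}^{(0)} &= P_{t|t-1}^{(0)}\Big\{(P_{t|t-1}^{(0)})^{-1}-\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}(P_{t|t-1}^{(0)})^{-1}\Big\}\\
&\qquad\cdot(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\\
&= \Big\{I_r-P_{t|t-1}^{(0)}\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}(P_{t|t-1}^{(0)})^{-1}\Big\}\\
&\qquad\cdot(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\\
&= (\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\\
&\qquad-P_{t|t-1}^{(0)}\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\\
&\qquad\cdot(P_{t|t-1}^{(0)})^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}.
\end{aligned}
\tag{D.23}
$$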
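The algebra leading from Lemma D.10 to (D.22) can be checked numerically. The following minimal Python sketch uses hypothetical inputs: a random loading matrix `Lam`, a diagonal `Sig`, and an arbitrary positive definite `P` standing in for $P_{t|t-1}^{(0)}$. It verifies the identity of Lemma D.10 and confirms that the update formula in the first line of (D.20) matches the compact form (D.22). It also checks that $\|P_{t|t}\|\le\|H\|$ with $H=(\Lambda'\Sigma^{-1}\Lambda)^{-1}$, which is what drives the $O_p(n^{-1})$ order of $P_{t|t}^{(0)}$.

```python
import numpy as np

# Hypothetical inputs, for illustration only.
rng = np.random.default_rng(1)
n, r = 40, 3
Lam = rng.standard_normal((n, r))        # stand-in for the loadings
sig2 = rng.uniform(0.5, 1.5, n)          # stand-in idiosyncratic variances
Sig = np.diag(sig2)
M = rng.standard_normal((r, r))
P = M @ M.T + np.eye(r)                  # arbitrary positive definite P_{t|t-1}

# H = (Lam' Sig^{-1} Lam)^{-1}
H = np.linalg.inv(Lam.T @ (Lam / sig2[:, None]))

# Lemma D.10: P Lam' (Lam P Lam' + Sig)^{-1} Lam = P (H + P)^{-1}
lhs = P @ Lam.T @ np.linalg.solve(Lam @ P @ Lam.T + Sig, Lam)
rhs = P @ np.linalg.inv(H + P)

# First line of (D.20) versus the compact form (D.22)
P_filt_direct = P - lhs @ P
P_filt_d22 = P @ np.linalg.inv(H + P) @ H

print(np.allclose(lhs, rhs), np.allclose(P_filt_direct, P_filt_d22))
print(np.linalg.norm(P_filt_d22, 2) <= np.linalg.norm(H, 2) + 1e-12)
```

Since $P(H+P)^{-1}H=(H^{-1}+P^{-1})^{-1}$, the filtered covariance is dominated by $H$ in the positive semidefinite order, so the norm comparison in the last line holds for any positive definite `P`.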

Notice that we could use Lemmas C.4 and D.10 to derive (D.23) since all inverses used are well defined because of Lemmas D.5(iii), D.8(i), and D.8(ii) and (D.11) in the proof of Lemma D.5. Therefore, from (D.23)

$$
\begin{aligned}
\max_{t=1,\ldots,T}n\|P_{t|t}^{(0)}\|
&\le n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|\\
&\quad+\max_{t=1,\ldots,T}\|P_{t|t-1}^{(0)}\|\,\max_{t=1,\ldots,T}\|(P_{t|t-1}^{(0)})^{-1}\|\,n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\|^2\\
&\qquad\cdot\big\|\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}\big\|\\
&= O_p(1)+O_p(n^{-1}),
\end{aligned}
$$

because of Lemmas D.5(iii), D.8(i), and D.8(ii), and since, by Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality,

$$
\begin{aligned}
\big\|\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)^{-1}\big\|
&= \Big\{\nu^{(r)}\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+P_{t|t-1}^{(0)}\big)\Big\}^{-1}\\
&\le \Big\{\nu^{(r)}\big((\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\big)+\nu^{(r)}(P_{t|t-1}^{(0)})\Big\}^{-1}\\
&= \Big\{\big[\nu^{(1)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})\big]^{-1}+\nu^{(r)}(P_{t|t-1}^{(0)})\Big\}^{-1}\\
&= \Big\{\big[\nu^{(1)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})\,\nu^{(r)}(P_{t|t-1}^{(0)})\big]^{-1}+1\Big\}^{-1}\big\{\nu^{(r)}(P_{t|t-1}^{(0)})\big\}^{-1}\\
&= \Big\{1-\big[\nu^{(1)}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})\,\nu^{(r)}(P_{t|t-1}^{(0)})\big]^{-1}\Big\}\big\{\nu^{(r)}(P_{t|t-1}^{(0)})\big\}^{-1}+O_p(n^{-2})\\
&= O_p(1),
\end{aligned}
\tag{D.24}
$$

again by Lemmas D.5(iii), D.8(i), and D.8(ii). This completes the proof. □

Lemma D.12. Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$, $\max_{t=1,\ldots,T}n\|P_{t|T}^{(0)}\|=O_p(1)$.

Proof. From (A.7), we get

$$
\|P_{t|T}^{(0)}-P_{t|t}^{(0)}\|\le\|P_{t|t}^{(0)}\|^2\,\|\widehat{A}^{(0)}\|^2\,\|(P_{t+1|t}^{(0)})^{-1}\|^2\,\big\{\|P_{t+1|T}^{(0)}\|+\|P_{t+1|t}^{(0)}\|\big\}.
\tag{D.25}
$$

Start with $t=T-1$, then from (D.25),

$$
\begin{aligned}
\|P_{T-1|T}^{(0)}-P_{T-1|T-1}^{(0)}\|
&\le\|P_{T-1|T-1}^{(0)}\|^2\,\|\widehat{A}^{(0)}\|^2\,\|(P_{T|T-1}^{(0)})^{-1}\|^2\,\big\{\|P_{T|T}^{(0)}\|+\|P_{T|T-1}^{(0)}\|\big\}\\
&= O_p(n^{-2}),
\end{aligned}
\tag{D.26}
$$

by Lemmas D.8(i), D.8(ii), and D.11, and since $\|\widehat{A}^{(0)}\|\le\|A\|+\|\widehat{A}^{(0)}-A\|=O_p(1)$, by Assumption 1(d) and Lemma D.3(i). From (D.26) it follows that

$$
\|P_{T-1|T}^{(0)}\|\le\|P_{T-1|T-1}^{(0)}\|+\|P_{T-1|T}^{(0)}-P_{T-1|T-1}^{(0)}\|=O_p(n^{-1})+O_p(n^{-2}).
\tag{D.27}
$$

Thus, at $t=T-2$, from (D.25) and (D.27),

$$
\begin{aligned}
\|P_{T-2|T}^{(0)}-P_{T-2|T-2}^{(0)}\|
&\le\|P_{T-2|T-2}^{(0)}\|^2\,\|\widehat{A}^{(0)}\|^2\,\|(P_{T-1|T-2}^{(0)})^{-1}\|^2\,\big\{\|P_{T-1|T}^{(0)}\|+\|P_{T-1|T-2}^{(0)}\|\big\}\\
&= O_p(n^{-2}).
\end{aligned}
\tag{D.28}
$$

From (D.28) it follows that

$$
\|P_{T-2|T}^{(0)}\|\le\|P_{T-2|T-2}^{(0)}\|+\|P_{T-2|T}^{(0)}-P_{T-2|T-2}^{(0)}\|=O_p(n^{-1})+O_p(n^{-2}).
\tag{D.29}
$$

Since all the bounds in (D.26)-(D.29) are the same for all $t$, from Lemma D.11 and (D.25) we have

$$
\max_{t=1,\ldots,T}n\|P_{t|T}^{(0)}\|\le\max_{t=1,\ldots,T}n\|P_{t|t}^{(0)}\|+\max_{t=1,\ldots,T}n\|P_{t|T}^{(0)}-P_{t|t}^{(0)}\|=O_p(1)+O_p(n^{-1}).
$$

This completes the proof. □

Lemma D.13. For $m<n$, and given symmetric positive definite matrices $A$ of dimension $m\times m$ and $B$ of dimension $n\times n$, and for $C$ of dimension $n\times m$ with $\mathrm{rk}(C)=m$, the following holds

$$
AC'(CAC'+B)^{-1}=(A^{-1}+C'B^{-1}C)^{-1}C'B^{-1}.
\tag{D.30}
$$

Proof. Recall the Woodbury formula

$$
(CAC'+B)^{-1}=B^{-1}-B^{-1}C(A^{-1}+C'B^{-1}C)^{-1}C'B^{-1}.
\tag{D.31}
$$

Denote $D=(A^{-1}+C'B^{-1}C)^{-1}$, then from (D.31) the lhs of (D.30) is equivalent to

$$
AC'\big[B^{-1}-B^{-1}CDC'B^{-1}\big]=A\big[C'B^{-1}-C'B^{-1}CDC'B^{-1}\big]=A\big[I-C'B^{-1}CD\big]C'B^{-1}.
$$

Then, (D.30) becomes

$$
A\big[I-C'B^{-1}CD\big]C'B^{-1}=DC'B^{-1},
$$

or equivalently, multiplying both sides on the right by $BC(C'C)^{-1}$,

$$
A\big[I-C'B^{-1}CD\big]=D.
\tag{D.32}
$$

Now multiplying (D.32) on the left by $A^{-1}$ and on the right by $D^{-1}$

$$
\big[D^{-1}-C'B^{-1}C\big]=A^{-1},
$$

which is equivalent to

$$
A^{-1}+C'B^{-1}C-C'B^{-1}C-A^{-1}=0_{m\times m},
$$

which is always true. □

Lemma D.14. Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$, for all $s=0,\ldots,T$, $\|F_{t|s}^{(0)}\|=O_p(1)$, uniformly in $t\le s$.

Proof. Let $\widehat{\Omega}_s^{F(0)}=E_{\widehat{\varphi}^{(0)}}[F_sF_s']$, which is $rs\times rs$ having the $r\times r$ generic $(t_1,t_2)$ block denoted by $[\widehat{\Omega}_s^{F(0)}]_{t_1,t_2}$ and such that $[\widehat{\Omega}_s^{F(0)}]_{t_2,t_1}=[\widehat{\Omega}_s^{F(0)}]_{t_1,t_2}'$ and

$$
\mathrm{vec}\big([\widehat{\Omega}_s^{F(0)}]_{t_1,t_2}\big)=(I_r\otimes\widehat{A}^{(0)})^{|t_1-t_2|}\big(I_{r^2}-\{\widehat{A}^{(0)}\otimes\widehat{A}^{(0)}\}\big)^{-1}\mathrm{vec}(\widehat{\Gamma}^{v(0)}),\qquad t_1,t_2=1,\ldots,s.
$$

Notice that although $\widehat{\Omega}_s^{F(0)}$ depends on $\widehat{A}^{(0)}$ and $\widehat{\Gamma}^{v(0)}$, for simplicity of notation, hereafter, we omit such dependence. Let also $\Omega_s^{F}=E[F_sF_s']$; clearly $\Omega_s^{F}$ is positive definite, since by Assumptions 1(b) and 1(d), $\|[\Omega_s^{F}]_{t_1,t_2}\|\le\|\Gamma^{F}\|$ for all $t_1\ne t_2$. Moreover, recall that $\Omega_s^{F}$ is a block-Toeplitz matrix and define the corresponding circulant matrix as $\Psi_s^{F}$, then (see, e.g., Gray, 2006, Lemma 4.3 and Section 3.1)

$$
s^{-1}\big|\nu^{(1)}(\Omega_s^{F})-\nu^{(1)}(\Psi_s^{F})\big|=O(s^{-1/2}),\quad\text{and}\quad\nu^{(1)}(\Psi_s^{F})=O(s).
$$

Thus,

$$
\|\Omega_s^{F}\|=\nu^{(1)}(\Omega_s^{F})\le\nu^{(1)}(\Psi_s^{F})+\big|\nu^{(1)}(\Omega_s^{F})-\nu^{(1)}(\Psi_s^{F})\big|=O(s)+O(\sqrt{s}),
$$

which implies

$$
s\|(\Omega_s^{F})^{-1}\|=O(1).
\tag{D.33}
$$

Now the $r\times r$ generic $(t_1,t_2)$ block of $(\Omega_s^{F})^{-1}$ is an analytic function of $A$ and $\Gamma^{v}$, which, in the case $r=1$, is given by (see Akaike, 1973 for the case $r>1$)

$$
[(\Omega_s^{F})^{-1}]_{t_1,t_2}=E[v_t^2](1-A^2)^{-2}\cdot
\begin{cases}
1 & \text{if } t_1=t_2=1 \text{ or } t_1=t_2=s,\\
1+A^2 & \text{if } t_1=t_2 \text{ and } 1<t_1,t_2<s,\\
-A & \text{if } |t_1-t_2|=1,\\
0 & \text{otherwise.}
\end{cases}
$$

Then, because of Lemma D.3 and (D.33), we have that, as $n,s\to\infty$, $s(\widehat{\Omega}_s^{F(0)})^{-1}$ is positive definite with probability

tending to one, i.e.,

$$
s\|(\widehat{\Omega}_s^{F(0)})^{-1}\|=O_p(1).
\tag{D.34}
$$

Let now

$$
\begin{aligned}
\widehat{K}_{ns}^{(0)} &= \widehat{\Omega}_s^{F(0)}(I_s\otimes\widehat{\Lambda}_n^{(0)\prime})\Big\{(I_s\otimes\widehat{\Lambda}_n^{(0)})\widehat{\Omega}_s^{F(0)}(I_s\otimes\widehat{\Lambda}_n^{(0)\prime})+(I_s\otimes\widehat{\Sigma}_n^{\xi(0)})\Big\}^{-1}\\
&= \Big\{(I_s\otimes\widehat{\Lambda}_n^{(0)\prime})(I_s\otimes\widehat{\Sigma}_n^{\xi(0)})^{-1}(I_s\otimes\widehat{\Lambda}_n^{(0)})+(\widehat{\Omega}_s^{F(0)})^{-1}\Big\}^{-1}\Big\{(I_s\otimes\widehat{\Lambda}_n^{(0)\prime})(I_s\otimes\widehat{\Sigma}_n^{\xi(0)})^{-1}\Big\}\\
&= \Big\{I_s\otimes\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}+(\widehat{\Omega}_s^{F(0)})^{-1}\Big\}^{-1}\Big\{I_s\otimes\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\Big\},
\end{aligned}
\tag{D.35}
$$

which is $rs\times ns$ and where in the second line we used Lemma D.13. Notice that all the inverses in (D.35) are well defined by (D.34), Lemma D.5(iii), and (D.11) in the proof of Lemma D.5. Then, by definition of linear projection, we have

$$
\begin{aligned}
F_{t|s}^{(0)} &= \mathrm{Proj}_{\widehat{\varphi}_n^{(0)}}[F_t|X_{ns}]=(\iota_{t,s}'\otimes I_r)\{\widehat{K}_{ns}^{(0)}\}X_{ns}\\
&= (\iota_{t,s}'\otimes I_r)\Big\{I_s\otimes\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)}\Big\}^{-1}\Big\{I_s\otimes\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\Big\}X_{ns}\\
&\quad+(\iota_{t,s}'\otimes I_r)\Big[\{\widehat{K}_{ns}^{(0)}\}^{-1}-\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1})\Big\}\Big]X_{ns},
\end{aligned}
\tag{D.36}
$$

where $\iota_{t,s}'$ is the $t$th row of $I_s$ and $X_{ns}=(x_{n1}'\cdots x_{ns}')'$ is an $ns$-dimensional vector.

Then, by (C.6) in the proof of Lemma C.5, we have

$$
\begin{aligned}
&n^{3/2}s\Big\|\{\widehat{K}_{ns}^{(0)}\}^{-1}-\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1})\Big\}\Big\|\\
&\quad\le n^2s\Big\|\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+(\widehat{\Omega}_s^{F(0)})^{-1}\Big\}-\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\Big\}\Big\|\\
&\qquad\cdot n^{-1/2}\big\|I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1})\big\|\\
&\quad= n^2s\Big\|\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+(\widehat{\Omega}_s^{F(0)})^{-1}\Big\}(\widehat{\Omega}_s^{F(0)})^{-1}\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\Big\}\Big\|\\
&\qquad\cdot n^{-1/2}\big\|I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1})\big\|\\
&\quad= O_p(1).
\end{aligned}
\tag{D.37}
$$

Indeed, for the first term on the rhs of (D.37), by Lemmas C.3(i), C.3(v), and D.5(i), and by (D.34), we have

$$
\begin{aligned}
&n^2s\Big\|\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+(\widehat{\Omega}_s^{F(0)})^{-1}\Big\}(\widehat{\Omega}_s^{F(0)})^{-1}\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\Big\}\Big\|\\
&\quad\le n\Big\|\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}+(\widehat{\Omega}_s^{F(0)})^{-1}\Big\}\Big\|\;s\|(\widehat{\Omega}_s^{F(0)})^{-1}\|\;n\big\|I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}\big\|\\
&\quad= O_p(1).
\end{aligned}
\tag{D.38}
$$

Alternatively, to bound the first term on the rhs of (D.38) we can use directly Lemma D.5(iii) and (D.34). And, for the second term on the rhs of (D.37), by Lemmas C.3(vii) and D.5(ii), we have

$$
n^{-1/2}\big\|I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1})\big\|=O_p(1).
\tag{D.39}
$$

Now, let $\widehat{\Pi}_s^{(0)}=\Big[\{\widehat{K}_{ns}^{(0)}\}^{-1}-\Big\{I_s\otimes(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1})\Big\}\Big]$, and similarly define $\Pi_s$ when using the true parameters. By Lemmas D.1(ii), D.3, D.5(ii), and D.5(iv), and (D.37), we have

$$
n^{3/2}s\|\widehat{\Pi}_s^{(0)}-\Pi_s\|=O_p(\max(n^{-1},T^{-1/2})).
\tag{D.40}
$$

Supplementary material for the paper: QMLE of Large Approximate DFM via the EM algorithm Moreover, Π is rs×ns such that s  (cid:80)s π x  (ι′ t,s ⊗I r )Π s X ns =(ι′ t,s ⊗I r )   τ=1 . . . 1τ nτ  = (cid:32) (cid:88) s π 1τ x nτ ··· (cid:88) s π sτ x nτ (cid:33) ι t,s   (cid:80)s π x τ=1 τ=1 τ=1 sτ nτ s (cid:88) = π x , (D.41) tτ nτ τ=1 withπ ,t,τ =1,...,nbeingr×nsub-blockofΠ andsincevec(ABC)=(C′⊗A)vec(B). Furthermore,sinceclearly tτ s from (D.33) √ √ max s∥(ι′ ⊗I )[(Ω )−1]∥=O(1), max s∥[(Ω )−1](ι ⊗I )∥=O(1), (D.42) t,s r s s τ,s r t=1,...,s τ=1,...,s using the same reasoning as in (D.38) and (D.39), but when considering the true parameters, we also have that max ∥π ∥=O(n−3/2s−1/2). Thus, from (D.41) we have t,τ=1,...,s tτ (cid:13) (cid:13) ∥(ι′ ⊗I )Π X ∥≤ max ∥π ∥ (cid:13) (cid:13) (cid:88) s x (cid:13) (cid:13)=O (n−1), (D.43) t,s r s ns tτ (cid:13) nτ(cid:13) p t,τ=1,...,s (cid:13) (cid:13) τ=1 since, by Assumption 1(d) and Lemma C.1(iii) E (cid:34)(cid:13) (cid:13) (cid:13) (cid:88) s x (cid:13) (cid:13) (cid:13) 2(cid:35) = (cid:88) n (cid:88) s (cid:88) s E[x x ]= (cid:88) n (cid:88) s (cid:88) s λ′E[F F ]λ +E[ξ ξ ] (cid:13) nτ(cid:13) iτ1 iτ2 i τ1 τ2 i iτ1 iτ2 (cid:13) (cid:13) τ=1 i=1τ1=1τ2=1 i=1τ1=1τ2=1 n s s n s s ≤ (cid:88) (cid:88) (cid:88) E[x x ]= (cid:88) (cid:88) (cid:88) |λ′A|τ1−τ2|λ |+|E[ξ ξ ]| iτ1 iτ2 i i iτ1 iτ2 i=1τ1=1τ2=1 i=1τ1=1τ2=1 (cid:40) s s s s (cid:41) ≤nM2 (cid:88) (cid:88) ∥A|τ1−τ2|∥+ (cid:88) (cid:88) |E[ξ ξ ]| λ iτ1 iτ2 τ1=1τ2=1 τ1=1τ2=1 s−1 ≤nsM2 (cid:88) (cid:0) 1−s−1|k| (cid:1) Mk +nsM2M λ A λ 3 k=−(s−1) s−1 ≤nsM22 (cid:88) Mk +nsM2M λ A λ 3 k=0 ≤nsM22(1−M )−1+nsM2M . 
λ A λ 3 Thus, from (D.36), (D.37), (D.40), and (D.43), and since vec(ABC)=(C′⊗A)vec(B) and ∥ι′ ⊗I ∥=1, t,s r ∥F( t| 0 s )∥≤∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x t ∥ + (cid:13) (cid:13)(ι′ t,s ⊗I r )Π s X ns (cid:13) (cid:13)+∥ι′ t,s ⊗I r ∥∥Π(cid:98) ( s 0)−Π s ∥∥X ns ∥ ≤n∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1∥n−1/2∥Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1∥n−1/2∥x t ∥+O p (n−1)+o p (n−1s−1/2) =O (1), (D.44) p √ where we used also Lemmas C.3(v), C.3(vii), C.10, D.5(i), and D.5(ii), and the facts that ∥X ∥=O( ns) by the same ns reasoning as in Lemma C.10, and ∥ι′ ⊗I ∥=1. This completes the proof. □ t,s r Lemma D.15. Under Assumptions 1, 2, 3, and 6, as n,T →∞, n∥F(0) −F(0)∥=O (1), uniformly in t. t|T t|t p Proof. From (A.6) and (A.1) ∥F(0) −F(0)∥≤∥P(0)∥∥A(cid:98) (0)∥∥(P(0) )−1∥{∥F(0) ∥+∥F(0) ∥} t|T t|t t|t t+1|t t+1|T t+1|t ≤∥P(0)∥∥A(cid:98) (0)∥∥(P(0) )−1∥{∥F(0) ∥+∥A(cid:98) (0)∥∥F(0)∥} t|t t+1|t t+1|T t|t =O (n−1), (D.45) p Page 43

Supplementary material for the paper: QMLE of Large Approximate DFM via the EM algorithm byLemmasD.8(ii),D.11,andD.14(whens=T ands=t),andsince∥A(cid:98)(0)∥≤∥A∥+∥A(cid:98)(0)−A∥=O p (1),byAssumption 1(d) and Lemma D.3(i). This completes the proof. Lemma D.16. Under Assumptions 1, 2, 3, and 6, as n,T → ∞, n∥F(0) −FWLS(0)∥ = O (1), uniformly in t, where t|t t p FW t LS(0) =(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x nt . Proof. From (A.3) and (A.1), by Lemma D.13 F( t| 0 t ) =F( t| 0 t ) −1 +P( t| 0 t ) −1 Λ(cid:98) ( n 0)′(Λ(cid:98) ( n 0)P( t| 0 t ) −1 Λ(cid:98) ( n 0)′+Σ(cid:98) ξ n (0))−1(x nt −Λ(cid:98) ( n 0)F( t| 0 t ) −1 ) =(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P( t| 0 t ) −1 )−1)−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x nt (cid:110) (cid:111) + I r −(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P(cid:98) ( t| 0 t ) −1 )−1)−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0) A(cid:98) (0)F( t− 0) 1|t−1 =(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x nt (cid:110) (cid:111) + (Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P( t| 0 t ) −1 )−1)−1−(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1 Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x nt (cid:110) (cid:111) + I r −(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P(cid:98) ( t| 0 t ) −1 )−1)−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0) A(cid:98) (0)F( t− 0) 1|t−1 . (D.46) Notice that the inverses in (D.46) are all well defined by Lemmas D.5(iii) and D.8(ii) and (D.11) in the proof of Lemma D.4. Now, by Lemma C.5 ∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P( t| 0 t ) −1 )−1)−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)−I r ∥=O p (n−1). 
(D.47) Furthermore, by Lemmas C.6(iii) and D.5(iv) ∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P( t| 0 t ) −1 )−1)−1−(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1∥=O(n−2), (D.48) and by Lemmas C.3(vii) and D.5(ii), √ ∥Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1∥=O p ( n). (D.49) Indeed, we can apply Lemmas C.5 and C.6(iii), since ∥(P( t| 0 t ) −1 )−1∥ = O p (1) by Lemma D.8(ii), ∥(Σ(cid:98) ξ n (0))−1∥ = O p (1) by (D.11) in the proof of Lemma D.4, and, by Lemmas C.2 and D.1(ii) we have n−1∥Λ(cid:98) ( n 0)′Λ(cid:98) ( n 0)−Λ′ n Λ n ∥≤2n−1∥Λ′ n (Λ(cid:98) ( n 0)−Λ n )∥+n−1∥(Λ(cid:98) ( n 0)−Λ n )′(Λ(cid:98) ( n 0)−Λ n )∥ ≤2n−1/2∥Λ n ∥n−1/2∥Λ(cid:98) ( n 0)−Λ n ∥+n−1∥Λ(cid:98) ( n 0)−Λ n ∥2 =O (max(n−1,T−1/2)). p which, by Weyl’s inequality (Merikoski and Kumar, 2004, Theorem 1), implies n−1|ν(j)(Λ(cid:98) ( n 0)′Λ(cid:98) ( n 0))−ν(j)(Λ′ n Λ n )|≤n−1∥Λ(cid:98) ( n 0)′Λ(cid:98) ( n 0)−Λ′ n Λ n ∥=O p (max(n−1,T−1/2)), and, therefore, for j =1,...,r,, C j ≤p-liminf n−1ν(j)(Λ(cid:98) ( n 0)′Λ(cid:98) ( n 0))≤p-limsup n−1ν(j)(Λ(cid:98) ( n 0)′Λ(cid:98) ( n 0))≤C j . n,T→∞ n,T→∞ By using (D.47), (D.48), (D.49) into (D.46): ∥F( t| 0 t )−(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x nt ∥ ≤n∥(Λ(cid:98)n (0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P( t| 0 t ) −1 )−1)−1−(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1∥n−1/2∥Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1∥n−1/2∥x nt ∥ +∥I r −(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)+(P(cid:98) ( t| 0 t ) −1 )−1)−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0)∥∥A(cid:98) (0)∥∥F( t− 0) 1|t−1 ∥ =O (n−1), (D.50) p by Lemmas C.10 and D.14 (when s = t−1), and since ∥A(cid:98)(0)∥ ≤ ∥A∥+∥A(cid:98)(0)−A∥ = O p (1), by Assumption 1(d) and Page 44

Supplementary material for the paper: QMLE of Large Approximate DFM via the EM algorithm Lemma D.3(i). This completes the proof. □ √ √ Lemma D.17. Under Assumptions 1, 2, 3, and 6, as n,T →∞, min( n, T)∥F(0) −F ∥=O (1), uniformly in t. t|T t p Proof. From Lemmas D.15 and D.16 ∥F( t| 0 T ) −F t ∥≤∥F( t| 0 T ) −F( t| 0 t )∥+∥F( t| 0 t )−F(cid:98)W t LS(0)∥+∥F(cid:98)W t LS(0)−F t ∥ =∥F(cid:98)W t LS(0)−F t ∥+O p (n−1). (D.51) where FW t LS(0) =(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1x nt . Now, ∥F(cid:98)W t LS(0)−F t ∥≤∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1(Λ n −Λ(cid:98) ( n 0))∥∥F t ∥ +∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1ξ nt ∥ ≤∥(Λ′ n (Σξ n )−1Λ n )−1Λ′ n (Σξ n )−1(Λ n −Λ(cid:98) ( n 0))∥∥F t ∥ +∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1−(Λ′ n (Σξ n )−1Λ n )−1Λ′ n (Σξ n )−1∥ ·∥Λ n −Λ(cid:98) ( n 0)∥∥F t ∥+∥(Λ′ n (Σξ n )−1Λ n )−1Λ′ n (Σξ n )−1ξ nt ∥ +∥(Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1Λ(cid:98) ( n 0))−1Λ(cid:98) ( n 0)′(Σ(cid:98) ξ n (0))−1ξ nt −(Λ′ n (Σξ n )−1Λ n )−1Λ′ n (Σξ n )−1ξ nt ∥ =A+B+C+D, say. (D.52) Let us consider each term in (D.52). First, consider term A and let ΛOLS = ( (cid:80)T x F′)( (cid:80)T F F′)−1. Then, from n t=1 nt t t=1 t t Barigozzi (2023, Corollary 1) n−1/2∥Λ(cid:98) ( n 0)−ΛO n LS∥=O p (max(n−1,n−1/2T−1/2)). (D.53) Therefore, from (D.53) A≤∥(Λ′(Σξ)−1Λ )−1Λ′(Σξ)−1(Λ −ΛOLS)∥∥F ∥ n n n n n n n t +n∥(Λ′ n (Σξ n )−1Λ n )−1∥n−1/2∥Λ n ∥∥(Σξ n )−1∥n−1/2∥Λ(cid:98) ( n 0)−ΛO n LS∥∥F t ∥ ={A.1+A.2}∥F ∥, say. 
(D.54) t Then, A.1≤n∥(Λ′(Σξ)−1Λ )−1∥n−1∥Λ′(Σξ)−1(Λ −ΛOLS)∥ n n n n n n n =n∥(Λ′ n (Σξ n )−1Λ n )−1∥n−1 (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) T−1(cid:88) t= T 1 Λ′ n (Σξ n )−1ξ nt F′ t (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:32) T−1(cid:88) t= T 1 F t F′ t (cid:33)−1 (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) =O (n−1/2T−1/2), (D.55) p by Lemmas C.3(iii) and C.8(iv), and also, recalling that ΓF = I by Assumption 6(b), by Lemma C.12(i) and Weyl’s r inequality (Merikoski and Kumar, 2004, Theorem 1) we have |ν(r)(T−1(cid:80)T F F′)−1| = O (T−1/2) which implies t=1 t t p ∥(T−1(cid:80)T F F′)−1∥=O (1). t=1 t t p Moreover,A.2=O (max(n−1,n−1/2T−1/2)),becauseof (D.53)andLemmasC.2,C.3(iii),andAssumption2(a)which p implies ∥(Σξ)−1∥≤C . This, jointly with (D.54) and (D.55) implies that n ξ A=O (max(n−1,n−1/2T−1/2)), (D.56) p since ∥F ∥=O (1) because E[F2]=1, j =1,...,r, by Assumption 6(b). t p jt Page 45

Second, by Lemmas D.1(ii) and D.5(v),
\[
\begin{aligned}
B &= \big\| n(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1} n^{-1/2}\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1} - n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1} n^{-1/2}\Lambda_n'(\Sigma_n^{\xi})^{-1} \big\| \\
&\quad \cdot\, n^{-1/2}\|\Lambda_n-\widehat{\Lambda}_n^{(0)}\|\,\|F_t\| = O_p(\max(n^{-2},T^{-1})),
\end{aligned}
\tag{D.57}
\]
since $\|F_t\|=O_p(1)$ because $E[F_{jt}^2]=1$, $j=1,\ldots,r$, by Assumption 6(b). Third,
\[
C \le n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\, n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| = O_p(n^{-1/2}),
\tag{D.58}
\]
by Lemmas C.3(iii) and C.7(i). Fourth, and last,
\[
\begin{aligned}
D &\le n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\, n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| \\
&\quad + n\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\, n^{-1/2}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\, n^{-1/2}\|\xi_{nt}\| \\
&\quad + n\|(\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}\widehat{\Lambda}_n^{(0)})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\, n^{-1/2}\|\widehat{\Lambda}_n^{(0)\prime}(\widehat{\Sigma}_n^{\xi(0)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\, n^{-1/2}\|\xi_{nt}\| \\
&= D.1+D.2+D.3, \text{ say.}
\end{aligned}
\]
Then, $D.1=O_p(\max(n^{-3/2},n^{-1/2}T^{-1/2}))$, by Lemmas C.7(i) and D.5(iv), while $D.2=O_p(\max(n^{-1},T^{-1/2}))$, because of Lemmas C.3(iii) and D.5(ii) and since $\|\xi_{nt}\|=O_p(\sqrt{n})$ because $E[\xi_{it}^2]=\sigma_i^2\le C_\xi$ by Assumption 2(a). Last, $D.3=O_p(\max(n^{-2},T^{-1}))$ by Lemmas D.5(ii) and D.5(iv) and since $\|\xi_{nt}\|=O_p(\sqrt{n})$. Therefore,
\[
D=O_p(\max(n^{-1},T^{-1/2})).
\tag{D.59}
\]
By substituting (D.56), (D.57), (D.58), and (D.59) into (D.52) and then into (D.51), we complete the proof. □

Lemma D.18. Under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$, for $s=t$ and $s=T$:
(i) $\|T^{-1}\sum_{t=1}^T F_{t|s}^{(0)}F_t'\|=O_p(1)$;
(ii) $\|T^{-1}\sum_{t=1}^T F_{t|s}^{(0)}F_t'\lambda_i\|=O_p(1)$, uniformly in $i$;
(iii) $\|T^{-1}\sum_{t=1}^T F_{t|s}^{(0)}\xi_{it}\|=O_p(1)$, uniformly in $i$.

Proof. First notice that, for all $k=t-T,\ldots,t-1$,
\[
\Big\| n^{-1/2}T^{-1}\sum_{t=1}^T x_{n,t-k}F_t' \Big\| = O_p(1),
\tag{D.60}
\]
by Lemma C.12.
The proof of part (i) follows by iterating either forward or backwards, since both $\|T^{-1}\sum_{t=1}^T F_{t|t}^{(0)}F_t'\|$ and $\|T^{-1}\sum_{t=1}^T F_{t|T}^{(0)}F_t'\|$ are functions of (D.60), because of Lemmas D.15 and D.16.

Part (ii) follows from part (i) and Assumption 1(a). Part (iii) follows by substituting $F_t$ with $\xi_{it}$ in (E.93) and then by applying Lemma C.12(ii). This completes the proof. □

Lemma D.19. Let $\phi_n^{OLS}=(\mathrm{vec}(\Lambda_n^{OLS})'\ \sigma_1^{2\,OLS}\cdots\sigma_n^{2\,OLS})'$ be the vector of OLS estimators of the entries of $\phi_n$ obtained when $F_t$ is known, that is, whose entries maximize $\ell(X_{nT}|F_T;\phi_n)$. Let also $\theta^{OLS}=(\mathrm{vec}(A^{OLS})'\ \mathrm{vech}(\Gamma^{v\,OLS})')'$ be the vector of OLS estimators of the entries of $\theta$ obtained when $F_t$ is known, that is, whose entries maximize $\ell(F_T;\theta)$. Then, under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$,
(i) $\sqrt{T}\|\lambda_i^{OLS}-\lambda_i\|=O_p(1)$, uniformly in $i$;
(ii) $\sqrt{T}n^{-1/2}\|\Lambda_n^{OLS}-\Lambda_n\|=O_p(1)$;
(iii) $\sqrt{T}|\sigma_i^{2\,OLS}-\sigma_i^2|=O_p(1)$;
(iv) $\sqrt{T}\|\widehat{A}^{OLS}-A\|=O_p(1)$;
(v) $\sqrt{T}\|\widehat{\Gamma}^{v\,OLS}-\Gamma^v\|=O_p(1)$.

Proof. For part (i), since $x_{it}=\lambda_i'F_t+\xi_{it}$,
\[
\|\lambda_i^{OLS}-\lambda_i\| \le \Bigg\| \Bigg(T^{-1}\sum_{t=1}^T F_tF_t'\Bigg)^{-1} \Bigg\| \, \Bigg\| T^{-1}\sum_{t=1}^T F_t\xi_{it} \Bigg\| = O_p(T^{-1/2}),
\tag{D.61}
\]

where for the numerator we used Lemma C.12(ii) while for the denominator we used Lemma C.13. Part (ii) is proved in the same way but using Lemma C.12(iii) instead of Lemma C.12(ii).

For part (iii), first notice that
\[
\begin{aligned}
\sigma_i^{2\,OLS} &= T^{-1}\sum_{t=1}^T (\xi_{it}^{OLS})^2 = T^{-1}\sum_{t=1}^T (x_{it}-\lambda_i^{OLS\prime}F_t)^2 \\
&= T^{-1}\sum_{t=1}^T \{x_{it}^2+\lambda_i'F_tF_t'\lambda_i-2x_{it}\lambda_i'F_t\} \\
&\quad + T^{-1}\sum_{t=1}^T \{2\lambda_i^{OLS\prime}F_tF_t'(\lambda_i^{OLS}-\lambda_i)+(\lambda_i^{OLS}-\lambda_i)'F_tF_t'(\lambda_i^{OLS}-\lambda_i)-2x_{it}(\lambda_i^{OLS}-\lambda_i)'F_t\} \\
&= T^{-1}\sum_{t=1}^T (x_{it}-\lambda_i'F_t)^2 \\
&\quad + T^{-1}\sum_{t=1}^T \{2\lambda_i^{OLS\prime}F_tF_t'(\lambda_i^{OLS}-\lambda_i)+(\lambda_i^{OLS}-\lambda_i)'F_tF_t'(\lambda_i^{OLS}-\lambda_i) \\
&\qquad\qquad -2(\lambda_i^{OLS}-\lambda_i)'F_tF_t'\lambda_i^{OLS}-2(\lambda_i^{OLS}-\lambda_i)'F_t\xi_{it}^{OLS}\} \\
&= T^{-1}\sum_{t=1}^T \xi_{it}^2 + T^{-1}\sum_{t=1}^T (\lambda_i^{OLS}-\lambda_i)'F_tF_t'(\lambda_i^{OLS}-\lambda_i),
\end{aligned}
\tag{D.62}
\]
since by construction $\sum_{t=1}^T F_t\xi_{it}^{OLS}=0_r$. Then,
\[
\begin{aligned}
|\sigma_i^{2\,OLS}-\sigma_i^2| &\le \Bigg| T^{-1}\sum_{t=1}^T (x_{it}-\lambda_i^{OLS\prime}F_t)^2 - T^{-1}\sum_{t=1}^T (x_{it}-\lambda_i'F_t)^2 \Bigg| + \Bigg| T^{-1}\sum_{t=1}^T (x_{it}-\lambda_i'F_t)^2 - E[\xi_{it}^2] \Bigg| \\
&\le \|\lambda_i^{OLS}-\lambda_i\|^2 \Bigg\| T^{-1}\sum_{t=1}^T F_tF_t' \Bigg\| + \Bigg| T^{-1}\sum_{t=1}^T \xi_{it}^2 - E[\xi_{it}^2] \Bigg| \\
&= O_p(T^{-1/2}),
\end{aligned}
\]
because of part (i) and Lemma C.12(iv), and also by Lemma C.12(i) combined with Assumption 6(b).

For part (iv), consistency follows from the fact that $\{v_t\}$ is an independent process by Assumption 1(f) and thus it is a martingale difference process (Hamilton, 1994, Proposition 11, pp. 298-299). Part (v) follows from part (iv) and Hamilton (1994, Proposition 11.2, p. 301), since by Assumption 1(g), the fourth order cumulants of $\{v_t\}$ are all finite. This completes the proof.
□

E Lemmas necessary for proving Proposition 2

Lemma E.1. Under Assumptions 1, 2, 3, and 5:
(i) for all $t\in\mathbb{Z}$, all $j=1,\ldots,r$, and all $s>0$, $P(|F_{jt}|\ge s)\le \exp\{-K_F s^{\delta_v}\}$, for some finite positive real $K_F$ independent of $t$ and $j$;
(ii) for all $i\in\mathbb{N}$ and all $T\in\mathbb{N}$,
\[
P\Bigg(\Bigg\| T^{-1/2}\sum_{t=1}^T F_t\xi_{it} \Bigg\| \ge s\Bigg) \le r\exp\{-\kappa_3 s^2\} + rT\exp\{-\kappa_4 (s\sqrt{T})^{\beta}\},
\]
for some finite positive reals $\kappa_3$, $\kappa_4$, and $\beta$ independent of $i$ and $T$ and such that $\frac{1}{\beta}=\frac{1}{\gamma}+\frac{1}{\delta}>1$, with $\gamma=\min(\gamma_F,\gamma_\xi)$ and $\delta\in\big(0,\frac{\delta_v\delta_\xi}{\delta_v+\delta_\xi}\big)$.

Proof. For all $j=1,\ldots,r$,
\[
|F_{jt}| \le \sum_{\ell=1}^r \Bigg| \sum_{k=0}^{\infty} [A^k]_{j\ell}\, v_{\ell,t-k} \Bigg| \le r \max_{\ell=1,\ldots,r} \Bigg| \sum_{k=0}^{\infty} [A^k]_{j\ell}\, v_{\ell,t-k} \Bigg|.
\]
Now, since because of Assumption 1(d) the coefficients $[A^k]_{j\ell}$ are summable over $k$, for any $\epsilon>0$ and $\eta>0$ there exists

a positive integer $\bar{K}=\bar{K}(\epsilon,\eta)$ independent of $j$, $\ell$, and $t$ such that
\[
P\Bigg(\Bigg| \sum_{k=\bar{K}+1}^{\infty} [A^k]_{j\ell}\, v_{\ell,t-k} \Bigg| \ge \eta\Bigg) \le \epsilon,
\]
thus we can always find $\bar{K}$ such that we can write
\[
\begin{aligned}
|F_{jt}| &\le r \max_{\ell=1,\ldots,r} \Bigg| \sum_{k=0}^{\bar{K}} [A^k]_{j\ell}\, v_{\ell,t-k} \Bigg| + \Bigg| \sum_{k=\bar{K}+1}^{\infty} [A^k]_{j\ell}\, v_{\ell,t-k} \Bigg| \\
&\le r\bar{K} \max_{\ell=1,\ldots,r}\ \max_{k=0,\ldots,\bar{K}} |[A^k]_{j\ell}|\ \Bigg| \sum_{k=0}^{\bar{K}} v_{\ell,t-k} \Bigg| + o_p(1) \\
&\le r\bar{K}C_A \Bigg| \sum_{k=0}^{\bar{K}} v_{\ell,t-k} \Bigg| + o_p(1),
\end{aligned}
\tag{E.1}
\]
for some positive real $C_A$ independent of $j$, where we used again Assumption 1(d) to bound the coefficients.

Then, from Assumption 5(a), (E.1), and Bakhshizadeh et al. (2023, Corollary 4 and Section III), since $\{v_t\}$ is an independent process, and by Assumption 1 and the union bound, for all $j=1,\ldots,r$ and all $s>0$ it holds that:
\[
\begin{aligned}
P(|F_{jt}|\ge s) &\le P\Bigg( r\bar{K}C_A \Bigg| \sum_{k=0}^{\bar{K}} v_{\ell,t-k} \Bigg| \ge s + o_p(1) \Bigg) \\
&\le \exp\Bigg\{ -C'\Big(\frac{s}{r\bar{K}C_A}\Big)^{2} \Bigg\} + \bar{K}\exp\Bigg\{ -C''\Big(\frac{s}{r\bar{K}C_A}\Big)^{\delta_v} \Bigg\} \\
&\le \bar{K}\exp\big\{ -C''' s^{\delta_v} \big\} \le \exp\big\{ -K_F s^{\delta_v} \big\},
\end{aligned}
\tag{E.2}
\]
for some finite positive reals $C'$, $C''$, $C'''$, and $K_F$ independent of $t$ and $j$, and where from the second line we omitted the second term, which is negligible. This proves part (i).
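As a purely illustrative check of the kind of tail bound in part (i), the following sketch simulates a scalar Gaussian AR(1) factor and compares its empirical tail with the standard Gaussian bound $2\exp\{-s^2/(2\,\mathrm{Var}(F_t))\}$, which has the sub-Weibull form of (E.2) with $\delta_v=2$ up to a constant. The coefficient `a`, the sample size, and the Gaussian innovations are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

# Illustrative assumptions: scalar factor (r = 1), Gaussian innovations, AR coefficient a.
rng = np.random.default_rng(0)
a, T, burn = 0.5, 200_000, 200

# Simulate F_t = a F_{t-1} + v_t and drop a burn-in so the start-up value is negligible.
v = rng.standard_normal(T + burn)
F = np.empty(T + burn)
F[0] = v[0]
for t in range(1, T + burn):
    F[t] = a * F[t - 1] + v[t]
F = F[burn:]

# Stationary variance of the AR(1) and the Gaussian tail bound 2 exp(-s^2 / (2 var)).
var_F = 1.0 / (1.0 - a**2)
s_grid = np.array([1.0, 2.0, 3.0])
empirical = np.array([(np.abs(F) >= s).mean() for s in s_grid])
bound = 2.0 * np.exp(-s_grid**2 / (2.0 * var_F))

for s, e, b in zip(s_grid, empirical, bound):
    print(f"s={s:.0f}  empirical tail={e:.4f}  bound={b:.4f}")
```

The Gaussian case is the lightest-tailed one covered by the lemma; heavier sub-Weibull innovations would correspond to $\delta_v<2$ and a slower decay in $s$.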
For all $i=1,\ldots,n$ and all $n\in\mathbb{N}$, consider Assumption 5(b) when $n=1$, $\lambda_i=1$, and $\sigma_i^2=1$; then, because of Assumption 3 and (E.2), by Fan et al. (2011, Lemma A2), for all $j=1,\ldots,r$ and all $s>0$ it holds that:
\[
P(|F_{jt}\xi_{it}|\ge s) \le \exp\big\{ -\kappa_{F\xi}\, s^{\delta} \big\},
\tag{E.3}
\]
for some finite positive reals $\kappa_{F\xi}$ and $\delta\in\big(0,\frac{\delta_v\delta_\xi}{\delta_v+\delta_\xi}\big)$ independent of $i$, $j$, and $t$. Moreover, because of Assumptions 1(d), 1(e), 1(f), and 1(h), by Pham and Tran (1985, Theorem 3.1), $\{F_t\}$ is a strong mixing process with mixing coefficients
\[
\alpha_F(T) \le \exp\{-c_F T^{\gamma_F}\},
\tag{E.4}
\]
for all $T\in\mathbb{N}$ and some finite positive reals $c_F$ and $\gamma_F$ independent of $T$. Then, because of (E.4) and Assumption 2(c), by Bradley (2005, Theorem 5.1.a), we have that, for all $i\in\mathbb{N}$ and all $j=1,\ldots,r$, $\{F_{jt}\xi_{it}\}$ is strong mixing with mixing coefficients:
\[
\alpha_{F\xi}(T) \le \alpha_F(T)+\alpha_\xi(T) \le \exp\{-c_F T^{\gamma_F}\}+\exp\{-c_\xi T^{\gamma_\xi}\} \le 2\exp\{-c_{F\xi}T^{\gamma}\},
\tag{E.5}
\]
for all $T\in\mathbb{N}$ and some finite positive reals $c_{F\xi}$ and $\gamma=\min(\gamma_F,\gamma_\xi)$ independent of $T$.

Now, since for all $i\in\mathbb{N}$ and all $j=1,\ldots,r$, $\{F_{jt}\xi_{it}\}$ satisfies (E.3) and (E.5), we can apply the results by Merlevède et al. (2011, Theorem 1) or equivalently by Bosq (2012, Theorem 1.4, p. 31), which imply that, for all $s>0$,
\[
P\Bigg(\Bigg| T^{-1/2}\sum_{t=1}^T F_{jt}\xi_{it} \Bigg| \ge s\Bigg) \le \exp\big\{ -c_1 s^2 \big\} + T\exp\Big\{ -c_2\big(s\sqrt{T}\big)^{\beta} \Big\}
\tag{E.6}
\]
for some finite positive reals $c_1$, $c_2$, and $\beta$ independent of $i$, $j$, and $T$, where $\frac{1}{\beta}=\frac{1}{\gamma}+\frac{1}{\delta}>1$. Finally, note that
\[
\Bigg\| T^{-1/2}\sum_{t=1}^T F_t\xi_{it} \Bigg\| \le \sum_{j=1}^r \Bigg| T^{-1/2}\sum_{t=1}^T F_{jt}\xi_{it} \Bigg| \le r\max_{j=1,\ldots,r} \Bigg| T^{-1/2}\sum_{t=1}^T F_{jt}\xi_{it} \Bigg|,
\tag{E.7}
\]

and from (E.6) and (E.7),
\[
\begin{aligned}
P\Bigg(\Bigg\| T^{-1/2}\sum_{t=1}^T F_t\xi_{it} \Bigg\| \ge s\Bigg) &\le P\Bigg( r\max_{j=1,\ldots,r} \Bigg| T^{-1/2}\sum_{t=1}^T F_{jt}\xi_{it} \Bigg| \ge s\Bigg) \\
&\le r\exp\Bigg\{ -c_1\Big(\frac{s}{r}\Big)^{2} \Bigg\} + rT\exp\Bigg\{ -c_2\Big(\frac{s\sqrt{T}}{r}\Big)^{\beta} \Bigg\},
\end{aligned}
\]
so that, by setting $\kappa_3=\frac{c_1}{r^2}$ and $\kappa_4=\frac{c_2}{r^{\beta}}$, we prove part (ii) and complete the proof. □

Lemma E.2. Under Assumptions 1 and 5, as $T\to\infty$, $\log^{-1/\delta_v}T\,\max_{t=1,\ldots,T}\|F_t\|=O_p(1)$.

Proof. We have
\[
\max_{t=1,\ldots,T}\|F_t\| \le \max_{t=1,\ldots,T}\sum_{j=1}^r |F_{jt}| \le r\max_{t=1,\ldots,T}\ \max_{j=1,\ldots,r}|F_{jt}| = O_p(\log^{1/\delta_v}T),
\]
by Lemma E.1(i) and the union bound. This completes the proof. □

Lemma E.3. Under Assumptions 1, 2, 3, and 5, as $n,T\to\infty$, $\log^{-1/\delta_v}T\,\max_{t=1,\ldots,T} n^{-1/2}\|x_{nt}\|=O_p(1)$.

Proof. By Assumption 5(b), for all $s>0$, setting $\lambda_i=\iota_r$ and $\sigma_i^2=1$ therein,
\[
P\big(n^{-1/2}\|\xi_{nt}\|\ge s\big) \le \exp\{-K_\xi s^2\} + n\exp\big\{-K_\xi (s\sqrt{n})^{\delta_\xi}\big\}.
\tag{E.8}
\]
Then, by (E.8) and the union bound, for all $s>0$, it holds that:
\[
\begin{aligned}
P\Big(\max_{t=1,\ldots,T} n^{-1/2}\|\xi_{nt}\|\ge s\Big) &\le T\,P\big(n^{-1/2}\|\xi_{nt}\|\ge s\big) \\
&\le T\exp\{-K_\xi s^2\} + nT\exp\big\{-K_\xi (s\sqrt{n})^{\delta_\xi}\big\}.
\end{aligned}
\tag{E.9}
\]
Thus, from Lemmas C.2 and E.2, and (E.9),
\[
\begin{aligned}
\max_{t=1,\ldots,T} n^{-1/2}\|x_{nt}\| &\le n^{-1/2}\|\Lambda_n\|\max_{t=1,\ldots,T}\|F_t\| + \max_{t=1,\ldots,T} n^{-1/2}\|\xi_{nt}\| \\
&= O_p(\log^{1/\delta_v}T) + O_p(\sqrt{\log T}) + O_p\big(n^{-\delta_\xi/2}\max(\log^{1/\delta_\xi}n,\log^{1/\delta_\xi}T)\big) \\
&= O_p(\log^{1/\delta_v}T).
\end{aligned}
\]
This completes the proof. □

Lemma E.4. Under Assumptions 1 and 2, for all $T\in\mathbb{N}$, as $n\to\infty$,
(i) $\max_{t=1,\ldots,T} n\|P_{t|t}\|=O(1)$;
(ii) $\max_{t=1,\ldots,T} n\|P_{t|T}\|=O(1)$;
(iii) $\max_{t=1,\ldots,T} n\|P_{0,t|t}\|=O(1)$;
(iv) $\max_{t=1,\ldots,T} n\|P_{0,t|T}\|=O(1)$;
(v) $\max_{t=1,\ldots,T} n^2\|P_{t|T}-P_{t|t}\|=O(1)$;
(vi) $\max_{t=1,\ldots,T} n^2\|P_{0,t|T}-P_{0,t|t}\|=O(1)$.

Proof.
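From (A.4), using Lemma D.13 we have
\[
\begin{aligned}
P_{t|t} &= P_{t|t-1} - P_{t|t-1}\Lambda_n'(\Lambda_nP_{t|t-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_nP_{t|t-1} \\
&= P_{t|t-1} - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nP_{t|t-1},
\end{aligned}
\tag{E.10}
\]
indeed $P_{t|t-1}$ is positive definite and finite by Lemma D.7(i) and D.7(ii). Therefore, since $P_{t|t-1}$ and $P_{t|t}$ are deterministic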

by Lemma D.6(i), we have
\[
\begin{aligned}
\max_{t=1,\ldots,T} n\|P_{t|t}\| &\le n\big\|P_{t|t-1} - \{(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n - I_r + I_r\}P_{t|t-1}\big\| \\
&\le n\big\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n - I_r\big\|\,\|P_{t|t-1}\| = O(1),
\end{aligned}
\tag{E.11}
\]
because of Lemma C.6(i), which is independent of $t$, and Lemma D.7(i). This proves part (i).

For part (ii), from (A.7), we have
\[
\begin{aligned}
\|P_{t|T}-P_{t|t}\| &= \|P_{t|t}A'P_{t+1|t}^{-1}(P_{t+1|T}-P_{t+1|t})P_{t+1|t}^{-1}AP_{t|t}\| \\
&\le \|P_{t|t}\|^2\|A\|^2\|P_{t+1|t}^{-1}\|^2\big\{\|P_{t+1|T}\|+\|P_{t+1|t}\|\big\}.
\end{aligned}
\tag{E.12}
\]
Now, given that $P_{T|T}$ is obtained by the last iteration of the Kalman filter, for $t=T-1$ (E.12) becomes:
\[
\begin{aligned}
\|P_{T-1|T}-P_{T-1|T-1}\| &\le \|P_{T-1|T-1}\|^2\|A\|^2\|P_{T|T-1}^{-1}\|^2\big\{\|P_{T|T}\|+\|P_{T|T-1}\|\big\} \\
&\le \|P_{T-1|T-1}\|^2\frac{M_A^2}{M_P^2}\big\{\|P_{T|T}\|+M_P\big\} = O(n^{-2}),
\end{aligned}
\tag{E.13}
\]
by part (i), Assumption 1(d), and Lemma D.7(i) and D.7(ii). Therefore, from part (i) and (E.13),
\[
\|P_{T-1|T}\| \le \|P_{T-1|T}-P_{T-1|T-1}\|+\|P_{T-1|T-1}\| = O(n^{-2})+O(n^{-1}).
\tag{E.14}
\]
For $t=T-2$, (E.12) becomes:
\[
\begin{aligned}
\|P_{T-2|T}-P_{T-2|T-2}\| &= \|P_{T-2|T-2}A'P_{T-1|T-2}^{-1}(P_{T-1|T}-P_{T-1|T-2})P_{T-1|T-2}^{-1}AP_{T-2|T-2}\| \\
&\le \|P_{T-2|T-2}\|^2\|A\|^2\|P_{T-1|T-2}^{-1}\|^2\big\{\|P_{T-1|T}\|+\|P_{T-1|T-2}\|\big\} \\
&\le \|P_{T-2|T-2}\|^2\frac{M_A^2}{M_P^2}\big\{\|P_{T-1|T}\|+M_P\big\} = O(n^{-2}),
\end{aligned}
\tag{E.15}
\]
by part (i), Assumption 1(d), Lemma D.7, and (E.14). Therefore, from part (i) and (E.15),
\[
\|P_{T-2|T}\| \le \|P_{T-2|T}-P_{T-2|T-2}\|+\|P_{T-2|T-2}\| = O(n^{-2})+O(n^{-1}).
\tag{E.16}
\]
By comparing (E.14) and (E.16) it is clear that the same asymptotic bound holds for all $t=T,\ldots,1$:
\[
\|P_{t|T}-P_{t|t}\| \le \|P_{t|t}\|^2\frac{M_A^2}{M_P^2}\big\{\|P_{t+1|T}\|+M_P\big\} = O(n^{-2}),
\tag{E.17}
\]
and since $\|P_{t+1|T}\|=O(n^{-1})$, this term is asymptotically negligible; thus (E.17) holds uniformly in $t=T,\ldots,1$ because of part (i), and since $P_{t|T}$ and $P_{t|t}$ are deterministic because of Lemma D.6(i). It follows that, by part (i),
\[
\max_{t=1,\ldots,T} n\|P_{t|T}\| \le \max_{t=1,\ldots,T} n\|P_{t|T}-P_{t|t}\| + \max_{t=1,\ldots,T} n\|P_{t|t}\| = O(n^{-1})+O(1)=O(1).
\]
This proves part (ii).
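The rates in parts (i) and (ii), together with the bound (E.17), can be inspected numerically, since the filtered and smoothed covariances are deterministic functions of the parameters. The sketch below is a minimal illustration under assumed parameter values (Gaussian random loadings, a diagonal idiosyncratic covariance, $A=0.5\,I_r$, $\Gamma^v=I_r$, and the initialisation $P_{0|0}=I_r$); none of these choices come from the paper, and the function names are hypothetical.

```python
import numpy as np

def kalman_covariances(Lam, sig2, A, Gv, T):
    """Filtered and smoothed factor covariances for x_t = Lam F_t + xi_t, F_t = A F_{t-1} + v_t,
    with diagonal idiosyncratic covariance diag(sig2)."""
    r = Lam.shape[1]
    M = Lam.T @ (Lam / sig2[:, None])                 # Lambda' Sigma^{-1} Lambda
    P = np.eye(r)                                     # P_{0|0}: assumed initialisation
    P_pred, P_filt = [], []
    for _ in range(T):
        Pp = A @ P @ A.T + Gv                         # prediction step: P_{t|t-1}
        P = np.linalg.inv(M + np.linalg.inv(Pp))      # update step in Woodbury form: P_{t|t}
        P_pred.append(Pp)
        P_filt.append(P)
    P_smooth = [None] * T
    P_smooth[-1] = P_filt[-1]                         # P_{T|T} starts the backward pass
    for t in range(T - 2, -1, -1):
        C = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t + 1])
        P_smooth[t] = P_filt[t] + C @ (P_smooth[t + 1] - P_pred[t + 1]) @ C.T
    return P_filt, P_smooth

def scaled_norms(n, r=2, T=30, seed=0):
    """Return (n max||P_{t|t}||, n max||P_{t|T}||, n^2 max||P_{t|T} - P_{t|t}||)."""
    rng = np.random.default_rng(seed)
    Lam = rng.normal(size=(n, r))                     # assumed loadings
    sig2 = rng.uniform(0.5, 1.5, size=n)              # assumed idiosyncratic variances
    A, Gv = 0.5 * np.eye(r), np.eye(r)
    Pf, Ps = kalman_covariances(Lam, sig2, A, Gv, T)
    spec = lambda M: np.linalg.norm(M, 2)             # spectral norm
    return (n * max(spec(P) for P in Pf),
            n * max(spec(P) for P in Ps),
            n**2 * max(spec(S - F) for S, F in zip(Ps, Pf)))

results = {n: scaled_norms(n) for n in (50, 200, 800)}
for n, vals in results.items():
    print(n, [round(v, 3) for v in vals])
```

If the lemma's rates hold in this design, the three scaled quantities should stay of the same order as $n$ grows rather than diverge.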
Parts (iii) and (iv) are proved exactly as parts (i) and (ii), respectively, but using Lemmas C.6(ii), D.7(iii), and D.7(iv) instead of Lemmas C.6(i), D.7(i), and D.7(ii). Part (v) is proved by (E.17). Part (vi) is proved as part (v) by repeating the same reasoning leading to the proof of part (iv). This completes the proof. □

Lemma E.5. Under Assumptions 1, 2, and 5, as $n,T\to\infty$,
(i) $\|F_{t|t-1}\|=O_p(1)$, uniformly in $t$;
(ii) $\|F_{0,t|t-1}\|=O_p(1)$, uniformly in $t$;
(iii) $\log^{-1/\delta_v}T\,\max_{t=1,\ldots,T}\|F_{0,t|t-1}\|=O_p(1)$.

Proof. Given that $F_{0|0}=0_r$, from (A.1) it follows that $F_{1|0}=0_r$; then, from (A.3),
\[
F_{1|1} = P_{1|0}\Lambda_n'(\Lambda_nP_{1|0}\Lambda_n'+\Sigma_n^{\xi})^{-1}x_{n1} = (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{1|0}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}x_{n1}
\]

by Lemma D.13, which can be applied since $P_{1|0}$ is positive definite by Lemma D.7(ii). Thus,
\[
\|F_{1|1}\| \le \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{1|0}^{-1})^{-1}\|\,\|\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,\|x_{n1}\| = O_p(1),
\tag{E.18}
\]
by Lemmas C.3(i), C.2, and C.10, and Assumption 2(a). Then, from (A.1),
\[
\|F_{2|1}\| \le \|A\|\,\|F_{1|1}\| = O_p(1),
\tag{E.19}
\]
because of (E.18) and Assumption 1(d). Therefore, from (A.3), using (E.19) and the same arguments leading to (E.18),
\[
\begin{aligned}
\|F_{2|2}\| &\le \|F_{2|1}\| + \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{2|1}^{-1})^{-1}\|\,\|\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,\{\|x_{n2}\|+\|\Lambda_n\|\,\|F_{2|1}\|\} \\
&= O_p(1).
\end{aligned}
\tag{E.20}
\]
It is then clear that (E.19) and (E.20) hold for all $t=1,\ldots,T$, and the result follows from Lemma C.10. This proves part (i).

For part (ii), the proof is identical to part (i) but using Lemma C.3(ii) instead of Lemma C.3(i). For part (iii), repeat the same steps as in part (ii) but using Lemma E.3 instead of Lemma C.10, and noticing that $\|x_{nt}\|\le\max_{t=1,\ldots,T}\|x_{nt}\|$, for all $t=1,\ldots,T$. This completes the proof. □

Lemma E.6. Under Assumptions 1, 2, and 5, as $n,T\to\infty$,
(i) $\sqrt{n}\,\|F_{t|t}-F_t\|=O_p(1)$, uniformly in $t$;
(ii) $\sqrt{n}\,\|F_{0,t|t}-F_t\|=O_p(1)$, uniformly in $t$;
(iii) $\log^{-1/\delta_v}T\,\sqrt{n}\,\max_{t=1,\ldots,T}\|F_{0,t|t}-F_t\|=O_p(1)$.

Proof. Since $P_{t|t-1}$ is positive definite because of Lemma D.7(ii), we can use the Woodbury formula in Lemma D.13 so that, for any $t=1,\ldots,T$, the Kalman filter estimator defined in (A.3) can be written as:
\[
\begin{aligned}
F_{t|t} &= F_{t|t-1} + P_{t|t-1}\Lambda_n'(\Lambda_nP_{t|t-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}(x_{nt}-\Lambda_nF_{t|t-1}) \\
&= F_{t|t-1} + (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}(x_{nt}-\Lambda_nF_{t|t-1}) \\
&= F_{t|t-1} + (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_nF_t+\xi_{nt}-\Lambda_nF_{t|t-1}) \\
&= F_{t|t-1} + (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nF_t \\
&\quad - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nF_{t|t-1} \\
&\quad + (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt},
\end{aligned}
\tag{E.21}
\]
where in the last step we used the definition of $x_{nt}$ in (3).
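The first two lines of (E.21) state that the usual covariance-form Kalman gain and its Woodbury (information-form) counterpart coincide. A small numerical sketch of that identity follows; the dimensions and all parameter values are arbitrary assumptions chosen for illustration, and the comparison with the WLS estimator is included only to show how the two can be put side by side.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 40, 2                                          # illustrative dimensions

Lam = rng.normal(size=(n, r))                         # assumed loadings
Sigma = np.diag(rng.uniform(0.5, 1.5, size=n))        # assumed diagonal idiosyncratic covariance
P_pred = np.array([[1.0, 0.3], [0.3, 2.0]])           # an arbitrary positive definite P_{t|t-1}
F_pred = rng.normal(size=r)                           # an arbitrary one-step-ahead prediction F_{t|t-1}
x = rng.normal(size=n)                                # an arbitrary observation x_{nt}

innov = x - Lam @ F_pred

# Covariance form: P Lam' (Lam P Lam' + Sigma)^{-1}, which inverts an n x n matrix.
gain_cov = P_pred @ Lam.T @ np.linalg.inv(Lam @ P_pred @ Lam.T + Sigma)
F_cov = F_pred + gain_cov @ innov

# Information (Woodbury) form: (Lam' Sigma^{-1} Lam + P^{-1})^{-1} Lam' Sigma^{-1}, r x r inverse only.
Sinv = np.linalg.inv(Sigma)
gain_info = np.linalg.inv(Lam.T @ Sinv @ Lam + np.linalg.inv(P_pred)) @ Lam.T @ Sinv
F_info = F_pred + gain_info @ innov

# WLS estimator built from the same inputs, for comparison with the filtered value.
F_wls = np.linalg.solve(Lam.T @ Sinv @ Lam, Lam.T @ Sinv @ x)

print("max |F_cov - F_info| =", np.max(np.abs(F_cov - F_info)))
print("||F_info - F_wls||   =", np.linalg.norm(F_info - F_wls))
```

The information form only requires inverting $r\times r$ matrices (plus $\Sigma_n^{\xi}$, which is cheap when diagonal), which is why it is the convenient representation when $n$ is large.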
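Then,
\[
\begin{aligned}
&\|F_{t|t-1} - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nF_{t|t-1}\| \\
&\qquad \le \|F_{t|t-1}\|\,\|I_r - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\| = O_p(n^{-1}),
\end{aligned}
\tag{E.22}
\]
by Lemma E.5(i) and Lemma C.6(i), which can be applied by Lemma D.7(i). Similarly,
\[
\begin{aligned}
&\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nF_t - F_t\| \\
&\qquad \le \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n - I_r\|\,\|F_t\| = O_p(n^{-1}),
\end{aligned}
\tag{E.23}
\]
by the same arguments leading to (E.22) and since $\|F_t\|=O_p(1)$ uniformly in $t=1,\ldots,T$, because $E[\|F_t\|^2]=\sum_{j=1}^r E[F_{jt}^2]=\mathrm{tr}(\Gamma^F)=r$ by Assumption 6(b). From (E.22) and (E.23) it follows that
\[
\begin{aligned}
\|F_{t|t}-F_t\| &\le \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| + O_p(n^{-1}) \\
&\le \|n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,\|n^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\| + O_p(n^{-1}) \\
&= O_p(n^{-1/2}),
\end{aligned}
\tag{E.24}
\]
by Lemma C.3(iii) and Lemma C.7(i). This proves part (i).

Part (ii) is proved in the same way, but using Lemmas D.7(iii), D.7(iv), E.5(ii), C.6(ii), C.3(iv), and C.7(ii), instead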

of Lemmas D.7(i), D.7(ii), E.5(i), C.6(i), C.3(iii), and C.7(i). For part (iii), repeat the same steps as in part (ii) but using Lemma E.5(iii) instead of Lemma E.5(ii), and since $\max_{t=1,\ldots,T}\|F_t\|=O_p(\log^{1/\delta_v}T)$ by Lemma E.2. This completes the proof. □

Lemma E.7. Under Assumptions 1 and 2, as $n\to\infty$,
(i) $n\|F_{t|T}-F_{t|t}\|=O_p(1)$, uniformly in $t$;
(ii) $n\|F_{0,t|T}-F_{0,t|t}\|=O_p(1)$, uniformly in $t$.

Proof. For any $t=1,\ldots,T$, using (A.6),
\[
\|F_{t|T}-F_{t|t}\| \le \|P_{t|t}\|\,\|A'\|\,\|P_{t+1|t}^{-1}\|\,\|F_{t+1|T}-AF_{t|t}\|.
\tag{E.25}
\]
Now, given that $F_{T|T}$ is obtained by the last iteration of the Kalman filter, for $t=T-1$ (E.25) becomes:
\[
\begin{aligned}
\|F_{T-1|T}-F_{T-1|T-1}\| &\le \|P_{T-1|T-1}\|\,\|A\|\,\|P_{T|T-1}^{-1}\|\,\big\{\|F_{T|T}\|+\|A\|\,\|F_{T-1|T-1}\|\big\} \\
&\le \|P_{T-1|T-1}\|\frac{M_A}{M_P}\big\{\|F_{T|T}-F_T\|+\|F_T\|+M_A\big[\|F_{T-1|T-1}-F_{T-1}\|+\|F_{T-1}\|\big]\big\} \\
&= O_p(n^{-1}),
\end{aligned}
\tag{E.26}
\]
by Assumption 1(d), and Lemmas D.7(ii), E.4(i), and E.6(i), and since $\|F_t\|=O_p(1)$ uniformly in $t=1,\ldots,T$, because $E[\|F_t\|^2]=\sum_{j=1}^r E[F_{jt}^2]=\mathrm{tr}(\Gamma^F)=r$ by Assumption 6(b). For $t=T-2$, (E.25) becomes:
\[
\begin{aligned}
\|F_{T-2|T}-F_{T-2|T-2}\| &\le \|P_{T-2|T-2}\|\,\|A'\|\,\|P_{T-1|T-2}^{-1}\|\,\|F_{T-1|T}-AF_{T-2|T-2}\| \\
&\le \|P_{T-2|T-2}\|\frac{M_A}{M_P}\big\{\|F_{T-1|T}-F_{T-1|T-1}\|+\|F_{T-1|T-1}-F_{T-1}\|+\|F_{T-1}\| \\
&\qquad\qquad + M_A\big[\|F_{T-2|T-2}-F_{T-2}\|+\|F_{T-2}\|\big]\big\} \\
&= O_p(n^{-1}),
\end{aligned}
\tag{E.27}
\]
by (E.26), Assumption 1(d), and Lemmas D.7(ii), E.4(i), and E.6(i), and since $\|F_t\|=O_p(1)$ uniformly in $t=1,\ldots,T$. By comparing (E.26) and (E.27) it is clear that the same asymptotic bound holds uniformly in $t=T,\ldots,1$, i.e., from (E.25) we get
\[
\begin{aligned}
\|F_{t|T}-F_{t|t}\| &\le \|P_{t|t}\|\frac{M_A}{M_P}\big\{\|F_{t+1|T}-F_{t+1|t+1}\|+\|F_{t+1|t+1}-F_{t+1}\|+\|F_{t+1}\| \\
&\qquad\qquad + M_A\big[\|F_{t|t}-F_t\|+\|F_t\|\big]\big\} = O_p(n^{-1}).
\end{aligned}
\tag{E.28}
\]
This proves part (i).

Part (ii) is proved in the same way but using Lemmas D.7(iv), E.4(iii), and E.6(ii), instead of Lemmas D.7(ii), E.4(i), and E.6(i). This completes the proof. □

Lemma E.8.
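Under Assumptions 1, 2, 3, and 5, as $n\to\infty$,
(i) $n\|F_{t|t}-F_t^{WLS}\|=O_p(1)$, uniformly in $t$;
(ii) $n\|F_{0,t|t}-F_t^{GLS}\|=O_p(1)$, uniformly in $t$;
(iii) $n^2\max_{t=1,\ldots,T}\|P_{t|t}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|=O_p(1)$;
(iv) $n^2\max_{t=1,\ldots,T}\|P_{0,t|t}-(\Lambda_n'(\Gamma_n^{\xi})^{-1}\Lambda_n)^{-1}\|=O_p(1)$;
(v) $n^2\max_{t=1,\ldots,T}\|P_{t|T}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|=O_p(1)$;
(vi) $n^2\max_{t=1,\ldots,T}\|P_{0,t|T}-(\Lambda_n'(\Gamma_n^{\xi})^{-1}\Lambda_n)^{-1}\|=O_p(1)$;
where $F_t^{WLS}=(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}x_{nt}$ and $F_t^{GLS}=(\Lambda_n'(\Gamma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Gamma_n^{\xi})^{-1}x_{nt}$.

Proof. For part (i), from (A.3) and the Woodbury form of the Kalman filter update in (E.21), we have
\[
\begin{aligned}
F_{t|t}-F_t^{WLS} &= (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}x_{nt} - F_t^{WLS} \\
&\quad + F_{t|t-1} - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nF_{t|t-1}.
\end{aligned}
\tag{E.29}
\]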

Then,
\[
\begin{aligned}
&\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}x_{nt} - F_t^{WLS}\| \\
&\qquad \le \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,\|\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\,\|x_{nt}\| = O_p(n^{-1}),
\end{aligned}
\tag{E.30}
\]
by Lemmas C.6(iii), C.2, and C.10, and Assumption 2(a). And,
\[
\begin{aligned}
&\|F_{t|t-1} - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_nF_{t|t-1}\| \\
&\qquad \le \|I_r - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{t|t-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\|\,\|F_{t|t-1}\| \\
&\qquad = O_p(n^{-1}),
\end{aligned}
\tag{E.31}
\]
by (E.22) in the proof of Lemma E.6(i). By substituting (E.30) and (E.31) into (E.29), we prove part (i).

Part (ii) is proved as part (i), but using Lemmas C.6(iv) and E.6(ii) instead of Lemmas C.6(iii) and E.6(i).

For part (iii), from (A.4), using the same steps leading to (D.23) in the proof of Lemma D.11, but when using the true parameters, it holds that:
\[
\begin{aligned}
P_{t|t} &= (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1} \\
&\quad - P_{t|t-1}\big((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}+P_{t|t-1}\big)^{-1}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}(P_{t|t-1})^{-1}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}.
\end{aligned}
\tag{E.32}
\]
Notice that all inverses in (E.32) are well defined because of Lemmas C.3(iii), D.7(i), and D.7(ii) and Assumption 2(a). Therefore, from (E.32),
\[
\begin{aligned}
n^2\max_{t=1,\ldots,T}\|P_{t|t}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| &\le \max_{t=1,\ldots,T}\|P_{t|t-1}\|\,\max_{t=1,\ldots,T}\|(P_{t|t-1})^{-1}\|\, n^2\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|^2 \\
&\quad \cdot \big\|\big((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}+P_{t|t-1}\big)^{-1}\big\| \\
&= O(1),
\end{aligned}
\tag{E.33}
\]
because of Lemmas C.3(iii), D.7(i), and D.7(ii), and since $\|((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}+P_{t|t-1})^{-1}\|=O(1)$, because of the same arguments leading to (D.24) in the proof of Lemma D.11, which this time hold by Lemmas C.3(iii), D.7(i), and D.7(ii), instead of Lemmas D.5(iii), D.8(i), and D.8(ii).

Part (iv) is proved as part (iii), but using Lemma C.3(iv) and Assumption 2(f) instead of Lemma C.3(iii) and Assumption 2(a).

For part (v),
\[
n^2\max_{t=1,\ldots,T}\|P_{t|T}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| \le n^2\max_{t=1,\ldots,T}\|P_{t|T}-P_{t|t}\| + n^2\max_{t=1,\ldots,T}\|P_{t|t}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| = O(1),
\]
because of part (iii) and Lemma E.4(v). Part (vi) is proved as part (v), but using part (iv) and Lemma E.4(vi). This completes the proof. □

Lemma E.9.
Let $\ell_0(X_{nT};\phi_n)$ be the log-likelihood obtained when $A=0_{r\times r}$ and $\Gamma^v=I_r$. Then, under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
\[
n\log^{-2/\delta_v}T\,\sup_{\underline{\varphi}_n\in\mathcal O_n}(nT)^{-1}\bigl|\ell(X_{nT};\phi_n,\theta)-\ell_0(X_{nT};\phi_n)\bigr|=O_p(1).
\]

Proof. Throughout we consider generic values of the parameters such that $\underline{\varphi}_n\in\mathcal O_n$, where $\mathcal O_n=\{\mathcal O^n_{\lambda_i}\cap\mathcal E_{\Lambda_n}\}\times\{\mathcal O^n_{\sigma_i^2}\cap\mathcal E_{\Gamma_n^\xi}\}\times\mathcal O_A\times\mathcal O_{\Gamma^v}$ as defined in Section 4.3.4. Thus the elements of $\underline{\varphi}_n$ satisfy Assumptions 1(a), 1(d), 1(e), 2(a), 2(b), and 2(f).

First, recall that the log-likelihood (5) can be written as
\[
\begin{aligned}
\ell(X_{nT};\phi_n,\theta)={}&\ell(X_{nT}|F_T;\phi_n)+\ell(F_T;\theta)-\ell(F_T|X_{nT};\phi_n,\theta)\\
={}&-\frac T2\log\det(\Sigma_n^\xi)-\frac12\sum_{t=1}^T(x_{nt}-\Lambda_nF_t)'(\Sigma_n^\xi)^{-1}(x_{nt}-\Lambda_nF_t)\\
&-\frac T2\log\det(\Gamma^v)-\frac12\sum_{t=1}^T(F_t-AF_{t-1})'(\Gamma^v)^{-1}(F_t-AF_{t-1})\\
&+\frac12\sum_{t=1}^T\log\det(P_{0,t|T}(\phi_n,\theta))\\
&+\frac12\sum_{t=1}^T(F_t-F_{0,t|T}(\phi_n,\theta))'(P_{0,t|T}(\phi_n,\theta))^{-1}(F_t-F_{0,t|T}(\phi_n,\theta)),
\end{aligned}
\tag{E.34}
\]
where we used the fact that $F_0=0_r$ by Assumption 1(i) and we used the definitions:
\[
\begin{aligned}
F_{0,t|T}(\phi_n,\theta)&=E_{\varphi_n}[F_t|X_{nT}]\equiv F_{0,t|T},\\
P_{0,t|T}(\phi_n,\theta)&=E_{\varphi_n}[(F_t-F_{0,t|T}(\phi_n,\theta))(F_t-F_{0,t|T}(\phi_n,\theta))'|X_{nT}]\equiv P_{0,t|T}.
\end{aligned}
\tag{E.35}
\]
Now, since (E.34) holds for any $F_t$ we can always choose $F_t=F_{0,t|T}$ for all $t=1,\ldots,T$, so that
\[
\begin{aligned}
\ell(X_{nT};\phi_n,\theta)={}&-\frac T2\log\det(\Sigma_n^\xi)-\frac12\sum_{t=1}^T(x_{nt}-\Lambda_nF_{0,t|T})'(\Sigma_n^\xi)^{-1}(x_{nt}-\Lambda_nF_{0,t|T})\\
&-\frac T2\log\det(\Gamma^v)-\frac12\sum_{t=1}^T(F_{0,t|T}-AF_{0,t-1|T})'(\Gamma^v)^{-1}(F_{0,t|T}-AF_{0,t-1|T})\\
&+\frac12\sum_{t=1}^T\log\det(P_{0,t|T}).
\end{aligned}
\tag{E.36}
\]

Second, consider the log-likelihood (5) when the autocorrelation of the factors is not accounted for, i.e., when $A=0_{r\times r}$ and $\Gamma^v=I_r$,
\[
\ell_0(X_{nT};\phi_n)=-\frac T2\log\det\bigl(\Lambda_n\Lambda_n'+\Sigma_n^\xi\bigr)-\frac12\sum_{t=1}^T\Bigl[x_{nt}'\bigl(\Lambda_n\Lambda_n'+\Sigma_n^\xi\bigr)^{-1}x_{nt}\Bigr],
\tag{E.37}
\]
where we are imposing Assumption 6(b), so that we can set $\Gamma^F=I_r$ in the log-likelihood. Clearly, (E.37) can also be written as
\[
\begin{aligned}
\ell_0(X_{nT};\phi_n)={}&\ell_0(X_{nT}|F_T;\phi_n)+\ell_0(F_T)-\ell_0(F_T|X_{nT};\phi_n)\\
={}&-\frac T2\log\det(\Sigma_n^\xi)-\frac12\sum_{t=1}^T(x_{nt}-\Lambda_nF_t)'(\Sigma_n^\xi)^{-1}(x_{nt}-\Lambda_nF_t)\\
&-\frac12\sum_{t=1}^TF_t'F_t+\frac12\sum_{t=1}^T\log\det(V_{0,t|T}(\phi_n))\\
&+\frac12\sum_{t=1}^T(F_t-G_{0,t|T}(\phi_n))'(V_{0,t|T}(\phi_n))^{-1}(F_t-G_{0,t|T}(\phi_n)),
\end{aligned}
\tag{E.38}
\]
where
\[
\begin{aligned}
G_{0,t|T}(\phi_n)&=E_{\phi_n}[F_t|X_{nT}]\equiv G_{0,t|T},\\
V_{0,t|T}(\phi_n)&=E_{\phi_n}[(F_t-G_{0,t|T}(\phi_n))(F_t-G_{0,t|T}(\phi_n))'|X_{nT}]\equiv V_{0,t|T}.
\end{aligned}
\tag{E.39}
\]

Now, since (E.38) holds for any $F_t$ we can always choose $F_t=G_{0,t|T}$ for all $t=1,\ldots,T$, so that
\[
\begin{aligned}
\ell_0(X_{nT};\phi_n)={}&-\frac T2\log\det(\Sigma_n^\xi)-\frac12\sum_{t=1}^T(x_{nt}-\Lambda_nG_{0,t|T})'(\Sigma_n^\xi)^{-1}(x_{nt}-\Lambda_nG_{0,t|T})\\
&-\frac12\sum_{t=1}^TG_{0,t|T}'G_{0,t|T}+\frac12\sum_{t=1}^T\log\det(V_{0,t|T}).
\end{aligned}
\tag{E.40}
\]

Under Assumption 4 the conditional mean $F_{0,t|T}$ in (E.35) is a linear function of $X_{nT}$, so it can be obtained by linear projection, and, therefore, it is given by the correctly specified Kalman smoother, i.e., using as parameters $\phi_n$ and $\theta$ and when replacing $\Sigma_n^\xi$ with $\Gamma_n^\xi$, thus,
\[
\underbrace{F_{0,t|T}}_{\text{conditional mean in (E.35)}}=\underbrace{F_{0,t|T}}_{\text{Kalman smoother}}.
\tag{E.41}
\]
Likewise, under Assumption 4, $G_{0,t|T}$ in (E.39) is also linear; however, since this is the conditional mean for the case in which no dynamics for the factors is specified, it is given by the simpler linear projection (using Lemma D.13)
\[
G_{0,t|T}=\Lambda_n'(\Lambda_n\Lambda_n'+\Gamma_n^\xi)^{-1}x_{nt}=(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n+I_r)^{-1}\Lambda_n'(\Gamma_n^\xi)^{-1}x_{nt}=F_{0,t}^{\mathrm{REG}}.
\tag{E.42}
\]
Now, consider the generalized least squares estimator of the factors computed for generic values of the parameters $\phi_n$:
\[
F_t^{\mathrm{GLS}}=(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n)^{-1}\Lambda_n'(\Gamma_n^\xi)^{-1}x_{nt}.
\tag{E.43}
\]
Then, since we restrict to $\underline{\varphi}_n\in\mathcal O_n$, it follows that
\[
\begin{aligned}
\|F_{0,t}^{\mathrm{REG}}-F_t^{\mathrm{GLS}}\|&\le n\|(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n+I_r)^{-1}-(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n)^{-1}\|\,n^{-1/2}\|\Lambda_n'(\Gamma_n^\xi)^{-1}\|\,n^{-1/2}\|x_{nt}\|\\
&=O(n^{-1})\,O_p(1),
\end{aligned}
\tag{E.44}
\]
by Lemmas C.3(viii), C.6(iv), and C.10. Moreover, by Lemmas E.7(ii) and E.8(ii)
\[
\|F_{0,t|T}-F_t^{\mathrm{GLS}}\|\le\|F_{0,t|T}-F_{0,t|t}\|+\|F_{0,t|t}-F_t^{\mathrm{GLS}}\|=O(n^{-1})\,O_p(1).
\tag{E.45}
\]
In particular, the bound in (E.45) is a product of a stochastic and a non-stochastic term because of (E.28) in the proof of Lemma E.7, and (E.30) and (E.31) in the proof of Lemma E.8. Therefore, from (E.44) and (E.45) (see also Bai and Li, 2016)
\[
\|F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\|=O(n^{-1})\,O_p(1).
\tag{E.46}
\]
The results in (E.44)-(E.46) depend on $t$ only through the $O_p(1)$ term, which in turn is just a function of $x_{nt}$, and do not depend on the choice of the parameters as long as they belong to $\mathcal O_n$, as assumed.
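The equality of the two projection forms in (E.42), and the $O(n^{-1})$ gap between $F_{0,t}^{\mathrm{REG}}$ and $F_t^{\mathrm{GLS}}$ in (E.44), can be checked numerically in a toy case. The following is a minimal sketch only, under assumptions that are not part of the paper: a single factor ($r=1$), a diagonal $\Gamma_n^\xi$, and simulated data; the function name `factor_projections` is hypothetical. With $r=1$ the gap is exactly $F_t^{\mathrm{GLS}}/(m+1)$, where $m=\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n$ grows proportionally to $n$.

```python
import random


def factor_projections(lam, gam, x):
    """Scalar-factor (r = 1) versions of (E.42) and (E.43), diagonal Gamma.

    lam: loadings, gam: idiosyncratic variances (> 0), x: one cross-section x_nt.
    Returns (f_direct, f_reg, f_gls, m). Illustrative sketch, not the paper's code.
    """
    m = sum(l * l / g for l, g in zip(lam, gam))           # Lambda' Gamma^{-1} Lambda
    b = sum(l * xi / g for l, xi, g in zip(lam, x, gam))   # Lambda' Gamma^{-1} x
    # Left-hand form of (E.42), Lambda'(Lambda Lambda' + Gamma)^{-1} x, with the
    # inverse expanded by the Sherman-Morrison formula for a rank-one update.
    f_direct = b - m * b / (1.0 + m)
    f_reg = b / (m + 1.0)   # right-hand form of (E.42)
    f_gls = b / m           # (E.43)
    return f_direct, f_reg, f_gls, m


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; the data are purely illustrative
    for n in (10, 100, 1000):
        lam = [rng.uniform(0.5, 1.5) for _ in range(n)]
        gam = [rng.uniform(0.5, 1.5) for _ in range(n)]
        f = rng.gauss(0.0, 1.0)
        x = [l * f + rng.gauss(0.0, g ** 0.5) for l, g in zip(lam, gam)]
        f_direct, f_reg, f_gls, m = factor_projections(lam, gam, x)
        # First column: the two forms of (E.42) agree up to rounding.
        # Second column: n * |F_REG - F_GLS| should stay bounded as n grows.
        print(n, abs(f_direct - f_reg), n * abs(f_reg - f_gls))
```

The scaled gap in the last column is the quantity bounded in (E.44); the sketch does not reproduce the uniformity over $\mathcal O_n$ or over $t$ that the proof requires.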
Since the parameters are deterministic and using Lemma E.3, we then have
\[
\sup_{\underline{\varphi}_n\in\mathcal O_n}\max_{t=1,\ldots,T}\|F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\|=O(n^{-1})\,O_p(\log^{1/\delta_v}T).
\tag{E.47}
\]

Then, consider the conditional covariance in (E.39) and, letting $K_n=(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n+I_r)^{-1}\Lambda_n'(\Gamma_n^\xi)^{-1}$, we have
\[
V_{0,t|T}=(I_r-K_n\Lambda_n)E_{\phi_n}[F_tF_t'|X_{nT}](I_r-K_n\Lambda_n)'+K_nE_{\phi_n}[\xi_{nt}\xi_{nt}'|X_{nT}]K_n',
\]
where we used also Lemma C.11. Moreover,
\[
E[V_{0,t|T}]=(I_r-K_n\Lambda_n)E\bigl[E_{\phi_n}[F_tF_t'|X_{nT}]\bigr](I_r-K_n\Lambda_n)'+K_nE\bigl[E_{\phi_n}[\xi_{nt}\xi_{nt}'|X_{nT}]\bigr]K_n'
\]
is finite and positive definite, since $\|K_n\Lambda_n\|=O(1)$ by Lemmas C.2, C.3(vi), and C.3(viii), and because $E[\sup_{\underline{\varphi}_n\in\mathcal O_n}\max_{t=1,\ldots,T}E_{\phi_n}[F_tF_t'|X_{nT}]]$ and $E[\sup_{\underline{\varphi}_n\in\mathcal O_n}\max_{t=1,\ldots,T}E_{\phi_n}[\xi_{nt}\xi_{nt}'|X_{nT}]]$ are both finite and positive definite by Assumptions 1(b) and 2(f), and Lemma C.1(v). Therefore, by Markov's inequality
\[
\sup_{\underline{\varphi}_n\in\mathcal O_n\setminus\{\varphi_n\}}\max_{t=1,\ldots,T}\|V_{0,t|T}\|=O_p(1),\qquad
\sup_{\underline{\varphi}_n\in\mathcal O_n\setminus\{\varphi_n\}}\max_{t=1,\ldots,T}\|(V_{0,t|T})^{-1}\|=O_p(1).
\tag{E.48}
\]
This bound is tighter when $\underline{\varphi}_n=\varphi_n$ (the generic value coincides with the true one). Indeed, in that case from (A.4) and (A.7), setting $A=0_{r\times r}$ therein, we have
\[
V_{0,t|T}(\phi_n)=I_r-I_r\Lambda_n'(\Lambda_n\Lambda_n'+\Gamma_n^\xi)^{-1}\Lambda_n=I_r-(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n+I_r)^{-1}\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n,
\]
where we also used Assumption 6(b), for which $\Gamma^F=I_r$. Therefore, $V_{0,t|T}(\phi_n)$ is deterministic and independent of $t$, and such that
\[
\max_{t=1,\ldots,T}\|V_{0,t|T}(\phi_n)\|=O(n^{-1}),\qquad\max_{t=1,\ldots,T}\|(V_{0,t|T}(\phi_n))^{-1}\|=O(n),
\tag{E.49}
\]
by Lemma C.6(ii), which holds since $\Gamma_n^\xi$ is positive definite by Assumption 2(f). Furthermore, following the same steps leading to (E.33) in the proof of Lemma E.8 we have
\[
\max_{t=1,\ldots,T}\|V_{0,t|T}(\phi_n)-(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n)^{-1}\|=O(n^{-2}).
\tag{E.50}
\]

Turning to the conditional covariance in (E.35), because of (E.41) we can write
\[
\begin{aligned}
P_{0,t|T}={}&E_{\varphi_n}[(F_t-F_{0,t|T}+F_{0,t}^{\mathrm{REG}}-F_{0,t}^{\mathrm{REG}})(F_t-F_{0,t|T}+F_{0,t}^{\mathrm{REG}}-F_{0,t}^{\mathrm{REG}})'|X_{nT}]\\
={}&V_{0,t|T}+E_{\varphi_n}[(F_{0,t}^{\mathrm{REG}}-F_{0,t|T})(F_{0,t}^{\mathrm{REG}}-F_{0,t|T})'|X_{nT}]\\
&+E_{\varphi_n}[(F_{0,t}^{\mathrm{REG}}-F_{0,t|T})(F_t-F_{0,t}^{\mathrm{REG}})'|X_{nT}]+E_{\varphi_n}[(F_t-F_{0,t}^{\mathrm{REG}})(F_{0,t}^{\mathrm{REG}}-F_{0,t|T})'|X_{nT}].
\end{aligned}
\tag{E.51}
\]
From (E.47) and (E.51) we have
\[
\sup_{\underline{\varphi}_n\in\mathcal O_n\setminus\{\varphi_n\}}\max_{t=1,\ldots,T}\|P_{0,t|T}-V_{0,t|T}\|=O_p(n^{-1}\log^{1/\delta_v}T).
\tag{E.52}
\]
This bound is tighter when $\underline{\varphi}_n=\varphi_n$. Indeed, we have $P_{0,t|T}(\varphi_n)=P_{0,t|T}$, which is deterministic by Lemma D.6(iii), and
\[
\begin{aligned}
\|P_{0,t|T}(\varphi_n)-V_{0,t|T}(\varphi_n)\|&=\|P_{0,t|T}-V_{0,t|T}(\varphi_n)\|\\
&\le\|P_{0,t|T}-(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n)^{-1}\|+\|(\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n)^{-1}-V_{0,t|T}(\varphi_n)\|\\
&=O(n^{-2}),
\end{aligned}
\tag{E.53}
\]
by (E.50) and Lemma E.8(vi).
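The orders in (E.49) and (E.50) can be made concrete in the one-factor case. The sketch below is illustrative only and rests on assumptions not in the paper: $r=1$, a diagonal $\Gamma_n^\xi$, and the hypothetical helper name `no_dynamics_variance`. With $m=\Lambda_n'(\Gamma_n^\xi)^{-1}\Lambda_n$, the expression for $V_{0,t|T}(\phi_n)$ above reduces to $1-m/(m+1)=1/(m+1)$, so that $m^{-1}-V_{0,t|T}(\phi_n)=1/(m(m+1))$.

```python
def no_dynamics_variance(lam, gam):
    """Return (m, v) for r = 1: m = Lambda' Gamma^{-1} Lambda and
    v = I_r - (m + I_r)^{-1} m, the scalar analogue of V_{0,t|T}(phi_n).

    Illustrative sketch under a diagonal Gamma; not the paper's code.
    """
    m = sum(l * l / g for l, g in zip(lam, gam))
    v = 1.0 - m / (m + 1.0)
    return m, v


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Constant loadings and variances keep m proportional to n.
        m, v = no_dynamics_variance([1.0] * n, [2.0] * n)
        # n * v and n^2 * |v - 1/m| should both remain bounded as n grows,
        # in line with (E.49) and (E.50).
        print(n, n * v, n * n * abs(v - 1.0 / m))
```

The same algebra is what drives the $n^{-2}$ rates in Lemma E.8(iii)-(vi), where a predicted covariance takes the place of the identity.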
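Now, consider
\[
\begin{aligned}
&\ell(X_{nT};\phi_n,\theta)-\ell_0(X_{nT};\phi_n)\\
&\quad=-\frac12\sum_{t=1}^T\Bigl\{\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)'\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)\\
&\qquad\qquad+2\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)'\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_nF_{0,t}^{\mathrm{REG}}+2x_{nt}'(\Sigma_n^\xi)^{-1}\Lambda_n\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)\Bigr\}\\
&\qquad-\frac T2\log\det(\Gamma^v)-\frac12\sum_{t=1}^T(F_{0,t|T}-AF_{0,t-1|T})'(\Gamma^v)^{-1}(F_{0,t|T}-AF_{0,t-1|T})+\frac12\sum_{t=1}^TF_{0,t}^{\mathrm{REG}\prime}F_{0,t}^{\mathrm{REG}}\\
&\qquad+\frac12\sum_{t=1}^T\log\det(P_{0,t|T})-\frac12\sum_{t=1}^T\log\det(V_{0,t|T}).
\end{aligned}
\tag{E.54}
\]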

Consider all terms on the rhs of (E.54). First, by (E.47), for all $\underline{\varphi}_n\in\mathcal O_n$,
\[
\begin{aligned}
&\Bigl|-\frac12\sum_{t=1}^T\Bigl\{\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)'\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)+2\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)'\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_nF_{0,t}^{\mathrm{REG}}\\
&\qquad\qquad+2x_{nt}'(\Sigma_n^\xi)^{-1}\Lambda_n\bigl(F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\bigr)\Bigr\}\Bigr|\\
&\quad\le T\max_{t=1,\ldots,T}\|F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\|\,\|\Lambda_n\|^2\,\|(\Sigma_n^\xi)^{-1}\|\max_{t=1,\ldots,T}\|F_{0,t}^{\mathrm{REG}}\|\\
&\qquad+T\max_{t=1,\ldots,T}\|F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\|\,\|(\Sigma_n^\xi)^{-1}\|\,\|\Lambda_n\|\max_{t=1,\ldots,T}\|x_{nt}\|\\
&\qquad+T\max_{t=1,\ldots,T}\|F_{0,t|T}-F_{0,t}^{\mathrm{REG}}\|^2\,\|\Lambda_n\|^2\,\|(\Sigma_n^\xi)^{-1}\|\max_{t=1,\ldots,T}\|x_{nt}\|\\
&\quad=O_p(T\log^{2/\delta_v}T),
\end{aligned}
\tag{E.55}
\]
where we used Lemmas C.2 and E.3, which imply also that $\|F_{0,t}^{\mathrm{REG}}\|=O_p(\log^{1/\delta_v}T)$, and Assumption 2(a).

Second, for all $\underline{\varphi}_n\in\mathcal O_n$,
\[
\begin{aligned}
&\Bigl|-\frac T2\log\det(\Gamma^v)-\frac12\sum_{t=1}^T(F_{0,t|T}-AF_{0,t-1|T})'(\Gamma^v)^{-1}(F_{0,t|T}-AF_{0,t-1|T})+\frac12\sum_{t=1}^TF_{0,t}^{\mathrm{REG}\prime}F_{0,t}^{\mathrm{REG}}\Bigr|\\
&\quad\le\Bigl|\frac T2\log\det(\Gamma^v)\Bigr|+\Bigl|\frac12\sum_{t=1}^T(F_{0,t|T}-AF_{0,t-1|T})'(\Gamma^v)^{-1}(F_{0,t|T}-AF_{0,t-1|T})\Bigr|+\Bigl|\frac12\sum_{t=1}^TF_{0,t}^{\mathrm{REG}\prime}F_{0,t}^{\mathrm{REG}}\Bigr|\\
&\quad=O_p(T\log^{2/\delta_v}T),
\end{aligned}
\tag{E.56}
\]
because of (E.46), Assumptions 1(d) and 1(e), and Lemma E.3 jointly with the same arguments leading to (E.55).
Third, by (E.48) and (E.52), and Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality,
\[
\begin{aligned}
\Bigl|\frac12\sum_{t=1}^T\log\det(P_{0,t|T})-\frac12\sum_{t=1}^T\log\det(V_{0,t|T})\Bigr|
&=\Bigl|\frac12\sum_{t=1}^T\log\bigl(\det\{P_{0,t|T}(V_{0,t|T})^{-1}\}\bigr)\Bigr|\\
&=\Bigl|\frac12\sum_{t=1}^T\sum_{j=1}^r\log\Bigl(\bigl\{\nu^{(j)}(P_{0,t|T})\bigr\}\bigl\{\nu^{(j)}(V_{0,t|T})\bigr\}^{-1}\Bigr)\Bigr|\\
&\le\Bigl|\frac12\sum_{t=1}^T\sum_{j=1}^r\log\Bigl(\bigl\{\nu^{(1)}(P_{0,t|T}-V_{0,t|T})+\nu^{(j)}(V_{0,t|T})\bigr\}\bigl\{\nu^{(j)}(V_{0,t|T})\bigr\}^{-1}\Bigr)\Bigr|\\
&\le\Bigl|\frac12\sum_{t=1}^T\sum_{j=1}^r\log\Bigl(1+\bigl\{n\|P_{0,t|T}-V_{0,t|T}\|\bigr\}\bigl\{n\nu^{(j)}(V_{0,t|T})\bigr\}^{-1}\Bigr)\Bigr|\\
&\le\max_{t=1,\ldots,T}\Bigl|\frac T2\bigl\{n\|P_{0,t|T}-V_{0,t|T}\|\bigr\}\bigl\{n\nu^{(j)}(V_{0,t|T})\bigr\}^{-1}\Bigr|+o(Tn^{-1})\\
&=O_p(Tn^{-1}\log^{1/\delta_v}T),
\end{aligned}
\tag{E.57}
\]
where in the second last line we took into account also (E.49) and (E.53); hence, (E.57) holds for all $\underline{\varphi}_n\in\mathcal O_n$.

Summing up, by noticing that (E.55), (E.56), and (E.57) hold for all $\underline{\varphi}_n\in\mathcal O_n$, from (E.54) we have:
\[
\sup_{\underline{\varphi}_n\in\mathcal O_n}(nT)^{-1}\bigl|\ell(X_{nT};\phi_n,\theta)-\ell_0(X_{nT};\phi_n)\bigr|=O_p(n^{-1}\log^{2/\delta_v}T).
\]
This completes the proof. □

Lemma E.10. Let $\widehat\phi_n^{*}=(\mathrm{vec}(\widehat\Lambda_n^{*})'\ \widehat\sigma_1^{2*}\cdots\widehat\sigma_n^{2*})'$ be the vector of QML estimators of the entries of $\phi_n$ maximizing $\ell(X_{nT};\phi_n,\theta)$ defined in (5), and let $\widehat\phi_n^{\dagger}=(\mathrm{vec}(\widehat\Lambda_n^{\dagger})'\ \widehat\sigma_1^{2\dagger}\cdots\widehat\sigma_n^{2\dagger})'$ be the vector of QML estimators of the entries of $\phi_n$ maximizing $\ell_0(X_{nT};\phi_n)$ defined in (E.37) in the proof of Lemma E.9. Then, under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $n\log^{-2/\delta_v}T\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\widehat\lambda_i^{\dagger}\|=O_p(1)$;
(ii) $n\log^{-2/\delta_v}T\,n^{-1/2}\|\widehat\Lambda_n^{*}-\widehat\Lambda_n^{\dagger}\|=O_p(1)$;
(iii) $n\log^{-2/\delta_v}T\max_{i=1,\ldots,n}|\widehat\sigma_i^{2*}-\widehat\sigma_i^{2\dagger}|=O_p(1)$.

Proof. From Lemma E.9 we have
\[
\begin{aligned}
(nT)^{-1}\Bigl|\sup_{\underline{\phi}_n\in\mathcal O_n}\ell(X_{nT};\phi_n,\theta)-\sup_{\underline{\phi}_n\in\mathcal O_n}\ell_0(X_{nT};\phi_n)\Bigr|
&\le(nT)^{-1}\sup_{\underline{\phi}_n\in\mathcal O_n}\bigl|\ell(X_{nT};\phi_n,\theta)-\ell_0(X_{nT};\phi_n)\bigr|\\
&=O_p(n^{-1}\log^{2/\delta_v}T).
\end{aligned}
\tag{E.58}
\]
Therefore, by continuity of the log-likelihoods and (E.58), we have
\[
\begin{aligned}
\max_{i=1,\ldots,n}\begin{pmatrix}\|\widehat\lambda_i^{*}-\widehat\lambda_i^{\dagger}\|\\ |\widehat\sigma_i^{2*}-\widehat\sigma_i^{2\dagger}|\end{pmatrix}
&=\max_{i=1,\ldots,n}\Bigl\|\operatorname*{arg\,max}_{(\lambda_i'\ \sigma_i^2)'\in\mathcal O_n}\ell(X_{nT};\phi_n,\theta)-\operatorname*{arg\,max}_{(\lambda_i'\ \sigma_i^2)'\in\mathcal O_n}\ell_0(X_{nT};\phi_n)\Bigr\|\\
&=\max_{i=1,\ldots,n}\Bigl\|\operatorname*{arg\,max}_{(\lambda_i'\ \sigma_i^2)'\in\mathcal O_n}(nT)^{-1}\ell(X_{nT};\phi_n,\theta)-\operatorname*{arg\,max}_{(\lambda_i'\ \sigma_i^2)'\in\mathcal O_n}(nT)^{-1}\ell_0(X_{nT};\phi_n)\Bigr\|\\
&=O_p(n^{-1}\log^{2/\delta_v}T).
\end{aligned}
\]
This proves parts (i) and (iii), while part (ii) is a direct consequence of part (i). This completes the proof. □

Lemma E.11. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $\min(n\log^{-2/\delta_v}T,\sqrt{nT},T)\,\|\widehat\lambda_i^{*}-\lambda_i^{\mathrm{OLS}}\|=O_p(1)$, uniformly in $i$;
(ii) $\min(n\log^{-2/\delta_v}T,\sqrt{nT},T)\,n^{-1/2}\|\widehat\Lambda_n^{*}-\Lambda_n^{\mathrm{OLS}}\|=O_p(1)$;
(iii) $\min(n\log^{-2/\delta_v}T,\sqrt{nT},T)\,|\widehat\sigma_i^{2*}-\sigma_i^{2\,\mathrm{OLS}}|=O_p(1)$, uniformly in $i$.

Proof. For part (i)
\[
\|\widehat\lambda_i^{*}-\lambda_i^{\mathrm{OLS}}\|\le\|\widehat\lambda_i^{*}-\widehat\lambda_i^{\dagger}\|+\|\widehat\lambda_i^{\dagger}-\widehat\lambda_i^{(0)}\|+\|\widehat\lambda_i^{(0)}-\lambda_i^{\mathrm{OLS}}\|,
\tag{E.59}
\]
where $\widehat\lambda_i^{(0)}$ is the PC estimator defined in Appendix A.1. Then, from Barigozzi (2023, Theorem 3)
\[
\|\widehat\lambda_i^{\dagger}-\widehat\lambda_i^{(0)}\|=O_p(n^{-1}),
\tag{E.60}
\]
while from Barigozzi (2023, Corollary 1 and Proposition B.3)
\[
\|\widehat\lambda_i^{(0)}-\lambda_i^{\mathrm{OLS}}\|=O_p(\max(n^{-1},n^{-1/2}T^{-1/2},T^{-1})).
\tag{E.61}
\]
Part (i) then follows by substituting (E.60), (E.61), and Lemma E.10(i) into (E.59).
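The way the three error terms in (E.59) combine into the rate of Lemma E.11(i) is simple arithmetic, and can be sketched as follows. This is only an illustration: the function name `loading_rate` and the numerical values of $n$, $T$, and $\delta_v$ are hypothetical, and the identity checked in the last column relies on $\log^{2/\delta_v}T\ge 1$, which is assumed here.

```python
import math


def loading_rate(n, T, delta_v):
    """Return (bound, rate) for the decomposition (E.59).

    bound: the largest of the orders n^{-1} log^{2/delta_v} T (Lemma E.10(i)),
           n^{-1} (E.60), and max(n^{-1}, (nT)^{-1/2}, T^{-1}) (E.61).
    rate:  min(n log^{-2/delta_v} T, sqrt(nT), T), as in Lemma E.11(i).
    Illustrative sketch; assumes log(T)^(2/delta_v) >= 1.
    """
    log_term = math.log(T) ** (2.0 / delta_v)
    bound = max(log_term / n, 1.0 / n, 1.0 / math.sqrt(n * T), 1.0 / T)
    rate = min(n / log_term, math.sqrt(n * T), T)
    return bound, rate


if __name__ == "__main__":
    for n, T in ((50, 200), (200, 200), (1000, 200)):
        bound, rate = loading_rate(n, T, delta_v=2.0)
        # bound * rate equals one, i.e. the rate is the reciprocal of the bound
        print(n, T, bound, rate, bound * rate)
```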
For part (ii), from Barigozzi (2023, Theorem 3)
\[
n^{-1/2}\|\widehat\Lambda_n^{\dagger}-\widehat\Lambda_n^{(0)}\|=O_p(n^{-1}),
\tag{E.62}
\]
while from Barigozzi (2023, Corollary 1 and Proposition B.3)
\[
n^{-1/2}\|\widehat\Lambda_n^{(0)}-\Lambda_n^{\mathrm{OLS}}\|=O_p(\max(n^{-1},n^{-1/2}T^{-1/2},T^{-1})).
\tag{E.63}
\]
Then, part (ii) is proved analogously to part (i), using (E.62), (E.63), and Lemma E.10(ii).

For part (iii), let $\widehat\sigma_i^{2\ddagger}=T^{-1}\sum_{t=1}^T(x_{it}-\widehat\lambda_i^{\dagger\prime}F_t)^2$. Then, consider
\[
|\widehat\sigma_i^{2*}-\sigma_i^{2\,\mathrm{OLS}}|\le|\widehat\sigma_i^{2*}-\widehat\sigma_i^{2\dagger}|+|\widehat\sigma_i^{2\dagger}-\widehat\sigma_i^{2\ddagger}|+|\widehat\sigma_i^{2\ddagger}-\sigma_i^{2\,\mathrm{OLS}}|.
\tag{E.64}
\]
From Bai and Li (2016, Theorem S.2 and eq. (S.33) in the online supplement) and noticing that the estimator of the idiosyncratic variances is unaffected by the chosen identifying constraints, we have
\[
|\widehat\sigma_i^{2\dagger}-\widehat\sigma_i^{2\ddagger}|=O_p(\max(n^{-1},n^{-1/2}T^{-1/2},T^{-1})).
\tag{E.65}
\]

Moreover,
\[
\begin{aligned}
|\widehat\sigma_i^{2\ddagger}-\sigma_i^{2\,\mathrm{OLS}}|
&=\Bigl|T^{-1}\sum_{t=1}^T(x_{it}-\widehat\lambda_i^{\dagger\prime}F_t)^2-T^{-1}\sum_{t=1}^T(x_{it}-\lambda_i^{\mathrm{OLS}\prime}F_t)^2\Bigr|\\
&\le\Bigl|T^{-1}\sum_{t=1}^T\bigl\{\widehat\lambda_i^{\dagger\prime}F_tF_t'\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}\prime}F_tF_t'\lambda_i^{\mathrm{OLS}}\bigr\}\Bigr|+2\Bigl|T^{-1}\sum_{t=1}^T\bigl\{\widehat\lambda_i^{\dagger\prime}F_tx_{it}-\lambda_i^{\mathrm{OLS}\prime}F_tx_{it}\bigr\}\Bigr|\\
&\le\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|^2\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\|+2\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\|\,\|\lambda_i^{\mathrm{OLS}}\|\\
&\quad+2\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|\Bigl\|T^{-1}\sum_{t=1}^TF_tx_{it}\Bigr\|\\
&=O_p(\max(n^{-1},n^{-1/2}T^{-1/2},T^{-1})),
\end{aligned}
\tag{E.66}
\]
by (E.60), (E.61), Lemma C.12(i) combined with Assumption 6(b), Lemma C.12(ii), and since $\|\lambda_i^{\mathrm{OLS}}\|\le\|\lambda_i^{\mathrm{OLS}}-\lambda_i\|+\|\lambda_i\|=O_p(1)$ by Assumption 1(a) and Lemma D.19(i). Part (iii) then follows by substituting (E.65), (E.66), and Lemma E.10(iii) into (E.64). This completes the proof. □

Lemma E.12. Let $\widehat\theta^{*}=(\mathrm{vec}(\widehat A^{*})'\ \mathrm{vech}(\widehat\Gamma^{v*})')'$ be the QML estimator of the entries of $\theta$ maximizing $\ell(X_{nT};\phi_n,\theta)$ defined in (5); then, under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $n\log^{-2/\delta_v}T\,\|\widehat A^{*}-A^{\mathrm{OLS}}\|=O_p(1)$;
(ii) $n\log^{-2/\delta_v}T\,\|\widehat\Gamma^{v*}-\Gamma^{v\,\mathrm{OLS}}\|=O_p(1)$.

Proof. Throughout we consider generic values of the parameters such that $\underline{\varphi}_n\in\mathcal O_n$, where $\mathcal O_n=\{\mathcal O^n_{\lambda_i}\cap\mathcal E_{\Lambda_n}\}\times\{\mathcal O^n_{\sigma_i^2}\cap\mathcal E_{\Gamma_n^\xi}\}\times\mathcal O_A\times\mathcal O_{\Gamma^v}$ as defined in Section 4.3.4. Thus the elements of $\underline{\varphi}_n$ satisfy Assumptions 1(a), 1(d), 1(e), 2(a), 2(b), and 2(f).
The log-likelihood depends on $A$ and $\Gamma^v$ only through $\ell(F_T;\theta)$ and $\ell(F_T|X_{nT};\phi_n,\theta)$. Let us consider both log-likelihoods separately. First, since by Assumption 1(i) $F_0=0_r$, we have
\[
\ell(F_T;\theta)=-\frac T2\log\det(\Gamma^v)-\frac12\sum_{t=1}^T(F_t-AF_{t-1})'(\Gamma^v)^{-1}(F_t-AF_{t-1}),
\]
which is clearly maximized by $\theta^{\mathrm{OLS}}$. Second (see also (E.34), (E.35), and (E.41) in the proof of Lemma E.9),
\[
\ell(F_T|X_{nT};\phi_n,\theta)=-\frac12\sum_{t=1}^T\log\det(P_{0,t|T})-\frac12\sum_{t=1}^T(F_t-F_{0,t|T})'(P_{0,t|T})^{-1}(F_t-F_{0,t|T}).
\]
Now, by (E.47), (E.52), and (E.53) in the proof of Lemma E.9, we have that
\[
\sup_{\underline{\varphi}_n\in\mathcal O_n}\bigl|\ell(F_T|X_{nT};\phi_n,\theta)-\ell_0(F_T|X_{nT};\phi_n)\bigr|=O_p(n^{-1}\log^{2/\delta_v}T).
\tag{E.67}
\]
The proof of parts (i) and (ii) follows from (E.67), by continuity of the log-likelihood, and since $\ell_0(F_T|X_{nT};\phi_n)$ does not depend on $\theta$. This completes the proof. □

Lemma E.13. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $\min(n\log^{-2/\delta_v}T,\sqrt T)\,\|\widehat\lambda_i^{*}-\lambda_i\|=O_p(1)$, uniformly in $i$;
(ii) $\min(n\log^{-2/\delta_v}T,\sqrt T)\,n^{-1/2}\|\widehat\Lambda_n^{*}-\Lambda_n\|=O_p(1)$;
(iii) $\min(n\log^{-2/\delta_v}T,\sqrt T)\,|\widehat\sigma_i^{2*}-\sigma_i^2|=O_p(1)$, uniformly in $i$;
(iv) $\min(n\log^{-2/\delta_v}T,\sqrt T)\,\|\widehat A^{*}-A\|=O_p(1)$;
(v) $\min(n\log^{-2/\delta_v}T,\sqrt T)\,\|\widehat\Gamma^{v*}-\Gamma^v\|=O_p(1)$.

Proof. The proof follows directly from Lemmas D.19, E.11, and E.12. □

Lemma E.14. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $\min(\sqrt T\log^{-1/2}n,\,n\log^{-2/\delta_v}T)\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\lambda_i\|=O_p(1)$;

(ii) $\min(\sqrt T\log^{-1/2}n,\,n\log^{-2/\delta_v}T)\max_{i=1,\ldots,n}|\widehat\sigma_i^{2*}-\sigma_i^2|=O_p(1)$;
(iii) $\min(\sqrt T\log^{-1/2}n,\,n\log^{-2/\delta_v}T)\,\|\widehat\Sigma_n^{\xi*}-\Sigma_n^\xi\|=O_p(1)$;
(iv) $\|(\widehat\Sigma_n^{\xi*})^{-1}\|=O_p(1)$;
(v) $\min(\sqrt T\log^{-1/2}n,\,n\log^{-2/\delta_v}T)\,\|(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\|=O_p(1)$.

Proof. For part (i) consider
\[
\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\lambda_i\|\le\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\widehat\lambda_i^{(0)}\|+\max_{i=1,\ldots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\|.
\tag{E.68}
\]
First, note that (E.60) in the proof of Lemma E.11 holds for all $i$; this is seen from the proof of Barigozzi (2023, Theorem 3). Thus, jointly with Lemma E.10(i),
\[
\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\widehat\lambda_i^{(0)}\|\le\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\widehat\lambda_i^{\dagger}\|+\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\widehat\lambda_i^{(0)}\|=O_p(n^{-1}\log^{2/\delta_v}T).
\tag{E.69}
\]
Second, from Barigozzi (2023, equation (A.5) in the proof of Theorem 1 in the supplementary material), when imposing Assumption 6(b),
\[
\begin{aligned}
\widehat\lambda_i^{(0)}-\lambda_i={}&\Bigl\{(nT)^{-1}\lambda_i'\sum_{t=1}^T\sum_{j=1}^nF_t\xi_{jt}\lambda_j'\Bigr\}n(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}
+\Bigl\{T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr\}(\Lambda_n'\Lambda_n)(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\\
&+\Bigl\{(nT)^{-1}\sum_{t=1}^T\sum_{j=1}^n\xi_{it}\xi_{jt}\lambda_j'\Bigr\}n(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\\
&+\Bigl\{(nT)^{-1}\lambda_i'\sum_{t=1}^T\sum_{j=1}^nF_t\xi_{jt}(\widehat\lambda_j^{(0)}-\lambda_j)'\Bigr\}n(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\\
&+\Bigl\{T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr\}\Lambda_n'(\widehat\Lambda_n^{(0)}-\Lambda_n)(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\\
&+\Bigl\{(nT)^{-1}\sum_{t=1}^T\sum_{j=1}^n\xi_{it}\xi_{jt}(\widehat\lambda_j^{(0)}-\lambda_j)'\Bigr\}n(\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\\
={}&1.a+1.b+1.c+1.d+1.e+1.f,\ \text{say},
\end{aligned}
\]
where
\[
\begin{aligned}
\max_{i=1,\ldots,n}\|1.a\|&=O_p(n^{-1/2}T^{-1/2}),\\
\max_{i=1,\ldots,n}\|1.b\|&=O_p(T^{-1/2}\sqrt{\log n}),\\
\max_{i=1,\ldots,n}\|1.c\|&=O_p(\max(n^{-1},n^{-1/2}T^{-1/2}\sqrt{\log n})),
\end{aligned}
\tag{E.70}
\]
which follows by using
• for 1.a Assumption 1(a), Lemma D.1(ii), and Barigozzi (2023, Proposition B.3(a));
• for 1.b Assumption 1(a), Lemma D.1(ii), and Lemma E.1(ii) and the union bound;
• for 1.c Assumption 1(a), Lemmas D.1(ii) and E.1(ii) and the union bound, and Barigozzi (2023, Proposition B.3(c)), where we also used the fact that $\max_{i=1,\ldots,n}|(nT)^{-1}\sum_{t=1}^T\sum_{j=1}^nE[\xi_{it}\xi_{jt}]|=O(n^{-1})$ by Assumption 2(b);
while 1.d, 1.e, and 1.f are dominated by 1.a, 1.b, and 1.c, respectively, because of Lemma D.1(i). It follows that
\[
\max_{i=1,\ldots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\|=O_p(\max(n^{-1},T^{-1/2}\sqrt{\log n})).
\tag{E.71}
\]
By substituting (E.69) and (E.71) into (E.68), we have
\[
\max_{i=1,\ldots,n}\|\widehat\lambda_i^{*}-\lambda_i\|=O_p(\max(n^{-1},T^{-1/2}\sqrt{\log n}))+O_p(n^{-1}\log^{2/\delta_v}T).
\]
This proves part (i).

For part (ii), first, consider
\[
\begin{aligned}
\max_{i=1,\ldots,n}|\widehat\sigma_i^{2*}-\sigma_i^2|\le{}&\max_{i=1,\ldots,n}|\widehat\sigma_i^{2*}-\widehat\sigma_i^{2\dagger}|+\max_{i=1,\ldots,n}|\widehat\sigma_i^{2\dagger}-\widehat\sigma_i^{2\ddagger}|\\
&+\max_{i=1,\ldots,n}|\widehat\sigma_i^{2\ddagger}-\sigma_i^{2\,\mathrm{OLS}}|+\max_{i=1,\ldots,n}|\sigma_i^{2\,\mathrm{OLS}}-\sigma_i^2|.
\end{aligned}
\tag{E.72}
\]
From (E.65) in the proof of Lemma E.11
\[
\max_{i=1,\ldots,n}|\widehat\sigma_i^{2\dagger}-\widehat\sigma_i^{2\ddagger}|=O_p(\max(n^{-1},T^{-1}\sqrt{\log n},n^{-1/2}T^{-1/2}\sqrt{\log n})).
\tag{E.73}
\]
Indeed from Bai and Li (2016, equation (S.25) in the online supplement) we see that (E.73) is decomposed into the sum of 13 terms, and all depend on $i$ only through $\lambda_i$, which is such that $\max_{i=1,\ldots,n}\|\lambda_i\|\le M_\lambda$ by Assumption 1(a), with the exceptions of the following terms
\[
\begin{aligned}
A_1&=\max_{i=1,\ldots,n}\Bigl|(\widehat\lambda_i^{\dagger}-\lambda_i)'\Bigl\{T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\}(\widehat\lambda_i^{\dagger}-\lambda_i)\Bigr|=O_p(\max(T^{-1}\log n,\,n^{-2}\log^{4/\delta_v}T)),\\
A_2&=\max_{i=1,\ldots,n}2\Bigl|\lambda_i'\widehat\Lambda_n^{\dagger\prime}(\widehat\Sigma_n^{\xi\dagger})^{-1}(\widehat\Lambda_n^{\dagger}-\Lambda_n)\,T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr|=O_p(n^{-1/2}T^{-1/2}\sqrt{\log n}),\\
A_3&=\max_{i=1,\ldots,n}2\Bigl|\lambda_i'\widehat\Lambda_n^{\dagger\prime}(\widehat\Sigma_n^{\xi\dagger})^{-1}\Bigl\{T^{-1}\sum_{t=1}^T\xi_{nt}\Bigr\}\Bigl\{T^{-1}\sum_{t=1}^T\xi_{it}\Bigr\}\Bigr|=O_p(T^{-1}\sqrt{\log n}),
\end{aligned}
\]
where we used:
• for $A_1$ part (i) and Lemmas E.10(i) and C.12(i) jointly with Assumption 6(b);
• for $A_2$ Lemma E.1(ii) and the union bound, plus Bai and Li (2012, Lemma B.4 and Corollary A.1 in the online supplement);
• for $A_3$ the same arguments used for $A_2$ plus Bai and Li (2016, Lemma S.10 in the online supplement).

Second, notice that, by Lemmas C.12(i) and D.1(ii), term 1.b in (E.70) is such that
\[
1.b=(\lambda_i^{\mathrm{OLS}}-\lambda_i)+o_p(T^{-1/2}\sqrt{\log n}).
\tag{E.74}
\]
Thus, from the above arguments and (E.74),
\[
\max_{i=1,\ldots,n}\|\widehat\lambda_i^{(0)}-\lambda_i^{\mathrm{OLS}}\|=O_p(\max(n^{-1},n^{-1/2}T^{-1/2}\sqrt{\log n})),
\tag{E.75}
\]
and, therefore, from (E.69) and (E.75)
\[
\begin{aligned}
\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|&\le\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\widehat\lambda_i^{(0)}\|+\max_{i=1,\ldots,n}\|\widehat\lambda_i^{(0)}-\lambda_i^{\mathrm{OLS}}\|\\
&=O_p(\max(n^{-1}\log^{2/\delta_v}T,\,n^{-1/2}T^{-1/2}\sqrt{\log n})).
\end{aligned}
\tag{E.76}
\]
Then, from (E.66) in the proof of Lemma E.11
\[
\begin{aligned}
\max_{i=1,\ldots,n}|\widehat\sigma_i^{2\ddagger}-\sigma_i^{2\,\mathrm{OLS}}|\le{}&\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|^2\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\|\\
&+2\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\|\max_{i=1,\ldots,n}\|\lambda_i^{\mathrm{OLS}}\|\\
&+2\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'\Bigr\|\max_{i=1,\ldots,n}\|\lambda_i\|\\
&+2\max_{i=1,\ldots,n}\|\widehat\lambda_i^{\dagger}-\lambda_i^{\mathrm{OLS}}\|\max_{i=1,\ldots,n}\Bigl\|T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr\|\\
={}&O_p(\max(n^{-1}\log^{2/\delta_v}T,\,n^{-1/2}T^{-1/2}\sqrt{\log n})),
\end{aligned}
\tag{E.77}
\]

by (E.70), (E.76), Assumption 1(a), Lemma C.12(i) combined with Assumption 6(b), and since $\max_{i=1,\ldots,n}\|\lambda_i^{\mathrm{OLS}}\|\le\max_{i=1,\ldots,n}\|\lambda_i^{\mathrm{OLS}}-\lambda_i\|+\max_{i=1,\ldots,n}\|\lambda_i\|=O_p(T^{-1/2}\sqrt{\log n})+O(1)$ again by (E.70) and Assumption 1(a).

Finally,
\[
\begin{aligned}
\max_{i=1,\ldots,n}|\sigma_i^{2\,\mathrm{OLS}}-\sigma_i^2|\le{}&\max_{i=1,\ldots,n}\Bigl|T^{-1}\sum_{t=1}^Tx_{it}^2-E[x_{it}^2]\Bigr|+\max_{i=1,\ldots,n}\Bigl|\lambda_i^{\mathrm{OLS}\prime}T^{-1}\sum_{t=1}^TF_tF_t'\lambda_i^{\mathrm{OLS}}-\lambda_i'\lambda_i\Bigr|\\
&+2\max_{i=1,\ldots,n}\Bigl\|\lambda_i^{\mathrm{OLS}\prime}T^{-1}\sum_{t=1}^TF_tF_t'-\lambda_i'\Bigr\|\max_{i=1,\ldots,n}\|\lambda_i\|\\
&+2\max_{i=1,\ldots,n}\|\lambda_i^{\mathrm{OLS}}\|\max_{i=1,\ldots,n}\Bigl\|T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr\|\\
={}&O_p(T^{-1/2}\sqrt{\log n}).
\end{aligned}
\tag{E.78}
\]
Indeed, for the first term on the rhs of (E.78) we have
\[
\begin{aligned}
\max_{i=1,\ldots,n}\Bigl|T^{-1}\sum_{t=1}^Tx_{it}^2-E[x_{it}^2]\Bigr|\le{}&\max_{i=1,\ldots,n}\|\lambda_i\|^2\Bigl\|T^{-1}\sum_{t=1}^TF_tF_t'-I_r\Bigr\|+\max_{i=1,\ldots,n}\Bigl|T^{-1}\sum_{t=1}^T\xi_{it}^2-E[\xi_{it}^2]\Bigr|\\
&+2\max_{i=1,\ldots,n}\|\lambda_i\|\max_{i=1,\ldots,n}\Bigl\|T^{-1}\sum_{t=1}^TF_t\xi_{it}\Bigr\|\\
={}&O_p(T^{-1/2}\sqrt{\log n}),
\end{aligned}
\tag{E.79}
\]
by Assumption 1(a), Lemma C.12(i) for the first term, Barigozzi et al. (2018, Lemma 3.ii) and Fan et al. (2011, Lemmas A3 and B1), which we can apply because of Assumption 5(b) for the second term, and Lemma E.1(ii) and the union bound for the third term.
For all other terms on the rhs of (E.78) we just need to use Assumption 1(a), Lemma C.12(i) and the fact that $\max_{i=1,\ldots,n}\|\lambda_i^{\mathrm{OLS}}\|\le\max_{i=1,\ldots,n}\|\lambda_i^{\mathrm{OLS}}-\lambda_i\|+\max_{i=1,\ldots,n}\|\lambda_i\|=O_p(T^{-1/2}\sqrt{\log n})+O(1)$, again by (E.70) and Assumption 1(a).

By using Lemma E.10(iii), (E.73), (E.77), and (E.78) into (E.72) we have
\[
\begin{aligned}
\max_{i=1,\ldots,n}|\widehat\sigma_i^{2*}-\sigma_i^2|={}&O_p(n^{-1}\log^{2/\delta_v}T)+O_p(\max(n^{-1},T^{-1}\sqrt{\log n},n^{-1/2}T^{-1/2}\sqrt{\log n}))\\
&+O_p(\max(n^{-1}\log^{2/\delta_v}T,\,n^{-1/2}T^{-1/2}\sqrt{\log n}))+O_p(T^{-1/2}\sqrt{\log n})\\
={}&O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})),
\end{aligned}
\]
which proves part (ii).

Part (iii) immediately follows from part (ii), indeed
\[
\|\widehat\Sigma_n^{\xi*}-\Sigma_n^\xi\|\le\max_{i=1,\ldots,n}|\widehat\sigma_i^{2*}-\sigma_i^2|=O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})).
\]
For part (iv) we have
\[
\|(\widehat\Sigma_n^{\xi*})^{-1}\|=\Bigl\{\min_{i=1,\ldots,n}\widehat\sigma_i^{2*}\Bigr\}^{-1}\le C_\xi+O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})),
\]
because of part (ii) and Assumption 2(a). To conclude, for part (v) we have
\[
\|(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\|\le\|(\widehat\Sigma_n^{\xi*})^{-1}\|\,\|\widehat\Sigma_n^{\xi*}-\Sigma_n^\xi\|\,\|(\Sigma_n^\xi)^{-1}\|=O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})),
\]
by parts (iii), (iv), and Assumption 2(a). This completes the proof. □

Lemma E.15. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\min(n\log^{-2/\delta_v}T,\sqrt T\log^{-1/2}n)\,n^{-1}\|\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*}-\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n\|=O_p(1)$;
(ii) $\min(n\log^{-2/\delta_v}T,\sqrt T\log^{-1/2}n)\,n^{-1/2}\|\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}-\Lambda_n'(\Sigma_n^\xi)^{-1}\|=O_p(1)$;
(iii) $n\|(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}\|=O_p(1)$;

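(iv) $\min(n\log^{-2/\delta_v}T,\sqrt T\log^{-1/2}n)\,n\|(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}-(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)^{-1}\|=O_p(1)$;
(v) $\omega_{n,T,\delta_v}\sqrt n\,\|(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}-(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^\xi)^{-1}\|=O_p(1)$, with $\omega_{n,T,\delta_v}=\min(n\log^{-2/\delta_v}T,\sqrt T\log^{-1/2}n)$.

Proof. For part (i) we have
\[
\begin{aligned}
n^{-1}\|\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*}-\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n\|
\le{}&2n^{-1}\|\{\widehat\Lambda_n^{*}-\Lambda_n\}'(\Sigma_n^\xi)^{-1}\Lambda_n\|\\
&+n^{-1}\|\Lambda_n'\{(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\}\Lambda_n\|\\
&+2n^{-1}\|\{\widehat\Lambda_n^{*}-\Lambda_n\}'\{(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\}\Lambda_n\|\\
&+n^{-1}\|\{\widehat\Lambda_n^{*}-\Lambda_n\}'\{(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\}\{\widehat\Lambda_n^{*}-\Lambda_n\}\|\\
\le{}&2n^{-1/2}\|\widehat\Lambda_n^{*}-\Lambda_n\|\,\|(\Sigma_n^\xi)^{-1}\|\,n^{-1/2}\|\Lambda_n\|\\
&+\|(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\|\,n^{-1}\|\Lambda_n\|^2\\
&+2n^{-1/2}\|\widehat\Lambda_n^{*}-\Lambda_n\|\,\|(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\|\,n^{-1/2}\|\Lambda_n\|\\
&+n^{-1}\|\widehat\Lambda_n^{*}-\Lambda_n\|^2\,\|(\widehat\Sigma_n^{\xi*})^{-1}-(\Sigma_n^\xi)^{-1}\|\\
={}&O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})),
\end{aligned}
\tag{E.80}
\]
by Assumptions 1(a), 2(a), and Lemmas E.13(ii) and E.14(v). This proves part (i).

Part (ii) is proved in the same way as part (i).

For part (iii), by part (ii) and Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality, we have
\[
\begin{aligned}
n^{-1}\bigl|\nu^{(r)}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})-\nu^{(r)}(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)\bigr|
&\le n^{-1}\|\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*}-\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n\|\\
&=O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})).
\end{aligned}
\tag{E.81}
\]
Moreover (note that $x-y\ge-|x-y|$ for any $x,y\in\mathbb R$),
\[
\begin{aligned}
\det(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})&=\prod_{j=1}^r\nu^{(j)}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})\ge\bigl\{\nu^{(r)}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})\bigr\}^r\\
&\ge\bigl\{\nu^{(r)}(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)-\bigl|\nu^{(r)}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})-\nu^{(r)}(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)\bigr|\bigr\}^r,
\end{aligned}
\]
thus, by Lemma C.3(iv), which implies $\lim_{n\to\infty}n^{-1}\nu^{(r)}(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)>0$, and (E.81) it follows that, with probability tending to one as $n,T\to\infty$, we have $\det(n^{-1}\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})>0$, or, equivalently, $n^{-1}\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*}$ is positive definite, i.e., $n\|(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}\|=O_p(1)$. This proves part (iii).

For part (iv), we have
\[
\begin{aligned}
&n\|(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}-(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)^{-1}\|\\
&\quad\le n\|(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}\|\;n^{-1}\|\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*}-\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n\|\;n\|(\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n)^{-1}\|\\
&\quad=O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})),
\end{aligned}
\]
because of parts (i) and (iii) and Lemma C.3(iii). Part (v) follows directly from parts (ii) and (iv). This completes the proof. □

Lemma E.16. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\max_{t=1,\ldots,T}\|P^{*}_{t|t-1}\|=O_p(1)$;
(ii) $\max_{t=1,\ldots,T}\|(P^{*}_{t|t-1})^{-1}\|=O_p(1)$;
(iii) $\max_{t=1,\ldots,T}n\|P^{*}_{t|t}\|=O_p(1)$;
(iv) $\max_{t=1,\ldots,T}n\|P^{*}_{t|T}\|=O_p(1)$.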

Proof. For part (i),
\[
\begin{aligned}
\max_{t=1,\ldots,T}\|P^{*}_{t|t-1}\|&\le\max_{t=1,\ldots,T}\|P_{t|t-1}\|+\max_{t=1,\ldots,T}\|P^{*}_{t|t-1}-P_{t|t-1}\|\\
&=O_p(1)+O_p(\max(n^{-1}\log^{2/\delta_v}T,\,T^{-1/2}\sqrt{\log n})),
\end{aligned}
\]
by Lemma D.7(i) and since the second term on the rhs depends only on the estimation error of $\widehat A^{*}$, $\widehat\Gamma^{v*}$, $n^{-1/2}\widehat\Lambda_n^{*}$, $n^{-1/2}\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}$ and $n^{-1}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}$, which are all bounded by Lemmas E.13(ii), E.13(iv), E.13(v), E.15(ii), and E.15(iv). This proves part (i).

Part (ii) is proved in the same way as part (i) but using Lemma D.7(ii).

For part (iii), from (A.4), using the same steps leading to (D.23) in the proof of Lemma D.11 but when using as parameters $\widehat\varphi_n^{*}$, we have
\[
\begin{aligned}
P^{*}_{t|t}={}&P^{*}_{t|t-1}-P^{*}_{t|t-1}\widehat\Lambda_n^{*\prime}(\widehat\Lambda_n^{*}P^{*}_{t|t-1}\widehat\Lambda_n^{*\prime}+\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*}P^{*}_{t|t-1}\\
={}&(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}\\
&-P^{*}_{t|t-1}\bigl((\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}+P^{*}_{t|t-1}\bigr)^{-1}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}(P^{*}_{t|t-1})^{-1}(\widehat\Lambda_n^{*\prime}(\widehat\Sigma_n^{\xi*})^{-1}\widehat\Lambda_n^{*})^{-1}.
\end{aligned}
\tag{E.82}
\]
Notice that all inverses in (E.82) are well defined because of part (ii) and Lemmas E.15(iii) and E.14(iv).
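The equality of the two lines of (E.82) can be checked numerically in a small case. The sketch below is a hedged illustration rather than the authors' computation: it assumes a single factor ($r=1$) and a diagonal idiosyncratic covariance, `p` stands in for $P^{*}_{t|t-1}$, and the helpers `solve` and `filtered_variance_two_ways` are hypothetical names. The first line of (E.82) is evaluated by solving the $n\times n$ system directly; the second uses only $m=\Lambda_n'(\Sigma_n^\xi)^{-1}\Lambda_n$.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            fac = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= fac * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def filtered_variance_two_ways(lam, sig, p):
    """Scalar (r = 1) filtered variance from both lines of (E.82).

    lam: loadings, sig: diagonal of Sigma (> 0), p: predicted variance (> 0).
    Returns (first, second, m). Illustrative sketch only.
    """
    n = len(lam)
    # First line: P - P Lambda'(Lambda P Lambda' + Sigma)^{-1} Lambda P
    cov = [[p * lam[i] * lam[j] + (sig[i] if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    quad = sum(l * z for l, z in zip(lam, solve(cov, lam)))
    first = p - p * quad * p
    # Second line, written in terms of m = Lambda' Sigma^{-1} Lambda
    m = sum(l * l / s for l, s in zip(lam, sig))
    second = 1.0 / m - p * (1.0 / (1.0 / m + p)) * (1.0 / m) * (1.0 / p) * (1.0 / m)
    return first, second, m


if __name__ == "__main__":
    first, second, m = filtered_variance_two_ways([1.0, 2.0, 0.5], [1.0, 4.0, 0.25], 0.7)
    # The two values agree up to rounding; their distance from 1/m is 1/(m(1+pm)).
    print(first, second, 1.0 / m - second)
```

In this scalar case the correction term equals $1/(m(1+pm))$, which is of order $n^{-2}$ when $m$ grows like $n$ and $p$ stays bounded away from zero and infinity, the same mechanism behind the bound that follows from (E.82).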
Therefore, from (E.82) max n∥P∗ t|t ∥≤n∥(Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )−1∥ t=1,...,T + max ∥P∗ t|t−1 ∥ max ∥(P∗ t|t−1 )−1∥n∥(Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )−1∥2 t=1,...,T t=1,...,T ·∥((Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )−1+P∗ t|t−1 )−1∥ =O (1)+O (n−1), p p because of parts (i) and (ii), Lemma E.15(iii), and since, by Merikoski and Kumar (2004, Theorem 1) which is Weyl’s inequality, (cid:110) (cid:111)−1 ∥((Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )−1+P∗ t|t−1 )−1∥= ν(r)((Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )−1+P∗ t|t−1 ) (cid:110) (cid:111)−1 ≤ ν(r)((Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )−1)+ν(r)(P∗ t|t−1 ) (cid:26)(cid:104) (cid:105)−1 (cid:27)−1 = ν(1)(Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n ) +ν(r)(P∗ t|t−1 ) (cid:26)(cid:104) (cid:105)−1 (cid:27)−1(cid:110) (cid:111)−1 = ν(1)(Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )ν(r)(P∗ t|t−1 ) +1 ν(r)(P∗ t|t−1 ) (cid:26) (cid:104) (cid:105)−1 (cid:27)(cid:110) (cid:111)−1 = 1− ν(1)(Λ(cid:98) ∗ n ′(Σ(cid:98) ξ n ∗)−1Λ(cid:98) ∗ n )ν(r)(P∗ t|t−1 ) ν(r)(P∗ t|t−1 ) +O p (n−2) =O (1), p again by parts (i) and (ii) and Lemma E.15(iii). This proves part (iii). For part (iv), from (A.7), we get ∥P∗ t|T −P∗ t|t ∥≤∥P∗ t|t ∥2∥A(cid:98) ∗∥2∥(P∗ t+1|t )−1∥2{∥P∗ t+1|T ∥+∥P∗ t+1|t ∥}. (E.83) Start with t=T −1, then from (E.83), ∥P∗ T−1|T −P∗ T−1|T−1 ∥≤∥P∗ T−1|T−1 ∥2∥A(cid:98) ∗∥2∥(P∗ T|T−1 )−1∥2{∥P∗ T|T ∥+∥P∗ T|T−1 ∥} =O (n−2). (E.84) p by parts (i), (ii), and (iii), and since ∥A(cid:98)∗∥≤∥A∥+∥A(cid:98)∗−A∥=O p (1), by Assumption 1(d) and Lemma E.13(iv). From Page 64

(E.84) it follows that
\[
\|P^*_{T-1|T}\| \le \|P^*_{T-1|T-1}\|+\|P^*_{T-1|T}-P^*_{T-1|T-1}\| = O_p(n^{-1})+O_p(n^{-2}). \tag{E.85}
\]
Thus, at $t=T-2$, from (E.83) and (E.85),
\[
\|P^*_{T-2|T}-P^*_{T-2|T-2}\| \le \|P^*_{T-2|T-2}\|^2\,\|\widehat{A}^*\|^2\,\|(P^*_{T-1|T-2})^{-1}\|^2\,\big\{\|P^*_{T-1|T}\|+\|P^*_{T-1|T-2}\|\big\} = O_p(n^{-2}). \tag{E.86}
\]
From (E.86) it follows that
\[
\|P^*_{T-2|T}\| \le \|P^*_{T-2|T-2}\|+\|P^*_{T-2|T}-P^*_{T-2|T-2}\| = O_p(n^{-1})+O_p(n^{-2}). \tag{E.87}
\]
Since all the bounds in (E.84)-(E.87) are the same for all $t$, from part (i) and (E.83) we have
\[
\max_{t=1,\dots,T} n\|P^*_{t|T}\| \le \max_{t=1,\dots,T} n\|P^*_{t|t}\| + \max_{t=1,\dots,T} n\|P^*_{t|T}-P^*_{t|t}\| = O_p(1)+O_p(n^{-1}).
\]
This proves part (iv) and completes the proof. □

Lemma E.17. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) for all $s=0,\dots,T$, $\|F^*_{t|s}\|=O_p(1)$, uniformly in $t\le s$;
(ii) $n\|F^*_{t|T}-F^*_{t|t}\|=O_p(1)$, uniformly in $t$;
(iii) $n\|F^*_{t|t}-\widehat{F}_t^{\mathrm{WLS}*}\|=O_p(1)$, uniformly in $t$;
where $\widehat{F}_t^{\mathrm{WLS}*}=\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}x_{nt}$.

Proof. The proof of part (i) follows the same steps as the proof of Lemma D.14 but when using Lemmas E.13, E.14, and E.15, instead of Lemmas D.1, D.3, D.4, and D.5.

For part (ii), from (A.6) and (A.1)
\[
\begin{aligned}
\|F^*_{t|T}-F^*_{t|t}\| &\le \|P^*_{t|t}\|\,\|\widehat{A}^*\|\,\|(P^*_{t+1|t})^{-1}\|\,\big\{\|F^*_{t+1|T}\|+\|F^*_{t+1|t}\|\big\} \\
&\le \|P^*_{t|t}\|\,\|\widehat{A}^*\|\,\|(P^*_{t+1|t})^{-1}\|\,\big\{\|F^*_{t+1|T}\|+\|\widehat{A}^*\|\,\|F^*_{t|t}\|\big\} \\
&= O_p(n^{-1}),
\end{aligned}
\]
by part (i) (when $s=T$ and $s=t$), Lemmas E.16(ii) and E.16(iii), and since $\|\widehat{A}^*\|\le\|A\|+\|\widehat{A}^*-A\|=O_p(1)$, by Assumption 1(d) and Lemma E.13(iv). This proves part (ii).
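As an aside, the orders in Lemma E.16(iii) and Lemma E.17(iii) can be illustrated numerically. The sketch below is not part of the proof: it uses simulated, hypothetical inputs (Gaussian loadings, a diagonal idiosyncratic covariance, $P_{t|t-1}=I_r$, and the function name `one_step_update`), performs a single Kalman update, and compares it with the information form of the covariance and with the WLS factor estimate.

```python
import numpy as np


def one_step_update(n, r=2, seed=0):
    """Compare one Kalman update with the WLS factor estimate.

    All inputs are simulated and hypothetical: Gaussian loadings, diagonal
    idiosyncratic covariance, and P_{t|t-1} set to the identity.
    """
    rng = np.random.default_rng(seed)
    Lam = rng.normal(size=(n, r))            # stand-in for the loadings
    sig2 = rng.uniform(0.5, 1.5, size=n)     # diagonal idiosyncratic variances
    P_pred = np.eye(r)                       # stand-in for P_{t|t-1}
    F_pred = np.zeros(r)                     # stand-in for F_{t|t-1}
    F_true = np.ones(r)
    x = Lam @ F_true + np.sqrt(sig2) * rng.normal(size=n)

    # Covariance form of the update, as in the first line of (E.82).
    S = Lam @ P_pred @ Lam.T + np.diag(sig2)
    K = P_pred @ Lam.T @ np.linalg.inv(S)
    F_upd = F_pred + K @ (x - Lam @ F_pred)
    P_upd = P_pred - K @ Lam @ P_pred

    # Information form: (Lam' Sigma^{-1} Lam + P_pred^{-1})^{-1}.
    M = Lam.T @ (Lam / sig2[:, None])
    P_info = np.linalg.inv(M + np.linalg.inv(P_pred))

    # Weighted least squares estimate of the factors.
    F_wls = np.linalg.solve(M, Lam.T @ (x / sig2))
    return P_upd, P_info, F_upd, F_wls


for n in (100, 1000):
    P_upd, P_info, F_upd, F_wls = one_step_update(n)
    # Both scaled quantities should stay bounded as n grows.
    print(n, n * np.linalg.norm(P_upd, 2), n * np.linalg.norm(F_upd - F_wls))
```

Under these assumptions the two covariance forms agree up to rounding, and both $n\|P_{t|t}\|$ and $n\|F_{t|t}-\widehat{F}_t^{\mathrm{WLS}}\|$ remain of order one when $n$ increases tenfold, in line with the rates stated in the lemmas.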
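For part (iii), from (A.3) and (A.1), by using Lemma D.13 (see also (D.46) in the proof of Lemma D.16 for more details)
\[
\begin{aligned}
F^*_{t|t} &= F^*_{t|t-1}+P^*_{t|t-1}\widehat{\Lambda}_n^{*\prime}\big(\widehat{\Lambda}_n^{*}P^*_{t|t-1}\widehat{\Lambda}_n^{*\prime}+\widehat{\Sigma}_n^{\xi*}\big)^{-1}\big(x_{nt}-\widehat{\Lambda}_n^{*}F^*_{t|t-1}\big) \\
&= \big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}x_{nt} \\
&\quad + \Big\{\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}+(P^*_{t|t-1})^{-1}\big)^{-1}-\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\Big\}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}x_{nt} \\
&\quad + \Big\{I_r-\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}+(P^*_{t|t-1})^{-1}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\Big\}\widehat{A}^*F^*_{t-1|t-1}.
\end{aligned}
\tag{E.88}
\]
Notice that the inverses in (E.88) are all well defined by Lemmas E.14(iv), E.15(iii), and E.16(ii). Now, by Lemma C.5
\[
\Big\|\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}+(P^*_{t|t-1})^{-1}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}-I_r\Big\| = O_p(n^{-1}). \tag{E.89}
\]
Furthermore, by Lemmas C.6(iii) and E.15(iv)
\[
\Big\|\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}+(P^*_{t|t-1})^{-1}\big)^{-1}-\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\Big\| = O(n^{-2}), \tag{E.90}
\]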

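and by Lemmas C.3(vii) and E.15(ii),
\[
\|\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\| = O_p(\sqrt{n}). \tag{E.91}
\]
Indeed, we can apply Lemmas C.5 and C.6(iii), since $\|(P^*_{t|t-1})^{-1}\|=O_p(1)$ by Lemma E.16(ii), $\|(\widehat{\Sigma}_n^{\xi*})^{-1}\|=O_p(1)$ by Lemma E.14(iv), and, by Lemmas C.2 and E.13(ii) we have
\[
\begin{aligned}
n^{-1}\|\widehat{\Lambda}_n^{*\prime}\widehat{\Lambda}_n^{*}-\Lambda_n'\Lambda_n\| &\le 2n^{-1}\|\Lambda_n'(\widehat{\Lambda}_n^{*}-\Lambda_n)\|+n^{-1}\|(\widehat{\Lambda}_n^{*}-\Lambda_n)'(\widehat{\Lambda}_n^{*}-\Lambda_n)\| \\
&\le 2n^{-1/2}\|\Lambda_n\|\,n^{-1/2}\|\widehat{\Lambda}_n^{*}-\Lambda_n\|+n^{-1}\|\widehat{\Lambda}_n^{*}-\Lambda_n\|^2 \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;T^{-1/2})\big),
\end{aligned}
\]
which, by Weyl's inequality (Merikoski and Kumar, 2004, Theorem 1), implies
\[
n^{-1}\big|\nu^{(j)}(\widehat{\Lambda}_n^{*\prime}\widehat{\Lambda}_n^{*})-\nu^{(j)}(\Lambda_n'\Lambda_n)\big| \le n^{-1}\|\widehat{\Lambda}_n^{*\prime}\widehat{\Lambda}_n^{*}-\Lambda_n'\Lambda_n\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;T^{-1/2})\big),
\]
and, therefore, for $j=1,\dots,r$,
\[
\underline{C}_j \le \operatorname*{p-liminf}_{n,T\to\infty} n^{-1}\nu^{(j)}(\widehat{\Lambda}_n^{*\prime}\widehat{\Lambda}_n^{*}) \le \operatorname*{p-limsup}_{n,T\to\infty} n^{-1}\nu^{(j)}(\widehat{\Lambda}_n^{*\prime}\widehat{\Lambda}_n^{*}) \le \overline{C}_j.
\]
By using (E.89), (E.90), (E.91) into (E.88):
\[
\begin{aligned}
&\big\|F^*_{t|t}-\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}x_{nt}\big\| \\
&\quad\le n\Big\|\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}+(P^*_{t|t-1})^{-1}\big)^{-1}-\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\Big\|\;n^{-1/2}\|\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\|\;n^{-1/2}\|x_{nt}\| \\
&\qquad + \Big\|I_r-\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}+(P^*_{t|t-1})^{-1}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\Big\|\,\|\widehat{A}^*\|\,\|F^*_{t-1|t-1}\| \\
&\quad= O_p(n^{-1}),
\end{aligned}
\tag{E.92}
\]
by part (i) (when $s=t-1$), Lemma C.10, and since $\|\widehat{A}^*\|\le\|A\|+\|\widehat{A}^*-A\|=O_p(1)$, by Assumption 1(d) and Lemma E.13(i). This completes the proof. □

Lemma E.18. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$, for $s=t$ and $s=T$:
(i) $\|T^{-1}\sum_{t=1}^T F^*_{t|s}F_t'\|=O_p(1)$;
(ii) $\|T^{-1}\sum_{t=1}^T F^*_{t|s}F_t'\lambda_i\|=O_p(1)$, uniformly in $i$;
(iii) $\|T^{-1}\sum_{t=1}^T F^*_{t|s}\xi_{it}\|=O_p(1)$, uniformly in $i$.

Proof.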
First notice that, for all $k=t-T,\dots,t-1$,
\[
\Big\|n^{-1/2}T^{-1}\sum_{t=1}^T x_{n,t-k}F_t'\Big\| = O_p(1), \tag{E.93}
\]
by Lemma C.12. The proof of part (i) follows by iterating either forward or backwards since both $\|T^{-1}\sum_{t=1}^T F^*_{t|t}F_t'\|$ and $\|T^{-1}\sum_{t=1}^T F^*_{t|T}F_t'\|$ are functions of (E.93) because of Lemma E.17.

Part (ii) follows from part (i) and Assumption 1(a). Part (iii) follows by substituting $F_t$ with $\xi_{it}$ in (E.93) and then by applying Lemma C.12(ii). This completes the proof. □

Lemma E.19. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
\[
\min\big(n\log^{-2/\delta_v}T,\;\sqrt{nT},\;T\log^{-1/2}n\big)\,\Big\|T^{-1}\sum_{t=1}^T\big\{F^*_{t|T}F^{*\prime}_{t|T}+P^*_{t|T}\big\}-T^{-1}\sum_{t=1}^T F_tF_t'\Big\| = O_p(1).
\]

Proof. Start with
\[
\Big\|T^{-1}\sum_{t=1}^T F^*_{t|T}F^{*\prime}_{t|T}-T^{-1}\sum_{t=1}^T F_tF_t'\Big\| \le 2\Big\|T^{-1}\sum_{t=1}^T (F^*_{t|T}-F_t)F_t'\Big\| + \Big\|T^{-1}\sum_{t=1}^T (F^*_{t|T}-F_t)(F^*_{t|T}-F_t)'\Big\|, \tag{E.94}
\]

and notice that if the first term on the rhs is $o_p(1)$ then the second term is dominated by the first one. So let us consider the first term on the rhs of (E.94):
\[
\begin{aligned}
\Big\|T^{-1}\sum_{t=1}^T (F^*_{t|T}-F_t)F_t'\Big\| &\le \Big\|T^{-1}\sum_{t=1}^T (F^*_{t|T}-F^*_{t|t})F_t'\Big\| + \Big\|T^{-1}\sum_{t=1}^T (F^*_{t|t}-\widehat{F}_t^{\mathrm{WLS}*})F_t'\Big\| + \Big\|T^{-1}\sum_{t=1}^T (\widehat{F}_t^{\mathrm{WLS}*}-F_t)F_t'\Big\| \\
&= I^*+II^*+III^*, \text{ say.}
\end{aligned}
\tag{E.95}
\]
Let us consider each term in (E.95). First,
\[
\begin{aligned}
I^* &\le \max_{t=1,\dots,T}\|P^*_{t|t}\|\,\|\widehat{A}^*\|\,\max_{t=1,\dots,T}\|(P^*_{t+1|t})^{-1}\| \cdot\Big\{\Big\|T^{-1}\sum_{t=1}^T F^*_{t+1|T}F_t'\Big\|+\|\widehat{A}^*\|\,\Big\|T^{-1}\sum_{t=1}^T F^*_{t+1|t+1}F_t'\Big\|\Big\} \\
&= O_p(n^{-1}),
\end{aligned}
\tag{E.96}
\]
by Lemmas E.16(ii), E.16(iii), and E.18, and since $\|\widehat{A}^*\|\le\|A\|+\|\widehat{A}^*-A\|=O_p(1)$, by Assumption 1(d) and Lemma E.13(iv).
Second, from (E.88) and (E.92) in the proof of Lemma E.17
\[
\begin{aligned}
II^* &\le O_p(n^{-1})\Big\{\Big\|n^{-1/2}T^{-1}\sum_{t=1}^T x_{nt}F_t'\Big\|+\|\widehat{A}^*\|\,\Big\|T^{-1}\sum_{t=1}^T F^*_{t-1|t-1}F_t'\Big\|\Big\} \\
&\le O_p(n^{-1})\Big\{n^{-1/2}\|\Lambda_n\|\,\Big\|T^{-1}\sum_{t=1}^T F_tF_t'\Big\|+\Big\|n^{-1/2}T^{-1}\sum_{t=1}^T \xi_{nt}F_t'\Big\|+\|\widehat{A}^*\|\,\Big\|T^{-1}\sum_{t=1}^T F^*_{t-1|t-1}F_t'\Big\|\Big\} \\
&= O_p(n^{-1}),
\end{aligned}
\tag{E.97}
\]
because of Lemmas C.2, C.12(i), combined with Assumption 6(b), C.12(iii), and E.18 and since $\|\widehat{A}^*\|=O_p(1)$ by Assumption 1(d) and Lemma E.13(iv).

Finally, let us consider the last term in (E.95). First, notice that we can write
\[
\widehat{F}_t^{\mathrm{WLS}*}-F_t = \big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}(\Lambda_n-\widehat{\Lambda}_n^{*})F_t + \big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\xi_{nt},
\]

which implies
\[
\begin{aligned}
III^* &\le \Big\|T^{-1}\sum_{t=1}^T \big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}(\Lambda_n-\widehat{\Lambda}_n^{*})F_tF_t'\Big\| \\
&\quad + \Big\|T^{-1}\sum_{t=1}^T \big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\xi_{nt}F_t'\Big\| \\
&\le n\big\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\big\|\;n^{-1}\big\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat{\Lambda}_n^{*})\big\|\;\Big\|T^{-1}\sum_{t=1}^T F_tF_t'\Big\| \\
&\quad + \sqrt{n}\,\Big\|\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Big\| \cdot n^{-1/2}\|\Lambda_n-\widehat{\Lambda}_n^{*}\|\;\Big\|T^{-1}\sum_{t=1}^T F_tF_t'\Big\| \\
&\quad + n\big\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\big\|\;n^{-1}\Big\|T^{-1}\sum_{t=1}^T \Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}F_t'\Big\| \\
&\quad + \sqrt{n}\,\Big\|\big(\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}\widehat{\Lambda}_n^{*}\big)^{-1}\widehat{\Lambda}_n^{*\prime}(\widehat{\Sigma}_n^{\xi*})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Big\| \cdot n^{-1/2}\Big\|T^{-1}\sum_{t=1}^T \xi_{nt}F_t'\Big\| \\
&= III^*_a+III^*_b+III^*_c+III^*_d, \text{ say.}
\end{aligned}
\tag{E.98}
\]
Then,
\[
\begin{aligned}
III^*_a &\le n\big\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\big\|\;n^{-1}\big\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat{\Lambda}_n^{\mathrm{OLS}})\big\|\;\Big\|T^{-1}\sum_{t=1}^T F_tF_t'\Big\| \\
&\quad + n\big\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\big\|\;n^{-1/2}\|\Lambda_n\|\,\|(\Sigma_n^{\xi})^{-1}\|\;n^{-1/2}\|\Lambda_n^{\mathrm{OLS}}-\widehat{\Lambda}_n^{*}\|\;\Big\|T^{-1}\sum_{t=1}^T F_tF_t'\Big\| \\
&= O_p(n^{-1/2}T^{-1/2})+O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2},\;T^{-1})\big),
\end{aligned}
\tag{E.99}
\]
where we used: for the first term on the rhs Lemmas C.3(iii), C.12(i), combined with Assumption 6(b), and C.8(iv) since $n^{-1}\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat{\Lambda}_n^{\mathrm{OLS}})\| = n^{-1}T^{-1}\|\sum_{t=1}^T \Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}F_t'\|$, while for the second term on the rhs we used Assumption 2(a), and Lemmas C.2, C.3(iii), C.12(i), combined with Assumption 6(b), and E.11(ii). Moreover,
\[
III^*_b = O_p\big(\max(n^{-2}\log^{4/\delta_v}T,\;n^{-1}T^{-1/2}\log^{2/\delta_v}T\sqrt{\log n},\;T^{-1}\sqrt{\log n})\big), \tag{E.100}
\]
by Lemmas C.12(i), combined with Assumption 6(b), E.13(ii), and E.15(v),
\[
III^*_c = O_p(n^{-1/2}T^{-1/2}), \tag{E.101}
\]
by Lemmas C.3(iii) and C.8(iv), and
\[
III^*_d = O_p\big(\max(n^{-1}T^{-1/2}\log^{2/\delta_v}T,\;T^{-1}\sqrt{\log n})\big), \tag{E.102}
\]
by Lemmas C.12(iii) and E.15(v). By substituting (E.99), (E.100), (E.101), and (E.102) into (E.98) we have
\[
III^* = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2},\;T^{-1}\sqrt{\log n})\big). \tag{E.103}
\]
Combining (E.96), (E.97), and (E.103) we have
\[
\Big\|T^{-1}\sum_{t=1}^T (F^*_{t|T}-F_t)F_t'\Big\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2},\;T^{-1}\sqrt{\log n})\big), \tag{E.104}
\]

which once substituted into (E.94), jointly with Lemma E.16(iv) give
\[
\begin{aligned}
\Big\|T^{-1}\sum_{t=1}^T\big\{F^*_{t|T}F^{*\prime}_{t|T}+P^*_{t|T}\big\}-T^{-1}\sum_{t=1}^T F_tF_t'\Big\| &\le \Big\|T^{-1}\sum_{t=1}^T F^*_{t|T}F^{*\prime}_{t|T}-T^{-1}\sum_{t=1}^T F_tF_t'\Big\|+\max_{t=1,\dots,T}\|P^*_{t|T}\| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2},\;T^{-1}\sqrt{\log n})\big)+O_p(n^{-1}),
\end{aligned}
\]
which completes the proof. □

Lemma E.20. Under Assumptions 1, 2, and 6, the EM estimators of the parameters $\widehat{\varphi}_n\equiv\widehat{\varphi}_n^{(k+1)}$ defined in Section 3.2, exist and are unique, for any $k\ge 0$.

Proof. Let $\mathcal{O}_i=\mathcal{O}_{\lambda_i}\times\mathcal{O}_{\sigma_i^2}\subset\mathbb{R}^{r+1}$, and $\mathcal{O}_r=\mathcal{O}_A\times\mathcal{O}_{\Gamma^v}\subset\mathbb{R}^{r^2+r(r+1)/2}$ with $\mathcal{O}_{\lambda_i}$, $\mathcal{O}_{\sigma_i^2}$, $\mathcal{O}_A$, and $\mathcal{O}_{\Gamma^v}$ defined in Section 4.3.4. At any iteration $k\ge 0$ the M-step requires solving the $n+1$ finite dimensional maximizations:
\[
(\widehat{\lambda}_i^{(k+1)},\widehat{\sigma}_i^{2(k+1)}) = \arg\max_{(\lambda_i,\sigma_i^2)\in\mathcal{O}_i} E_{\widehat{\varphi}_n^{(k)}}\big[\ell_i(x_{i1}\dots x_{iT}|F_T;\lambda_i,\sigma_i^2)\big],\quad i=1,\dots,n, \tag{E.105}
\]
and
\[
(\widehat{A}^{(k+1)},\widehat{\Gamma}^{v(k+1)}) = \arg\max_{(A,\Gamma^v)\in\mathcal{O}_r} E_{\widehat{\varphi}_n^{(k)}}\big[\ell(F_T;A,\Gamma^v)\big], \tag{E.106}
\]
where
\[
\ell_i(x_{i1}\dots x_{iT}|F_T;\lambda_i,\sigma_i^2) = -\frac{T}{2}\log(\sigma_i^2)-\frac{1}{2}\sum_{t=1}^T\frac{(x_{it}-\lambda_i'F_t)^2}{\sigma_i^2}, \tag{E.107}
\]
and
\[
\ell(F_T;A,\Gamma^v) = -\frac{T}{2}\log\det(\Gamma^v)-\frac{1}{2}\sum_{t=1}^T (F_t-AF_{t-1})'(\Gamma^v)^{-1}(F_t-AF_{t-1}). \tag{E.108}
\]
Now, the log-likelihoods (E.107) and (E.108) to be maximized are continuous and differentiable in $\mathcal{O}_i$ and $\mathcal{O}_r$, which are compact sets by Assumptions 1(a), 1(d), 1(e), and 2(a). Moreover, the log-likelihoods are concave in their arguments, and (E.105) and (E.106) have closed form expressions given in (13)-(14) and (15)-(16), respectively. Last, notice that the true values of the parameters are fully identified by Assumption 6.
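Therefore, by, e.g., Gourieroux and Monfort (1995, Property 7.11 p. 182), $\widehat{\lambda}_i^{(k+1)}$ and $\widehat{\sigma}_i^{2(k+1)}$, $i=1,\dots,n$, and $\widehat{A}^{(k+1)}$ and $\widehat{\Gamma}^{v(k+1)}$ exist and are unique for any $k\ge 0$. In particular, this result holds for $k=k^*$, i.e., for the EM estimators, and for any $n\in\mathbb{N}$ since (E.105) can be solved separately for any $i$. This completes the proof. □

Lemma E.21. Under Assumptions 1, 2, 4, and 6, $\ell(X_{nT};\varphi_n)$ has a local maximum denoted as
\[
\widehat{\varphi}_n^{**} = \big(\widehat{\lambda}_1^{**\prime}\cdots\widehat{\lambda}_n^{**\prime}\;\;\widehat{\sigma}_1^{2**}\cdots\widehat{\sigma}_n^{2**}\;\;\mathrm{vec}(\widehat{A}^{**})'\;\;\mathrm{vech}(\widehat{\Gamma}^{v**})'\big),
\]
such that
(i) for all $i=1,\dots,n$, $\lim_{k\to\infty}\|\widehat{\lambda}_i^{(k)}-\widehat{\lambda}_i^{**}\|=0$;
(ii) for all $i=1,\dots,n$, $\lim_{k\to\infty}|\widehat{\sigma}_i^{2(k)}-\widehat{\sigma}_i^{2**}|=0$;
(iii) $\lim_{k\to\infty}\|\widehat{A}^{(k)}-\widehat{A}^{**}\|=0$;
(iv) $\lim_{k\to\infty}\|\widehat{\Gamma}^{v(k)}-\widehat{\Gamma}^{v**}\|=0$;
where all convergences are monotonic.

Proof. First notice that any maximum of $\ell(X_{nT};\varphi_n)$ is also a maximum of $\widetilde{\ell}(X_{nT};\varphi_n)\equiv(nT)^{-1}\ell(X_{nT};\varphi_n)$. Now, the starting point of the EM is such that $\widetilde{\ell}(X_{nT};\widehat{\varphi}_n^{(0)})>-\infty$ and since $\widetilde{\ell}(X_{nT};\varphi_n)$ is continuous and differentiable in the interior of $\mathcal{O}_n$, then $\{\widetilde{\ell}(X_{nT};\widehat{\varphi}_n^{(k)})\}_{k\ge 0}$ is bounded from above for any $n,T\in\mathbb{N}$.

Consider the definitions:
\[
\begin{aligned}
\widetilde{\ell}(X_{nT};\varphi_n) &= E_{\widehat{\varphi}_n^{(k)}}\big[\widetilde{\ell}(X_{nT}|F_T;\varphi_n)+\widetilde{\ell}(F_T;\varphi_n)\,\big|\,X_{nT}\big]-E_{\widehat{\varphi}_n^{(k)}}\big[\widetilde{\ell}(F_T|X_{nT};\varphi_n)\,\big|\,X_{nT}\big] \\
&= \widetilde{Q}(\varphi_n,\widehat{\varphi}_n^{(k)})-\widetilde{H}(\varphi_n,\widehat{\varphi}_n^{(k)}).
\end{aligned}
\tag{E.109}
\]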

Then, for any $\varphi_n$ and any $n$ and $T$,
\[
\begin{aligned}
\widetilde{H}(\varphi_n;\widehat{\varphi}_n^{(k)})-\widetilde{H}(\widehat{\varphi}_n^{(k)};\widehat{\varphi}_n^{(k)}) &= E_{\widehat{\varphi}_n^{(k)}}\big[\widetilde{\ell}(F_T|X_{nT};\varphi_n)-\widetilde{\ell}(F_T|X_{nT};\widehat{\varphi}_n^{(k)})\,\big|\,X_{nT}\big] \\
&= \frac{1}{nT}E_{\widehat{\varphi}_n^{(k)}}\bigg[\log\frac{f(F_T|X_{nT};\varphi_n)}{f(F_T|X_{nT};\widehat{\varphi}_n^{(k)})}\,\bigg|\,X_{nT}\bigg] \\
&\le \frac{1}{nT}\log E_{\widehat{\varphi}_n^{(k)}}\bigg[\frac{f(F_T|X_{nT};\varphi_n)}{f(F_T|X_{nT};\widehat{\varphi}_n^{(k)})}\,\bigg|\,X_{nT}\bigg] \\
&= \frac{1}{nT}\log\int_{\mathbb{R}^{rT}}\frac{f(F_T|X_{nT};\varphi_n)}{f(F_T|X_{nT};\widehat{\varphi}_n^{(k)})}\,f(F_T|X_{nT};\widehat{\varphi}_n^{(k)})\,\mathrm{d}F_T \\
&= \frac{1}{nT}\log\int_{\mathbb{R}^{rT}}f(F_T|X_{nT};\varphi_n)\,\mathrm{d}F_T = \frac{1}{nT}\log(1)=0,
\end{aligned}
\]
by Jensen's inequality. Hence, we have (see also Dempster et al., 1977, Lemma 1)
\[
\widetilde{H}(\widehat{\varphi}_n^{(k+1)};\widehat{\varphi}_n^{(k)}) \le \widetilde{H}(\widehat{\varphi}_n^{(k)};\widehat{\varphi}_n^{(k)}). \tag{E.110}
\]
Therefore, from (E.109) and (E.110), for any $k$,
\[
\widetilde{\ell}(X_{nT};\widehat{\varphi}_n^{(k+1)})-\widetilde{\ell}(X_{nT};\widehat{\varphi}_n^{(k)}) \ge \widetilde{Q}(\widehat{\varphi}_n^{(k+1)};\widehat{\varphi}_n^{(k)})-\widetilde{Q}(\widehat{\varphi}_n^{(k)};\widehat{\varphi}_n^{(k)}) \ge 0, \tag{E.111}
\]
where the last inequality holds by definition of the M-step. This shows that the log-likelihood $\ell(X_{nT};\widehat{\varphi}_n^{(k)})$ increases monotonically as $k$ increases.

Given that $\widetilde{Q}(\varphi_n;\widehat{\varphi}_n^{(k)})$ has a unique maximum by Lemma E.20, and since for any $\varphi_n'\in\mathcal{O}_n$ and $\varphi_n\in\mathcal{O}_n$, the function $\widetilde{Q}(\varphi_n';\varphi_n)$ is continuous in $\varphi_n'$ and $\varphi_n$ and the components of the gradient vector $\nabla_{\varphi_n'}\widetilde{Q}(\varphi_n';\varphi_n)$ are continuous in $\varphi_n$, from Wu (1983, Theorem 3)
\[
\lim_{k\to\infty}\widetilde{\ell}(X_{nT};\widehat{\varphi}_n^{(k)}) = \widetilde{\ell}(X_{nT};\widehat{\varphi}_n^{**}), \tag{E.112}
\]
where the convergence is monotonic and $\widehat{\varphi}_n^{**}$ is a local maximum of $\widetilde{\ell}(X_{nT};\varphi_n)$.

Now notice that the solution of the M-step is such that $\widehat{\lambda}_i^{(k+1)}$ and $\widehat{A}^{(k+1)}$ do not depend on other parameters at the same iteration, while $\widehat{\sigma}_i^{2(k+1)}$ depends only on $\widehat{\lambda}_i^{(k+1)}$, and $\widehat{\Gamma}^{v(k+1)}$ depends only on $\widehat{A}^{(k+1)}$.
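The monotonicity in (E.111) is a generic property of EM and can be observed on a much simpler model than the one studied here. The sketch below is only an illustration under stated assumptions: it swaps the dynamic factor model and the Kalman smoother for a static Gaussian factor model with $f_t\sim N(0,I_r)$ and diagonal idiosyncratic covariance, on simulated data, and the function names `gaussian_loglik` and `em_step` are hypothetical.

```python
import numpy as np


def gaussian_loglik(S, Lam, psi, T):
    """Gaussian log-likelihood of a zero-mean static factor model."""
    n = S.shape[0]
    Sigma = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * T * (n * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sigma, S)))


def em_step(S, Lam, psi):
    """One EM iteration for x_t = Lam f_t + xi_t with f_t ~ N(0, I_r)."""
    r = Lam.shape[1]
    Sigma = Lam @ Lam.T + np.diag(psi)
    beta = np.linalg.solve(Sigma, Lam).T               # E[f_t | x_t] = beta x_t
    Eff = beta @ S @ beta.T + np.eye(r) - beta @ Lam   # average E[f_t f_t' | x_t]
    Exf = S @ beta.T                                   # average x_t E[f_t | x_t]'
    Lam_new = np.linalg.solve(Eff, Exf.T).T            # equals Exf @ inv(Eff)
    psi_new = np.diag(S - Lam_new @ Exf.T)             # diagonal residual variances
    return Lam_new, psi_new


# Simulated data (assumption: static model, not the paper's dynamic model).
rng = np.random.default_rng(0)
T, n, r = 200, 10, 2
Lam0 = rng.normal(size=(n, r))
X = rng.normal(size=(T, r)) @ Lam0.T + rng.normal(size=(T, n))
S = X.T @ X / T

Lam, psi = rng.normal(size=(n, r)), np.ones(n)
logliks = []
for _ in range(50):
    logliks.append(gaussian_loglik(S, Lam, psi, T))
    Lam, psi = em_step(S, Lam, psi)

# Smallest increment across iterations; non-negative up to rounding.
print(np.diff(logliks).min())
```

The sequence `logliks` plays the role of $\ell(X_{nT};\widehat{\varphi}_n^{(k)})$: each M-step maximizes the expected complete-data log-likelihood, so the observed log-likelihood cannot decrease from one iteration to the next.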
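Since we are considering a Gaussian log-likelihood, Wu (1983, Condition 1) holds, and we have that, for any $i=1,\dots,n$,
\[
\begin{aligned}
\widetilde{Q}(\widehat{\lambda}_i^{(k+1)};\widehat{\varphi}_n^{(k)})-\widetilde{Q}(\widehat{\lambda}_i^{(k)};\widehat{\varphi}_n^{(k)}) &\ge M_Q\,\|\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)}\|^2, \\
\widetilde{Q}(\widehat{\lambda}_i^{(k+1)},\widehat{\sigma}_i^{2(k+1)};\widehat{\varphi}_n^{(k)})-\widetilde{Q}(\widehat{\lambda}_i^{(k)},\widehat{\sigma}_i^{2(k)};\widehat{\varphi}_n^{(k)}) &\ge M_Q'\,\big\|(\widehat{\lambda}_i^{(k+1)\prime}\;\widehat{\sigma}_i^{2(k+1)})-(\widehat{\lambda}_i^{(k)\prime}\;\widehat{\sigma}_i^{2(k)})\big\|^2,
\end{aligned}
\tag{E.113}
\]
for some finite positive reals $M_Q$ and $M_Q'$, independent of $i$, and where we use the short-hand notations:
\[
\begin{aligned}
\widetilde{Q}(\widehat{\lambda}_i^{(k)};\widehat{\varphi}_n^{(k)}) &= \widetilde{Q}(\lambda_1,\dots,\widehat{\lambda}_i^{(k)},\dots,\lambda_n,\sigma_1^2,\dots,\sigma_n^2,A,\Gamma^v;\widehat{\varphi}_n^{(k)}), \\
\widetilde{Q}(\widehat{\lambda}_i^{(k)},\widehat{\sigma}_i^{2(k)};\widehat{\varphi}_n^{(k)}) &= \widetilde{Q}(\lambda_1,\dots,\widehat{\lambda}_i^{(k)},\dots,\lambda_n,\sigma_1^2,\dots,\widehat{\sigma}_i^{2(k)},\dots,\sigma_n^2,A,\Gamma^v;\widehat{\varphi}_n^{(k)}).
\end{aligned}
\]
And, similarly,
\[
\begin{aligned}
\widetilde{Q}(\widehat{A}^{(k+1)};\widehat{\varphi}_n^{(k)})-\widetilde{Q}(\widehat{A}^{(k)};\widehat{\varphi}_n^{(k)}) &\ge M_A'\,\|\widehat{A}^{(k+1)}-\widehat{A}^{(k)}\|^2, \\
\widetilde{Q}(\widehat{A}^{(k+1)},\widehat{\Gamma}^{v(k+1)};\widehat{\varphi}_n^{(k)})-\widetilde{Q}(\widehat{A}^{(k)},\widehat{\Gamma}^{v(k)};\widehat{\varphi}_n^{(k)}) &\ge M_v'\,\big\|\mathrm{vec}(\widehat{A}^{(k+1)}\;\widehat{\Gamma}^{v(k+1)})-\mathrm{vec}(\widehat{A}^{(k)}\;\widehat{\Gamma}^{v(k)})\big\|^2,
\end{aligned}
\tag{E.114}
\]
for some finite positive reals $M_A'$ and $M_v'$. Therefore, from (E.111), (E.112), (E.113), and (E.114), we have
\[
\begin{aligned}
\lim_{k\to\infty}\big\|(\widehat{\lambda}_i^{(k+1)\prime}\;\widehat{\sigma}_i^{2(k+1)})-(\widehat{\lambda}_i^{(k)\prime}\;\widehat{\sigma}_i^{2(k)})\big\| &= 0, \\
\lim_{k\to\infty}\big\|\mathrm{vec}(\widehat{A}^{(k+1)}\;\widehat{\Gamma}^{v(k+1)})-\mathrm{vec}(\widehat{A}^{(k)}\;\widehat{\Gamma}^{v(k)})\big\| &= 0,
\end{aligned}
\]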

which are sufficient conditions for applying the result in Wu (1983, Theorem 6), i.e., such that
\[
\begin{aligned}
\lim_{k\to\infty}\big\|(\widehat{\lambda}_i^{(k)\prime}\;\widehat{\sigma}_i^{2(k)})-(\widehat{\lambda}_i^{**\prime}\;\widehat{\sigma}_i^{2**})\big\| &= 0, \\
\lim_{k\to\infty}\big\|\mathrm{vec}(\widehat{A}^{(k)}\;\widehat{\Gamma}^{v(k)})-\mathrm{vec}(\widehat{A}^{**}\;\widehat{\Gamma}^{v**})\big\| &= 0.
\end{aligned}
\]
This completes the proof. □

Lemma E.22. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\min(n\log^{-2/\delta_v}T,\;\sqrt{nT}\log^{-1/2}n,\;T)\,\max_{i=1,\dots,n}\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{*}\|=O_p(1)$;
(ii) $\min(n\log^{-2/\delta_v}T,\;\sqrt{nT}\log^{-1/2}n,\;T)\,\max_{i=1,\dots,n}|\widehat{\sigma}_i^{2**}-\widehat{\sigma}_i^{2*}|=O_p(1)$;
(iii) $n\log^{-2/\delta_v}T\,\|\widehat{A}^{**}-\widehat{A}^{*}\|=O_p(1)$;
(iv) $n\log^{-2/\delta_v}T\,\|\widehat{\Gamma}^{v**}-\widehat{\Gamma}^{v*}\|=O_p(1)$.

Proof. First, notice that running the EM algorithm using $\ell(X_{nT};\varphi_n)$ is equivalent to running the EM using $\ell(X_{nT},F_T;\varphi_n)$, and such EM will converge to a local maximum of $\ell(X_{nT},F_T;\varphi_n)$ because Lemma E.21 would hold also in this case. Moreover, since $\ell(X_{nT},F_T;\varphi_n)$ has clearly a unique maximum, then $\widehat{\varphi}_n^{**}$ is a local maximum of $\ell(X_{nT};\varphi_n)$ but also the unique global maximum of $\ell(X_{nT},F_T;\varphi_n)$.

Consider the global QML estimator $\widehat{\varphi}_n^{*}=(\widehat{\phi}_n^{*\prime}\;\widehat{\theta}^{*\prime})'$ maximizing $\ell(X_{nT};\varphi_n)$, where we defined $\widehat{\phi}_n^{*}=(\mathrm{vec}(\widehat{\Lambda}_n^{*})'\;\widehat{\sigma}_1^{2*}\cdots\widehat{\sigma}_n^{2*})'$ and $\widehat{\theta}^{*}=(\mathrm{vec}(\widehat{A}^{*})'\;\mathrm{vech}(\widehat{\Gamma}^{v*})')'$. The elements of $\widehat{\varphi}_n^{*}$ satisfy Lemma E.13.
Now, by definition the components of the gradient of $\ell(X_{nT};\varphi_n)$ computed in $\widehat{\varphi}_n^{*}$ are such that (notice that $\ell(F_T;\varphi_n)$ does not depend on $\lambda_i$)
\[
\begin{aligned}
0_r &= \nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}} = \nabla_{\lambda_i}\ell(X_{nT},F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}-\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}} \\
&= \sum_{t=1}^T F_t(x_{it}-F_t'\widehat{\lambda}_i^{*})-\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}} \\
&= \sum_{t=1}^T F_t(F_t'\lambda_i+\xi_{it}-F_t'\widehat{\lambda}_i^{*})-\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}},
\end{aligned}
\tag{E.115}
\]
and (E.115) is equivalent to
\[
\begin{aligned}
(\widehat{\lambda}_i^{*}-\lambda_i) &= \Big(T^{-1}\sum_{t=1}^T F_tF_t'\Big)^{-1}\Big(T^{-1}\sum_{t=1}^T F_t\xi_{it}\Big)+\Big(T^{-1}\sum_{t=1}^T F_tF_t'\Big)^{-1}T^{-1}\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}} \\
&= (\lambda_i-\lambda_i^{\mathrm{OLS}})+\Big(T^{-1}\sum_{t=1}^T F_tF_t'\Big)^{-1}T^{-1}\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}.
\end{aligned}
\tag{E.116}
\]
By comparing (E.116) with Lemma E.14(i) (see in particular (E.69) and (E.70) in its proof), and by Lemma C.13, we have
\[
\max_{i=1,\dots,n}\Big\|T^{-1}\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\Big\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2}\sqrt{\log n},\;T^{-1})\big),
\]
which, from (E.115), implies
\[
\begin{aligned}
\max_{i=1,\dots,n}\Big\|T^{-1}\nabla_{\lambda_i}\ell(X_{nT},F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\Big\| &= \max_{i=1,\dots,n}\Big\|T^{-1}\nabla_{\lambda_i}\ell(F_T|X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\Big\| \\
&= O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2}\sqrt{\log n},\;T^{-1})\big).
\end{aligned}
\tag{E.117}
\]
Then, since $\nabla_{\lambda_i}\ell(X_{nT},F_T;\varphi_n)$ is linear in $\lambda_i$, there exists a positive real $M_S'$ such that
\[
\Big\|T^{-1}\nabla_{\lambda_i}\ell(X_{nT},F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\Big\| \ge M_S'\,\|\widehat{\lambda}_i^{*}-\widehat{\lambda}_i^{**}\|, \tag{E.118}
\]

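since, by definition, $T^{-1}\nabla_{\lambda_i}\ell(X_{nT},F_T;\varphi_n)|_{\varphi_n=\widehat{\varphi}_n^{**}}=0_r$. Therefore, from (E.117) and (E.118)
\[
\max_{i=1,\dots,n}\|\widehat{\lambda}_i^{*}-\widehat{\lambda}_i^{**}\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2}\sqrt{\log n},\;T^{-1})\big),
\]
which proves part (i).

Similarly, using Lemma E.14(ii) (see in particular (E.72), (E.73), and (E.77) in its proof), we can show that
\[
\max_{i=1,\dots,n}\Big|T^{-1}\nabla_{\sigma_i^2}\ell(X_{nT},F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\Big| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2}\sqrt{\log n},\;T^{-1})\big), \tag{E.119}
\]
and
\[
\max_{i=1,\dots,n}\Big|T^{-1}\nabla_{\sigma_i^2}\ell(X_{nT},F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\Big| \ge M_G'\,|\widehat{\sigma}_i^{2**}-\widehat{\sigma}_i^{2*}|. \tag{E.120}
\]
By using (E.119) into (E.120), we prove part (ii). Parts (iii) and (iv) follow in a similar way from Lemma E.13(iv) and E.13(v), respectively. This completes the proof. □

Lemma E.23. Consider the EM estimator $\widehat{\lambda}_i\equiv\widehat{\lambda}_i^{(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
\[
\min\big(n^2\log^{-4/\delta_v}T,\;nT\log^{-1/\delta_v}T\,\log^{-1/2}n,\;T^{3/2}\log^{-1/2}n\big)\,\|\widehat{\lambda}_i-\widehat{\lambda}_i^{**}\| = O_p(1),
\]
uniformly in $i$.

Proof. First, notice that for any $\varphi_n$ and any $n$ and $T$
\[
\begin{aligned}
(nT)^{-1}\nabla_{\varphi_n}\ell(X_{nT};\varphi_n) &= (nT)^{-1}\frac{\nabla_{\varphi_n}f(X_{nT};\varphi_n)}{f(X_{nT};\varphi_n)} = (nT)^{-1}\int_{\mathbb{R}^{rT}}\frac{\nabla_{\varphi_n}f(X_{nT},F_T;\varphi_n)}{f(X_{nT};\varphi_n)}\,\mathrm{d}F_T \\
&= (nT)^{-1}\int_{\mathbb{R}^{rT}}\frac{\nabla_{\varphi_n}f(X_{nT},F_T;\varphi_n)}{f(X_{nT},F_T;\varphi_n)}\,\frac{f(X_{nT},F_T;\varphi_n)}{f(X_{nT};\varphi_n)}\,\mathrm{d}F_T \\
&= (nT)^{-1}\int_{\mathbb{R}^{rT}}\nabla_{\varphi_n}\ell(X_{nT},F_T;\varphi_n)\,\frac{f(X_{nT},F_T;\varphi_n)}{f(X_{nT};\varphi_n)}\,\mathrm{d}F_T \\
&= (nT)^{-1}\int_{\mathbb{R}^{rT}}\nabla_{\varphi_n}\ell(X_{nT},F_T;\varphi_n)\,f(F_T|X_{nT};\varphi_n)\,\mathrm{d}F_T \\
&= (nT)^{-1}E_{\varphi_n}\big[\nabla_{\varphi_n}\ell(X_{nT},F_T;\varphi_n)\,\big|\,X_{nT}\big] \\
&= (nT)^{-1}\nabla_{\varphi_n'}Q(\varphi_n';\varphi_n)\big|_{\varphi_n'=\varphi_n}.
\end{aligned}
\tag{E.121}
\]
Second, for any $k\ge 0$, by definition of the M-step, by (E.121) and by a Taylor expansion about $\widehat{\lambda}_i^{(k)}$,
\[
\begin{aligned}
0_r &= \nabla_{\lambda_i}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k+1)}} \\
&= \nabla_{\lambda_i}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}+\nabla_{\lambda_i\lambda_i'}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)})+O(\|\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)}\|^2) \\
&= \nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}+\nabla_{\lambda_i\lambda_i'}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)})+O(\|\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)}\|^2) \\
&= \nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}+\nabla_{\lambda_i\lambda_i'}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)}),
\end{aligned}
\tag{E.122}
\]
where the remainder vanishes because $Q(\varphi_n;\widehat{\varphi}_n^{(k)})$ is quadratic in $\lambda_i$: its second derivative does not depend on $\lambda_i$, so its third derivative with respect to $\lambda_i$ is zero. And from (E.122) it follows that
\[
T^{-1}\nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}} = -T^{-1}\nabla_{\lambda_i\lambda_i'}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)}). \tag{E.123}
\]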
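The identity (E.121), i.e. that the score of the observed-data log-likelihood equals the conditional expectation of the complete-data score, can be checked in closed form on a toy case. The sketch below assumes a scalar one-factor model $x=\lambda f+\xi$ with $f\sim N(0,1)$ and $\xi\sim N(0,\sigma^2)$, which is far simpler than the model of the paper; the function names and numerical values are hypothetical.

```python
import numpy as np


def observed_score(x, lam, sig2):
    """d/d lam of log N(x; 0, lam^2 + sig2), the observed-data score."""
    s = lam**2 + sig2
    return -lam / s + lam * x**2 / s**2


def expected_complete_score(x, lam, sig2):
    """E[d/d lam log f(x, f; lam) | x] for x = lam * f + xi, f ~ N(0, 1)."""
    s = lam**2 + sig2
    m = lam * x / s      # E[f | x]
    v = sig2 / s         # Var[f | x]
    # Complete-data score is f * (x - lam * f) / sig2; take E[. | x].
    return (x * m - lam * (m**2 + v)) / sig2


x, lam, sig2 = 1.7, 0.8, 0.5   # arbitrary illustrative values
print(observed_score(x, lam, sig2), expected_complete_score(x, lam, sig2))
```

The two functions return the same number for any admissible inputs, which is the scalar analogue of the last equality in (E.121) before the rescaling by $(nT)^{-1}$.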

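Third, by a Taylor expansion about $\widehat{\lambda}_i^{(k)}$, by definition of $\widehat{\lambda}_i^{**}$ which is a local maximum of $\ell(X_{nT};\varphi_n)$, we have
\[
\begin{aligned}
0_r &= T^{-1}\nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{**}} \\
&= T^{-1}\nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}+T^{-1}\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}(\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}) \\
&\quad + \frac{1}{2}\big((\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)})'\otimes I_r\big)\Big\{T^{-1}\nabla_{\lambda_i}\mathrm{vec}\big(\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)\big)\big|_{\varphi_n=\check{\varphi}_n}\Big\}(\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}) \\
&= T^{-1}\nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}+T^{-1}\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}(\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}) \\
&\quad + O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|^2\,n^{-1}\log^{2/\delta_v}T\big),
\end{aligned}
\tag{E.124}
\]
where $\check{\varphi}_n$ is such that $n^{-1/2}\|\widehat{\varphi}_n^{**}-\check{\varphi}_n\|\le n^{-1/2}\|\widehat{\varphi}_n^{**}-\widehat{\varphi}_n^{(k+1)}\|$ and $n^{-1/2}\|\widehat{\varphi}_n^{(k+1)}-\check{\varphi}_n\|\le n^{-1/2}\|\widehat{\varphi}_n^{**}-\widehat{\varphi}_n^{(k+1)}\|$. In particular, the last term in (E.124) follows from Lemma E.9 and Barigozzi (2023, eq. (A.58) in the proof of Theorem 5), which imply
\[
\begin{aligned}
\sup_{\varphi_n\in\mathcal{O}_n}\big|\nabla_{\lambda_i}\mathrm{vec}(\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n))\big| &\le \sup_{\varphi_n\in\mathcal{O}_n}\big|\nabla_{\lambda_i}\mathrm{vec}(\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n))-\nabla_{\lambda_i}\mathrm{vec}(\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n))\big| \\
&\quad + \sup_{\varphi_n\in\mathcal{O}_n}\big|\nabla_{\lambda_i}\mathrm{vec}(\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n))\big| \\
&= O_p(n^{-1}T\log^{2/\delta_v}T)+O(n^{-1}T) = O_p(n^{-1}T\log^{2/\delta_v}T).
\end{aligned}
\tag{E.125}
\]
Define the following $r\times r$ matrices:
\[
\begin{aligned}
\mathcal{I}_c(\lambda_i) &= -\nabla_{\lambda_i\lambda_i'}Q(\varphi_n;\varphi_n) = -E_{\varphi_n}\big[\nabla_{\lambda_i\lambda_i'}\ell(X_{nT},F_T;\varphi_n)\,\big|\,X_{nT}\big], \\
\mathcal{I}(\lambda_i) &= -\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n).
\end{aligned}
\]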
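By substituting (E.123) into (E.124) and rearranging
\[
\begin{aligned}
(\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}) &= \Big\{-\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}\Big\}^{-1}\Big\{\nabla_{\lambda_i}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}\Big\}+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|^2\,n^{-1}\log^{2/\delta_v}T\big) \\
&= \Big\{\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}\Big\}^{-1}\Big\{\nabla_{\lambda_i\lambda_i'}Q(\varphi_n;\widehat{\varphi}_n^{(k)})\big|_{\varphi_n=\widehat{\varphi}_n^{(k)}}\Big\}(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{(k)})+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|^2\,n^{-1}\log^{2/\delta_v}T\big) \\
&= \big\{\mathcal{I}(\widehat{\lambda}_i^{(k)})\big\}^{-1}\big\{\mathcal{I}_c(\widehat{\lambda}_i^{(k)})\big\}(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{**}+\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)})+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|^2\,n^{-1}\log^{2/\delta_v}T\big).
\end{aligned}
\tag{E.126}
\]
Moreover, by a Taylor approximation about $\widehat{\lambda}_i^{**}$,
\[
\begin{aligned}
T^{-1}\mathcal{I}_c(\widehat{\lambda}_i^{(k)}) &= T^{-1}\mathcal{I}_c(\widehat{\lambda}_i^{**}), \\
T^{-1}\mathcal{I}(\widehat{\lambda}_i^{(k)}) &= T^{-1}\mathcal{I}(\widehat{\lambda}_i^{**})+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|\,n^{-1}\log^{2/\delta_v}T\big),
\end{aligned}
\tag{E.127}
\]
where the first relation follows from the fact that, as noticed above, $\mathcal{I}_c(\lambda_i)$ does not depend on $\lambda_i$, while the second follows again from (E.125).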

Therefore, from (E.126) and (E.127)
\[
\begin{aligned}
(\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{**}) &= \Big(I_r-\big\{\mathcal{I}_c(\widehat{\lambda}_i^{(k)})\big\}^{-1}\big\{\mathcal{I}(\widehat{\lambda}_i^{(k)})\big\}\Big)(\widehat{\lambda}_i^{(k)}-\widehat{\lambda}_i^{**})+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|^2\,n^{-1}\log^{2/\delta_v}T\big) \\
&= \Big(I_r-\big\{\mathcal{I}_c(\widehat{\lambda}_i^{**})\big\}^{-1}\big\{\mathcal{I}(\widehat{\lambda}_i^{**})\big\}\Big)(\widehat{\lambda}_i^{(k)}-\widehat{\lambda}_i^{**})+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k)}\|\,n^{-1}\log^{2/\delta_v}T\big),
\end{aligned}
\tag{E.128}
\]
see also Sundberg (1974, 1976), Dempster et al. (1977), Meng and Rubin (1994), McLachlan and Krishnan (2007, Chapter 3.9, pp. 99-103), and Sundberg (2019, Chapter 8).

Let $R(\widehat{\lambda}_i^{**})=I_r-\{\mathcal{I}_c(\widehat{\lambda}_i^{**})\}^{-1}\{\mathcal{I}(\widehat{\lambda}_i^{**})\}$, then, by setting $k=k^*$ in (E.128), since $\widehat{\lambda}_i\equiv\widehat{\lambda}_i^{(k^*+1)}$, we have
\[
(\widehat{\lambda}_i-\widehat{\lambda}_i^{**}) = R(\widehat{\lambda}_i^{**})(\widehat{\lambda}_i^{(k^*)}-\widehat{\lambda}_i^{**})+O_p\big(\|\widehat{\lambda}_i^{**}-\widehat{\lambda}_i^{(k^*)}\|\,n^{-1}\log^{2/\delta_v}T\big). \tag{E.129}
\]
Hence, from (E.129), by iterating backwards we get
\[
\begin{aligned}
\|\widehat{\lambda}_i-\widehat{\lambda}_i^{**}\| &\le \|\widehat{\lambda}_i^{(k^*)}-\widehat{\lambda}_i^{**}\|\,\|R(\widehat{\lambda}_i^{**})\|+O_p\big(\|\widehat{\lambda}_i^{(k^*)}-\widehat{\lambda}_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\big) \\
&\le \Big\{\|\widehat{\lambda}_i^{(k^*-1)}-\widehat{\lambda}_i^{**}\|\,\|R(\widehat{\lambda}_i^{**})\|+O_p\big(\|\widehat{\lambda}_i^{(k^*)}-\widehat{\lambda}_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\big)\Big\}\|R(\widehat{\lambda}_i^{**})\| \\
&\quad + O_p\big(\|\widehat{\lambda}_i^{(k^*)}-\widehat{\lambda}_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\big) \\
&\le \|\widehat{\lambda}_i^{(0)}-\widehat{\lambda}_i^{**}\|\,\|R(\widehat{\lambda}_i^{**})\|^{k^*+1}+\Big\{\sum_{j=0}^{k^*}\|R(\widehat{\lambda}_i^{**})\|^{j}\Big\}O_p\big(\|\widehat{\lambda}_i^{(k^*)}-\widehat{\lambda}_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\big) \\
&\le \|\widehat{\lambda}_i^{(0)}-\widehat{\lambda}_i^{**}\|\,\|R(\widehat{\lambda}_i^{**})\|^{k^*+1}+\Big\{\sum_{j=0}^{k^*}\|R(\widehat{\lambda}_i^{**})\|^{j}\Big\}O_p\big(\|\widehat{\lambda}_i^{(0)}-\widehat{\lambda}_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\big),
\end{aligned}
\tag{E.130}
\]
where in the second and third step we used Lemma E.21, according to which $\|\widehat{\lambda}_i^{(k+1)}-\widehat{\lambda}_i^{**}\|\le\|\widehat{\lambda}_i^{(k)}-\widehat{\lambda}_i^{**}\|$ for any $k\ge 0$.

Let us consider separately the terms on the rhs of (E.130). Because of Lemma E.22(i),
\[
\max_{i=1,\dots,n}\|R(\widehat{\lambda}_i^{**})-R(\widehat{\lambda}_i^{*})\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2}\sqrt{\log n},\;T^{-1})\big), \tag{E.131}
\]
since $R(\lambda_i)$ is continuous and differentiable in $\lambda_i$. Consider the two matrices in $R(\widehat{\lambda}_i^{*})$. First, from (9) we can easily see that
\[
\begin{aligned}
T^{-1}\mathcal{I}_c(\widehat{\lambda}_i^{*}) &= -T^{-1}E_{\widehat{\varphi}_n^{*}}\big[\nabla_{\lambda_i\lambda_i'}\ell(X_{nT},F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\,\big|\,X_{nT}\big] \\
&= -T^{-1}E_{\widehat{\varphi}_n^{*}}\big[\nabla_{\lambda_i\lambda_i'}\ell(X_{nT}|F_T;\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}\,\big|\,X_{nT}\big] \\
&= T^{-1}(\widehat{\sigma}_i^{2*})^{-1}\sum_{t=1}^T E_{\widehat{\varphi}_n^{*}}[F_tF_t'|X_{nT}] \\
&= T^{-1}(\widehat{\sigma}_i^{2*})^{-1}\sum_{t=1}^T E_{\widehat{\varphi}_n^{*}}[F_t|X_{nT}]\,E_{\widehat{\varphi}_n^{*}}[F_t'|X_{nT}] \\
&\quad + T^{-1}(\widehat{\sigma}_i^{2*})^{-1}\sum_{t=1}^T E_{\widehat{\varphi}_n^{*}}\big[(F_t-E_{\widehat{\varphi}_n^{*}}[F_t|X_{nT}])(F_t-E_{\widehat{\varphi}_n^{*}}[F_t|X_{nT}])'\,\big|\,X_{nT}\big] \\
&= T^{-1}(\widehat{\sigma}_i^{2*})^{-1}\sum_{t=1}^T\big\{F^*_{t|T}F^{*\prime}_{t|T}+P^*_{t|T}\big\},
\end{aligned}
\tag{E.132}
\]
where in the last step we used Assumption 4 which implies $F^*_{t|T}=E_{\widehat{\varphi}_n^{*}}[F_t|X_{nT}]$ for all $t=1,\dots,T$.
Also note that for (E.110) and (E.111) in the proof of Lemma E.21 to hold, expectations have to be taken with respect to the same distribution as the one used to compute the log-likelihood. Hence, given that we maximize a mis-specified log-likelihood with diagonal idiosyncratic covariance, in the last step of (E.132) we use $F^*_{t|T}$ and $P^*_{t|T}$, i.e., the outputs of the Kalman smoother implemented using $\widehat{\Sigma}_n^{\xi*}$.
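The recursion (E.128)-(E.130) says that the EM error contracts at a rate governed by $R=I_r-\mathcal{I}_c^{-1}\mathcal{I}$. A minimal numerical sketch of this mechanism follows, under strong simplifying assumptions that are not those of the paper: a scalar model $x_t=\lambda f_t+\xi_t$ with $f_t\sim N(0,1)$, known idiosyncratic variance, and a given sample second moment. The names `em_map`, `S_xx`, `I_c` and `I_obs` are hypothetical.

```python
def em_map(lam, S_xx, sig2):
    """One EM update of the loading in x_t = lam * f_t + xi_t, sig2 known."""
    s = lam**2 + sig2
    Exf = lam * S_xx / s                    # average of x_t * E[f_t | x_t]
    Eff = lam**2 * S_xx / s**2 + sig2 / s   # average of E[f_t^2 | x_t]
    return Exf / Eff


S_xx, sig2 = 2.0, 1.0                  # hypothetical second moment and variance
lam_star = (S_xx - sig2) ** 0.5        # fixed point of the EM map (needs S_xx > sig2)
I_c = 1.0 / sig2                       # complete-data information per observation
I_obs = 2.0 * lam_star**2 / S_xx**2    # observed information per observation
R = 1.0 - I_obs / I_c                  # scalar analogue of R in (E.128)

lam, errors = 0.3, []
for _ in range(16):
    errors.append(lam - lam_star)
    lam = em_map(lam, S_xx, sig2)

# Ratio of the last two errors, to be compared with R.
ratio = errors[-1] / errors[-2]
print(R, ratio)
```

In this toy case the ratio of successive errors approaches `R` as the iterations proceed. In the paper's setting the analogous quantity $\|R(\widehat{\lambda}_i^{**})\|$ is itself shown to vanish as $n,T\to\infty$, which is why the EM iterates end up close to $\widehat{\lambda}_i^{**}$.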

Moreover, by Lemma E.19
\[
\Big\|T^{-1}\sum_{t=1}^T\big\{F^*_{t|T}F^{*\prime}_{t|T}+P^*_{t|T}-F_tF_t'\big\}\Big\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2},\;T^{-1}\sqrt{\log n})\big),
\]
which, once substituted into (E.132), gives
\[
\max_{i=1,\dots,n}T^{-1}\Big\|\mathcal{I}_c(\widehat{\lambda}_i^{*})-(\widehat{\sigma}_i^{2*})^{-1}\sum_{t=1}^T F_tF_t'\Big\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2},\;T^{-1}\sqrt{\log n})\big). \tag{E.133}
\]
This also shows that $T^{-1}\mathcal{I}_c(\widehat{\lambda}_i^{*})$ is finite and positive definite, as $n,T\to\infty$, since $T^{-1}\sum_{t=1}^T F_tF_t'$ is finite and positive definite with probability tending to one as $n,T\to\infty$, because of Lemma C.12(i) combined with Assumption 6(b), and since $\widehat{\sigma}_i^{2*}$ is finite and positive for all $i=1,\dots,n$ because of Lemma E.14(ii) and Assumption 2(a).

Second, from Lemma E.9 it follows that
\[
\begin{aligned}
&\max_{i=1,\dots,n}T^{-1}\Big\|\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)\big|_{\varphi_n=\widehat{\varphi}_n^{*}}-\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{*}}\Big\| \\
&\quad= \max_{i=1,\dots,n}T^{-1}\Big\|\mathcal{I}(\widehat{\lambda}_i^{*})+\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{*}}\Big\| = O_p(n^{-1}\log^{2/\delta_v}T),
\end{aligned}
\tag{E.134}
\]
where the sign in the second expression follows from the definition $\mathcal{I}(\lambda_i)=-\nabla_{\lambda_i\lambda_i'}\ell(X_{nT};\varphi_n)$. Furthermore, by a Taylor expansion about $\widehat{\lambda}_i^{\dagger}$
\[
\begin{aligned}
&\max_{i=1,\dots,n}T^{-1}\Big\|\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{*}}-\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{\dagger}}\Big\| \\
&\quad\le \max_{i=1,\dots,n}T^{-1}\Big\|\nabla_{\lambda_i}\mathrm{vec}\big(\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big)\big|_{\phi_n=\widehat{\phi}_n^{\dagger}}\Big\|\,\|\widehat{\lambda}_i^{\dagger}-\widehat{\lambda}_i^{*}\|+O_p\Big(\max_{i=1,\dots,n}\|\widehat{\lambda}_i^{\dagger}-\widehat{\lambda}_i^{*}\|^2\Big) \\
&\quad= O_p(n^{-2}\log^{4/\delta_v}T),
\end{aligned}
\tag{E.135}
\]
by Lemma E.10(i) and since $\|\nabla_{\lambda_i}\mathrm{vec}(\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n))\|=O_p(n^{-1}T\log^{2/\delta_v}T)$ for all $\phi_n$, by (E.125). Third, from Barigozzi (2023, Theorem 5):
\[
\begin{aligned}
&\max_{i=1,\dots,n}T^{-1}\Big\|\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{\dagger}}-\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT}|F_T;\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{\dagger}}\Big\| \\
&\quad= \max_{i=1,\dots,n}T^{-1}\Big\|\nabla_{\lambda_i\lambda_i'}\ell_0(X_{nT};\phi_n)\big|_{\phi_n=\widehat{\phi}_n^{\dagger}}-\Big(-(\widehat{\sigma}_i^{2\dagger})^{-1}\sum_{t=1}^T F_tF_t'\Big)\Big\| = O_p\big(\max(n^{-1},\;n^{-1/2}T^{-1/2})\big).
\end{aligned}
\tag{E.136}
\]
Last, from Lemma E.10(iii):
\[
\max_{i=1,\dots,n}|\widehat{\sigma}_i^{2\dagger}-\widehat{\sigma}_i^{2*}| = O_p(n^{-1}\log^{2/\delta_v}T). \tag{E.137}
\]
By combining (E.134), (E.135), (E.136), and (E.137), we have
\[
\max_{i=1,\dots,n}T^{-1}\Big\|\mathcal{I}(\widehat{\lambda}_i^{*})-(\widehat{\sigma}_i^{2*})^{-1}\sum_{t=1}^T F_tF_t'\Big\| = O_p\big(\max(n^{-1}\log^{2/\delta_v}T,\;n^{-1/2}T^{-1/2})\big). \tag{E.138}
\]
This also shows that $T^{-1}\mathcal{I}(\widehat{\lambda}_i^{*})$ is finite and positive definite, as $n,T\to\infty$, since $T^{-1}\sum_{t=1}^T F_tF_t'$ is finite and positive definite with probability tending to one as $n,T\to\infty$, because of Lemma C.12(i) combined with Assumption 6(b), and since $\widehat{\sigma}_i^{2*}$ is finite and positive for all $i=1,\dots,n$ because of Lemma E.14(ii) and Assumption 2(a).

Therefore, by using (E.131), (E.133), and (E.138), and since, as remarked above $T^{-1}\mathcal{I}_c(\widehat{\lambda}_i^{*})$ is finite and positive

definite, we have
\[
\begin{aligned}
\max_{i=1,\dots,n}\|R(\widehat\lambda_i^{**})\| &= \max_{i=1,\dots,n}\|I_r-\{\mathcal{I}_c(\widehat\lambda_i^{*})\}^{-1}\{\mathcal{I}(\widehat\lambda_i^{*})\}\| \\
&= \max_{i=1,\dots,n}\|\{\mathcal{I}_c(\widehat\lambda_i^{*})\}^{-1}\{\mathcal{I}_c(\widehat\lambda_i^{*})-\mathcal{I}(\widehat\lambda_i^{*})\}\| \\
&\le \max_{i=1,\dots,n} T\|\{\mathcal{I}_c(\widehat\lambda_i^{*})\}^{-1}\|\, T^{-1}\|\mathcal{I}_c(\widehat\lambda_i^{*})-\mathcal{I}(\widehat\lambda_i^{*})\| \\
&\le \max_{i=1,\dots,n} T\|\{\mathcal{I}_c(\widehat\lambda_i^{*})\}^{-1}\|\, T^{-1}\left\|\mathcal{I}_c(\widehat\lambda_i^{*})-(\widehat\sigma_i^{2*})^{-1}\sum_{t=1}^{T}F_tF_t'\right\| \\
&\quad+ \max_{i=1,\dots,n} T\|\{\mathcal{I}_c(\widehat\lambda_i^{*})\}^{-1}\|\, T^{-1}\left\|(\widehat\sigma_i^{2*})^{-1}\sum_{t=1}^{T}F_tF_t'-\mathcal{I}(\widehat\lambda_i^{*})\right\| \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n})).
\end{aligned}\tag{E.139}
\]
Moreover,
\[
\begin{aligned}
\|\widehat\lambda_i^{(0)}-\widehat\lambda_i^{**}\| &\le \|\widehat\lambda_i^{(0)}-\widehat\lambda_i^{*}\|+\|\widehat\lambda_i^{*}-\widehat\lambda_i^{**}\| \\
&\le \|\widehat\lambda_i^{(0)}-\lambda_i\|+\|\lambda_i-\widehat\lambda_i^{*}\|+\|\widehat\lambda_i^{*}-\widehat\lambda_i^{**}\| \\
&= O_p(\max(n^{-1},\ T^{-1/2}))+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})) \\
&\quad+ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})),
\end{aligned}\tag{E.140}
\]
because of Lemmas D.1(i), E.13(i), and E.22(i), respectively. By substituting (E.139) and (E.140) into (E.130), we have
\[
\begin{aligned}
\|\widehat\lambda_i-\widehat\lambda_i^{**}\| &\le \|\widehat\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\,\|R(\widehat\lambda_i^{**})\|^{k^*+1}+o_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}))\cdot\left\{O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n}))\right\}^{k^*+1} \\
&\quad+ o_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})) \\
&= O_p(\max(n^{-2}\log^{4/\delta_v}T,\ n^{-1}T^{-1}\log^{1/\delta_v}T\sqrt{\log n},\ T^{-3/2}\sqrt{\log n})),
\end{aligned}\tag{E.141}
\]
uniformly in $i$ since the rhs of (E.139) does not depend on $i$. This completes the proof. □

Lemma E.24. Consider the EM estimators $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, $\widehat A\equiv\widehat A^{(k+1)}$, and $\widehat\Gamma^{v}\equiv\widehat\Gamma^{v(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T})\,|\widehat\sigma_i^{2}-\widehat\sigma_i^{2**}|=O_p(1)$;
(ii) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T})\,\|\widehat A-\widehat A^{**}\|=O_p(1)$;
(iii) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T})\,\|\widehat\Gamma^{v}-\widehat\Gamma^{v**}\|=O_p(1)$.

Proof. Let
\[
R(\widehat\sigma_i^{2**}) = 1-\left\{\mathrm{E}_{\varphi_n}\left[\nabla_{\sigma_i^2\sigma_i^2}\ell(X_{nT},F_T;\varphi_n)|_{\sigma_i^2=\widehat\sigma_i^{2**}}\,\middle|\,X_{nT}\right]\right\}^{-1}\left\{\nabla_{\sigma_i^2\sigma_i^2}\ell(X_{nT};\varphi_n)|_{\sigma_i^2=\widehat\sigma_i^{2**}}\right\}.
\]
Then, following the same steps leading to (E.130) in the proof of Lemma E.23, we obtain
\[
\begin{aligned}
|\widehat\sigma_i^{2}-\widehat\sigma_i^{2**}| &\le |\widehat\sigma_i^{2(0)}-\widehat\sigma_i^{2**}|\,|R(\widehat\sigma_i^{2**})|^{k^*+1}+C|\widehat\sigma_i^{2(0)}-\widehat\sigma_i^{2**}| \\
&\le \left\{|\widehat\sigma_i^{2(0)}-\sigma_i^{2}|+|\sigma_i^{2}-\widehat\sigma_i^{2*}|+|\widehat\sigma_i^{2*}-\widehat\sigma_i^{2**}|\right\}|R(\widehat\sigma_i^{2**})|^{k^*+1}+C|\widehat\sigma_i^{2(0)}-\widehat\sigma_i^{2**}| \\
&= O_p(\max(n^{-1},\ T^{-1/2}))+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})) \\
&\quad+ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})),
\end{aligned}\tag{E.142}
\]
for some finite positive real $C$ independent of $i$, by Lemmas D.4(i), E.13(iii), and E.22(ii), and since $|R(\widehat\sigma_i^{2**})|=O_p(1)$.

This proves part (i). For part (ii), following again the same steps leading to (E.130) in the proof of Lemma E.23, we obtain
\[
\begin{aligned}
\|\widehat A-\widehat A^{**}\| &\le \|\widehat A^{(0)}-\widehat A^{**}\|\,\|R(\widehat A^{**})\|^{k^*+1}+C\|\widehat A^{(0)}-\widehat A^{**}\| \\
&\le \left\{\|\widehat A^{(0)}-A\|+\|A-\widehat A^{*}\|+\|\widehat A^{*}-\widehat A^{**}\|\right\}\cdot\|R(\widehat A^{**})\|^{k^*+1}+C\|\widehat A^{(0)}-\widehat A^{**}\| \\
&= O_p(\max(n^{-1},\ T^{-1/2}))+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}))+O_p(n^{-1}\log^{2/\delta_v}T) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2})),
\end{aligned}
\]
for some finite positive real $C$, by Lemmas D.3(i), E.13(iv), E.22(iii), and since $\|R(\widehat A^{**})\|=O_p(1)$. This proves part (ii). Part (iii) is proved in the same way as part (ii) but using Lemmas D.3(ii), E.13(v), and E.22(iv). This completes the proof. □

Lemma E.25. Consider the EM estimators $\widehat\lambda_i\equiv\widehat\lambda_i^{(k+1)}$ and $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\min(n^{2}\log^{-4/\delta_v}T,\ nT\log^{-1/\delta_v}T\log^{-1/2}n,\ T^{3/2}\log^{-1}n)\,\max_{i=1,\dots,n}\|\widehat\lambda_i-\widehat\lambda_i^{**}\|=O_p(1)$;
(ii) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T}\log^{-1/2}n)\,\max_{i=1,\dots,n}|\widehat\sigma_i^{2}-\widehat\sigma_i^{2**}|=O_p(1)$.

Proof. First notice that from (E.71) in the proof of Lemma E.14
\[
\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\| = O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n})), \tag{E.143}
\]
while from (D.6) in the proof of Lemma D.4 it is clear that
\[
\max_{i=1,\dots,n}|\widehat\sigma_i^{2(0)}-\sigma_i^{2}| = O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n})), \tag{E.144}
\]
because of (E.143). Then,
\[
\begin{aligned}
\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\widehat\lambda_i^{**}\| &\le \max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\|+\max_{i=1,\dots,n}\|\lambda_i-\widehat\lambda_i^{*}\|+\max_{i=1,\dots,n}\|\widehat\lambda_i^{*}-\widehat\lambda_i^{**}\| \\
&= O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n}))+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})) \\
&\quad+ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})),
\end{aligned}\tag{E.145}
\]
because of (E.143) and Lemmas E.14(i), E.22(i). Then, from (E.141) in the proof of Lemma E.23
\[
\begin{aligned}
\max_{i=1,\dots,n}\|\widehat\lambda_i-\widehat\lambda_i^{**}\| &\le \max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\,\max_{i=1,\dots,n}\|R(\widehat\lambda_i^{**})\|^{k^*+1}+o_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n}))\cdot\left\{O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n}))\right\}^{k^*+1} \\
&\quad+ o_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})) \\
&= O_p(\max(n^{-2}\log^{4/\delta_v}T,\ n^{-1}T^{-1}\log^{1/\delta_v}T\sqrt{\log n},\ T^{-3/2}\log n)),
\end{aligned}\tag{E.146}
\]
because of (E.145) and (E.139) in the proof of Lemma E.23. This proves part (i).
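The geometric factor $\|R\|^{k^*+1}$ in the bounds above can be illustrated with a toy fixed-point iteration. The following is a minimal sketch and not the EM algorithm of the paper: it assumes a purely linear update $\lambda^{(k+1)}=\lambda^{**}+R(\lambda^{(k)}-\lambda^{**})$, and the matrices `info_obs` and `info_complete` are hypothetical stand-ins for the observed and complete-data information matrices, chosen only so that $R=I_r-\mathcal{I}_c^{-1}\mathcal{I}$ is small.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 3

# Hypothetical "observed" information matrix (symmetric positive definite).
B = rng.standard_normal((r, r))
info_obs = B @ B.T + r * np.eye(r)
# Hypothetical "complete-data" information: close to info_obs, so R is small.
info_complete = 1.05 * info_obs + 0.01 * np.eye(r)

# Rate matrix R = I_r - I_c^{-1} I, mirroring the definition of R(.) in the text.
R = np.eye(r) - np.linalg.solve(info_complete, info_obs)
rate = np.linalg.norm(R, 2)  # spectral norm of R

lam_star = rng.standard_normal(r)        # stand-in for the fixed point
lam = lam_star + rng.standard_normal(r)  # stand-in for the initial estimate
err0 = np.linalg.norm(lam - lam_star)

errors = []
for k in range(5):
    lam = lam_star + R @ (lam - lam_star)  # one linearised iteration
    errors.append(np.linalg.norm(lam - lam_star))

# Bound of the form ||lam^(0) - lam_star|| * ||R||^(k+1) after k+1 steps.
bounds = [err0 * rate ** (k + 1) for k in range(5)]
```

In this toy setting the error after $k+1$ steps never exceeds the initial error times $\|R\|^{k+1}$, which is the mechanism the proofs exploit once $\|R\|$ is shown to vanish as $n,T\to\infty$.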

For part (ii), from (E.142) in the proof of Lemma E.24
\[
\begin{aligned}
\max_{i=1,\dots,n}|\widehat\sigma_i^{2}-\widehat\sigma_i^{2**}| &\le \left\{\max_{i=1,\dots,n}|\widehat\sigma_i^{2(0)}-\sigma_i^{2}|+\max_{i=1,\dots,n}|\sigma_i^{2}-\widehat\sigma_i^{2*}|+\max_{i=1,\dots,n}|\widehat\sigma_i^{2*}-\widehat\sigma_i^{2**}|\right\} \\
&\quad\cdot\max_{i=1,\dots,n}|R(\widehat\sigma_i^{2**})|^{k^*+1}+C\max_{i=1,\dots,n}|\widehat\sigma_i^{2(0)}-\widehat\sigma_i^{2**}| \\
&= O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n}))+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})) \\
&\quad+ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})),
\end{aligned}
\]
because of (E.144), and Lemmas E.14(ii), E.22(ii). This proves part (ii) and completes the proof. □

Lemma E.26. Consider the EM algorithm initialized with any deterministic loadings $\check\Lambda_n^{(0)}=(\check\lambda_1^{(0)}\cdots\check\lambda_n^{(0)})'$ such that $\mathrm{vec}(\check\Lambda_n^{(0)})\in\{O^{n}_{\lambda_i}\cap E_{\Lambda_n}\}$ as defined in Section 4.3.4. Then, under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $\min(n\log^{-2/\delta_v}T,\ \sqrt{nT}\log^{-1/2}n,\ T\log^{-1/2}n)\,\|\widehat\lambda_i-\widehat\lambda_i^{**}\|=O_p(1)$, uniformly in $i$.
(ii) $\min(n\log^{-2/\delta_v}T,\ \sqrt{nT}\log^{-1/2}n,\ T\log^{-1/2}n)\,\max_{i=1,\dots,n}\|\widehat\lambda_i-\widehat\lambda_i^{**}\|=O_p(1)$.

Proof. From (E.130) in the proof of Lemma E.23,
\[
\begin{aligned}
\|\widehat\lambda_i-\widehat\lambda_i^{**}\| &\le \|\check\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\,\|R(\widehat\lambda_i^{**})\|^{k^*+1}+\left\{\sum_{j=0}^{k^*}\|R(\widehat\lambda_i^{**})\|^{j}\right\}O_p\left(\|\check\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\right) \\
&\le \|\check\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\,\|R(\widehat\lambda_i^{**})\|^{k^*+1}+O_p\left(\|\check\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\,n^{-1}\log^{2/\delta_v}T\right) \\
&= \left\{O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n}))\right\}^{k^*+1}+O_p(n^{-1}\log^{2/\delta_v}T) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n})),
\end{aligned}
\]
by (E.139) in the proof of Lemma E.23, and since $\|\check\lambda_i^{(0)}-\widehat\lambda_i^{**}\|\le\|\check\lambda_i^{(0)}\|+\|\widehat\lambda_i^{**}-\widehat\lambda_i^{*}\|+\|\widehat\lambda_i^{*}-\lambda_i\|\le M_\lambda O(1)$, by definition of the initial estimator, Assumption 1(a), and Lemmas E.13(i) and E.22(i). This proves part (i).

Part (ii) is proved in the same way but starting from (E.146) in the proof of Lemma E.25 and by noting that $\max_{i=1,\dots,n}\|\check\lambda_i^{(0)}\|\le M_\lambda$ by the same arguments as before. This completes the proof. □

F Lemmas necessary for proving Proposition 3

Lemma F.1. Consider the EM estimators $\widehat\lambda_i\equiv\widehat\lambda_i^{(k+1)}$, $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, and $\widehat\Sigma_n^{\xi}\equiv\widehat\Sigma_n^{\xi(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$,
(i) $\min(\sqrt{T}\log^{-1/2}n,\ n\log^{-2/\delta_v}T)\,\max_{i=1,\dots,n}\|\widehat\lambda_i-\lambda_i\|=O_p(1)$;
(ii) $\min(\sqrt{T}\log^{-1/2}n,\ n\log^{-2/\delta_v}T)\,\max_{i=1,\dots,n}|\widehat\sigma_i^{2}-\sigma_i^{2}|=O_p(1)$;
(iii) $\min(\sqrt{T}\log^{-1/2}n,\ n\log^{-2/\delta_v}T)\,\|\widehat\Sigma_n^{\xi}-\Sigma_n^{\xi}\|=O_p(1)$;
(iv) $\|(\widehat\Sigma_n^{\xi})^{-1}\|=O_p(1)$;
(v) $\min(\sqrt{T}\log^{-1/2}n,\ n\log^{-2/\delta_v}T)\,\|(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\|=O_p(1)$.

Proof. For part (i)
\[
\begin{aligned}
\max_{i=1,\dots,n}\|\widehat\lambda_i-\lambda_i\| &\le \max_{i=1,\dots,n}\|\widehat\lambda_i-\widehat\lambda_i^{**}\|+\max_{i=1,\dots,n}\|\widehat\lambda_i^{**}-\widehat\lambda_i^{*}\|+\max_{i=1,\dots,n}\|\widehat\lambda_i^{*}-\lambda_i\| \\
&= O_p(\max(n^{-2}\log^{4/\delta_v}T,\ n^{-1}T^{-1}\log^{1/\delta_v}T\sqrt{\log n},\ T^{-3/2}\log n)) \\
&\quad+ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&\quad+ O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n}))+O_p(n^{-1}\log^{2/\delta_v}T) \\
&= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})),
\end{aligned}
\]
by Lemmas E.14(i), E.22(i), and E.25(i). This proves part (i). Part (ii) is proved as part (i) but using Lemmas E.14(ii), E.22(ii), and E.25(ii).

Part (iii) immediately follows from part (ii), indeed
\[
\|\widehat\Sigma_n^{\xi}-\Sigma_n^{\xi}\|\le\max_{i=1,\dots,n}|\widehat\sigma_i^{2}-\sigma_i^{2}| = O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})).
\]
For part (iv) we have
\[
\|(\widehat\Sigma_n^{\xi})^{-1}\| = \left\{\min_{i=1,\dots,n}\widehat\sigma_i^{2}\right\}^{-1}\le C_\xi+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})),
\]
because of part (ii) and Assumption 2(a). To conclude, for part (v) we have
\[
\|(\widehat\Sigma_n^{\xi})^{-1}-(\Sigma_n^{\xi})^{-1}\|\le\|(\widehat\Sigma_n^{\xi})^{-1}\|\,\|\widehat\Sigma_n^{\xi}-\Sigma_n^{\xi}\|\,\|(\Sigma_n^{\xi})^{-1}\| = O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})),
\]
by parts (iii), (iv), and Assumption 2(a). This completes the proof. □

Lemma F.2. Consider the EM estimators $\widehat\lambda_i\equiv\widehat\lambda_i^{(k+1)}$, $\widehat\Lambda_n\equiv\widehat\Lambda_n^{(k+1)}$, $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, and $\widehat\Sigma_n^{\xi}\equiv\widehat\Sigma_n^{\xi(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T}\log^{-1/2}n)\,n^{-1}\|\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n-\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\|=O_p(1)$;
(ii) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T}\log^{-1/2}n)\,n^{-1/2}\|\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\|=O_p(1)$;
(iii) $n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\|=O_p(1)$;
(iv) $\min(n\log^{-2/\delta_v}T,\ \sqrt{T}\log^{-1/2}n)\,n\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|=O_p(1)$;
(v) $\omega_{n,T,\delta_v}\sqrt{n}\,\|(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\|=O_p(1)$, with $\omega_{n,T,\delta_v}=\min(n\log^{-2/\delta_v}T,\ \sqrt{T}\log^{-1/2}n)$.

Proof. The proof is the same as the proof of Lemma E.15 but using Proposition 2(a) (which does not require this lemma to be proved) and Lemma F.1 instead of Lemmas E.13 and E.14. □

Lemma F.3. Consider the MSE estimators $\widehat P_{t|t-1}\equiv P_{t|t-1}^{(k+1)}$, $\widehat P_{t|t}\equiv P_{t|t}^{(k+1)}$, and $\widehat P_{t|T}\equiv P_{t|T}^{(k+1)}$ derived from the Kalman filter and smoother and obtained using the EM estimators $\widehat\lambda_i\equiv\widehat\lambda_i^{(k+1)}$, $\widehat\Lambda_n\equiv\widehat\Lambda_n^{(k+1)}$, $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, and $\widehat\Sigma_n^{\xi}\equiv\widehat\Sigma_n^{\xi(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) $\max_{t=1,\dots,T}\|\widehat P_{t|t-1}\|=O_p(1)$;
(ii) $\max_{t=1,\dots,T}\|(\widehat P_{t|t-1})^{-1}\|=O_p(1)$;
(iii) $\max_{t=1,\dots,T}n\|\widehat P_{t|t}\|=O_p(1)$;
(iv) $\max_{t=1,\dots,T}n\|\widehat P_{t|T}\|=O_p(1)$.

Proof. The proof is the same as the proof of Lemma E.16 but using Proposition 2(a) (which does not require this lemma to be proved) and Lemmas F.1 and F.2 instead of Lemmas E.13, E.14, and E.15. □

Lemma F.4. Consider the Kalman filter and smoother estimators $\widehat F_{t|t-1}\equiv F_{t|t-1}^{(k+1)}$, $\widehat F_{t|t}\equiv F_{t|t}^{(k+1)}$, and $\widehat F_{t|T}\equiv F_{t|T}^{(k+1)}$ obtained using the EM estimators $\widehat\lambda_i\equiv\widehat\lambda_i^{(k+1)}$, $\widehat\Lambda_n\equiv\widehat\Lambda_n^{(k+1)}$, $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, and $\widehat\Sigma_n^{\xi}\equiv\widehat\Sigma_n^{\xi(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) for all $s=0,\dots,T$, $\|\widehat F_{t|s}\|=O_p(1)$, uniformly in $t\le s$;
(ii) $n\|\widehat F_{t|T}-\widehat F_{t|t}\|=O_p(1)$, uniformly in $t$;
(iii) $n\|\widehat F_{t|t}-\widehat F_t^{WLS}\|=O_p(1)$, uniformly in $t$;
where $\widehat F_t^{WLS}=(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}x_{nt}$.

Proof. The proof is the same as the proof of Lemma E.17 but using Proposition 2(a) (which does not require this lemma to be proved) and Lemmas F.1, F.2, and F.3 instead of Lemmas E.13, E.14, E.15, and D.12. □

Lemma F.5. Consider the Kalman filter and smoother estimators $\widehat F_{t|t-1}\equiv F_{t|t-1}^{(k+1)}$, $\widehat F_{t|t}\equiv F_{t|t}^{(k+1)}$, and $\widehat F_{t|T}\equiv F_{t|T}^{(k+1)}$ obtained using the EM estimators $\widehat\lambda_i\equiv\widehat\lambda_i^{(k+1)}$, $\widehat\Lambda_n\equiv\widehat\Lambda_n^{(k+1)}$, $\widehat\sigma_i^{2}\equiv\widehat\sigma_i^{2(k+1)}$, and $\widehat\Sigma_n^{\xi}\equiv\widehat\Sigma_n^{\xi(k+1)}$, for any $k\ge 0$. Under Assumptions 1, 2, 3, 4, 5, and 6, as $n,T\to\infty$:
(i) for all $s=0,\dots,T$, $\log^{-1/\delta_v}T\,\max_{t=1,\dots,s}\|\widehat F_{t|s}\|=O_p(1)$;
(ii) $\log^{-1/\delta_v}T\,\max_{t=1,\dots,T}n\|\widehat F_{t|T}-\widehat F_{t|t}\|=O_p(1)$;
(iii) $\log^{-1/\delta_v}T\,\max_{t=1,\dots,T}n\|\widehat F_{t|t}-\widehat F_t^{WLS}\|=O_p(1)$;

where $\widehat F_t^{WLS}=(\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}\widehat\Lambda_n)^{-1}\widehat\Lambda_n'(\widehat\Sigma_n^{\xi})^{-1}x_{nt}$.

Proof. From (D.44) in the proof of Lemma D.14, but when computed using the EM estimator of the parameters, we see that the only modification is that we need to use $\max_{t=1,\dots,T}n^{-1/2}\|x_t\|=O_p()$ from Lemma E.3. This proves part (i).

Part (ii) follows from part (i) and (D.45) in the proof of Lemma D.15, but when computed using the EM estimator of the parameters. Part (iii) follows from (D.50) in the proof of Lemma D.16, but when computed using the EM estimator of the parameters, and using Lemma E.3 again and the fact that $\widehat F_{t-1|t-1}$ is a weighted average of $x_{n1},\dots,x_{n,t-1}$. This completes the proof. □

G Lemmas necessary for proving Proposition 5

Lemma G.1. Under Assumptions 1, 2, 3, 5, and 6, as $n,T\to\infty$,
\[
\min(n,\ \sqrt{T}\log^{-1/2}n)\,\max_{i=1,\dots,n}|\widehat\sigma_i^{2(0)}-\sigma_i^{2}| = O_p(1).
\]

Proof. From (D.6) in the proof of Lemma D.4, we have
\[
\begin{aligned}
\max_{i=1,\dots,n}|\widehat\sigma_i^{2(0)}-\sigma_i^{2}| &\le \max_{i=1,\dots,n}\left|T^{-1}\sum_{t=1}^{T}x_{it}^{2}-\mathrm{E}[x_{it}^{2}]\right|+\max_{i=1,\dots,n}\left|\widehat\lambda_i^{(0)\prime}\widehat\lambda_i^{(0)}-\lambda_i'\lambda_i\right| \\
&\quad+ 2\max_{i=1,\dots,n}\|\lambda_i\|\,\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\| \\
&\quad+ 2\max_{i=1,\dots,n}\|\lambda_i\|\left\|T^{-1}\sum_{t=1}^{T}F_tF_t'-I_r\right\|\left\{\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\|+\max_{i=1,\dots,n}\|\lambda_i\|\right\} \\
&\quad+ 2\max_{i=1,\dots,n}\|\lambda_i\|\left\|T^{-1}\sum_{t=1}^{T}F_t(\widetilde F_t-F_t)'\right\|\left\{\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\|+\max_{i=1,\dots,n}\|\lambda_i\|\right\} \\
&\quad+ 2\max_{i=1,\dots,n}\left\|T^{-1}\sum_{t=1}^{T}\xi_{it}F_t'\right\|\max_{i=1,\dots,n}\|\lambda_i\| \\
&\quad+ 2\max_{i=1,\dots,n}\left\|T^{-1}\sum_{t=1}^{T}\xi_{it}F_t'\right\|\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\| \\
&\quad+ 2\max_{i=1,\dots,n}\left\|T^{-1}\sum_{t=1}^{T}\xi_{it}(\widetilde F_t-F_t)'\right\|\left\{\max_{i=1,\dots,n}\|\widehat\lambda_i^{(0)}-\lambda_i\|+\max_{i=1,\dots,n}\|\lambda_i\|\right\} \\
&= O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n})),
\end{aligned}
\]
by Assumption 1(a), Lemma E.1(ii) and the union bound, and (E.71) and (E.79) in the proof of Lemma E.14. This completes the proof. □

Lemma G.2. Under Assumptions 1, 2, 3, 5, and 6, as $n,T\to\infty$, for any $k\ge 0$,
(i) $\min(n\log^{-2/\delta_v}T,\ \sqrt{nT}\log^{-1/2}n,\ T\log^{-1/2}n)\,\|T^{-1}\sum_{t=1}^{T}(F_{t|T}^{(k)}-F_t)F_t'\|=O_p(1)$;
(ii) $\min(n\log^{-2/\delta_v}T,\ \sqrt{nT}\log^{-1/2}n,\ T\log^{-1/2}n)\,\|T^{-1}\sum_{t=1}^{T}(F_{t|T}^{(k)}-F_t)\xi_{it}\|=O_p(1)$, uniformly in $i$.

Proof. Throughout, let $y_t=F_t$ or $y_t=\xi_{it}$. Then, for any $k\ge 0$ we have to consider
\[
\begin{aligned}
\left\|T^{-1}\sum_{t=1}^{T}(F_{t|T}^{(k)}-F_t)y_t'\right\| &\le \left\|T^{-1}\sum_{t=1}^{T}(F_{t|T}^{(k)}-F_{t|t}^{(k)})y_t'\right\|+\left\|T^{-1}\sum_{t=1}^{T}(F_{t|t}^{(k)}-\widehat F_t^{WLS(k)})y_t'\right\| \\
&\quad+ \left\|T^{-1}\sum_{t=1}^{T}(\widehat F_t^{WLS(k)}-F_t)y_t'\right\| \\
&= I+II+III,\ \text{say}.
\end{aligned}\tag{G.1}
\]

where $\widehat F_t^{WLS(k)}=(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}x_{nt}$. Let us consider each term in (G.1). First, when $k=0$ we have $I=O_p(n^{-1})$ by (B.5) in the proof of Proposition 1 (which does not require this lemma to be proved), while, when $k\ge 1$
\[
I\le\max_{t=1,\dots,T}\|F_{t|T}^{(k)}-F_{t|t}^{(k)}\|\left\|T^{-1}\sum_{t=1}^{T}y_t\right\| = O_p(n^{-1}\log^{1/\delta_v}T), \tag{G.2}
\]
by Lemma F.5(ii) and since
\[
\mathrm{E}\left[\left\|T^{-1}\sum_{t=1}^{T}F_t\right\|^{2}\right]\le rT^{-2}\max_{j=1,\dots,r}\sum_{t,s=1}^{T}|\mathrm{E}[F_{jt}F_{js}]|\le 1, \tag{G.3}
\]
by Cauchy-Schwarz inequality and Assumption 6(b), and also
\[
\mathrm{E}\left[\left|T^{-1}\sum_{t=1}^{T}\xi_{it}\right|^{2}\right]\le T^{-2}\sum_{t,s=1}^{T}|\mathrm{E}[\xi_{it}\xi_{is}]|\le T^{-1}M_3, \tag{G.4}
\]
by Lemma C.1(iii). Second, when $k=0$ we have $II=O_p(n^{-1})$ by (B.6) in the proof of Proposition 1 (which does not require this lemma to be proved), while, when $k\ge 1$
\[
II\le\max_{t=1,\dots,T}\|F_{t|t}^{(k)}-\widehat F_t^{WLS(k)}\|\left\|T^{-1}\sum_{t=1}^{T}y_t\right\| = O_p(n^{-1}\log^{1/\delta_v}T), \tag{G.5}
\]
by Lemma F.5(iii), (G.3), and (G.4). Third,
\[
\begin{aligned}
III &\le \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|\,\|\Lambda_n'(\Sigma_n^{\xi})^{-1}(\Lambda_n-\widehat\Lambda_n^{(k)})\|\left\|T^{-1}\sum_{t=1}^{T}F_ty_t'\right\| \\
&\quad+ \left\|n(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}n^{-1/2}\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}-n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}n^{-1/2}\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\| \\
&\qquad\cdot n^{-1/2}\|\Lambda_n-\widehat\Lambda_n^{(k)}\|\left\|T^{-1}\sum_{t=1}^{T}F_ty_t'\right\| \\
&\quad+ n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}y_t'\right\| \\
&\quad+ n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\{\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\}\xi_{nt}y_t'\right\| \\
&= III_a+III_b+III_c+III_d,\ \text{say}.
\end{aligned}\tag{G.6}
\]
Then, when $k=0$, $III_a=O_p(n^{-1/2}T^{-1/2})$, $III_b=O_p(\max(n^{-2},T^{-1}))$, and $III_c=O_p(n^{-1/2}T^{-1/2})$, because of (B.9), (B.10), (B.11), and (B.12) in the proof of Proposition 1 (which does not require this lemma to be proved). While, if $k\ge 1$,
\[
III_a = O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})), \tag{G.7}
\]
by (B.40), (B.41), and (B.42) in the proof of Proposition 3 (which does not require this lemma to be proved), and since $\|T^{-1}\sum_{t=1}^{T}F_tF_t'\|=O_p(1)$ by Lemma C.12(i) and Assumption 6(b), and $\|T^{-1}\sum_{t=1}^{T}F_t\xi_{it}\|=O_p(T^{-1/2})$ by Lemma C.12(ii). These last arguments, and (B.43) in the proof of Proposition 3 (which does not require this lemma to be proved) imply also that, if $k\ge 1$,
\[
III_b = O_p(\max(n^{-2}\log^{4/\delta_v}T,\ T^{-1}\sqrt{\log n},\ n^{-1}T^{-1/2}\log^{2/\delta_v}T\sqrt{\log n})). \tag{G.8}
\]

Moreover, if $y_t=F_t$ and $k\ge 1$, then
\[
III_c = n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}F_t'\right\| = O_p(n^{-1/2}T^{-1/2}), \tag{G.9}
\]
by Lemmas F.2(iii) and C.8(iv). While if $y_t=\xi_{it}$ and $k\ge 1$, then
\[
III_c = n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\Lambda_n'(\Sigma_n^{\xi})^{-1}\xi_{nt}\xi_{it}\right\| = O_p(n^{-1/2}T^{-1/2}), \tag{G.10}
\]
by Lemmas F.2(iii) and C.8(v). Similarly, if $y_t=F_t$, we have:
\[
\begin{aligned}
III_d &= n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\{\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\}\xi_{nt}F_t'\right\| \\
&\le n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1/2}\|\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\left\|T^{-1}\sum_{t=1}^{T}\xi_{nt}F_t'\right\| \\
&= \begin{cases} O_p(\max(n^{-1}T^{-1/2},\ T^{-1})), & \text{if } k=0,\\ O_p(\max(n^{-1}T^{-1/2}\log^{2/\delta_v}T,\ T^{-1}\sqrt{\log n})), & \text{if } k\ge 1, \end{cases}
\end{aligned}\tag{G.11}
\]
where, if $k=0$, we used Lemmas D.5(ii), D.5(iii), and C.12(iii), while, if $k\ge 1$, we used Lemmas F.2(ii), F.2(iii), and C.12(iii). Finally, if $y_t=\xi_{it}$, we have:
\[
\begin{aligned}
III_d &= n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\{\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}-\Lambda_n'(\Sigma_n^{\xi})^{-1}\}\xi_{nt}\xi_{it}\right\| \\
&\le n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\left\{n^{-1}\left\|T^{-1}\sum_{t=1}^{T}(\widehat\Lambda_n^{(k)}-\Lambda_n)'(\Sigma_n^{\xi})^{-1}\xi_{nt}\xi_{it}\right\|\right. \\
&\qquad+ n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\Lambda_n'\{(\widehat\Sigma_n^{\xi(k)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\xi_{nt}\xi_{it}\right\| \\
&\qquad\left.+ n^{-1}\left\|T^{-1}\sum_{t=1}^{T}(\widehat\Lambda_n^{(k)}-\Lambda_n)'\{(\widehat\Sigma_n^{\xi(k)})^{-1}-(\Sigma_n^{\xi})^{-1}\}\xi_{nt}\xi_{it}\right\|\right\} \\
&= n\|(\widehat\Lambda_n^{(k)\prime}(\widehat\Sigma_n^{\xi(k)})^{-1}\widehat\Lambda_n^{(k)})^{-1}\|\,\{III_{d.1}+III_{d.2}+III_{d.3}\},\ \text{say}.
\end{aligned}\tag{G.12}
\]
Now,
\[
\begin{aligned}
III_{d.1} &\le n^{-1}\left\|T^{-1}\sum_{t=1}^{T}(\Lambda_n^{OLS}-\Lambda_n)'(\Sigma_n^{\xi})^{-1}\xi_{nt}\xi_{it}\right\|+n^{-1/2}\|\widehat\Lambda_n^{(k)}-\Lambda_n^{OLS}\|\,\|(\Sigma_n^{\xi})^{-1}\|\,n^{-1/2}\left\|T^{-1}\sum_{t=1}^{T}\xi_{nt}\xi_{it}\right\| \\
&= III_{d.1.1}+III_{d.1.2},\ \text{say}.
\end{aligned}\tag{G.13}
\]

And,
\[
\begin{aligned}
III_{d.1.1} &= n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\left(T^{-1}\sum_{s=1}^{T}F_sF_s'\right)^{-1}\left(T^{-1}\sum_{s=1}^{T}F_s\xi_{ns}'\right)(\Sigma_n^{\xi})^{-1}\xi_{nt}\xi_{it}\right\| \\
&\le \left\|\left(T^{-1}\sum_{s=1}^{T}F_sF_s'\right)^{-1}\right\|\,n^{-1}\left\|T^{-2}\sum_{s,t=1}^{T}F_s\xi_{ns}'(\Sigma_n^{\xi})^{-1}\xi_{nt}\xi_{it}\right\| \\
&\le \left\|\left(T^{-1}\sum_{s=1}^{T}F_sF_s'\right)^{-1}\right\|\,n^{-1}\left\|T^{-2}\sum_{s,t=1}^{T}F_s\xi_{ns}'(\Sigma_n^{\xi})^{-1}\xi_{nt}\right\|\max_{t=1,\dots,T}|\xi_{it}| \\
&= O_p(n^{-1/2}T^{-3/2}\log^{1/\delta_v}T),
\end{aligned}\tag{G.14}
\]
by (E.9) in the proof of Lemma E.3, and Lemmas C.8(vi) and C.13. Moreover,
\[
III_{d.1.2} = \begin{cases} O_p(\max(n^{-1},\ n^{-1/2}T^{-1/2},\ T^{-1})), & \text{if } k=0,\\ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})), & \text{if } k\ge 1, \end{cases}\tag{G.15}
\]
where we used Assumption 2(a), the last relation in (B.7) in the proof of Proposition 1 (which does not require this lemma to be proved), and, if $k=0$, we used also Barigozzi (2023, Corollary 1 and Proposition B.3), or, if $k\ge 1$, we used (B.39) in the proof of Proposition 3 (which does not require this lemma to be proved). By using (G.14) and (G.15) into (G.13), we have
\[
III_{d.1} = O_p(n^{-1/2}T^{-3/2}\log^{1/\delta_v}T)+\begin{cases} O_p(\max(n^{-1},\ n^{-1/2}T^{-1/2},\ T^{-1})), & \text{if } k=0,\\ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})), & \text{if } k\ge 1. \end{cases}\tag{G.16}
\]
Furthermore,
\[
\begin{aligned}
III_{d.2} &= n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\sum_{j=1}^{n}\lambda_j\{(\widehat\sigma_j^{2(k)})^{-1}-(\sigma_j^{2})^{-1}\}\xi_{jt}\xi_{it}\right\| \\
&\le C_\xi^{2}\max_{j=1,\dots,n}|\widehat\sigma_j^{2(k)}-\sigma_j^{2}|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\sum_{j=1}^{n}\lambda_j\xi_{jt}\xi_{it}\right\| \\
&\le C_\xi^{2}\max_{j=1,\dots,n}|\widehat\sigma_j^{2(k)}-\sigma_j^{2}|\,n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}\xi_{it}\right\| \\
&= O_p(n^{-1/2}T^{-1/2})\cdot\begin{cases} O_p(\max(n^{-1},\ T^{-1/2}\sqrt{\log n})), & \text{if } k=0,\\ O_p(\max(n^{-1}\log^{2/\delta_v}T,\ T^{-1/2}\sqrt{\log n})), & \text{if } k\ge 1, \end{cases}
\end{aligned}\tag{G.17}
\]
where, if $k=0$, we used Lemmas G.1 and C.8(ii), while, if $k\ge 1$, we used Lemmas F.1(ii) and C.8(ii). Term $III_{d.3}$ is dominated by $III_{d.1}$.

By using Lemmas F.2(iii) or D.5(iii), together with (G.16) and (G.17) into (G.12), we have that if $y_t=\xi_{it}$, then
\[
\begin{aligned}
III_d &= O_p(n^{-1/2}T^{-3/2}\log^{1/\delta_v}T)+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&\quad+ O_p(\max(n^{-3/2}T^{-1/2}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1}\sqrt{\log n})).
\end{aligned}\tag{G.18}
\]
By substituting (G.7), (G.8), (G.9), (G.10), (G.11), and (G.18) into (G.6), if $y_t=F_t$, we have
\[
\begin{aligned}
III &= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&\quad+ O_p(\max(n^{-2}\log^{4/\delta_v}T,\ T^{-1}\sqrt{\log n},\ n^{-1}T^{-1/2}\log^{2/\delta_v}T\sqrt{\log n}))+O_p(n^{-1/2}T^{-1/2}) \\
&\quad+ O_p(\max(n^{-1}T^{-1/2}\log^{2/\delta_v}T,\ T^{-1}\sqrt{\log n})),
\end{aligned}\tag{G.19}
\]

while, if $y_t=\xi_{it}$, we have
\[
\begin{aligned}
III &= O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&\quad+ O_p(\max(n^{-2}\log^{4/\delta_v}T,\ T^{-1}\sqrt{\log n},\ n^{-1}T^{-1/2}\log^{2/\delta_v}T\sqrt{\log n}))+O_p(n^{-1/2}T^{-1/2}) \\
&\quad+ O_p(n^{-1/2}T^{-3/2}\log^{1/\delta_v}T)+O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1})) \\
&\quad+ O_p(\max(n^{-3/2}T^{-1/2}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1}\sqrt{\log n})).
\end{aligned}\tag{G.20}
\]
To conclude, by substituting (G.2), (G.5), and either (G.19) or (G.20) into (G.1), we have
\[
\left\|T^{-1}\sum_{t=1}^{T}(F_{t|T}^{(k)}-F_t)F_t'\right\| = O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n})),
\]
which proves part (i), and
\[
\left\|T^{-1}\sum_{t=1}^{T}(F_{t|T}^{(k)}-F_t)\xi_{it}\right\| = O_p(\max(n^{-1}\log^{2/\delta_v}T,\ n^{-1/2}T^{-1/2}\sqrt{\log n},\ T^{-1}\sqrt{\log n})),
\]
which proves part (ii) and completes the proof. □

H Lemmas necessary for proving Proposition 6

Lemma H.1. Consider the initial estimator of the factors $\widetilde F_t$ defined in Section A.1, then, under Assumptions 1, 2, 3, and 6, as $n,T\to\infty$, if $\sqrt{n}/T\to 0$,
\[
\sqrt{n}(\widetilde F_t-F_t)\to_d N(0_r,\ W_t^{PC}),
\]
for any given $t=1,\dots,T$, where
\[
W_t^{PC} = (\Sigma_\Lambda)^{-1}\left(\lim_{n\to\infty}n^{-1}\sum_{i,j=1}^{n}\mathrm{E}[\xi_{it}\xi_{jt}]\lambda_i\lambda_j'\right)(\Sigma_\Lambda)^{-1}
\]
with $\Sigma_\Lambda=\lim_{n\to\infty}n^{-1}\sum_{i=1}^{n}\lambda_i\lambda_i'$.

Proof. By definition of the pre-estimator in Section A.1:
\[
\begin{aligned}
\widetilde F_t-F_t &= (\widehat\Lambda_n^{(0)\prime}\widehat\Lambda_n^{(0)})^{-1}\widehat\Lambda_n^{(0)\prime}x_{nt}-F_t \\
&= n(M_n^{\chi})^{-1}\left\{n^{-1}\widehat\Lambda_n^{(0)\prime}(\Lambda_n-\widehat\Lambda_n^{(0)})F_t+n^{-1}(\widehat\Lambda_n^{(0)}-\Lambda_n)'\xi_{nt}+n^{-1}\Lambda_n'\xi_{nt}\right\} \\
&\quad+ n\left\{(\widehat M_n^{x})^{-1}-(M_n^{\chi})^{-1}\right\}\left\{n^{-1}\widehat\Lambda_n^{(0)\prime}(\Lambda_n-\widehat\Lambda_n^{(0)})F_t+n^{-1}(\widehat\Lambda_n^{(0)}-\Lambda_n)'\xi_{nt}+n^{-1}\Lambda_n'\xi_{nt}\right\}.
\end{aligned}\tag{H.1}
\]
Then,
\[
n\|(M_n^{\chi})^{-1}\| = n\{\mu_{nr}^{\chi}\}^{-1}\ge C_r, \tag{H.2}
\]
by Lemma C.1(iv). Moreover, by Lemmas C.1(v), C.11 and C.12(vii), and Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality, for all $j=1,\dots,r$,
\[
n^{-1}|\widehat\mu_j^{x}-\mu_j^{\chi}|\le n^{-1}\left\|T^{-1}\sum_{t=1}^{T}x_tx_t'-\Gamma^{x}\right\|+n^{-1}\|\Gamma^{\xi}\| = O_p(\max(n^{-1},\ T^{-1/2})), \tag{H.3}
\]
which, jointly with (H.2), implies
\[
\det(n^{-1}\widehat M_n^{x}) = \prod_{j=1}^{r}n^{-1}\widehat\mu_j^{x}\ge\{n^{-1}\widehat\mu_r^{x}\}^{r}\ge\{n^{-1}\mu_r^{\chi}-n^{-1}|\widehat\mu_r^{x}-\mu_r^{\chi}|\}^{r}>0,
\]

thus,
\[
n\|(\widehat M_n^{x})^{-1}\| = O_p(1). \tag{H.4}
\]
From (H.2), (H.3), and (H.4)
\[
n\left\|(\widehat M_n^{x})^{-1}-(M_n^{\chi})^{-1}\right\|\le n\|(M_n^{\chi})^{-1}\|\,n^{-1}\left\|\widehat M_n^{x}-M_n^{\chi}\right\|\,n\|(\widehat M_n^{x})^{-1}\| = O_p(\max(n^{-1},\ T^{-1/2})). \tag{H.5}
\]
Furthermore,
\[
\begin{aligned}
n^{-1}\|\widehat\Lambda_n^{(0)\prime}(\Lambda_n-\widehat\Lambda_n^{(0)})F_t\| &\le n^{-1}\|\Lambda_n'(\Lambda_n-\widehat\Lambda_n^{(0)})\|\,\|F_t\|+n^{-1}\|\Lambda_n-\widehat\Lambda_n^{(0)}\|^{2}\|F_t\| \\
&\le n^{-1}\|\Lambda_n'(\Lambda_n-\widehat\Lambda_n^{OLS})\|\,\|F_t\|+n^{-1/2}\|\Lambda_n\|\,n^{-1/2}\|\widehat\Lambda_n^{(0)}-\Lambda_n^{OLS}\|\,\|F_t\| \\
&\quad+ n^{-1}\|\Lambda_n-\widehat\Lambda_n^{(0)}\|^{2}\|F_t\| \\
&\le n^{-1}\left\|T^{-1}\sum_{t=1}^{T}\Lambda_n'\xi_{nt}F_t'\right\|\left\|\left(T^{-1}\sum_{t=1}^{T}F_tF_t'\right)^{-1}\right\|\|F_t\| \\
&\quad+ n^{-1/2}\|\Lambda_n\|\,n^{-1/2}\|\widehat\Lambda_n^{(0)}-\Lambda_n^{OLS}\|\,\|F_t\|+n^{-1}\|\Lambda_n-\widehat\Lambda_n^{(0)}\|^{2}\|F_t\| \\
&= O_p(n^{-1/2}T^{-1/2})+O_p(\max(n^{-1},\ n^{-1/2}T^{-1/2}))+O_p(\max(n^{-2},\ T^{-1})) \\
&= O_p(\max(n^{-1},\ n^{-1/2}T^{-1/2},\ T^{-1})),
\end{aligned}\tag{H.6}
\]
by Barigozzi (2023, Corollary 1), Lemmas C.2, C.8(i), C.13, and D.1(b), and since $\|F_t\|=O_p(1)$ because $\mathrm{E}[F_{jt}^{2}]=1$, $j=1,\dots,r$, by Assumption 6(b). Last,
\[
\begin{aligned}
n^{-1}\|(\widehat\Lambda_n^{(0)}-\Lambda_n)'\xi_{nt}\| &\le n^{-1}\|(\Lambda_n^{OLS}-\Lambda_n)'\xi_{nt}\|+n^{-1/2}\|\widehat\Lambda_n^{(0)}-\Lambda_n^{OLS}\|\,n^{-1/2}\|\xi_{nt}\| \\
&\le n^{-1}\left\|T^{-1}\sum_{t=1}^{T}F_t\xi_{nt}'\xi_{nt}\right\|\left\|\left(T^{-1}\sum_{t=1}^{T}F_tF_t'\right)^{-1}\right\|+n^{-1/2}\|\widehat\Lambda_n^{(0)}-\Lambda_n^{OLS}\|\,n^{-1/2}\|\xi_{nt}\| \\
&= O_p(n^{-1/2}T^{-1/2})+O_p(\max(n^{-1},\ n^{-1/2}T^{-1/2})) \\
&= O_p(\max(n^{-1},\ n^{-1/2}T^{-1/2})),
\end{aligned}\tag{H.7}
\]
by Barigozzi (2023, Corollary 1), Lemmas C.8(vi) (setting $\Sigma_n^{\xi}=I_n$ and using $\xi_{nt}$ in place of $E_{nT}$ therein), C.13, and D.1(b), and since $n^{-1/2}\|\xi_{nt}\|=O_p(1)$ because $\sum_{i=1}^{n}\sigma_i^{2}\le nC_\xi$ by Assumption 2(a).

By substituting (H.5), (H.6), and (H.7) into (H.1), it follows that if $\sqrt{n}/T\to 0$, as $n,T\to\infty$,
\[
\sqrt{n}(\widetilde F_t-F_t) = n(M_n^{\chi})^{-1}(n^{-1/2}\Lambda_n'\xi_{nt})+o_p(1), \tag{H.8}
\]
and since $\lim_{n\to\infty}n^{-1}M_n^{\chi}=\Sigma_\Lambda$ by Assumption 6(b), which is positive definite, by Assumption 2(e) (when setting $\sigma_i^{2}=1$ therein) and Slutsky's Theorem we complete the proof. □

I Derivation of the Kalman filter MSE

Lemma I.1. Under Assumption 1, the DFM (3)-(4) is both stabilizable and detectable, for all $n\in\mathbb{N}$.

Proof. We use the definitions in Anderson and Moore (1979, Appendix C, p. 341-342). The DFM (3)-(4) is a linear system with $r$ states. A linear system is stabilizable if its unstable states are controllable and all uncontrollable states are stable, and it is detectable if its unstable states are observable and all unobservable states are stable.

First, by factorizing $\Gamma^{v}=HH'$ for some $H$ having full-column rank, we see that $\mathrm{rk}[H\ (AH)\cdots(A^{r-1}H)]=r$, thus the linear system is controllable, by Assumption 1(e). Moreover, there are no unstable states, since because of Assumption 1(d) all eigenvalues of $A$ are smaller than one in absolute value. This implies that the model is stabilizable.

Second, since by Assumption 1(a) for any given $n\in\mathbb{N}$ there exists at least one $i=1,\dots,n$ such that $\|\lambda_i\|\ge m_\lambda$, then

$\mathrm{rk}(\Lambda_n)\ge 1$, while, $\mathrm{rk}(\Lambda_n)=r$ only for $n>N$, then, for a given $n$, there might be $(r-1)$ unobservable states, however, as already noticed, they are all stable by Assumption 1(d). Thus the model is detectable. This completes the proof. □

Lemma I.2. Under Assumptions 1 and 2, the matrix $P_{t|t-1}$ has a steady-state denoted as $P=\lim_{t\to\infty}P_{t|t-1}$.

Proof. First, given that with our initialization $P_{0|0}=\Gamma^{F}$, then it is positive definite by Assumption 1(b), therefore also $P_{1|0}$ is positive definite (see also (A.2)). Second, as proved in Lemma I.1, the linear system defining the DFM (3)-(4) is stabilizable and detectable. Therefore, because of Chan et al. (1984, Theorem 4.1), as $t\to\infty$, $P_{t|t-1}$ converges to a steady-state $P$ exponentially fast (see Lemma I.4 below for the rate of convergence), which is a solution of the algebraic Riccati equation (ARE) derived from (A.5)
\[
P = APA'-AP\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_nPA'+\Gamma^{v}.
\]
Moreover, since Lemmas D.7(i) and D.7(ii) hold for all $T\in\mathbb{N}$, we have
\[
\|P\|=O(1),\qquad \|P^{-1}\|=O(1). \tag{I.1}
\]
This completes the proof. □

Lemma I.3. Under Assumptions 1 and 2, the matrix $\Pi_{t|t-1}$ has a steady-state denoted as $\Pi=\lim_{t\to\infty}\Pi_{t|t-1}$.

Proof. The existence of $\Pi$ follows from the same arguments used in Lemma I.2. Moreover, from Harvey and Delle Monache (2009, Section 2.2) we have that $\Pi$ must satisfy
\[
\begin{aligned}
\Pi &= A\Pi A'+AP\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}(\Lambda_n\Pi\Lambda_n'+\Gamma_n^{\xi})(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_nPA' \\
&\quad- AP\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n\Pi A'-A\Pi\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_nPA'+\Gamma^{v}.
\end{aligned}
\]
Moreover, by the same arguments in Lemmas D.7(i) and D.7(ii) it holds that $\max_{t=1,\dots,T}\|\Pi_{t|t-1}\|=O(1)$ and also $\max_{t=1,\dots,T}\|(\Pi_{t|t-1})^{-1}\|=O(1)$, for all $T\in\mathbb{N}$, thus
\[
\|\Pi\|=O(1),\qquad \|\Pi^{-1}\|=O(1). \tag{I.2}
\]
This completes the proof. □

Lemma I.4. Under Assumptions 1, 2, and 6, if $\|P_{0|0}\|=O(n^{\gamma})$ for some $\gamma>0$, then $n\max_{t=\bar t,\dots,T}\|P_{t|t-1}-P\|=o(1)$ and $n\max_{t=\bar t,\dots,T}\|\Pi_{t|t-1}-\Pi\|=o(1)$, where $\bar t=\lceil 2+\gamma/2\rceil$.

Proof. Let $\Psi_{1,1}=I_r$ and, for $t=2,\dots,T$,
\[
\Psi_{t,1} = \prod_{s=1}^{t-1}\left[A-AP_{s|s-1}\Lambda_n'(\Lambda_nP_{s|s-1}\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n\right]. \tag{I.3}
\]
Then, from Anderson and Moore (1979, Chapter 4.4, pp. 76-81), we have, for $t=1,\dots,T$,
\[
\begin{aligned}
P_{t|t-1}-P &= \{A-AP\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n\}^{t-1}(P_{1|0}-P)\Psi_{t,1} \\
&= A^{t-1}\{I_r-P\Lambda_n'(\Lambda_nP\Lambda_n'+\Sigma_n^{\xi})^{-1}\Lambda_n\}^{t-1}(P_{1|0}-P)\Psi_{t,1} \\
&= A^{t-1}\{I_r-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\}^{t-1}(P_{1|0}-P)\Psi_{t,1},
\end{aligned}\tag{I.4}
\]
where we used Lemma D.13. Now, for $t=2,\dots,T$, from (I.3) and using again Lemma D.13, there exists a $\bar n$ such that for all $n\ge\bar n$
\[
\|\Psi_{t,1}\|\le\|A\|^{t-1}\prod_{s=1}^{t-1}\|I_r-(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n+P_{s|s-1}^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\|\le M_0n^{-(t-1)}, \tag{I.5}
\]
for some finite positive real $M_0$ independent of $n$ and $t$, because of Lemmas C.6(i) and D.7(ii), and Assumption 1(d). Moreover,
\[
\|P_{1|0}-P\| = \|AP_{0|0}-P\|\le\|A\|\|P_{0|0}\|+\|A\|\|P\|\le 2\|A\|\|P_{0|0}\|, \tag{I.6}
\]

since, because of Lemma D.6, $\|P\| \le \|P_{t|t-1}\| \le \|P_{1|0}\| \le \|P_{0|0}\|$.

Therefore, for $t = 2, \ldots, T$, from (I.4), (I.5), and (I.6), there exists an $\bar{n}$ such that for all $n \ge \bar{n}$

$$\begin{aligned}
\|P_{t|t-1} - P\| &\le 2\|A\|^{t}\,\left\|I_r - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n\right\|^{t-1}\|P_{0|0}\|\,\|\Psi_{t,1}\| \\
&\le M_1\, n^{-2(t-1)}\|P_{0|0}\|,
\end{aligned} \tag{I.7}$$

for some finite positive real $M_1$ independent of $n$ and $t$, because of Lemmas C.6(i) and D.7(ii), and Assumption 1(d).

Now, define $\bar{t}$ as the first point in time such that the rhs of (I.7) is $o(n^{-1})$, i.e., such that $n^{-2(\bar{t}-1)} n^{1+\epsilon} \le M_2 \|P_{0|0}\|^{-1}$, for any $\epsilon > 0$ and some finite positive real $M_2$ independent of $n$ and $t$, or equivalently, by letting $K = \log M_2$, $\bar{t}$ is such that:

$$\bar{t} \ge \frac{3+\epsilon}{2} + \frac{\log\|P_{0|0}\|}{2\log n} - \frac{K}{2\log n}. \tag{I.8}$$

Clearly (I.8) is always satisfied if we set $\bar{t} = \lceil 2 + \log\|P_{0|0}\|/(2\log n) \rceil$. Letting now $\|P_{0|0}\| = O(n^{\gamma})$ for some $\gamma > 0$, then we have that $\bar{t} = \lceil 2 + \gamma/2 \rceil$ satisfies (I.8) and therefore we have at least $\max_{t=\bar{t},\ldots,T} \|P_{t|t-1} - P\| = O(n^{-1})$. Notice that if $\gamma = 0$, i.e., $\|P_{0|0}\|$ is finite, then $\bar{t} = 2$ and in this case from (I.7) we have an even tighter bound as we get $\max_{t=2,\ldots,T} \|P_{t|t-1} - P\| = O(n^{-2})$.

The proof for $\Pi_{t|t-1}$ can be done analogously but using the recursions in Harvey and Delle Monache (2009). This completes the proof. □

Lemma I.5. Under Assumptions 1, 2, and 6,
(i) $n \max_{t=\bar{t},\ldots,T} \|P_t\| = O(1)$;
(ii) $n \max_{t=1,\ldots,T} \|\Pi_{t|t}\| = O(1)$;
(iii) $n \max_{t=\bar{t},\ldots,T} \|\Pi_{t|t} - P_t\| = o(1)$;
(iv) $\max_{t=\bar{t},\ldots,T} \|nP_t - W_t\| = o(1)$;
where $\Pi_{t|t}$ is defined in (22), $P_t$ is obtained from $\Pi_{t|t}$ when replacing $P_{t|t-1}$ and $\Pi_{t|t-1}$ with their steady states defined in Lemmas I.2 and I.3, respectively, and $W_t$ is defined in Proposition 3.

Proof. We have

$$\begin{aligned}
P_t ={}& \Pi + P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n \Pi\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P \\
&- P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n \Pi - \Pi\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P \\
&+ P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P.
\end{aligned} \tag{I.9}$$

Hereafter, for simplicity of notation let $H = (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}$. Using twice Lemma C.4 we have

$$\begin{aligned}
P(H+P)^{-1} &= P\left\{P^{-1} - (H+P)^{-1}HP^{-1}\right\} = I_r - P\left\{P^{-1} - (H+P)^{-1}HP^{-1}\right\}HP^{-1} \\
&= I_r - HP^{-1} + P(H+P)^{-1}HP^{-1}HP^{-1} \\
&= I_r - HP^{-1} + C, \text{ say.}
\end{aligned} \tag{I.10}$$

Then, by Lemma D.10 and (I.10)

$$\Pi - P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n \Pi = \left\{I_r - P(H+P)^{-1}\right\}\Pi = HP^{-1}\Pi - C\Pi. \tag{I.11}$$
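As a numerical sanity check of (I.10) and of the identity $P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n = P(H+P)^{-1}$ that is used to obtain (I.11), the following Python sketch evaluates both sides on randomly generated matrices. The dimensions, the seed, the diagonal idiosyncratic covariance, and the function name `identity_errors` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def identity_errors(n=50, r=3, seed=0):
    """Return the max absolute errors of (I.10) and of the gain identity."""
    rng = np.random.default_rng(seed)
    # Hypothetical inputs: random loadings, diagonal Sigma^xi, SPD matrix P.
    Lam = rng.standard_normal((n, r))
    Sigma = np.diag(rng.uniform(0.5, 2.0, size=n))
    M = rng.standard_normal((r, r))
    P = M @ M.T + r * np.eye(r)

    I_r = np.eye(r)
    P_inv = np.linalg.inv(P)
    # H = (Lambda' Sigma^{-1} Lambda)^{-1}, as defined before (I.10).
    H = np.linalg.inv(Lam.T @ np.linalg.inv(Sigma) @ Lam)

    # Left-hand side of (I.10) and the remainder term C.
    G = P @ np.linalg.inv(H + P)
    C = G @ H @ P_inv @ H @ P_inv
    err_i10 = np.max(np.abs(G - (I_r - H @ P_inv + C)))

    # Identity behind (I.11): P Lam' (Lam P Lam' + Sigma)^{-1} Lam = P (H + P)^{-1}.
    K = P @ Lam.T @ np.linalg.inv(Lam @ P @ Lam.T + Sigma) @ Lam
    err_gain = np.max(np.abs(K - G))
    return err_i10, err_gain
```

Calling `identity_errors()` returns two numbers that should be at the level of floating-point rounding error if the identities hold for the simulated inputs.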

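Similarly, again by Lemma D.10 and (I.10)

$$\begin{aligned}
& P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n \Pi\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P - \Pi\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P \\
&\quad= P(H+P)^{-1}\Pi(H+P)^{-1}P - \Pi P(H+P)^{-1} \\
&\quad= \left\{I_r - HP^{-1} + C\right\}\Pi\left\{I_r - P^{-1}H + C\right\} - \Pi\left\{I_r - P^{-1}H + C\right\} \\
&\quad= \Pi - \Pi P^{-1}H - HP^{-1}\Pi + HP^{-1}\Pi P^{-1}H - \Pi + \Pi P^{-1}H \\
&\qquad + C\Pi + \Pi C - HP^{-1}\Pi C - C\Pi P^{-1}H + C\Pi C - \Pi C \\
&\quad= -HP^{-1}\Pi + HP^{-1}\Pi P^{-1}H + C\Pi - HP^{-1}\Pi C - C\Pi P^{-1}H + C\Pi C.
\end{aligned} \tag{I.12}$$

By substituting (I.11) and (I.12) into (I.9):

$$\begin{aligned}
P_t ={}& P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P \\
&+ HP^{-1}\Pi - C\Pi - HP^{-1}\Pi + HP^{-1}\Pi P^{-1}H + C\Pi - HP^{-1}\Pi C - C\Pi P^{-1}H + C\Pi C \\
={}& P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P \\
&- C\Pi + HP^{-1}\Pi P^{-1}H + C\Pi - HP^{-1}\Pi C - C\Pi P^{-1}H + C\Pi C \\
={}& (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n + P^{-1})^{-1} \\
&+ HP^{-1}\Pi P^{-1}H - HP^{-1}\Pi C - C\Pi P^{-1}H + C\Pi C,
\end{aligned} \tag{I.13}$$

where in the last step we used Lemma D.13. Moreover,

$$\|H\| = \|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\| = O(n^{-1}), \tag{I.14}$$

$$\begin{aligned}
\|C\| &= \left\|P\left((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1} + P\right)^{-1}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}P^{-1}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}P^{-1}\right\| \\
&\le \|P\|\,\|P^{-1}\|^{2}\,\left\|\left((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1} + P\right)^{-1}\right\|\,\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\|^{2} = O(n^{-2}),
\end{aligned} \tag{I.15}$$

because of Lemma C.3(iii), (I.1) in the proof of Lemma I.2, and since, by Merikoski and Kumar (2004, Theorem 1), which is Weyl's inequality,

$$\begin{aligned}
\left\|\left((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1} + P\right)^{-1}\right\| &= \left\{\nu^{(r)}\left((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1} + P\right)\right\}^{-1} \\
&\le \left\{\nu^{(r)}\left((\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\right) + \nu^{(r)}(P)\right\}^{-1} \\
&= \left\{\left[\nu^{(1)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)\right]^{-1} + \nu^{(r)}(P)\right\}^{-1} \\
&= \left\{\left[\nu^{(1)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)\,\nu^{(r)}(P)\right]^{-1} + 1\right\}^{-1}\left\{\nu^{(r)}(P)\right\}^{-1} \\
&= \left\{1 - \left[\nu^{(1)}(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)\,\nu^{(r)}(P)\right]^{-1}\right\}\left\{\nu^{(r)}(P)\right\}^{-1} + O_p(n^{-2}) \\
&= O_p(1),
\end{aligned}$$

again by Lemma C.3(iii) and (I.1) in the proof of Lemma I.2. Therefore, from (I.13), (I.14), and (I.15)

$$\begin{aligned}
n\|P_t\| &\le n\,\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n + P^{-1})^{-1}\|^{2}\,\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\|^{2}\,\|\Gamma_n^{\xi}\| \\
&\quad + n\|H\|^{2}\|P^{-1}\|^{2}\|\Pi\| + 2n\|H\|\,\|P^{-1}\|\,\|\Pi\|\,\|C\| + n\|C\|^{2}\|\Pi\| \\
&= O(1) + O(n^{-1}) + O(n^{-2}) + O(n^{-3}),
\end{aligned} \tag{I.16}$$

where we used also (I.1) and (I.2) in the proofs of Lemmas I.2 and I.3, respectively.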
Since $P_t$ does not depend on $t$ we prove part (i). Part (ii) is proved analogously by substituting in part (i) $P$ and $\Pi$ with $P_{t|t-1}$ and $\Pi_{t|t-1}$ and using Lemmas D.7(i) and D.7(ii) instead of (I.1) and (I.2). Part (iii) follows directly from Lemmas I.2, I.3, and I.4, and parts (i) and (ii).
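The orders $\|H\| = O(n^{-1})$ in (I.14) and $\|C\| = O(n^{-2})$ in (I.15) can be illustrated with a small deterministic Python sketch. Everything in the design is an assumption made for illustration only: two factors, loadings whose rows alternate between two fixed vectors, $\Sigma_n^{\xi} = I_n$, and a fixed positive definite $P$. With this design $\Lambda_n'\Lambda_n = 0.625\,n\,I_2$ for even $n$, so $n\|H\|$ is constant and $n^2\|C\|$ stays bounded.

```python
import numpy as np

def scaled_norms(n):
    """Return (n * ||H||, n^2 * ||C||) in spectral norm for a toy design."""
    if n % 2 != 0:
        raise ValueError("this toy design assumes an even n")
    # Hypothetical loadings: rows alternate between v1 and v2,
    # so that Lam' Lam = (n / 2) * (v1 v1' + v2 v2') = 0.625 * n * I_2.
    v1 = np.array([1.0, 0.5])
    v2 = np.array([0.5, -1.0])
    Lam = np.vstack([v1, v2] * (n // 2))
    # Hypothetical steady-state covariance P (positive definite).
    P = np.array([[1.0, 0.2], [0.2, 0.5]])
    P_inv = np.linalg.inv(P)

    # With Sigma^xi = I_n, H = (Lam' Lam)^{-1}.
    H = np.linalg.inv(Lam.T @ Lam)
    # C as defined in (I.10).
    C = P @ np.linalg.inv(H + P) @ H @ P_inv @ H @ P_inv
    return n * np.linalg.norm(H, 2), n**2 * np.linalg.norm(C, 2)
```

Evaluating `scaled_norms` at increasing even values of `n` gives a first component that does not change with `n` and a second component that remains bounded, in line with the stated orders for this particular design.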

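Turning to part (iv), consider the first term on the rhs of (I.13); using Lemma D.13 we have

$$\begin{aligned}
& P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P \\
&\quad= (H^{-1} + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n(H^{-1} + P^{-1})^{-1} \\
&\quad= \left\{(H^{-1} + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1} - H\Lambda_n'(\Sigma_n^{\xi})^{-1} + H\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\}\Gamma_n^{\xi} \\
&\qquad\cdot \left\{(\Sigma_n^{\xi})^{-1}\Lambda_n(H^{-1} + P^{-1})^{-1} - (\Sigma_n^{\xi})^{-1}\Lambda_n H + (\Sigma_n^{\xi})^{-1}\Lambda_n H\right\} \\
&\quad= H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H \\
&\qquad+ \left\{(H^{-1} + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1} - H\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H \\
&\qquad+ H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}\left\{(\Sigma_n^{\xi})^{-1}\Lambda_n(H^{-1} + P^{-1})^{-1} - (\Sigma_n^{\xi})^{-1}\Lambda_n H\right\} \\
&\qquad+ \left\{(H^{-1} + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1} - H\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\}\Gamma_n^{\xi}\left\{(\Sigma_n^{\xi})^{-1}\Lambda_n(H^{-1} + P^{-1})^{-1} - (\Sigma_n^{\xi})^{-1}\Lambda_n H\right\} \\
&\quad= H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H + A + A' + B, \text{ say.}
\end{aligned} \tag{I.17}$$

Then,

$$\begin{aligned}
n\|A\| &\le n\,\left\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1} - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\|\,\|H\|\,\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\|\,\|\Gamma_n^{\xi}\| \\
&= O(n^{-3/2}),
\end{aligned} \tag{I.18}$$

because of Lemmas C.1(v), C.3(vii), and C.6(iii), and (I.14). Similarly,

$$\begin{aligned}
n\|B\| &\le n\,\left\|(\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n + P^{-1})^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1} - (\Lambda_n'(\Sigma_n^{\xi})^{-1}\Lambda_n)^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\right\|^{2}\,\|\Gamma_n^{\xi}\| \\
&= O(n^{-3}),
\end{aligned} \tag{I.19}$$

because of Lemmas C.1(v) and C.6(iii). Furthermore,

$$n\,\|H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H\| \le n\|H\|^{2}\,\|\Lambda_n'(\Sigma_n^{\xi})^{-1}\|^{2}\,\|\Gamma_n^{\xi}\| = O(1), \tag{I.20}$$

because of Lemmas C.1(v), C.3(vii), and (I.14). Thus, by using (I.18), (I.19), and (I.20), from (I.17), we have

$$\begin{aligned}
& n\,\left\|P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P - H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H\right\| \\
&\quad\le 2n\|A\| + n\|B\| = O(n^{-3/2}).
\end{aligned} \tag{I.21}$$

And, by using (I.21) and (I.16) into (I.13) we have

$$\begin{aligned}
n\,\|P_t - H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H\| &\le n\,\left\|P\Lambda_n'(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Lambda_n P\Lambda_n' + \Sigma_n^{\xi})^{-1}\Lambda_n P - H\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H\right\| \\
&\quad + n\|H\|^{2}\|P^{-1}\|^{2}\|\Pi\| + 2n\|H\|\,\|P^{-1}\|\,\|\Pi\|\,\|C\| + n\|C\|^{2}\|\Pi\| \\
&= O(n^{-1}).
\end{aligned} \tag{I.22}$$

Finally, notice that, by definition:

$$\begin{aligned}
\lim_{n\to\infty} nH\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H &= \lim_{n\to\infty} nH\; n^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n\; nH \\
&= (\Sigma_{\Lambda\Sigma\Lambda})^{-1}\left\{\lim_{n\to\infty} n^{-1}\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n\right\}(\Sigma_{\Lambda\Sigma\Lambda})^{-1} = W_t.
\end{aligned} \tag{I.23}$$

Therefore, from (I.22) and (I.23)

$$\|nP_t - W_t\| \le \|nP_t - nH\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H\| + \|nH\Lambda_n'(\Sigma_n^{\xi})^{-1}\Gamma_n^{\xi}(\Sigma_n^{\xi})^{-1}\Lambda_n H - W_t\| = O(n^{-1}) + o(1),$$

and since $P_t$ and $W_t$ do not depend on $t$ we complete the proof. □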
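To illustrate the steady-state results of this appendix, the sketch below iterates the standard Riccati difference equation for $P_{t|t-1}$, whose fixed point is the ARE stated in the proof of Lemma I.2, and records how quickly successive iterates stop changing. It is only a sketch under stated assumptions: the recursion is assumed to match (A.5), and the system matrices, the dimensions, the seed, and the function names are hypothetical choices made for illustration.

```python
import numpy as np

def riccati_step(P, A, Lam, Sigma_xi, Gamma_v):
    """One Riccati step: A (P - P Lam' S^{-1} Lam P) A' + Gamma_v."""
    S = Lam @ P @ Lam.T + Sigma_xi  # S = Lam P Lam' + Sigma^xi
    correction = P @ Lam.T @ np.linalg.solve(S, Lam @ P)
    return A @ (P - correction) @ A.T + Gamma_v

def example_system(n=200, seed=0):
    """Hypothetical stable two-factor system; all numbers are illustrative."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.5, 0.1], [0.0, 0.3]])  # eigenvalues 0.5 and 0.3
    Lam = rng.standard_normal((n, 2))
    Sigma_xi = np.eye(n)
    Gamma_v = np.eye(2)
    return A, Lam, Sigma_xi, Gamma_v

def iterate_to_steady_state(P0, A, Lam, Sigma_xi, Gamma_v, n_iter=50):
    """Iterate the recursion and store the spectral norm of each update."""
    P = P0
    gaps = []
    for _ in range(n_iter):
        P_next = riccati_step(P, A, Lam, Sigma_xi, Gamma_v)
        gaps.append(np.linalg.norm(P_next - P, 2))
        P = P_next
    return P, gaps
```

A possible usage is `P, gaps = iterate_to_steady_state(np.eye(2), *example_system())`: the entries of `gaps` should shrink rapidly, and the returned `P` should approximately satisfy the ARE, which can be checked by applying `riccati_step` once more and comparing.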

J Data description and data treatment

This appendix presents the dataset by Barigozzi and Luciani (2023) that we used for the empirical analysis.

Table J1: List of Abbreviations for Table J2 and Table J3

| Source | |
|---|---|
| BLS | U.S. Department of Labor: Bureau of Labor Statistics |
| BEA | U.S. Department of Commerce: Bureau of Economic Analysis |
| ISM | Institute for Supply Management |
| CB | U.S. Department of Commerce: Census Bureau |
| FRB | Board of Governors of the Federal Reserve System |
| EIA | Energy Information Administration |
| WSJ | Wall Street Journal |

| F = Frequency | T = Transformation | SA | U = Units |
|---|---|---|---|
| Q = Quarterly | 0 = None | 0 = no | 1000-P = Thousands of Persons |
| M = Monthly | 2 = ∆ | 1 = yes | 1000-U = Thousands of Units |
| D = Daily | 3 = ∆log | | BoC = Billions of Chained |
| | | | $-B = Dollars per Barrel |

Table J2: Data Description and Data Treatment

| N | Series ID | Definition | Unit | F | S | SA | T |
|---|---|---|---|---|---|---|---|
| 1 | GDPH | Real Gross Domestic Product | BoC 2012$ | Q | BEA | 1 | 3 |
| 2 | GDYH | Real Gross Domestic Income | BoC 2012$ | Q | BEA | 1 | 3 |
| 3 | FSH | Real Final Sales of Domestic Product | BoC 2012$ | Q | BEA | 1 | 3 |
| 4 | IH | Real Gross Private Domestic Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 5 | GSH | Real State & Local Consumption Expenditures & Gross Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 6 | FRH | Real Private Residential Fixed Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 7 | FNH | Real Private Nonresidential Fixed Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 8 | MH | Real Imports of Goods & Services | BoC 2012$ | Q | BEA | 1 | 3 |
| 9 | GH | Real Government Consumption Expenditures & Gross Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 10 | XH | Real Exports of Goods & Services | BoC 2012$ | Q | BEA | 1 | 3 |
| 14 | CH | Real Personal Consumption Expenditures | BoC 2012$ | Q | BEA | 1 | 3 |
| 11 | CNH | Real Personal Consumption Expenditures: Nondurable Goods | BoC 2012$ | Q | BEA | 1 | 3 |
| 12 | CSH | Real Personal Consumption Expenditures: Services | BoC 2012$ | Q | BEA | 1 | 3 |
| 13 | CDH | Real Personal Consumption Expenditures: Durable Goods | BoC 2012$ | Q | BEA | 1 | 3 |
| 15 | GFDIH | Real National Defense Gross Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 16 | GFNIH | Real Federal Nondefense Gross Investment | BoC 2012$ | Q | BEA | 1 | 3 |
| 17 | YPDH | Real Disposable Personal Income | BoC 2012$ | Q | BEA | 1 | 3 |
| 18 | JI | Gross Private Domestic Investment Chain-type Price Index | 2012=100 | Q | BEA | 1 | 3 |
| 19 | JGDP | Gross Domestic Product Chain-type Price Index | 2012=100 | Q | BEA | 1 | 3 |
| 20 | LXNFU | Unit Labor Cost (Nonfarm Business Sector) | 2012=100 | Q | BLS | 1 | 3 |
| 21 | LXNFR | Real Compensation Per Hour (Nonfarm Business Sector) | 2012=100 | Q | BLS | 1 | 3 |
| 22 | LXNFC | Compensation Per Hour (Nonfarm Business Sector) | 2012=100 | Q | BLS | 1 | 3 |
| 23 | LXNFH | Hours of All Persons (Nonfarm Business Sector) | 2012=100 | Q | BLS | 1 | 3 |
| 24 | LXNFA | Output Per Hour of All Persons (Nonfarm Business Sector) | 2012=100 | Q | BLS | 1 | 3 |
| 25 | LXMU | Unit Labor Cost (Manufacturing) | 2012=100 | Q | BLS | 1 | 3 |
| 26 | LXMR | Real Compensation Per Hour (Manufacturing) | 2012=100 | Q | BLS | 1 | 3 |
| 27 | LXMC | Compensation Per Hour (Manufacturing) | 2012=100 | Q | BLS | 1 | 3 |
| 28 | LXMH | Hours of All Persons (Manufacturing) | 2012=100 | Q | BLS | 1 | 3 |
| 29 | LXMA | Output Per Hour of All Persons (Manufacturing) | 2012=100 | Q | BLS | 1 | 3 |
| 30 | IP | Industrial Production Index | 2012=100 | M | FRB | 1 | 3 |
| 31 | IP521 | Industrial Production: Business Equipment | 2012=100 | M | FRB | 1 | 3 |
| 32 | IP511 | Industrial Production: Durable Consumer Goods | 2012=100 | M | FRB | 1 | 3 |
| 33 | IP531 | Industrial Production: Durable Materials | 2012=100 | M | FRB | 1 | 3 |
| 34 | IP512 | Industrial Production: Nondurable Consumer Goods | 2012=100 | M | FRB | 1 | 3 |
| 35 | IP532 | Industrial Production: Nondurable Materials | 2012=100 | M | FRB | 1 | 3 |
| 36 | PCU | CPI-U: All Items | 82-84=100 | M | BLS | 1 | 3 |
| 37 | PCUSE | CPI-U: Energy | 82-84=100 | M | BLS | 1 | 3 |
| 38 | PCUSLFE | CPI-U: All Items Less Food and Energy | 82-84=100 | M | BLS | 1 | 3 |
| 39 | PCUFO | CPI-U: Food | 82-84=100 | M | BLS | 1 | 3 |

Table J3: Data Description and Data Treatment

| N | Series ID | Definition | Unit | F | S | SA | T |
|---|---|---|---|---|---|---|---|
| 40 | JCBM | PCE: Chain Price Index | 2012=100 | M | BEA | 1 | 3 |
| 41 | JCEBM | PCE: Energy Goods & Services - price index | 2012=100 | M | BEA | 1 | 3 |
| 42 | JCNFOM | PCE: Food & Beverages - price index Purchased for Off-Premises Consumption | 2012=100 | M | BEA | 1 | 3 |
| 43 | JCXFEBM | PCE less Food & Energy - price index | 2012=100 | M | BEA | 1 | 3 |
| 44 | JCSBM | PCE: Services - price index | 2012=100 | M | BEA | 1 | 3 |
| 45 | JCDBM | PCE: Durable Goods - price index | 2012=100 | M | BEA | 1 | 3 |
| 46 | JCNBM | PCE: Nondurable Goods - price index | 2012=100 | M | BEA | 1 | 3 |
| 47 | PC1 | PPI: Intermediate Demand Processed Goods | 1982=100 | M | BLS | 1 | 3 |
| 48 | P05 | PPI: Fuels and Related Products and Power | 1982=100 | M | BLS | 0 | 3 |
| 49 | SP3000 | PPI: Final Demand Personal Consumption Gds [Finished Consumer Gds] | 1982=100 | M | BLS | 1 | 3 |
| 50 | SP3000 | PPI: Finished Goods | 1982=100 | M | BLS | 1 | 3 |
| 51 | PIN | PPI: Industrial Commodities | 1982=100 | M | BLS | 0 | 3 |
| 52 | PA | PPI: All Commodities | 1982=100 | M | BLS | 0 | 3 |
| 53 | FMC | Money Stock: Currency | Bil. of $ | M | FRB | 1 | 3 |
| 54 | FM1 | Money Stock: M1 | Bil. of $ | M | FRB | 1 | 3 |
| 55 | FM2 | Money Stock: M2 | Bil. of $ | M | FRB | 1 | 3 |
| 56 | FABWC | C&I Loans in Bank Credit All Commercial Banks | Bil. of $ | M | FRB | 1 | 3 |
| 57 | FABWQ | Consumer Loans in Bank Credit All Commercial Banks | Bil. of $ | M | FRB | 1 | 3 |
| 58 | FAB | Bank Credit All Commercial Banks | Bil. of $ | M | FRB | 1 | 3 |
| 59 | FABW | Loans & Leases in Bank Credit All Commercial Banks | Bil. of $ | M | FRB | 1 | 3 |
| 60 | FABYO | Other Securities in Bank Credit All Commercial Banks | Bil. of $ | M | FRB | 1 | 3 |
| 61 | FABWR | Real Estate Loans in Bank Credit All Commercial Banks | Bil. of $ | M | FRB | 1 | 3 |
| 62 | FOT | Consumer Credit Outstanding | Bil. of $ | M | FRB | 1 | 3 |
| 63 | HSTMW | Housing Starts: Midwest | 1000-U | M | CB | 1 | 3 |
| 64 | HSTNE | Housing Starts: Northeast | 1000-U | M | CB | 1 | 3 |
| 65 | HSTS | Housing Starts: South | 1000-U | M | CB | 1 | 3 |
| 66 | HSTGW | Housing Starts: West | 1000-U | M | CB | 1 | 3 |
| 67 | HPT | Building Permit New Private Housing Units Authorized by | 1000-U | M | CB | 1 | 3 |
| 68 | FBPR | Bank Prime Loan Rate | Percent | M | FRB | 0 | 1 |
| 69 | FFED | Federal Funds [effective] Rate | Percent | M | FRB | 0 | 1 |
| 70 | FCM1 | 1-Year Treasury Bill Yield at Constant Maturity | Percent | M | FRB | 0 | 1 |
| 71 | FCM10 | 10-Year Treasury Note Yield at Constant Maturity | Percent | M | FRB | 0 | 1 |
| 72 | LP | Civilian Participation Rate: 16 yr + | Percent | M | BLS | 0 | 2 |
| 73 | LQ | Civilian Employment/Population Ratio: 16 yr + | Percent | M | BLS | 0 | 2 |
| 74 | LE | Civilian Employment: Sixteen Years & Over | 1000-P | M | BLS | 0 | 3 |
| 75 | LR | Civilian Unemployment Rate: 16 yr + | Percent | M | BLS | 0 | 2 |
| 76 | LU0 | Civilians Unemployed for Less Than 5 Weeks | 1000-P | M | BLS | 0 | 3 |
| 77 | LU5 | Civilians Unemployed for 5-14 Weeks | 1000-P | M | BLS | 0 | 3 |
| 78 | LU15 | Civilians Unemployed for 15-26 Weeks | 1000-P | M | BLS | 0 | 3 |
| 79 | LUT27 | Civilians Unemployed for 27 Weeks and Over | 1000-P | M | BLS | 0 | 3 |
| 80 | LUAD | Average [Mean] Duration of Unemployment | Weeks | M | BLS | 0 | 3 |
| 81 | LANAGRA | All Employees: Total Nonfarm | 1000-P | M | BLS | 0 | 3 |
| 82 | LAPRIVA | All Employees: Total Private Industries | 1000-P | M | BLS | 0 | 3 |
| 83 | LANTRMA | All Employees: Mining and Logging | 1000-P | M | BLS | 0 | 3 |
| 84 | LACONSA | All Employees: Construction | 1000-P | M | BLS | 0 | 3 |
| 85 | LAMANUA | All Employees: Manufacturing | 1000-P | M | BLS | 0 | 3 |
| 86 | LATTULA | All Employees: Trade, Transportation & Utilities | 1000-P | M | BLS | 0 | 3 |
| 87 | LAINFOA | All Employees: Information Services | 1000-P | M | BLS | 0 | 3 |
| 88 | LAFIREA | All Employees: Financial Activities | 1000-P | M | BLS | 0 | 3 |
| 89 | LAPBSVA | All Employees: Professional & Business Services | 1000-P | M | BLS | 0 | 3 |
| 90 | LAEDUHA | All Employees: Education & Health Services | 1000-P | M | BLS | 0 | 3 |
| 91 | LALEIHA | All Employees: Leisure & Hospitality | 1000-P | M | BLS | 0 | 3 |
| 92 | LASRVOA | All Employees: Other Services | 1000-P | M | BLS | 0 | 3 |
| 93 | LAGOVTA | All Employees: Government | 1000-P | M | BLS | 0 | 3 |
| 94 | LAFGOVA | All Employees: Federal Government | 1000-P | M | BLS | 0 | 3 |
| 95 | LASGOVA | All Employees: State Government | 1000-P | M | BLS | 0 | 3 |
| 96 | LALGOVA | All Employees: Local Government | 1000-P | M | BLS | 0 | 3 |
| 97 | PETEXA | West Texas Intermediate Spot Price FOB, Cushing, Oklahoma | $-B | M | EIA | 0 | 3 |
| 98 | NAPMNI | ISM Mfg: New Orders Index | Index | M | ISM | 1 | 2 |
| 99 | NAPMOI | ISM Mfg: Production Index | Index | M | ISM | 1 | 2 |
| 100 | NAPMEI | ISM Mfg: Employment Index | Index | M | ISM | 1 | 2 |
| 101 | NAPMVDI | ISM Mfg: Supplier Deliveries Index | Index | M | ISM | 1 | 2 |
| 102 | NAPMII | ISM Mfg: Inventories Index | Index | M | ISM | 1 | 2 |
| 103 | SP500 | Standard & Poor's 500 Stock Price Index | 41-43=10 | D | WSJ | 0 | 3 |
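The transformation codes in column T can be applied mechanically to a raw series. The Python sketch below is an illustration only, not the authors' data pipeline: the function name `transform` is hypothetical, and it implements just the three codes that Table J1 defines (0 = none, 2 = first difference, 3 = first difference of logs). Code 1 appears in Table J3 for the interest rate series but is not defined in Table J1, so the sketch raises an error for it rather than guessing its meaning.

```python
import numpy as np

def transform(series, code):
    """Apply a Table J1 transformation code to a one-dimensional series."""
    x = np.asarray(series, dtype=float)
    if code == 0:
        # 0 = None: keep the series in levels.
        return x.copy()
    if code == 2:
        # 2 = first difference; the output has one observation fewer.
        return np.diff(x)
    if code == 3:
        # 3 = first difference of logs; requires strictly positive data.
        if np.any(x <= 0):
            raise ValueError("log-difference requires strictly positive values")
        return np.diff(np.log(x))
    # Any other code (including 1) is not defined in Table J1.
    raise ValueError(f"transformation code {code} is not defined in Table J1")
```

For example, `transform(gdp_levels, 3)` would return the log-differences of a hypothetical array `gdp_levels`, which is the treatment listed for GDPH in Table J2.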

References

Akaike, H. (1973). Block Toeplitz Matrix Inversion. SIAM Journal on Applied Mathematics 24(2), 234–241.
Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Dover Publications, Inc.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71, 135–171.
Bai, J. and K. Li (2012). Statistical analysis of factor models of high dimension. The Annals of Statistics 40, 436–465.
Bai, J. and K. Li (2016). Maximum likelihood estimation and inference for approximate factor models of high dimension. The Review of Economics and Statistics 98, 298–309.
Bakhshizadeh, M., A. Maleki, and V. H. de la Pena (2023). Sharp concentration results for heavy-tailed distributions. Information and Inference: A Journal of the IMA 12, 1655–1685.
Barigozzi, M. (2023). Asymptotic equivalence of principal component and quasi maximum likelihood estimators in large approximate factor models. Technical Report arXiv:2307.09864.
Barigozzi, M., H. Cho, and P. Fryzlewicz (2018). Simultaneous multiple change-point and factor analysis for high-dimensional time series. Journal of Econometrics 206, 187–225.
Booth, J. G. and J. P. Hobert (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61, 265–285.
Bosq, D. (2012). Nonparametric statistics for stochastic processes: estimation and prediction. Springer Science & Business Media.
Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys 2, 107–144.
Chan, S., G. C. Goodwin, and K. Sin (1984). Convergence properties of the Riccati difference equation in optimal filtering of nonstabilizable systems. IEEE Transactions on Automatic Control 29, 110–118.
Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 39, 1–38.
Durbin, J. and S. J. Koopman (2012). Time Series Analysis by State Space Methods. Oxford University Press.
Fan, J., Y. Liao, and M. Mincheva (2011). High dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics 39, 3320.
Forni, M., D. Giannone, M. Lippi, and L. Reichlin (2009). Opening the black box: Structural factor models versus structural VARs. Econometric Theory 25, 1319–1347.
Gourieroux, C. and A. Monfort (1995). Statistics and Econometric Models, Volume 1. Cambridge University Press.
Gray, R. M. (2006). Toeplitz and circulant matrices: A review. Foundations and Trends® in Communications and Information Theory 2, 155–239.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

Harvey, A. C. (1990). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Harvey, A. C. and D. Delle Monache (2009). Computing the mean square error of unobserved components extracted by misspecified time series models. Journal of Economic Dynamics and Control 33, 283–295.
Henderson, H. V. and S. R. Searle (1981). On deriving the inverse of a sum of matrices. SIAM Review 23, 53–60.
Ibragimov, I. A. (1962). Some limit theorems for stationary processes. Theory of Probability and its Applications 7, 349–382.
Kuchibhotla, A. K. and A. Chakrabortty (2022). Moving beyond sub-Gaussianity in high-dimensional statistics: Applications in covariance estimation and linear regression. Information and Inference: A Journal of the IMA 11, 1389–1456.
Marshall, A. W., I. Olkin, and B. C. Arnold (2011). Inequalities: Theory of Majorization and Its Applications. Springer-Verlag New York.
McLachlan, G. and T. Krishnan (2007). The EM algorithm and extensions, Volume 382. John Wiley & Sons.
Meng, X.-L. and D. B. Rubin (1994). On the global and componentwise rates of convergence of the EM algorithm. Linear Algebra and its Applications 199, 413–425.
Merikoski, J. K. and R. Kumar (2004). Inequalities for spreads of matrix sums and products. Applied Mathematics E-Notes 4, 150–159.
Merlevède, F., M. Peligrad, and E. Rio (2011). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields 151, 435–474.
Pham, T. D. and L. T. Tran (1985). Some mixing properties of time series models. Stochastic Processes and their Applications 19, 297–303.
Rosenblatt, M. (1972). Uniform ergodicity and strong mixing. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 24(1), 79–84.
Stout, W. F. (1974). Almost Sure Convergence. Academic Press.
Sundberg, R. (1974). Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics 1, 49–58.
Sundberg, R. (1976). An iterative method for solution of the likelihood equations for incomplete data from exponential families. Communication in Statistics-Simulation and Computation 5, 55–64.
Sundberg, R. (2019). Statistical modelling by exponential families. Cambridge University Press.
White, H. (2001). Asymptotic Theory for Econometricians. Academic Press.
Wu, J. C. F. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics 11, 95–103.
