feds · June 30, 2015

Modelling Dependence in High Dimensions with Factor Copulas

Abstract

This paper presents flexible new models for the dependence structure, or copula, of economic variables based on a latent factor structure. The proposed models are particularly attractive for relatively high dimensional applications, involving fifty or more variables, and can be combined with semiparametric marginal distributions to obtain flexible multivariate distributions. Factor copulas generally lack a closed-form density, but we obtain analytical results for the implied tail dependence using extreme value theory, and we verify that simulation-based estimation using rank statistics is reliable even in high dimensions. We consider "scree" plots to aid the choice of the number of factors in the model. The model is applied to daily returns on all 100 constituents of the S&P 100 index, and we find significant evidence of tail dependence, heterogeneous dependence, and asymmetric dependence, with dependence being stronger in crashes than in booms. We also show that factor copula models provide superior estimates of some measures of systemic risk.

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Modelling Dependence in High Dimensions with Factor Copulas Dong Hwan Oh and Andrew J. Patton 2015-051 Please cite this paper as: Dong Hwan Oh and Andrew J. Patton (2015). “Modelling Dependence in High Dimensions with Factor Copulas,” Finance and Economics Discussion Series 2015-051. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2015.051. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Modelling Dependence in High Dimensions with Factor Copulas*

Dong Hwan Oh† (Federal Reserve Board) and Andrew J. Patton‡ (Duke University)

This version: 18 May 2015

Keywords: correlation, dependence, copulas, tail dependence, systemic risk.

J.E.L. codes: C31, C32, C51.

* We thank Tim Bollerslev, Frank Diebold, Dick van Dijk, Yanqin Fan, Eric Ghysels, Jia Li, George Tauchen, Casper de Vries, and seminar participants at the Board of Governors of the Federal Reserve, Chicago Booth, Duke, Econometric Society Asian and Australasian meetings, Erasmus University Rotterdam, Humboldt-Copenhagen Financial Econometrics workshop, Monash, NBER Summer Institute, NC State, Princeton, SoFiE 2012, Triangle econometrics workshop, QUT, Vanderbilt and the Volatility Institute at NYU for helpful comments. The views expressed in this paper are those of the authors and do not necessarily reflect those of the Federal Reserve Board.

† Quantitative Risk Analysis Section, Federal Reserve Board, Washington DC 20551. Email: donghwan.oh@frb.gov

‡ Department of Economics, Duke University, Box 90097, Durham NC 27708. Email: andrew.patton@duke.edu

1 Introduction

One of the many surprises from the financial crisis of late 2007 to 2008 was the extent to which assets that had previously behaved mostly independently suddenly moved together. This was particularly prominent in the financial sector, where poor models of the dependence between certain asset returns (such as those on housing, and those related to mortgage defaults) are thought to be one of the causes of the collapse of the market for CDOs and related securities; see Coval et al. (2009) and Zimmer (2012), for example. Many models that were being used to capture the dependence between a large number of financial assets were revealed as being inadequate during the crisis. However, one of the difficulties in analyzing risks across many variables is the relative paucity of econometric models suitable for the task. Correlation-based models, while useful when risk can be summarized using the second moment, are often built on an assumption of multivariate Gaussianity, and face the risk of neglecting dependence between the variables in the tails, i.e., neglecting the possibility that large crashes may be correlated across assets.

This paper makes two primary contributions. First, we propose new models for the dependence structure, or copula, of economic variables based on a latent factor structure, the use of which makes them particularly attractive for relatively high dimensional applications, involving fifty or more variables.[1] These copula models may be combined with existing parametric, semiparametric, or nonparametric models for univariate distributions to construct flexible yet tractable joint distributions for large collections of variables. The proposed copula models permit the researcher to determine the degree of flexibility based on the number of variables and the amount of data available. For example, by allowing for a fat-tailed common factor the model captures the possibility of correlated crashes, and by allowing the common factor to be asymmetrically distributed the model allows for the possibility that the dependence between the variables is stronger during downturns than during upturns. By allowing for multiple common factors, it is possible to capture heterogeneous pair-wise dependence within the overall multivariate copula. High dimension economic applications often require some strong simplifying assumptions in order to keep the model tractable, and an important feature of the class of proposed models is that such assumptions can be made in an easily understandable manner, and can be tested and relaxed if needed.

[1] For recent work on high dimensional covariance matrix estimation, see Engle et al. (2008), Fan et al. (2008), Engle and Kelly (2012), Fan et al. (2012) and Hautsch et al. (2012).

Factor copulas do not generally have a closed-form density, but certain properties can nevertheless be obtained analytically. Using extreme value theory we obtain theoretical results on the tail dependence properties for general factor copulas, and for the specific parametric class of factor copulas that we use in our empirical work. Given the lack of closed-form density, maximum likelihood estimation for these copulas is difficult, and we employ the simulation-based estimator proposed in Oh and Patton (2013). In a supplemental appendix to this paper we verify that this estimator, and its associated asymptotic distribution theory, has good finite-sample properties even in dimensions as high as 100, which is the relevant size given our empirical analysis. We also consider the use of "scree" plots, based on eigenvalues of the variables' rank correlation matrix, to aid the choice of the number of factors in the factor copula model.

The second contribution of this paper is a study of the dependence structure of all 100 constituent firms of the Standard and Poor's 100 index, using daily data over the period 2008-2010. This is one of the highest dimension applications of copula theory in the econometrics literature. We find significant evidence in favor of a fat-tailed common factor for these stocks (indicative of non-zero tail dependence), implying that the Normal (or Gaussian) copula is not suitable for these assets. Moreover, we find significant evidence that the common factor is asymmetrically distributed, with crashes being more highly correlated than booms. Our empirical results suggest that risk management decisions made using the Normal copula may be based on too benign a view of these assets, and derivative securities based on baskets of these assets, e.g. CDOs, may be mispriced if based on a Normal copula. The fact that large negative shocks may originate from a fat-tailed common factor, and thus affect all stocks at once, makes the diversification benefits of investing in these stocks lower than under Normality. In an application to estimating systemic risk, we show that our factor copula model provides superior estimates of two measures of systemic risk.

Certain types of factor copulas have already appeared in the literature. The models we consider are extensions of Hull and White (2004), in that we retain a simple linear, additive factor structure, but allow for the variables in the structure to have flexibly specified distributions. Other variations on factor copulas are presented in Andersen and Sidenius (2004) and van der Voort (2005), who consider certain non-linear factor structures, and in Laurent and Gregory (2005) and Rogge and Schönbucher (2003), who present factor copulas for modelling times-to-default. See McNeil et al. (2005, Chapter 9) for further discussion on similar applications. Krupskii and Joe (2013) also propose a class of factor-vine copulas, where the factor structure is implied by the choice of copula linking each variable to the latent factor(s). With the exception of McNeil et al. (2005) and Krupskii and Joe (2013), the papers to date have not considered estimation of the unknown parameters of these copulas, instead examining calibration and derivatives pricing using these copulas. Our formal analysis of the estimation of copulas of dimension as high as 100 is new to the literature.

Some methods for modelling high dimension copulas have previously been proposed in the literature, though few consider dimensions greater than twenty.[2] The Normal copula, see Li (2000) amongst many others, is simple to implement but imposes the strong assumption of zero tail dependence, and symmetric dependence between booms and crashes. The Student's t copula and variants are discussed in Demarta and McNeil (2005). The "grouped t" copula is proposed by Daul et al. (2003), who apply this copula in analyses involving up to 100 variables. This copula allows for heterogeneous tail dependence between pairs of variables, but imposes that upper and lower tail dependence are equal, a restriction we strongly reject for equity returns. Smith et al. (2012) extract the copula implied by a multivariate skew t distribution, Christoffersen et al. (2012) combine a skew t copula with a DCC model for conditional correlations in their study of 33 developed and emerging equity market indices, and Christoffersen et al. (2013) use the same model to study 233 equity returns and credit default swap spreads. Creal and Tsay (2014) propose a stochastic copula model based on a factor structure, and use Bayesian estimation methods to apply it to an unbalanced panel of CDS spreads and equity returns on 100 firms. Archimedean copulas such as the Clayton or Gumbel allow for tail dependence and particular forms of asymmetry, but usually have only one or two parameters to characterize the dependence between all variables, and are thus very restrictive in higher-dimension applications.
"Vine" copulas are constructed by sequentially applying bivariate copulas to build up a higher-dimension copula, see Aas et al. (2009), Min and Czado (2010) and Almeida et al. (2012), for example; however, vine copulas are almost invariably based on an assumption that is hard to interpret and to test, see Acar et al. (2012) for a critique. In our empirical application we compare our proposed factor models with several alternative existing models, and show that our model outperforms them all in terms of goodness-of-fit and in an application to measuring systemic risk.

The remainder of the paper is structured as follows. Section 2 presents the class of factor copulas, derives their limiting tail properties, and considers some extensions and the use of "scree" plots to guide the choice of the number of factors. Section 3 describes the simulated method of moments (SMM) estimation method we use. Section 4 presents an empirical study of daily returns on individual constituents of the S&P 100 equity index over the period 2008-2010. An appendix contains a discussion of the dependence measures used in estimation, and a supplemental web appendix contains all proofs, and details of simulations used to study SMM estimation for applications with dimensions as large as ours.

[2] For general reviews of copulas in economics and finance see Cherubini et al. (2004) and Patton (2012).

2 Factor copulas

For simplicity of exposition we focus on unconditional distributions in this section, and discuss the extension to conditional distributions in the next section. Consider a vector of $N$ variables, $Y$, with some joint distribution $F_y$, marginal distributions $F_i$, and copula $C$:

$$[Y_1, \ldots, Y_N]' \equiv Y \sim F_y = C(F_1, \ldots, F_N) \tag{1}$$

The copula completely describes the dependence between the variables $Y_1, \ldots, Y_N$. We will use existing models to estimate the marginal distributions $F_i$ (which may be parametric, semiparametric or nonparametric), and focus on constructing useful new models for the dependence between these variables, $C$.[3] Decomposing the joint distribution in this way has two important advantages over considering the joint distribution $F_y$ directly. First, it facilitates multi-stage estimation, which is particularly useful in high dimension applications, where the sparseness of the data and the potential proliferation of parameters can cause problems. Second, it allows the researcher to draw on the large literature on models for univariate distributions, leaving "only" the task of constructing a model for the copula, which is a simpler problem.

[3] Although we treat estimation of the marginal distributions as separate from copula estimation, the inference methods we consider do take estimation error from the marginal distributions into account.

2.1 The copula of a latent factor structure

The class of copulas we consider are those that can be generated by the following factor structure, based on a set of $N + K$ latent variables:

$$\text{Let } X_i = \sum_{k=1}^{K} \beta_{ik} Z_k + \varepsilon_i, \quad i = 1, 2, \ldots, N$$

$$\text{so } [X_1, \ldots, X_N]' \equiv X = BZ + \varepsilon \tag{2}$$

$$\text{where } \varepsilon_i \sim \text{iid } F_\varepsilon(\gamma_\varepsilon), \quad Z_k \sim \text{inid } F_{z_k}(\gamma_k), \quad Z_k \perp\!\!\!\perp \varepsilon_i \ \ \forall\, i, k.$$

$$\text{Then } X \sim F_x = C(G_1(\theta), \ldots, G_N(\theta); \theta)$$

where $\theta \equiv [\mathrm{vec}(B)', \gamma_\varepsilon', \gamma_1', \ldots, \gamma_K']'$. The copula of the latent variables $X$, denoted $C(\theta)$, is used as the model for the copula of the observable variables $Y$.[4] An important point about the above construction is that the marginal distributions of $X_i$ may be different from those of the original variables $Y_i$, so $F_i \neq G_i$ in general; we use the structure for the vector $X$ only for its copula, and completely discard the resulting marginal distributions. This is motivated by our desire to use the dimension-reduction technique of imposing a factor structure only in the component of the joint distribution that is difficult to estimate in high dimensions, namely the copula. Marginal distributions, on the other hand, are usually able to be estimated flexibly, given the amount of time series data that is available in most financial applications. In our empirical application, for example, we employ semiparametric models for the marginal distributions, thus allowing for great flexibility, but impose a factor structure on the copula to avoid the "curse of dimensionality."

The copula implied by equation (2) is generally not known in closed form. The leading case where it is known is when $\{F_\varepsilon, F_{z_1}, \ldots, F_{z_K}\}$ are all Gaussian distributions, in which case the variable $X$ is multivariate Gaussian, implying a Gaussian copula. For other choices of $\{F_\varepsilon, F_{z_1}, \ldots, F_{z_K}\}$ the joint distribution of $X$, and thus the copula of $X$, is generally not known in closed form.
However, it is simple to simulate from $\{F_\varepsilon, F_{z_1}, \ldots, F_{z_K}\}$ for many classes of distributions, and from simulated data we can extract properties of the copula, such as rank correlation, Kendall's tau, and quantile dependence. These simulated rank dependence measures can then be used in the SMM estimation method of Oh and Patton (2013), which is briefly described in Section 3 below.

[4] This method for constructing a copula model resembles the use of mixture models, e.g. the Normal-inverse Gaussian or generalized hyperbolic distributions, where the distribution of interest is obtained by considering a function of a collection of latent variables, see Barndorff-Nielsen (1978, 1997), and McNeil et al. (2005). It can also be interpreted as a special case of the "conditional independence structure" of McNeil et al. (2005), which is used to describe a set of variables that are independent conditional on some smaller set of variables, $X$ and $Z$ in our notation. The variables $Z$ are sometimes known as the "frailty" in the survival analysis and credit default literature, see Duffie et al. (2009) for example.

Key choices in specifying a factor copula include the following. Firstly, the distributions to use for the common and idiosyncratic variables must be chosen. If simulation-based estimation methods are to be used, then these distributions should be such that random draws from these are easy to obtain. (This is true for most commonly-used distributions.) Secondly, as discussed in more detail below, these distributions should be such that tail dependence and asymmetric dependence (here taken to mean that "booms" have a different dependence structure to "crashes") can be captured. Finally, the number of factors to consider ($K$) must be specified. Allowing for more than a single factor adds much flexibility to the model, at a cost of a substantial increase in the number of parameters. We discuss these choices empirically in Section 4.

2.2 Tail dependence properties of factor copulas

Although most factor copulas will not have a closed-form expression, using results from extreme value theory it is possible to obtain analytical results on the tail dependence implied by a given factor copula model. These results are relatively easy to obtain, given the simple linear structure generating the factor copula. Recall the definition of tail dependence for two variables $X_i$, $X_j$ with marginal distributions $G_i$, $G_j$:

$$\tau_{ij}^L \equiv \lim_{q \to 0} \frac{\Pr\left[X_i \le G_i^{-1}(q),\ X_j \le G_j^{-1}(q)\right]}{q} \tag{3}$$

$$\tau_{ij}^U \equiv \lim_{q \to 1} \frac{\Pr\left[X_i > G_i^{-1}(q),\ X_j > G_j^{-1}(q)\right]}{1 - q}$$

That is, lower tail dependence measures the probability of both variables lying below their $q$ quantile, for $q$ limiting to zero, scaled by the probability of one of these variables lying below their $q$ quantile.
Upper tail dependence is defined analogously. In Proposition 1 below we present results for a general single factor copula model:

Proposition 1 (Tail dependence for a factor copula) Consider the factor copula generated by equation (2) with $K = 1$. Assume $F_z$ and $F_\varepsilon$ have regularly varying tails with a common tail index $\alpha > 0$, i.e.,

$$\Pr[Z > s] \sim A_z^U s^{-\alpha} \ \text{ and } \ \Pr[\varepsilon_i > s] \sim A_\varepsilon^U s^{-\alpha}, \ \text{ as } s \to \infty \tag{4}$$

$$\Pr[Z < -s] \sim A_z^L s^{-\alpha} \ \text{ and } \ \Pr[\varepsilon_i < -s] \sim A_\varepsilon^L s^{-\alpha}, \ \text{ as } s \to \infty$$

where $A_z^L$, $A_z^U$, $A_\varepsilon^L$ and $A_\varepsilon^U$ are positive constants, and we write $x_s \sim y_s$ if $x_s / y_s \to 1$ as $s \to \infty$. Then (a) if $\beta_i, \beta_j > 0$ the lower and upper tail dependence coefficients are:

$$\tau_{ij}^L = \frac{\min(\beta_i, \beta_j)^\alpha A_z^L}{\min(\beta_i, \beta_j)^\alpha A_z^L + A_\varepsilon^L}, \qquad \tau_{ij}^U = \frac{\min(\beta_i, \beta_j)^\alpha A_z^U}{\min(\beta_i, \beta_j)^\alpha A_z^U + A_\varepsilon^U} \tag{5}$$

(b) if $\beta_i, \beta_j < 0$ the lower and upper tail dependence coefficients are:

$$\tau_{ij}^L = \frac{\min(|\beta_i|, |\beta_j|)^\alpha A_z^U}{\min(|\beta_i|, |\beta_j|)^\alpha A_z^U + A_\varepsilon^L}, \qquad \tau_{ij}^U = \frac{\min(|\beta_i|, |\beta_j|)^\alpha A_z^L}{\min(|\beta_i|, |\beta_j|)^\alpha A_z^L + A_\varepsilon^U} \tag{6}$$

(c) if $\beta_i \beta_j = 0$ or (d) if $\beta_i \beta_j < 0$, the lower and upper tail dependence coefficients are zero.

All proofs are presented in the supplemental appendix. This proposition shows that when the coefficients on the common factor have the same sign, and the common factor and idiosyncratic variables have the same tail index, the factor copula generates upper and lower tail dependence. If either $Z$ or $\varepsilon$ is asymmetrically distributed, then the upper and lower tail dependence coefficients can differ, which provides this model with the ability to capture differences in the probabilities of joint crashes and joint booms. If $Z$ has a thinner upper (for example) tail than lower tail, while $\varepsilon$ is symmetric with the same tail index as $Z$'s lower tail, then upper tail dependence will be zero while the lower tail dependence will generally be positive. When either of the coefficients on the common factor is zero, or if they have differing signs, then the upper and lower tail dependence coefficients are both zero.

The above proposition considers the case that the common factor and idiosyncratic variables have the same tail index; when these indices differ we obtain a boundary result: if the tail index of $Z$ is strictly greater than that of $\varepsilon$ and $\beta_i \beta_j > 0$ then tail dependence is one, while if the tail index of $Z$ is strictly less than that of $\varepsilon$ then tail dependence is zero.

In our empirical analysis in Section 4, we will focus on the Skew t distribution of Hansen (1994) as a model for the common factor and the standardized t distribution for the idiosyncratic shocks. Proposition 2 below presents the analytical tail dependence coefficients for a factor copula based on these distributions.
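The case analysis in Proposition 1 maps directly onto a small function. The following Python sketch is illustrative only: the function name `tail_dependence` and its argument names are hypothetical labels for $\beta_i$, $\beta_j$, $\alpha$ and the four tail constants, and the numerical inputs in the usage line are arbitrary rather than estimates from the paper.

```python
def tail_dependence(beta_i, beta_j, alpha, A_z_L, A_z_U, A_eps_L, A_eps_U):
    """Return (tau_L, tau_U) for a one-factor copula, following Proposition 1.

    All names here are illustrative; the tail constants are taken as given.
    """
    if beta_i * beta_j <= 0:
        # Cases (c) and (d): a zero loading or opposite signs imply
        # zero lower and upper tail dependence.
        return 0.0, 0.0
    m = min(abs(beta_i), abs(beta_j)) ** alpha
    if beta_i > 0:
        # Case (a): the lower tail of X is driven by the lower tail of Z.
        tau_L = m * A_z_L / (m * A_z_L + A_eps_L)
        tau_U = m * A_z_U / (m * A_z_U + A_eps_U)
    else:
        # Case (b): negative loadings swap the roles of the tails of Z.
        tau_L = m * A_z_U / (m * A_z_U + A_eps_L)
        tau_U = m * A_z_L / (m * A_z_L + A_eps_U)
    return tau_L, tau_U


# Arbitrary example: a common factor whose lower tail is heavier than its upper tail.
print(tail_dependence(1.0, 1.0, 4.0, A_z_L=2.0, A_z_U=1.0, A_eps_L=1.0, A_eps_U=1.0))
```

With equal loadings the result depends only on the ratio of the factor's tail constant to the idiosyncratic one, which is why an asymmetric $Z$ is enough to make the two coefficients differ.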

Proposition 2 (Tail dependence for a Skew t-t factor copula) Consider the factor copula generated by equation (2) with $K = 1$. If $F_z = \text{Skew } t(\nu, \lambda)$ and $F_\varepsilon = t(\nu)$, then the tail indices of $Z$ and $\varepsilon_i$ equal $\nu$, and the constants $A_z^L$, $A_z^U$, $A_\varepsilon^L$ and $A_\varepsilon^U$ from Proposition 1 are given by:

$$A_z^L = \frac{bc}{\nu} \left( \frac{b^2}{(\nu - 2)(1 - \lambda)^2} \right)^{-(\nu+1)/2} \tag{7}$$

$$A_z^U = \frac{bc}{\nu} \left( \frac{b^2}{(\nu - 2)(1 + \lambda)^2} \right)^{-(\nu+1)/2}$$

$$A_\varepsilon^L = A_\varepsilon^U = \frac{c}{\nu} \left( \frac{1}{\nu - 2} \right)^{-(\nu+1)/2}$$

where $a = 4\lambda c (\nu - 2)/(\nu - 1)$, $b = \sqrt{1 + 3\lambda^2 - a^2}$, and $c = \Gamma\left(\frac{\nu+1}{2}\right) \Big/ \left( \Gamma\left(\frac{\nu}{2}\right) \sqrt{\pi(\nu - 2)} \right)$. Given Proposition 1 and the expressions for $A_z^L$, $A_z^U$, $A_\varepsilon^L$ and $A_\varepsilon^U$ above, we then obtain the tail dependence coefficients for this copula.

We next generalize Proposition 1 to consider a multi-factor copula model, which will prove useful in our empirical application in Section 4.

Proposition 3 (Tail dependence for a multi-factor copula) Consider the factor copula generated by equation (2). Assume $F_\varepsilon, F_{z_1}, \ldots, F_{z_K}$ have regularly varying tails with a common tail index $\alpha > 0$, and upper and lower tail coefficients $A_\varepsilon^U, A_1^U, \ldots, A_K^U$ and $A_\varepsilon^L, A_1^L, \ldots, A_K^L$. Then if $\beta_{ik} \ge 0 \ \forall\, i, k$, the lower and upper tail dependence coefficients are:

$$\tau_{ij}^L = \frac{\sum_{k=1}^{K} \mathbf{1}\{\beta_{ik}\beta_{jk} > 0\}\, A_k^L \beta_{ik}^\alpha \delta_{L,ijk}^\alpha}{A_\varepsilon^L + \sum_{k=1}^{K} A_k^L \beta_{ik}^\alpha} \tag{8}$$

$$\tau_{ij}^U = \frac{\sum_{k=1}^{K} \mathbf{1}\{\beta_{ik}\beta_{jk} > 0\}\, A_k^U \beta_{ik}^\alpha \delta_{U,ijk}^\alpha}{A_\varepsilon^U + \sum_{k=1}^{K} A_k^U \beta_{ik}^\alpha}$$

where

$$\delta_{Q,ijk}^{-1} \equiv \begin{cases} \max\left\{1,\ \gamma_{Q,ij}\, \beta_{ik}/\beta_{jk}\right\}, & \text{if } \beta_{ik}\beta_{jk} > 0 \\ 1, & \text{if } \beta_{ik}\beta_{jk} = 0 \end{cases}, \quad \text{for } Q \in \{L, U\} \tag{9}$$

$$\gamma_{Q,ij} \equiv \left( \frac{A_\varepsilon^Q + \sum_{k=1}^{K} A_k^Q \beta_{jk}^\alpha}{A_\varepsilon^Q + \sum_{k=1}^{K} A_k^Q \beta_{ik}^\alpha} \right)^{1/\alpha}, \quad \text{for } Q \in \{L, U\} \tag{10}$$
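As a numerical companion to Propositions 2 and 3, the sketch below evaluates the Skew t-t tail constants of equation (7) and the one-tail formula of equations (8)-(10). It is a minimal illustration under stated assumptions: the function names are hypothetical, loadings are assumed non-negative as in Proposition 3, and the parameter values $\nu = 4$, $\lambda = -0.25$ are chosen only to echo the illustrations in the text, not taken from the paper's estimates.

```python
import math


def skew_t_t_constants(nu, lam):
    """Tail constants (A_z_L, A_z_U, A_eps) from Proposition 2; needs nu > 2, |lam| < 1."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(math.pi * (nu - 2)))
    a = 4 * lam * c * (nu - 2) / (nu - 1)
    b = math.sqrt(1 + 3 * lam ** 2 - a ** 2)
    power = -(nu + 1) / 2
    A_z_L = (b * c / nu) * (b ** 2 / ((nu - 2) * (1 - lam) ** 2)) ** power
    A_z_U = (b * c / nu) * (b ** 2 / ((nu - 2) * (1 + lam) ** 2)) ** power
    A_eps = (c / nu) * (1 / (nu - 2)) ** power
    return A_z_L, A_z_U, A_eps


def multi_factor_tail_dependence(beta_i, beta_j, alpha, A_factors, A_eps):
    """One tail (Q = L or U) of equations (8)-(10).

    beta_i, beta_j: lists of K non-negative loadings for variables i and j.
    A_factors: the K factor tail constants for this tail; A_eps: idiosyncratic one.
    """
    denom_i = A_eps + sum(A * b ** alpha for A, b in zip(A_factors, beta_i))
    denom_j = A_eps + sum(A * b ** alpha for A, b in zip(A_factors, beta_j))
    gamma = (denom_j / denom_i) ** (1 / alpha)           # equation (10)
    numerator = 0.0
    for A, bi, bj in zip(A_factors, beta_i, beta_j):
        if bi * bj > 0:                                  # indicator in equation (8)
            delta = 1 / max(1.0, gamma * bi / bj)        # equation (9)
            numerator += A * bi ** alpha * delta ** alpha
    return numerator / denom_i


# Single-factor Skew t(4, -0.25)-t(4) copula with unit loadings (illustrative values).
A_L, A_U, A_e = skew_t_t_constants(nu=4.0, lam=-0.25)
tau_L = multi_factor_tail_dependence([1.0], [1.0], 4.0, [A_L], A_e)
tau_U = multi_factor_tail_dependence([1.0], [1.0], 4.0, [A_U], A_e)
print(f"tau_L = {tau_L:.3f}, tau_U = {tau_U:.3f}")
```

With $K = 1$ the second function reduces to case (a) of Proposition 1, and with a negative skewness parameter the lower tail coefficient exceeds the upper one, in line with the asymmetry discussed in the text.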

The extensions to consider the case that some have opposite signs to the others can be accommodated using the same methods as in the proof of Proposition 1. In the one-factor copula model the variables $\delta_{L,ijk}$ and $\delta_{U,ijk}$ can be obtained directly and are determined by $\min\{\beta_i, \beta_j\}$; in the multi-factor copula model these variables can be determined using equation (9) above, but do not generally have a simple expression.

2.3 Illustration of some factor copulas

To illustrate the flexibility of the class of factor copulas, Figure 1 presents 1000 random draws from bivariate distributions constructed using four different factor copulas. In all cases the marginal distributions, $F_i$, are set to $N(0,1)$, and the variances of the latent variables in the factor copula are set to $\sigma_z^2 = \sigma_\varepsilon^2 = 1$, so that the common factor accounts for one-half of the variance of each $X_i$. The first copula is generated from a factor structure with $F_z = F_\varepsilon = N(0,1)$, implying that the copula is Normal. The second sets $F_z = F_\varepsilon = t(4)$, generating a symmetric copula with positive tail dependence. The third copula sets $F_\varepsilon = N(0,1)$ and $F_z = \text{Skew } t(\infty, -0.25)$, corresponding to a skewed Normal distribution. This copula exhibits asymmetric dependence, with crashes being more correlated than booms, but zero tail dependence. The fourth copula sets $F_\varepsilon = t(4)$ and $F_z = \text{Skew } t(4, -0.25)$, which generates asymmetric dependence and positive tail dependence.

Figure 1 shows that when the distributions in the factor structure are Normal or skewed Normal, tail events tend to be uncorrelated across the two variables. When the degrees of freedom is set to 4, on the other hand, we observe several draws in the joint upper and lower tails. When the skewness parameter is negative, as in the lower two panels of Figure 1, we observe stronger clustering of observations in the joint negative quadrant compared with the joint positive quadrant.
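Draws of the kind described in this illustration are straightforward to generate. The Python sketch below simulates bivariate one-factor structures with unit loadings and reports the Spearman rank correlation of the draws, which depends only on the copula. Several pieces are assumptions of this sketch rather than the paper's code: all function names are hypothetical, the Skew t sampler uses a two-piece representation derived from Hansen's (1994) density, the t(4) variables are rescaled to unit variance, and the skewed Normal case ($\nu = \infty$) is omitted for brevity.

```python
import math
import random


def std_t(rng, nu):
    """Student's t draw with nu > 2 degrees of freedom, rescaled to unit variance."""
    chi2 = rng.gammavariate(nu / 2, 2)                # chi-square(nu) draw
    return rng.gauss(0, 1) / math.sqrt(chi2 / nu) * math.sqrt((nu - 2) / nu)


def skew_t(rng, nu, lam):
    """Hansen (1994) skew t draw via a two-piece construction (sketch assumption)."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(math.pi * (nu - 2)))
    a = 4 * lam * c * (nu - 2) / (nu - 1)
    b = math.sqrt(1 + 3 * lam ** 2 - a ** 2)
    w = abs(std_t(rng, nu))
    # Left piece with probability (1 - lam) / 2, right piece otherwise.
    y = -(1 - lam) * w if rng.random() < (1 - lam) / 2 else (1 + lam) * w
    return (y - a) / b                                # recentre and rescale


def simulate_pair(draw_z, draw_eps, n, rng):
    """n draws of (X_1, X_2) with X_i = Z + eps_i, i.e. K = 1 and beta = 1."""
    out = []
    for _ in range(n):
        z = draw_z(rng)
        out.append((z + draw_eps(rng), z + draw_eps(rng)))
    return out


def ranks(values):
    """Ranks 1..n of a list of distinct values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman(pairs):
    """Spearman rank correlation for continuous data (ties are not handled)."""
    n = len(pairs)
    r1 = ranks([p[0] for p in pairs])
    r2 = ranks([p[1] for p in pairs])
    d2 = sum((u - v) ** 2 for u, v in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def normal(r):
    return r.gauss(0, 1)


rng = random.Random(0)
copulas = {
    "Normal": (normal, normal),
    "t(4)-t(4)": (lambda r: std_t(r, 4), lambda r: std_t(r, 4)),
    "Skew t(4)-t(4)": (lambda r: skew_t(r, 4, -0.25), lambda r: std_t(r, 4)),
}
for name, (draw_z, draw_eps) in copulas.items():
    print(name, round(spearman(simulate_pair(draw_z, draw_eps, 1000, rng)), 3))
```

To reproduce the $N(0,1)$ margins used in the illustration, one option is to map each rank $r$ out of $n$ to `statistics.NormalDist().inv_cdf(r / (n + 1))`; this discards the simulated margins and keeps only the copula.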
Figure 2 illustrates the differences between copulas using a multivariate approach related to our study of systemic risk below. Conditional on observing $j$ out of 100 stocks crashing, we present the expected number, or proportion, of the remaining $(100 - j)$ stocks that will crash, a measure based on Geluk et al. (2007) and Hartmann et al. (2006). Define:

$$N_q^* \equiv \sum_{i=1}^{N} \mathbf{1}\{U_i \le q\} \quad \text{and} \quad \kappa^q(j) = E\left[N_q^* \mid N_q^* \ge j\right] - j \tag{11}$$

$$\pi^q(j) \equiv \frac{\kappa^q(j)}{N - j}$$

For this illustration we define a "crash" as a realization in the lower 1/66 tail, corresponding to a once-in-a-quarter event for daily asset returns. We consider four copulas: the familiar Normal, Student's t(4) and Clayton copulas, as well as the Skew t(4)-t(4) factor copula, all with parameters chosen so that linear correlation of 1/2 is implied. The upper panel shows that as we condition on more variables crashing, the expected number of other variables that will crash, $\kappa^q(j)$, initially increases, and peaks at around $j = 30$. At that point, the Skew t(4)-t(4) factor copula predicts that around another 38 variables will crash, while under the Normal copula we expect only around 12 more variables to crash. As we condition on even more variables having crashed the plot converges inevitably to zero (since conditioning on having observed more crashes, there are fewer variables left to crash). The lower panel of Figure 2 shows that the expected proportion of remaining stocks that will crash, $\pi^q(j)$, generally increases all the way to $j = 99$.[5] This figure illustrates some of the features of dependence that are unique to high dimension applications, and further motivates our proposal for a class of flexible, parsimonious models for such applications.

[ INSERT FIGURES 1 AND 2 ABOUT HERE ]

2.4 Guidance on choosing the number of factors

In this section we consider a graphical tool to obtain guidance on the choice of the number of factors to include in a factor copula model, namely the famous "scree" plot of Cattell (1966). Given that factor copula models are parametric, formal tests for the correct number of factors should exploit that parametric structure, and in our empirical analysis below we use model specification tests described in the next section for this purpose. However, it is still of interest to have some prior guidance on just how many common factors might reasonably be needed to describe the dependence. A "scree" plot shows the eigenvalues of a covariance or correlation matrix from largest to smallest, and it is commonly found that the number of factors is equal to the number of "large" eigenvalues.

Employing scree plots for factor copulas is made difficult by the fact that we want to apply them to the rank correlation matrix of the data, not the covariance matrix, and results for the latter do not generally carry over to the former. Furthermore, the sampling variability of rank correlations differs from that of linear correlations, as the latter are (ratios of) moments while the former are rank statistics.

[5] For the Normal copula this is not the case; however, this is likely due to simulation error: even with the 10 million simulations used to obtain this figure, joint 1/66 tail crashes are so rare under a Normal copula that there is a fair degree of simulation error in this plot for $j \ge 80$.

In the proposition below we provide conditions under which scree plots can aid in the identification of the number of factors in a factor copula. We discuss the assumptions below.

Proposition 4 Assume (1) $Y_t \sim \text{iid } F_y$, $F_y$ and $F_x$ from equations (1) and (2) are continuous, and every bivariate marginal copula $C_{ij}$ of $C$ has continuous partial derivatives with respect to $u_i$ and $u_j$; (2) $\hat{R}_T^L = \hat{R}_T + o_p(1)$, where $\hat{R}_T^L$ and $\hat{R}_T$ are the sample linear and rank correlation matrices of $\{X_t\}_{t=1}^{T}$; and (3) the eigenvalues of $BB'$ are "large," in the sense that they imply $g_K(R) > 1$. Let

$$\hat{K}_T = \max\left\{k : g_k(\hat{R}_T^y) > 1\right\} \tag{12}$$

where $\hat{R}_T^y$ is the sample rank correlation matrix of $\{Y_t\}_{t=1}^{T}$, and $g_k(A)$ returns the $k$th-largest eigenvalue of the matrix $A$.

(i) Under assumptions (1)-(2), $\Pr[\hat{K}_T \le K] \to 1$ as $T \to \infty$.

(ii) Under assumptions (1)-(3), $\Pr[\hat{K}_T = K] \to 1$ as $T \to \infty$.

The first assumption above simply requires that the distributions and copulas are continuous, and the iid part of this assumption can be relaxed by invoking assumption 2 of Oh and Patton (2013) and then analyzing estimated standardized residuals rather than the original data. The second assumption is stronger, requiring rank correlations and linear correlations to be "close." A sufficient condition for this is that the marginal distributions of $X$ are Uniform, and in other cases it may or may not be a reasonable approximation. In the supplemental appendix we present evidence that this assumption holds very well for a variety of factor copula models based on t or skew t distributions. If other distributions are considered, in particular those that are far from "bell shaped," it is possible that this assumption will not be plausible.
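The estimator in equation (12) amounts to counting the eigenvalues of the sample rank correlation matrix that exceed one. The Python sketch below shows one way to compute it using only the standard library; the function names are hypothetical, the eigenvalues are obtained with a basic cyclic Jacobi iteration chosen for self-containment (a numerical library routine would normally be used instead), and ties in the data are not handled.

```python
import math


def ranks(values):
    """Ranks 1..T of a list of distinct values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for r, idx in enumerate(order, start=1):
        out[idx] = r
    return out


def rank_correlation_matrix(data):
    """Spearman rank correlation matrix; data is a list of T rows of N values."""
    cols = [ranks(list(col)) for col in zip(*data)]
    mean = (len(data) + 1) / 2
    dev = [[r - mean for r in col] for col in cols]
    ss = [sum(d * d for d in col) for col in dev]
    n = len(cols)
    return [[sum(x * y for x, y in zip(dev[i], dev[j])) / math.sqrt(ss[i] * ss[j])
             for j in range(n)] for i in range(n)]


def symmetric_eigenvalues(A, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(A)
    M = [row[:] for row in A]
    for _ in range(sweeps):
        off = sum(M[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(M[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2 * M[p][q], M[q][q] - M[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # update columns p and q
                    mkp, mkq = M[k][p], M[k][q]
                    M[k][p] = c * mkp - s * mkq
                    M[k][q] = s * mkp + c * mkq
                for k in range(n):      # update rows p and q
                    mpk, mqk = M[p][k], M[q][k]
                    M[p][k] = c * mpk - s * mqk
                    M[q][k] = s * mpk + c * mqk
    return sorted((M[i][i] for i in range(n)), reverse=True)


def estimate_num_factors(R, threshold=1.0):
    """K-hat of equation (12): the number of eigenvalues of R above the threshold."""
    return sum(1 for ev in symmetric_eigenvalues(R) if ev > threshold)


# Equicorrelation matrix with correlation 0.5: one dominant eigenvalue.
R = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
print(symmetric_eigenvalues(R), estimate_num_factors(R))
```

In practice `rank_correlation_matrix` would be applied to the data (or to estimated standardized residuals) and its output passed to `estimate_num_factors`; plotting the sorted eigenvalues gives the scree plot itself.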
If the copula is elliptical, then Klüppelberg and Kuhn (2009) suggest using Kendall's tau rather than Spearman's rank correlation, as the former is a known monotonic function of linear correlation for such copulas (see Fang, et al. 2002), and assumption (2) is not needed. Elliptical copulas cannot accommodate asymmetric dependence, which we find to be important in our empirical application, consistent with several existing papers, and so we do not attempt to exploit that result.

With assumptions (1)–(2) we find that $\hat{K}_T$ provides, asymptotically, a lower bound on the true number of factors; it will miss factors that are such that $g_k(R) \leq 1$ for $k \in [1, K]$. If $N$ diverges with $T$ then this cannot happen and assumption (3) will hold automatically (see Chamberlain and Rothschild, 1983, and Bai and Ng, 2002), while in our setting of finite $N$ this assumption may not

hold. In such cases using a threshold of one provides a lower bound on the true number of factors. In the web appendix we undertake a simulation study of $\hat{K}_T$ based on realistic parameter values, and we find that it correctly estimates the number of factors in 90% to 99% of simulations.

In Figure 3 we present four examples of scree plots for a single simulation from a factor copula described in detail in the web appendix. In all cases we set $N = 100$ and $T = 1000$, and we vary the number of factors, $K$. We see similar shapes to these plots in other applications, and we clearly see how this sort of figure might provide guidance on the choice of the number of factors: the first $K$ eigenvalues are large, and the remaining $N - K$ eigenvalues gradually tail off. In two of these cases ($K = 2$ and $4$) the bound of one clearly "works" in the sense that it correctly separates the first $K$ from the remaining eigenvalues. In this simulation of the $K = 1$ case, the second eigenvalue is just above one, due to sampling variability in the eigenvalue, and in this case $\hat{K}_T$ would overestimate the true number of factors. In the $K = 8$ case, we see that the bound of one almost "cuts off" the eighth eigenvalue, which would lead to the underestimation of the true number of factors.

[ INSERT FIGURE 3 ABOUT HERE ]

2.5 Non-linear factor copula models

The class of factor copula models proposed in equation (2) can be generalized to more flexible factor structures, by considering "link" functions that are not linear and additive. Consider the following general one-factor structure:

$$
\begin{aligned}
X_i &= h(Z, \varepsilon_i), \quad i = 1, 2, \ldots, N \\
Z &\sim F_z, \quad \varepsilon_i \sim \text{iid } F_\varepsilon, \quad Z \perp\!\!\!\perp \varepsilon_i \ \ \forall\, i \\
[X_1, \ldots, X_N]' &\equiv \mathbf{X} \sim F_x = C(G_1, \ldots, G_N)
\end{aligned}
\tag{13}
$$

for some function $h : \mathbb{R}^2 \to \mathbb{R}$. This general structure allows us to nest a variety of well-known copulas in the literature. Examples of copula models that fit in this framework are summarized below:

Copula            $h(Z, \varepsilon)$                     $F_Z$                              $F_\varepsilon$
Normal            $Z + \varepsilon$                       $N(0, \sigma_z^2)$                 $N(0, \sigma_\varepsilon^2)$
Student's t       $Z^{1/2} \varepsilon$                   $Ig(\nu/2, \nu/2)$                 $N(0, \sigma_\varepsilon^2)$
Skew t            $\lambda Z + Z^{1/2} \varepsilon$       $Ig(\nu/2, \nu/2)$                 $N(0, \sigma_\varepsilon^2)$
Gen hyperbolic    $\gamma Z + Z^{1/2} \varepsilon$        $GIG(\lambda, \chi, \psi)$         $N(0, \sigma_\varepsilon^2)$
Clayton           $(1 + \varepsilon / Z)^{-\alpha}$       $\Gamma(\alpha, 1)$                $Exp(1)$
Gumbel            $-(\log Z / \varepsilon)^{\alpha}$      $Stable(1/\alpha, 1, 1, 0)$        $Exp(1)$

where $Ig$ represents the inverse gamma distribution, $GIG$ is the generalized inverse Gaussian distribution, and $\Gamma$ is the gamma distribution. The skew t and Generalized hyperbolic copulas listed here are from McNeil et al. (2005, Chapter 5), the representation of a Clayton copula in this form is from Cook and Johnson (1981) and the representation of the Gumbel copula is from Marshall and Olkin (1988).

The above copulas all have closed-form densities via judicious combinations of the "link" function $h$ and the distributions $F_z$ and $F_\varepsilon$. By removing this requirement and employing simulation-based estimation methods to overcome the lack of closed-form likelihood, one can obtain a much wider variety of models for the dependence structure. In this paper we will focus on linear, additive factor copulas, and generate flexible models by flexibly specifying the distribution of the common factor(s).

3 Simulation-based estimation of factor copulas

Factor copula models do not generally have a closed-form likelihood, making maximum likelihood estimation difficult. Oh and Patton (2013) propose an estimation method similar to the simulated method of moments (SMM) which is readily applied in such cases. We adopt that estimation method here, and briefly describe it below. An extensive simulation study of this estimation method for applications involving up to 100 variables is presented in the supplemental appendix.
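Drawing from a factor structure is the basic ingredient of simulation-based estimation. As a minimal sketch, the Clayton row of the table in Section 2.5 can be simulated directly from its link function, with $Z \sim \Gamma(\alpha, 1)$ and $\varepsilon_i \sim$ iid $Exp(1)$. The function names, the value $\alpha = 0.5$ and the sample size are illustrative assumptions. The check against Kendall's tau uses the standard Clayton result $\tau = \theta / (\theta + 2)$ with $\theta = 1/\alpha$, which is a textbook fact rather than something stated in the paper.

```python
import numpy as np

def simulate_clayton_factor(n_obs, n_vars, alpha, rng):
    """Draw X_i = (1 + eps_i / Z)^(-alpha) with Z ~ Gamma(alpha, 1), eps_i ~ Exp(1)."""
    z = rng.gamma(shape=alpha, scale=1.0, size=(n_obs, 1))    # common factor
    eps = rng.exponential(scale=1.0, size=(n_obs, n_vars))    # idiosyncratic shocks
    return (1.0 + eps / z) ** (-alpha)

def kendall_tau(a, b):
    """Brute-force O(n^2) Kendall's tau (assumes no ties)."""
    s = np.sign(a[:, None] - a[None, :]) * np.sign(b[:, None] - b[None, :])
    n = len(a)
    return s.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
alpha = 0.5                       # illustrative value; implies Clayton theta = 1 / alpha = 2
x = simulate_clayton_factor(1500, 3, alpha, rng)
tau_hat = kendall_tau(x[:, 0], x[:, 1])
tau_theory = 1.0 / (1.0 + 2.0 * alpha)   # theta / (theta + 2) rewritten in terms of alpha
print(round(tau_hat, 3), tau_theory)
```

In this representation each $X_i$ is already Uniform(0, 1), so the simulated draws are draws from the copula itself. For the linear additive models the paper focuses on, the same two-step pattern applies (draw the common factor once per observation, then the idiosyncratic shocks), but the margins of $X$ are no longer Uniform.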
The class of data generating processes (DGPs) covered by Oh and Patton (2013) is the same as that of Chen and Fan (2006) and Rémillard (2010). This class allows each variable to have time-varying conditional mean and variance, each governed by parametric models, with an unknown marginal distribution. The marginal distributions are estimated using the empirical distribution function, making

the complete marginal distribution models dynamic and semiparametric. The conditional copula of the data is assumed to belong to a parametric family and is assumed constant. The combination of time-varying conditional means and variances and a constant conditional copula makes this model similar in spirit to the "CCC" model of Bollerslev (1990). The DGP is then:

$$
\begin{aligned}
\mathbf{Y}_t &= \boldsymbol{\mu}_t(\phi) + \boldsymbol{\sigma}_t(\phi)\, \boldsymbol{\eta}_t \\
\text{where } \boldsymbol{\mu}_t(\phi) &\equiv [\mu_{1t}(\phi), \ldots, \mu_{Nt}(\phi)]' \\
\boldsymbol{\sigma}_t(\phi) &\equiv \mathrm{diag}\{\sigma_{1t}(\phi), \ldots, \sigma_{Nt}(\phi)\} \\
\boldsymbol{\eta}_t &\equiv [\eta_{1t}, \ldots, \eta_{Nt}]' \sim \text{iid } F_\eta = C(F_1, \ldots, F_N; \theta)
\end{aligned}
\tag{14}
$$

where $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ are $\mathcal{F}_{t-1}$-measurable and independent of $\boldsymbol{\eta}_t$. $\mathcal{F}_{t-1}$ is the sigma-field generated by $\{\mathbf{Y}_{t-1}, \mathbf{Y}_{t-2}, \ldots\}$. The $r \times 1$ vector of parameters governing the dynamics of the variables, $\phi$, is assumed to be $\sqrt{T}$-consistently estimable in a stage prior to copula estimation. If $\phi_0$ is known, or if $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ are constant, then the model becomes one for iid data. The copula is parameterized by a $p \times 1$ vector of parameters, $\theta$, which is estimated using the following approach.

The estimation method of Oh and Patton (2013) is closely related to SMM estimation, though it is not strictly SMM, as the "moments" that are used in estimation are functions of rank statistics.
They propose estimating $\theta$ based on the standardized residuals $\{\hat{\boldsymbol{\eta}}_t \equiv \boldsymbol{\sigma}_t^{-1}(\hat{\phi})[\mathbf{Y}_t - \boldsymbol{\mu}_t(\hat{\phi})]\}_{t=1}^T$ and simulations from some parametric joint distribution, $F_x(\theta)$, with implied copula $C(\theta)$. Let $\tilde{m}_S(\theta)$ be an $m \times 1$ vector of dependence measures computed using $S$ simulations from $F_x(\theta)$, $\{\mathbf{X}_s\}_{s=1}^S$, and let $\hat{m}_T$ be the corresponding vector of dependence measures computed using the standardized residuals $\{\hat{\boldsymbol{\eta}}_t\}_{t=1}^T$. We discuss the empirical choice of which dependence measures to match in the appendix. The SMM estimator is then defined as:

$$
\begin{aligned}
\hat{\theta}_{T,S} &\equiv \arg\min_{\theta \in \Theta} Q_{T,S}(\theta) \\
\text{where } Q_{T,S}(\theta) &\equiv g_{T,S}'(\theta)\, \hat{W}_T\, g_{T,S}(\theta) \\
g_{T,S}(\theta) &\equiv \hat{m}_T - \tilde{m}_S(\theta)
\end{aligned}
\tag{15}
$$

and $\hat{W}_T$ is some positive definite weight matrix, which may depend on the data. Under regularity conditions, Oh and Patton (2013) show that if $S/T \to \infty$ as $T \to \infty$, the SMM estimator is

consistent and asymptotically normal:

$$
\sqrt{T}\left( \hat{\theta}_{T,S} - \theta_0 \right) \overset{d}{\to} N(0, \Omega_0) \quad \text{as } T, S \to \infty \tag{16}
$$

where $\Omega_0 = (G_0' W_0 G_0)^{-1} G_0' W_0 \Sigma_0 W_0 G_0 (G_0' W_0 G_0)^{-1}$, $\Sigma_0 \equiv \mathrm{avar}[\hat{m}_T]$, $G_0 \equiv \nabla_\theta g_0(\theta_0)$, $g_0(\theta) = \text{p-lim}_{T,S \to \infty}\, g_{T,S}(\theta)$ and $W_0 = \text{p-lim}_{T \to \infty}\, \hat{W}_T$. The asymptotic variance of the estimator has the same form as in standard GMM applications, however the components $\Sigma_0$ and $G_0$ require different estimation methods than in standard applications. Oh and Patton (2013) also present the distribution of a test of the over-identifying restrictions (the "J" test), which we will use for specification testing in our empirical application.

Our empirical application below involves 100 variables, and it is well known that properties of estimators can deteriorate as the dimension grows; Oh and Patton (2013) verify that their asymptotic theory provides a good approximation to finite-sample behavior for applications involving only up to ten variables. In the supplemental appendix we undertake an extensive simulation study of this estimator in applications involving up to 100 variables. In brief, these simulations show that the SMM estimator and its associated distribution theory continue to have satisfactory properties even in high-dimension applications: finite-sample bias is small, confidence intervals have good coverage rates, and the J test has reasonable finite-sample size. This provides reassurance for using this estimator in our empirical application below.

4 High-dimension copula models for S&P 100 returns

Having proposed a new class of models for copulas in high dimensions and discussed their estimation, we now apply these models to a difficult empirical problem. We study the dependence between all 100 stocks that were constituents of the S&P 100 index in December 2010.
Our sample period is April 2008 to December 2010, a total of $T = 696$ trading days. The starting point for our sample period was determined by the date of the latest addition to the S&P 100 index (Philip Morris Inc.), which has had no additions or deletions since April 2008. The stocks in our study are listed in Table 1, along with their 3-digit SIC codes, which we will use in part of our analysis below.

[ INSERT TABLE 1 ABOUT HERE ]

Table 2 presents some summary statistics of the data used in this analysis. The top panel presents sample moments of the daily returns for each stock. The means and standard deviations

are comparable to those observed in other studies. The skewness and kurtosis coefficients reveal a substantial degree of heterogeneity in the shape of the distribution of these asset returns, motivating our use of a nonparametric estimate (the empirical distribution function, EDF) in our analysis.

In the second panel of Table 2 we present information on the parameters of the AR(1)–GJR-GARCH models, augmented with lagged market return information, that are used to filter each of the individual return series (see footnote 6):

$$
\begin{aligned}
r_{it} &= \phi_{0i} + \phi_{1i} r_{i,t-1} + \phi_{mi} r_{m,t-1} + \varepsilon_{it} \\
\sigma_{it}^2 &= \omega_i + \beta_i \sigma_{i,t-1}^2 + \alpha_i \varepsilon_{i,t-1}^2 + \gamma_i \varepsilon_{i,t-1}^2 1\{\varepsilon_{i,t-1} \leq 0\} \\
&\quad + \alpha_{mi} \varepsilon_{m,t-1}^2 + \gamma_{mi} \varepsilon_{m,t-1}^2 1\{\varepsilon_{m,t-1} \leq 0\}
\end{aligned}
\tag{17}
$$

We estimate the parameters of the mean and variance models using quasi maximum likelihood, and we estimate the distribution of the standardized residuals using the EDF, which allows us to nonparametrically capture skewness and excess kurtosis in the residuals, if present, and importantly it allows these characteristics to differ across the 100 variables.

Our estimates of the parameters of these models are consistent with those reported in numerous other studies, with a small negative AR(1) coefficient found for most though not all stocks, and with the lagged market return entering significantly in 37 out of the 100 stocks. The estimated GJR-GARCH parameters are strongly indicative of persistence in volatility, and the asymmetry parameter, $\gamma$, in this model is positive for all but three of the 100 stocks in our sample, supporting the widespread finding of a "leverage effect" in the conditional volatility of equity returns. The lagged market residual is also found to be important for volatility in many cases, with the null that $\alpha_{mi} = \gamma_{mi} = 0$ being rejected at the 5% level for 32 stocks.
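The filtering step in equation (17) can be sketched as follows, taking the parameters as given rather than estimating them by quasi maximum likelihood. The function names, the choice to initialize the variance recursion at the sample variance of the residuals, and the `rank / (T + 1)` scaling of the EDF are assumptions made for this illustration.

```python
import numpy as np

def ar1_market_residuals(r, r_m, phi0, phi1, phi_m):
    """Mean-equation residuals eps_it for t = 2, ..., T (the first observation is lost)."""
    return r[1:] - phi0 - phi1 * r[:-1] - phi_m * r_m[:-1]

def gjr_variance(eps, eps_m, omega, beta, alpha, gamma, alpha_m, gamma_m):
    """Conditional variance recursion with own and market GJR terms."""
    sigma2 = np.empty(len(eps))
    sigma2[0] = eps.var()  # initialization is an assumption, not from the paper
    for t in range(1, len(eps)):
        own = (alpha + gamma * (eps[t - 1] <= 0)) * eps[t - 1] ** 2
        mkt = (alpha_m + gamma_m * (eps_m[t - 1] <= 0)) * eps_m[t - 1] ** 2
        sigma2[t] = omega + beta * sigma2[t - 1] + own + mkt
    return sigma2

def edf_transform(eta):
    """Map standardized residuals to (0, 1) via the rescaled empirical CDF."""
    ranks = np.argsort(np.argsort(eta)) + 1.0
    return ranks / (len(eta) + 1.0)

# Tiny worked example with made-up residuals and parameters.
eps = np.array([1.0, -1.0, 2.0])
eps_m = np.zeros(3)
s2 = gjr_variance(eps, eps_m, omega=0.1, beta=0.5, alpha=0.1, gamma=0.2,
                  alpha_m=0.05, gamma_m=0.05)
eta = eps / np.sqrt(s2)          # standardized residuals
u = edf_transform(eta)           # inputs to the copula stage
```

The asymmetry term only switches on after a non-positive residual, so in the example the move from `s2[1]` to `s2[2]` picks up the extra `gamma` contribution from the negative shock at the second observation.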
Footnote 6. We considered GARCH (Bollerslev, 1986), EGARCH (Nelson, 1991), and GJR-GARCH (Glosten, et al., 1993) models for the conditional variance of these returns, and for almost all stocks the GJR-GARCH model was preferred according to the BIC.

In the lower panel of Table 2 we present summary statistics for four measures of dependence between pairs of standardized residuals: linear correlation, rank correlation, average upper and lower 1% tail dependence (equal to $(\tau_{0.99} + \tau_{0.01})/2$), and the difference in upper and lower 10% tail dependence (equal to $\tau_{0.90} - \tau_{0.10}$). The two correlation statistics measure the sign and strength of dependence; the third and fourth statistics measure the strength and symmetry of dependence in the tails. The two correlation measures are similar, and are 0.42 and 0.44 on average. Across all

4950 pairs of assets the rank correlation varies from 0.37 to 0.50 between the 25th and 75th percentiles of the cross-sectional distribution, indicating the presence of mild heterogeneity in the correlation coefficients. The 1% tail dependence measure is 0.06 on average, and varies from 0.00 to 0.07 across the inter-quartile range. The difference in the 10% tail dependence measures is negative on average, and indeed is negative for over 75% of the pairs of stocks, strongly indicating asymmetric dependence between these stocks.

In Figure 4 we present the "scree" plot of eigenvalues of the rank correlation matrix of the standardized residuals, motivated by the discussion in Section 2.4. This plot shows that the first three eigenvalues are very large, all greater than four, indicating the presence of multiple common factors in the copula. The next five eigenvalues are all appreciably above one, while the ninth and tenth eigenvalues are just above one. Thus the estimator proposed in Proposition 4 would suggest that 10 common factors are required, although taking estimation error into account we might suspect that only eight are needed. We investigate these suggestions more formally below.

[ INSERT TABLE 2 AND FIGURE 4 ABOUT HERE ]

4.1 Results from equidependence copula specifications

We now present our first empirical results on the dependence structure of these 100 stock returns: the estimated parameters of eight different models for the copula. In this section we consider an "equidependence" model, similar to the equicorrelation model of Engle and Kelly (2012), where we assume a single common factor and impose that all assets have the same coefficient on the common factor. This is clearly a restrictive model, and we test whether it is rejected by the data below.
We consider four existing copulas: the Clayton copula, the Normal copula, the Student's t copula, and the Skew t copula, with equicorrelation imposed on the latter three models (the Clayton copula implies equicorrelation by construction), and four factor copulas, described by the distributions assumed for the common factor and the idiosyncratic shock: t-Normal, Skew t-Normal, t-t, Skew t-t. All models are estimated using the SMM-type method described in Section 3. The value of the SMM objective function at the estimated parameters, $Q_{SMM}$, is presented for each model, along with the p-value from the J-test of the over-identifying restrictions. Standard errors are based on 1000 bootstraps to estimate $\Sigma_{T,S}$, and step size $\varepsilon_T = 0.1$ to compute $\hat{G}$. The rank dependence measures that are used in the SMM estimation of this model are presented in the appendix.
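The SMM-type estimation of an equidependence model can be illustrated with a deliberately simplified sketch. It assumes a Normal common factor and Normal idiosyncratic shocks (that is, a Normal copula, not one of the skewed factor copulas estimated in the paper), an identity weight matrix, a grid search in place of a numerical optimizer, and a fixed simulation seed so that the objective is a smooth function of the parameter. All function names and tuning values are hypothetical. The five matched "moments" are the pair-averaged rank correlation and quantile dependence at $q \in \{0.05, 0.10, 0.90, 0.95\}$.

```python
import numpy as np

Q_LEVELS = (0.05, 0.10, 0.90, 0.95)

def to_uniform(x):
    """Column-wise rescaled ranks in (0, 1); assumes no ties."""
    return (np.argsort(np.argsort(x, axis=0), axis=0) + 1.0) / (x.shape[0] + 1.0)

def avg_dependence_measures(x):
    """Average over all pairs of rank correlation and quantile dependence."""
    u = to_uniform(x)
    pairs = np.triu_indices(u.shape[1], k=1)
    measures = [np.corrcoef(u, rowvar=False)[pairs].mean()]
    for q in Q_LEVELS:
        ind = ((u <= q) if q <= 0.5 else (u > q)).astype(float)
        joint = ind.T @ ind / u.shape[0]            # joint tail frequencies
        measures.append(joint[pairs].mean() / min(q, 1.0 - q))
    return np.array(measures)

def simulate_equidependence(beta, n_obs, n_vars, seed):
    """X_i = beta * Z + eps_i with Normal Z and eps_i (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_obs, 1))
    return beta * z + rng.standard_normal((n_obs, n_vars))

def smm_objective(beta, m_hat, n_sim, n_vars, weight, seed=123):
    g = m_hat - avg_dependence_measures(simulate_equidependence(beta, n_sim, n_vars, seed))
    return g @ weight @ g                            # Q = g' W g

def smm_estimate(data, grid, n_sim):
    m_hat = avg_dependence_measures(data)
    weight = np.eye(len(m_hat))
    q_vals = [smm_objective(b, m_hat, n_sim, data.shape[1], weight) for b in grid]
    best = int(np.argmin(q_vals))
    return grid[best], q_vals[best]

data = simulate_equidependence(1.0, 1000, 5, seed=7)   # pseudo-data with beta = 1
beta_hat, q_min = smm_estimate(data, np.linspace(0.5, 1.5, 11), n_sim=5000)
```

Holding the simulation seed fixed across candidate parameter values is a common device in simulation-based estimation; whether and how the paper does this is not stated in this section.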

Table 3 reveals that the coefficient on the common factor, $\beta$, is estimated by all models to be around 0.95, implying an average correlation coefficient of around 0.47. The estimated inverse degrees of freedom parameter in these models is around 1/25, and the standard errors on $\nu^{-1}$ reveal that this parameter is significant (see footnote 7) at the 10% level for the three models that allow for asymmetric dependence, but not significant for the three models that impose symmetric dependence. The asymmetry parameter, $\lambda$, is significantly negative in all models in which it is estimated, with t-statistics ranging from -2.1 to -4.4. This implies that the dependence structure between these stock returns is significantly asymmetric, with large crashes being more likely than large booms. Other papers have considered equicorrelation models for the dependence between large collections of stocks, see Engle and Kelly (2012) for example, but empirically showing the importance of allowing the implied common factor to be fat tailed and asymmetric is novel.

[ INSERT TABLE 3 ABOUT HERE ]

Figure 5 exploits the high-dimensional nature of our analysis, and plots the expected proportion of "crashes" in the remaining $(100 - j)$ stocks, conditional on observing a crash in $j$ stocks. We consider a "crash" defined as a once-in-a-month (1/22, around 4.6%) event and as a once-in-a-quarter (1/66, around 1.5%) event. We obtain pointwise (in $j$) 90% bootstrap confidence intervals for these estimates based on the theory in Rémillard (2010), see Patton (2012) for discussion. For once-in-a-month crashes, the observed proportions track the Skew t-t factor copula well for $j$ up to around 25 crashes, and again for $j$ of around 70. For $j$ in between 30 and 65 the Normal copula appears to fit quite well.
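A sample version of the quantity plotted in Figure 5 can be sketched as below. The precise definition is given earlier in the paper and is not reproduced in this section, so the sketch makes an explicit assumption: the conditioning event is read as "at least $j$ stocks crash", and the proportion is the expected number of additional crashes divided by the $N - j$ remaining stocks. The function name and the toy data are hypothetical.

```python
import numpy as np

def conditional_crash_proportion(u, q, j):
    """Average share of the remaining N - j series that crash, given >= j crashes.

    u : (T, N) array of probability integral transforms (values in (0, 1))
    q : crash quantile, e.g. 1/22 or 1/66
    j : number of crashes conditioned on (0 <= j < N)
    """
    n_vars = u.shape[1]
    n_crash = (u <= q).sum(axis=1)        # number of crashes on each day
    days = n_crash >= j                   # assumed reading of the conditioning event
    if not days.any():
        return np.nan                     # no such days in the sample
    return (n_crash[days].mean() - j) / (n_vars - j)

# Toy example: 3 days, 4 series; crash counts are 2, 4 and 0.
u = np.array([[0.01, 0.02, 0.50, 0.90],
              [0.01, 0.01, 0.02, 0.03],
              [0.40, 0.60, 0.70, 0.80]])
prop = conditional_crash_proportion(u, q=0.05, j=2)
```

In the toy example the two days with at least two crashes have an average of three crashes, so one of the two remaining series crashes on average. Days with many simultaneous crashes are rare in a sample of this length, which is why the estimates become noisy for large $j$.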
Footnote 7. Note that the case of zero tail dependence corresponds to $\nu_z^{-1} = 0$, which is on the boundary of the parameter space, implying that a standard t test is strictly not applicable. In such cases the squared t statistic no longer has an asymptotic $\chi^2_1$ distribution under the null, rather it is distributed as an equal-weighted mixture of a $\chi^2_1$ and $\chi^2_0$, see Gourieroux and Monfort (1996, Ch 21). The 90% and 95% critical values for this distribution are 1.64 and 2.71, which correspond to t-statistics of 1.28 and 1.65.

For once-in-a-quarter crashes, displayed in the lower panel of Figure 5, the empirical plot tracks that for the Normal copula well for $j$ up to around 30, but for $j = 35$ the empirical plot jumps and follows the Skew t-t factor copula. Thus it appears that the Normal copula may be adequate for modeling moderate tail events, but a copula with greater tail dependence (such as the Skew t-t factor copula) is needed for more extreme tail events. It is worth noting, however, that we have few observations in our sample for these extreme tail events, and

thus the confidence intervals are quite wide, making it difficult to make precise statements about relative fit.

[ INSERT FIGURE 5 ABOUT HERE ]

The last two columns of Table 3 report the value of the objective function ($Q_{SMM}$) and the p-value from a test of the over-identifying restrictions. The $Q_{SMM}$ values reveal that the three models that allow for asymmetry (Skew t copula, and the two Skew t factor copulas) out-perform all the other models, and reinforce the above conclusion that allowing for a skewed common factor is important for this collection of assets. The p-values, however, are near zero for all models, indicating that none of them pass this specification test. Two likely sources of these rejections are the assumption of equidependence, which was shown in the summary statistics in Table 2 to be questionable for this large set of stock returns, and the assumption of a single common factor, which is not consistent with the "scree" plot in Figure 4. We relax both of these assumptions in the next section.

4.2 Results from multi-factor copula specifications

In response to the rejection of the copula models based on equidependence, we now consider a generalization to allow for heterogeneous dependence. We propose a multi-factor model that allows for a common, market-wide, factor, and a set of industry factors. We use the first digit of Standard Industrial Classification (SIC) to form seven groups of stocks, see Table 1. The model we consider is the copula generated by the following structure:

$$
\begin{aligned}
X_i &= \beta_{S(i)} Z_0 + \gamma_{S(i)} Z_{S(i)} + \varepsilon_i, \quad i = 1, 2, \ldots, 100 \\
Z_0 &\sim \text{Skew } t(\nu, \lambda) \\
Z_S &\sim \text{iid } t(\nu), \quad S = 1, 2, \ldots, 7; \quad Z_S \perp\!\!\!\perp Z_0 \ \ \forall\, S \\
\varepsilon_i &\sim \text{iid } t(\nu), \quad i = 1, 2, \ldots, 100; \quad \varepsilon_i \perp\!\!\!\perp Z_j \ \ \forall\, i, j
\end{aligned}
\tag{18}
$$
where $S(i)$ is the SIC group for stock $i$. There are eight latent common factors in total in this model, but any given variable is only affected by two factors, simplifying its structure and reducing the number of free parameters. Note here we impose that the industry factors and the idiosyncratic shocks are symmetric, and only allow asymmetry in the market-wide factor, $Z_0$. It is feasible to

consider allowing the industry factors to have differing levels of asymmetry, but we rule this out in the interests of parsimony. We impose that all stocks in the same SIC group have the same factor loadings, but allow stocks in different groups to have different factor loadings. This generates a "block equidependence" model which greatly increases the flexibility of the model, but without generating too many additional parameters to estimate. This copula model has a total of 16 parameters, providing more flexibility than the 3-parameter equidependence model considered in the previous section, but still more parsimonious (and tractable) than a completely unstructured approach to this 100-dimensional problem (see footnote 8).

The results of this model are presented in Table 4. The Clayton copula is not presented here as it imposes equidependence by construction, and so is not comparable to the other models. The estimated inverse degrees of freedom parameter, $\nu^{-1}$, is around 1/14, which is larger and more significant than for the equidependence model, indicating stronger evidence of tail dependence. The asymmetry parameters are also larger (in absolute value) and more significantly negative in this more flexible model than in the equidependence model. It appears that when we add variables that control for intra-industry dependence (i.e., industry-specific factors), we find the market-wide common factor is more fat tailed and left skewed than when we impose a single factor structure.

[ INSERT TABLE 4 ABOUT HERE ]

Focusing on our preferred Skew t-t factor copula model, the coefficients on the market factor, $\beta_i$, range from 0.88 (for SIC group 2, Manufacturing: Food, apparel, etc.) to 1.25 (SIC group 1, Mining and construction), indicating the varying degrees of inter-industry dependence.
The coefficients on the industry factors, $\gamma_i$, measure the degree of additional intra-industry dependence, beyond that coming from the market-wide factor. These range from 0.17 to 1.09 for SIC groups 3 and 1 respectively. Even for the smaller estimates, these are significantly different from zero, indicating the presence of industry factors beyond a common market factor. The intra- and inter-industry rank correlations and tail dependence coefficients implied by this model (see footnote 9) are presented in Table 5, and reveal the degree of heterogeneity and asymmetry that this copula captures: rank correlations range from 0.39 (for pairs of stocks in SIC groups 1 and 5) to 0.72 (for stocks within SIC group 1). The upper and lower tail dependence coefficients further reinforce the importance of asymmetry in the dependence structure, with lower tail dependence measures being substantially larger than upper tail measures: lower tail dependence averages 0.82 and ranges from 0.70 to 0.99, while upper tail dependence averages 0.07 and ranges from 0.02 to 0.74.

Footnote 8. We also considered a one-factor model that allowed for different factor loadings, generalizing the equidependence model of the previous section but simpler than this multi-factor copula model. That model provided a significantly better fit than the equidependence model, but was also rejected using the J test of over-identifying restrictions, and so is not presented here to conserve space.

Footnote 9. Rank correlations from this model are not available in closed form, and we use 50,000 simulations to estimate these. Upper and lower tail dependence coefficients are based on Propositions 2 and 3.

[ INSERT TABLE 5 ABOUT HERE ]

With this more flexible model we can test restrictions on the factor coefficients, to see whether the additional flexibility is required to fit the data. The p-values from these tests are in the bottom rows of Table 4. First, we can test whether all of the industry factor coefficients are zero, which reduces this model to a one-factor model with flexible weights. The p-values from these tests are zero to four decimal places for all models, providing strong evidence in favor of including industry factors. We can also test whether the market factor is needed given the inclusion of industry factors by testing whether all betas are equal to zero, and as expected this restriction is strongly rejected by the data. We can further test whether the coefficients on the market and industry factors are common across all industries, reducing this model to an equidependence model, and this too is strongly rejected. Finally, we use the J test of over-identifying restrictions to check the specification of these models. Using this test, we see that the models that impose symmetry are strongly rejected. The Skew t copula has a p-value of 0.04, indicating a marginal rejection, and the Skew t-t factor copula performs best, passing this test at the 5% level, with a p-value of 0.07.
It is worth noting that even this multi-factor specification is still restrictive, both in the number of assumed factors, and in that it imposes equidependence within industry groups. If the computational challenge of allowing for $O(N)$ unknown parameters could be overcome, one would expect the goodness-of-fit to improve. We leave such an extension for future work.

Thus it appears that a multi-factor model with heterogeneous weights on the factors, and that allows for positive tail dependence and stronger dependence in crashes than booms, is needed to fit the dependence structure of these 100 stock returns.
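The block structure of equation (18) is straightforward to simulate. The sketch below is a simplified illustration: it replaces the Skew t distribution of the market factor with a symmetric Student's t (a swapped-in simplification, since NumPy has no built-in skew t sampler), and the group assignment, loadings and degrees of freedom are made-up values. With seven groups, the parameter count of 16 reported in Section 4.2 is consistent with seven market loadings, seven industry loadings, plus $\nu^{-1}$ and $\lambda$.

```python
import numpy as np

def simulate_multifactor(beta, gamma, group_of, nu, n_obs, rng):
    """Draw X_i = beta_S(i) * Z_0 + gamma_S(i) * Z_S(i) + eps_i.

    beta, gamma : length-M arrays of group-level loadings
    group_of    : length-N integer array, group index S(i) of each variable
    nu          : degrees of freedom shared by all t-distributed components
    """
    beta, gamma, group_of = map(np.asarray, (beta, gamma, group_of))
    z0 = rng.standard_t(nu, size=(n_obs, 1))             # market factor (symmetric t here)
    zs = rng.standard_t(nu, size=(n_obs, len(beta)))     # one factor per industry group
    eps = rng.standard_t(nu, size=(n_obs, len(group_of)))
    return beta[group_of] * z0 + gamma[group_of] * zs[:, group_of] + eps

rng = np.random.default_rng(42)
group_of = np.array([0, 0, 1, 1])                        # two groups of two variables
x = simulate_multifactor(beta=[1.0, 1.0], gamma=[1.0, 1.0],
                         group_of=group_of, nu=10, n_obs=20000, rng=rng)
corr = np.corrcoef(x, rowvar=False)
# With unit loadings and equal component variances, pairs in the same group
# share two factors (correlation 2/3), pairs in different groups share one (1/3).
```

Each variable loads on exactly two of the latent factors, which is what keeps the number of free parameters small even though the model has $M + 1$ factors in total.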

4.3 Measuring systemic risk: Marginal Expected Shortfall

The recent financial crisis has highlighted the need for the management and measurement of systemic risk, see Acharya et al. (2010) for discussion. Brownlees and Engle (2011) propose a measure of systemic risk they call "marginal expected shortfall", or MES. It is defined as the expected return on stock $i$ given that the market return is below some (low) threshold:

$$
MES_{it} = -E_{t-1}\left[ r_{it} \mid r_{mt} < C \right] \tag{19}
$$

An appealing feature of this measure of systemic risk is that it can be computed with only a bivariate model for the conditional distribution of $(r_{it}, r_{mt})$, and Brownlees and Engle (2011) propose a semiparametric model based on a bivariate DCC-GARCH model to estimate it. A corresponding drawback of this measure is that by using a market index to identify periods of crisis, it may overlook periods with crashes in individual firms. With a model for the entire set of constituent stocks, such as the high dimension copula models considered in this paper, combined with standard AR-GARCH type models for the marginal distributions, we can estimate the MES measure proposed in Brownlees and Engle (2011), as well as alternative measures that use crashes in individual stocks as flags for periods of turmoil (see footnote 10). For example, one might consider the expected return on stock $i$ conditional on $k$ stocks in the market having returns below some threshold, a "kES":

$$
kES_{it} = -E_{t-1}\left[ r_{it} \,\middle|\, \left( \sum_{j=1}^{N} 1\{r_{jt} < C\} \right) > k \right] \tag{20}
$$

Brownlees and Engle (2011) propose a simple method for ranking estimates of MES:

$$
\begin{aligned}
MSE_i &= \frac{1}{T} \sum_{t=1}^{T} \left( r_{it} - MES_{it} \right)^2 1\{r_{mt} < C\} \\
RelMSE_i &= \frac{1}{T} \sum_{t=1}^{T} \left( \frac{r_{it} - MES_{it}}{MES_{it}} \right)^2 1\{r_{mt} < C\}
\end{aligned}
\tag{21}
$$

Corresponding metrics immediately follow for estimates of "kES". In Table 6 we present the MSE and RelMSE for estimates of MES and kES, for threshold choices of -2% and -4%.
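The ranking metrics in equation (21) and the crisis flags used for MES and kES can be sketched as follows. The function names and the toy numbers are hypothetical, and the code implements the loss in (21) literally, so it assumes the forecasts are supplied on the same scale and with the same sign convention as the returns they are compared with.

```python
import numpy as np

def mes_crisis_days(r_m, c):
    """Crisis indicator for MES: market return below the threshold C."""
    return np.asarray(r_m) < c

def kes_crisis_days(returns, c, k):
    """Crisis indicator for kES: more than k stocks with returns below C."""
    return (np.asarray(returns) < c).sum(axis=1) > k

def forecast_losses(r_i, forecast_i, crisis):
    """MSE and relative MSE of a forecast, summed over crisis days and divided by T."""
    r_i, forecast_i = np.asarray(r_i), np.asarray(forecast_i)
    err = r_i[crisis] - forecast_i[crisis]
    mse = np.sum(err ** 2) / len(r_i)
    rel_mse = np.sum((err / forecast_i[crisis]) ** 2) / len(r_i)
    return mse, rel_mse

# Toy example with four days and a threshold of -2%.
r_i = np.array([-0.05, 0.01, -0.03, 0.02])
forecast = np.array([-0.04, -0.04, -0.02, -0.04])
r_m = np.array([-0.03, 0.00, -0.05, 0.01])
mse, rel_mse = forecast_losses(r_i, forecast, mes_crisis_days(r_m, c=-0.02))

panel = np.array([[-0.05, -0.03, 0.01],
                  [0.00, -0.03, 0.01]])
flags = kes_crisis_days(panel, c=-0.02, k=1)
```

Restricting the sums to crisis days before dividing avoids evaluating the relative error on days where it does not enter the metric.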
Footnote 10. Note that here we study 100 large cap firms, which are in some ways relatively homogeneous. In the systemic risk literature, a common question is whether crises in one subgroup of firms (e.g., large firms, or financial firms) spill over to another group (small firms, or non-financial firms). We do not pursue such a question here.

We implement the model proposed by Brownlees and Engle (2011), as well as their implementations of a model based on the CAPM, and one based purely on rolling historical

information. Along with these, we present results for four copulas: the Normal, Student's t, Skew t, and Skew t-t factor copula, all with the multi-factor structure from Section 4.2 above. In the upper panel of Table 6 we see that the Brownlees-Engle model performs the best for both thresholds under the MSE performance metric, with the Skew t-t factor copula as the second-best performing model. Under the Relative MSE metric, the factor copula is the best-performing model, for both thresholds, followed by the Skew t copula. Like Brownlees and Engle (2011), we find that the worst-performing methods under both metrics are the Historical and CAPM methods.

The lower panel of Table 6 presents the performance of various methods for estimating kES, with $k$ set to 30 (see footnote 11). This measure requires an estimate of the conditional distribution for the entire set of 100 stocks, and thus the CAPM and Brownlees-Engle methods cannot be applied. We evaluate the remaining five methods, and find that the Skew t-t factor copula performs the best for both thresholds, under both metrics. Thus our proposed factor copula model for high dimensional dependence allows us to gain some insights into the structure of the dependence between this large collection of assets, and also provides improved estimates of measures of systemic risk.

[ INSERT TABLE 6 ABOUT HERE ]

5 Conclusion

While there are numerous bivariate copula specifications for applied researchers to use, there are very few copula models for high dimension applications. This paper proposes new models for the copula of economic variables based on a latent factor structure, which is particularly attractive for high dimensional applications. This class of models allows the researcher to increase or decrease the flexibility of the model according to the amount of data available and the dimension of the problem, and, importantly, to do so in a manner that is easily interpreted and understood.
The factor copulas presented in this paper do not generally have a closed-form likelihood, but we use extreme value theory to obtain new analytical results on their implied tail dependence, and we verify that simulation-based methods can reliably be used for estimation and specification testing in applications involving up to 100 variables.

Footnote 11. We choose this value of $k$ so that the number of identified "crisis" days is broadly comparable to the number of such days for MES. Results for alternative values of $k$ are similar.

We employ our proposed factor copulas to study daily returns on all 100 constituents of the S&P 100 index over the period 2008-2010, and find significant evidence of a skewed, fat-tailed common factor, which generates asymmetric dependence and tail dependence. Using a multi-factor copula, we find evidence of the importance of industry factors, which generates heterogeneous dependence. We also consider an application to the estimation of systemic risk, and we show that the proposed factor copula model provides superior estimates of two measures of systemic risk. An interesting avenue for future research is to compare the various recently-proposed methods for modelling high-dimensional dependence, such as Aas et al. (2009), Christoffersen, et al. (2013), Krupskii and Joe (2013), Creal and Tsay (2014), and this paper, in terms of both statistical fit and various economic measures of fit.

Appendix: Choice of dependence measures for estimation

To implement the SMM estimator of these copula models we must first choose which dependence measures to use in the SMM estimation. We draw on "pure" measures of dependence, in the sense that they are solely affected by changes in the copula, and not by changes in the marginal distributions. For examples of such measures, see Joe (1997, Chapter 2) or Nelsen (2006, Chapter 5). Our preliminary studies of estimation accuracy and identification lead us to use pair-wise rank correlation, and quantile dependence with q = [0.05, 0.10, 0.90, 0.95], giving us five dependence measures for each pair of variables.

Let $\delta_{ij}$ denote one of the dependence measures (i.e., rank correlation or quantile dependence at different levels of q) between variables i and j, and define the "pair-wise dependence matrix":

$$
D = \begin{bmatrix}
1 & \delta_{12} & \cdots & \delta_{1N} \\
\delta_{12} & 1 & \cdots & \delta_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{1N} & \delta_{2N} & \cdots & 1
\end{bmatrix}
\tag{22}
$$

Where applicable, we exploit the (block) equidependence feature of the models in defining the "moments" to match. The model in Section 4.1 implies equidependence, so we use as "moments" the average of each of these five dependence measures across all pairs, reducing the number of moments to match from 5N(N-1)/2 to just 5:

$$
\bar{\delta} \equiv \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\delta}_{ij}
\tag{23}
$$

For the multi-factor copula model in Section 4.2, we exploit the fact that (a) all variables in the same group exhibit equidependence, and (b) any pair of variables (i, j) in groups (r, s) has the same dependence as any other pair (i', j') in the same two groups (r, s). This allows us to average all intra- and inter-group dependence measures.
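The pair-wise measures and the equidependence average in equation (23) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the quantile dependence definition used (joint tail probability divided by q in the lower tail, or by 1-q in the upper tail) is the conventional one and is assumed here, and the data are simulated from a simple Gaussian one-factor model purely for demonstration.

```python
import numpy as np

def to_uniform_ranks(x):
    """Map each column of x (T x N) to scaled ranks in (0, 1); assumes no ties."""
    T = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..T per column
    return ranks / (T + 1.0)

def rank_correlation_matrix(u):
    """Spearman rank correlation: linear correlation of the (scaled) ranks."""
    return np.corrcoef(u, rowvar=False)

def quantile_dependence_matrix(u, q):
    """Pair-wise quantile dependence at level q (assumed conventional definition)."""
    if q <= 0.5:
        ind, denom = (u <= q).astype(float), q        # joint lower-tail events
    else:
        ind, denom = (u > q).astype(float), 1.0 - q   # joint upper-tail events
    joint = ind.T @ ind / u.shape[0]                  # estimated joint probabilities
    return joint / denom

def average_over_pairs(d):
    """Equation (23): mean of the N(N-1)/2 entries above the diagonal."""
    iu = np.triu_indices(d.shape[0], k=1)
    return d[iu].mean()

def equidependence_moments(x, qs=(0.05, 0.10, 0.90, 0.95)):
    """Five averaged dependence measures: rank correlation plus four quantile levels."""
    u = to_uniform_ranks(x)
    mats = [rank_correlation_matrix(u)]
    mats += [quantile_dependence_matrix(u, q) for q in qs]
    return np.array([average_over_pairs(d) for d in mats])

# Illustrative data: Gaussian one-factor model with pair-wise linear correlation 0.5
rng = np.random.default_rng(0)
T, N = 2000, 10
x = rng.standard_normal((T, 1)) + rng.standard_normal((T, N))
moments = equidependence_moments(x)
print(moments)
```

The five entries of `moments` play the role of the averaged sample dependence measures that the SMM estimator would match against their simulated counterparts.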
Consider the following general design, where we have N variables, M groups, and $k_m$ variables in group m, where $\sum_{m=1}^{M} k_m = N$. Then decompose the (N × N) matrix D into sub-matrices according to the groups:

$$
\underset{(N \times N)}{D} = \begin{bmatrix}
D_{11} & D_{12}' & \cdots & D_{1M}' \\
D_{12} & D_{22} & \cdots & D_{2M}' \\
\vdots & \vdots & \ddots & \vdots \\
D_{1M} & D_{2M} & \cdots & D_{MM}
\end{bmatrix},
\quad \text{where } D_{ij} \text{ is } (k_i \times k_j)
\tag{24}
$$

Then create a matrix of average values from each of these matrices, taking into account the fact that the diagonal blocks are symmetric:

$$
\underset{(M \times M)}{D^{*}} = \begin{bmatrix}
\delta^{*}_{11} & \delta^{*}_{12} & \cdots & \delta^{*}_{1M} \\
\delta^{*}_{12} & \delta^{*}_{22} & \cdots & \delta^{*}_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
\delta^{*}_{1M} & \delta^{*}_{2M} & \cdots & \delta^{*}_{MM}
\end{bmatrix}
\tag{25}
$$

where

$$
\delta^{*}_{ss} \equiv \frac{2}{k_s (k_s - 1)} \sum \sum \hat{\delta}_{ij}, \quad \text{the average of all upper triangle values in } D_{ss},
$$

$$
\delta^{*}_{rs} = \frac{1}{k_r k_s} \sum \sum \hat{\delta}_{ij}, \quad \text{the average of all elements in } D_{rs}, \; r \neq s.
$$

Finally, create the vector of average measures $[\bar{\delta}^{*}_{1}, \ldots, \bar{\delta}^{*}_{M}]$, where

$$
\bar{\delta}^{*}_{i} \equiv \frac{1}{M} \sum_{j=1}^{M} \delta^{*}_{ij}
\tag{26}
$$

This gives us a total of M moments for each dependence measure, so 5M in total.
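The block-averaging steps in equations (24) to (26) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `block_average_moments` and the toy five-variable, two-group dependence matrix are hypothetical, and every group is assumed to contain at least two variables so that the intra-group average is defined.

```python
import numpy as np

def block_average_moments(d, groups):
    """Average a symmetric (N x N) dependence matrix d within and between groups.

    groups: length-N array of group labels. Returns the (M x M) matrix of block
    averages (equation 25) and the length-M vector of its row means (equation 26).
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    M = labels.size
    d_star = np.empty((M, M))
    for r in range(M):
        idx_r = np.flatnonzero(groups == labels[r])
        for s in range(M):
            idx_s = np.flatnonzero(groups == labels[s])
            block = d[np.ix_(idx_r, idx_s)]
            if r == s:
                # Diagonal block is symmetric: use only entries above its diagonal
                iu = np.triu_indices(idx_r.size, k=1)
                d_star[r, s] = block[iu].mean()
            else:
                # Off-diagonal block: average over all k_r * k_s entries
                d_star[r, s] = block.mean()
    delta_bar = d_star.mean(axis=1)
    return d_star, delta_bar

# Toy example: 5 variables in 2 groups, with exact block equidependence
groups = np.array([0, 0, 1, 1, 1])
within = np.array([0.6, 0.5])   # hypothetical intra-group dependence
cross = 0.2                     # hypothetical inter-group dependence
d = np.where(groups[:, None] == groups[None, :], within[groups][:, None], cross)
np.fill_diagonal(d, 1.0)

d_star, delta_bar = block_average_moments(d, groups)
print(d_star)
print(delta_bar)
```

Applying the function to each of the five dependence matrices in turn would yield the 5M moments described above.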

References

[1] Aas, K., C. Czado, A. Frigessi and H. Bakken, 2009, Pair-copula constructions of multiple dependence, Insurance: Mathematics and Economics, 44, 182-198.
[2] Acar, E.F., C. Genest and J. Nešlehová, 2012, Beyond simplified pair-copula constructions, Journal of Multivariate Analysis, 110, 74-90.
[3] Acharya, V.V., T. Cooley, M. Richardson and I. Walter, 2010, Regulating Wall Street: The Dodd-Frank Act and the New Architecture of Global Finance, John Wiley & Sons.
[4] Almeida, C., C. Czado, and H. Manner, 2012, Modeling high dimensional time-varying dependence using D-vine SCAR models, working paper, http://arxiv.org/pdf/1202.2008v1.pdf.
[5] Andersen, L. and J. Sidenius, 2004, Extensions to the Gaussian copula: random recovery and random factor loadings, Journal of Credit Risk, 1, 29-70.
[6] Bai, J. and S. Ng, 2002, Determining the number of factors in approximate factor models, Econometrica, 70(1), 191-221.
[7] Barndorff-Nielsen, O.E., 1978, Hyperbolic distributions and distributions on hyperbolae, Scandinavian Journal of Statistics, 5, 151-157.
[8] Barndorff-Nielsen, O.E., 1997, Normal inverse Gaussian distributions and stochastic volatility modelling, Scandinavian Journal of Statistics, 24, 1-13.
[9] Bingham, N.H., C.M. Goldie and J.L. Teugels, 1987, Regular Variation, Cambridge University Press, Cambridge.
[10] Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, 31, 307-327.
[11] Bollerslev, T., 1990, Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH approach, Review of Economics and Statistics, 72, 498-505.
[12] Brownlees, C.T. and R.F. Engle, 2011, Volatility, Correlation and Tails for Systemic Risk Measurement, working paper, Stern School of Business, New York University.
[13] Cattell, R.B., 1966, The scree test for the number of factors, Multivariate Behavioral Research, 1, 245-276.
[14] Chamberlain, G. and M. Rothschild, 1983, Arbitrage, factor structure, and mean-variance analysis on large asset markets, Econometrica, 51(5), 1281-1304.
[15] Chen, X. and Y. Fan, 2006, Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification, Journal of Econometrics, 135, 125-154.
[16] Cherubini, U., E. Luciano and W. Vecchiato, 2004, Copula Methods in Finance, John Wiley & Sons, England.

[17] Christoffersen, P., V. Errunza, K. Jacobs and H. Langlois, 2012, Is the potential for international diversification disappearing?, Review of Financial Studies, 25, 3711-3751.
[18] Christoffersen, P., K. Jacobs, X. Jin and H. Langlois, 2013, Dynamic Dependence in Corporate Credit, working paper, Bauer College of Business, University of Houston.
[19] Cook, R.D. and M.E. Johnson, 1981, A family of distributions for modelling non-elliptically symmetric multivariate data, Journal of the Royal Statistical Society, 43, 210-218.
[20] Coval, J., J. Jurek and E. Stafford, 2009, The economics of structured finance, Journal of Economic Perspectives, 23, 3-25.
[21] Creal, D.D. and R.S. Tsay, 2014, High Dimensional Dynamic Stochastic Copula Models, Journal of Econometrics, forthcoming.
[22] Daul, S., E. De Giorgi, F. Lindskog and A. McNeil, 2003, The grouped t-copula with an application to credit risk, RISK, 16, 73-76.
[23] Duffie, D., A. Eckner, G. Horel and L. Saita, 2009, Frailty correlated default, Journal of Finance, 64(5), 2089-2123.
[24] Demarta, S. and A.J. McNeil, 2005, The t copula and related copulas, International Statistical Review, 73, 111-129.
[25] Embrechts, P., C. Klüppelberg and T. Mikosch, 1997, Modelling Extremal Events, Springer-Verlag, Berlin.
[26] Engle, R.F. and B. Kelly, 2012, Dynamic Equicorrelation, Journal of Business and Economic Statistics, 30(2), 212-228.
[27] Engle, R.F., N. Shephard and K. Sheppard, 2008, Fitting vast dimensional time-varying covariance models, working paper, Oxford-Man Institute, University of Oxford.
[28] Fan, J., Y. Fan and J. Lv, 2008, High dimensional covariance matrix estimation using a factor model, Journal of Econometrics, 147, 186-197.
[29] Fan, J., Y. Li and Y. Ke, 2012, Vast volatility matrix estimation using high frequency data for portfolio selection, Journal of American Statistical Association, 107, 412-428.
[30] Fang, H., B. Fang and S. Kotz, 2002, The meta-elliptical distributions with given marginals, Journal of Multivariate Analysis, 82, 1-16.
[31] Feller, W., 1970, An Introduction to Probability Theory and Its Applications, Volume II, John Wiley & Sons, USA.
[32] Geluk, J.L., L. de Haan and C.G. de Vries, 2007, Weak and Strong Financial Fragility, Tinbergen Institute Discussion Paper TI 2007-023/2.
[33] Glosten, L.R., R. Jagannathan, D.E. Runkle, 1993, On the relation between the expected value and the volatility of the nominal excess return on stocks, Journal of Finance, 48(5), 1779-1801.

[34] Gouriéroux, C. and A. Monfort, 1996, Statistics and Econometric Models, Volume 2, translated from the French by Q. Vuong, Cambridge University Press, Cambridge.
[35] Hansen, B.E., 1994, Autoregressive conditional density estimation, International Economic Review, 35(3), 705-730.
[36] Hautsch, N., L.M. Kyj and R.C.A. Oomen, 2012, A blocking and regularization approach to high dimensional realized covariance estimation, Journal of Applied Econometrics, 27, 625-645.
[37] Hartmann, P., S. Straetmans and C. de Vries, 2006, Banking system stability: A cross-atlantic perspective, in M. Carey and R.M. Stulz (eds), The Risks of Financial Institutions, University of Chicago Press, Chicago.
[38] Hull, J. and A. White, 2004, Valuation of a CDO and an nth to default CDS without Monte Carlo simulation, Journal of Derivatives, 12, 8-23.
[39] Klüppelberg, C. and G. Kuhn, 2009, Copula structure analysis, Journal of the Royal Statistical Society, Series B, 71(3), 737-753.
[40] Krupskii, P. and H. Joe, 2013, Factor copula models for multivariate data, Journal of Multivariate Analysis, 120, 85-101.
[41] Joe, H., 1997, Multivariate Models and Dependence Concepts, Monographs in Statistics and Probability 73, Chapman and Hall, London.
[42] Judd, K.L., 1998, Numerical Methods in Economics, MIT Press, Cambridge, USA.
[43] Laurent, J. and J. Gregory, 2005, Basket default swaps, CDOs and factor copulas, Journal of Risk, 7(4), 1-20.
[44] Li, D.X., 2000, On default correlation: A copula function approach, Journal of Fixed Income, 9, 43-54.
[45] Marshall, A.W. and I. Olkin, 1988, Families of multivariate distributions, Journal of the American Statistical Association, 83, 834-841.
[46] McNeil, A.J., R. Frey and P. Embrechts, 2005, Quantitative Risk Management, Princeton University Press, New Jersey.
[47] Min, A. and C. Czado, 2010, Bayesian inference for multivariate copulas using pair-copula constructions, Journal of Financial Econometrics, 8, 511-546.
[48] Nelsen, R.B., 2006, An Introduction to Copulas, Second Edition, Springer, U.S.A.
[49] Nelson, D.B., 1991, Conditional heteroskedasticity in asset returns: A new approach, Econometrica, 59, 347-370.
[50] Oh, D.H. and A.J. Patton, 2013, Simulated method of moments estimation for copula-based multivariate models, Journal of the American Statistical Association, 108, 689-700.
[51] Palm, F.C. and J.-P. Urbain, 2011, Factor structures for panel and multivariate time series data, Journal of Econometrics, 163(1), 1-3.

[52] Patton, A.J., 2006, Modelling asymmetric exchange rate dependence, International Economic Review, 47(2), 527-556.
[53] Patton, A.J., 2012, Copula methods for forecasting multivariate time series, in G. Elliott and A. Timmermann (eds.), Handbook of Economic Forecasting, Volume 2, Elsevier, Oxford.
[54] Rémillard, B., 2010, Goodness-of-fit tests for copulas of multivariate time series, working paper.
[55] Rogge, E. and P.J. Schönbucher, 2003, Modelling Dynamic Portfolio Credit Risk, NCCR FINRISK Working paper No. 112.
[56] Smith, M., Q. Gan and R. Kohn, 2012, Modeling dependence using skew t copulas: Bayesian inference and applications, Journal of Applied Econometrics, 27, 500-522.
[57] van der Voort, M., 2005, Factor copulas: totally external defaults, working paper, Erasmus University Rotterdam.
[58] White, H., 1994, Estimation, Inference and Specification Analysis, Econometric Society Monographs No. 22, Cambridge University Press, Cambridge, U.K.
[59] Zimmer, D.M., 2012, The role of copulas in the housing crisis, Review of Economics and Statistics, 94, 607-620.

Table 1: Stocks used in the empirical analysis

Ticker  Name             SIC    Ticker  Name             SIC    Ticker  Name              SIC
AA      Alcoa            333    EXC     Exelon           493    NKE     Nike              302
AAPL    Apple            357    F       Ford             371    NOV     National Oilwell  353
ABT     Abbott Lab.      283    FCX     Freeport         104    NSC     Norfolk Sth       671
AEP     American Elec    491    FDX     Fedex            451    NWSA    News Corp         271
ALL     Allstate Corp    633    GD      GeneralDynam     373    NYX     NYSE Euronxt      623
AMGN    Amgen Inc.       283    GE      General Elec     351    ORCL    Oracle            737
AMZN    Amazon.com       737    GILD    GileadScience    283    OXY     OccidentalPetrol  131
AVP     Avon             284    GOOG    Google Inc       737    PEP     Pepsi             208
AXP     American Ex      671    GS      GoldmanSachs     621    PFE     Pfizer            283
BA      Boeing           372    HAL     Halliburton      138    PG      Procter&Gamble    284
BAC     Bank of Am       602    HD      Home Depot       525    QCOM    Qualcomm Inc      366
BAX     Baxter           384    HNZ     Heinz            203    RF      Regions Fin       602
BHI     Baker Hughes     138    HON     Honeywell        372    RTN     Raytheon          381
BK      Bank of NY       602    HPQ     HP               357    S       Sprint            481
BMY     Bristol-Myers    283    IBM     IBM              357    SLB     Schlumberger      138
BRK     Berkshire Hath   633    INTC    Intel            367    SLE     Sara Lee Corp.    203
C       Citi Group       602    JNJ     Johnson&J.       283    SO      Southern Co.      491
CAT     Caterpillar      353    JPM     JP Morgan        672    T       AT&T              481
CL      Colgate          284    KFT     Kraft            209    TGT     Target            533
CMCSA   Comcast          484    KO      Coca Cola        208    TWX     Time Warner       737
COF     Capital One      614    LMT     Lock'dMartn      376    TXN     Texas Inst        367
COP     Conocophillips   291    LOW     Lowe's           521    UNH     UnitedHealth      632
COST    Costco           533    MA      Master card      615    UPS     United Parcel     451
CPB     Campbell         203    MCD     McDonald         581    USB     US Bancorp        602
CSCO    Cisco            367    MDT     Medtronic        384    UTX     United Tech       372
CVS     CVS              591    MET     Metlife Inc.     671    VZ      Verizon           481
CVX     Chevron          291    MMM     3M               384    WAG     Walgreen          591
DD      DuPont           289    MO      Altria Group     211    WFC     Wells Fargo       602
DELL    Dell             357    PM      Philip Morris    211    WMB     Williams          492
DIS     Walt Disney      799    MON     Monsanto         287    WMT     WalMart           533
DOW     Dow Chem         282    MRK     Merck            283    WY      Weyerhaeuser      241
DVN     Devon Energy     131    MS      MorganStanley    671    XOM     Exxon             291
EMC     EMC              357    MSFT    Microsoft        737    XRX     Xerox             357
ETR     ENTERGY          491

        Description          Num            Description     Num
SIC 1   Mining, construct.     6    SIC 5   Trade             8
SIC 2   Manuf: food, furn.    26    SIC 6   Finance, Ins     18
SIC 3   Manuf: elec, mach     25    SIC 7   Services          6
SIC 4   Transprt, comm's      11    ALL                     100

Notes: This table presents the ticker symbols, names and 3-digit SIC codes of the 100 stocks used in the empirical analysis of this paper. The lower panel reports the number of stocks in each 1-digit SIC group.

Table 2: Summary statistics

                                        Cross-sectional distribution
                                   Mean       5%       25%    Median     75%      95%
Mean                             0.0004  -0.0003   0.0001   0.0003   0.0006   0.0013
Std dev                          0.0287   0.0153   0.0203   0.0250   0.0341   0.0532
Skewness                         0.3458  -0.4496  -0.0206   0.3382   0.6841   1.2389
Kurtosis                        11.3839   5.9073   7.5957   9.1653  11.4489  19.5939

$\phi_0$                         0.0004  -0.0004   0.0001   0.0004   0.0006   0.0013
$\phi_1$                        -0.0345  -0.2045  -0.0932  -0.0238   0.0364   0.0923
$\phi_m$                        -0.0572  -0.2476  -0.1468  -0.0719   0.0063   0.1392
$\omega \times 1000$             0.0126   0.0024   0.0050   0.0084   0.0176   0.0409
$\beta$                          0.8836   0.7983   0.8639   0.8948   0.9180   0.9436
$\alpha$                         0.0240   0.0000   0.0000   0.0096   0.0354   0.0884
$\gamma$                         0.0593   0.0000   0.0017   0.0396   0.0928   0.1628
$\alpha_m$                       0.0157   0.0000   0.0000   0.0000   0.0015   0.0646
$\gamma_m$                       0.1350   0.0000   0.0571   0.0975   0.1577   0.3787

$\rho$                           0.4155   0.2643   0.3424   0.4070   0.4749   0.5993
$\rho_s$                         0.4376   0.2907   0.3690   0.4292   0.4975   0.6143
$(\tau_{0.99} + \tau_{0.01})/2$  0.0572   0.0000   0.0000   0.0718   0.0718   0.1437
$(\tau_{0.90} - \tau_{0.10})$   -0.0922  -0.2011  -0.1293  -0.0862  -0.0431   0.0144

Notes: This table presents some summary statistics of the daily equity returns data used in the empirical analysis. The top panel presents simple unconditional moments of the daily return series. The second panel presents summaries of the estimated AR(1)-GJR-GARCH(1,1) models estimated on these returns. The lower panel presents linear correlation, rank correlation, average 1% upper and lower tail dependence, and the difference between the 10% tail dependence measures, computed using the standardized residuals from the estimated AR-GJR-GARCH model. The columns present the mean and quantiles from the cross-sectional distribution of the measures listed in the rows. The top two panels present summaries across the N = 100 marginal distributions, while the lower panel presents a summary across the N(N-1)/2 = 4950 distinct pairs of stocks.

Table 3: Estimation results for daily returns on S&P 100 stocks

                        $\beta$            $\nu^{-1}$           $\lambda$        $Q_{SMM}$   p-val
                     Est     Std Err     Est     Std Err     Est      Std Err
Clayton†           0.6017    0.0345       -        -          -         -        0.0449    0.0000
Normal             0.9534    0.0311       -        -          -         -        0.0090    0.0000
Student's t        0.9268    0.0296    0.0272    0.0292       -         -        0.0119    0.0000
Skew t             0.8196    0.0557    0.0532    0.0133    -8.3015    4.0202     0.0010    0.0020
Factor t-N         0.9475    0.0293    0.0233    0.0325       -         -        0.0101    0.0000
Factor skew t-N    0.9463    0.0299    0.0432    0.0339    -0.2452    0.0567     0.0008    0.0002
Factor t-t         0.9503    0.0311    0.0142    0.0517       -         -        0.0098    0.0000
Factor skew t-t    0.9375    0.0314    0.0797    0.0486    -0.2254    0.0515     0.0007    0.0005

Notes: This table presents estimation results for various copula models applied to 100 daily stock returns over the period April 2008 to December 2010. Estimates and asymptotic standard errors for the copula model parameters are presented, as well as the value of the SMM objective function at the estimated parameters and the p-value of the overidentifying restriction test. Note that the parameter $\lambda$ lies in $(-1, 1)$ for the factor copula models, but in $(-\infty, \infty)$ for the Skew t copula; in all cases the copula is symmetric when $\lambda = 0$. †Note that the parameter of the Clayton copula is not $\beta$ but we report it in this column for simplicity.

Table 4: Estimation results for daily returns on S&P 100 stocks for multi-factor copula models

                    Normal            Student's t           Skew t            Factor t-t        Factor skew t-t
                 Est    Std Err     Est    Std Err     Est     Std Err     Est    Std Err     Est     Std Err
$\nu^{-1}$        -        -      0.0728   0.0269    0.0488    0.0069    0.0663   0.0472    0.0992    0.0455
$\lambda$         -        -        -        -      -9.6597    1.0860      -        -      -0.2223    0.0550
$\beta_1$      1.3027   0.0809    1.2773   0.0754    1.1031    0.1140    1.2936   0.0800    1.2457    0.0839
$\beta_2$      0.8916   0.0376    0.8305   0.0386    0.7343    0.0733    0.8528   0.0362    0.8847    0.0392
$\beta_3$      0.9731   0.0363    0.9839   0.0380    0.9125    0.0638    1.0223   0.0389    1.0320    0.0371
$\beta_4$      0.9426   0.0386    0.8751   0.0367    0.7939    0.0715    0.9064   0.0375    0.9063    0.0415
$\beta_5$      1.0159   0.0555    0.9176   0.0523    0.8171    0.0816    0.9715   0.0527    0.9419    0.0551
$\beta_6$      1.1018   0.0441    1.0573   0.0435    0.9535    0.0733    1.0850   0.0441    1.0655    0.0457
$\beta_7$      1.0954   0.0574    1.0912   0.0564    1.0546    0.0839    1.1057   0.0535    1.1208    0.0601
$\gamma_1$     1.0339   0.0548    0.9636   0.0603    1.0292    0.0592    1.0566   0.0595    1.0892    0.0582
$\gamma_2$     0.4318   0.0144    0.3196   0.0388    0.3474    0.0411    0.3325   0.0215    0.2201    0.0457
$\gamma_3$     0.4126   0.0195    0.3323   0.0394    0.2422    0.0458    0.2147   0.0727    0.1701    0.0996
$\gamma_4$     0.4077   0.0235    0.3726   0.0328    0.3316    0.0289    0.3470   0.0273    0.2740    0.0495
$\gamma_5$     0.4465   0.0403    0.5851   0.0300    0.5160    0.0324    0.5214   0.0312    0.5459    0.0448
$\gamma_6$     0.6122   0.0282    0.5852   0.0351    0.5581    0.0286    0.5252   0.0274    0.5686    0.0407
$\gamma_7$     0.5656   0.0380    0.5684   0.0464    0.2353    0.0889    0.3676   0.0434    0.3934    0.0548

$Q_{SMM}$                                   0.1587    0.1567    0.0266    0.1391    0.0189
J p-value                                   0.0000    0.0000    0.0437    0.0000    0.0722
$\gamma_i = 0 \;\forall i$                  0.0000    0.0000    0.0000    0.0000    0.0000
$\beta_i = 0 \;\forall i$                   0.0000    0.0000    0.0000    0.0000    0.0000
$\beta_i = \beta_j, \gamma_i = \gamma_j \;\forall i,j$   0.0000    0.0000    0.0000    0.0000    0.0000
(columns in the lower block: Normal, Student's t, Skew t, Factor t-t, Factor skew t-t)

Notes: This table presents estimation results for various multi-factor copula models applied to filtered daily returns on collections of 100 stocks over the period April 2008 to December 2010. Estimates and asymptotic standard errors for the model parameters are presented. Note that the parameter $\lambda$ lies in $(-1, 1)$ for the factor copula models, but in $(-\infty, \infty)$ for the Skew t copula; in all cases the copula is symmetric when $\lambda = 0$. The bottom three rows present p-values from tests of constraints on the coefficients on the factors.

Table 5: Rank correlation and tail dependence implied by a multi-factor copula model

Rank correlation
         SIC 1   SIC 2   SIC 3   SIC 4   SIC 5   SIC 6   SIC 7
SIC 1    0.72
SIC 2    0.41    0.44
SIC 3    0.44    0.45    0.51
SIC 4    0.41    0.42    0.45    0.46
SIC 5    0.39    0.40    0.44    0.41    0.53
SIC 6    0.42    0.43    0.47    0.43    0.42    0.58
SIC 7    0.45    0.46    0.50    0.46    0.44    0.47    0.57

Lower \ Upper tail dependence (diagonal cells show lower \ upper)
         SIC 1        SIC 2        SIC 3        SIC 4        SIC 5        SIC 6        SIC 7
SIC 1    0.99 \ 0.74  0.02         0.07         0.02         0.03         0.09         0.13
SIC 2    0.70         0.70 \ 0.02  0.02         0.02         0.02         0.02         0.02
SIC 3    0.92         0.70         0.92 \ 0.07  0.02         0.03         0.07         0.07
SIC 4    0.75         0.70         0.75         0.75 \ 0.02  0.02         0.02         0.02
SIC 5    0.81         0.70         0.81         0.75         0.81 \ 0.03  0.03         0.03
SIC 6    0.94         0.70         0.92         0.75         0.81         0.94 \ 0.09  0.09
SIC 7    0.96         0.70         0.92         0.75         0.81         0.94         0.96 \ 0.14

Notes: This table presents the dependence measures implied by the estimated skew t-t factor copula model reported in Table 9. This model implies a block equidependence structure based on the industry to which a stock belongs, and the results are presented with intra-industry dependence in the diagonal elements, and cross-industry dependence in the off-diagonal elements. The top panel presents rank correlation coefficients based on 50,000 simulations from the estimated model. The bottom panel presents the theoretical upper tail dependence coefficients (upper triangle) and lower tail dependence coefficients (lower triangle) based on Propositions 2 and 3.
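As a rough cross-check on the rank correlations above, one can use a Gaussian simplification instead of simulation. The sketch below is not the authors' procedure (they use 50,000 simulations from the estimated skew t-t model). It assumes a linear structure in which stock i in industry S(i) loads with coefficient $\beta_{S(i)}$ on a market factor and $\gamma_{S(i)}$ on its industry factor, plus an idiosyncratic term, with all three components treated as independent standard Normal variables, and it uses the Gaussian relation between linear and rank correlation, $\rho_S = (6/\pi)\arcsin(\rho/2)$. The loadings are the Factor skew t-t estimates reported in Table 4; the function name is hypothetical.

```python
import math

# Factor skew t-t loadings for SIC groups 1..7, as reported in Table 4
beta = [1.2457, 0.8847, 1.0320, 0.9063, 0.9419, 1.0655, 1.1208]   # market factor
gamma = [1.0892, 0.2201, 0.1701, 0.2740, 0.5459, 0.5686, 0.3934]  # industry factor

def gaussian_rank_corr(r, s):
    """Approximate rank correlation between two distinct stocks in groups r and s.

    Assumption: unit-variance, independent Normal factors and idiosyncratic terms.
    Groups are indexed from 0, so r = 0 corresponds to SIC 1.
    """
    # Stocks share the market factor; they share an industry factor only if r == s
    cov = beta[r] * beta[s] + (gamma[r] ** 2 if r == s else 0.0)
    var_r = beta[r] ** 2 + gamma[r] ** 2 + 1.0
    var_s = beta[s] ** 2 + gamma[s] ** 2 + 1.0
    rho = cov / math.sqrt(var_r * var_s)
    # Gaussian link between linear and Spearman rank correlation
    return 6.0 / math.pi * math.asin(rho / 2.0)

within_sic1 = gaussian_rank_corr(0, 0)   # two different stocks, both in SIC 1
sic1_vs_sic2 = gaussian_rank_corr(0, 1)  # one stock in SIC 1, one in SIC 2
print(round(within_sic1, 3), round(sic1_vs_sic2, 3))
```

Under these simplifying assumptions the two values land within about 0.01 of the corresponding Table 5 entries (0.72 and 0.41), even though the approximation ignores the skewness and fat tails of the estimated model.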

Table 6: Performance of methods for predicting systemic risk

                                    MSE                  RelMSE
Cut-off                        -2%       -4%         -2%       -4%

Marginal Expected Shortfall (MES)
Brownlees-Engle              0.9961*   1.2023*     0.7169    0.3521
Historical                   1.1479    1.6230      1.0308    0.4897
CAPM                         1.1532    1.5547      0.9107    0.4623
Normal copula                1.0096    1.2521      0.6712    0.3420
t copula                     1.0118    1.2580      0.6660    0.3325
Skew t copula                1.0051    1.2553      0.6030    0.3040
Skew t-t factor copula       1.0012    1.2445      0.5885*   0.2954*

k-Expected Shortfall (kES)
Historical                   1.1632    1.6258      1.4467    0.7653
Normal copula                1.0885    1.4855      1.3220    0.5994
t copula                     1.0956    1.4921      1.4496    0.6372
Skew t copula                1.0898    1.4923      1.3370    0.5706
Skew t-t factor copula       1.0822*   1.4850*     1.1922*   0.5204*

Notes: This table presents the MSE (left panel) and Relative MSE (right panel) for various methods of estimating measures of systemic risk. The top panel presents results for marginal expected shortfall (MES), defined in equation (19), and the lower panel presents results for k-expected shortfall (kES), defined in equation (20), with k set to 30. Two thresholds are considered, C = -2% and C = -4%. There are 70 and 21 "event" days for MES under these two thresholds, and 116 and 36 "event" days for kES. The best-performing model for each threshold and performance metric is marked with an asterisk.
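The two performance metrics can be illustrated with a small sketch. The exact definitions are given earlier in the paper and are not reproduced in this section, so the forms used here are assumptions: MSE is taken as the mean squared difference between the realized return and the predicted expected shortfall on "event" days (days on which the market return falls below the cut-off C), and Relative MSE scales each error by the prediction before squaring. The function name and the numbers are hypothetical.

```python
import numpy as np

def shortfall_metrics(realized, predicted, market, cutoff):
    """MSE and Relative MSE over event days (assumed definitions, see lead-in).

    realized:  realized returns of the stock on each day
    predicted: predicted expected shortfall for each day
    market:    market returns, used only to identify event days
    cutoff:    threshold C; a day is an event day when market < cutoff
    """
    realized = np.asarray(realized, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    event = np.asarray(market, dtype=float) < cutoff
    err = realized[event] - predicted[event]
    mse = np.mean(err ** 2)
    rel_mse = np.mean((err / predicted[event]) ** 2)
    return mse, rel_mse, int(event.sum())

# Hypothetical returns in percent; two of the three days are event days at C = -2
realized = [-3.0, -1.0, -5.0]
predicted = [-2.5, -2.0, -4.0]
market = [-2.5, 0.5, -4.5]
mse, rel_mse, n_events = shortfall_metrics(realized, predicted, market, cutoff=-2.0)
print(mse, rel_mse, n_events)
```

Because Relative MSE divides by the prediction, it penalizes a given absolute error more heavily when the predicted shortfall is small, which is one reason the two metrics can rank methods differently.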

[Figure 1 image. Panels: Normal copula; t(4)-t(4) factor copula; Skew Norm-Norm factor copula; Skew t(4)-t(4) factor copula.]

Figure 1: Scatter plots from four bivariate distributions, all with N(0,1) margins and linear correlation of 0.5, constructed using four different factor copulas.

[Figure 2 image. Upper panel: "Expected number of remaining stocks that will crash, conditional on observing j crashes, q=1/66" (vertical axis: Number). Lower panel: "Expected proportion of remaining stocks that will crash, conditional on observing j crashes, q=1/66" (vertical axis: Proportion). Horizontal axis in both panels: Number of observed crashes (j). Legend: Skew t factor copula, Clayton copula, t copula, Normal copula.]

Figure 2: Conditional on observing j out of 100 stocks crashing, this figure presents the expected number (upper panel) and proportion (lower panel) of the remaining (100-j) stocks that will crash. "Crash" events are defined as returns in the lower 1/66 tail.
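A simulation-based version of the quantities plotted in Figure 2 can be sketched as follows. Several choices here are assumptions made for illustration: the conditioning event is taken to be "at least j stocks crash", so the expected number of remaining crashes is E[N* | N* >= j] - j, where N* is the number of crashes on a given day; the data come from a symmetric one-factor model with Student's t(4) common and idiosyncratic components rather than from any of the estimated copulas; and the function name is hypothetical.

```python
import numpy as np

def remaining_crash_curve(x, q):
    """Expected number and proportion of remaining crashes, for j = 0..N-1.

    x: (T x N) array of returns; a crash is a return at or below the
       stock's own empirical q-quantile.
    """
    T, N = x.shape
    thresholds = np.quantile(x, q, axis=0)      # one threshold per stock
    crashes = (x <= thresholds).sum(axis=1)     # number of crashes on each day
    expected_remaining = np.full(N, np.nan)
    for j in range(N):
        hit = crashes >= j                      # days with at least j crashes
        if hit.any():
            expected_remaining[j] = crashes[hit].mean() - j
    proportion = expected_remaining / (N - np.arange(N))
    return expected_remaining, proportion

# Illustrative data: fat-tailed one-factor model (an assumption, not the fitted copula)
rng = np.random.default_rng(42)
T, N = 5000, 100
x = rng.standard_t(4, size=(T, 1)) + rng.standard_t(4, size=(T, N))
expected_remaining, proportion = remaining_crash_curve(x, q=1 / 66)
print(expected_remaining[:5])
print(proportion[:5])
```

Entries remain NaN for values of j that never occur in the simulated sample, which mirrors the limited information available in the joint tails.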

"Scree" plots for four simulations K = 1 K = 2 3 3 s e u la 2 2 v n e g 1 1 iE 0 0 0 10 20 0 10 20 K = 4 K = 8 3 3 s e u la 2 2 v n e g 1 1 iE 0 0 0 10 20 0 10 20 Largest to smallest Largest to smallest Figure 3: Each panel of this (cid:133)gure shows the ordered eigenvalues of the sample rank correlation matrix from a 100-dimensional factor copula with K common factors. In all cases the (cid:133)rst eigenvalue is much larger than 3 and is cropped from the (cid:133)gure, and the horizontal axis is truncated at 28 for clarity. Eigenvalues of rank correlations of std resids of S&P 500 stock returns Eigenvalues 5 4 e u la v 3 n e g iE 2 1 0 0 5 10 15 20 25 30 35 Largest to smallest Figure 4: Plot of the ordered eigenvalues of the sample rank correlation matrix of the estimated standardized residuals. The largest eigenvalue is much larger than 5 and is truncated, and the horizontal axis is truncated at 38 for clarity. 40

Figure 5: Conditional on observing j out of 100 stocks crashing, this figure presents the proportion of the remaining (100-j) stocks that will crash. "Crash" events are defined as returns in the lower 1/22 (upper panel) and 1/66 (lower panel) tail. Note that the horizontal axes in these two panels are different, due to the limited information in the joint tails.
