feds · February 28, 2011

Cointegration Test with Stationary Covariates and the CDS-Bond Basis during the Financial Crisis

Abstract

This paper proposes a residual-based cointegration test with improved power. Building on the idea of Hansen (1995) and Elliott & Jansson (2003) in the unit root testing case, stationary covariates are used to improve the power of the residual-based Augmented Dickey-Fuller (ADF) test. The asymptotic null distribution contains nuisance parameters for which there is no obvious method of estimation; we therefore propose a bootstrap methodology to obtain test critical values. Local-to-unity asymptotics and Monte Carlo simulations are used to evaluate the power of the test in large and small samples, respectively. These exercises show that the addition of covariates increases power relative to the ADF and Johansen tests, and that the power depends on the long-run correlation between the covariates and the cointegration candidates. The new test is used to test for cointegration between Credit Default Swap (CDS) and corporate bond spreads for a panel of U.S. firms during the 2007-2009 financial crisis. Relative to the ADF and Johansen tests, the new test finds stronger evidence for cointegration between the two spreads, and for more firms.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Jason J. Wu and Aaron L. Game

2011-18

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Aaron L. Game (Federal Reserve Board) and Jason J. Wu∗ (Federal Reserve Board)

March 28, 2011

KEYWORDS: Cointegration, Stationary Covariates, Local Asymptotic Power, CDS Basis.

JEL Classifications: C12, C22, G12

∗ This article represents the views of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or other members of its staff. E-mails: aaron.l.game@frb.gov and jason.j.wu@frb.gov.

1 Introduction

Tests for cointegration are important tools for empirical macroeconomics and finance. Residual based tests for the null of no cointegration, pioneered by Engle & Granger (1987), have the advantages of computational ease and good small sample size properties. These tests involve running regressions and forming simple test statistics. However, residual based tests suffer from low power under the alternative hypothesis. Among other papers, this problem was highlighted by Pesavento (2004), who finds that while residual based tests have good size in most cases, their power disadvantage relative to system-based cointegration tests is significant. The goal of this paper is to construct a more powerful residual based cointegration test.

In empirical analysis, researchers often have data on variables other than the cointegration candidates. For instance, when testing for Purchasing Power Parity (PPP), time series for GDP and money growth rates are observed together with exchange rates and prices (see Amara & Papell (2006)). These variables, or covariates, may be helpful in uncovering cointegration relationships. The idea of this paper is to take advantage of these covariates in testing for cointegration.

The inclusion of stationary covariates has been shown to improve the power of tests under local-to-unity alternatives in the univariate setting. Hansen (1995) first proposed a unit root test where the leads and lags of stationary covariates are included in the inference. Elliott & Jansson (2003) provided point optimal unit root tests that include stationary covariates in the presence of deterministic trends. In the multivariate setting, Jansson (2004) shows that stationary covariates can be used to increase the power of tests with the null of cointegration. In addition, Seo (1998) shows that covariates significantly improve the power of Johansen rank tests, while Rahbek & Mosconi (1999) study the asymptotic implications of covariate inclusion.
We add to the work described above by including stationary covariates in the construction of the Augmented Dickey-Fuller (ADF) cointegration test. Intuitively, when stationary covariates related to the cointegration candidates are included in the residual regression, the parameters of the regression are more precisely estimated, resulting in a more powerful test. The new test is named the Covariate Augmented Dickey-Fuller (CADF) test.

The extent of the power improvement depends on the long-run correlations between the stationary covariates and the cointegration candidates. Asymptotic analysis shows that the local-to-unity power functions of the CADF test depend critically on these long-run correlations. Not surprisingly, when the covariates and cointegration candidates have zero long-run correlations, the power functions are the same as those of the ADF test.

Large sample Monte Carlo simulations are used to illustrate the asymptotic results, revealing two interesting facts. First, the power of the ADF test serves as the lower bound for the power of the CADF test in all experiments conducted. This means that asymptotically, the CADF test does at least as well as the ADF test. Second, the power of the CADF test is highest when the covariates are highly correlated with both the cointegration error and the right hand side variables in the cointegration relationship.

Deriving asymptotic critical values for the CADF test is difficult due to the presence of nuisance parameters in the asymptotic null distribution. As pointed out by Elliott & Pesavento (2009), there are no obvious ways to estimate the nuisance parameters. Therefore, we propose a bootstrap procedure to obtain critical values in finite samples. Small sample Monte Carlo simulations are conducted to assess the performance of the bootstrapped CADF tests under various cases of deterministic trends and various correlation scenarios. They show that the CADF test has reasonable size and good power in finite samples relative to not only the ADF test, but the Johansen test as well.

In an empirical application of the new test, we investigate whether there are cointegrating relationships between Credit Default Swap (CDS) spreads and corporate bond spreads for 24 US firms during the 2007-2009 financial crisis. Previous work (see, for instance, Blanco et al. (2005), Zhu (2006), De Wit (2006), Levin et al. (2005), Norden & Weber (2009)) establishes that cointegration between CDS and bond spreads holds for most firms during benign economic periods. However, it may be the case that the traditional cointegration tests used in these studies cannot as easily detect the same relationships during the recent crisis, due to the unprecedented levels of market volatility and uncertainty.
The CADF test allows us to partially control for such factors through the use of covariates such as the S&P 500 and VIX index returns and the Libor-OIS spread. Indeed, the CADF test finds that cointegration between CDS and bond spreads holds for most firms during the crisis. In comparison, the ADF and Johansen tests find cointegration for fewer firms.

The remainder of the paper is organized as follows: section 2 describes the model, assumptions, test statistic, and bootstrap inference. It also contains asymptotic analysis of the power of the CADF test. Section 3 investigates the power of the CADF test in large and small samples using simulations. Section 4 presents CADF tests for cointegration between CDS and bond spreads during the financial crisis, and section 5 concludes. The appendix contains mathematical proofs, tables and figures.

2 The CADF Test and Asymptotics

2.1 Model

Consider the following system:

$$Y_t = \mu_Y + \tau_Y t + \beta' X_t + \varepsilon_t \tag{1}$$

$$\begin{bmatrix} (1-\rho L)\varepsilon_t \\ \Delta X_t \\ Z_t \end{bmatrix} = \begin{bmatrix} 0 \\ \mu_X + \tau_X t \\ \mu_Z + \tau_Z t \end{bmatrix} + \xi_t(\rho) \tag{2}$$

where $\xi_t(\rho)$ is a vector partitioned conformably with the scalar $(1-\rho L)\varepsilon_t$ for $\rho \in [-1,1]$, $\Delta X_t$ of dimension $n$, and $Z_t$ of dimension $m$. $Y_t$ and $X_t$ are the candidates for cointegration. $Z_t$ are stationary covariates to be utilized in the CADF test. For brevity and in order to keep notation simple, theoretical work in this paper is based on the case of no deterministic components, i.e., $\mu_X, \mu_Y, \mu_Z$ and $\tau_X, \tau_Y, \tau_Z$ are set equal to zero. In section 3, extensive simulation evidence is presented on the performance of the proposed test when deterministic components are present. The hypothesis of interest is

$$H_0: \rho = 1 \qquad\qquad H_A: |\rho| < 1$$

$Y_t$ and $X_t$ are cointegrated under $H_A$, and $\beta$ is the cointegrating vector.

Assumptions 1. (Weak Convergence of $\xi_t(\rho)$)

1. $\{\xi_t(\rho)\}$ is a stationary process with zero mean, finite variance and continuous spectral density $f_\xi(\lambda)$, for $\lambda \in [0,\pi]$.

2. $\xi_0(\rho) = O_p(1)$.

3. For $r \in (0,1]$ and $[Tr]$ denoting the integer part of $Tr$, as $T \to \infty$

$$T^{-1/2} \sum_{t=1}^{[Tr]} \xi_t(\rho) \Rightarrow \Omega^{1/2} W(r)$$

where $W(r)$ is a $(1+n+m) \times 1$ standard vector Brownian motion, partitioned conformably into $(W_\varepsilon(r), W_X'(r), W_Z'(r))'$, and $\Omega$ is a positive definite long run variance-covariance matrix

$$\Omega \equiv \begin{bmatrix} \omega_{\varepsilon\varepsilon} & \omega_{\varepsilon X}' & \omega_{\varepsilon Z}' \\ \omega_{\varepsilon X} & \Omega_{XX} & \omega_{XZ}' \\ \omega_{\varepsilon Z} & \omega_{XZ} & \Omega_{ZZ} \end{bmatrix} \equiv 2\pi f_\xi(0)$$

and $\Rightarrow$ denotes weak convergence. Furthermore, assume that each element in the sigma-algebra $\sigma(\{\xi_t(\rho)\}_{t=1}^{\infty})$ is independent of $W(r)$.

Assumption 1.3 may be derived from more primitive assumptions (see, for instance, Phillips & Durlauf (1986), Phillips & Solo (1992), Phillips & Ouliaris (1990)). We impose, rather than derive, assumption 1.3 since it is now a standard result that holds under very general conditions. We also define an alternative decomposition of $\Omega$ that is useful in presenting the asymptotic results that follow:

$$\Omega \equiv \begin{bmatrix} \omega_{\varepsilon\varepsilon} & \omega_{\varepsilon Q}' \\ \omega_{\varepsilon Q} & \Omega_{QQ} \end{bmatrix}$$

where $\omega_{\varepsilon Q} = [\omega_{\varepsilon X}' \;\; \omega_{\varepsilon Z}']'$ and $\Omega_{QQ}$ is the long run variance matrix of $Q_t \equiv [\Delta X_t' \;\; Z_t']'$.

Assumptions 2. (Conditions for Deriving CADF Regression)

1. For $\delta > 0$ and $\lambda \in [0,\pi]$, $f_\xi(\lambda) \geq \delta I_{n+m}$.

2. Define $\Gamma(j) \equiv E(\xi_t(\rho)\xi_{t+j}'(\rho))$ and the following matrix norm: for a $g \times h$ matrix $A$, $\|A\| = \sup\{(x'A'Ax)^{1/2} : x \in \mathbb{R}^h, (x'x) < 1\}$. It is required that

$$\sum_{j=-\infty}^{\infty} \|\Gamma(j)\| < \infty$$

3. Define $R^2_{\varepsilon X} \equiv \omega_{\varepsilon\varepsilon}^{-1}\omega_{\varepsilon X}'\Omega_{XX}^{-1}\omega_{\varepsilon X}$ and $R^2_{\varepsilon Q} \equiv \omega_{\varepsilon\varepsilon}^{-1}\omega_{\varepsilon Q}'\Omega_{QQ}^{-1}\omega_{\varepsilon Q}$. It is required that $R^2_{\varepsilon X} < 1$ and $R^2_{\varepsilon Q} < 1$.

Assumption 2.1 bounds the spectral density of $\xi_t(\rho)$ away from zero, assumption 2.2 is the absolute summability of $\xi_t(\rho)$'s covariance function, guaranteeing limited serial dependence, and assumption 2.3 guarantees that the partial sums of the stationary covariates $\{Z_t\}$ are not cointegrated with either $Y_t$ or $X_t$. The assumptions are fairly weak as $Z_t$ is not required to be a vector autoregression along with $(Y_t, X_t')'$, nor does it have to be weakly exogenous. Furthermore, in the residual based framework distributional assumptions or conditional moment restrictions are not required. For these reasons, the CADF framework is more flexible than the powerful Johansen rank test of Seo (1998). With assumptions 2.1-2.2, we derive the CADF regression.

Proposition 1. (CADF Regression) Suppose data are generated by (1) and (2) and assumptions 1.1, 2.1 and 2.2 are satisfied. Then the following equation holds

$$\Delta\varepsilon_t = \theta_0 \varepsilon_{t-1} + \sum_{j=1}^{\infty} \theta_{\varepsilon,j}\Delta\varepsilon_{t-j} + \sum_{j=-\infty}^{\infty} \theta_{X,j}'\Delta X_{t-j} + \sum_{j=-\infty}^{\infty} \theta_{Z,j}' Z_{t-j} + \zeta_t \tag{3}$$

where $\{\zeta_t\}$ is a white noise process with $E(\Delta X_t \zeta_{t+j}) = E(Z_t \zeta_{t+j}) = 0$ for $j = 0, \pm 1, \pm 2, \ldots$, $\sum_{j=-\infty}^{\infty}\|\theta_{X,j}\| < \infty$, $\sum_{j=-\infty}^{\infty}\|\theta_{Z,j}\| < \infty$ and $\sum_{j=1}^{\infty}|\theta_{\varepsilon,j}| < \infty$. Moreover, under $H_0$, $\theta_0 = 0$.

Proof. Under assumptions 1.1, 2.1 and 2.2,

$$(1-\rho L)\varepsilon_t = \sum_{j=-\infty}^{\infty} \tilde{\pi}_{X,j}'\Delta X_{t-j} + \sum_{j=-\infty}^{\infty} \tilde{\pi}_{Z,j}' Z_{t-j} + \eta_t \tag{4}$$

with $\sum_{j=-\infty}^{\infty}\|\tilde{\pi}_{X,j}\| < \infty$ and $\sum_{j=-\infty}^{\infty}\|\tilde{\pi}_{Z,j}\| < \infty$, and $\{\eta_t\}$ a stationary process with $E(\Delta X_t \eta_{t+j}) = E(Z_t \eta_{t+j}) = 0$ for $j = 0, \pm 1, \pm 2, \ldots$ (see, for instance, Saikkonen (1991), equation (18)). Since $\{\eta_t\}$ is stationary and zero mean, by the Wold representation it is true that $\phi(L)\eta_t = \zeta_t$ for an absolutely summable lag polynomial $\phi(L)$ and white noise process $\{\zeta_t\}$. Multiplying (4) by $\phi(L)$ and rearranging, we arrive at (3) with $\theta_0 \equiv \phi(1)(\rho - 1)$. Hence, $\theta_0 = 0$ under $H_0$. Since the coefficients in both (4) and $\phi(L)$ are absolutely summable, so are the coefficients in (3). Finally, zero correlations between $\Delta X_t$ and $Z_t$ with $\eta_t$ in all leads and lags imply zero correlations between $\Delta X_t$ and $Z_t$ with $\zeta_t$ in all leads and lags.

Notice that unlike the traditional ADF test, the leads and lags of the covariates, as well as those of $\Delta X_t$, are included in the CADF regression. Proposition 1 provides the motivation for deriving a test based on a feasible version of (3).

2.2 Test Statistic

$\{\varepsilon_t\}$ is typically not observed unless the cointegrating vector is pre-specified, therefore an estimate of $\beta$ is required. We consider the OLS estimate of the cointegrating vector.¹ Let $\hat{\beta}$ be the estimate of the cointegrating vector and $\hat{\varepsilon}_t \equiv Y_t - \hat{\beta}' X_t$ be the residuals.² Noting that $\hat{\varepsilon}_t = \varepsilon_t - (\hat{\beta} - \beta)' X_t$, using (4), similar to the derivation of (3),

$$\Delta\hat{\varepsilon}_t = \alpha\hat{\varepsilon}_{t-1} + \sum_{j=1}^{\infty} \pi_{\varepsilon,j}\Delta\hat{\varepsilon}_{t-j} + \sum_{j=-\infty}^{\infty} \pi_{X,j}'\Delta X_{t-j} + \sum_{j=-\infty}^{\infty} \pi_{Z,j}' Z_{t-j} + (\rho-1)(\hat{\beta}-\beta)'\psi(L)X_{t-1} + v_t \tag{5}$$

where, conditional on $\hat{\beta} - \beta$, $\{v_t\} \equiv \{\psi(L)(\eta_t - (\hat{\beta}-\beta)'\Delta X_t)\}$ is a stationary white noise process, and $\psi(L)$ and all coefficients in (5) are absolutely summable. Define $\alpha \equiv \psi(1)(\rho - 1)$ and the truncation lag $k$. With data, one can run the truncated regression

$$\Delta\hat{\varepsilon}_t = \alpha\hat{\varepsilon}_{t-1} + \sum_{j=1}^{k} \pi_{\varepsilon,j}\Delta\hat{\varepsilon}_{t-j} + \sum_{j=-k}^{k} \pi_{X,j}'\Delta X_{t-j} + \sum_{j=-k}^{k} \pi_{Z,j}' Z_{t-j} + v_{t,k} \tag{6}$$

where

$$v_{t,k} \equiv (\rho-1)(\hat{\beta}-\beta)'\psi(L)X_{t-1} + \varsigma_{t,k} + v_t$$

$$\varsigma_{t,k} \equiv \sum_{j>k} \pi_{\varepsilon,j}\Delta\hat{\varepsilon}_{t-j} + \sum_{|j|>k} \pi_{X,j}'\Delta X_{t-j} + \sum_{|j|>k} \pi_{Z,j}' Z_{t-j}$$

A t-statistic to test $H_0$ is computed as

$$t_{\hat{\alpha}} \equiv \frac{\hat{\alpha}}{s.e.(\hat{\alpha})} \tag{7}$$

where $s.e.(\hat{\alpha})$ is the usual standard error for t-statistics. We recommend applying the Bayesian Information Criterion (BIC) to (6) in order to (jointly) select $Z_t$ and $k$.³ Monte Carlo simulations in section 3 and the empirical application in section 4 use BIC to select $k$. After experimentation with the Akaike Information Criterion (AIC) and BIC, BIC was preferred as it tends to select more parsimonious lag structures.

¹ There are many alternatives to OLS; some are shown to be superior to OLS in terms of efficiency (see, for instance, Saikkonen 1991). We choose to use OLS since it is most commonly used in practice and simple to work with theoretically.

² Since only $\{\hat{\varepsilon}_t\}$ is available (and not $\{\varepsilon_t\}$), the coefficients in (3) cannot be identified. However, since the purpose is to test whether one of the coefficients is zero, identification up to a re-parameterization suffices.

³ It is also possible to choose different lead and lag lengths in the regression for $\Delta\hat{\varepsilon}_t$, $\Delta X_t$ and $Z_t$. For theoretical simplicity, assume $k$ is common for all three.
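As an illustration of the truncated regression (6) and the statistic (7), the following is a minimal sketch under simplifying assumptions: no deterministic terms (case 1), a common lead/lag length k supplied by the user rather than chosen by BIC, and plain OLS standard errors. The function name `cadf_tstat` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def cadf_tstat(y, x, z, k):
    """t-statistic on alpha in regression (6), with common lead/lag length k.

    Hypothetical helper: assumes no deterministic terms (case 1).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    x = np.asarray(x, dtype=float).reshape(T, -1)
    z = np.asarray(z, dtype=float).reshape(T, -1)

    # OLS estimate of the cointegrating vector and the residuals e_t
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    e = y - x @ beta_hat
    de = np.diff(e)            # de[i] = e[i+1] - e[i], so Delta e_t = de[t-1]
    dx = np.diff(x, axis=0)    # dx[i] = x[i+1] - x[i], so Delta X_t = dx[t-1]

    rows, lhs = [], []
    # keep only dates t for which all k lags and k leads are available
    for t in range(k + 1, T - k):
        reg = [e[t - 1]]                                   # e_{t-1}
        reg += [de[t - 1 - j] for j in range(1, k + 1)]    # lags of Delta e
        for j in range(-k, k + 1):
            reg += list(dx[t - 1 - j])                     # leads/lags of Delta X
        for j in range(-k, k + 1):
            reg += list(z[t - j])                          # leads/lags of Z
        rows.append(reg)
        lhs.append(de[t - 1])

    R = np.array(rows)
    d = np.array(lhs)
    coef, *_ = np.linalg.lstsq(R, d, rcond=None)
    resid = d - R @ coef
    s2 = resid @ resid / (R.shape[0] - R.shape[1])   # usual OLS error variance
    cov = s2 * np.linalg.inv(R.T @ R)
    return coef[0] / np.sqrt(cov[0, 0])
```

The statistic would be compared with bootstrap critical values rather than with standard ADF tables, since its null distribution depends on nuisance parameters.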

2.3 The Bootstrap

The asymptotic null distribution depends on difficult to estimate nuisance parameters (more specifically, as shown in the next section, $R^2_{\varepsilon X}$ and $R^2_{\varepsilon Q}$). This is closely related to an issue pointed out by Elliott & Pesavento (2009) regarding the long run correlation parameter between what would be the equivalent of $(1-\rho L)\varepsilon_t$ and $\Delta X_t$ of this paper. The authors note on p. 1832 that “...in practice, this parameter is not only unknown, but also, under the null and local alternative, there is no obvious way to obtain a good estimate of this parameter”. In light of this difficulty, we propose bootstrap inference instead of relying on asymptotics. In particular, the bootstrap inference is designed to take into account the following cases of deterministic trends:

Case 1. $\mu_X, \mu_Y, \mu_Z$ and $\tau_X, \tau_Y, \tau_Z = 0$; $Y_t$, $X_t$ and $Z_t$ are neither de-meaned nor de-trended prior to inference.

Case 2. $\mu_X, \mu_Z$ and $\tau_X, \tau_Y, \tau_Z = 0$, $\mu_Y \neq 0$; $Y_t$, $X_t$ and $Z_t$ are de-meaned prior to inference.

Case 3. $\tau_X, \tau_Z = 0$, $\mu_X, \mu_Y, \mu_Z, \tau_Y \neq 0$; $Y_t$, $X_t$ and $Z_t$ are de-meaned and de-trended prior to inference.

These three cases are considered in Pesavento (2004, 2007), with case 1 being the case considered in the theoretical work that follows. Let $\hat{\mu}_X, \hat{\mu}_Y, \hat{\mu}_Z$ and $\hat{\tau}_X, \hat{\tau}_Y, \hat{\tau}_Z$ be OLS estimates of the means and trends. Following the procedures of Paparoditis & Politis (2003) and Badillo et al. (2010), the bootstrap null distribution of $t_{\hat{\alpha}}$ can be constructed by the following steps:

Step 1. If the deterministic trend follows cases 2 or 3, then de-mean, or de-mean and de-trend, $Y_t$ and $X_t$. Estimate $\hat{\beta}$ and $\hat{\varepsilon}_t$ using this data. Run $\hat{\varepsilon}_t = \hat{\gamma} + \hat{\rho}\hat{\varepsilon}_{t-1} + \hat{u}_t$.⁴

Step 2. Choose a positive integer $b$. Define $k = [(T-1)/b]$ where $[\cdot]$ is the integer part. Let $i_0, \ldots, i_{k-1}$ be random i.i.d. draws from the uniform distribution on $\{1, 2, \ldots, T-b\}$. We generate pseudo series for $\varepsilon_t$. Set $\varepsilon^*_1 = \hat{\varepsilon}_1$, and for $t = 2, \ldots, kb+1$,

$$\varepsilon^*_t = \varepsilon^*_{t-1} + \hat{u}_{i_{[(t-2)/b]} + t - [(t-2)/b]b - 1}$$

Step 3. Now construct pseudo series for $Y_t$, $X_t$ and $Z_t$ that reflect the various cases of deterministic trends. Specifically, for $t = 1, \ldots, kb+1$,

Case 1. $\Delta X^*_t = \Delta X_t - \hat{\mu}_X - \hat{\tau}_X t$, $Z^*_t = Z_t - \hat{\mu}_Z - \hat{\tau}_Z t$, and $Y^*_t = \hat{\beta}' X^*_t + \varepsilon^*_t$

Case 2. $\Delta X^*_t = \Delta X_t - \hat{\mu}_X - \hat{\tau}_X t$, $Z^*_t = Z_t - \hat{\mu}_Z - \hat{\tau}_Z t$, and $Y^*_t = \hat{\mu}_Y + \hat{\beta}' X^*_t + \varepsilon^*_t$

Case 3. $\Delta X^*_t = \Delta X_t - \hat{\tau}_X t$, $Z^*_t = Z_t - \hat{\tau}_Z t$, and $Y^*_t = \hat{\mu}_Y + \hat{\tau}_Y t + \hat{\beta}' X^*_t + \varepsilon^*_t$

Step 4. Finally, with pseudo data $(Y^*_t, X^{*\prime}_t, Z^{*\prime}_t)'$, de-mean or de-mean and de-trend under the appropriate deterministic case, and compute $t^*_{\hat{\alpha}}$. Repeat steps 1-4 a large number of times to obtain the bootstrap null distribution for $t_{\hat{\alpha}}$.

The bootstrap randomly draws blocks (of length $b$) of $\hat{u}_t$, and uses them to generate pseudo data for $\varepsilon_t$ under $H_0$, which is in turn used to generate pseudo data for $Y_t$. In step 3, the deterministic components are imposed on the variables. While we do not study the theoretical properties of the bootstrap in this paper, our simulations indicate that bootstrap inference works well. Readers interested in theoretical properties of the block bootstrap are referred to Paparoditis & Politis (2003) for a formal discussion in the case of unit root testing.

⁴ The inclusion of the constant term $\hat{\gamma}$ follows from the centering procedure in equation (2.1) of Paparoditis & Politis (2003).

2.4 Asymptotics

We are interested in the distribution of $t_{\hat{\alpha}}$ under a local-to-unity version of $H_A$. This section gives precise statements as to how the distribution of $t_{\hat{\alpha}}$ differs from the distribution of the ADF test. Following Phillips (1987), Hansen (1995), and Pesavento (2004), re-define $H_A$ so that for some constant $c < 0$,

$$H_A: \rho = 1 + \frac{c}{T} \tag{8}$$

so that $\rho < 1$ when $T$ is finite but $\rho \to 1$ as $T \to \infty$. One more assumption is imposed:

Assumptions 3. (Rate of Divergence of $k$) The truncation lag $k$ in (6) satisfies $k \to \infty$ as $T \to \infty$, with the bound that $T^{-1/3}k \to 0$.

Assumption 3 allows $k$ to increase with the sample size $T$ in order for (6) to closely approximate (5), but at a moderate rate so that the dimension of the regressors is reasonable. Ng & Perron (1995) show that, in the unit root testing case, our preferred model selection criterion BIC satisfies assumption 3.
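The resampling in steps 1 and 2 of section 2.3 can be sketched as follows. This is only an illustration under stated assumptions: it covers the construction of the pseudo series for the cointegration error, takes the residuals from the cointegrating regression as given, and uses a hypothetical function name, `bootstrap_eps`, that does not come from the paper.

```python
import numpy as np

def bootstrap_eps(e_hat, b, rng):
    """Pseudo series eps*_1..eps*_{kb+1} built from blocks of centered differences.

    Hypothetical helper: e_hat holds the residuals of the cointegrating regression,
    b is the block length, rng is a numpy random Generator.
    """
    e_hat = np.asarray(e_hat, dtype=float)
    T = len(e_hat)

    # Step 1: regress e_t on a constant and e_{t-1}; u[s] stores u_hat at time s + 2
    X = np.column_stack([np.ones(T - 1), e_hat[:-1]])
    coef, *_ = np.linalg.lstsq(X, e_hat[1:], rcond=None)
    u = e_hat[1:] - X @ coef

    # Step 2: k blocks of length b, start points uniform on {1, ..., T - b}
    k = (T - 1) // b
    starts = rng.integers(1, T - b + 1, size=k)

    eps = np.empty(k * b + 1)
    eps[0] = e_hat[0]
    for t in range(2, k * b + 2):          # time index t = 2, ..., kb + 1
        m = (t - 2) // b                   # which block t falls in
        idx = starts[m] + t - m * b - 1    # time index of u_hat, between 2 and T
        eps[t - 1] = eps[t - 2] + u[idx - 2]
    return eps
```

Because the pseudo errors are cumulated without any mean reversion, the resulting series has a unit root, which is how the null of no cointegration is imposed on the pseudo data.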
For a symmetric positive definite matrix $A$, define its Cholesky and inverse Cholesky decompositions as $A^{1/2\prime}A^{1/2} = A$ and $A^{-1/2}A^{-1/2\prime} = A^{-1}$. Unless otherwise stated, let $\int B \equiv \int_0^1 B(r)\,dr$ for some vector stochastic process $B(r)$.
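One way to obtain factors that satisfy these two identities numerically is sketched below. It assumes NumPy's convention that `np.linalg.cholesky` returns a lower triangular factor; taking its transpose as $A^{1/2}$ is one admissible choice, not necessarily the one used by the authors, and the function name is hypothetical.

```python
import numpy as np

def chol_factors(A):
    """Return (A_half, A_neg_half) with A_half' A_half = A and A_neg_half A_neg_half' = inv(A)."""
    L = np.linalg.cholesky(A)            # lower triangular, L @ L.T equals A
    A_half = L.T                         # upper triangular factor
    A_neg_half = np.linalg.inv(A_half)   # inverse of the upper triangular factor
    return A_half, A_neg_half
```

With this choice, the transpose of `A_half` is the lower triangular factor, so pre-multiplying a standard normal vector by it gives a vector with covariance matrix $A$.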

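Define

$$J_{\varepsilon X}(r) \equiv \sqrt{\frac{R^2_{\varepsilon X}}{1 - R^2_{\varepsilon X}}}\,\widetilde{W}_X(r) + W_\varepsilon(r)$$

$$J^c_{\varepsilon X}(r) \equiv J_{\varepsilon X}(r) + c\int_0^r \exp(c(r-s))J_{\varepsilon X}(s)\,ds$$

where $\widetilde{W}_X(r)$ is a univariate standard Brownian motion, independent of $W_\varepsilon(r)$ and $W_Z(r)$. Also, define

$$D^c_1 \equiv \begin{bmatrix} \sqrt{\dfrac{1 - R^2_{\varepsilon Q}}{1 - R^2_{\varepsilon X}}}\displaystyle\int J^c_{\varepsilon X}\,dW_\varepsilon & \displaystyle\int J^c_{\varepsilon X}\,dW_X' \\ \sqrt{\dfrac{1 - R^2_{\varepsilon Q}}{1 - R^2_{\varepsilon X}}}\displaystyle\int W_X\,dW_\varepsilon & \displaystyle\int W_X\,dW_X' \end{bmatrix} \qquad D^c_2 \equiv \begin{bmatrix} \displaystyle\int (J^c_{\varepsilon X})^2 & \displaystyle\int J^c_{\varepsilon X}W_X' \\ \displaystyle\int J^c_{\varepsilon X}W_X & \displaystyle\int W_X W_X' \end{bmatrix}$$

$$F \equiv \begin{bmatrix} \dfrac{1 - R^2_{\varepsilon Q}}{1 - R^2_{\varepsilon X}} & 0 \\ 0 & I_n \end{bmatrix} \qquad B^c \equiv \begin{bmatrix} 1 & -\left(\displaystyle\int W_X' J^c_{\varepsilon X}\right)\left(\displaystyle\int W_X W_X'\right)^{-1} \end{bmatrix}'$$

Square matrices $D^c_1$, $D^c_2$ and $F$ are of dimension $n+1$, and $B^c$ is an $(n+1)$ vector. $D^c_2$ and $B^c$ in particular are common expressions that appear in asymptotics for residual based tests (e.g., Phillips & Solo (1992), Pesavento (2004)). Lastly, let $\omega_{\varepsilon\cdot X} \equiv \omega_{\varepsilon\varepsilon}(1 - R^2_{\varepsilon X})$ and $\omega_{\varepsilon\cdot Q} \equiv \omega_{\varepsilon\varepsilon}(1 - R^2_{\varepsilon Q})$.

Lemma 1. Let the data be generated by (1) and (2) and assume that assumption 1 holds. If (8) is true, then as $T \to \infty$

1.
$$\hat{\beta} - \beta \Rightarrow \omega_{\varepsilon\cdot X}^{1/2}\,\Omega_{XX}^{-1/2}\left(\int W_X W_X'\right)^{-1}\left(\int W_X J^c_{\varepsilon X}\right) \tag{9}$$

2. If in addition assumptions 2 and 3 hold, then

$$(T - 2k)(\hat{\alpha} - \alpha) \Rightarrow \frac{\psi(1)\,B^{c\prime}D^c_1 B^c}{B^{c\prime}D^c_2 B^c} \tag{10}$$

$$(T - 2k)\,s.e.(\hat{\alpha}) \Rightarrow \frac{\psi(1)\,(B^{c\prime}F B^c)^{1/2}}{(B^{c\prime}D^c_2 B^c)^{1/2}}$$

Proof. See appendix.

Proposition 2 is the main result of this paper.

Proposition 2. (Asymptotic local-to-unity Power of CADF Test) Let the data be generated by (1) and (2) and assume that assumptions 1, 2, and 3 hold. If (8) is true, then as $T \to \infty$

$$t_{\hat{\alpha}} \Rightarrow \frac{B^{c\prime}D^c_1 B^c}{(B^{c\prime}D^c_2 B^c)^{1/2}(B^{c\prime}F B^c)^{1/2}} + c\,\frac{(B^{c\prime}D^c_2 B^c)^{1/2}}{(B^{c\prime}F B^c)^{1/2}} \tag{11}$$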

Proof. Noting that $\alpha = \psi(1)(c/T)$,

$$t_{\hat{\alpha}} = \frac{(T-2k)(\hat{\alpha} - \alpha)}{(T-2k)\,s.e.(\hat{\alpha})} + c\,\frac{\psi(1)(T-2k)}{T(T-2k)\,s.e.(\hat{\alpha})}$$

This together with Lemma 1 proves the proposition.

Thus, the influence of the covariates feeds through $R^2_{\varepsilon Q}$, the correlation between $(1-\rho L)\varepsilon_t$ and $Q_t$. To further understand the role of the covariates, consider the case where the covariates have no long run correlation with the cointegration candidates. That is, $\omega_{\varepsilon Z}$ and $\omega_{XZ} = 0$. In this case, observe that $R^2_{\varepsilon Q} = R^2_{\varepsilon X}$. This means that now $D^c_1 = \widetilde{D}^c_1$, where

$$\widetilde{D}^c_1 \equiv \begin{bmatrix} \displaystyle\int J^c_{\varepsilon X}\,dW_\varepsilon & \displaystyle\int J^c_{\varepsilon X}\,dW_X' \\ \displaystyle\int W_X\,dW_\varepsilon & \displaystyle\int W_X\,dW_X' \end{bmatrix}$$

Furthermore, $F = I_{n+1}$, and

$$t_{\hat{\alpha}} \Rightarrow \frac{B^{c\prime}\widetilde{D}^c_1 B^c}{(B^{c\prime}D^c_2 B^c)^{1/2}(B^{c\prime}B^c)^{1/2}} + c\,\frac{(B^{c\prime}D^c_2 B^c)^{1/2}}{(B^{c\prime}B^c)^{1/2}} \tag{12}$$

This is the corresponding asymptotic distribution of the ADF test, as the covariates have no long run correlations with the cointegration candidates. To the best of our knowledge, (12) is itself a new finding, since the inclusion of leads and lags of $\Delta X_t$ in the ADF regression removes $R^2_{\varepsilon X}$ from the asymptotic distribution except where it is embedded in $J^c_{\varepsilon X}$.⁵

⁵ Compared to, say, the ADF distribution in Pesavento (2004).

3 Simulations

3.1 Large Sample Power

The local-to-unity asymptotic distribution in proposition 2 can be used to assess the large sample power of the CADF test. We numerically construct the distribution for $c = 0, -5, -10$, and $-20$ using 3,000 samples of Gaussian innovations. Each sample has a size of 3,000, and the innovations are used in constructing the functionals present in the right hand side of (11). Power is then calculated, for $c = -5, -10$ and $-20$, as the mass of the distribution to the left of the 5% critical value of the $c = 0$ distribution.

Note that the test only depends on $R^2_{\varepsilon X}$ and $R^2_{\varepsilon Q}$. Nonetheless, it is more intuitive to express power as a function of the pairwise correlations $\omega_{\varepsilon X}$, $\omega_{\varepsilon Z}$, and $\omega_{XZ}$. We set $n = m = 1$ and all long run variances equal to one. As such,

$$R^2_{\varepsilon X} = \omega^2_{\varepsilon X} \qquad \text{and} \qquad R^2_{\varepsilon Q} = \frac{\omega^2_{\varepsilon X} - 2\omega_{\varepsilon X}\omega_{\varepsilon Z}\omega_{XZ} + \omega^2_{\varepsilon Z}}{1 - \omega^2_{XZ}}$$

Figures 1-3 display the power surfaces across different values of $\omega_{\varepsilon X}$, $\omega_{\varepsilon Z}$, $\omega_{XZ}$ and $c$.

[Insert Figures 1-3]

As expected, for a given combination of $\omega_{\varepsilon X}$, $\omega_{\varepsilon Z}$, and $\omega_{XZ}$, the power increases monotonically as $c$ decreases. Comparing the graphs in each of the figures with the top-left graph of that figure, it is also clear that the power function mimics the shape of $R^2_{\varepsilon Q}$, although the exact shape varies. Throughout the figures, in general the CADF test has high power when $\omega_{\varepsilon X}$ and $\omega_{\varepsilon Z}$ are large in magnitude, either with different signs when $\omega_{XZ}$ is positive, or with the same signs when $\omega_{XZ}$ is negative. A heuristic interpretation of these conditions is that power is highest when the covariates $Z_t$ convey different information about $(1-\rho L)\varepsilon_t$ than $X_t$.

Importantly, the ADF tests (corresponding to the point on the graphs where $\omega_{\varepsilon Z}$ and $\omega_{XZ} = 0$) always have the lowest power. For instance, when $R^2_{\varepsilon X} = 0$ and $c = -5$ (top-right graph of figure 2), the ADF test has a power of roughly 20%, while the power of the CADF test could reach 60%. Asymptotically, one cannot do worse in terms of power by using the CADF test instead of the ADF test.

3.2 Small Sample Size and Power

In this section we study the small sample size and power of the CADF test, and compare them to those of the ADF and Johansen $\lambda_{max}$ tests. This exercise is important because it is well known that residual based tests are typically less powerful than Johansen's test in small samples. Furthermore, using these simulations, we study the effects of the presence of deterministic trends.

Pseudo time series of length 200 are generated in the following way: for each $\rho \in \{.8, .9, 1\}$

$$\begin{bmatrix} (1-\rho L)\varepsilon^*_t \\ \Delta X^*_t \\ Z^*_t \end{bmatrix} = \begin{bmatrix} 0 \\ \mu_X + \tau_X t \\ \mu_Z + \tau_Z t \end{bmatrix} + \Omega^{1/2\prime}N(0, I_3)$$

$$Y^*_t = \mu_Y + \tau_Y t + X^*_t + \varepsilon^*_t$$

Under case 1, all $\mu$'s and $\tau$'s are set to zero. Case 2 is the same as case 1 except that $\mu_Y = 1$. Case 3 sets $\mu_Y = \mu_X = \mu_Z = \tau_Y = 1$ and $\tau_X = \tau_Z = 0$. In $\Omega$, the long run variances are set to 1, and we allow for various combinations of $\omega_{\varepsilon X}$, $\omega_{\varepsilon Z}$ and $\omega_{XZ}$. We discard the first 100 pseudo data points, leaving a small sample size of $T = 100$. Using the pseudo sample $(Y_t, X_t, Z_t)$, we conduct (after de-meaning or de-meaning and de-trending under the appropriate case) the bootstrap CADF test with the bootstrap block size set to one⁶, along with the ADF and Johansen's $\lambda_{max}$ tests using asymptotic critical values. The number of leads and lags in both the ADF and CADF tests is chosen by BIC. We record whether or not the tests reject the null of no cointegration. Repeating this procedure 2,000 times, the empirical rejection rates are obtained, representing the small sample power (where $\rho = .8, .9$) and size (where $\rho = 1$). Table 1 contains the size and power results.

[Insert Table 1]

For the CADF and Johansen tests, power increases with $\omega_{\varepsilon X}$. On the other hand, the power of the ADF test decreases with $\omega_{\varepsilon X}$, and in general becomes significantly lower than the power of the CADF and Johansen tests.

The power discrepancy between the ADF and CADF tests is particularly large when deterministic terms are present (cases 2 and 3), or when $\omega_{\varepsilon X}$ is large. The ADF test performs well when $\omega_{\varepsilon X} = 0$, but still fails to show higher power than the CADF test in all cases other than case 3 when $\rho = .9$. The low power of the ADF test in these cases is consistent with previous findings (e.g., Pesavento 2004). In terms of size (i.e., when $\rho = 1$), the ADF test has good size in almost every case, while the CADF test tends to be under-sized when $\omega_{\varepsilon X}$ is large or under case 3.

The CADF test also compares favorably with the Johansen $\lambda_{max}$ test (see Johansen (1988) and Johansen (1991)). It is particularly advantageous under cases 1 and 2 when $\omega_{\varepsilon X} = 0$ or $.5$, while the Johansen test is advantageous under case 3 for $\omega_{\varepsilon X} = .9$. In all other instances, the powers of the two tests are similar. The Johansen test tends to be over-sized, particularly under case 3, whereas the CADF test under case 3 is typically under-sized.
Finally, we observed that there are minor discrepancies in power for the CADF test based on different combinations of $(\omega_{\varepsilon Z}, \omega_{XZ})$, and the best combination differs depending on the deterministic case, $\rho$, and $\omega_{\varepsilon X}$.

⁶ Since there are no serial correlations in the innovations, block length can be small. In the empirical work, a moderate block size is chosen.
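The simulation design of section 3 can be illustrated with a short sketch. It assumes case 1 (no deterministic terms), $n = m = 1$, unit long run variances and i.i.d. Gaussian innovations; the function names `r_squared` and `simulate_case1` are hypothetical. The first function evaluates the two nuisance parameters from the pairwise correlations, and the second generates one pseudo sample.

```python
import numpy as np

def r_squared(w_ex, w_ez, w_xz):
    """R^2_{eX} and R^2_{eQ} when n = m = 1 and all long run variances equal one."""
    r2_ex = w_ex ** 2
    r2_eq = (w_ex ** 2 - 2 * w_ex * w_ez * w_xz + w_ez ** 2) / (1 - w_xz ** 2)
    return r2_ex, r2_eq

def simulate_case1(rho, w_ex, w_ez, w_xz, T=100, burn=100, rng=None):
    """One pseudo sample (Y, X, Z) of length T under case 1, after dropping `burn` points."""
    rng = np.random.default_rng() if rng is None else rng
    omega = np.array([[1.0, w_ex, w_ez],
                      [w_ex, 1.0, w_xz],
                      [w_ez, w_xz, 1.0]])
    L = np.linalg.cholesky(omega)              # lower triangular, L @ L.T equals omega
    n_obs = T + burn
    xi = rng.standard_normal((n_obs, 3)) @ L.T  # each row has covariance omega

    eps = np.empty(n_obs)
    prev = 0.0
    for t in range(n_obs):                     # (1 - rho L) eps_t = xi_t[0]
        prev = rho * prev + xi[t, 0]
        eps[t] = prev
    x = np.cumsum(xi[:, 1])                    # Delta X_t = xi_t[1]
    z = xi[:, 2]                               # Z_t = xi_t[2]
    y = x + eps                                # cointegrating vector equal to one
    return y[burn:], x[burn:], z[burn:]
```

For example, `r_squared(0.5, 0.0, 0.0)` returns equal values for the two parameters, which is the configuration in which the CADF and ADF limiting distributions coincide.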

4 Cointegration between Credit Default Swap and Bond Spreads

The seller of a CDS contract offers insurance to the buyer of protection against default of an underlying reference entity. In return for protection, the buyer makes regular payments over the life of the contract. Thus, the CDS “spread”⁷ is often viewed as the price of the credit risk of the underlying reference entity. Abstracting from other factors, an investor who holds a corporate bond for a given entity requires the same premium as the seller of a CDS contract, since both the bond and the CDS are exposed to the same default event of the reference entity. The deviation between the corporate bond spread (accounting for the reference rate) and the CDS spread is referred to as the CDS-bond basis.

Following previous literature, we use the CDS spread minus the par asset-swap rate to measure the basis (see Kocic et al. (2000), Houweling & Vorst (2005), Hull et al. (2004), or see Choudhry (2006) for an explanation of alternative measures). Typically, an asset-swap consists of a fixed coupon bond and an interest-rate swap, where the bond holder pays a fixed coupon and receives a floating spread over LIBOR. It can be thought of as measuring the difference between the present value of future cash flows of the bond and the market price of the bond using zero coupon rates (Choudhry 2006).

For no arbitrage conditions to hold, the pricing of credit risk for any underlying entity should be the same in both markets, ceteris paribus. As noted by Zhu (2006), under the Duffie (1999) pricing framework, it is possible to replicate a CDS contract synthetically by shorting a maturity matched par fixed coupon bond on the underlying reference entity, and investing the money in a par fixed risk free note. Therefore, the CDS premium equals the bond spread over the reference rate, or zero basis under no arbitrage.
If there exists a negative (positive) basis, arbitrage is possible through a negative (positive) basis trade by buying (shorting) the cash bond and buying protection (selling protection) on the CDS contract.

Previous literature (see, for instance, Blanco et al. (2005), Zhu (2006), De Wit (2006), Levin et al. (2005), Norden & Weber (2009)) notes the existence of the basis and establishes that it is stationary (i.e., CDS and bond spreads are cointegrated) for most firms during benign economic periods. We revisit this cointegration relationship during the financial crisis, which we define as July 2007 to July 2009. Our conjecture is that unprecedented levels of volatility, illiquidity, and market uncertainty may make it difficult for traditional tests to find cointegration between CDS and bond spreads. The CADF test, on the other hand, may perform better through the use of covariates to account for some of these factors.

⁷ The conventional word “spread” is somewhat misleading, as CDS spreads are actually not spreads over any reference interest rate.
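The basis measure and the direction of the corresponding trade can be written out as a small sketch. The function names and the choice of basis points as the unit are illustrative assumptions; the sign convention follows the text (CDS spread minus par asset-swap rate).

```python
def cds_bond_basis(cds_spread_bp, asset_swap_bp):
    """CDS-bond basis in basis points: CDS spread minus par asset-swap rate."""
    return cds_spread_bp - asset_swap_bp

def basis_trade(basis_bp):
    """Direction of the arbitrage trade implied by the sign of the basis."""
    if basis_bp < 0:
        return "buy bond, buy protection"      # negative basis trade
    if basis_bp > 0:
        return "short bond, sell protection"   # positive basis trade
    return "no trade"                          # zero basis, no arbitrage
```

For instance, a CDS spread of 150 basis points against an asset-swap rate of 180 basis points gives a basis of -30 basis points, which corresponds to a negative basis trade.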

4.1 Covariate Selection

During the financial crisis, evaporation of liquidity in the market caused funding costs to rise (see Fontana (2010) and Giglio (2010)). This, coupled with surging counterparty credit risk and market volatility, drove the basis wider (see Fontana (2010))⁸. While it is difficult to construct explicit proxies for liquidity and counterparty credit risk, our choice of covariates is intended to reflect these risk factors.

The first covariate considered is the HFRX Global Hedge Fund Index return (HFRXGL). Hedge funds and banks comprise the largest CDS market participants (see Anderson (2010)). While banks often use the CDS market to hedge against loan risk, hedge funds on the other hand are important speculators in the CDS market, using CDS contracts as tools to engage in credit arbitrage. Hedge funds also hedge convertible bond positions, and cover their exposures in the CDO market with CDS contracts. It is argued by Brunnermeier (2009) and Anderson (2010) that hedge funds' access to external financing plays an important role in the liquidity of assets for which they participate in a large share of market transactions. The extent and rate at which hedge funds can obtain capital is related to their returns (see Boyson et al. (2008)), and consequently hedge fund performance affects the liquidity of the CDS market. HFRXGL is therefore used as a proxy for market-wide hedge fund performance.

The second set of covariates is the S&P 500 returns and the percentage change in the VIX. The S&P 500 returns can be viewed as a proxy for the performance of the market as a whole, while the VIX index serves as a measure of implied market volatility. Counterparty credit risk and liquidity risk are often heightened during periods of low equity returns and high market volatility. As such, S&P 500 and VIX returns may be driven by the same factors that affect the CDS-bond basis. We also use the two covariates together in order to see how the CADF test performs when there is more than one covariate.
The third covariate is the Libor-OIS spread, which is the difference between the three-month Libor and the overnight index swap (OIS) rate. The Libor-OIS spread increases with a perceived rise in bank counterparty credit risk (see Schwarz (2009)). In contrast to CDS contracts, bonds do not have counterparty credit risk. Because counterparty risk is a driver of the basis (see Choudhry (2006)), the Libor-OIS spread is chosen as a covariate.

Finally, daily stock returns for each firm are used as a firm-specific covariate. Drivers of the basis such as firm credit quality, type of institution, the rate at which a firm can obtain funding (see Choudhry (2006)), and many other factors unique to each firm may not be captured by systematic covariates. As noted by Aunon-Nerin et al. (2002), declines in stock price are associated with a rise in CDS premium, and should be considered when assessing credit risk. Therefore, we choose stock returns as a covariate.

8 Interestingly, traders were unable to take full advantage of the widening basis during the crisis, perhaps due to their own stringent financing or capital constraints.

4.2 Data

We start with all firms listed in both the Markit Partners CDS and bond data sets between June 2007 and June 2009. Five-year CDS spreads are considered as they are the most actively traded. Quotes selected from Markit Partners are for CDS spreads referencing Senior Unsecured, USD-denominated debt with the Modified Restructuring (MR) clause. In order to match the remaining maturity of the bond spread to the five-year CDS spreads, a generic bond is constructed for each firm from a pool of outstanding bonds, similar to the methodology of Zhu (2006). Using the Fixed Income Securities Database (FISD), we constrain our analysis to a list of bonds that meet the following criteria:

• Bonds must not be puttable, callable, convertible, or reverse convertible.
• Bonds must be denominated in USD.
• Bonds must be Senior Unsecured.
• Bonds must be fixed coupon.

For bonds that meet the stated criteria, the daily bond asset-swap rate, the depth of the quote, and the type of quote for each bond are obtained from Markit. For each bond, the depth-weighted average of both TRACE and Composite quotes is calculated. We eliminate all bonds with remaining maturity shorter than two and a half years or longer than seven years.

There are three possible cases in constructing the generic bond for each firm-day. First, all of the firm's available bonds have a remaining maturity shorter than five years, or all available bonds have a remaining maturity longer than five years. Second, there is only one bond available. Third, there is at least one bond with maturity shorter than five years and at least one bond with maturity longer than five years.
In the first case, the generic bond is the bond with the maturity closest to five years. In the second case, the generic bond is the only available bond. In the third case, the generic bond is the linear interpolation of the two bonds closest to the five-year maturity on each side, following Zhu (2006). Using ADF unit root tests, we ensure that all covariates and cointegration candidates are stationary by excluding any firms for which one of these series is non-stationary. The final set of firms has bonds with no more than 20 consecutive days of missing quotes. Based on this construction, there are 24 firms in our final list, a sample similar in length and in number of firms to those of previous studies.

Daily data for the S&P 500 index, firm stock price, the VIX index, the Libor-OIS spread, and the HFRXGL index are obtained from either Bloomberg or Datastream.9 For each firm, the weekly average of the daily series of bond asset-swap rates, CDS spreads, and each covariate series is calculated. We take the first difference of the log of each covariate, except for the Libor-OIS spread, where we simply take the first difference.

4.3 Results

Four sets of CADF tests, one for each set of covariates, are performed under deterministic case 1. Critical values for the CADF test are generated using a 10,000-iteration residual-based bootstrap with a block size of 5 (b = 5), as described in Section 2.3. To benchmark the CADF tests, we also perform ADF and Johansen cointegration tests using asymptotic critical values. Results for each test are shown in Table 2.

[Insert Table 2]

The Johansen and ADF tests fail to reject the null of no cointegration at the 10% significance level for 6 and 7 of the 24 firms, respectively. The CADF test using the S&P 500 index and the percentage change in the VIX fails to reject the null of no cointegration for 3 firms, while the CADF test using firm stock returns fails to reject the null of no cointegration for 4 of the 24 firms at the 10% significance level. The HFRXGL index and the Libor-OIS spread, as covariate choices, reject the null of no cointegration for the most firms, with each failing to reject for only 2 firms. Results at the 5% significance level are qualitatively similar. Overall, by using covariates the CADF test is able to find more cointegrating relationships than the ADF and Johansen tests during the financial crisis.
One possible explanation is that the inclusion of covariates removes part of the heightened volatility that may otherwise mask the cointegrating relationships. The strong performance of the CADF test for all sets of covariates is consistent with Anderson (2010), who concludes that during the crisis, systemic factors and market volatility significantly affected the basis.

9 In the case of Enterprise, which is privately held, the S&P 500 index is used to proxy its stock price. It should also be noted Goldman Sachs and Lehman Brothers have substantial, but incomplete, bond data available for the entire sample period. These two firms are included in the final list out of interest.
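
To make the data construction of Section 4.2 concrete, the following is a minimal Python sketch of two steps: choosing the generic bond spread for one firm-day under the three cases, and turning a daily covariate series into weekly first differences. The function names, the input layout (lists of (remaining maturity, asset-swap spread) pairs and of (date, value) pairs), and the use of ISO calendar weeks are assumptions made for illustration; the paper does not specify them.

```python
import math
from collections import defaultdict
from datetime import date

TARGET = 5.0  # target maturity in years, matching the five-year CDS


def generic_bond_spread(bonds):
    """Return the generic five-year spread for one firm-day.

    `bonds` is a list of (remaining_maturity_years, asset_swap_spread)
    pairs (hypothetical layout). Bonds outside the 2.5 to 7 year window
    are dropped first; None is returned if no bond remains.
    """
    eligible = [(m, s) for m, s in bonds if 2.5 <= m <= 7.0]
    if not eligible:
        return None
    below = [b for b in eligible if b[0] <= TARGET]
    above = [b for b in eligible if b[0] > TARGET]
    if not below or not above:
        # Cases 1 and 2: every bond lies on one side of five years (or
        # there is a single bond), so take the closest maturity.
        return min(eligible, key=lambda b: abs(b[0] - TARGET))[1]
    # Case 3: interpolate linearly between the nearest bond on each side.
    m_lo, s_lo = max(below, key=lambda b: b[0])
    m_hi, s_hi = min(above, key=lambda b: b[0])
    weight = (TARGET - m_lo) / (m_hi - m_lo)
    return s_lo + weight * (s_hi - s_lo)


def weekly_average(daily):
    """Average (date, value) observations within each ISO calendar week."""
    buckets = defaultdict(list)
    for day, value in daily:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(value)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]


def first_difference(series, use_log=True):
    """First difference of the series, in logs unless use_log is False."""
    x = [math.log(v) for v in series] if use_log else list(series)
    return [b - a for a, b in zip(x, x[1:])]


if __name__ == "__main__":
    # Hypothetical firm-day with one bond on each side of five years.
    print(generic_bond_spread([(4.0, 100.0), (6.0, 140.0)]))
    # Hypothetical daily index levels spanning two calendar weeks.
    levels = [(date(2008, 9, 15), 100.0), (date(2008, 9, 16), 102.0),
              (date(2008, 9, 22), 99.0)]
    print(first_difference(weekly_average(levels)))
```

For the Libor-OIS spread, which is differenced without taking logs, the same helper would be called with `use_log=False`.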

5 Conclusion and Extensions

This paper introduces a residual based cointegration test with better power. Inclusion of stationary covariates reduces the noise in the system, providing more precise parameter estimates and higher power tests. The test and its asymptotic distribution under the local-to-unity alternative are derived under a simple model and mild assumptions. Due to the dependence of the asymptotic null distribution on hard to estimate nuisance parameters, we provide a bootstrap framework for obtaining test critical values. Simulations based on the asymptotic results show that the CADF test has higher power than the ADF test. The magnitude of the power improvement depends on the long-run correlation between the cointegration candidates and the stationary covariates. In small samples, Monte Carlo simulations also show that the CADF test has good size and power properties in comparison to the ADF and Johansen tests, in the presence of deterministic trends.

The CADF test is used to study the cointegration relationship between CDS and bond spreads for 24 U.S. firms during the financial crisis. Covariates are chosen to proxy various factors that may affect the CDS-bond basis. The use of covariates allows us to uncover cointegration relationships for more firms than the Johansen and ADF tests, possibly because the covariates partially control for the heightened levels of volatility and market uncertainty that may otherwise mask cointegration relationships.

6 Appendix

6.1 Proof of Lemma 1

To prove Lemma 1, some auxiliary results are needed. Define the regressors in the CADF regression as

$$\hat{W}_{t,k} \equiv \big[\hat{\varepsilon}_{t-1},\ \Delta X_{t+k}',\ \ldots,\ \Delta X_{t-k}',\ Z_{t+k}',\ \ldots,\ Z_{t-k}',\ \Delta\hat{\varepsilon}_{t-1},\ \ldots,\ \Delta\hat{\varepsilon}_{t-k}\big]' \equiv \big[\hat{\varepsilon}_{t-1},\ \tilde{W}_{t,k}'\big]',$$

a $(2k+1)(n+m)+k+1$ vector. Define $\tau$, a square weight matrix of the same dimension, as

$$\tau \equiv \mathrm{diag}\big[(T-2k),\ (T-2k)^{1/2}I_n,\ \ldots,\ (T-2k)^{1/2}I_n,\ (T-2k)^{1/2}I_m,\ \ldots,\ (T-2k)^{1/2}I_m,\ (T-2k)^{1/2},\ \ldots,\ (T-2k)^{1/2}\big].$$

From hereon, unless otherwise stated, $\sum$ denotes $\sum_{t=k+1}^{T-k}$.

Lemma 2. Let the data be generated by (1) and (2) and assume that Assumptions 1, 2, and 3 hold. Define

$$\hat{R} \equiv \tau^{-1}\Big(\sum \hat{W}_{t,k}\hat{W}_{t,k}'\Big)\tau^{-1}, \qquad R \equiv \mathrm{diag}\Big[(T-2k)^{-2}\sum \hat{\varepsilon}_{t-1}^{2},\ E\big(\tilde{W}_{t,k}\tilde{W}_{t,k}'\big)\Big].$$

If (8) is true, then, as $T \to \infty$,

1. $\frac{\sqrt{T}}{k}\,\|\hat{R}^{-1} - R^{-1}\| = O_p(1)$.

2. $\frac{1}{\sqrt{k}}\,\big\|\tau^{-1}\sum \hat{W}_{t,k}v_t\big\| = O_p(1)$.

3. $\frac{1}{\sqrt{k}}\,\big\|\tau^{-1}\sum \hat{W}_{t,k}\varsigma_{t,k}\big\| = o_p(1)$.

4. $(\rho-1)\,\tau^{-1}\sum \hat{W}_{t,k}(\hat{\beta}-\beta)'\psi(L)X_{t-1} = o_p(1)$.

Proof. Conditional on $\hat{\beta}-\beta$, the proofs of Lemma 2.1-2.3 follow directly from Saikkonen (1991), Lemmas A4-A6, with the additional integrated piece in each case handled the same way as in his proofs. Assumption 1.3 guarantees that the conditioning on $\hat{\beta}-\beta$ is asymptotically negligible since $\hat{\beta}$, a functional of $W(r)$ in the limit, is asymptotically independent of $\sigma(\{\xi_t(\rho)\}_{t=1}^{\infty})$, which is a super set of the sigma algebra for the objects $\hat{R}$, $\{\hat{W}_{t,k}\}$, $\{v_t\}$ and $\{\varsigma_{t,k}\}$.

To prove Lemma 2.4, note that by definition, with $\rho - 1 = c/T$,

$$(\rho-1)\,\tau^{-1}\sum \hat{W}_{t,k}(\hat{\beta}-\beta)'\psi(L)X_{t-1} = \begin{bmatrix} \dfrac{c}{T(T-2k)}\sum \hat{\varepsilon}_{t-1}(\hat{\beta}-\beta)'\psi(L)X_{t-1} \\[2ex] \dfrac{c}{T\sqrt{T-2k}}\sum \tilde{W}_{t,k}(\hat{\beta}-\beta)'\psi(L)X_{t-1} \end{bmatrix}.$$

The second partition of the vector is clearly $o_p(1)$. The first part can be written as

$$c\,(\hat{\beta}-\beta)'\sum_{j=0}^{\infty}\psi_j\Big(\frac{1}{T(T-2k)}\sum \varepsilon_{t-1}X_{t-1-j} - \frac{1}{T(T-2k)}\sum X_{t-1}X_{t-1-j}'(\hat{\beta}-\beta)\Big).$$

Two standard results under Assumptions 1 and 2 are

$$T^{-1/2}X_{[Tr]} \Rightarrow \Omega_{XX}^{1/2\prime}W_X(r), \qquad T^{-1/2}\varepsilon_{[Tr]} \Rightarrow \omega_{\varepsilon\cdot X}^{1/2}J_{\varepsilon X}^{c}(r), \qquad (13)$$

see, for instance, Pesavento (2004, 2006, 2007). Using these and the FCLT, for any finite $j$, as $T \to \infty$,

$$\begin{aligned}
&\frac{1}{T(T-2k)}\sum \varepsilon_{t-1}X_{t-1-j} - \frac{1}{T(T-2k)}\sum X_{t-1}X_{t-1-j}'(\hat{\beta}-\beta) \\
&\quad\Rightarrow \omega_{\varepsilon\cdot X}^{1/2}\,\Omega_{XX}^{1/2\prime}\int W_X J_{\varepsilon X}^{c} - \Omega_{XX}^{1/2\prime}\Big(\int W_X W_X'\Big)\Omega_{XX}^{1/2}\,\omega_{\varepsilon\cdot X}^{1/2}\,\Omega_{XX}^{-1/2}\Big(\int W_X W_X'\Big)^{-1}\int W_X J_{\varepsilon X}^{c} \\
&\quad= 0.
\end{aligned}$$

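This and the fact that $|\psi_j| \to 0$ as $j \to \infty$ proves the statement.

Lemma 1.1 follows directly from (13) and the fact that $\hat{\beta}-\beta = \big(\sum_{t=1}^{T}X_tX_t'\big)^{-1}\big(\sum_{t=1}^{T}X_t\varepsilon_t\big)$.

To prove the two statements in Lemma 1.2, re-write the CADF regression (6) as $\Delta\hat{\varepsilon}_t = \Pi_k'\hat{W}_{t,k} + v_{t,k}$. First note that $(T-2k)(\hat{\alpha}-\alpha)$ is the first element of

$$\tau(\hat{\Pi}_k - \Pi_k) = (\hat{R}^{-1}-R^{-1})\Big(\tau^{-1}\sum \hat{W}_{t,k}v_{t,k}\Big) + R^{-1}\Big(\tau^{-1}\sum \hat{W}_{t,k}v_{t,k}\Big).$$

Decompose the first term on the right hand side in the following way:

$$\begin{aligned}
(\hat{R}^{-1}-R^{-1})\Big(\tau^{-1}\sum \hat{W}_{t,k}v_{t,k}\Big) &= (\hat{R}^{-1}-R^{-1})\Big(\tau^{-1}\sum \hat{W}_{t,k}v_{t}\Big) \\
&\quad + (\hat{R}^{-1}-R^{-1})\Big(\tau^{-1}\sum \hat{W}_{t,k}\varsigma_{t,k}\Big) \\
&\quad + (\hat{R}^{-1}-R^{-1})\Big(\tau^{-1}\sum \hat{W}_{t,k}(\rho-1)(\hat{\beta}-\beta)'\psi(L)X_{t-1}\Big).
\end{aligned}$$

Using Lemma 2, $\big\|(\hat{R}^{-1}-R^{-1})\big(\tau^{-1}\sum \hat{W}_{t,k}v_{t,k}\big)\big\| = O_p(k^{3/2}/\sqrt{T}) + o_p(k^{3/2}/\sqrt{T}) + o_p(1)$. Assumption 3 further restricts all three terms on the right hand side to be $o_p(1)$. Given this, by the diagonality of $R^{-1}$, and because the first diagonal element of $\tau$ is $(T-2k)$,

$$\begin{aligned}
(T-2k)(\hat{\alpha}-\alpha) &= \Big((T-2k)^{-2}\sum \hat{\varepsilon}_{t-1}^{2}\Big)^{-1}\Big((T-2k)^{-1}\sum \hat{\varepsilon}_{t-1}v_{t,k}\Big) + o_p(1) \\
&= \Big((T-2k)^{-2}\sum \hat{\varepsilon}_{t-1}^{2}\Big)^{-1}\Big((T-2k)^{-1}\sum \hat{\varepsilon}_{t-1}v_{t}\Big) + o_p(1),
\end{aligned}$$

where the second equality uses Lemma 2.3-2.4. Consider the denominator in the last equation:

$$\frac{\omega_{\varepsilon\cdot X}^{-1}}{(T-2k)^{2}}\sum \hat{\varepsilon}_{t-1}^{2} = \frac{\omega_{\varepsilon\cdot X}^{-1}}{(T-2k)^{2}}\begin{bmatrix}1 & -(\hat{\beta}-\beta)'\end{bmatrix}\left(\sum\begin{bmatrix}\varepsilon_{t-1}^{2} & \varepsilon_{t-1}X_{t-1}' \\ X_{t-1}\varepsilon_{t-1} & X_{t-1}X_{t-1}'\end{bmatrix}\right)\begin{bmatrix}1 \\ -(\hat{\beta}-\beta)\end{bmatrix}.$$

By Lemma 1.1,

$$\begin{bmatrix}1 & -(\hat{\beta}-\beta)'\end{bmatrix} \Rightarrow B^{c\prime}\begin{bmatrix}1 & 0 \\ 0 & \omega_{\varepsilon\cdot X}^{1/2}\,\Omega_{XX}^{-1/2\prime}\end{bmatrix}.$$

This, together with (13), implies that

$$\frac{\omega_{\varepsilon\cdot X}^{-1}}{(T-2k)^{2}}\sum \hat{\varepsilon}_{t-1}^{2} \Rightarrow B^{c\prime}D_{2}^{c}B^{c}. \qquad (14)$$

Now consider the numerator $\frac{\omega_{\varepsilon\cdot X}^{-1}}{T-2k}\sum \hat{\varepsilon}_{t-1}v_t$. Conditional on $\hat{\beta}-\beta$, $v_t = \psi(L)\big(\eta_t - (\hat{\beta}-\beta)'\Delta X_t\big)$ is a stationary process. Following Phillips & Park (1986), Lemma 2.1(e), $\eta_t$ has long run variance given by $\omega_{\varepsilon\cdot Q}$ and satisfies $T^{-1/2}\sum_{t=1}^{[Tr]}\eta_t \Rightarrow \omega_{\varepsilon\cdot Q}^{1/2}W_{\varepsilon}(r)$. Using this fact, (13) and the CMT,

$$\begin{aligned}
\frac{\omega_{\varepsilon\cdot X}^{-1}}{T-2k}\sum \hat{\varepsilon}_{t-1}v_t &= \frac{\omega_{\varepsilon\cdot X}^{-1}}{T-2k}\begin{bmatrix}1 & -(\hat{\beta}-\beta)'\end{bmatrix}\left(\sum\begin{bmatrix}\varepsilon_{t-1}\psi(L)\eta_t & \varepsilon_{t-1}\psi(L)\Delta X_t' \\ X_{t-1}\psi(L)\eta_t & X_{t-1}\psi(L)\Delta X_t'\end{bmatrix}\right)\begin{bmatrix}1 \\ -(\hat{\beta}-\beta)\end{bmatrix} \\
&\Rightarrow \psi(1)\,B^{c\prime}\begin{bmatrix}\omega_{\varepsilon\cdot X}^{-1/2}\omega_{\varepsilon\cdot Q}^{1/2}\int J_{\varepsilon X}^{c}\,dW_{\varepsilon} & \int J_{\varepsilon X}^{c}\,dW_X' \\ \omega_{\varepsilon\cdot X}^{-1/2}\omega_{\varepsilon\cdot Q}^{1/2}\int W_X\,dW_{\varepsilon} & \int W_X\,dW_X'\end{bmatrix}B^{c} \\
&= \psi(1)\,B^{c\prime}D_{1}^{c}B^{c}.
\end{aligned}$$

Since $\omega_{\varepsilon\cdot Q}/\omega_{\varepsilon\cdot X} = (1-R_{\varepsilon Q}^{2})/(1-R_{\varepsilon X}^{2})$, this proves the asymptotic distribution for $(T-2k)(\hat{\alpha}-\alpha)$. For $(T-2k)\,\mathrm{s.e.}(\hat{\alpha})$, since

$$(T-2k)\,\mathrm{s.e.}(\hat{\alpha}) = \Big((T-2k)^{-1}\sum \hat{v}_{t,k}^{2}\Big)^{1/2}\Big((T-2k)^{2}\,\iota'\Big(\sum \hat{W}_{t,k}\hat{W}_{t,k}'\Big)^{-1}\iota\Big)^{1/2},$$

where $\iota$ is a $(2k+1)(n+m)+k+1$ vector with one in its first element and zero elsewhere, using the consistency of $\hat{\Pi}_k$, Lemma 2, the law of large numbers and Lemma 1.1, as $k \to \infty$,

$$\begin{aligned}
(T-2k)^{-1}\sum \hat{v}_{t,k}^{2} &= (T-2k)^{-1}\sum v_t^{2} + o_p(1) \\
&= (T-2k)^{-1}\begin{bmatrix}1 & -(\hat{\beta}-\beta)'\end{bmatrix}\psi(L)\left(\sum\begin{bmatrix}\eta_t^{2} & \eta_t\Delta X_t' \\ \Delta X_t\eta_t & \Delta X_t\Delta X_t'\end{bmatrix}\right)\begin{bmatrix}1 \\ -(\hat{\beta}-\beta)\end{bmatrix} \\
&\Rightarrow \psi^{2}(1)\,B^{c\prime}\begin{bmatrix}\omega_{\varepsilon\cdot Q} & 0 \\ 0 & \omega_{\varepsilon\cdot X}I_n\end{bmatrix}B^{c} \\
&= \psi^{2}(1)\,\omega_{\varepsilon\cdot X}\,B^{c\prime}FB^{c},
\end{aligned}$$

since $\eta_t$ and $\Delta X_t$ are uncorrelated at all leads and lags. Finally,

$$\begin{aligned}
(T-2k)^{2}\,\iota'\Big(\sum \hat{W}_{t,k}\hat{W}_{t,k}'\Big)^{-1}\iota &= \iota'\hat{R}^{-1}\iota \\
&= \Big((T-2k)^{-2}\sum \hat{\varepsilon}_{t-1}^{2}\Big)^{-1} + o_p(1) \qquad \text{(by Lemma 1.1)} \\
&\Rightarrow \omega_{\varepsilon\cdot X}^{-1}\big(B^{c\prime}D_{2}^{c}B^{c}\big)^{-1} \qquad \text{(by (14))}.
\end{aligned}$$

This proves Lemma 1.2.

6.2 Figures and Tables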

[Figure 1: Asymptotic Power of CADF test when $\omega_{\varepsilon X} = -0.5$. Four surface panels plotted against corr(x,z) and corr(e,z): Square of corr(e,Q); Power, c = −5; Power, c = −10; Power, c = −20.]

[Figure 2: Asymptotic Power of CADF test when $\omega_{\varepsilon X} = 0$. Four surface panels plotted against corr(x,z) and corr(e,z): Square of corr(e,Q); Power, c = −5; Power, c = −10; Power, c = −20.]

[Figure 3: Asymptotic Power of CADF test when $\omega_{\varepsilon X} = 0.5$. Four surface panels plotted against corr(x,z) and corr(e,z): Square of corr(e,Q); Power, c = −5; Power, c = −10; Power, c = −20.]

Table 1: Small Sample Simulation Results

| $\omega_{\varepsilon X}$ | Test (CADF: ($\omega_{\varepsilon Z}$, $\omega_{XZ}$)) | Case 1, ρ=1 | Case 1, ρ=.9 | Case 1, ρ=.8 | Case 2, ρ=1 | Case 2, ρ=.9 | Case 2, ρ=.8 | Case 3, ρ=1 | Case 3, ρ=.9 | Case 3, ρ=.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ADF | 5.50% | 38.10% | 83.60% | 5.90% | 21.40% | 61.60% | 4.25% | 14.50% | 44.20% |
| 0 | Johansen | 5.20% | 21.70% | 61.40% | 7.60% | 15.60% | 45.60% | 8.70% | 15.30% | 35.40% |
| 0 | CADF(.5,.4) | 5.55% | 53.15% | 92.25% | 6.35% | 43.00% | 85.40% | 2.60% | 12.70% | 49.60% |
| 0 | CADF(.2,-.2) | 5.85% | 46.90% | 86.15% | 7.50% | 36.25% | 79.50% | 3.85% | 12.75% | 40.00% |
| 0 | CADF(0,.4) | 4.95% | 51.25% | 90.10% | 6.80% | 40.15% | 83.60% | 2.85% | 13.75% | 47.25% |
| .5 | ADF | 5.60% | 33.00% | 81.45% | 4.80% | 17.60% | 55.40% | 4.50% | 11.00% | 33.90% |
| .5 | Johansen | 6.00% | 32.90% | 78.85% | 7.60% | 22.60% | 63.90% | 8.30% | 19.70% | 50.10% |
| .5 | CADF(.5,.4) | 5.20% | 57.45% | 93.85% | 7.15% | 40.05% | 87.95% | 3.75% | 15.15% | 47.70% |
| .5 | CADF(.2,-.2) | 5.65% | 63.30% | 95.70% | 6.90% | 47.65% | 90.05% | 2.60% | 14.05% | 51.70% |
| .5 | CADF(0,.4) | 4.40% | 65.35% | 95.90% | 6.25% | 45.75% | 91.55% | 2.60% | 17.15% | 57.05% |
| .9 | ADF | 5.50% | 29.00% | 78.20% | 5.00% | 8.00% | 36.80% | 4.10% | 2.00% | 12.10% |
| .9 | Johansen | 5.60% | 96.30% | 100.00% | 4.50% | 86.10% | 99.90% | 8.40% | 69.10% | 99.10% |
| .9 | CADF(.5,.4) | 6.40% | 95.60% | 100.00% | 7.80% | 74.20% | 99.65% | 3.90% | 22.30% | 83.45% |
| .9 | CADF(.2,-.2) | 2.55% | 98.75% | 99.80% | 1.30% | 87.05% | 97.20% | 0.30% | 32.20% | 68.75% |
| .9 | CADF(0,.4) | 2.40% | 97.95% | 98.95% | 1.40% | 83.65% | 93.45% | 0.15% | 31.10% | 60.45% |

Note: Details on the simulation setup are described in Section 2.3. Numbers are empirical rejection frequencies from 2,000 Monte Carlo simulations. Sample size in each simulation is set to 100. Deterministic cases 1, 2, and 3 are as described in Section 2.3 and this section.
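
The entries of Table 1 are Monte Carlo rejection frequencies. As a rough illustration of how such a number is produced, the sketch below simulates a deliberately simplified bivariate system and a residual-based Dickey-Fuller statistic without lags, covariates, or deterministic terms. It is not the simulation design of Section 2.3: the data-generating process, the function names, and the use of a simulated 5% critical value are assumptions made for this example only.

```python
import math
import random


def df_tstat(e):
    """t-statistic on alpha in the regression  diff(e)_t = alpha * e_{t-1} + u_t."""
    lagged = e[:-1]
    diffs = [b - a for a, b in zip(e, e[1:])]
    sxx = sum(x * x for x in lagged)
    alpha = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - alpha * x) ** 2 for x, d in zip(lagged, diffs))
    s2 = ssr / (len(diffs) - 1)
    return alpha / math.sqrt(s2 / sxx)


def simulate_stat(rho, T, rng):
    """One draw of the residual-based statistic.

    Assumed DGP: x_t is a random walk, y_t = x_t + eps_t, and
    eps_t = rho * eps_{t-1} + u_t, so rho = 1 is the null of no
    cointegration and rho < 1 is a cointegrated alternative.
    """
    x = eps = 0.0
    xs, ys = [], []
    for _ in range(T):
        x += rng.gauss(0.0, 1.0)
        eps = rho * eps + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + eps)
    # First-stage OLS of y on x (no constant), then test the residuals.
    beta_hat = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)
    resid = [b - beta_hat * a for a, b in zip(xs, ys)]
    return df_tstat(resid)


def simulated_critical_value(level, n_sims, T, rng):
    """Lower `level` quantile of the statistic under the null (rho = 1)."""
    stats = sorted(simulate_stat(1.0, T, rng) for _ in range(n_sims))
    return stats[int(level * n_sims)]


def rejection_frequency(rho, crit, n_sims, T, rng):
    """Share of simulations whose statistic falls below the critical value."""
    hits = sum(simulate_stat(rho, T, rng) < crit for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    rng = random.Random(0)
    crit = simulated_critical_value(0.05, 2000, 100, rng)
    for rho in (1.0, 0.9, 0.8):
        print(rho, rejection_frequency(rho, crit, 2000, 100, rng))
```

With `rho = 1` the reported frequency estimates size, and with `rho < 1` it estimates power, which is how the columns of Table 1 should be read.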

Table 2: Application Results and Test Statistics

| Firm | Johansen $\lambda_{max}$ | ADF | CADF (HFRXGL) | CADF (S&P 500, VIX) | CADF (Stock Rtn.) | CADF (Libor-OIS) |
|---|---|---|---|---|---|---|
| AIG | 23.17∗∗∗ | −4.74∗∗∗ | −2.32∗ | −2.06∗∗ | −3.28∗∗ | −2.39∗∗ |
| ALL | 26.38∗∗∗ | −4.26∗∗∗ | −4.40∗∗∗ | −4.64∗∗∗ | −4.36∗∗∗ | −3.75∗∗∗ |
| AXP | 16.09∗∗∗ | −2.67∗ | −3.10∗∗∗ | −3.55∗∗∗ | −3.66∗∗∗ | −2.65∗∗ |
| BA | 17.52∗∗∗ | −4.34∗∗∗ | −3.85∗∗∗ | −3.82∗∗∗ | −4.07∗∗∗ | −3.77∗∗∗ |
| CAT | 11.70∗∗ | −3.04∗∗ | −3.13∗∗∗ | −2.75∗∗∗ | −2.54∗∗ | −2.43∗∗ |
| CIT | 17.21∗∗∗ | −3.83∗∗∗ | −2.05∗∗ | −2.15∗∗ | −2.37∗∗ | −2.91∗∗∗ |
| CL | 28.47∗∗∗ | −4.82∗∗∗ | −5.05∗∗∗ | −5.05∗∗∗ | −5.30∗∗∗ | −4.94∗∗∗ |
| DE | 15.97∗∗∗ | −3.61∗∗∗ | −3.67∗∗∗ | −3.90∗∗∗ | −3.74∗∗∗ | −3.21∗∗∗ |
| DOW | 15.36∗∗∗ | −3.44∗∗∗ | −3.86∗∗∗ | −3.28∗∗∗ | −3.05∗∗∗ | −3.37∗∗∗ |
| ED | 7.80 | −2.42 | −2.20∗∗ | −2.20∗∗ | −2.09∗∗ | −1.96∗ |
| ENTERP | 7.08 | −0.76 | −1.74 | −1.95 | −1.94 | −1.29∗ |
| F | 20.58∗∗∗ | −2.34 | −2.91∗∗ | −3.53∗∗∗ | −3.14∗∗ | −2.86∗∗ |
| GE | 7.65 | −2.73∗ | −1.78∗∗ | −2.24∗∗ | −2.21∗∗ | −2.11∗∗ |
| GMAC | 21.90∗∗∗ | −2.92∗∗ | −2.34∗∗ | −2.98∗∗ | −1.82∗ | −2.71∗∗∗ |
| GS | 10.48∗ | −2.61∗ | −2.92∗∗ | −2.98∗∗ | −2.95∗∗ | −2.86∗∗ |
| HSBC | 9.68∗ | −3.08∗∗ | −3.80∗∗∗ | −3.23∗∗∗ | −1.88∗ | −3.01∗∗∗ |
| KEY | 16.55∗∗∗ | −3.33∗∗ | −3.08∗∗ | −3.24∗∗∗ | −3.00∗∗ | −2.71∗∗ |
| KIM | 11.85∗∗ | −2.14 | −3.24∗∗∗ | −3.29∗∗∗ | −3.09∗∗ | −3.17∗∗∗ |
| LEH | 4.31 | −1.83 | −2.15∗ | −2.45∗∗ | −1.93 | −2.16∗ |
| MER | 4.80 | −2.31 | −0.53 | −0.94 | −1.04 | −0.14 |
| NRUC | 2.83 | −1.54 | −1.85∗ | −1.24 | −0.94 | −0.67 |
| PRU | 26.83∗∗∗ | −4.27∗∗∗ | −4.74∗∗∗ | −4.81∗∗∗ | −4.62∗∗∗ | −4.65∗∗∗ |
| SEAR | 15.96∗∗∗ | −3.04∗∗ | −3.51∗∗ | −3.89∗∗ | −3.52∗∗∗ | −3.58∗∗∗ |
| WFC | 13.22∗∗ | −3.41∗∗ | −3.20∗∗∗ | −3.58∗∗∗ | −4.24∗∗∗ | −3.69∗∗∗ |
| # Fail to Reject (10%) | 6 | 7 | 2 | 3 | 4 | 2 |
| # Fail to Reject (5%) | 8 | 10 | 5 | 5 | 6 | 5 |

Notes:
1: Numbers presented are test statistics.
2: ∗∗∗, ∗∗, and ∗ correspond to rejections at the 1, 5, and 10 percent significance levels, respectively.
3: The CADF test is run under deterministic case 1, as described in Section 2.3, with a block size of 5.
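
The CADF critical values behind Table 2 come from a residual-based bootstrap with a block size of 5. The exact algorithm is the one in Section 2.3 and is not reproduced here. As a hedged illustration of the general idea only, the sketch below applies a simplified residual-based block bootstrap, in the spirit of Paparoditis & Politis (2003), to a single residual series and a lag-free Dickey-Fuller statistic. The function names, the statistic, and the default settings are assumptions for the example, not the authors' implementation.

```python
import math
import random


def df_tstat(e):
    """t-statistic on alpha in the regression  diff(e)_t = alpha * e_{t-1} + u_t."""
    lagged = e[:-1]
    diffs = [b - a for a, b in zip(e, e[1:])]
    sxx = sum(x * x for x in lagged)
    alpha = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    ssr = sum((d - alpha * x) ** 2 for x, d in zip(lagged, diffs))
    return alpha / math.sqrt(ssr / (len(diffs) - 1) / sxx)


def block_bootstrap_critical_value(e, level=0.10, block=5, n_boot=10000, seed=0):
    """Bootstrap the lower `level` quantile of df_tstat under a unit root.

    Steps (simplified): (1) regress e_t on e_{t-1} and centre the
    residuals, (2) draw blocks of `block` consecutive residuals with
    replacement, (3) cumulate them into a pseudo-series that has a unit
    root by construction, (4) recompute the statistic on each pseudo-series.
    """
    rng = random.Random(seed)
    lagged, current = e[:-1], e[1:]
    rho_hat = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
    u = [b - rho_hat * a for a, b in zip(lagged, current)]
    mean_u = sum(u) / len(u)
    u = [v - mean_u for v in u]
    n_blocks = math.ceil(len(u) / block)
    stats = []
    for _ in range(n_boot):
        draws = []
        for _ in range(n_blocks):
            start = rng.randrange(len(u) - block + 1)
            draws.extend(u[start:start + block])
        path = [e[0]]
        for shock in draws[:len(u)]:
            path.append(path[-1] + shock)
        stats.append(df_tstat(path))
    stats.sort()
    return stats[int(level * n_boot)]
```

A test statistic computed on the original residuals would then be compared with the returned quantile, rejecting the null of no cointegration when the statistic lies below it.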

References

Amara, J. & Papell, D. (2006), ‘Testing for purchasing power parity using stationary covariates’, Applied Financial Economics 16(1), 29–39.

Anderson, M. (2010), ‘Contagion and Excess Correlation in Credit Default Swaps’.

Aunon-Nerin, D., Cossin, D., Hricko, T. & Huang, Z. (2002), ‘Exploring for the determinants of credit risk in credit default swap transaction data: Is fixed-income markets’ information sufficient to evaluate credit risk?’.

Badillo, R., Belaire-Franch, J. & Reverte, C. (2010), ‘Residual-based block bootstrap for cointegration testing’, Applied Economics Letters 17(10), 999–1003.

Blanco, R., Brennan, S. & Marsh, I. (2005), ‘An Empirical Analysis of the Dynamic Relation between Investment-Grade Bonds and Credit Default Swaps’, The Journal of Finance 60(5), 2255–2281.

Boyson, N., Stahel, C. & Stulz, R. (2008), ‘Hedge fund contagion and liquidity’, NBER Working Paper.

Brunnermeier, M. (2009), ‘Deciphering the liquidity and credit crunch 2007-2008’, Journal of Economic Perspectives 23(1), 77–100.

Choudhry, M. (2006), The credit default swap basis, Bloomberg.

De Wit, J. (2006), Exploring the CDS-bond basis, National Bank of Belgium.

Duffie, D. (1999), ‘Credit swap valuation’, Financial Analysts Journal 55(1), 73–87.

Elliott, G. & Jansson, M. (2003), ‘Testing for unit roots with stationary covariates’, Journal of Econometrics 115(1), 75–89.

Elliott, G. & Pesavento, E. (2009), ‘Testing the null of no cointegration when covariates are known to have a unit root’, Econometric Theory 25(06), 1829–1850.

Engle, R. & Granger, C. (1987), ‘Co-integration and error correction: representation, estimation, and testing’, Econometrica 55(2), 251–276.

Fontana, A. (2010), ‘The persistent negative CDS-bond basis during the 2007/08 financial crisis’, Working Papers.

Giglio, S. (2010), ‘Credit Default Swap Spreads and Systemic Financial Risk’.

Hamilton, J. (1994), Time series analysis, Princeton Univ Pr.

Hansen, B. (1995), ‘Rethinking the univariate approach to unit root testing: Using covariates to increase power’, Econometric Theory 11(05), 1148–1171.

Houweling, P. & Vorst, T. (2005), ‘Pricing default swaps: Empirical evidence’, Journal of International Money and Finance 24(8), 1200–1225.

Hull, J., Predescu, M. & White, A. (2004), ‘The relationship between credit default swap spreads, bond yields, and credit rating announcements’, Journal of Banking & Finance 28(11), 2789–2811.

Jansson, M. (2004), ‘Stationarity testing with covariates’, Econometric Theory 20(01), 56–94.

Johansen, S. (1988), ‘Statistical analysis of cointegration vectors’, Journal of Economic Dynamics and Control 12(2-3), 231–254.

Johansen, S. (1991), ‘Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models’, Econometrica: Journal of the Econometric Society 59(6), 1551–1580.

Kocic, A., Quintos, C. & Yared, F. (2000), ‘Identifying the benchmark security in a multifactor spread environment’, Lehman Brothers Fixed Income Derivatives Research.

Levin, A., Perli, R. & Zakrajsek, E. (2005), The determinants of market frictions in the corporate market, in ‘Fourth Joint Central Bank Research Conference’, Citeseer.

Ng, S. & Perron, P. (1995), ‘Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag’, Journal of the American Statistical Association 90(429).

Norden, L. & Weber, M. (2009), ‘The Co-movement of Credit Default Swap, Bond and Stock Markets: an Empirical Analysis’, European Financial Management 15(3), 529–562.

Paparoditis, E. & Politis, D. (2003), ‘Residual-Based Block Bootstrap for Unit Root Testing’, Econometrica 71(3), 813–855.

Pesavento, E. (2004), ‘Analytical evaluation of the power of tests for the absence of cointegration’, Journal of Econometrics 122(2), 349–384.

Pesavento, E. (2007), ‘Residuals-based tests for the null of no-cointegration: an Analytical comparison’, Journal of Time Series Analysis 28(1), 111–137.

Pesavento, E. (2006), Near-Optimal Unit Root Tests with Stationary Covariates with Better Finite Sample Size, European University Institute.

Phillips, P. (1987), ‘Towards a unified asymptotic theory for autoregression’, Biometrika 74(3), 535.

Phillips, P. C. & Park, J. Y. (1986), ‘Statistical inference in regressions with integrated processes: Part 1’.

Phillips, P. & Durlauf, S. (1986), ‘Multiple time series regression with integrated processes’, The Review of Economic Studies 53(4), 473–495.

Phillips, P. & Ouliaris, S. (1990), ‘Asymptotic properties of residual based tests for cointegration’, Econometrica: Journal of the Econometric Society 58(1), 165–193.

Phillips, P. & Solo, V. (1992), ‘Asymptotics for linear processes’, The Annals of Statistics 20(2), 971–1001.

Rahbek, A. & Mosconi, R. (1999), ‘Cointegration rank inference with stationary regressors in VAR models’, Econometrics Journal 2(1), 76–91.

Saikkonen, P. (1991), ‘Asymptotically efficient estimation of cointegration regressions’, Econometric Theory 7(01), 1–21.

Schwarz, K. (2009), ‘Mind the gap: disentangling credit and liquidity in risk spreads’.

Seo, B. (1998), ‘Statistical inference on cointegration rank in error correction models with stationary covariates’, Journal of Econometrics 85(2), 339–385.

Zhu, H. (2006), ‘An empirical comparison of credit spreads between the bond market and the credit default swap market’, Journal of Financial Services Research 29(3), 211–235.

Cite this document
APA
Jason J. Wu and Aaron L. Game (2011). Cointegration Test with Stationary Covariates and the CDS-Bond Basis during the Financial Crisis (FEDS 2011-18). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2011-18
BibTeX
@techreport{wtfs_feds_2011_18,
  author = {Jason J. Wu and Aaron L. Game},
  title = {Cointegration Test with Stationary Covariates and the CDS-Bond Basis during the Financial Crisis},
  type = {Finance and Economics Discussion Series},
  number = {2011-18},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2011},
  url = {https://whenthefedspeaks.com/doc/feds_2011-18},
  abstract = {This paper proposes a residual based cointegration test with improved power. Based on the idea of Hansen (1995) and Elliott & Jansson (2003) in the unit root testing case, stationary covariates are used to improve the power of the residual based Augmented Dickey Fuller (ADF) test. The asymptotic null distribution contains difficult to estimate nuisance parameters for which there is no obvious method of estimation, therefore we propose a bootstrap methodology to obtain test critical values. Local-to-unity asymptotics and Monte Carlo simulations are used to evaluate the power of the test in large and small samples, respectively. These exercises show that the addition of covariates increases power relative to the ADF and Johansen tests, and that the power depends on the long-run correlation between the covariates and the cointegration candidates. The new test is used to test for cointegration between Credit Default Swap (CDS) and corporate bond spreads for a panel of U.S. firms during the 2007-2009 financial crisis. The new test finds stronger evidence for cointegration between the two spreads for more firms, relative to ADF and Johansen tests.},
}