feds · June 30, 2017

Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance

Abstract

Covariances between contemporaneous squared values and lagged levels form the basis for closed-form instrumental variables estimators of ARCH processes. These simple estimators rely on asymmetry for identification (either in the model's rescaled errors or the conditional variance function) and apply to threshold ARCH(1) and ARCH(p) with p < ∞ processes. Limit theory for these estimators is established in the case where the ARCH processes are regularly varying with a well-defined third and sixth moment of the raw returns and rescaled errors, respectively. The resulting limits are highly non-normal in empirically relevant cases, with slow rates of convergence relative to the thin-tailed √n-case. Nevertheless, Monte Carlo studies of a heavy-tailed ARCH(1) process show the simple IV estimator to outperform standard QMLE in (relatively) small samples when the data are (heavily) skewed. Methods for determining confidence intervals for the ARCH estimates are also discussed.

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance Todd Prono 2016-083 Please cite this paper as: Prono, Todd (2016). “Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance,” Finance and Economics Discussion Series 2016-083. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2016.083r1. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance¹

Todd Prono²

This Version: July 2017

Abstract

Strong consistency and weak distributional convergence to highly non-Gaussian limits are established for closed-form, two stage least squares (TSLS) estimators for a class of ARCH(p) models. Conditions for these results include (relatively) mild moment existence criteria that are supported empirically by many (high frequency) financial returns. These conditions are not shared by competing closed-form estimators like OLS. Identification of these TSLS estimators depends on asymmetry, either in the model's rescaled errors or in the conditional variance function. Monte Carlo studies reveal TSLS estimation to sizably outperform quasi maximum likelihood estimation in (relatively) small samples. This outperformance is most pronounced when returns are heavily skewed.

Keywords: ARCH, closed form, two stage least squares, instrumental variables, heavy tails, regular variation.

JEL codes: C13, C22, C58.

¹ I owe thanks to Dennis Kristensen for detailed comments on an earlier version of this paper, to Blazej Mazur, Travis Nesmith, Dong Hwan Oh, and Olivier Scaillet, as well as to participants at the 2016 Meeting of the Midwest Econometrics Group, the 25th Symposium of the Society of Nonlinear Dynamics and Econometrics, and the 3rd International Workshop on Financial Markets and Nonlinear Dynamics for helpful comments and discussions. The views expressed in this paper are those of the author and do not necessarily reflect those of the Federal Reserve Board.

² Federal Reserve Board. (202) 973-6955, todd.a.prono@frb.gov.

1.1 Introduction

Since being introduced by Engle (1982), autoregressive conditional heteroskedastic (ARCH) models have become the workhorse of conditional variance modeling in financial economics. The original model has been extended and generalized in various ways (see; e.g., Bollerslev et al., 1992). The most popular estimator for these types of models is quasi-maximum likelihood (QML). The asymptotic properties of QML estimation of the linear ARCH model (Engle, 1982) are well studied (see; e.g., Weiss, 1986, and more recently, Jensen and Rahbek, 2004, and Kristensen and Rahbek, 2005). However, OLS estimation of the linear ARCH model is also possible, with the accompanying advantage over QMLE being a closed-form solution. Weiss (1986) is (among) the first to consider the asymptotic properties of OLS estimation of the linear ARCH model under very restrictive moment existence criteria, while Francq and Zakoïan (2000) provide important generalizations under comparable conditions. Since the linear ARCH model implies a set of Yule-Walker equations for the squared returns (see; e.g., Mikosch and Straumann, 2002), the Whittle estimator proposed by Giratis and Robinson (2001), the asymptotic properties for which they derive under conditions comparable to Francq and Zakoïan (2000), also fits within the paradigm of closed-form, linear ARCH estimators, because it is asymptotically equivalent to Yule-Walker estimation.
More recently, Kristensen and Linton (2006) provide asymptotic theory that relaxes the restrictive conditions in Weiss (1986) and Francq and Zakoïan (2000) for establishing the distributional limit (now highly non-Gaussian) and rate of convergence of the OLS estimator for the linear ARCH model, while Mikosch and Straumann (2002) make an analogous contribution (with the same, qualitative, form for the distributional limit as in Kristensen and Linton, 2006) to the asymptotic properties of the Giratis and Robinson (2001) Whittle estimator. A necessary condition underlying even these more recent works, however, is a well-defined fourth moment for the (raw) returns being modeled. Unfortunately, and in many instances, this condition appears to be violated empirically (see; e.g., Loretan and Phillips, 1994, Embrechts, Klüppelberg, and Mikosch, 1997, and Hill and Renault, 2012).

In light of an ill-defined fourth moment for many of the financial returns to which ARCH-type models are commonly applied, this paper proposes closed-form, two stage least squares (TSLS) estimators for a class of ARCH(p) models that are comparable to Francq and Zakoïan (2000), but involve different instruments. Strong consistency and weak distributional convergence to highly non-Gaussian limits comparable (qualitatively) to those discovered in Mikosch and Straumann (2002), Kristensen and Linton (2006), and Vaynman and Beare (2014) are established for these estimators, including under the condition where the fourth moment of the returns being modeled is ill-defined. These closed-form, TSLS estimators apply to linear ARCH models and the threshold ARCH model of Glosten, Jagannathan, and Runkle (1993). To my knowledge, no attention is paid in the literature to establishing the asymptotic properties of closed-form estimators for the threshold ARCH model.

Identification of the proposed TSLS estimators links to asymmetry, either in the distribution of rescaled errors in the linear ARCH model or in the specification of the conditional variance function in the threshold ARCH model. The large-sample properties of these estimators are derived by extending results in Davis and Mikosch (1998) and Mikosch and Stărică (2000) to include this necessary asymmetry.

Relative to estimators for ARCH(p) models that are asymptotically normal with a convergence rate equal to the square root of the sample size, these TSLS estimators converge (quite a bit) more slowly (especially, in empirically-relevant cases) and to a distributional limit that, while stable, lacks a well-defined variance. Not surprising, then, Monte Carlo experiments reveal QML estimation of the linear ARCH model to be (quite a bit) more efficient than TSLS estimation, in large samples. What is surprising, though, is that Monte Carlo experiments also reveal TSLS estimation of the linear ARCH model to be (quite a bit) more efficient than QML estimation, in small samples, when the return distribution is (heavily) skewed. This latter finding evidences TSLS estimators (above and beyond their relative simplicity) to possess improved finite-sample properties over the QMLE alternative.
1.2 Background and Motivation

Consider the ARCH(1) model of

$$Y_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega + \alpha Y_{t-1}^2, \qquad \epsilon_t \sim \text{i.i.d. } D(0, 1),$$

where $D$ is some zero-mean, unit-variance distribution. For this model, it is well known that

$$Y_t^2 = \omega + \alpha Y_{t-1}^2 + W_t, \tag{1}$$

where $\{W_t\}$ is a martingale difference sequence (MDS). In other words, the ARCH(1) model implies an AR(1) model for the second-order return sequence. Given (1), it is apparent that OLS can be used to estimate the parameters of the model. Let $\gamma \equiv E(Y_t^2)$, and $X_t \equiv Y_t^2 - \gamma$. From (1), given sufficient regularity conditions, it also follows that

$$E(X_t X_{t-m}) = \alpha^m E(X_t^2), \qquad m \geq 1, \tag{2}$$

from which it is apparent that consistency of OLS requires $E(Y_t^4) < \infty$. Based on results from Kuersteiner (2002), Guo and Phillips (2001) consider improving the efficiency of OLS by defining as an instrument for $Y_{t-1}^2$ an infinite, weighted sum of past $W_{t-1-i}$ for $i \geq 0$. Given (2), either OLS applied to (1) or the instrumental variables (IV) estimator of Guo and Phillips (2001) is based upon the second-order autocovariances of returns.³ In instances where $D$ is heavy-tailed relative to the normal, these estimators might prove favorable to the QMLE, since the latter is known to underrepresent the second-order autocovariances, in these cases (see; e.g., Jacquier, Polson, and Rossi, 1994 and Baillie and Chung, 1999). For given values of $\omega$ and $\alpha$, however, there is also (certainly) a limit to how heavy-tailed $D$ can be, while still preserving a well-defined fourth moment for $Y_t$.

Empirical evidence suggests exceedance of this limit for many financial return series. Figure 1 plots Hill (1975) tail index estimates together with 95% confidence bands from Hill (2010, Theorem 4) for three major currency returns (all measured relative to the USD) sampled at 20-minute intervals. Recalling that a tail index $\kappa > 0$ for a regularly varying random variable is a moment supremum; i.e., if $Y_t$ is regularly varying, then $E|Y_t|^p < \infty$ if and only if $p < \kappa$ (see; e.g., Resnick, 1987, for an introduction to regular variation), empirical evidence does not (strongly) support well-defined fourth moments for these currency returns.
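As a rough illustration of the tail-index estimates discussed above, the following is a minimal sketch of the Hill (1975) estimator. It is not the code behind Figure 1: the function name `hill_tail_index`, the choice of `k`, and the simulated Pareto-type sample are all assumptions made for illustration, and the confidence bands of Hill (2010) are not reproduced.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill (1975) estimate of the tail index from the k largest values of |x|."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # descending order statistics
    if not 0 < k < a.size:
        raise ValueError("k must satisfy 0 < k < len(x)")
    # Mean log-excess of the top k order statistics over the (k+1)-th largest.
    gamma = np.mean(np.log(a[:k]) - np.log(a[k]))
    return 1.0 / gamma

# Illustrative data only: a symmetric Pareto-type sample with tail index 3.
rng = np.random.default_rng(0)
n = 100_000
magnitude = rng.pareto(3.0, size=n) + 1.0  # classical Pareto: P(X > x) = x**-3 for x >= 1
sign = rng.choice([-1.0, 1.0], size=n)
sample = sign * magnitude

kappa_hat = hill_tail_index(sample, k=2000)
print(f"Hill estimate of the tail index: {kappa_hat:.2f}")
```

An estimate below 4 would, under regular variation, point away from a well-defined fourth moment, which is the comparison drawn in the text. In practice the estimate is sensitive to `k`, which is why plots over a range of `k` are typically reported.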
Contrary to well-defined fourth moments, for substantial sections of all three plots in Figure 1, even the upper confidence band is inside of 4. Moreover, currency returns sampled at this (very) high frequency are known to display relatively less volatility persistence (and, hence, relatively thinner tails) than currency returns measured at lower frequencies (like hourly or daily) or equity returns measured at any frequency equal to or higher than daily (see; e.g., Anderson and Bollerslev, 1997). Overall then, it is clear that standard $\sqrt{n}$ asymptotics for OLS applied to (1) are inconsistent with empirical findings, since those asymptotics require $E(Y_t^8) < \infty$. Moreover, it is (at least) questionable whether the OLS estimator is even consistent.

While not offering much to support well-defined fourth moments, Figure 1 does tend to support well-defined third moments. Notice that the tail index estimates for all three returns stay close to 3, and the upper confidence bands always cover (more than) 3. Loretan and Phillips (1994) and Jondeau and Rockinger (2003) present comparable findings for daily FX and equity returns. Cont and Kan (2011, Property 3) report $\kappa \in (3, 6)$ for daily, credit default swap spread returns. Bouchaud and Potters (2003, p. 102) state that "there is now good evidence that on short time scales, and using long time series, the tail index for stocks is around 3 on several markets (U.S., Japan, Germany)."

³ This same statement also applies to the TSLS estimator of Francq and Zakoïan (2000) and the Whittle estimator of Giratis and Robinson (2001).

For the three currency returns in Figure 1 (JPY, EUR, and CHF), skewness is 0.32, 0.20, and −0.42, respectively, each of which is highly significant against a null of normality given the, respective, sample sizes. Table 1 illustrates additional cases where, not only is the evidenced skewness highly significant, but also quite large in absolute terms. In general, skewness in (high frequency) financial returns is prevalent enough to be considered a stylized fact, along with heavy tails. This stylized fact can be used to identify a closed-form IV estimator for the ARCH(1) model. Consider using

$$Z_{t-1} = (Y_{t-1}, \ldots, Y_{t-h})', \qquad h < \infty,$$

as a vector of instruments for $Y_{t-1}^2$ in (1). Analogous to (2), it follows that, given regularity conditions,

$$E(X_t Y_{t-m}) = \alpha^m E(Y_t^3), \tag{3}$$

which links a set of cross-order covariances to the third moment of $Y_t$. If $E(Y_t^3) \neq 0$, as argued above, then $Z_{t-1}$ can be shown as a valid set of instruments for $Y_{t-1}^2$. In this case, $Z_{t-1}$ can be used in a TSLS estimator for (1), where consistency of this estimator requires $E(Y_t^3) < \infty$, a condition that is now consistent with empirical findings.

Relying on skewness to define valid instruments is not new (see; e.g., Lewbel, 1997).
The benefit of relying on skewness when estimating the ARCH(1) model is analogous to basing an estimator on (2); specifically, a TSLS estimator based on $Z_{t-1}$ chooses an $\alpha$ that best matches (3). By being fit to a particular empirical feature of the data (in this instance, a set of cross-order covariances that map to skewness in the underlying returns), this estimator might, also, perform well against the QMLE, in instances where this feature strays from what is predicted under normality.

The (relatively) heavy-tailed asymptotics discussed in Kristensen and Linton (2006) that apply to the OLS estimator for the ARCH(1) model, rely on the large-sample properties of the sample, second-order autocovariances in (2) that are developed in Davis and Mikosch (1998). The (relatively) heavier-tailed asymptotics that apply to the proposed TSLS estimator extend these results to the sample, cross-order covariances in (3). Doing so requires the return sequence $\{Y_t\}$ to be regularly varying. While many ARCH-type processes can be shown to be regularly varying (see; Basrak, Davis, and Mikosch, 2002), an added wrinkle in the present context is the requirement that $\{Y_t\}$ be skewed. Adapting this requirement to a demonstration of regular variation for ARCH(1) and threshold ARCH(1) processes is Lemma 3 in the Supplemental Appendix.

The same logic behind the TSLS estimator described above extends to TSLS estimation of a threshold ARCH(1) model, with the interesting additional feature that $E(Y_t^3) \neq 0$ is no longer necessary for identification. Generally, threshold ARCH models posit that tomorrow's variance depends on the sign of today's return. This specification requires separate ARCH effects for positive and negative returns. Non-zero skewness in positive and negative returns occurs naturally. As a consequence, TSLS estimation of a threshold ARCH(1) model bases identification on the asymmetric specification of the conditional variance function.

2.1. The ARCH(1) Case

For the sequence $\{Y_t\}_{t \in \mathbb{Z}}$, let $\mathcal{F}_t$ be the associated $\sigma$-algebra where $\mathcal{F}_{t-1} \subseteq \mathcal{F}_t \subseteq \cdots \subseteq \mathcal{F}$. Consider the model

$$Y_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega_0 + \alpha_0 Y_{t-1}^2, \tag{4}$$

where $\omega_0$ denotes the true value, $\omega$ any one of a set of possible values, $\hat{\omega}$ an estimate, and parallel definitions hold for all other parameter values. From (4),

$$\sigma_t^2 = \omega_0 + \sigma_{t-1}^2 A_t, \qquad A_t = \alpha_0 \epsilon_{t-1}^2, \tag{5}$$

which characterizes $\sigma_t^2$ as a stochastic recurrence equation (SRE). Most ARCH-type processes can be characterized as SREs and, as such, shown to be regularly varying (see Basrak, Davis, and Mikosch, 2002).
Specifically, for $\mathbf{Y}_t = (Y_t, \ldots, Y_{t+h})$, where, for short hand,

$$\mathbf{Y} = \mathbf{Y}_0 = (Y_0, \ldots, Y_h),$$

$\mathbf{Y}$ is regularly varying in $\mathbb{R}^{h+1}$ with tail index $\kappa_0$, if there exists a sequence of constants $\{a_n\}$ such that

$$n P(|\mathbf{Y}| > a_n) \longrightarrow 1, \qquad n \rightarrow \infty,$$

where

$$|\mathbf{Y}| = \max_{m = 0, \ldots, h} |Y_m|, \qquad a_n = n^{1/\kappa_0} L(n),$$

and $L(\cdot)$ is slowly-varying at $\infty$.

That $\mathbf{Y}$ is regularly varying is demonstrated in Davis and Mikosch (1998, Lemma A.1) and Mikosch and Stărică (2000, Theorem 2.3), but only in instances where $D$ is symmetric (see Remark R2 in the Supplemental Appendix). Regular variation of $\mathbf{Y}$ can follow minus any need for symmetry in $D$ (see Lemma 3 in the Supplemental Appendix) and applies to both the ARCH(1) case in (4) as well as the threshold ARCH(1) case of (21), making the result compatible with Assumption A3 below and complementary to Basrak, Davis and Mikosch (2002, Corollary 3.5 (B)).

ASSUMPTION A1: (i) The sequence $\{\epsilon_t\}_{t \in \mathbb{Z}}$ is i.i.d. $D(0, 1)$ for some distribution $D$ with unbounded support. (ii) $E|\epsilon_t|^j = c_j < \infty$ for $j > 3$.

Under A1(i), (4) is the strong ARCH(1) model of Drost and Nijman (1993). Specifying the rescaled errors as i.i.d. is necessary for establishing the distributional limits and rates of convergence of the proposed closed-form estimators. Consistency of these estimators, however, continues to follow under the semi-strong definition of ARCH (see Prono, 2014), where (weak) dependence in the higher moments of the model's rescaled errors is allowed.

Under A1(ii), $\{\epsilon_t\}$ is relatively light-tailed, meaning that heavy-tailed features of $\{Y_t\}$ stem from $\{\sigma_t\}$. It is this distinction between the tail properties of $\{\sigma_t\}$ and $\{\epsilon_t\}$ that enables $\{Y_t\}$ to be established as regularly varying. Given A1(ii), up to the $j$th moment of the model's rescaled errors is well-defined. Kristensen and Rahbek (2005) assume $j = 4$, while Hill and Renault (2012) present empirical findings that support $j = 4$.

ASSUMPTION A2: For a $d \times 1$ vector $\alpha$ of ARCH coefficients,

$$\Theta = \left\{ \theta = (\omega, \alpha) \in \mathbb{R}^{d+1} \;\middle|\; \omega \geq \underline{\omega}, \; \alpha_i \geq 0 \right\}$$

for some $\underline{\omega} > 0$ and, at least, one $\alpha_i > 0$.
A2 is taken from Kristensen and Rahbek (2005). For the ARCH(1) case, $d = 1$. Notice that $\Theta$ is noncompact and $\omega$ is bounded below by a nonzero value.

ASSUMPTION A3: $E(\epsilon_t^3) = c_3^* \neq 0$.

Under A3, $D$ in A1(i) is an asymmetric distribution. The direction of skewness is unconstrained. Skewness in (high frequency) returns is considered a stylized fact. This fact is exogenous to the model under consideration, yet (as will be shown) can be harnessed to identify the model. Examples where an asymmetric $D$ is used to account for skewness in returns include Hansen (1994) and Harvey and Siddique (1999).

ASSUMPTION A4: $E(A^{3/2}) < 1$.

A4 is sufficient for $\{Y_t\}$ to have a strictly stationary solution (see Mikosch, 1999, Corollary 1.4.38, and Remark 1.4.39). Throughout this and the remaining sections, assume that the (strictly) stationary solution is the one being observed. From (4) follows that

$$Y_t^2 = \sigma_t^2 + W_t, \qquad W_t = \sigma_t^2 (\epsilon_t^2 - 1), \tag{6}$$

where $\{W_t\}$ is a MDS. Let $X_t \equiv Y_t^2 - \gamma_0$, where $\gamma_0 \equiv E(Y_t^2) = \dfrac{\omega_0}{1 - \alpha_0}$. Then

$$X_t = \alpha_0 X_{t-1} + W_t, \tag{7}$$

in which case, the centered second-order sequence $\{X_t\}$ follows an AR(1) process. Given that

$$E(Y_t^3) = E(\sigma_t^3) \, c_3^*,$$

A4 is also sufficient for $\{Y_t^3\}$ to have a well-defined and stationary mean (see Lemma 1 in the Supplemental Appendix). As a consequence, multiplying both sides of (7) by $Y_{t-m}$ for $m \geq 1$ and taking expectations produces

$$E(X_t Y_{t-m}) = \alpha_0^m E(Y_t^3). \tag{8}$$

Consider

$$Z_{t-1} = (Y_{t-1}, \ldots, Y_{t-h})' \tag{9}$$

for $h < \infty$. Then $E(W_t Z_{t-1}) = 0$ by iterated expectations and, owing to (8),

$$E(X_{t-1} Z_{t-1}) = E(Y_t^3) \times (1, \alpha_0, \ldots, \alpha_0^{h-1})', \tag{10}$$

making $Z_{t-1}$ a valid set of instruments for $X_{t-1}$. For the observed sequence $\{Y_t\}_{t=1}^n$, consider then

$$\hat{\alpha}^{IV} = \frac{\left( \sum_t \hat{X}_{t-1} Z_{t-1} \right)' \hat{\Lambda} \left( \sum_t \hat{X}_t Z_{t-1} \right)}{\left( \sum_t \hat{X}_{t-1} Z_{t-1} \right)' \hat{\Lambda} \left( \sum_t \hat{X}_{t-1} Z_{t-1} \right)}, \tag{11}$$

$$\hat{\omega}^{IV} = \hat{\gamma} \left( 1 - \hat{\alpha}^{IV} \right), \tag{12}$$

where

$$\hat{X}_t = Y_t^2 - \hat{\gamma}, \qquad \hat{\gamma} = n^{-1} \sum_t Y_t^2,$$

noting that both $\hat{\alpha}^{IV}$ and $\hat{\omega}^{IV}$ are variance-targeted estimators (VTEs).⁴

ASSUMPTION A5: $\hat{\Lambda} \xrightarrow{a.s.} \Lambda_0$, a positive definite matrix.

Suppose $\hat{\Lambda} = \left( n^{-1} \sum_t Z_{t-1} Z_{t-1}' \right)^{-1}$. In this case, $\hat{\alpha}^{IV}$ is a TSLS estimator. Alternatively, if $\hat{\Lambda} = \left( n^{-1} \sum_t \left( \hat{X}_t - \tilde{\alpha} \hat{X}_{t-1} \right)^2 Z_{t-1} Z_{t-1}' \right)^{-1}$, $\hat{\alpha}^{IV}$ is a two-step GMM estimator, where $\tilde{\alpha}$ is a preliminary estimate. While the two-step GMM version of (11) is certainly preferable on efficiency grounds, it requires $E(A^3) < 1$ in order for A5 to hold, which is inconsistent with Figure 1. In the TSLS case, on the other hand, since $\{Y_t\}$ is strongly mixing by Carrasco and Chen (2002, Corollary 6),

$$\hat{\Lambda} = \left( n^{-1} \sum_t Z_{t-1} Z_{t-1}' \right)^{-1} \xrightarrow{a.s.} \gamma_0^{-1} I_h,$$

where $I_h$ is the $(h \times h)$ identity matrix, by the Ergodic Theorem, given only the milder condition A4.

$\hat{\alpha}^{IV}$ is related to the IV estimator proposed by Guo and Phillips (2001). There are, however, two key differences. The first difference involves instrument choice. In Guo and Phillips, the instruments are second-order lags as opposed to first-order lags, as is the case here. Second, the instruments in (11) are not efficient in the sense of Kuersteiner (2002).
Making them so, however, requires $E(A^3) < 1$ and, hence, is limited to the thin-tailed case.

⁴ VTE for ARCH-type models is first introduced by Engle and Mezrich (1996) in a QMLE context, while the asymptotic theory for this estimator is studied by Francq, Horváth, and Zakoïan (2011) and Vaynman and Beare (2014).
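The closed-form estimators in (11) and (12), with the TSLS weighting matrix, can be sketched in a few lines. The following is an illustration under stated assumptions rather than the paper's own code: the function names, the centered-exponential error distribution (mean 0, variance 1, skewness 2), and the parameter values $\omega_0 = 0.9$, $\alpha_0 = 0.1$ are all hypothetical. The parameters are deliberately placed in the thin-tailed case so that a single simulated sample gives a stable estimate.

```python
import numpy as np

def simulate_arch1(n, omega, alpha, rng, burn=500):
    """Simulate Y_t = sigma_t * eps_t with sigma_t^2 = omega + alpha * Y_{t-1}^2."""
    # Assumed error law: centered exponential (mean 0, variance 1, skewed).
    eps = rng.exponential(1.0, size=n + burn) - 1.0
    y = np.empty(n + burn)
    y_prev = 0.0
    for t in range(n + burn):
        y_prev = np.sqrt(omega + alpha * y_prev**2) * eps[t]
        y[t] = y_prev
    return y[burn:]  # drop the burn-in so the sample is close to stationary

def tsls_arch1(y, h=3):
    """Closed-form TSLS estimates (alpha_hat, omega_hat) following (11)-(12)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    gamma_hat = np.mean(y**2)            # variance-targeting step
    x = y**2 - gamma_hat                 # X_hat_t
    # Rows are t = h, ..., n-1; columns are the instruments Y_{t-1}, ..., Y_{t-h}.
    Z = np.column_stack([y[h - m : n - m] for m in range(1, h + 1)])
    x_cur = x[h:]                        # X_hat_t
    x_lag = x[h - 1 : n - 1]             # X_hat_{t-1}
    s_lag = Z.T @ x_lag                  # sum_t X_hat_{t-1} Z_{t-1}
    s_cur = Z.T @ x_cur                  # sum_t X_hat_t Z_{t-1}
    lam = np.linalg.inv(Z.T @ Z / Z.shape[0])  # TSLS weighting matrix
    alpha_hat = (s_lag @ lam @ s_cur) / (s_lag @ lam @ s_lag)
    omega_hat = gamma_hat * (1.0 - alpha_hat)
    return alpha_hat, omega_hat

rng = np.random.default_rng(12345)
y = simulate_arch1(200_000, omega=0.9, alpha=0.1, rng=rng)
alpha_hat, omega_hat = tsls_arch1(y, h=3)
print(f"alpha_hat = {alpha_hat:.3f}, omega_hat = {omega_hat:.3f}")
```

No numerical optimization is involved: the estimate is a ratio of quadratic forms in sample cross-order covariances, which is what makes resampling-based inference comparatively cheap.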

THEOREM 1. Consider the estimators in (11) and (12) for the model in (7). Let

$$A_0 = E(X_{t-1} Z_{t-1})' \Lambda_0, \qquad B_0 = E(X_{t-1} Z_{t-1})' \Lambda_0 E(X_{t-1} Z_{t-1}).$$

Let Assumptions A1–A5 hold. Then

$$\hat{\alpha}^{IV} \xrightarrow{a.s.} \alpha_0, \qquad \hat{\omega}^{IV} \xrightarrow{a.s.} \omega_0.$$

In addition,

$$n a_n^{-3} \left( \hat{\alpha}^{IV} - \alpha_0 \right) \xrightarrow{d} B_0^{-1} A_0 V_h \tag{13}$$

if $\kappa_0 \in (3, 6)$, where the vector $V_h = (V_1, \ldots, V_h)'$ is jointly $(\kappa_0 / 3)$-stable, with components $(V_m)_{m = 1, \ldots, h}$ defined in Lemma 5 of the Supplemental Appendix, and

$$n a_n^{-3} \left( \hat{\omega}^{IV} - \omega_0 \right) = -\gamma_0 \, n a_n^{-3} \left( \hat{\alpha}^{IV} - \alpha_0 \right) + o_p(1). \tag{14}$$

Alternatively, if $E(A^3) < 1$ so that $E(Y_t^6) < \infty$ and $\kappa_0 \in (6, \infty)$, then

$$\sqrt{n} \left( \hat{\alpha}^{IV} - \alpha_0 \right) \xrightarrow{d} N\left( 0, \Sigma_{\alpha_0} \right) \tag{15}$$

and

$$\sqrt{n} \left( \hat{\omega}^{IV} - \omega_0 \right) \xrightarrow{d} N\left( 0, \Sigma_{\omega_0} \right), \tag{16}$$

where

$$\Sigma_{\alpha_0} = B_0^{-2} A_0 E\left( W_t^2 Z_{t-1} Z_{t-1}' \right) A_0', \qquad \Sigma_{\gamma_0} = E(X_t^2) + 2 \sum_{s=1}^{\infty} E(X_t X_{t-s}),$$

and

$$\Sigma_{\omega_0} = \Sigma_{\gamma_0} + \gamma_0^2 \Sigma_{\alpha_0} - 2 \gamma_0 B_0^{-1} A_0 \left( \sum_{s=1}^{\infty} E\left( W_t Z_{t-1} Y_{t-s}^2 \right) \right). \tag{17}$$

Proof. Proofs of all Theorems are contained in the Appendix. Statements and proofs of all Lemmas that support the Theorems are contained in the Supplemental Appendix.

The IV estimator in (11) depends on the (sample) cross-order covariances from (8), which are all nonzero owing to A3. The (weak) distributional limits of these cross-order covariances are established using a CLT from Davis and Mikosch (1998, Theorem 2.8) together with the continuous mapping theorem (see Lemma 4 and Remark R3 in the Supplemental Appendix for the CLT and Lemma 5, also in the Supplemental Appendix, for the distributional limits). The method of proof extends results from Davis and Mikosch (1998) and Mikosch and Stărică (2000) to cross-order covariances (see Lemmas 3–5 in the Supplemental Appendix) and relies on a first-order Taylor Expansion of $\sigma_t^3$ around $\omega$; in which case, the limiting results are most appropriate for a small $\omega_0$.⁵ The (weak) distributional limit in (13) is simply a linear combination of the distributional limits of the cross-order covariances, which are jointly stable by Samorodnitsky and Taqqu (1994, Theorem 2.1.5(c)). This distributional limit consists of functionals of $\{Y_t\}$. Within this limit, the individual components of $V_h$ are dependent (see Lemma 5 in the Supplemental Appendix).

A sufficient condition for (13) is $j = 6$ in A1. Such a condition is a close analog to one used in both Davis and Mikosch (1998) and Mikosch and Stărică (2000).⁶ Given a result from von Bahr and Esseen (1965, Theorem 2) that is also used in Vaynman and Beare (2014), this condition is relaxed in Theorem 1 to allow, instead, that $j \in (3, 6)$. This milder condition is better aligned with more-recent theory and empirical findings for many (high frequency) financial returns. This same milder condition also applies to the threshold ARCH(1) and ARCH(p) cases discussed in Sections 2.2 and 2.3, respectively.

In (13), the limiting distribution is not impacted by $\hat{\gamma}$. The rate of convergence is $n^{(\kappa_0 - 3)/\kappa_0}$, which is (quite a bit) slower than the usual $\sqrt{n}$ case, especially for values of $\kappa_0$ near the lower-bound of its required support, which, as evidenced in Figure 1, are the most empirically relevant. The borderline case of $\kappa_0 = 6$ is omitted for the same reasons cited in Vaynman and Beare (2014, Section 3.2).
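To give a sense of how slow the rate $n^{(\kappa_0 - 3)/\kappa_0}$ can be relative to $\sqrt{n}$, the short sketch below tabulates the rate exponent for a few tail indices. The helper name `tsls_rate_exponent` is hypothetical, and the slowly varying factor $L(n)$ in $a_n$ is ignored, so the numbers are only indicative.

```python
def tsls_rate_exponent(kappa0):
    """Exponent r such that the estimator in (11) converges at rate n**r.

    Ignores the slowly varying factor L(n) in a_n = n**(1/kappa0) * L(n).
    """
    if kappa0 <= 3:
        raise ValueError("Theorem 1 requires kappa0 > 3")
    if kappa0 == 6:
        raise ValueError("the borderline case kappa0 = 6 is not covered")
    # Heavy-tailed case (3 < kappa0 < 6) versus thin-tailed sqrt(n) case.
    return (kappa0 - 3.0) / kappa0 if kappa0 < 6 else 0.5

for kappa0 in (3.5, 4.0, 5.0, 8.0):
    r = tsls_rate_exponent(kappa0)
    print(f"kappa0 = {kappa0}: exponent {r:.3f}, scale at n = 10**6 is {1e6 ** r:.1f}")
```

For example, $\kappa_0 = 4$ gives an exponent of 0.25, so a sample of one million observations is scaled by roughly 32 rather than the 1000 implied by $\sqrt{n}$.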
Mentioned in the Introduction and evidenced in Theorem 1, a principal advantage of (11) over the OLS alternative is that both consistency and (weak) distributional convergence follow when $E(Y_t^4) = \infty$. This result renders (11) compatible with empirical findings for many financial return series. The cost of this result, however, is a limitation on the set of permissible distributions for the model's rescaled errors. Given this limitation, the asymptotic properties of the OLS estimator applied to (7) are derived in the Supplemental Appendix.

The distributional limit in (13) is mostly qualitative in nature, owing to a (very) awkward characteristic function that does not readily admit the construction of confidence intervals. Consider then

$$\hat{\tau}_n^2 = n^{-1} \sum_t Y_t^6. \tag{18}$$

Following the same method of proof for Davis and Hsing (1995, Theorem 3.1(i)),

$$n a_n^{-6} \hat{\tau}_n^2 \xrightarrow{d} S_0, \tag{19}$$

where $S_0$ is $(\kappa_0 / 6)$-stable. Given that $V_h$ and $S_0$ are each characterized by stable laws, $(V_h', S_0)$ will be multivariate stable (see; e.g., Hall and Yao, 2003, and Vaynman and Beare, 2014, Theorem 4), in which case,

$$\sqrt{n} \left( \frac{\hat{\alpha}^{IV} - \alpha_0}{\hat{\tau}_n} \right) \xrightarrow{d} \frac{B_0^{-1} A_0 V_h}{S_0^{1/2}}, \tag{20}$$

by the continuous mapping theorem.

(20) enjoys the advantage relative to (13) of removing the unknown scaling factor $a_n^{-3}$. Given (20), confidence intervals for $\hat{\alpha}^{IV}$ can be constructed by applying the subsampling method in Vaynman and Beare (2014, Section 4.1) to the left-hand-side of (20).⁷ Confidence intervals can, alternatively, be obtained by bootstrapping this same normalized quantity as demonstrated in Hall and Yao (2003, Corollary to Theorem 3.2). These bootstrap methods display better finite sample performance than the subsampling method while maintaining tractability, owing to the fact that $\hat{\alpha}^{IV}$ is closed form.

In the thin-tailed case where $E(A^3) < 1$, the distributional limit of $\hat{\alpha}^{IV}$ becomes Gaussian, with the usual rate of convergence. (20) is helpful in illustrating this case; since, when $E(Y_t^6) < \infty$, $\hat{\tau}_n$ has a degenerate limit, and the variance of the joint distribution behind $V_h$ is well defined. Interestingly, in this case, the asymptotic variance of $\hat{\gamma}$ does not impact $\Sigma_{\alpha_0}$. Moreover, owing to (10), as $c_3^* \rightarrow 0$ (i.e., as $D$ becomes increasingly symmetric), $\Sigma_{\alpha_0}$ increases without bound. In the limit where $c_3^* = 0$, $\Sigma_{\alpha_0}$ is ill-defined, rendering $\hat{\alpha}^{IV}$ unidentified. Finally, as is well known, $\Lambda_0 = \left( E\left( W_t^2 Z_{t-1} Z_{t-1}' \right) \right)^{-1}$ produces the minimum-variance estimator. In the thin-tailed case, then, $\hat{\alpha}^{IV}$ should be a two-step GMM estimator.

⁵ Given the values of $\omega$ typically encountered in practice, the described limitation does not appear to be particularly binding.

⁶ In each of these two cases, second-order autocovariances are considered; i.e., $E(X_t X_{t-m})$ for $m \geq 1$, in which case, the analogous condition is $j = 8$.

⁷ This method displays (very) poor finite sample performance for $n \leq 2{,}500$ (see Vaynman and Beare, 2014, Section 4.2). However, given the sample sizes in Table 1 and the statement from these same authors that results for their method are improved at sample sizes of $n = 50{,}000$, subsampling might prove to be, generally, more feasible (empirically) for applications involving intraday returns.
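The self-normalized quantity on the left-hand side of (20) is straightforward to compute, and a generic block-subsampling interval built from it can be sketched as follows. This is a textbook-style subsampling scheme written under stated assumptions, not necessarily the exact procedure of Vaynman and Beare (2014): the function names, the half-overlapping blocks, the block length, the just-identified ($h = 1$) estimator, and the simulated data are all hypothetical choices.

```python
import numpy as np

def iv_alpha(y):
    """Just-identified (h = 1) version of (11): Y_{t-1} instruments X_{t-1}."""
    x = y**2 - np.mean(y**2)
    return np.sum(x[1:] * y[:-1]) / np.sum(x[:-1] * y[:-1])

def tau_hat(y):
    """Square root of (18): tau_hat_n^2 = n^{-1} * sum_t Y_t^6."""
    return np.sqrt(np.mean(y**6))

def subsample_ci(y, block, level=0.90):
    """Generic subsampling interval based on the studentized statistic in (20)."""
    n = y.size
    a_n, t_n = iv_alpha(y), tau_hat(y)
    stats = []
    for start in range(0, n - block + 1, block // 2):  # half-overlapping blocks
        yb = y[start:start + block]
        # Block analogue of sqrt(n) * (alpha_hat - alpha_0) / tau_hat,
        # with the full-sample estimate standing in for alpha_0.
        stats.append(np.sqrt(block) * (iv_alpha(yb) - a_n) / tau_hat(yb))
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return a_n - hi * t_n / np.sqrt(n), a_n - lo * t_n / np.sqrt(n)

# Illustrative data: ARCH(1) with centered-exponential (skewed) errors.
rng = np.random.default_rng(7)
n, omega, alpha = 20_000, 0.9, 0.1
eps = rng.exponential(1.0, size=n) - 1.0
y = np.empty(n)
y_prev = 0.0
for t in range(n):
    y_prev = np.sqrt(omega + alpha * y_prev**2) * eps[t]
    y[t] = y_prev

lower, upper = subsample_ci(y, block=2_000)
print(f"alpha_hat = {iv_alpha(y):.3f}, 90% interval = [{lower:.3f}, {upper:.3f}]")
```

Because each block estimate is a closed-form ratio, the loop costs little even with many blocks, which is the tractability point made above for resampling-based intervals.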

2.2. The Threshold ARCH(1) Case

Consider next the model of

$$Y_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega_0 + \alpha_{1,0} Y_{t-1}^2 \times I_{\{Y_{t-1} \geq 0\}} + \alpha_{2,0} Y_{t-1}^2 \times I_{\{Y_{t-1} < 0\}}, \tag{21}$$

which is the threshold ARCH(1) model of Glosten, Jagannathan, and Runkle (1993); henceforth, the GJR ARCH(1) model. For this model, the following SRE applies

$$\sigma_t^2 = \omega_0 + \sigma_{t-1}^2 A_t, \qquad A_t = \alpha_{0,t-1} \epsilon_{t-1}^2, \qquad \alpha_{0,t-1} = \alpha_{1,0} \times I_{\{Y_{t-1} \geq 0\}} + \alpha_{2,0} \times I_{\{Y_{t-1} < 0\}}.$$

As a consequence, $\{Y_t\}$ continues to have a strictly stationary solution given A4. Next, since (6) continues to hold,

$$E(Y_t^2) = \frac{\omega_0 + \alpha_{1,0} \, Cov\left( Y_t^2, I_{\{Y_t \geq 0\}} \right) + \alpha_{2,0} \, Cov\left( Y_t^2, I_{\{Y_t < 0\}} \right)}{1 - \left( \alpha_{1,0} \times P(Y_t \geq 0) + \alpha_{2,0} \times P(Y_t < 0) \right)}, \tag{22}$$

in which case,

$$X_t = \alpha_{1,0} X_{1,t-1} + \alpha_{2,0} X_{2,t-1} + W_t = X_{t-1}' \alpha_0 + W_t, \tag{23}$$

where

$$X_{1,t-1} = Y_{t-1}^2 \times I_{\{Y_{t-1} \geq 0\}} - E\left( Y_t^2 \times I_{\{Y_t \geq 0\}} \right), \qquad X_{2,t-1} = Y_{t-1}^2 \times I_{\{Y_{t-1} < 0\}} - E\left( Y_t^2 \times I_{\{Y_t < 0\}} \right).$$

Motivated by the results in Section 2.1, consider as (potential) instruments for $X_{t-1}$

$$Z_{t-1} = \left( (Z_{1,t-1}, Z_{2,t-1}), \ldots, (Z_{1,t-h}, Z_{2,t-h}) \right)', \qquad h < \infty, \tag{24}$$

where

$$Z_{1,t-m} = Y_{t-m} \times I_{\{Y_{t-m} \geq 0\}} - E\left( Y_t \times I_{\{Y_t \geq 0\}} \right), \qquad Z_{2,t-m} = Y_{t-m} \times I_{\{Y_{t-m} < 0\}} - E\left( Y_t \times I_{\{Y_t < 0\}} \right)$$

for $m \geq 1$.

ASSUMPTION A6: $E\left(Z_{t-1} X_{t-1}'\right)$ has full column rank.

A6 applies the usual rank condition for identifying IV estimators. A sufficient condition for A6 is

$$E\left(Z_{1,t-1} X_{1,t-1}\right) \times E\left(Z_{2,t-1} X_{2,t-1}\right) - E\left(Z_{1,t-1} X_{2,t-1}\right) \times E\left(Z_{2,t-1} X_{1,t-1}\right) \neq 0, \tag{25}$$

which establishes $\left(Z_{1,t-1}, Z_{2,t-1}\right)'$ as valid instruments for $\left(X_{1,t-1}, X_{2,t-1}\right)'$. Let

$$E\left(\epsilon_t^j \times I_{\{\epsilon_t \geq 0\}}\right) = c_j^{+}, \qquad E\left(\epsilon_t^j \times I_{\{\epsilon_t < 0\}}\right) = c_j^{-}, \qquad j = 1, 2, 3.$$

Given (21) then,

$$c_1^{+} + c_1^{-} = 0, \qquad c_2^{+} + c_2^{-} = 1, \qquad c_3^{+} + c_3^{-} = c_3^{*}, \tag{26}$$

where $c_3^{*}$ is defined in A3. Using (26), (25) can be restated as

$$E\left(\sigma_t^3\right) \times \left[E\left(\sigma_t^3\right) c_3^{+} c_3^{-} - E\left(\sigma_t\right) E\left(\sigma_t^2\right) \times \left(c_1^{-} c_2^{-} c_3^{+} + c_1^{+} c_2^{+} c_3^{-}\right)\right] \neq 0. \tag{27}$$

Suppose that $c_3^{*} = 0$, which is to say that the distribution of $\{\epsilon_t\}$ is symmetric. In this case, again using the constraints in (26), (27) is satisfied if

$$E\left(Y_t^3 \times I_{\{Y_t \geq 0\}}\right) - E\left(Y_t^2\right) \times E\left(Y_t \times I_{\{Y_t \geq 0\}}\right) \neq 0$$

and

$$E\left(Y_t^3 \times I_{\{Y_t < 0\}}\right) - E\left(Y_t^2\right) \times E\left(Y_t \times I_{\{Y_t < 0\}}\right) \neq 0,$$

depending on whether (27) is solved only in terms of $c_j^{+}$ or $c_j^{-}$, respectively. Notice that A3 is not necessary for satisfying even (25).
So long as $\alpha_{1,0} \neq \alpha_{2,0}$ (i.e., there exists a threshold effect in the conditional variance), $Z_{t-1}$ as defined in (24) can serve as a valid set of instruments for $X_{t-1}$ regardless of whether the rescaled errors from the GJR ARCH(1) model are skewed. In this case, it is the conditional variance function itself that supplies the necessary asymmetry for identification. In the event that $\alpha_{1,0} = \alpha_{2,0}$, however, (23) reduces to (7); in which case, A3 becomes necessary for establishing validity of the instruments in (9) because, in this case, asymmetry can only come from the model's rescaled errors.$^{8}$

$^{8}$ Notice that each instrument in (9) is an MDS. This characterization does not carry over to the instruments in (24). These latter instruments, while unconditionally mean-zero, are not conditionally mean-zero.
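One way to see how (25) leads to (27) is to write each cross moment in terms of the moments of $\sigma_t$ and the partial moments $c_j^{\pm}$. The block below is only a sketch of that step. It assumes, in line with the model setup, that $\epsilon_t$ is independent of $\sigma_t$, that $\sigma_t > 0$ so that the sign of $Y_t$ equals the sign of $\epsilon_t$, and that $\{Y_t\}$ is stationary. The intermediate expressions are a reconstruction and are not quoted from the paper.

```latex
\begin{align*}
E(Z_{1,t-1}X_{1,t-1}) &= E(\sigma_t^3)\,c_3^{+} - E(\sigma_t)E(\sigma_t^2)\,c_1^{+}c_2^{+}, \\
E(Z_{2,t-1}X_{2,t-1}) &= E(\sigma_t^3)\,c_3^{-} - E(\sigma_t)E(\sigma_t^2)\,c_1^{-}c_2^{-}, \\
E(Z_{1,t-1}X_{2,t-1}) &= -E(\sigma_t)E(\sigma_t^2)\,c_1^{+}c_2^{-}, \\
E(Z_{2,t-1}X_{1,t-1}) &= -E(\sigma_t)E(\sigma_t^2)\,c_1^{-}c_2^{+}.
\end{align*}
% Substituting into the left-hand side of (25), the two terms in
% [E(\sigma_t)E(\sigma_t^2)]^2 c_1^{+}c_1^{-}c_2^{+}c_2^{-} cancel, leaving
\begin{equation*}
E(\sigma_t^3)\left[E(\sigma_t^3)\,c_3^{+}c_3^{-}
  - E(\sigma_t)E(\sigma_t^2)\left(c_1^{-}c_2^{-}c_3^{+} + c_1^{+}c_2^{+}c_3^{-}\right)\right].
\end{equation*}
```

Requiring this last expression to be nonzero gives the condition labeled (27).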

Owing to the identification condition in A6, the GJR ARCH(1) analog to (11) based upon feasible versions of $X_{t-1}$ and $Z_{t-1}$ is

$$\hat{\alpha}_{IV} = \hat{F}\left(n^{-1}\sum_t \hat{X}_t \hat{Z}_{t-1}\right), \tag{28}$$

where

$$\hat{F} = \left[\left(n^{-1}\sum_t \hat{X}_{t-1}\hat{Z}_{t-1}'\right)\hat{\Lambda}\left(n^{-1}\sum_t \hat{X}_{t-1}\hat{Z}_{t-1}'\right)'\right]^{-1}\left(n^{-1}\sum_t \hat{X}_{t-1}\hat{Z}_{t-1}'\right)\hat{\Lambda} \tag{29}$$

is a $2 \times 2h$ matrix, and

$$\hat{E}\left(Y_t^j \times I_{\{Y_t \geq 0\}}\right) = n^{-1}\sum_t Y_t^j \times I_{\{Y_t \geq 0\}}, \qquad \hat{E}\left(Y_t^j \times I_{\{Y_t < 0\}}\right) = n^{-1}\sum_t Y_t^j \times I_{\{Y_t < 0\}}, \qquad j = 1, 2.$$

When $\hat{\Lambda} = \left(n^{-1}\sum_t \hat{Z}_{t-1}\hat{Z}_{t-1}'\right)^{-1}$, (28) is a TSLS estimator for (21), with the same discussion regarding selection of $\hat{\Lambda}$ in Section 2.1 remaining applicable.

THEOREM 2. Consider the estimator in (28) for the model in (23) when $\alpha_{1,0} \neq \alpha_{2,0}$, and let

$$F_0 = \left[E\left(X_{t-1}Z_{t-1}'\right)\Lambda_0\, E\left(X_{t-1}Z_{t-1}'\right)'\right]^{-1} E\left(X_{t-1}Z_{t-1}'\right)\Lambda_0.$$

In addition, let Assumptions A1–A2 and A4–A6 hold. Then,

$$\hat{\alpha}_{IV} \xrightarrow{a.s.} \alpha_0.$$

In addition,

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} F_0 W_h^{(+,-)} \tag{30}$$

if $\kappa_0 \in (3, 6)$, where the vector

$$W_h^{(+,-)} = \left(W_1^{+}, W_1^{-}, \ldots, W_h^{+}, W_h^{-}\right)'$$

is jointly $(\kappa_0/3)$-stable, with components $\left(W_m^{+}, W_m^{-}\right)_{m=1,\ldots,h}$ defined in Lemma 6 of the Supplemental Appendix. Alternatively, if $E\left(A^3\right) < 1$ so that $E\left(Y_t^6\right) < \infty$ and $\kappa_0 \in (6, \infty)$, then

$$\sqrt{n}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} N\left(0,\; F_0\, E\left(W_t^2 Z_{t-1} Z_{t-1}'\right) F_0'\right). \tag{31}$$
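The estimator in (28) can be illustrated in its simplest, just-identified form. The following is a minimal sketch under assumptions that are not taken from the paper: it uses $h = 1$, so that the weighting matrix drops out and $\hat{F}$ reduces to the inverse of the $2 \times 2$ sample cross-moment matrix; the rescaled errors are standard normal; and the function names `simulate_gjr_arch1` and `gjr_iv_h1` are hypothetical. The second function also returns the sample analog of the determinant in (25).

```python
import random


def simulate_gjr_arch1(n, omega, alpha_pos, alpha_neg, rng, burn=500):
    """Simulate Y_t from the GJR ARCH(1) model (21) with N(0, 1) rescaled errors."""
    y_prev, path = 0.0, []
    for t in range(n + burn):
        alpha = alpha_pos if y_prev >= 0.0 else alpha_neg
        sigma2 = omega + alpha * y_prev * y_prev
        y_prev = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
        if t >= burn:  # discard the burn-in draws
            path.append(y_prev)
    return path


def gjr_iv_h1(y):
    """Just-identified (h = 1) version of (28); returns ((a_pos, a_neg), det)."""
    n = len(y)
    pos = [1.0 if v >= 0.0 else 0.0 for v in y]
    # Sample moments used to centre X and Z, as in the text below (29).
    gamma_hat = sum(v * v for v in y) / n
    g_pos = sum(v * v * d for v, d in zip(y, pos)) / n
    g_neg = gamma_hat - g_pos
    h_pos = sum(v * d for v, d in zip(y, pos)) / n
    h_neg = sum(y) / n - h_pos
    m = [[0.0, 0.0], [0.0, 0.0]]  # sample analog of E(Z_{t-1} X_{t-1}')
    zx = [0.0, 0.0]               # sample analog of E(Z_{t-1} X_t)
    for t in range(1, n):
        y_lag, d = y[t - 1], pos[t - 1]
        x_t = y[t] * y[t] - gamma_hat
        x_lag = (y_lag * y_lag * d - g_pos, y_lag * y_lag * (1.0 - d) - g_neg)
        z_lag = (y_lag * d - h_pos, y_lag * (1.0 - d) - h_neg)
        for i in range(2):
            zx[i] += z_lag[i] * x_t / n
            for j in range(2):
                m[i][j] += z_lag[i] * x_lag[j] / n
    # Determinant of the 2x2 matrix: the sample counterpart of (25).
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # Solve m * alpha = zx by Cramer's rule.
    a_pos = (m[1][1] * zx[0] - m[0][1] * zx[1]) / det
    a_neg = (m[0][0] * zx[1] - m[1][0] * zx[0]) / det
    return (a_pos, a_neg), det
```

A call such as `gjr_iv_h1(simulate_gjr_arch1(100_000, 0.5, 0.05, 0.25, random.Random(1)))` returns the two slope estimates together with the determinant. The parameter values here are illustrative choices with thin tails, not the paper's simulation design.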

The main result in (30) follows from the (weak) distributional convergence of $n^{-1}\sum_t X_t Z_{t-1}$ (see Lemma 6 in the Supplemental Appendix), which involves cross-order sums constructed from positive and negative realizations of $\{Y_t\}$, respectively. This result requires $\alpha_{1,0} > 0$ and $\alpha_{2,0} > 0$ (see Remark R2 in the Supplemental Appendix). The distributional limit of $\hat{\alpha}_{IV}$ is a linear combination of the limits to sample cross-order covariances taken from the right-hand side and left-hand side of the distribution of $Y_t$. Individual components of $W_h^{(+,-)}$ are dependent (see Lemma 6 in the Supplemental Appendix). In addition, $W_1^{+}$ and $W_1^{-}$ jointly depend on $V_1$ from Theorem 1, which connects the limiting result in (30) to that in (13). Normalizing the left-hand side of (30) by $\tau_n$ as it is defined in (18) enables construction of either subsample or bootstrap confidence intervals for $\hat{\alpha}_{IV}$ as described following the statement of Theorem 1 in Section 2.1. In the case where $E\left(A^3\right) < 1$, $\Lambda_0 = E\left(W_t^2 Z_{t-1} Z_{t-1}'\right)$ produces the minimum variance estimator so that $\hat{\alpha}_{IV}$ should be a two-step GMM estimator.

Note that Theorem 2 does not depend on A3. As a consequence, (28) seems to be the preferable choice for estimating (23) over OLS in the (empirically relevant) case where $E\left(Y_t^4\right) = \infty$; since, like Theorem 1, consistency and (weak) distributional convergence are supported under this case while, unlike Theorem 1, the set of permissible distributions for the model's rescaled errors includes symmetric candidates. Nevertheless, the asymptotic properties of the OLS estimator for (23) are also developed in the Supplemental Appendix.

Finally, let

$$\Gamma_0 = \left(\mathrm{Cov}\left(Y_t^2, I_{\{Y_t \geq 0\}}\right),\; \mathrm{Cov}\left(Y_t^2, I_{\{Y_t < 0\}}\right)\right)', \qquad P_0 = \left(P(Y_t \geq 0),\; P(Y_t < 0)\right)'.$$

Then, given (22),

$$\hat{\omega} = \hat{\gamma}\left(1 - \hat{P}'\hat{\alpha}\right) - \hat{\Gamma}'\hat{\alpha}$$

so that

$$\hat{\omega} - \omega_0 = \left(\hat{\gamma} - \gamma_0\right) - \left(\gamma_0 P_0 + \Gamma_0\right)'\left(\hat{\alpha} - \alpha_0\right),$$

in which case, a comparable version of (14) then follows.
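The plug-in expression for the intercept implied by (22) is simple to compute once slope estimates are available. The following is a minimal sketch, assuming sample means and sample covariances are used for $\hat{\gamma}$, $\hat{P}$, and $\hat{\Gamma}$; the function name `gjr_omega_hat` and its argument layout are hypothetical.

```python
def gjr_omega_hat(y, alpha_hat):
    """Plug-in intercept: gamma_hat * (1 - P_hat'alpha_hat) - Gamma_hat'alpha_hat.

    y          -- sequence of observed returns
    alpha_hat  -- pair (alpha for Y >= 0, alpha for Y < 0), e.g. from an IV fit
    """
    n = len(y)
    pos = [1.0 if v >= 0.0 else 0.0 for v in y]
    gamma_hat = sum(v * v for v in y) / n          # sample E(Y^2)
    p_pos = sum(pos) / n                           # sample P(Y >= 0)
    p_neg = 1.0 - p_pos                            # sample P(Y < 0)
    m_pos = sum(v * v * d for v, d in zip(y, pos)) / n
    m_neg = gamma_hat - m_pos
    cov_pos = m_pos - gamma_hat * p_pos            # sample Cov(Y^2, I{Y >= 0})
    cov_neg = m_neg - gamma_hat * p_neg            # sample Cov(Y^2, I{Y < 0})
    a_pos, a_neg = alpha_hat
    return (gamma_hat * (1.0 - (p_pos * a_pos + p_neg * a_neg))
            - (cov_pos * a_pos + cov_neg * a_neg))
```

As a hand-checkable case, `gjr_omega_hat([1.0, -2.0, 3.0, -1.0], (0.1, 0.3))` gives 3.125 up to floating-point rounding: $\hat{\gamma} = 3.75$, $\hat{P} = (0.5, 0.5)'$, and $\hat{\Gamma} = (0.625, -0.625)'$.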

2.3. The ARCH(p) Case

Consider finally the model of

$$Y_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega_0 + \sum_{i=1}^{p} \alpha_{i,0} Y_{t-i}^2, \qquad 1 \leq p < \infty. \tag{32}$$

ASSUMPTION A7: $c_3 \sum_{i=1}^{p}\sum_{j=1}^{p} \alpha_{i,0}\,\alpha_{j,0}^{1/2} < 1$.

A7 is the generalization of A4 to ARCH(p) processes and, as such, is sufficient for $E\left(Y_t^3\right) < \infty$ (see Lemma 8 in the Supplemental Appendix).

ASSUMPTION A8: Define $\rho_p(\epsilon_t)$ as the largest root of $1 - \sum_{i=1}^{p} \lambda^i \alpha_{i,0}\,\epsilon_t^2$.

$$E\left(\rho_p(\epsilon_t)^{2s}\right) < 1$$

for $s = 2, 3, 4$.

Suppose $j = 2s$ in A1. Then A8 establishes $E\left(Y_t^{2s}\right) < \infty$ (see Carrasco and Chen, 2002, Proposition 13).

From Basrak, Davis, and Mikosch (2002), (32) can be recast in terms of the following SRE:

$$\tilde{Y}_t = A_t \tilde{Y}_{t-1} + B_t, \tag{33}$$

where

$$\tilde{Y}_t = \left(\sigma_t^2,\; Y_{t-1}^2,\; Y_{t-2}^2,\; \ldots,\; Y_{t-p+1}^2\right)',$$

$$A_t = \begin{pmatrix}
\alpha_{1,0}\,\epsilon_{t-1}^2 & \alpha_{2,0} & \alpha_{3,0} & \cdots & \alpha_{p,0} \\
\epsilon_{t-1}^2 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix},$$

$$B_t = \left(\omega_0,\; 0,\; 0,\; \ldots,\; 0\right)'.$$

Given A7, (33), Basrak, Davis, and Mikosch (2002, Theorem 3.1(A)), and Mikosch (1999, Remark 1.4.39), $\{Y_t\}$ has a strictly stationary solution. Given Basrak, Davis, and Mikosch (2002, Theorem

3.1(B)), $\{\tilde{Y}_t\}$ is RV($\tilde{\kappa}_0$), and given Basrak, Davis, and Mikosch (2002, Corollary 3.5(B)), $\{Y_t\}$ is RV($\kappa_0$), where $\kappa_0 = 2\tilde{\kappa}_0$.

Given the definition of $X_t$ used in Sections 2.1 and 2.2, let

$$\mathbf{X}_{t-1} = \left(X_{t-1},\; \ldots,\; X_{t-p}\right)'. \tag{34}$$

Then the generalization of (7) is

$$X_t = \mathbf{X}_{t-1}'\alpha_0 + W_t, \tag{35}$$

where $\alpha_0 = \left(\alpha_{1,0},\; \ldots,\; \alpha_{p,0}\right)'$. Consider

$$Z_{t-1} = \left(Y_{t-1},\; \ldots,\; Y_{t-h}\right)', \qquad p \leq h < \infty, \tag{36}$$

as a vector of instruments for $\mathbf{X}_{t-1}$. Given A3, $Z_{t-1}$ identifies $\alpha_0$ in (35) (see Lemma 9 in the Supplemental Appendix). Consider then the estimator

$$\hat{\alpha}_{IV} = \hat{F}\left(n^{-1}\sum_t \hat{X}_t Z_{t-1}\right), \tag{37}$$

where $\hat{F}$ is defined as in (29), but with $Z_{t-1}$ in (36) everywhere replacing $\hat{Z}_{t-1}$, and $\hat{\mathbf{X}}_{t-1}$ defined as the finite-sample version of (34).

THEOREM 3. Consider the estimator in (37) for the model in (35). Let Assumptions A1–A5 and A7 hold. Then,

$$\hat{\alpha}_{IV} \xrightarrow{a.s.} \alpha_0.$$

In addition,

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} F_0 V_{p,h} \tag{38}$$

if $\kappa_0 \in (3, 6)$, where the vector $V_{p,h} = \left(V_{p,1},\; \ldots,\; V_{p,h}\right)'$ is jointly $(\kappa_0/3)$-stable, with components $\left(V_{p,m}\right)_{m=1,\ldots,h}$ defined in Lemma 12 of the Supplemental Appendix. Alternatively, if Assumption A8 with $s = 3$ holds so that $E\left(Y_t^6\right) < \infty$ and $\kappa_0 \in (6, \infty)$, then (31) results, with $F_0$ being the population limit of $\hat{F}$ in (37) and $Z_{t-1}$ being defined in (36).

Under Theorem 3, (38) reduces to (13) when $p = 1$. As a consequence, A3 is necessary for

establishing the large sample properties of (37) (see Lemma 9 in the Supplemental Appendix). That is, in the absence of skewness, (37) neither is identified nor does it possess a stable limiting distribution. The CLT underlying (38) is Basrak, Davis, and Mikosch (2002, Theorem 2.10), which generalizes Lemma 4 in the Supplemental Appendix.$^{9}$ Application of Basrak et al. (2002, Theorem 2.10) requires $\left\{\left(Y_t, \sigma_t\right)\right\}$ to be regularly varying, which, in turn, is established by Basrak et al. (2002, Corollary 3.5(B)).$^{10}$ Given (18), normalization of the left-hand side of (38) enables the application of subsampling (see Vaynman and Beare, 2014, Theorem 6) or bootstrapping (see Hall and Yao, 2003, Corollary to Theorem 3.1) techniques to $\sqrt{n}\left(\hat{\alpha}_{IV} - \alpha_0\right)/\tau_n$ for the purpose of determining confidence intervals for $\hat{\alpha}_{IV}$. Lastly, A8 with $s = 3$ is the ARCH(p) analog to $E\left(A^3\right) < 1$ that is used to establish the ARCH(1) and GJR ARCH(1) estimators as asymptotically normal. A8 with $s = 2$ and $s = 4$ is used in the Supplemental Appendix to establish the large sample properties of the OLS estimator applied to (35).

The distributional limit in (38) generally differs from the special case presented in (13) in that the former is derived, in part, from (normalized) sums of $\{\sigma_t\}$ (see Lemmas 10 and 12 in the Supplemental Appendix), while the latter is derived only from (normalized) sums of $\{Y_t\}$ (see Lemma 5, also in the Supplemental Appendix). In other words, the distributional limit in (13) depends only on functionals of the observable sequence $\{Y_t\}$, while the distributional limit in (38) depends both on functionals of $\{Y_t\}$ and on functionals of the latent sequence $\{\sigma_t\}$. The complexities that arise in the cross-order covariances generated by (32) when $p > 1$ (see, e.g., Guo and Phillips, 2001, Lemma 1) necessitate this differential approach. The limit in (38), nonetheless, reduces to the limit in (13) when $p = 1$ and establishes both a stable limit and rate of convergence for (38), generally, under a method of proof that is comparable to Basrak, Davis, and Mikosch (2002, Theorem 3.6).

The differential approach in establishing (38) versus (13) is an example of the diminished ability to easily verify the large sample properties of general ARCH(p) versus ARCH(1) processes and (by extension) estimators that apply to each. That A4 is sufficient for establishing $\{Y_t\}$ as strictly stationary in the ARCH(1) case, while a strictly negative Lyapunov exponent for the sequence $\{A_t\}$ in (33) is necessary for establishing the same result in the ARCH(p) case (see, e.g., Basrak, Davis, and Mikosch, 2002, Theorem 2.1) is another example.

$^{9}$ Lemma 4 establishes the CLT underlying Theorems 1 and 2, respectively.

$^{10}$ In contrast, Lemma 3 in the Supplemental Appendix establishes regular variation of $\{Y_t\}$ under Theorems 1 and 2, respectively.
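The SRE representation in (33) can be checked numerically against the direct ARCH(p) recursion in (32). The following is a small sketch, assuming the state vector holds $\sigma_t^2$ followed by the $p - 1$ most recent squared returns and that the first row of $A_t$ carries $\alpha_{1,0}\epsilon_{t-1}^2, \alpha_{2,0}, \ldots, \alpha_{p,0}$. The helper names are hypothetical, and the initial squared returns are arbitrary inputs.

```python
def companion_matrix(alphas, eps_sq_prev):
    """Build the p x p matrix A_t of (33) from the ARCH slopes and eps_{t-1}^2."""
    p = len(alphas)
    a = [[0.0] * p for _ in range(p)]
    a[0][0] = alphas[0] * eps_sq_prev
    for j in range(1, p):
        a[0][j] = alphas[j]
    if p >= 2:
        a[1][0] = eps_sq_prev       # Y_{t-1}^2 = eps_{t-1}^2 * sigma_{t-1}^2
    for k in range(2, p):
        a[k][k - 1] = 1.0           # shift older squared returns down by one
    return a


def sigma2_direct(omega, alphas, eps, y2_init):
    """sigma_1^2, ..., sigma_n^2 from (32); y2_init = (Y_0^2, Y_{-1}^2, ...)."""
    p = len(alphas)
    hist = list(y2_init)
    out = []
    for e in eps:                   # eps[0] plays the role of eps_1
        s2 = omega + sum(a * y2 for a, y2 in zip(alphas, hist))
        out.append(s2)
        hist = [s2 * e * e] + hist[:p - 1]
    return out


def sigma2_sre(omega, alphas, eps, y2_init):
    """Same sequence, obtained by iterating state_t = A_t state_{t-1} + B_t."""
    p = len(alphas)
    s2_first = omega + sum(a * y2 for a, y2 in zip(alphas, y2_init))
    state = [s2_first] + list(y2_init[:p - 1])
    out = [s2_first]
    for t in range(1, len(eps)):
        a_t = companion_matrix(alphas, eps[t - 1] ** 2)
        new_state = [sum(a_t[i][j] * state[j] for j in range(p)) for i in range(p)]
        new_state[0] += omega       # B_t = (omega_0, 0, ..., 0)'
        state = new_state
        out.append(state[0])
    return out
```

For any error sequence and any starting values, the two functions should return the same variance path up to floating-point rounding, which is what the equivalence between (32) and (33) amounts to.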

Lastly, since

$$\hat{\omega} - \omega_0 = \left(\hat{\gamma} - \gamma_0\right) - \gamma_0\,\iota'\left(\hat{\alpha} - \alpha_0\right),$$

and given Theorem 3, the large sample properties of $\hat{\omega}$ can be established analogously to results presented in Theorem 1.

3. Monte Carlo

Consider the ARCH(1) model from Section 2.1, where $\{\epsilon_t\}$ is drawn from the skewed Student's t density of Hansen (1994). This density has two parameters, $\lambda$ and $\eta$, with the former governing skewness, the latter governing the tails, and up to the $\eta$th moment being well defined. Table 1 summarizes the various $(\lambda, \eta)$ pairs considered in the simulations. Also summarized for each pair is the skewness and (tail) index of the resulting sequence $\{Y_t\}$. To provide some context for the skewness measures reported in Table 1, skewness estimates for various intraday Japanese Yen returns (measured relative to the USD) as well as S&P 500 Index and DJIA returns are summarized in Table 2. As is apparent from Table 2, high-frequency financial returns tend to display significant skewness that can be quite large in magnitude (see also Cont and Kan, 2011, Table 3, for comparably sized skewness estimates for daily, 5-year credit default swap spread returns). As a consequence, even the highest level of skewness considered in the simulations has empirical support.

In light of the discussion of A1(ii) in Section 2.1, the relatively thin-tailed case of $\eta = 8.1$ is considered only to validate the large-sample properties of $\hat{\alpha}_{IV}$ predicted by Theorem 1 and $\hat{\alpha}_{OLS}$ predicted by Proposition 1 in the Supplemental Appendix. Given Kristensen and Rahbek (2005) and the empirical findings of Hill and Renault (2012), the case where $\eta = 4.1$ is considered more realistic. Lastly, for all $(\lambda, \eta)$ pairs considered, A4 is satisfied so that $E\left(Y_t^3\right) < \infty$.

Across all simulations, $\omega_0 = 0.005$ and $\alpha_0 = 0.25$.$^{11}$ As noted in Table 1 by the tail indices, when $\eta = 8.1$, $E\left(Y_t^4\right) < \infty$. In these cases, the simulations study the TSLS, OLS and QML estimators of the ARCH(1) model. When $\eta = 4.1$, $E\left(Y_t^4\right) = \infty$; in which case, only the TSLS and QML estimators are studied. For the TSLS estimator, simulations consider $h = 100, 50, 25$, where $h$ is the longest lag included in the instrument vector. Sample sizes for the simulations are 100,000, 1,000, and 500, the first of which is considered to validate the large-sample properties of $\hat{\alpha}_{IV}$ and $\hat{\alpha}_{OLS}$, respectively. The (relatively) small sample sizes are only considered under the heavy-tailed case of $\eta = 4.1$. These cases consider the finite-sample performance of the TSLS estimator relative to the QMLE in instances (far) removed from normality that, nonetheless, remain empirically grounded. All Monte Carlo experiments are conducted across 10,000 simulation trials. Additional details on the experiments are contained in the notes to Tables 3 and 4.

$^{11}$ Each of these values reflects the median estimate from Euro, Swiss Franc, and Japanese Yen returns (all measured relative to the USD) sampled at the daily, hourly, 5-min, and 1-min frequencies obtained using the QMLE.

Table 3 summarizes the large sample results ($T = 100{,}000$). The top panel depicts the relatively thin-tailed case of $\eta = 8.1$. The bias in TSLS and OLS is small, although elevated relative to QML. In addition, OLS is more biased than TSLS, with this difference in bias widening as skewness in $\{\epsilon_t\}$ increases.$^{12}$ In a comparison of efficiency ratios (all measured against the QMLE), TSLS and OLS are both notably less efficient than QML.$^{13}$ As skewness increases, the gap in efficiency between TSLS and QML shrinks, although it remains sizable in absolute terms. The efficiency gap between OLS and QML, in contrast, widens as skewness increases. Finally, at relatively low levels of skewness, OLS appears more efficient than TSLS. At moderate to high levels of skewness, however, TSLS appears more efficient than OLS, and by fairly wide margins. Lastly, there does not appear to be much difference, either in terms of bias or in terms of dispersion, from using more lagged instruments in TSLS.

The bottom panel of Table 3 summarizes results from the heavy-tailed case where $\eta = 4.1$. In this case, OLS is not consistent, explaining its exclusion from consideration. TSLS is more biased in this case than in the case where $\eta = 8.1$.$^{14}$ Interestingly, though, the efficiency gap between TSLS and QML is smaller in this case than in the case where $\eta = 8.1$. As is true in the top panel of Table 3, this efficiency gap shrinks as skewness increases. In addition, there continue to be only modest differences in terms of bias and dispersion between TSLS with instrument vectors based on longer lag lengths.
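A scaled-down version of this kind of experiment can be sketched in a few lines. The block below is illustrative only and departs from the paper's design in ways that should be read as assumptions: centred and rescaled Gamma(4) draws stand in for Hansen's skewed Student's t errors, the instrument vector uses a single lag ($h = 1$) so that the IV estimator reduces to a ratio of two sample cross moments, and the intercept is recovered from $\hat{\omega} = \hat{\gamma}(1 - \hat{\alpha}_{IV})$. The function names are hypothetical.

```python
import random


def simulate_arch1(n, omega, alpha, rng, burn=500):
    """ARCH(1) path with right-skewed errors that have mean 0 and variance 1."""
    y_prev, path = 0.0, []
    for t in range(n + burn):
        # Gamma(shape=4, scale=1) has mean 4 and variance 4; centre and rescale.
        eps = (rng.gammavariate(4.0, 1.0) - 4.0) / 2.0
        sigma2 = omega + alpha * y_prev * y_prev
        y_prev = sigma2 ** 0.5 * eps
        if t >= burn:  # discard the burn-in draws
            path.append(y_prev)
    return path


def arch1_iv_h1(y):
    """IV estimate of (alpha, omega) using Y_{t-1} as the only instrument."""
    n = len(y)
    gamma_hat = sum(v * v for v in y) / n
    x = [v * v - gamma_hat for v in y]            # X_t = Y_t^2 - gamma_hat
    num = sum(x[t] * y[t - 1] for t in range(1, n))
    den = sum(x[t - 1] * y[t - 1] for t in range(1, n))
    alpha_hat = num / den
    return alpha_hat, gamma_hat * (1.0 - alpha_hat)


def monte_carlo(reps, n, omega, alpha, seed=0):
    """Mean and standard deviation of alpha_hat across independent trials."""
    rng = random.Random(seed)
    draws = [arch1_iv_h1(simulate_arch1(n, omega, alpha, rng))[0]
             for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / reps
    return mean, var ** 0.5
```

Calling, for example, `monte_carlo(200, 1000, 0.005, 0.25)` gives a rough picture of small-sample bias and dispersion for this simplified estimator. The output is not comparable to Tables 3 and 4, which rest on a different error distribution, longer instrument vectors, and 10,000 trials.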
$^{12}$ As skewness increases, the tail index decreases, thus causing the rate of convergence in $\hat{\alpha}_{OLS}$ to also slow. Note, as well, that the convergence rate of $\hat{\alpha}_{IV}$ should be faster than that of $\hat{\alpha}_{OLS}$.

$^{13}$ This finding, perhaps, is not too surprising given the relative rates of convergence of the three estimators and the differences in distributions to which each estimator converges.

$^{14}$ This relative increase in bias is explained by the decrease in tail indices across the different levels of skewness considered (see Table 1). With each of these tail indices near 3, the rate of convergence in $\hat{\alpha}_{IV}$ is anticipated to be rather slow overall, and slower than in the case where $\eta = 8.1$.

Table 4 summarizes the small sample results ($T = 1{,}000$ and $T = 500$). Relative to the bottom panel of Table 3, the bias in TSLS is notably elevated, where this bias increases with the level of skewness. Interestingly, QML now also displays notable bias, where this bias, too, increases with the level of skewness. Most interestingly, the efficiency gap between TSLS and QML is now materially reduced. Moreover, in many instances, this gap is reversed, with TSLS evidencing sizable efficiency gains over QML. Specifically, for the sample size of $T = 1{,}000$, TSLS bests QML in terms of efficiency ratios at moderate and high skewness levels. For the smaller sample size of $T = 500$, TSLS bests QML in terms of efficiency ratios across all skewness levels. At the highest skewness level when $T = 500$, TSLS sizably outperforms QML. Also noteworthy, there still does not appear to be much cost in terms of sacrificed efficiency from using "many" lagged instruments.$^{15}$

Lastly, the simulation results presented in this section immediately apply to the estimator in (28). That estimator depends on the third moment of returns conditional on those returns being either greater than or equal to or less than zero. Empirically, skewness in positive and negative equity returns is large, comparable in magnitude to the skewness levels included in the simulation designs.$^{16}$

4. Conclusion

This paper proposes closed-form, TSLS estimators for a class of univariate ARCH(p) models. The instruments used in these estimators are not currently considered in the literature. The advantage of these instruments is that they allow the asymptotic theory for these estimators to follow under moment-existence criteria that are consistent with the empirical findings for many financial return series to which ARCH-type models are commonly applied. This characteristic renders the proposed TSLS estimators empirically feasible, a characteristic that is not shared by competing, closed-form estimators like OLS. Identification of these TSLS estimators links to asymmetry; either in the model's rescaled errors as in the ARCH(p) case, or in the specification of the conditional variance function itself as in a threshold ARCH(1) case.
The asymptotic theory for these estimators extends results from Davis and Mikosch (1998) and Mikosch and Stărică (2000) to cross-order covariances (defined as covariances between contemporaneous second-order returns and lagged first-order returns), which become relevant for identification in instances of return asymmetry. These TSLS estimators are also shown to outperform QML in finite samples, confirming the conjecture of Bollerslev and Wooldridge (1992) that construction of an IV estimator for ARCH-type models more efficient than QMLE is possible.

$^{15}$ There does appear to be some increase in bias that results from using more instruments; however, this cost is counterbalanced against reductions in dispersion.

$^{16}$ The GJR ARCH model, specifically, and threshold ARCH models, generally, are applied to equity returns to account for the so-called "leverage effect."

As an extension of this paper's results, consider

$$\sigma_t^2 = \omega_0 + \alpha_0 Y_{t-1}^2 + \beta_0 \sigma_{t-1}^2,$$

which is the popular GARCH(1,1) model introduced by Bollerslev (1986). For this model, the analog to (7) is

$$X_t = \phi_0 X_{t-1} - \beta_0 W_{t-1} + W_t, \qquad \phi_0 = \alpha_0 + \beta_0.$$

Following from results in Section 2.1, $Z_{t-2} = \left(Y_{t-2}, \ldots, Y_{t-h}\right)'$ is a valid set of instruments for $X_{t-1}$ when $\{Y_t\}$ is skewed and, thus, identifies $\phi_0$. From Prono (2014), skewness in $\{Y_t\}$ can be used to separately identify $\alpha_0$ and $\beta_0$ conditional on $\phi_0$. An interesting investigation, therefore, is whether the closed-form TSLS estimators introduced in this paper can be extended to the empirically better performing GARCH(p,q) class of models. This investigation is the subject of ongoing research.

Appendix (Proofs of the Theorems)

PROOF OF THEOREM 1. Note that

$$\hat{X}_t = X_t - \left(\hat{\gamma} - \gamma_0\right), \tag{39}$$

and

$$\hat{X}_t = \hat{c} + \alpha_0 \hat{X}_{t-1} + W_t, \tag{40}$$

where $\hat{c} = \left(\alpha_0 - 1\right)\left(\hat{\gamma} - \gamma_0\right)$. Then given (40),

$$\hat{\alpha}_{IV} = \alpha_0 + \hat{c}\left(\frac{\left(n^{-1}\sum_t \hat{X}_{t-1} Z_{t-1}\right)'\hat{\Lambda}}{\left(n^{-1}\sum_t \hat{X}_{t-1} Z_{t-1}\right)'\hat{\Lambda}\left(n^{-1}\sum_t \hat{X}_{t-1} Z_{t-1}\right)}\right) \times \left(n^{-1}\sum_t Z_{t-1}\right) + \left(\frac{\left(n^{-1}\sum_t \hat{X}_{t-1} Z_{t-1}\right)'\hat{\Lambda}}{\left(n^{-1}\sum_t \hat{X}_{t-1} Z_{t-1}\right)'\hat{\Lambda}\left(n^{-1}\sum_t \hat{X}_{t-1} Z_{t-1}\right)}\right) \times \left(n^{-1}\sum_t W_t Z_{t-1}\right). \tag{41}$$

By Carrasco and Chen (2002, Corollary 6), $\{Y_t\}$ is strong mixing. As a consequence, given (8) and A3, $\hat{\alpha}_{IV} \xrightarrow{a.s.} \alpha_0$, and $\hat{\omega}_{IV} \xrightarrow{a.s.} \omega_0$ by the Ergodic Theorem. Next, given (39) and noting

that the population analog to $\hat{\alpha}_{IV}$ in (11) is $\alpha_0$,

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) = B_0^{-1} A_0 \left(a_n^{-3}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right)\right) + o_P(1) \xrightarrow{d} B_0^{-1} A_0 V_h,$$

where $V_h$ is jointly $(\kappa_0/3)$-stable by Lemma 5 in the Supplemental Appendix and Samorodnitsky and Taqqu (1994, Theorem 2.1.5(c)), noting that

$$a_n^{-3}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right) = a_n^{-3}\sum_t \left(Y_t^2 Z_{t-1} - E\left(Y_t^2 Z_{t-1}\right)\right) - \gamma_0\, n^{\frac{\kappa_0 - 6}{2\kappa_0}}\left(n^{-1/2}\sum_t Z_{t-1}\right) = a_n^{-3}\sum_t \left(Y_t^2 Z_{t-1} - E\left(Y_t^2 Z_{t-1}\right)\right) + o_P(1) \tag{42}$$

by Ibragimov and Linnik (1971, Theorem 18.5.3). Next, since $\hat{\omega}_{IV} = \hat{\gamma}\left(1 - \hat{\alpha}_{IV}\right)$,

$$n a_n^{-3}\left(\hat{\omega}_{IV} - \omega_0\right) = -\gamma_0\, n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) + n a_n^{-3}\left(\hat{\gamma} - \gamma_0\right) = -\gamma_0\, n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) + o_P(1), \tag{43}$$

where the second equality relies on

$$a_n^{-2}\sum_t Y_t^2 \xrightarrow{d} V_0$$

for $\kappa_0 \in (3, 4]$ by Davis and Mikosch (1998), where $V_0$ is $(\kappa_0/2)$-stable, and

$$n^{-1/2}\sum_t Y_t^2 \xrightarrow{d} N\left(0, \Sigma_{\gamma_0}\right)$$

for $\kappa_0 \in (4, 6)$ by Ibragimov and Linnik, where $\Sigma_{\gamma_0}$ is defined in Theorem 1. Finally, if

$\kappa_0 \in (6, \infty)$, then from (41),

$$\sqrt{n}\left(\hat{\alpha}_{IV} - \alpha_0\right) = B_0^{-1} A_0 \left(n^{-1/2}\sum_t W_t Z_{t-1}\right) + o_P(1) \xrightarrow{d} N\left(0,\; \frac{A_0\, E\left(W_t^2 Z_{t-1} Z_{t-1}'\right) A_0'}{B_0^2}\right),$$

and

$$\sqrt{n}\left(\hat{\omega}_{IV} - \omega_0\right) = \sqrt{n}\left(\hat{\gamma} - \gamma_0\right) - \gamma_0 \sqrt{n}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} N\left(0, \Sigma_{\omega_0}\right),$$

with $\Sigma_{\omega_0}$ also defined in Theorem 1. Both of these standard convergence results rely on Ibragimov and Linnik, with the first result also depending on the Slutsky Theorem. $\blacksquare$

PROOF OF THEOREM 2. Given (39), also note that

$$\hat{X}_{t-1} = X_{t-1} - \left(\hat{G} - G_0\right), \qquad G_0 = \left(E\left(Y_t^2 \times I_{\{Y_t \geq 0\}}\right),\; E\left(Y_t^2 \times I_{\{Y_t < 0\}}\right)\right)'$$

and

$$\hat{Z}_{t-1} = Z_{t-1} - \left(\hat{H} - H_0\right),$$

$$H_0 = \left(E\left(Y_t \times I_{\{Y_t \geq 0\}}\right),\; E\left(Y_t \times I_{\{Y_t < 0\}}\right),\; E\left(Y_t \times I_{\{Y_t \geq 0\}}\right),\; E\left(Y_t \times I_{\{Y_t < 0\}}\right),\; \ldots\right)'$$

so that, comparable to (40),

$$\hat{X}_t = \hat{c} + \hat{X}_{t-1}'\alpha_0 + W_t,$$

where $\hat{c} = \left(\hat{G} - G_0\right)'\alpha_0 - \left(\hat{\gamma} - \gamma_0\right)$. Then

$$\hat{\alpha}_{IV} - \alpha_0 = \hat{F}\left[\hat{c}\left(n^{-1}\sum_t Z_{t-1}\right) - \left(\hat{H} - H_0\right)\left(n^{-1}\sum_t W_t\right)\right] + \hat{F}\left(n^{-1}\sum_t W_t Z_{t-1}\right), \tag{44}$$

from which $\hat{\alpha}_{IV} \xrightarrow{a.s.} \alpha_0$, where identification follows from A6 and (almost sure) convergence in the sample moments follows from the Ergodic Theorem, since $\{Y_t\}$ remains strong mixing,

this time by Carrasco and Chen (2002, Corollary 10). Next, from (28),

$$\hat{\alpha}_{IV} - \alpha_0 = \hat{F}\left(n^{-1}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right)\right) - \hat{F}\left[\left(n^{-1}\sum_t Z_{t-1} + \left(\hat{H} - H_0\right)\right)\left(\hat{\gamma} - \gamma_0\right) - \left(\hat{\gamma} - \gamma_0\right)\left(\hat{H} - H_0\right)\right] + \left(\hat{F} - F_0\right) E\left(X_t Z_{t-1}\right)$$

such that

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) = F_0\left(a_n^{-3}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right)\right) + o_P(1).$$

Let $Z_{t-1}^{(1)} = Z_{t-1} + H_0$. Given the arguments that support the second equalities in both (42) and (43),

$$a_n^{-3}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right) = a_n^{-3}\sum_t \left(Y_t^2 Z_{t-1}^{(1)} - E\left(Y_t^2 Z_{t-1}^{(1)}\right)\right) - H_0\, a_n^{-3}\sum_t \left(Y_t^2 - E\left(Y_t^2\right)\right) - \gamma_0\, a_n^{-3}\sum_t Z_{t-1} = a_n^{-3}\sum_t \left(Y_t^2 Z_{t-1}^{(1)} - E\left(Y_t^2 Z_{t-1}^{(1)}\right)\right) + o_P(1)$$

such that

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} F_0 W_h^{(+,-)},$$

where $W_h^{(+,-)}$ is jointly $(\kappa_0/3)$-stable by Lemma 6 in the Supplemental Appendix and Samorodnitsky and Taqqu (1994, Theorem 2.1.5(c)). Finally, from (44),

$$\sqrt{n}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} N\left(0,\; F_0\, E\left(W_t^2 Z_{t-1} Z_{t-1}'\right) F_0'\right),$$

by Ibragimov and Linnik (1971, Theorem 18.5.3) and the Slutsky Theorem. $\blacksquare$

PROOF OF THEOREM 3. Let $\iota$ be a $p \times 1$ vector of ones. Given (34),

$$\hat{\mathbf{X}}_{t-1} = \mathbf{X}_{t-1} - \left(\hat{\gamma} - \gamma_0\right)\iota.$$

Then given (39),

$$\hat{\alpha}_{IV} - \alpha_0 = \hat{F}\left(\hat{c}\left(n^{-1}\sum_t Z_{t-1}\right) + n^{-1}\sum_t W_t Z_{t-1}\right), \tag{45}$$

where $\hat{c} = \left(\iota'\alpha_0 - 1\right)\left(\hat{\gamma} - \gamma_0\right)$. By Lemma 9 in the Supplemental Appendix, $E\left(Z_{t-1}\mathbf{X}_{t-1}'\right)$ has full column rank. By Carrasco and Chen (2002, Proposition 12), $\{Y_t\}$ remains strong mixing. Then by the Ergodic Theorem, $\hat{\alpha}_{IV} \xrightarrow{a.s.} \alpha_0$. Next, given (39),

$$\hat{\alpha}_{IV} - \alpha_0 = \hat{F}\left(n^{-1}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right)\right) - \left(\hat{\gamma} - \gamma_0\right)\hat{F}\left(n^{-1}\sum_t Z_{t-1}\right) + \left(\hat{F} - F_0\right) E\left(X_t Z_{t-1}\right)$$

so that

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) = F_0\left(a_n^{-3}\sum_t \left(Y_t^2 Z_{t-1} - E\left(Y_t^2 Z_{t-1}\right)\right)\right) + o_p(1), \tag{46}$$

since

$$a_n^{-3}\sum_t \left(X_t Z_{t-1} - E\left(X_t Z_{t-1}\right)\right) = a_n^{-3}\sum_t \left(Y_t^2 Z_{t-1} - E\left(Y_t^2 Z_{t-1}\right)\right) + o_p(1),$$

following the same argument that supports (42). Then by Lemma 12 in the Supplemental Appendix,

$$n a_n^{-3}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} F_0 V_{p,h},$$

where $V_{p,h}$ is jointly $(\kappa_0/3)$-stable by Samorodnitsky and Taqqu (1994, Theorem 2.1.5(c)). Finally, from (45),

$$\sqrt{n}\left(\hat{\alpha}_{IV} - \alpha_0\right) \xrightarrow{d} N\left(0,\; F_0\, E\left(W_t^2 Z_{t-1} Z_{t-1}'\right) F_0'\right),$$

if $\kappa_0 \in (6, \infty)$ by Ibragimov and Linnik (1971, Theorem 18.5.3) and the Slutsky Theorem.

References
[1] Andersen, T.G. & T. Bollerslev (1997) Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance 4, 115-158.
[2] Baillie, R.T. & H. Chung (1999) Estimation of GARCH models from the autocorrelations of the squares of a process. Journal of Time Series Analysis 22, 631-650.
[3] Basrak, B., R.A. Davis & T. Mikosch (2002) Regular variation of GARCH processes. Stochastic Processes and Their Applications 99, 95-115.
[4] Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307-327.
[5] Bollerslev, T., R.Y. Chou & K.F. Kroner (1992) ARCH modeling in finance. Journal of Econometrics 52, 5-59.
[6] Bollerslev, T. & J.M. Wooldridge (1992) Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews 11, 143-172.
[7] Bouchaud, J.P. & M. Potters (2003) Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management (Second Edition). Cambridge University Press.
[8] Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17-39.
[9] Cont, R. & Y.H. Kan (2011) Statistical modeling of credit default swap portfolios. Unpublished manuscript.
[10] Davis, R.A. & T. Hsing (1995) Point process and partial sum convergence for weakly dependent random variables with infinite variance. The Annals of Probability 23, 879-917.
[11] Davis, R.A. & T. Mikosch (1998) The sample autocorrelations of heavy-tailed processes with applications to ARCH. The Annals of Statistics 26, 2049-2080.
[12] Drost, F.C. & T.E. Nijman (1993) Temporal aggregation of GARCH processes. Econometrica 61, 909-927.
[13] Embrechts, P., C. Klüppelberg & T. Mikosch (1997) Modelling Extremal Events in Insurance and Finance. Springer, Berlin.
[14] Engle, R.F. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
[15] Engle, R.F. & J. Mezrich (1996) GARCH for Groups. Risk 9, 36-40.
[16] Francq, C., L. Horváth & J.M. Zakoïan (2000) Estimating weak GARCH representations. Econometric Theory 16, 692-728.
[17] Francq, C., L. Horváth & J.M. Zakoïan (2011) Merits and drawbacks of variance targeting in GARCH models. Journal of Financial Econometrics 9, 619-656.
[18] Giraitis, L. & P.M. Robinson (2001) Whittle estimation of ARCH models. Econometric Theory 17, 608-631.

[19] Glosten, L.R., R. Jagannathan & D.E. Runkle (1993) On the relation between expected value and the volatility of the nominal excess return on stocks. Journal of Finance 48, 1779-1801.
[20] Guo, B. & P.C.B. Phillips (2001) Efficient estimation of second moment parameters in ARCH models. Unpublished manuscript.
[21] Hall, P. & Q. Yao (2003) Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71, 285-317.
[22] Hansen, B.E. (1994) Autoregressive conditional density estimation. International Economic Review 35, 705-730.
[23] Harvey, C.R. & A. Siddique (1999) Autoregressive conditional skewness. Journal of Financial and Quantitative Analysis 34, 465-487.
[24] Hill, B.M. (1975) A simple general approach to inference about the tail of a distribution. Annals of Statistics 5, 1163-1174.
[25] Hill, J.B. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Theory 26, 1398-1436.
[26] Hill, J.B. & E. Renault (2012) Variance targeting for heavy tailed time series. Unpublished manuscript.
[27] Ibragimov, I.A. & Y.V. Linnik (1971) Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff: Groningen.
[28] Jacquier, E., N. Polson & P.E. Rossi (1994) Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics 12, 371-417.
[29] Jensen, S.T. & A. Rahbek (2004) Asymptotic normality of the QMLE estimator of ARCH in the nonstationary case. Econometrica 72, 641-646.
[30] Jondeau, E. & M. Rockinger (2003) Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. Journal of Economic Dynamics and Control 27, 1699-1737.
[31] Kristensen, D. & O. Linton (2006) A closed form estimator for the GARCH(1,1)-model. Econometric Theory 22, 323-327.
[32] Kristensen, D. & A. Rahbek (2005) Asymptotics of the QMLE for a class of ARCH(q) models. Econometric Theory 21, 946-961.
[33] Kuersteiner, G.M. (2002) Efficient instrumental variables estimation for autoregressive models with conditional heteroskedasticity. Econometric Theory 18, 547-583.
[34] Lewbel, A. (1997) Constructing instruments for regressors with measurement error when no additional data are available, with an application to patents and R&D. Econometrica 65, 1201-1213.
[35] Loretan, M. & P.C.B. Phillips (1994) Testing the covariance stationarity of heavy-tailed time series. Journal of Empirical Finance 1, 211-248.

[36] Mikosch, T. (1999) Regular Variation, Subexponentiality and their applications in probability theory. Lecture notes for the workshop "Heavy Tails and Queues," EURANDOM, Eindhoven, Netherlands.
[37] Mikosch, T. & D. Straumann (2002) Whittle estimation in a heavy-tailed GARCH(1,1) model. Stochastic Processes and Their Applications 100, 187-222.
[38] Mikosch, T. & C. Stărică (2000) Limit theory for the sample autocorrelations and extremes of a GARCH(1,1) process. The Annals of Statistics 28, 1427-1451.
[39] Prono, T. (2014) Simple estimators for the GARCH(1,1) model. Available at SSRN: http://ssrn.com/abstract=1511720.
[40] Resnick, S.I. (1987) Extreme Values, Regular Variation, and Point Processes. New York: Springer-Verlag.
[41] Samorodnitsky, G. & M.S. Taqqu (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Stochastic Modeling. New York: Chapman and Hall.
[42] Vaynman, I. & B.K. Beare (2014) Stable limit theory for the variance targeting estimator, in Y. Chang, T.B. Fomby & J.Y. Park (eds), Essays in Honor of Peter C.B. Phillips, vol. 33 of Advances in Econometrics: Emerald Group Publishing Limited, chapter 24, 639-672.
[43] von Bahr, B. & C.G. Esseen (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Annals of Mathematical Statistics 36, 299-303.
[44] Weiss, A.A. (1986) Asymptotic theory for ARCH models: estimation and testing. Econometric Theory 2, 107-131.

TABLE 1

λ      η    skew.  κ
-0.20  4.1  -1.27  3.50
-0.40  4.1  -2.32  3.32
-0.80  4.1  -3.48  3.14
-0.20  8.1  -0.53  4.97
-0.40  8.1  -0.98  4.62
-0.80  8.1  -1.52  4.30

Notes to Table 1. The Monte Carlo simulations consider {ε_t} drawn from the skewed Student's t density of Hansen (1994), where λ and η are the parameters governing this density, with the former determining skewness, the latter determining the tails, and moments up to the ηth being well defined. Summarized for each (λ, η) pair are the skewness and (tail) index, κ, for {Y_t}, noting that ω_0 = 0.005 and α_0 = 0.25 in {σ_t^2}. For skewness,

$$\mathrm{Skew}(Y_t) \equiv E\left[\left(\frac{Y_t}{\sigma_t}\right)^{3}\right] = E\left(\epsilon_t^{3}\right)$$

so that an analytical solution is available using results from Jondeau and Rockinger (2003). The (tail) index, κ, is obtained as the mean value across 10,000 simulation trials of the Hill (1975) estimator applied to 10,000 observations of {Y_t} using a constant threshold of 0.5%.

TABLE 2

freq.   JPY obs.  JPY skew. (s.e.)  SPX obs.  SPX skew. (s.e.)  DJIA obs.  DJIA skew. (s.e.)
1-min   174,997   -2.68 (0.01)      46,551    -1.75 (0.01)      46,557     -1.25 (0.01)
5-min   35,028    -1.94 (0.01)      9,312     -3.17 (0.03)      9,315      -2.68 (0.03)
10-min  17,523    -1.51 (0.02)
15-min  11,685    -3.10 (0.02)
20-min  8,766     -2.10 (0.03)

Notes to Table 2. The data source is Bloomberg. The date range for all return series is 7/19/2015-12/31/2015. Skew is an estimate of the (unconditionally) standardized third moment. While not equivalent to the skewness measure applied in Table 1, simulation evidence (using the skewed Student's t density) suggests these differences to be relatively minor enough not to disrupt comparisons between the general magnitudes of skewness measures summarized here and in Table 1. Standard errors for the skewness estimates are in parentheses and are measured against the null of normality.
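The notes to Table 2 do not state the formula behind the parenthesized standard errors. As a minimal sketch, assuming the usual large-sample approximation sqrt(6/n) for the standard error of sample skewness under i.i.d. normal data, the following Python computes a standardized third moment and that standard error. The function names are illustrative and do not come from the paper.

```python
import math


def sample_skewness(x):
    """Standardized third central moment of x (divisor n throughout)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def skewness_se_under_normality(n):
    """Assumed approximation: s.e. of sample skewness for i.i.d. normal data."""
    return math.sqrt(6.0 / n)


# Observation counts taken from the JPY column of Table 2.
for n in (174_997, 35_028, 17_523, 11_685, 8_766):
    print(n, round(skewness_se_under_normality(n), 2))
```

Rounded to two decimals, this approximation agrees with the standard errors reported in Table 2, although the paper may have computed them differently.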

TABLE 3

                  mean    med.            dec.                            Efficiency Ratio
λ      est.  m    bias    bias    sd      rge.    rmse    mae     mdae    rmse  mae   mdae

η = 8.1
-0.20  TSLS  100  -0.001  -0.004   0.038   0.085   0.038   0.028   0.022  5.29  4.89  4.58
-0.20  TSLS   50  -0.001  -0.004   0.039   0.085   0.039   0.028   0.022  5.33  4.93  4.68
-0.20  TSLS   25  -0.001  -0.004   0.039   0.086   0.039   0.028   0.022  5.38  4.97  4.68
-0.20  OLS     -  -0.005  -0.010   0.034   0.064   0.034   0.023   0.018  4.76  4.12  3.89
-0.20  QMLE    -   0.000   0.000   0.007   0.018   0.007   0.006   0.005  1.00  1.00  1.00
-0.40  TSLS  100  -0.002  -0.005   0.028   0.059   0.028   0.020   0.016  3.54  3.14  2.95
-0.40  TSLS   50  -0.002  -0.005   0.028   0.059   0.028   0.020   0.016  3.54  3.14  2.95
-0.40  TSLS   25  -0.002  -0.005   0.028   0.059   0.029   0.020   0.016  3.54  3.14  2.93
-0.40  OLS     -  -0.008  -0.015   0.040   0.077   0.041   0.029   0.023  5.08  4.55  4.40
-0.40  QMLE    -   0.000   0.000   0.008   0.020   0.008   0.006   0.005  1.00  1.00  1.00
-0.80  TSLS  100  -0.003  -0.008   0.028   0.056   0.028   0.020   0.016  2.90  2.54  2.42
-0.80  TSLS   50  -0.003  -0.008   0.028   0.055   0.028   0.020   0.016  2.89  2.53  2.43
-0.80  TSLS   25  -0.003  -0.008   0.028   0.056   0.028   0.020   0.016  2.88  2.52  2.41
-0.80  OLS     -  -0.015  -0.023   0.046   0.091   0.049   0.037   0.031  5.02  4.72  4.73
-0.80  QMLE    -   0.000   0.000   0.010   0.025   0.010   0.008   0.007  1.00  1.00  1.00

η = 4.1
-0.20  TSLS  100  -0.017  -0.027   0.081   0.179   0.083   0.062   0.050  4.63  4.90  4.96
-0.20  TSLS   50  -0.017  -0.027   0.082   0.181   0.083   0.063   0.049  4.66  4.92  4.87
-0.20  TSLS   25  -0.017  -0.027   0.082   0.178   0.084   0.063   0.050  4.68  4.92  4.93
-0.20  QMLE    -   0.000  -0.002   0.018   0.039   0.018   0.013   0.010  1.00  1.00  1.00
-0.40  TSLS  100  -0.021  -0.031   0.061   0.125   0.065   0.049   0.042  2.89  3.20  3.45
-0.40  TSLS   50  -0.021  -0.031   0.061   0.124   0.065   0.049   0.042  2.88  3.19  3.43
-0.40  TSLS   25  -0.021  -0.031   0.061   0.124   0.065   0.049   0.041  2.87  3.18  3.42
-0.40  QMLE    -   0.000  -0.002   0.023   0.047   0.023   0.015   0.012  1.00  1.00  1.00
-0.80  TSLS  100  -0.030  -0.040   0.055   0.113   0.063   0.051   0.047  2.12  2.52  2.93
-0.80  TSLS   50  -0.030  -0.040   0.055   0.112   0.062   0.050   0.046  2.11  2.50  2.90
-0.80  TSLS   25  -0.030  -0.039   0.055   0.111   0.062   0.050   0.046  2.11  2.50  2.89
-0.80  QMLE    -   0.000  -0.003   0.029   0.061   0.029   0.020   0.016  1.00  1.00  1.00

Notes to Table 3. The ARCH(1) model is considered with ω_0 = 0.005 and α_0 = 0.25. Simulations are conducted on samples of T = 100,000 observations across 10,000 trials. Within each simulation trial, the first 200 observations are dropped to avoid initialization effects. In the case where η = 8.1, the estimators under study are TSLS, OLS, and QMLE. When η = 4.1, only the TSLS and QMLE estimators are considered, owing to the insufficient existence of higher moments needed to render OLS consistent. For TSLS, instrument vectors of 100, 50, and 25 lags are considered (column m). Summary statistics are the mean bias and median bias, each measured relative to the true parameter value, the standard deviation, decile range (the difference between the 90th and 10th percentiles), and the root mean squared error, mean absolute error, and median absolute error, also each measured relative to the true parameter value. The Efficiency Ratio is the root mean squared error, mean absolute error, and median absolute error of the given estimator divided by the corresponding measure for the QMLE. {ε_t} is drawn from the Student's t density of Hansen (1994) for the listed (λ, η) pairs. Skewness and (tail) index estimates for {Y_t} that correspond with each (λ, η) pair are summarized in Table 1.
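The summary statistics and Efficiency Ratio described in the notes to Table 3 can be sketched as follows. This is an illustrative Python version under stated assumptions, not the paper's code: the function names `summarize` and `efficiency_ratio` are hypothetical, and the percentile convention (the default of `statistics.quantiles`) is an assumption, since the notes do not say which one was used.

```python
import math
import statistics


def summarize(estimates, true_value):
    """Summary statistics of Monte Carlo estimates around a known true value."""
    errors = [e - true_value for e in estimates]
    abs_errors = [abs(e) for e in errors]
    # Nine cut points: deciles[0] is the 10th percentile, deciles[-1] the 90th.
    deciles = statistics.quantiles(estimates, n=10)
    return {
        "mean_bias": statistics.fmean(errors),
        "med_bias": statistics.median(errors),
        "sd": statistics.stdev(estimates),
        "dec_rge": deciles[-1] - deciles[0],
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
        "mae": statistics.fmean(abs_errors),
        "mdae": statistics.median(abs_errors),
    }


def efficiency_ratio(stats, baseline):
    """Error measures of one estimator divided by those of the baseline (QMLE)."""
    return {k: stats[k] / baseline[k] for k in ("rmse", "mae", "mdae")}


# Toy usage with made-up estimates of a parameter whose true value is 0.25.
candidate = summarize([0.20, 0.30, 0.25, 0.22, 0.28], 0.25)
baseline = summarize([0.24, 0.26, 0.25, 0.245, 0.255], 0.25)
print(efficiency_ratio(candidate, baseline))
```

A ratio above one indicates a larger error than the baseline, and a ratio below one indicates a smaller error. Ratios recomputed from the rounded table entries need not match the reported ratios exactly.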

TABLE 4

                  mean    med.            dec.                            Efficiency Ratio
λ      est.  m    bias    bias    sd      rge.    rmse    mae     mdae    rmse  mae   mdae

T = 1,000
-0.20  TSLS  100  -0.057  -0.076   0.122   0.310   0.135   0.113   0.107  1.12  1.25  1.45
-0.20  TSLS   50  -0.046  -0.063   0.130   0.335   0.138   0.115   0.107  1.15  1.27  1.45
-0.20  TSLS   25  -0.034  -0.054   0.140   0.363   0.144   0.119   0.110  1.20  1.32  1.49
-0.20  QMLE    -  -0.004  -0.024   0.120   0.279   0.120   0.091   0.074  1.00  1.00  1.00
-0.40  TSLS  100  -0.064  -0.083   0.114   0.288   0.130   0.110   0.104  0.95  1.08  1.26
-0.40  TSLS   50  -0.059  -0.077   0.116   0.293   0.131   0.110   0.103  0.95  1.07  1.24
-0.40  TSLS   25  -0.056  -0.075   0.119   0.299   0.132   0.110   0.103  0.96  1.08  1.24
-0.40  QMLE    -  -0.004  -0.031   0.137   0.315   0.137   0.103   0.083  1.00  1.00  1.00
-0.80  TSLS  100  -0.078  -0.095   0.100   0.246   0.127   0.109   0.106  0.78  0.89  1.04
-0.80  TSLS   50  -0.076  -0.093   0.101   0.250   0.127   0.108   0.106  0.78  0.88  1.03
-0.80  TSLS   25  -0.075  -0.091   0.102   0.251   0.126   0.108   0.104  0.78  0.88  1.01
-0.80  QMLE    -  -0.005  -0.045   0.162   0.375   0.162   0.123   0.103  1.00  1.00  1.00

T = 500
-0.20  TSLS  100  -0.065  -0.085   0.122   0.310   0.138   0.118   0.112  0.93  1.03  1.18
-0.20  TSLS   50  -0.049  -0.070   0.133   0.342   0.142   0.119   0.112  0.95  1.04  1.17
-0.20  TSLS   25  -0.035  -0.059   0.146   0.383   0.150   0.126   0.119  1.01  1.10  1.24
-0.20  QMLE    -  -0.007  -0.035   0.149   0.352   0.149   0.114   0.096  1.00  1.00  1.00
-0.40  TSLS  100  -0.072  -0.093   0.119   0.299   0.139   0.119   0.115  0.84  0.94  1.07
-0.40  TSLS   50  -0.062  -0.083   0.124   0.316   0.139   0.118   0.112  0.84  0.93  1.04
-0.40  TSLS   25  -0.055  -0.078   0.130   0.331   0.141   0.119   0.113  0.86  0.94  1.05
-0.40  QMLE    -  -0.007  -0.044   0.164   0.389   0.164   0.127   0.108  1.00  1.00  1.00
-0.80  TSLS  100  -0.087  -0.107   0.106   0.267   0.138   0.120   0.118  0.72  0.80  0.90
-0.80  TSLS   50  -0.083  -0.101   0.109   0.275   0.137   0.118   0.115  0.71  0.79  0.88
-0.80  TSLS   25  -0.080  -0.100   0.111   0.282   0.137   0.118   0.116  0.72  0.79  0.88
-0.80  QMLE    -  -0.010  -0.064   0.192   0.453   0.192   0.150   0.132  1.00  1.00  1.00

Notes to Table 4. The ARCH(1) model is considered with ω_0 = 0.005 and α_0 = 0.25. Simulations are conducted on samples of either T = 1,000 or T = 500 observations across 10,000 trials. Within each simulation trial, the first 200 observations are dropped to avoid initialization effects. In both panels, η = 4.1, so only the TSLS and QMLE estimators are considered, owing to the insufficient existence of higher moments needed to render OLS consistent. For TSLS, instrument vectors of 100, 50, and 25 lags are considered (column m). Summary statistics are the mean bias and median bias, each measured relative to the true parameter value, the standard deviation, decile range (the difference between the 90th and 10th percentiles), and the root mean squared error, mean absolute error, and median absolute error, also each measured relative to the true parameter value. The Efficiency Ratio is the root mean squared error, mean absolute error, and median absolute error of the given estimator divided by the corresponding measure for the QMLE. {ε_t} is drawn from the Student's t density of Hansen (1994) for the listed (λ, η) pairs. Skewness and (tail) index estimates for {Y_t} that correspond with each (λ, η) pair are summarized in Table 1.
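The simulation design described in the notes to Table 4 can be sketched in Python as below, assuming the standard ARCH(1) recursion Y_t = σ_t ε_t with σ_t^2 = ω_0 + α_0 Y_{t-1}^2. Several choices are assumptions made for illustration only: standard normal draws stand in for Hansen's (1994) skewed Student's t innovations (a different sampler can be passed via the hypothetical `draw` argument), the recursion starts from the unconditional variance, and the 200 dropped observations are generated in addition to the T retained ones.

```python
import math
import random


def simulate_arch1(T, omega=0.005, alpha=0.25, burn_in=200, draw=None, seed=None):
    """Simulate T observations of an ARCH(1) process after discarding a burn-in."""
    rng = random.Random(seed)
    if draw is None:
        # Placeholder innovations: standard normal, NOT the paper's skewed t.
        draw = lambda: rng.gauss(0.0, 1.0)
    y_prev_sq = omega / (1.0 - alpha)  # assumed start: unconditional variance
    path = []
    for _ in range(T + burn_in):
        sigma2 = omega + alpha * y_prev_sq  # conditional variance
        y = math.sqrt(sigma2) * draw()
        path.append(y)
        y_prev_sq = y * y
    return path[burn_in:]  # drop the first burn_in observations


# One trial at the smaller sample size considered in Table 4.
sample = simulate_arch1(500, seed=42)
print(len(sample))
```

Repeating such a trial many times and applying an estimator to each sample would produce the kind of estimate distribution summarized in the table.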

FIGURE 1
Hill Plots for Select FX (Absolute) 20-Min Log-Returns
Date Range: Jan 1, 2015 to May 31, 2015
[Figure with three panels: Japanese Yen, Euro, and Swiss Franc. Horizontal axis in each panel: number of tail observations (24 to 401). Vertical axis: Hill tail index estimate.]

Notes to Figure 1: This figure depicts Hill (1975) tail index estimates for Japanese Yen, Euro, and Swiss Franc exchange rates (all measured against the US Dollar) at decreasing thresholds. The salient features of this figure are summarized in Section 1.2 of the paper. All data are sourced from Bloomberg.
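The Hill (1975) estimates plotted in Figure 1 can be illustrated with a short Python sketch. It uses the common textbook form of the estimator, applied to absolute values with the (k+1)th largest observation as the threshold; whether the paper uses exactly this convention is an assumption. The function names and the synthetic Pareto sample are illustrative only.

```python
import math
import random


def hill_tail_index(data, k):
    """Hill estimate of the tail index from the k largest absolute values."""
    x = sorted((abs(v) for v in data), reverse=True)
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    threshold = x[k]  # the (k+1)th largest absolute value
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    gamma = sum(math.log(x[i] / threshold) for i in range(k)) / k
    if gamma == 0:
        raise ValueError("the k largest values all equal the threshold")
    return 1.0 / gamma


def hill_plot_points(data, ks):
    """(k, estimate) pairs: the raw material of a Hill plot."""
    return [(k, hill_tail_index(data, k)) for k in ks]


# Synthetic check on Pareto draws with a known tail index of 3.
rng = random.Random(0)
pareto_sample = [rng.paretovariate(3.0) for _ in range(20_000)]
# The values of k below mirror the first few axis ticks of Figure 1.
for k, est in hill_plot_points(pareto_sample, (24, 55, 87, 118)):
    print(k, round(est, 2))
```

Increasing k lowers the threshold, which is what the figure notes call decreasing thresholds. Read literally, the constant threshold of 0.5% in the notes to Table 1 would correspond to k = 50 for 10,000 observations, though the paper does not spell this out.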

Cite this document

APA

Todd Prono (2017). Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance (FEDS 2016-083). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2016-083

BibTeX

@techreport{wtfs_feds_2016_083,
  author = {Todd Prono},
  title = {Closed-Form Estimation of Finite-Order ARCH Models: Asymptotic Theory and Finite-Sample Performance},
  type = {Finance and Economics Discussion Series},
  number = {2016-083},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2017},
  url = {https://whenthefedspeaks.com/doc/feds_2016-083},
  abstract = {Covariances between contemporaneous squared values and lagged levels form the basis for closed-form instrumental variables estimators of ARCH processes. These simple estimators rely on asymmetry for identification (either in the model's rescaled errors or the conditional variance function) and apply to threshold ARCH(1) and ARCH(p) with $p < \infty$ processes. Limit theory for these estimators is established in the case where the ARCH processes are regularly varying with a well-defined third and sixth moment of the raw returns and rescaled errors, respectively. The resulting limits are highly non-normal in empirically relevant cases, with slow rates of convergence relative to the thin-tailed $\sqrt{n}$-case. Nevertheless, Monte Carlo studies of a heavy-tailed ARCH(1) process show the simple IV estimator to outperform standard QMLE in (relatively) small samples when the data are (heavily) skewed. Methods for determining confidence intervals for the ARCH estimates are also discussed.},
}