ifdp · October 31, 2012

Nonparametric HAC Estimation for Time Series Data with Missing Observations

Abstract

The Newey and West (1987) estimator has become the standard way to estimate a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix, but it does not immediately apply to time series with missing observations. We demonstrate that the intuitive approach to estimate the true spectrum of the underlying process using only the observed data leads to incorrect inference. Instead, we propose two simple consistent HAC estimators for time series with missing data. First, we develop the Amplitude Modulated estimator by applying the Newey-West estimator and treating the missing observations as non-serially correlated. Secondly, we develop the Equal Spacing estimator by applying the Newey-West estimator to the series formed by treating the data as equally spaced. We show asymptotic consistency of both estimators for inference purposes and discuss finite sample variance and bias tradeoff. In Monte Carlo simulations, we demonstrate that the Equal Spacing estimator is preferred in most cases due to its lower bias, while the Amplitude Modulated estimator is preferred for small sample size and low autocorrelation due to its lower variance.

Board of Governors of the Federal Reserve System International Finance Discussion Papers Number 1060 November 2012 Nonparametric HAC Estimation for Time Series Data with Missing Observations Deepa Dhume Datta and Wenxin Du NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at www.ssrn.com.

Nonparametric HAC Estimation for Time Series Data with Missing Observations∗ Deepa Dhume Datta† Wenxin Du‡ November 5, 2012 Abstract The Newey and West (1987) estimator has become the standard way to estimate a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix, but it does not immediately apply to time series with missing observations. We demonstrate that the intuitive approach to estimate the true spectrum of the underlying process using only the observed data leads to incorrect inference. Instead, we propose two simple consistent HAC estimators for time series with missing data. First, we develop the Amplitude Modulated estimator by applying the Newey-West estimator and treating the missing observations as non-serially correlated. Secondly, we develop the Equal Spacing estimator by applying the Newey-West estimator to the series formed by treating the data as equally spaced. We show asymptotic consistency of both estimators for inference purposes and discuss finite sample variance and bias tradeoff. In Monte Carlo simulations, we demonstrate that the Equal Spacing estimator is preferred in most cases due to its lower bias, while the Amplitude Modulated estimator is preferred for small sample size and low autocorrelation due to its lower variance. Keywords: heteroskedasticity, serial correlation, robust inference, missing data JEL Classifications: C22, C12, C14 ∗We would like to thank James Stock and Rustam Ibragimov for their invaluable guidance. We would alsoliketothinkthreeanonymousreferees,aswellasJohnCampbell,GaryChamberlain,DobrislavDobrev, Neil Ericsson, Jose Luis Montiel, Ulrich Müller, John Rogers, Clara Vega, Jonathan Wright, and seminar participants at the Federal Reserve Board and Harvard University for helpful comments and discussions. Carissa Tudor provided excellent research assistance. The views in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any other person associated with the Federal Reserve System. †Division of International Finance, Board of Governors of the Federal Reserve System, Washington, DC, USA. Email: deepa.d.datta@frb.gov ‡Department of Economics, Harvard University, Cambridge, MA, USA. Email: wdu@fas.harvard.edu

1 Introduction While use of the Newey and West (1987) estimator and its Andrews (1991) implementation have become standard practice for heteroskedasticity and autocorrelation (HAC) robust inference, analogous methods for series with missing observations are far from standardized. Whendataaremissing, theNewey-Westformulasdonotimmediatelyapply, andtheformula for calculating the lagged autocorrelations that are required terms in conventional HAC formulas must be adjusted. Current practices for working with missing data include treating the missing observations as non-serially correlated, or imputing or ignoring the missing observations. To our knowledge, there has not been formal justification of HAC estimation for robust inference in these contexts, and the effect of employing these work-around methods on the resulting inferences is generally unexplored in applied work. In this paper, we provide formal justification for two methods of HAC estimation, and we compare these two methods to other existing methods. We demonstrate that treating the data as equally spaced is preferred under very general conditions, and that treating the missing observations as non-serially correlated may be preferred in instances with small sample sizes or low autocorrelation. In general, we find that our two newly formalized methods are preferred to imputation. Especially when our aim is to adjust inferences for serial correlation, it seems counterintuitive that we can either treat the data as equally spaced or treat the missing observations as non-serially correlated, since these procedures require us to depart from the original time structure or autocorrelation structure of the data. However, we show that these procedures both provide consistent estimators of the long-run variance of the observed series with missing data. Though many have suggested that we infer the spectrum of the underlying data from the observed data using the Parzen estimator, we show that the Parzen estimator is not the correct object for inference testing. Rather, we show that our Amplitude Modulated estimator (which treats the missing as non-serially correlated) and Equal Spacing estimator 1

(which treats the observed as equally spaced) are extremely simple to implement and can be used to generate asymptotically valid inferences. These insights are particularly valuable given the ad hoc approaches widely found in the applied literature. For example, researchers often use imputation procedures to fill in the missing data. Imputation seems to expand the set of observations used for analysis, or at least prevents us from dropping data when some covariates are unobserved. However, because imputed data series are often smoothed versions of the underlying unobserved series, they will often lead to asymptotically valid but extremely poor finite sample performance. Furthermore, researchers using these methods rarely adjust their inferences for this induced serial correlation. Additional evidence of the confusion in the literature stems from the implementation of the Newey-West HAC estimator in the popular statistical package, Stata. The newey2 command implements Newey-West to obtain the standard error of the coefficient in a regression using time series or panel data. When observations are missing, the option “force” can be applied. This option will apply our Equal Spacing estimator to time series data and apply our Amplitude Modulated estimator to panel data. However, it should be possible to implement either the Equal Spacing estimator or the Amplitude Modulated estimator for time series data. Furthermore, the program will not apply the Amplitude Modulated estimator at all when some lags are unobserved. This condition is likely an artifact of the literature on estimating the long-run variance of the underlying series, which develops estimators that generally require all lags to be observed (Clinger and Van Ness, 1976; Parzen, 1963). Yet for robust inference, we show that the Amplitude Modulated and Equal Spacing estimators do not require all the lags to be observed. A primary goal of this paper is to help researchers select the correct estimation procedure in applied work. To that end, we follow the style of Petersen (2009), which provides guidance for selecting standard errors in finance panel data sets. We formally present the Amplitude Modulated and Equal Spacing estimators of the long-run variance of the observed series, 2

and we review their asymptotic properties. We contrast these estimators with the Parzen estimator of the long-run variance of the underlying series. After presenting these theoretical contributions, we offer intuition for why the estimators work and how they are related to each other. To generate guidance for choosing the correct estimator, we conduct Monte Carlo simulation results for various sample sizes, correlation structures, and fractions of observations that are missing. In addition to testing our estimators using randomly missing data, we demonstrate the applicability of these estimators to a deterministic cyclical missing structure, aswithdailyfinancialdata, whichusuallycover5of7daysoftheweek. Finally, we discuss an empirical application using recursive regressions for commodities futures returns to demonstrate how the choice of estimator can affect the conclusion of empirical tests. As a preview, our results demonstrate that the Amplitude Modulated and Equal Spacing estimators are both consistent under random and deterministic missing structures. In finite samples, we find that for the same fixed bandwidth, the Equal Spacing estimator is generally less biased than the Amplitude Modulated estimator, but has larger variances. Consequently, the Equal Spacing estimator is preferred when autocorrelation is high, as the bias will dominate the mean squared error in these cases. Conversely, when autocorrelation is low, variance dominates the mean squared error, and the Amplitude Modulated estimator is preferred. The precise cutoff between these cases depends on the sample size, and whether automatic bandwidth selection procedures are implemented.1 The remainder of the paper is structured as follows. Section 1.1 discusses some examples ofmissingdataproblemsforwhichimputationandequalspacingmethodshavebeenapplied, and provides a brief review of some of the related econometrics literature. Section 2 provides anoverviewofourestimatorsbyapplyingtheminasimplesettingwithmissingobservations. Section 3 formally defines the estimators and discusses their asymptotic and finite sample 1To make it easier for researchers to apply these estimators, we have posted Matlab code for both estimators on our websites. We also have posted a basic simulation code that reports empirical rejection rates, size-adjusted power, bias, and variance for the Equal Spacing, Amplitude Modulated, Impuation, and (full sample) Newey-West estimators. Researchers can use the simulation code to evaluate the performance oftheestimatorsundercustomizedsamplesize,autocorrelation,andmissingstructurebeforechoosingwhich estimator to implement. 3

properties. Section 4 presents the application of these estimators to inference in a regression setting with missing observations. Section 5 describes the Monte Carlo simulations based on theseestimatorsandtheresults. Section6presentsanempiricalapplicationoftheestimators using data on commodities returns. Finally, Section 7 concludes. 1.1 Relation to the Literature This paper extends the HAC covariance literature to applications with missing observations. In general, the most commonly used HAC covariance matrix is the one proposed by Newey and West (1987) and further developed by Andrews (1991). The Newey-West estimator equals a weighted sum of lagged autocovariance matrices, in which the weights are calculated using the Bartlett kernel. Newey and West (1987) show that this estimator is positive semidefinite and heteroskedasticity and autocorrelation consistent. Andrews (1991) and Newey and West (1994) investigate the finite sample properties of these estimators and propose data-dependent bandwidths. Though some papers have proposed variants of the estimators discussed in these seminal papers, the estimators applied in most of the current literature remain largely unchanged from their original form. There have been earlier attempts to estimate HAC covariance matrices when some observations are missing. The Parzen (1963) paper on spectral analysis for data series with missing observations focuses on estimating the autocovariances of the underlying process in the presence of missing observations, based on a specific cyclical structure of missing data. We contribute to this literature by pointing out that for robust inference, we generally require an estimate of the long-run variance of the observed series rather than the underlying. We differentiate between the Amplitude Modulated estimator and the Parzen estimator. Following Parzen (1963), both of these estimators involve recasting the observed series as an amplitude modulated series in which the value of the underlying series is set to zero when observations are missing. The observed time structure of the data is respected, and the lagged autocovariances are estimated using only the lagged pairs which are fully 4

observed. We show that while the Amplitude Modulated estimator is a consistent estimator of the long-run variance of the observed series, the Parzen estimator is a consistent estimator of the long-run variance of the underlying series. Along with these theoretical results, we provide simulation results that demonstrate consistency of the t-statistic constructed using the Amplitude Modulated estimator. We also argue to extend the set of missing data structures to which these estimators can be applied. Other researchers have attempted to apply Parzen’s work to a variety of missing data structures, including the Bernoulli structure of randomly missing variables and more general cyclical patterns of missing observations (Scheinok, 1965; Bloomfield, 1970; Clinger and Van Ness, 1976; Dunsmuir and Robinson, 1981a,b). While the literature has moved beyond Parzen’s original application, it still is focused on applications with randomly missingobservations. Yet,manycommonapplicationsofmissingdatatechniquesarefordata that have a deterministic missing structure. Our theoretical results and simulation exercise demonstrate that as long the missing data structure satisfies our independence assumption, we can apply the Amplitude Modulated and Equal Spacing estimators to settings in which the pattern of missing observations is deterministic instead of random. This extends the set of possible applications to include business daily data, for example, in which weekends could be considered missing data. More recently, Kim and Sun (2011) construct a HAC estimator for the two-dimensional case robust to spatial heteroskedasticity and autocorrelation. The focus of the paper is not on missing data, and they do not distinguish the difference between the spatial spectrum of the underlying versus the observed process. However, they discuss the applicability of their method to irregularly observed spatial data. Reducing the spatial HAC on the irregular lattice to one-dimensional time series produces an estimator very similar to our Amplitude Modulatedestimator. Weclarifythesubtletybetweentheunderlyingandobservedspectrum and develop the Amplitude Modulated estimator in the context of time series with missing data. 5

Setting the theory aside, many researchers use imputation techniques when faced with missing observations in practice. For example, one common use of imputation is to temporally disaggregate data to generate quarterly data from annual series, or monthly data from quarterly series. The Denton method of imputation smooths data when generating these series by minimizing first-differences or second-differences (Denton, 1971). Relatedly, the Chow-Lin method uses a related indicator series that can be used to interpolate, distribute, or extrapolate data (Chow and Lin, 1971, 1976). When this method is used, some properties of the indicator series, including serial correlation, will be transferred to the imputed series. Even the simplest method of imputing data by naive linear interpolation will induce autocorrelation in the imputed series. Studies based on Monte Carlo simulations suggest that even for reasonably large sample sizes, inference methods based on Newey-West HAC covariance estimators result in significant overrejection when the serial correlation is high (den Haan and Levin, 1997). In an imputed series, the induced high autocorrelation exacerbates this distortion. Yet, researchers using these methods rarely adjust their inferences for this induced serial correlation (for two such examples, see Eaton, Kortum, Neiman, and Romalis (2011) and Forni, Monteforte, and Sessa (2009)). We show in our Monte Carlo simulations and our empirical application that our estimators are simple alternatives that avoid the problems associated with imputation. To avoid the complication of adjusting HAC estimators for the method and extent of imputation, some researchers simply ignore the missing observations. Formally, this method amounts to relabeling the time index and treating observations as though they are equally spaced in time. While this method has no previous formal justification to our knowledge, it has been widely applied. For example, observations of daily financial data are generally treatedasequallyspacedconsecutiveobservations, irrespectiveoftheiractualspacingintime (examples include Acharya and Johnson (2007), Beber, Brandt, and Kavajecz (2009), Jorion and Zhang (2007), Pan and Singleton (2008), and Zhang, Zhou, and Zhu (2009)). Yet, for prices that are affected by developments in global markets with different business weeks or 6

nationalholidays, thelackofpricedataonweekendsandholidayscouldbetreatedasmissing observations. In this paper, we formalize this Equal Spacing estimator and demonstrate its asymptotic consistency and finite sample performance. In light of the existing confusion in the literature on HAC estimation with missing data, we provide our Equal Spacing and Amplitude Modulated estimators as alternatives. We discuss the finite sample properties of these estimators and provide simulation results that offer insight into the bias and variance tradeoffs between the two estimators, so that practitioners can make an informed choice before applying either one. 2 A simple example To fix ideas, in this section we introduce each of our estimators in the context of a simple example using three weeks of daily gasoline prices. We reserve for the next section a more detailed discussion of the asymptotic and finite sample properties of the estimators and the practical considerations involved in choosing among them. Suppose we have gasoline price data {z } for the first three weeks of the month: t Mon Tues Weds Thurs Fri Sat Sun z z z z z z z 1 2 3 4 5 6 7 z z z z z z z 8 9 10 11 12 13 14 z z z z z z z 15 16 17 18 19 20 21 For clarity of exposition, suppose these data have already been demeaned, so that we have E(z ) = 0. To estimate the long-run variance of the series {z }, we can apply the standard t t Newey-West estimator: m (cid:88) Ω ˆNW = γˆ(0)+2 w(j,m)γˆ(j), j=1 7

where the Bartlett kernel, w(j,m) = 1−[j/(m+1)] if j ≤ m and w(j,m) = 0 if j > m, is used to weight the sample autocorrelations at each lag j: T 1 (cid:88) γˆ(j) = z z , t−j t T t=j+1 In our example, we can estimate the first lagged autocorrelation for the gasoline price series as: 1 γˆ(1) = [z z +z z +...+z z ]. 1 2 2 3 20 21 21 Similarly, we estimate the third lagged autocorrelation as: 1 γˆ(3) = [z z +z z +...+z z ]. 1 4 2 5 18 21 21 Note that the denominator in both cases is the total number of observations, T, rather than the number of observed lags, T −j. Now suppose we have only business daily data, with missing data on weekends: Mon Tues Weds Thurs Fri Sat Sun z z z z z . . 1 2 3 4 5 z z z z z . . 8 9 10 11 12 z z z z z . . 15 16 17 18 19 When some data points are missing, we have a few choices for how we estimate the lagged ˆ autocovariances, γˆ(j), that are components of the long-run variance, Ω. Especially in the context of business daily data, one very common procedure is to ignore the missing data. In this case, we would treat the observed prices as equally spaced in time. When estimating the first lagged autocovariance for our Equal Spacing estimator, we would treat the jump from Friday to Monday (e.g. day 5 to day 8) as a one day lag: 1 γˆES(1) = [z z +z z +z z +z z +z z +z z +z z +...+z z ]. 1 2 2 3 3 4 4 5 5 8 8 9 9 10 18 19 15 8

Similarly, the third autocovariance would be estimated as: 1 γˆES(3) = [z z +z z +z z +z z +z z +z z +z z +z z +...+z z ]. 1 4 2 5 3 8 4 9 5 10 8 11 9 12 10 15 16 19 15 Since we have effectively reduced the sample size by ignoring the missing days, the denominator for each estimated lagged autocovariance is the total number of observed data points, or 15 in our example. Alternatively, we could estimate the lagged autocovariances using only the instances in which we observe data with the right spacing. We call this the Amplitude Modulated estimator, because we are effectively modulating, or setting to 0, the value (or amplitude) of the series on the missing days. Using this estimator, to estimate the first lagged autocovariance in our example, we would use the lag from day 4 to day 5 and then skip to the lag from day 8 to day 9: 1 γˆAM(1) = [z z +z z +z z +z z +z z +z z +...+z z ]. 1 2 2 3 3 4 4 5 8 9 9 10 18 19 15 The third autocovariance would be estimated using all observed three day lags, including the three day lags between Friday and Monday: 1 γˆAM(3) = [z z +z z +z z +z z +z z +z z +z z +z z ]. 1 4 2 5 5 8 8 11 9 12 12 15 15 18 16 19 15 In this method, the denominator for each lag is again the total number of observed data points, as in the the Equal Spacing estimator. Whilewefocusonthesetwoestimatorsthroughoutmostofthepaper, forcomparison, our simulations will also implement two alternatives that are not preferred. The first alternative is the Parzen estimator, after Parzen (1963). It is constructed like the Amplitude Modulated estimator, except that we adjust the denominator to equal the number of times we observe 9

data with the right spacing: 1 γˆPZ(1) = [z z +z z +z z +z z +z z +z z +...+z z ]. 1 2 2 3 3 4 4 5 8 9 9 10 18 19 12 The third autocovariance would be estimated as: 1 γˆPZ(3) = [z z +z z +z z +z z +z z +z z +z z +z z ]. 1 4 2 5 5 8 8 11 9 12 12 15 15 18 16 19 8 Finally, we will implement our Imputation estimator, which is the Newey-West estimator applied to the filled series, {zI}, constructed by linearly imputing the missing data. In our t example with business daily gasoline prices, the first and third lagged autocorrelations can be estimated as: 1 γˆIM(1) = [z z +...+z z +z zI +zIzI +zIz +z z +...+z z +z zI +zI zI ] 21 1 2 4 5 5 6 6 7 7 8 8 9 18 19 19 20 20 21 1 γˆIM(3) = [z z +z z +z zI+z zI+z z +zIz +zIz +z z +...+z z +z zI +z zI ]. 21 1 4 2 5 3 6 4 7 5 8 6 9 7 10 8 11 16 19 17 20 18 21 3 Long-Run Variance of Time Series with Missing Observations In this section, we formalize our main estimators and describe their asymptotic and finite sample properties. 3.1 Missing Data Structure Consider a second-order stationary time series {z }∞ with (cid:80)∞ |γ (j)| < ∞ and E(z ) = µ t t=1 j=0 z t and an indicator series {g }∞ such that g = 1 if z is observed and g = 0 if z is missing. t t=1 t t t t Throughout the paper, we maintain the assumptions on the missing data structure: 10

Assumption 1. Independence: We assume that the underlying series {z } is independent of t the series {g }. In other words, for any positive integer n < ∞ and any sequence t ,...,t , the t 1 n random variable z ≡ (z ,...,z ) and g ≡ (g ,...,g ) satisfy the condition that Pr(z−1(A)∩ t1 tn t1 tn g−1(B)) = Pr(z−1(A))Pr(g−1(B)) for any two n-dimensional Borel sets A, B (cid:106) Rn. Assumption 2. Existence: lim (S/T) = α and lim [1/S] (cid:80)T g g = κ(j) both T→∞ T→∞ t=j+1 t t−j exist. Assumption 1 requires that the missing process is independent of the underlying data, so that missing data do not induce bias in the parameter estimates. Assumption 2 requires that the fractions of observed converges in probability, and the asymptotic ratio of the number of observed lag j to total number of observations exists. Under these assumptions, we allow very general stochastic or deterministic missing data processes. We give two commonly observed missing data structures as follows: Example 1. Bernoulli missing: The series {g }∞ has an i.i.d. Bernoulli distribution, in t t=1 which each g takes value 0 with probability p and value 1 with probability 1−p. t Example 2. Cyclically missing: Given a series {z }∞ , we can divide the series into cycles t t=1 which are each of length h > 2. In the first cycle of length h, we have k missing observations for some integer k < h. Define the set of time indexes of these missing observations, S = {s ,...,s }, where the integers s ∈ [1,h] for all k. For t ≤ h, g = 0 if and only if t ∈ S. In 1 k k t a cyclically missing structure, for t > h, we have g = 0 for all integers l = 1,2,...,∞, s +hl k and g = 1 otherwise. t The indicator series {g } is stochastic for Bernoulli missing and deterministic for cyclical t missing once the missing pattern is known for any h consecutive elements. 3.2 Newey-West Estimator First, we review the standard Newey-West estimator that applies to time series without missingobservations. Supposethatz iscontinuouslyobservedatt = 1,...,T withE(z ) = µ. t t 11

We let γ (j) = E[(z − µ)(z − µ)] denote the j-th lagged autocovariance. Under the z t t−j standard assumption that z is second-order stationary with (cid:80)∞ |γ (j)| < ∞, we have t j=0 z the standard results that √ 1 (cid:80)T (z − µ) → d N(0,Ω), where the long-run variance of the T t=1 t underlying process z is equal to t ∞ (cid:88) Ω = γ (j). (1) z j=−∞ The Newey-West HAC estimator for Ω is given by m (cid:88) Ω ˆNW = γˆ (0)+2 w(j,m)γˆ (j), z z j=1 where γˆ (j) = 1 (cid:80)T (z − z¯ )(z − z¯ ) and z¯ = (1/T) (cid:80)T z . In the Newey-West z T t=j+1 t T t−j T T t=1 t formula, the lagged autocovariances, γˆ (j), are weighted by the Bartlett kernel, w(j,m) = z 1 − [j/(m + 1)] for j ≤ m and w(j,m) = 0 otherwise, to ensure a positive semi-definite covariance matrix. Under fairly general technical assumptions, as long as lim m(T) = ∞ T→∞ and lim (cid:2) m(T)/T1/4 (cid:3) = 0, we have Ω ˆNW → p Ω (Newey and West, 1987). The choice of T→∞ optimal bandwidth m is given by Andrews (1991) and Newey and West (1994) who further explore the properties of alternative choices for the bandwidth and kernel and the finite sample properties of these estimators. 3.3 Long-Run Variance of the Underlying Process - Parzen Estimator In the presence of missing observations, we follow Parzen (1963) and recast the series as an amplitude modulated version of some underlying full series. We define the amplitude modulated series, {z∗}, as z∗ = g z . Using the amplitude modulated series {z∗}, Parzen t t t t (1963) suggests the following estimator for the autocovariance of the underlying series {z }: t (cid:80)T (z∗ −g z¯∗)(z∗ −g z¯∗) γˆPZ(j) = t=j+1 t t T t−j t−j T , z (cid:80)T g g t=j+1 t t−j 12

if ΣT g g > 0. Dunsmuir and Robinson (1981a) establishes γˆPZ(j) → p γ (j) provided t=j+1 t t−j z z that z∗ is asymptotically stationary. t Under the special case that lim (cid:80)T g g > 0 for all j, we can use the observed T→∞ t=j+1 t t−j data to construct our Parzen estimator, which is a Newey-West type consistent estimator of the long-run variance of the underlying process z : t m Ω ˆPZ = γˆPZ(j)+2 (cid:88) w(j,m)γˆPZ(j) → p Ω z z j=1 While this object may be useful in some instances, it is incorrect for inference testing. First, Dunsmuir and Robinson (1981b) study the case in which w(j,m) = 1, and point out that Ω ˆPZ may not be positive semi-definite.2 Secondly, as we further demonstrate, the long-run variance of the underlying process differs from the long-run variance of the observed process. Though the Parzen estimator is formed using observed data only, it is a consistent estimator of the variance of the underlying process. Consequently, inference on the observed data will be invalid if we use the Parzen estimate of the variance. 3.4 Long-Run Variance of the Observed Process Let S = (cid:80)T g be the total number of the observed. The sample mean is given by z¯∗ = t=1 t T 1 (cid:80)T z∗. Asymptotic mean and variance of z¯∗ is given by the following proposition. S t=1 t T Proposition 1. z¯∗ → p µ and Ω∗ ≡ lim S ·E(z¯∗ −µ)2 = (cid:80)∞ κ(j)γ (j). T T→∞ T j=−∞ z Proof. Given E(g ) = lim (S/T) = α and g is independent of z , we have E(z∗) = t T→∞ t t t E(g )E(z ) = αµ. We can rewrite lim z¯∗ = lim 1 (cid:80)T z∗ = lim T 1 (cid:80)T z∗.We t t T→∞ T T→∞ S t=1 t T→∞ S T t=1 t knowthatlim (S/T) = α. Bythelawoflargenumbers,wehavelim 1 (cid:80)T z∗ = αµ. T→∞ T→∞ T t=1 t 2In our simulations, we implement the estimator using Bartlett kernel weights to maintain comparability withresultsforESandAM.However,thisdoesnotguaranteethattheestimatorwillbepositivesemi-definite. 13

Therefore, lim z¯∗ = µ. We also have T→∞ T (cid:34) (cid:35)2 T (cid:88) S ·E(z¯∗ −µ)2 = [1/S]E g (z −µ) T t t t=1 (cid:34) (cid:35) T T−1 T (cid:88) (cid:88) (cid:88) = [1/S]E (z −µ)2g2 +2 (z −µ)(z −µ)g g t t t t−j t t−j t=1 j=1 t=j+1 (cid:34) (cid:32) (cid:33) (cid:32) (cid:33)(cid:35) T T−1 T 1 (cid:88) (cid:88) 1 (cid:88) = [T/S)] γ (0)E g2 +2 γ (j)E g g . z T t z T t t−j t=1 j=1 t=j+1 Define κ(j) ≡ lim [1/S] (cid:80)T E(g g ), the share of times lag j is observed. Given T→∞ t=j+1 t t−j (cid:104) (cid:105) lim (T/S) = 1/α and E (1/T) (cid:80)T g g = lim (1/T) (cid:80)T g g = ακ(j), T→∞ t=j+1 t t−j T→∞ t=j+1 t t−j we have ∞ (cid:88) Ω∗ = lim S ·E(z¯∗ −µ)2 = κ(j)γ (j). (2) T z T→∞ j=−∞ Therefore, the long-run variance of the observed amplitude modulated process, i.e., Ω∗, is a weighted sum of the original autocovariances, with the weights being the asymptotic ratio of the number of the observed lags to the total number of the observed, S. Comparing equations (1) and (2), when z is observed at all t, i.e., g = 1 for all t, then Ω∗ = Ω. In the t t presence of missing observations, if all autocovariances are positive, we have κ(j) ≤ 1. Then the long-run variance of the amplitude modulated process is always weakly smaller than the long-run variance of the underlying process, Ω∗ ≤ Ω. 3.4.1 Amplitude Modulated Estimator ToestimateΩ∗ infinitesampleswithS observed,anaturalcandidateisgivenbythefollowing proposition. 14

Proposition 2. A consistent estimator of Ω∗ is given by T−1 (cid:88) Ω ˆ∗ = γˆ (0)+2 γˆ (j) z∗ z∗ j=1 where γˆ (j) = [1/S] (cid:80)T (z∗ −g z¯∗)(z∗ −g z¯∗). z∗ t=j+1 t t T t−j t−j T Proof. We note that T (cid:88) γˆ (j) = [1/S] (z∗ −g z¯∗)(z∗ −g z¯∗) z∗ t t T t−j t−j T t=j+1 T T 1 (cid:88) = (z −z¯∗)(z −z¯∗)g g S T t T t−j T t t−j t=j+1 Since lim T/S = 1/α, lim E(z − z¯∗)(z − z¯∗) = γ (j) and lim E(g g ) = T→∞ T→∞ t T t−j T z T→∞ t t−j ακ(j) we have p γˆ (j) → κ(j)γ(j). z∗ Therefore, Ω ˆ∗ → p Ω∗. However, Ω ˆ∗ is not guaranteed to be positive semi-definite, which is not desirable for inference. We can use a kernel-based method to ensure the covariance estimator is positive semi-definite: m (cid:88) Ω ˆAM = γˆ (0)+2 w(j,m)γˆ (j). z∗ z∗ j=1 We follow Newey and West (1987) and illustrate with the most commonly used kernel, the Bartlett kernel. Proposition 3. Using the Bartlett kernel, w(j,m) = 1 − [j/(m + 1)] if j ≤ m and w(j,m) = 0 if j > m, suppose (i) the bandwidth m satisfies lim m(T) = +∞ and T→∞ lim [m(T)/T1/4] = 0. Then Ω ˆAM is positive semi-definite and Ω ˆAM → p Ω∗. T→∞ 15

Proof. We follow proofs of Theorems 1 and 2 in Newey and West (1987) by defining h ≡ t z∗ −g µ , h ˆ ≡ z∗ −g z¯∗ and replace all T in the denominator in Newey and West (1987) t t t t t T with S. Our estimator Ω ˆAM is almost equivalent to applying the Newey-West estimator to the amplitude modulated series. However, we make two minor modifications to the components γˆ (j) = [1/S] (cid:80)T (z∗ − g z¯∗)(z∗ − g z¯∗). First, we subtract z∗ by g z¯∗ instead of z∗ t=j+1 t t T t−j t−j T t t T E(g )z¯∗, so that the difference (z∗ −g z¯∗) equals zero for unobserved data. In the case of a t T t t T mean-zero series, this modification would not be required. Secondly, since we want to use Ω ˆAM to make inferences about the mean of the observed process z¯∗, we divide the sum by T S instead of T so that our inference procedure remains consistent. 3.4.2 Equal Spacing Estimator Instead of casting the time series with missing observations as an amplitude modulated process, an alternative method is to ignore the missing observations and treat the data as equally spaced over time. We define the function ι(t) as the mapping from time index t to the new equal spacing time domain s: ι(t) = (cid:80)t g . We use this mapping to relabel (cid:96)=1 (cid:96) the time indices of the observed values from the series {z } to create the series {zES} for t s s = 1,...,ι(T), in the equal spacing time domain. The sample mean of the observed series is given by z¯ES = 1 (cid:80)S zES. T S s=1 s Proposition 4. z¯ES → p µ and ΩES ≡ lim S ·E(z¯ES −µ)2 = Ω∗. T T→∞ T Proof. We let ∆jES ≡ ι−1(s) − ι−1(s − jES) be a function that maps the time gap (jES) s between z and z in the equal spacing domain to the time gap (j) in the time dos s−jES main of the underlying process {z }. Using the indicator function I(·), we define λjES(j) = t lim 1 (cid:80)S I(∆jES = j), which equals the frequency that the observed lag jES maps to T→∞ S s=1 s lag j in the original time domain. Then we can rewrite the Equal Spacing autocovariance in 16

terms of the autocovariance of the underlying process: γ (jES) = lim E(z −z¯ES)(z −z¯ES) zES s T s−jES T T→∞ ∞ (cid:88) = λjES(j)γ (j) z j=−∞ Applying the same standard results as in Equation 1 to the equal spacing series, we have: ∞ (cid:88) ΩES = γ (jES) zES jES=−∞ ∞ ∞ (cid:88) (cid:88) = λjES(j)γ (j) z jES=−∞j=−∞   ∞ ∞ (cid:88) (cid:88) =  λjES(j)γ z (j) j=−∞ jES=−∞ ∞ (cid:88) = κ(j)γ (j) = Ω∗. z j=−∞ The second to last equation holds because ∞ S T (cid:88) 1 (cid:88) 1 (cid:88) lim I(∆jES = j) = lim g g . T→∞ S s T→∞ S t t−j jES=−∞ s=1 t=1 To estimate Ω∗ using the equally spaced series in finite samples, we can use m (cid:88) Ω ˆES = γˆ (0)+2 w(jES,m)γˆ (jES) zES zES j=1 where S−1 1 (cid:88) γˆ (jES) = (zES −z¯ES)(zES −z¯ES). zES S s T s−j T j=1 Proposition 5. Ω ˆES is PSD and Ω ˆES → p Ω∗. 17

Proof. Positive semi-definiteness of Ω ˆES can be established using same argument in the proof of Theorem 1 in Newey and West (1987). Using their notation, we let h ˆ = zES −z¯ES. To s s T prove consistency, since s(T)−1 T−1 (cid:88) (cid:88) γˆ (0)+2 γˆ (jES) = γˆ (0)+2 γˆ (j) zES zES z∗ z∗ jES=1 j=1 and w(j,m) → p 1 for all j, Ω ˆES and Ω ˆAM estimators are asymptotically equivalent. We know Ω ˆAM → p Ω∗ by proposition 3, and hence, Ω ˆES → p Ω∗. The Equal Spacing estimator is particularly simple to implement, because it only requires relabelingthetimeindexofaserieswithmissingobservationstoignorethegapsandtreatthe data as equally spaced over time. Once this is done, the Equal Spacing estimator amounts to applying the standard Newey-West estimator to the equally spaced series. 3.5 Finite Samples Although Ω ˆAM and Ω ˆES are both asymptotically consistent, finite sample performance might differ due to different weighting on autocovariance estimators. We use the standard mean squared error (MSE) criterion to to evaluate the performance of the estimator Ω ˆi where i ∈ {AM,ES}. MSE(Ω ˆi) = Bias2(Ω ˆi)+Var(Ω ˆi) (cid:104) (cid:105)2 = E(Ω ˆi −Ω∗) +E[(Ω ˆi −Ω ¯i)2] where Ω ¯i = E(Ω ˆi). Consider the case that mAM = mES ≡ m. The lag length in the original time domain is weakly greater than that in the equal spacing domain: j = ι−1(s)− ι−1(s − jES) ≥ jES. Under the same fixed bandwidth m, there are two main differences between Ω ˆAM and Ω ˆES. First, since the kernel weight is decreasing in the lag length for the Newey-West estimator, Ω ˆES assigns weakly higher weight on all autocovariance estimators 18

compared to Ω ˆAM. To see this more explicitly, (cid:88) m w(jES,m) (cid:88) S Ω ˆES = (z −z¯∗)(z −z¯∗) S s T s−jES T jES=1 s=jES+1 To write it in the original time domain, we have (cid:88) m w(jES,m) (cid:88) T Ω ˆES = (z −z¯∗)(z −z¯∗) S t T t−j T jES=1 t=j+1 where t = ι−1(s) and j = ι−1(s)−ι−1(s−jES) ≥ jES. We compare Ω ˆES with Ω ˆAM, m T (cid:88) w(j,m) (cid:88) Ω ˆAM = (z −z¯∗)(z −z¯∗)g g . S t T t−j T t t−j j=1 t=j+1 When g g = 1, we have w(jES,m) ≥ w(j,m) since the weighting function decreases in lag t t−j length and j ≥ jES. Therefore, given the same bandwidth, Ω ˆES puts weakly more weight than Ω ˆAM on each observed pairwise product (z −z¯∗)(z −z¯∗). Second, for the same fixed t T t−j T bandwidth m, Ω ˆAM only estimates autocovariance with lag length up to m in the original time domain, while Ω ˆES also includes observations for lags greater than m the original time domain. These two differences have different implications on the relative variance and bias of the two estimators. As discussed in den Haan and Levin (1997), Newey-West type kernel-based estimators suffer from three sources of finite sample bias. First, the summation in the autocovariance estimator is divided by the sample size, instead of the actual number of observed lags. We expect this source of bias to be more severe for Ω ˆES because Ω ˆES includes higher-order lags that are not included in Ω ˆAM and puts more weight on these high-order biased lagged autocovariance estimators. However, this bias decreases rapidly as the sample size increases. Second, the kernel-based method assigns zero weights to lags with orders greater than T. This source of bias is the same for Ω ˆES and Ω ˆAM. 19

The third and most significant source of bias is driven by the fact that kernel-based estimators under-weight the autocovariance estimators. They assign weights to autocovariance estimators that are less than unity and are declining toward zero with increasing lag order j. Compared with the long-run variance of the amplitude modulated series, Ω∗, the bias of Ω ˆAM arising from this source is given by T−1 (cid:88) Bias(Ω ˆAM) = [1−w(j,m)]γ (j). z∗ j=−(T−1) For a fixed bandwidth, the higher the serial correlation, the more severe the bias. The estimator Ω ˆES can reduce this kernel-based bias because Ω ˆES always assigns weakly higher (or closer to unitary) weight to all autocovariance estimators as compared to Ω ˆAM. For variance of the estimators, we always have Var(Ω ˆES) > Var(Ω ˆAM) because Ω ˆES includes more high-order lags that are relatively poorly estimated. Therefore, the tradeoff between variance and bias determines the relative finite sample performance of Ω ˆAM and Ω ˆES. The previous discussion uses a fixed bandwidth. Andrews (1991) and Newey and West (1994) propose data-dependent choice of the bandwidth that aims to optimize the mean-variance tradeoff. We apply the automatic bandwidth selection procedure proposed by Newey and West (1994) to the AM and ES processes. As we will demonstrate using Monte-Carlo simulations, under both fixed and automatic bandwidth selection, for small sample size and low autocorrelation, MSE(Ω ˆES) > MSE(Ω ˆAM). For moderate sample size or high autocorrelation, we always have MSE(Ω ˆAM) > MSE(Ω ˆES). 4 Regression Model with Missing Observations We can apply asymptotic theory developed in the previous section to a regression model with missing observations. Suppose we have the time series regression, where y and u are scalars, t t x is a k ×1 vector of regressors, and β is a k ×1 vector of unknown parameters. Suppose t 20

(cid:16) (cid:17)−1 further that 1 (cid:80)T x x(cid:48) → p Σ−1 and E(u |x ) = 0, but the u ’s have conditional het- T t=1 t t xx t t t eroskedasticity and are possibly serially correlated. In the presence of missing observations, we let g = 1 if y and all components of x are observed and g = 0 if y or any component t t t t t of x is missing. Then we can re-express the regression in terms of amplitude modulated t processes, y∗ = x∗β +u∗, t = 1,...,T, t t t wherey∗ = g y , x∗ = g x andu∗ = g u . Werequiretheorthogonalitycondition, E(u∗|x∗) = t t t t t t t t t t t 0. The standard result for the OLS estimator is given by (cid:32) (cid:33)−1(cid:32) (cid:33) T T (cid:88) (cid:88) β ˆAM −β = x∗x∗(cid:48) x∗u∗ . (3) t t t t t=1 t=1 Alternatively, without recasting the series as an amplitude modulated process, we ignore all observations for which g = 0 and assume all observed values are equally spaced in time. t Therefore, the estimated regression becomes yES = xESβ +uES, s = 1,...,S s s s and (cid:32) (cid:33)−1(cid:32) (cid:33) S S (cid:88) (cid:88) β ˆES −β = xESxES(cid:48) xESuES . (4) s s s s s=1 s=1 Comparing equations 3 and 4, we can easily see that AM and ES give the same coefficient estimates: β ˆAM = β ˆES ≡ β ˆ . ˆ We normalize β using the number of observed data, S, and then we have (cid:32) (cid:33)−1(cid:32) (cid:33) √ 1 (cid:88) T 1 (cid:88) T S(β ˆ −β) = x∗x∗(cid:48) √ x∗u∗ . S t t S t t t=1 t=1 21

(cid:16) (cid:17)−1 Given that 1 (cid:80)T x x(cid:48) → p Σ−1 in the absence of missing observations and x and g are T t=1 t t xx t t (cid:16) (cid:17)−1 (cid:16) (cid:17)−1 independent, 1 (cid:80)T x∗x∗(cid:48) also converges in probability. We let 1 (cid:80)T x∗x∗(cid:48) → p S t=1 t t S t=1 t t Σ−1 . Using the notation from the previous section, we define z ≡ x u and let z∗ ≡ g z x∗x∗ t t t t t t denote the amplitude modulated series and zES denote the ES series. Then we have s T S 1 (cid:88) 1 (cid:88) z¯∗ ≡ z∗ = zES. T S t S s t=1 t=1 We know E(z ) = E(z¯∗) = 0 using the orthogonality condition. t T Proposition 6. The asymptotic distribution of the OLS estimator is given by √ S(β ˆ −β) → d N(0,Σ−1 Ω∗Σ−1 ), x∗x∗ x∗x∗ where Ω∗ = (cid:80)∞ κ(j)γ (j) and κ(j) = lim 1 (cid:80)T E(g g ). j=−∞ z S→∞ S t=j+1 t t−j To estimate (cid:80)−1 , we can use x∗x∗ (cid:32) (cid:33)−1 (cid:32) (cid:33)−1 T S Σ ˆ−1 = 1 (cid:88) x∗x∗(cid:48) = 1 (cid:88) xESxES(cid:48) → p Σ−1 . x∗x∗ S t t S s s x∗x∗ t=1 s=1 Proposition 7. We define m (cid:88) Ω ˆAM = Γ ˆAM + w(j,m)[Γ ˆAM +Γ ˆAM(cid:48)], 0 j j j=1 where Γ ˆAM = (1/S) (cid:80)T z∗z∗(cid:48) , j t=1 s s−j m (cid:88) Ω ˆES = Γ ˆES + w(j,m)[Γ ˆES +Γ ˆES(cid:48)], 0 j j j=1 where Γ ˆES = (1/S) (cid:80)S zESzES(cid:48). Then we have Ω ˆAM → p Ω∗ and Ω ˆES → p Ω∗. For inferences, j s=1 s s−j the t-statistic based on Ω ˆAM is given by 22

ˆ β −β tAM = k 0 → d N(0,1), k (cid:113) V ˆAM/(S −k) kk where V ˆAM is the (k,k)-th element of V ˆAM = Σ ˆ−1 Ω ˆAMΣ ˆ−1 . Alternatively, the t-statistic kk x∗x∗ x∗x∗ based on Ω ˆES is given by ˆ β −β tES = k 0 → d N(0,1), k (cid:113) V ˆES/(S −k) kk where V ˆES is the (k,k)-th element of V ˆES = Σ ˆ−1 Ω ˆESΣ ˆ−1 . kk x∗x∗ x∗x∗ 5 Simulation In the Monte Carlo simulations that follow, we study the properties of our Amplitude Modulated and Equal Spacing estimators using a simple location model. To evaluate these estimators under a variety of circumstances, we generate data with various levels of autocorrelation and for a range of sample sizes. We test our estimators under the two missing structures described in Section 3.1, the Bernoulli missing structure and the deterministic cyclically missing structure. We implement the estimators using a standard fixed bandwidth for our benchmark results, and also provide results that implement the automatic bandwidth selection procedure proposed by Newey and West (1994). Our primary evaluation criteria are the empirical rejection probability of the test, and the power of the test against an appropriate alternative. 23

5.1 Data Structure The inference procedures are tested on a simulated data series {y } that is generated using t a simple location model: y = β +(cid:15) t t (cid:15) = φ(cid:15) +η t t−1 t where η i.i.d. ∼ N(0,1) and (cid:15) = 0 t 0 For each of N = 100,000 iterations, we use β = 0 and generate a data series {y ,...,y } 1 Tmax with a sample size of Tmax = 24,000. Since we run tests over a range of sample sizes for each estimator, we use the first T observations in each iteration for T ∈ {120, 360, 1200, 4800, 12000, 24000}. To test these methods for a range of autocorrelation parameters, φ, we generate data separately for φ ∈{0, 0.3, 0.5, 0.7, 0.9}. The regressor in this model is a constant 1, and we conduct simulation exercises under different missing structures for the dependent variable {y }, which are described below. For each iteration, we generate a series t {g }, which indicates for each t whether y is observed. Finally, we generate the series {y∗}, t t t where y∗ = g y . t t t Given this data, we estimate the parameter of interest, βi, and the estimator for the covariance matrix, Ω ˆi, for each estimator i ∈ {NW,ES,AM}. We also perform simulations for two additional methods. First, we implement the imputation method, in which the missing y are linearly imputed before the standard Newey-West estimator is applied to the t filled series {yI}. Secondly, we implement the Parzen estimator from Section 3.3. Since the t estimator Ω ˆPZ is not positive semi-definite, we calculate the rejection rate using the number of rejections divided by the number of simulations in which Ω ˆPZ > 0. We use these estimators to calculate the t-statistic, ti, used for a test of the null hy- β pothesis H : β = 0 against the two-sided alternative, H : β (cid:54)= 0. We choose a 5% level of 0 a significance and reject the null hypothesis when |ti| > 1.96. For the standard estimations, β 24

we use a fixed bandwidth of m = 4(T/100)(2/9). We also apply the automatic bandwidth selection procedure of Newey and West (1994) to the Newey-West, Equal Spacing, Amplitude Modulating, and Imputation methods. Results are reported for simulation exercises under a variety of sampling schemes. Our benchmark sampling scheme is one with a Bernoulli missing structure as described in Example 1 of Section 3.1. For these simulations, the series {g } has an i.i.d. Bernoulli distribution t with fixed probability of missing, p = 6/12. For comparison, we also provide two variants in which the probability of missing is set to 4/12 and 8/12. We also simulate data under four data structures with cyclically missing observations, as described in Example 2 of Section 3.1. For these, we choose a cycle length of 12, with 6 or 8 observations missing each cycle. In these simulations, the missing structure is cyclical in that we generate a single pattern of missing observations for the first cycle, and apply this same pattern to every cycle of 12 observations. Additionally, the pattern is deterministic in the sense that we apply the same pattern of missing observations for all iterations in the simulation. This sampling structure reflects the potential application of these methods to monthlydatawithmissingobservations. Forexample,becausecommoditiesfuturescontracts expire only some months of the year, the data on monthly commodities returns will have the same missing pattern each year. Another common application is for daily financial data, in which the same 2 weekend days are missing in each cycle of 7 days. Under a deterministic cyclical missing structure, it is possible to have cases for which certain lagged autocovariances in the original time series domain are never observed. As we notedintheintroduction,thepopularstatisticalsoftwareStataforbidsimplementationofthe Amplitude Modulated estimator under this case, even when using the “force” command. Yet, our theoretical results do not require all the lags to be observed to generate asymptotically correct inference using the ES and AM estimators. Consequently, we perform simulations for deterministic cyclical missing structures under both cases: all lags are observed at least once, or some lags are never observed. We show that the finite sample performance does not 25

differ much between these two cases, and neither case differs much from the results under the Bernoulli missing structure. In our first two cyclical missing structures, we set the cyclical pattern of observed data such that each lagged autocovariance can be estimated from the observed data. In our cyclical structure with 6 of 12 missing, we observe {z , z , z , z , z , z }, and then observe 3 6 8 9 10 11 thesamepatternofobservationsforeachcycleoflength12. Inournextcyclicalstructure, we have 8 of 12 missing. We observe {z , z , z , z } in the first cycle, and the same pattern for 3 4 7 9 each cycle after that. For our final two cyclical missing structures, we require the observed data to be spaced within the cycle such that at least one lag is never observed. In our structure with 6 of 12 missing, we observe {z , z , z , z , z , z } in the first cycle. Under 1 3 5 8 10 12 this structure, the sixth lag is never observed. For our cyclical structure with 8 of 12 missing, we observe {z , z , z , z }, so that the fifth lag is never observed. 2 3 6 12 5.2 Evaluation Criteria The primary evaluation criteria for these estimators is the empirical rejection probability of the tests. The empirical rejection probability measures the likelihood that null hypothesis is rejected when it is in fact true (Type I error). Each iteration of the Monte Carlo simulation represents one hypothesis test, and the reported rejection probability reflects the fraction of iterationsforwhichthet-statisticwaslargeenoughinmagnitudetorejectthenullhypothesis. We also provide measures of the power of the test, as well as measures of the bias and variance of the estimators. The power of the test measures the probability that the null hypothesis is rejected when the alternative hypothesis is true. Since we find empirical rejection probabilities that can be much higher than the 0.05 benchmark, we calculate the size-adjusted power for ease of comparability. Following Ibragimov and Müller (2010), we (cid:112) set the alternative hypothesis to H : β = 4/ T(1−φ2). To calculate the power, we first a a calculate ti , which is analogous to ti for each i, except that we subtract β instead of β in βa β a 0 26

the numerator. For example, β ˆNW −β tNW = a . βa (cid:113) V ˆNW/T kk Next, wecalculateanadjustedcriticalvalue, tcrit, whichisthet-statisticatthe5thpercentile 0.05 of our simulations. This value is equal to the critical value for which the empirical rejection probability would have been exactly 0.05 under our simulation procedure. To calculate the size-adjusted power, we calculate ti under the alternative hypothesis above and reject when βa |ti | > tcrit. βa 0.05 Finally, in order to understand the finite sample performance, we study the empirical mean squared error of our estimators by decomposing it into the empirical variance and bias. Under the benchmark Bernoulli case, we first calculate the value of Ω∗ under our data generating process as: ∞ (cid:88) Ω∗ = lim (T/S) γ E(g g ) j t t−j T→∞ j=−∞ (cid:32) (cid:33) ∞ (cid:88) = (1/p) p+2 p2φj Var((cid:15) ) t j=1 (cid:18) (cid:19)(cid:18) (cid:19) 2pφ 1 = 1+ 1−φ 1−φ2 where p is the probability of being observed. The second equation follows because (1) lim T/S = 1/p; (2) E(g g ) = p if j = 0 and E(g g ) = p2 if j ≥ 1. The third T→∞ t t−j t t−j equation holds because Var((cid:15) ) = 1/(1 − φ2). Returning to the MSE decomposition, we t have: (cid:92) (cid:91)2 (cid:92) MSE = Bias +Variance N = (cid:16) Ω ˆ¯i −Ω∗ (cid:17)2 + 1 (cid:88)(cid:16) Ω ˆi −Ω ˆ¯i (cid:17)2 , N c c=1 27

where Ω ˆ¯i = (1/N) (cid:80)N Ω ˆi is the sample mean of all the covariance estimators and c indexes c=1 c the N = 100,000 iterations. Note that for i = NW, we have p = 1. We use these measures to study the finite sample properties of our estimators, especially to compare the AM and ES estimators. The primary findings are reported in Tables 2 through 7.3 5.3 Results Table 2 provides the rejection probabilities for our two main estimators, the Equal Spacing estimator and the Amplitude Modulated estimator. We provide results under a fixed bandwidth and automatic bandwidth selection for each estimator, and for comparison purposes, present results for the application of the Newey-West estimator applied to the full series without missing observations. Our benchmark missing data structure is the Bernoulli missing structure in which observations are missing with probability p = 1/2. We focus on the results for the simulation with T = 360, and also provide results for a large and very large sample size (T = 1200 or 24000). Our simulation results provide evidence that the ES and AM estimators perform well in finite samples. As is well known in the HAC literature, we find that the empirical rejection probability can be a bit higher than 5.0 for small samples, even when there is no autocorrelation. In addition, when the autocorrelation parameter is high, there can be quite a bit of overrejection even for very large sample sizes (T = 24000). However, we do find that the rejection probability is falling towards 5.0 as the sample size increases. We also find evidence that our ES and AM estimators are well-behaved under deterministic cyclically missing structures. In Table 3, we see little difference in the rejection rates under each of our three data structures: randomly missing under a Bernoulli structure, deterministic cyclical missing when all lags are observed, and deterministic cyclical missing when some lags are unobserved. 3Full simulation results are available upon request. 28

Table 2 also provides the empirical rejection probabilities for the Parzen and Imputation estimators for T = 360 and T = 24000. As expected, the higher serial correlation induced by the imputation procedure results in extremely high rejection rates as compared to the ES and AM estimators. We can also see in Table 2 that the results for the Parzen estimator substantiate our argument that this estimator cannot be used for robust inference for series with missing observations. In our simulation with φ = 0 and T = 360, we found 20 instances (out of 100,000) in which the non-PSD Parzen estimator returned a negative estimate of the variance. Additionally, we find that the rejection probability is generally decreasing in the sample size but is U-shaped with respect to the autocorrelation, and often underrejects for low levels of autocorrelation.4 NextweturntoacomparisonofthefinitesamplepropertiesoftheESandAMestimators. In the results for the fixed bandwidth estimators in Table 4, we find that for T = 360, the AM estimator is preferred for autocorrelation parameters φ ≤ 0.3, while the ES estimator is preferred for series with higher autocorrelation. For larger samples, the ES estimator is preferred for a larger range of φ, so that for T = 24000, we have that the ES estimator is preferred for all the simulations with nonzero autocorrelations. To better understand these results, we turn to Table 5, which reports the empirical bias and variance for these two estimators under our benchmark simulations. As expected, when using the same fixed bandwidth, the ES estimator has a higher variance than the AM estimator for each sample size and autocorrelation. As discussed in Section 3.5, this is because compared to Ω ˆAM, the Ω ˆES estimator includes more high-order lags that are relatively poorly estimated. With regard to the bias, the ES estimator has better performance for higher autocorrelation parameters, though this effect is mitigated and sometimes reversed in small samples. The poor small sample performance is driven by the first source of bias discussed in Section 4Interestingly, the test using the PZ estimator is well-behaved when the autocorrelation is 0. This is consistent with our theoretical results, because when there is no autocorrelation, we have that the long-run variance of the underlying and observed series are asymptotically equivalent. Consequently, we have that when there is no autocorrelation, ΩˆPZ and ΩˆAM are asymptotically equivalent as well. 29

3.5, that the summation in the autocovariance estimator is divided by the sample size rather than the actual number of observed lags. This bias declines rapidly as the sample size increases. In contrast, the bias behind the overrejection for high autocorrelations is driven by the underweighting of high-order lagged autocovariances. Since the ES estimator places a higher weight on high-order autocovariances, it has lower bias than the AM estimator when the autocorrelation is high. As the sample size grows and the first source of bias becomes less important, the ES estimator is preferred for a larger range of autocovariance parameters. This bias and variance tradeoff changes when we implement automatic bandwidth selection. The results in Table 4 indicate that under this procedure, the AM estimator has a lower rejection probability and is thus preferred at all but the highest level of autocorrelation, for every sample size. To provide further context for this result, Table 6 reports the average selected bandwidth for each simulation. We know from den Haan and Levin (1997) that using a higher bandwidth will increase the variance of the estimator while decreasing the bias. Given that the AM estimator has a lower variance than the ES estimator when using a fixed bandwidth, it is not surprising that the automatic bandwidth selection typically chooses a higher bandwidth for the AM estimator than for the ES estimator. The incremental improvement in the bias between the fixed and automatic bandwidth selection is larger for the AM estimator than for the ES estimator. Consequently, under automatic bandwidth selection, the ES estimator is only preferred for extremely high autocorrelation, when the bias of the AM estimator is much higher than that of the ES estimator. Turning to the size-adjusted power, we find in table Table 4 that the power of the two estimators is roughly equivalent. Comparing our two main estimators, we have that the power of the AM estimator is generally stronger than that of the ES estimator. Just as for the Newey-West results, we have that the power falls as the autocorrelation increases, and that the size-adjusted power does not vary as the sample size increases. Unsurprisingly, we canseethatthepowerofthetestunderthemissingstructureisweakerthanforthefullseries, due to the smaller observed sample size. This effect is mitigated at high autocorrelation, 30

however, and we can see that under very high autocorrelation, the power is roughly equal for the Newey-West application to the full series and for the application of our two estimators to the series with missing observations. Finally, Table 7 presents results for varying fractions of missing observations. At low autocorrelation, the rejection rate increases as the fraction of missing observations increases. ThisislikelydrivenbythefirstsourceoffinitesamplebiasdiscussedinSection3.5,whichgets worse as the fraction of missing observations increases. In contrast, at high autocorrelation, the rejection rate falls as the fraction of missing observations increases. This effect is likely due to the fact that when a higher fraction of observations is missing, the observed process is less persistent, and the estimators are better able to overcome the underweighting of the higher order autocovariances. Putting these two effects together, we have that the AM estimator is preferred for a larger range of autocorrelation parameters when a higher fraction of data is missing. This is consistent with our previous finding, that the AM estimator is preferred when the serial correlation is low. Overall, these simulation results are consistent with our theoretical findings. We show that the ES and AM estimators both are well-behaved for random and deterministic missing structures. In general, the ES and AM estimators are preferred to imputation methods, which may induce serial correlation in the imputed series, resulting in more severe bias and overrejection. In finite samples, we find that for the same fixed bandwidth, the ES estimator is generally less biased than the AM estimator, but has larger variance. Consequently, the ES estimator is preferred when autocorrelation is high, as the bias will dominate the mean squared error in these cases. Conversely, when autocorrelation is low, variance dominates the mean squared error, and the AM estimator is preferred. 31

6 Empirical Application: Recursive Tests for a Positive Sample Mean In this section, we present an application of our estimators to test for positive returns to investing in commodities futures contracts. While commodities tend to have positive returns on average, they also have extremely high volatility. We apply our methods to construct the sample mean and standard error of the returns series, and test whether the returns are statisticallydistinguishablefromzero. Duetothestructureofcommoditiesfuturescontracts, the time series of returns have missing observations, and are therefore a natural application for our estimators. Commoditiesfuturescontractsspecifythequantityandpriceofacommoditytobetraded on a predetermined expiration date at some point in the future. While some commodities have contracts expiring every month, many have contract months that are irregularly spaced throughout the year. For example, copper futures contracts were available for only March, May, July, September, and December until full monthly coverage began in 1989. Consequently, if we want to calculate the monthly return to investing in commodities futures over a long sample period, our monthly returns data will either reflect a fluctuating time-tomaturity, or will be an incomplete series with irregularly spaced missing observations. For each commodity, we calculate the return as the percentage change in the price of the contract over the three months prior to the contract’s expiration. For instance, we calculate the return to the December contract as the change in price from the last trading day in August to the last trading day in November. Since we want our returns series to reflect the changeinthepriceoverthesametime-to-maturity,weareonlyabletocalculatethisreturnin the months immediately preceding contract expiration. For copper, this means we will have only five observations each year.5 The existence of irregularly spaced commodities futures contracts results in a deterministic cyclical pattern of missing observations in the constant- 5We restrict the full sample for copper to the period with missing contract months, 1960-1989. 32

maturity returns series. Contract availability and spacing differs across commodities, but tends to remain constant year to year for each commodity. In this application, we calculate the sample mean for three representative commodities: copper, soybean oil, and lean hogs.6 We apply our Equal Spacing, Amplitude Modulated, and Imputation estimators to calculate the HAC standard error of the sample mean, and test the hypothesis that the sample mean is significantly different from zero at the five percent level of significance. Under the Imputation method, the missing observations are linearly imputed, while the Equal Spacing and Amplitude Modulated estimators use only the observed data to calculate the mean and standard error. Because it does not provide robust results, we do not provide inference results using the Parzen estimator. In addition to performing the t-test for the full sample, we use a recursive method to compare our three methods across various sample sizes. For each commodity, we first calculate the sample mean and standard error over the first twelve months of the sample, and perform a t-test of the mean for just this sample window. We then recursively perform the same test for an expanding sample window, adding one month at a time until the full sample is covered. Lastly, we also perform the same type of recursive tests starting with the last twelve months of the full sample. In this backwards recursive test, we use an earlier starting month in each iteration until the sample window again covers the full sample. Having the forward and backwards recursive results allows us to note any structural shifts that may have occurred over time. Figures 2 and 3 depicts the results, and Table 8 provides an overview of rejection rates over the full set of recursive results. Figure 2 shows the sample mean and 95% confidence intervals constructed using the Imputation and Equal Spacing methods. (The Amplitude Modulated estimator is omitted from the figure for clarity.) The means are very similar across the two methods for most samplewindows. Theprimarydifferenceisthatasexpected,theImputationmethodestimate 6Weselectedonerepresentativecommodityfromeachofthemajorcommoditytypes(metals,animal,and agricultural). We omit the energy commodities, as these commodity contracts do not have missing contract months. 33

of the standard error is generally smaller than the Equal Spacing and Amplitude Modulated estimates, resulting in a higher rejection rate for Imputation. In the figure, we have shaded the samples for which the hypothesis is rejected under the Imputation method but not rejected under Equal Spacing. The fraction of shaded iterations ranges from 3.2% for lean hogs to 18.2% for copper in the forward recursive results. In the backwards recursive results, the fraction of shaded iterations ranges from 0% for soybean oil to 18.4% for copper. It is unsurprising that the Imputation method results in a higher rejection rate relative to the Equal Spacing and Amplitude Modulated methods. While in many cases naive imputation is likely to bias the parameter of interest, we have tried to construct an example with little to no bias. However, since the imputed observations are constructed using the observed data rather than drawn from the underlying distribution of data, we note that the standard error of the imputed series is likely lower than the standard error of the observed series. Additionally, the induced high serial correlation of the imputed series will make it likely that the standard error of the imputed series will be underestimated by the Newey-West estimator. For all of these reasons, it is likely that we will have overrejection in hypothesis testing. Without knowing the true mean of the series, in this application we cannot know which method gives us the “right” conclusion for a larger fraction of the tests. Yet, the examples illustrate that the Imputation and Equal Spacing methods can lead to different conclusions in a number of cases, depending on the available data. 7 Conclusion Thispaperprovidestwosimplesolutionstothecommonproblemofconductingheteroskedasticity and autocorrelation (HAC) robust inference when some observations are missing. Our definitions of the Amplitude Modulated and Equal Spacing estimators are simply formal descriptions of ad hoc practices that are already in use. Yet, by formalizing these procedures, we are able to provide theoretical results that clear up some of the existing confusion 34

in the literature. By studying the estimators and their properties, we provide justification for their past application to daily business data and through common statistical software packages such as Stata. We also justify their application under a wide variety of missing data structures, including deterministic and cyclical missing structures. Our theoretical discussion of the estimators highlights a few main conclusions. After establishing the difference between the long-run variance of the underlying and observed series, we demonstrate that our Amplitude Modulated and Equal Spacing estimators both are consistent for the long-run variance of the observed series. This distinction is important, as we also show that we require the long-run variance of the observed series to construct t-statistics for inference, such as in a regression setting. In addition to discussing the asymptotic properties of the estimators, we provide some discussion of their finite sample properties, based on our previous understanding of the finite sample properties of HAC estimators more generally. We also provide simulation results and apply our estimators to a real world problem involvingmissingdataincommoditiesfuturesreturns. Theseresultsprovidefurtherevidence supporting our description of the asymptotic and finite sample behavior of the estimators. In addition, theresultsoftheseexercisesareusedtodrawconclusionsthatcanprovideguidance to practitioners who need to decide between the estimators for applied work. Though this paper focuses on applying the estimators in a time series setting, they can also be naturally extended for application in a panel setting. We leave this extension for future work. References Acharya, V. V., and T. C. Johnson (2007): “Insider trading in credit derivatives,” Journal of Financial Economics, 84(1), 110–141. Andrews, D. W. K. (1991): “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59(3), 817–858. Beber, A., M. W. Brandt, and K. A. Kavajecz (2009): “Flight-to-Quality or Flightto-Liquidity? Evidence from the Euro-Area Bond Market,” Review of Financial Studies, 22(3), 925–957. 35

Bloomfield, P. (1970): “Sepctral Analysis with Randomly Missing Observations,” Journal of the Royal Statistical Society, Series B (Methodological), 32(3), 369–380. Brillinger, D. R. (1973): “Estimation of the Mean of a Stationary Time Series by Sampling,” Journal of Applied Probability, 10(2), 419–431. (1984): “Statistical Inference for Irregularly Observed Processes,” in Time Series Analysis of Irregularly Observed Data, ed. by E. Parzen, pp. 36–57. Springer-Verlag, New York. Chow, G. C., and A.-l. Lin(1971): “BestLinearUnbiasedInterpolation,Distribution,and Extrapolation of Time Series by Related Series,” The Review of Economics and Statistics, 53(4), 372–375. Chow, G. C., and A.-l. Lin (1976): “Best Linear Unbiased Estimation of Missing Observations in an Economic Time Series,” Journal of the American Statistical Association, 71(355), 719–721. Clinger, W., and J. W. Van Ness (1976): “On Unequally Spaced Time Points in Time Series,” The Annals of Statistics, 4(4), 736–745. den Haan, W. J., and A. T. Levin (1997): “A Practitioner’s Guide to Robust Covariance Matrix Estimation,” in Robust Inference, ed. by G. Maddala, and C. Rao, vol. 15 of Handbook of Statistics, pp. 299–342. Elsevier Science B.V. Denton, F. T. (1971): “Adjustment of Monthly or Quarterly Series to Annual Totals: An Approach Based on Quadratic Minimization,” Journal of the American Statistical Association, 66(333), 99–102. Dunsmuir, W., and P. M. Robinson (1981a): “Asymptotic Theory for Time Series Containing Missing and Amplitude Modulated Observations,” Sankhya¯: The Indian Journal of Statistics, Series A, 43(3), 260–281. (1981b): “Estimation of Time Series Models in the Presence of Missing Data,” Journal of the American Statistical Association, 76(375), 560–568. Eaton, J., S. Kortum, B. Neiman, and J. Romalis (2011): “Trade and the Global Recession,” NBER Working Paper No. 16666. Forni, L., L. Monteforte, and L. Sessa (2009): “The General Equilibrium Effects of Fiscal Policy: Estimates for the Euro Area.,” Journal of Public Economics, 93(3-4), 559–585. Harvey, A. C., and R. G. Pierse (1984): “Missing Observations in Economic Time Series,” Journal of the American Statistical Association, 79(385), 125–131. Ibragimov, R., and U. K. Müller (2010): “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 28(4), 453–468. 36

Jorion, P., and G. Zhang (2007): “Good and Bad Credit Contagion: Evidence from Credit Default Swaps,” Journal of Financial Economics, 84(3), 860–883. Kiefer, N. M., and T. J. Vogelsang (2002a): “Heteroskedasticity-Autocorrelation Robust Standard Errors Using the Bartlett Kernel Without Truncation,” Econometrica, 70(5), 2093–2095. (2002b): “Heteroskedasticity-Autocorrelation Robust Testing Using Bandwidth Equal to Sample Size,” Econometric Theory, 18(06), 1350–1366. (2002c): “ANewAsymptoticTheoryForHeteroskedasticity-AutocorrelationRobust Tests,” Econometric Theory, 21(06), 1130–1164. Kiefer, N. M., T. J. Vogelsang, and H. Bunzel (2000): “Simple Robust Testing of Regression Hypotheses,” Econometrica, 68(3), 695–714. Kim, M. S., and Y. Sun(2011): “SpatialHeteroskedasticityandAutocorrelationConsistent Estimation of Covariance Matrix,” Journal of Econometrics, 160(2), 349–371. McKenzie, C. R., and C. A. Kapuscinski (1997): “Estimation in a linear model with serially correlated erros when observations are missing,” Mathematics and Computers in Simulation, 44(1), 1–9. Newey, W. K., and K. D. West (1987): “A Simple, Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55(3), 703–708. (1994): “Automatic Lag Selection in Covariance Matrix Estimation,” Review of Economic Studies, 61(4), 631–653. Pan, J., and K. J. Singleton (2008): “Default and Recovery Implicit in the Term Structure of Sovereign CDS Spreads,” Journal of Finance, 63(5), 2345–2384. Parzen, E. (1963): “On Spectral Analysis with Missing Observations and Amplitude Modulation,” Sankhya¯: The Indian Journal of Statistics, Series A, 25(4), 383–392. Petersen, M. A. (2009): “Estimating Standard ERrors in Finance Panel Data Sets: Comparing Approaches,” The Review of Financial Studies, 22(1), 435–480. Phillips, P. C. B., and S. N. Durlauf (1986): “Multiple Time Series Regression with Integrated Processes,” The Review of Economic Studies, 53(4), 473–495. Phillips, P. C. B., and V. Solo (1992): “Asymptotics for Linear Processes,” The Annals of Statistics, 20(2), 971–1001. Scheinok, P. A. (1965): “Spectral Analysis with Randomly Missed Observations: The Binomial Case,” The Annals of Mathematical Statistics, 36(3), 971–977. White, H. (1984): Asymptotic Theory for Econometricians. Academic Press. 37

Zhang, B. Y., H. Zhou, and H. Zhu (2009): “Explaining Credit Default Swap Spreads with the Equity Volatility and Jump Risks of Individual Firms,” Review of Financial Studies, 22(12), 5099–5131. Figure 1: Time Series of Returns Copper Returns (% p.a.) 400 200 0 −200 1960 1970 1980 1990 Soybean Oil Returns (% p.a.) 400 200 0 −200 1960 1970 1980 1990 2000 2010 Lean Hogs Returns (% p.a.) 200 100 0 −100 1960 1970 1980 1990 2000 2010 38

stluseR laciteroehT fo yrammuS :1 elbaT srotamitsE sretemaraP noitalupoP :seireS gniylrednU Ω → p ZPΩ ˆ )j( γ ∞(cid:80) = Ω z ∞−=j :ssecorP detaludoM edutilpmA )3 noitisoporP(∗Ω → p MAΩ ˆ )1 noitisoporP( )j( γ)j(κ ∞(cid:80) = ∗Ω z ∞−=j :ssecorP gnicapS lauqE )5 noitisoporP( ∗Ω → p SEΩ ˆ )4 noitisoporP( ∗Ω = SEΩ 39

Table 2: Benchmark Results Empirical Rejection Rate Autocorrelation (φ): 0.0 0.3 0.5 0.7 0.9 Sample Size T=360 ES, fixed bw 6.0 7.0 8.0 10.7 23.1 AM, fixed bw 5.6 6.7 8.3 12.6 30.9 Parzen, fixed bw 6.6 4.6 4.5 6.1 19.2 Imputation, fixed bw 8.9 9.7 11.2 15.5 34.4 NW, fixed bw 5.6 7.2 9.1 14.2 33.7 Sample Size T=1200 ES, fixed bw 5.3 6.0 6.8 8.7 18.8 AM, fixed bw 5.2 6.0 7.2 10.6 26.7 Parzen, fixed bw 5.5 3.6 3.3 4.4 14.9 Imputation, fixed bw 8.1 8.5 9.6 13.1 29.9 NW, fixed bw 5.2 6.4 7.9 11.9 29.4 Sample Size T=24000 ES, fixed bw 5.1 5.4 5.8 6.6 11.0 AM, fixed bw 5.1 5.4 6.1 7.5 16.5 Parzen, fixed bw 5.1 3.0 2.3 2.2 6.3 Imputation, fixed bw 6.3 6.5 7.0 8.4 17.9 NW, fixed bw 5.1 5.6 6.3 7.9 17.5 Notes: Tablereportsforarangeofsamplesizesandautocorrelationparameterstheempirical rejection rate as defined in Section 5.2. Data follow our benchmark missing structure, the Bernoulli structure with probability of missing 6/12. The Newey-West estimation provided for comparison uses the full series of simulated data without any missing observations. See Sections 3 and 4 for details on the estimators and the regression context, and Section 5.1 for details on the simulation parameters. 40

Table 3: Varying Missing Structure Empirical Rejection Rate Autocorrelation (φ): 0.0 0.3 0.5 0.7 0.9 Randomly missing, Bernoulli structure ES, fixed bw 6.0 7.0 8.0 10.7 23.1 AM, fixed bw 5.6 6.7 8.3 12.6 30.9 Deterministic cyclically missing, All lags observed ES, fixed bw 5.9 6.9 7.9 10.1 22.2 AM, fixed bw 5.4 6.5 8.0 11.9 30.1 Deterministic cyclically missing, Some lags unobserved ES, fixed bw 6.0 6.5 7.5 9.8 22.1 AM, fixed bw 5.5 6.3 8.1 12.8 32.0 Full sample benchmark NW, fixed bw 5.6 7.2 9.1 14.2 33.7 Notes: Table reports for a range of autocorrelation parameters the empirical rejection rate for a sample size of T = 360 under varying missing structures as described in Section 5.1: the Bernoulli missing structure, the deterministic cyclical missing structure in which all lags are observed, and the deterministic cyclical missing structure in which some lags are never observed. The probability of missing is 6/12 for the Bernoulli structure, while the cyclical structures have exactly 6 of 12 observations missing in each cycle. The Newey-West estimation provided for comparison uses the full series of simulated data without any missing observations. See Sections 3 and 4 for details on the estimators and the regression context, and Section 5.1 for details on the simulation parameters. 41

Table 4: Finite Samples: Fixed and Automatic Bandwidth Selection Empirical Rejection Rate Size-Adjusted Power Autocorrelation (φ): 0.0 0.3 0.5 0.7 0.9 0.0 0.3 0.5 0.7 0.9 Sample Size T=360 ES, fixed bw 6.0 7.0 8.0 10.7 23.1 78.9 63.5 49.2 32.7 14.0 AM, fixed bw 5.6 6.7 8.3 12.6 30.9 79.4 64.1 50.0 33.0 14.2 ES, auto bw 6.8 7.7 8.7 10.1 16.8 77.8 62.4 48.2 31.7 13.6 AM, auto bw 6.0 7.0 8.0 9.9 19.7 79.0 63.7 49.1 32.4 13.8 NW, fixed bw 5.6 7.2 9.1 14.2 33.7 97.6 82.3 62.2 38.1 14.7 NW, auto bw 5.9 7.4 8.1 10.0 19.8 97.5 81.5 61.2 37.2 14.4 Sample Size T=1200 ES, fixed bw 5.3 6.0 6.8 8.7 18.8 80.2 65.2 51.4 34.1 14.5 AM, fixed bw 5.2 6.0 7.2 10.6 26.7 80.4 65.5 51.5 34.3 14.6 ES, auto bw 5.5 6.3 6.9 7.4 10.8 79.9 64.9 50.5 33.5 14.2 AM, auto bw 5.3 6.1 6.4 7.4 12.6 80.2 65.2 51.0 33.9 14.4 NW, fixed bw 5.2 6.4 7.9 11.9 29.4 97.8 83.0 63.0 38.6 15.0 NW, auto bw 5.3 6.3 6.6 7.5 12.6 97.8 82.6 62.7 38.1 14.8 Sample Size T=24000 ES, fixed bw 5.1 5.4 5.8 6.6 11.0 80.2 65.4 51.0 33.8 14.4 AM, fixed bw 5.1 5.4 6.1 7.5 16.5 80.3 65.4 51.1 33.7 14.4 ES, auto bw 5.1 5.4 5.7 5.7 6.2 80.2 65.3 50.9 33.6 14.4 AM, auto bw 5.1 5.4 5.5 5.7 6.4 80.3 65.4 51.0 33.7 14.3 NW, fixed bw 5.1 5.6 6.3 7.9 17.5 97.8 83.1 63.0 38.6 14.9 NW, auto bw 5.1 5.4 5.5 5.6 6.4 97.8 83.1 63.1 38.7 14.9 Notes: Tablereportsforarangeofsamplesizesandautocorrelationparameterstheempirical rejection rate and size-adjusted power as defined in Section 5.2. Data follow our benchmark missing structure, the Bernoulli structure with probability of missing 6/12. The Newey-West estimation provided for comparison uses the full series of simulated data without any missing observations. See Sections 3 and 4 for details on the estimators and the regression context, and Section 5.1 for details on the simulation parameters. 42

ecnairaV dna saiB laciripmE :5 elbaT ecnairaV saiB 9.0 7.0 5.0 3.0 0.0 9.0 7.0 5.0 3.0 0.0 :)φ( noitalerrocotuA 063=T eziS elpmaS 5.24 44.1 382.0 301.0 8240.0 150,1 83.3 471.0 1020.0 67000.0 wb dexfi ,SE 6.81 68.0 981.0 070.0 3720.0 164,1 96.5 932.0 2910.0 81000.0 wb dexfi ,MA 0.231 18.2 944.0 241.0 4950.0 095 22.2 491.0 7230.0 39200.0 wb otua ,SE 9.87 30.2 433.0 890.0 6530.0 918 34.2 451.0 2220.0 27000.0 wb otua ,MA 5.34 55.1 082.0 480.0 0220.0 967,5 10.22 988.0 3560.0 81000.0 wb dexfi ,WN 6.052 39.4 826.0 651.0 9030.0 999,2 07.7 014.0 0950.0 57000.0 wb otua ,WN 0021=T eziS elpmaS 8.81 65.0 501.0 730.0 3510.0 038 91.2 001.0 7900.0 01000.0 wb dexfi ,SE 1.8 33.0 860.0 420.0 4900.0 772,1 02.4 751.0 1110.0 20000.0 wb dexfi ,MA 6.98 43.1 812.0 060.0 5120.0 342 18.0 080.0 7210.0 14000.0 wb otua ,SE 4.55 00.1 441.0 240.0 4210.0 493 29.0 950.0 1010.0 90000.0 wb otua ,MA 4.91 16.0 501.0 5030.0 8700.0 580,5 85.61 806.0 4140.0 20000.0 wb dexfi ,WN 2.971 26.2 282.0 3370.0 9010.0 224,1 39.2 161.0 5520.0 01000.0 wb otua ,WN 00042=T eziS elpmaS 0.0 40.3 560.0 110.0 8300.0 803 15.0 120.0 8100.0 00000.0 wb dexfi ,SE 0.0 92.1 730.0 700.0 2200.0 386 81.1 730.0 4200.0 00000.0 wb dexfi ,MA 0.0 66.02 691.0 930.0 9900.0 81 90.0 110.0 0200.0 00000.0 wb otua ,SE 0.0 49.41 351.0 810.0 9600.0 72 80.0 700.0 7100.0 00000.0 wb otua ,MA 0.0 06.3 280.0 9110.0 2300.0 037,2 17.4 641.0 4900.0 00000.0 wb dexfi ,WN 0.0 63.35 154.0 7040.0 7110.0 89 52.0 810.0 4300.0 00000.0 wb otua ,WN ,erutcurtsgnissimkramhcnebruowollofataD .2.5noitceSnidenfiedsaecnairavdnaderauqssaiblaciripmestroperelbaT :setoN lluf eht sesu nosirapmoc rof dedivorp noitamitse tseW-yeweN ehT .21/6 gnissim fo ytilibaborp htiw erutcurts illuonreB eht noisserger eht dna srotamitse eht no sliated rof 4 dna 3 snoitceS eeS .snoitavresbo gnissim yna tuohtiw atad detalumis fo seires .sretemarap noitalumis eht no sliated rof 1.5 noitceS dna ,txetnoc 43

Table 6: Automatically Selected Bandwidths 6 of 12 missing Autocorrelation (φ): 0.0 0.3 0.5 0.7 0.9 T=360, Fixed bandwidth: m(T)=5 ES, auto bw 5.9 5.1 5.0 6.4 10.0 AM, auto bw 5.4 5.0 6.4 9.8 13.0 NW, auto bw 5.4 5.4 8.0 11.4 14.1 T=1200, Fixed bandwidth: m(T)=6 ES, auto bw 6.4 6.0 7.0 10.8 17.2 AM, auto bw 6.2 6.7 10.5 16.5 22.2 NW, auto bw 6.2 8.1 13.2 18.8 23.9 T=24000, Fixed bandwidth: m(T)=12 ES, auto bw 13.2 14.6 20.2 33.3 65.2 AM, auto bw 13.2 18.0 31.8 55.7 94.0 NW, auto bw 13.2 23.5 39.0 62.3 98.3 Notes: Table reports for a range of autocorrelation parameters and sample sizes the mean of the bandwidth selected by the Newey and West (1994) procedure described in Section 5.1. Data follow our benchmark missing structure as described in the Bernoulli structure with probability of missing 6/12. The Newey-West estimation provided for comparison uses the full series of simulated data without any missing observations. See Sections 3 and 4 for details on the estimators and the regression context, and Section 5.1 for details on the simulation parameters. 44

Table 7: Varying Fraction of Missings Empirical Rejection Rate Size-Adjusted Power Autocorrelation (φ): 0.0 0.3 0.5 0.7 0.9 0.0 0.3 0.5 0.7 0.9 4 of 12 missing ES, fixed bw 5.9 7.0 8.2 11.3 26.6 89.2 72.7 56.2 35.8 14.5 AM, fixed bw 5.6 6.9 8.6 13.1 32.4 89.4 73.2 56.6 36.2 14.6 ES, auto bw 6.4 7.5 8.5 9.7 17.4 88.7 72.0 55.0 34.9 14.0 AM, auto bw 5.9 7.2 8.0 9.8 19.8 89.2 72.5 55.6 35.5 14.2 6 of 12 missing ES, fixed bw 6.0 7.0 8.0 10.7 23.1 78.9 63.5 49.2 32.7 14.0 AM, fixed bw 5.6 6.7 8.3 12.6 30.9 79.4 64.1 50.0 33.0 14.2 ES, auto bw 6.8 7.7 8.7 10.1 16.8 77.8 62.4 48.2 31.7 13.6 AM, auto bw 6.0 7.0 8.0 9.9 19.7 79.0 63.7 49.1 32.4 13.8 8 of 12 missing ES, fixed bw 6.4 7.1 7.8 9.6 18.7 61.0 50.5 41.2 29.1 13.6 AM, fixed bw 5.6 6.4 7.6 11.2 28.3 62.5 51.9 42.5 29.9 13.9 ES, auto bw 7.8 8.4 9.1 10.5 15.7 57.6 48.2 39.6 27.7 13.1 AM, auto bw 5.9 6.6 7.6 9.5 19.4 61.9 51.4 41.8 29.2 13.6 Full sample benchmark NW, fixed bw 5.6 7.2 9.1 14.2 33.7 97.6 82.3 62.2 38.1 14.7 NW, auto bw 5.9 7.4 8.1 10.0 19.8 97.5 81.5 61.2 37.2 14.4 Notes: Table reports for a range of autocorrelation parameters the empirical rejection rate and size-adjusted power for a sample size of T = 360 under varying probability of missing observation. Data follow our benchmark missing structure as described in Section 5.1, the Bernoulli structure with probability of missing 4/12, 6/12, or 8/12. The Newey-West estimation provided for comparison uses the full series of simulated data without any missing observations. See Sections 3 and 4 for details on the estimators and the regression context, and Section 5.1 for details on the simulation parameters. 45

Table 8: Rejection Rates for Recursive Tests Panel A: 5% significance level Copper Soybean Oil Lean Hogs Forward Backward Forward Backward Forward Backward Imputation 64.8 28.2 25.3 0.0 93.7 18.3 Amplitude Modulated 59.7 26.5 18.1 0.0 91.4 8.8 Equal Spacing 46.7 9.8 12.4 0.0 90.7 8.4 AM rejects, IM does not 0.0 1.2 0.0 0.0 0.2 0.0 ES rejects, IM does not 0.0 0.0 0.0 0.0 0.2 0.0 IM rejects, AM does not 5.2 2.9 7.2 0.0 2.4 9.5 IM rejects, ES does not 18.2 18.4 12.9 0.0 3.2 9.9 Panel B: 10% significance level Copper Soybean Oil Lean Hogs Forward Backward Forward Backward Forward Backward Imputation 77.2 34.6 63.5 22.2 96.1 26.1 Amplitude Modulated 72.9 34.0 29.7 0.0 94.6 15.1 Equal Spacing 65.4 31.4 24.8 0.0 93.5 15.1 AM rejects, IM does not 0.0 0.0 0.0 0.0 0.0 0.0 ES rejects, IM does not 0.0 0.0 0.0 0.0 0.0 0.0 IM rejects, AM does not 4.3 0.6 33.8 22.2 1.5 11.0 IM rejects, ES does not 11.8 3.2 38.7 22.2 2.6 11.0 Notes: Tablereportstherejectionrateforthetestofthehypothesisthattherecursivesample mean is equal to zero. The final row of the first panel corresponds to the shaded areas in Figures 2 and 3. 46

Figure 2: Recursive Test Results - Sample Mean and Error Bands Copper Returns (% p.a.) 60 Imputation Equal Spacing 40 20 0 −20 1960 1970 1980 1990 Soybean Oil Returns (% p.a.) 100 50 0 −50 1960 1970 1980 1990 2000 2010 Lean Hogs Returns (% p.a.) 60 40 20 0 −20 1970 1980 1990 2000 2010 Note: Figureplotstherecursivesamplemeanand95%confidenceintervalsfortheimputation and equal spacing methods. The recursive sample mean is calculated as the mean of returns over the period from the start of the sample to sample end date (plotted on the x-axis). Shaded areas indicate recursive samples for which the imputation method finds statistical significance while the equal spacing method finds none. 47

Figure 3: Backwards Recursive Test Results - Sample Mean and Error Bands Copper Returns (% p.a.) 150 Imputation 100 Equal Spacing 50 0 −50 1990 1980 1970 1960 Sample Start Date Soybean Oil Returns (% p.a.) 100 50 0 −50 2010 2000 1990 1980 1970 1960 Sample Start Date Lean Hogs Returns (% p.a.) 60 40 20 0 −20 −40 2010 2000 1990 1980 1970 Sample Start Date Note: Figureplotsthebackwardsrecursivesamplemeanand95%confidenceintervalsforthe imputation and equal spacing methods. The backwards recursive sample mean is calculated as the mean of returns over the period from the end of the sample to sample start date (plotted on the x-axis). Note that the sample size is increasing with earlier start dates, moving left to right on the plot. Shaded areas indicate recursive samples for which the imputation method finds statistical significance while the equal spacing method finds none. 48

Cite this document

APA

Deepa Dhume Datta and Wenxin Du (2012). Nonparametric HAC Estimation for Time Series Data with Missing Observations (IFDP 2012-1060). Board of Governors of the Federal Reserve System, International Finance Discussion Papers. https://whenthefedspeaks.com/doc/ifdp_2012-1060

BibTeX

@techreport{wtfs_ifdp_2012_1060,
  author = {Deepa Dhume Datta and Wenxin Du},
  title = {Nonparametric HAC Estimation for Time Series Data with Missing Observations},
  type = {International Finance Discussion Papers},
  number = {2012-1060},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2012},
  url = {https://whenthefedspeaks.com/doc/ifdp_2012-1060},
  abstract = {The Newey and West (1987) estimator has become the standard way to estimate a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix, but it does not immediately apply to time series with missing observations. We demonstrate that the intuitive approach to estimate the true spectrum of the underlying process using only the observed data leads to incorrect inference. Instead, we propose two simple consistent HAC estimators for time series with missing data. First, we develop the Amplitude Modulated estimator by applying the Newey-West estimator and treating the missing observations as non-serially correlated. Secondly, we develop the Equal Spacing estimator by applying the Newey-West estimator to the series formed by treating the data as equally spaced. We show asymptotic consistency of both estimators for inference purposes and discuss finite sample variance and bias tradeoff. In Monte Carlo simulations, we demonstrate that the Equal Spacing estimator is preferred in most cases due to its lower bias, while the Amplitude Modulated estimator is preferred for small sample size and low autocorrelation due to its lower variance.},
}