feds · December 31, 1999

Nonparametric Estimation of Multifactor Continuous-Time Interest Rate Models

Abstract

This paper studies the finite sample properties of the kernel regression method of Boudoukh et al. (1998) for estimating multifactor continuous-time term structure models. Monte Carlo simulations are employed, with a grid-search technique to find the optimal kernel bandwidth. The estimator exhibits truncation and correlated residuals biases near the boundaries of the data. However, the variance of the estimator is so high that the biases are unlikely to be relevant from a hypothesis testing point of view. The performance of the estimator is also studied under model misspecification. Irrelevant regressors reduce efficiency and induce additional biases in the estimates. Using Treasury bill data, I test whether the estimates produced by the nonparametric estimator are statistically distinguishable from estimates obtained under a parametric model. The kernel regressions pick up nonlinearities in the data that the parametric model cannot capture.

Nonparametric Estimation of Multifactor Continuous Time Interest Rate Models Chris Downing ⁄ Board of Governors of the Federal Reserve System ⁄IthankMattPritsker,TorbenAndersen,JesperLund,andtheparticipantsatthe1998 Conference of the Society for Computational Economics for helpful discussions. I remain responsible for any errors. The views expressed in this paper are those of the author and are not necessarily those of the Board of Governors or members of its stafi. Address correspondencetoChrisDowning, FederalReserveBoard, Mail Stop 89, Washington, DC 20551. The author may also be reached by phone: (202) 452-2378, fax: (202) 452-5296, or e-mail: cdowning@frb.gov. 1

Abstract Thispaperstudiestheflnitesamplepropertiesofthekernelregression method of Boudoukh, Richardson, Stanton and Whitelaw (1998) for estimating multifactor continuous{time term structure models. Monte Carlo simulations are employed, with a grid-search technique to flnd the optimal kernel bandwidth. The performance of the estimator is also studied under model misspeciflcation. Irrelevant regressors reduce e–ciency and induce additional biases in the estimates. Using Treasury bill data, I test whether the estimates produced by the nonparametric estimator are statistically distinguishable from estimates obtained under a parametric model. The kernel regressions pick up nonlinearities that the parametric model cannot capture. 2

In a series of recent papers, researchers in flnance have developed nonparametric methods for estimating the drift and difiusion functions of continuous time stochastic processes. Stanton (1997) pioneered a method based on the theory of weak approximations of the expectations of functions of stochastic processes. His methodological innovation was to estimate the expectations using kernel regression methods, and then invert them in order to recover the drift and difiusion functions of the underlying processes. The method has been applied to the problem of estimating univariate continuous time models of the term structure. More recently, Boudoukh et al. (1998) extended the estimator to the problem of estimating multivariate term structuremodels. Althoughdifierentinimportantrespects, themethoddeveloped by Ait{Sahalia (1996) is related to the Stanton and BRSW estimators in that it also relies on nonparametric techniques and is also applied to the problem of pricing interest rate derivative securities.1 One of the more provocative conclusions reached by Ait{Sahalia (1996), Stanton (1997), and Boudoukh et al. (1998) is that the short rate drift appears to be nonlinear. This conclusion is at odds with the rest of the term structure literature, because in virtually all previous work, the short rate is modeled with a linear drift. In part to investigate the robustness of this result, Pritsker (1998) and Chapman and Pearson (1999) look at the properties of the Stanton and Ait{Sahalia estimators in flnite samples. In both of these papers, the authors concluded that the nonlinearity result is not robust, and could be an artifact of the flnite sample properties of the estimator. However, the authors do not formally test this hypothesis. In this paper, I study the flnite sample properties of the BRSW estimator for multifactor models.2 Monte Carlo simulations of data from the stochastic volatility model of Andersen and Lund (1997a) are used to examine how closely the estimator flts the known drift and difiusion functions. The Andersen and Lund (1997a) model is used because it provides a reasonably good flt to Treasury data, although in their flnal analysis the authors reject the model using a chi{squared test. I flrst focus on the problem of kernel bandwidth selection. Because the asymptotically optimal bandwidths are functions of the derivatives of the unknown joint density of the data generating process, I use a grid{search technique to flnd the bandwidths that minimize a sum of squared errors criterion. I flnd that, even with the optimal bandwidths, the estimator exhibits a high degree of bias with forty years of data simulated at weekly frequency. However, the sampling variance of the estimator is high, so that from a hy- 3

pothesis testing point of view, the biases are likely to be irrelevant. The performance of the BRSW estimator is also analyzed under model misspeciflcation. The results show that if one uses the BRSW estimator to flt a misspecifled model in which irrelevant arguments of the drift and difiusion functions are included, the e–ciency of the estimator decreases markedly. Somewhat more surprising is the result that including irrelevant conditioning variables introduces additional bias in the estimates. The additional biases are a result of adding dimensions along which biases from truncation and correlated residuals can afiect the estimator. These biases and ine–ciencies highlight that, while nonparametric estimatorsmightfreeonefromtheneedtospecifytheparticularfunctionalforms for the various estimands, one still must correctly specify the arguments of the functions (and thus the correct set of conditioning variables in the kernel regressions). In other words, nonparametric estimators do not obviate issues ofspeciflcation; rather, suchissues areremoved toahigherlevel ofgenerality. My main conclusion is that the BRSW estimator, and related kernel regression methods, are primarily useful as diagnostic tools when used in the context of term structure modeling. Given the problems associated with bandwidth selection when the data are autocorrelated, and given the problems of calculating reliable standard errors for kernel regression estimators, it is more productive to use the kernel regression methods to test if a given parametric speciflcation is an adequate description of the data. In other words, the more general kernel regression estimator can be used to try and \pick up" nonlinearities in the data that a parametric model might miss. An important advantage of this approach is that the flnite sample distributions of test statistics based on the BRSW estimator can be bootstrapped under the null hypothesis that the parametric model is the \true" data generating process. Thus, one can produce quantiles for the hypothesis test statistics thatarerobustagainstflnitesamplebiasesintheBRSWestimator. Idemonstrate this by applying the BRSW estimator to test the Andersen and Lund (1997a) model of the term structure. The results of the hypothesis tests show that the biases of the BRSW estimator do not fully explain the difierences between the estimates obtained under the BRSW estimator and the parametric estimator. There appear to be signiflcant nonlinearities in the evolution of the short rate that the parametric model cannot capture. This paper is organized as follows. In the next section, I examine the dynamic behavior of the Andersen and Lund (1997a) stochastic volatility model, which is used in the Monte Carlo simulations in the following sec- 4

tions. Section II discusses the BRSW estimator and kernel regression, and contains the main results on fltting the Andersen and Lund (1997a) model. SectionIIIdiscussestheperformanceoftheestimatorinthecontextofmodel misspeciflcation. Section IV presents the results of hypothesis tests on the Andersen and Lund (1997a) model, and the flnal section concludes. I Dynamic Behavior of the Stochastic Volatility Model Inthissection, IdiscussthecalculationofweaksolutionsoftheAndersenand Lund (1997a) model (henceforth, the \AL model"). An interesting feature of the AL model is that it fails to satisfy the conditions su–cient to guarantee the existence of a unique solution, raising questions about the stability of the system, as well as questions about the existence of a stationary density. Maintaining the assumption that the system has a solution, I use a weak numericsolutionalgorithmandanextensionoftheKolmogorov-Smirnovtest to determine whether or not the transition densities of the system converge at long trajectories. From the results, we can conclude that the system has a stationary density at the parameters considered. The speciflcation of the AL model is given as: p dr = • („¡r )dt+(cid:190) r dW (1) t 1 t t t 1;t dlog(cid:190) 2 = • ((cid:181)¡log(cid:190) 2 )dt+»dW ; (2) t 2 t 2;t where W and W are independent standard Wiener processes. 1 2 The set of su–cient conditions for the existence of a solution to this system includes the conditions that the drift and difiusion functions satisfy Lipschitz and growth conditions (see Karatzas and Shreve (1991) and Ait{ Sahalia (1996) for difierent formulations of the conditions). The speciflcation of the difiusion function of the interest rate process (1) causes the system to violate the growth condition. The relevant condition is given by: (cid:190) 2 r+» 2 • k(1+r 2 +(log(cid:190) 2 ) 2 ): (3) This condition must apply uniformly in t, meaning that the constant k must apply for all t simultaneously. It is easy to show that there is no k that satisfles condition (3). For any k, let log(cid:190)2 = r, so that (cid:190)2 = er. Substituting, 5

we have: e r r+» 2 • k(1+2r 2 ); or (4) err+»2 • k: (5) (1+2r2) The left-hand side of (5) clearly diverges as r ! 1, showing that the growth condition is violated by the model. In essence, the model fails to satisfy the growth condition because the difiusion function in the interest rate process involves an exponential transformation of the volatility state variable. To make the exponential transform in the interest rate difiusion explicit, rewrite the AL model in the following equivalent form 3: p dr t = • 1 („¡r t )dt+ e(cid:190)tr t dW 1;t (6) d(cid:190) = • ((cid:181)¡(cid:190) )dt+»dW ; (7) t 2 t 2;t Because it fails to satisfy the growth condition, there might not be a unique Ito process in <2 that satisfles (6) { (7). In practice, it’s di–cult to use numeric methods to verify the existence of a unique solution. I assume that a solution exists, and instead focus on the dynamic stability of the system. For certain parameterizations of the drift and difiusion functions, the model will exhibit explosive behavior, and thus fail to have a stationary density. Determining whether or not the model is explosive is a problem to which we can apply a numeric solution algorithm. Kloeden and Platen (1995) derive a number of algorithms for computing weak solutions of systems of SDEs like the AL model. The solution algorithms operate on a flnite time interval [0;T]. A key feature of the algorithms is the discretization of the time interval into M smaller time steps of T length ¢, where ¢ = . The simplest method is the Euler scheme, which M has a degree of accuracy that is inversely proportional to the length of the time step ¢. The following set of recursive formulae show how to generate values of r and (cid:190): q r t = r t¡1 +• 1 („¡r t¡1 )¢+ e(cid:190) t¡1r t¡1 ¢· 1;t (8) p (cid:190) = (cid:190) +• ((cid:181)¡(cid:190) )¢+» ¢· ; (9) t t¡1 2 t¡1 2;t where · and · are independent standard normal deviates, and r and (cid:190)2 1;t 2;t 0 0 are given. Where necessary, I’ll use r~ and (cid:190)~ to indicate values of r and (cid:190) computed from the discrete system in (8) and (9). 6

Understanding the dynamic behavior of the AL model, as well as evaluating the nonparametric estimator in the next section, both boil down to computing the expectations of difierent functions of the state variables r and (cid:190): E[f(r ;(cid:190) )]; (10) T T where f(¢) is a smooth function. Kloeden and Platen (1995) prove that the expectation of f(¢), calculated at (r~ ;(cid:190)~ ), converges to the true expectation T T as ¢ ! 0: lim jE[f(r ;(cid:190) )]¡E[f(r~ ;(cid:190)~ )]j = 0: (11) T T T T ¢!0 By choosing f(r;(cid:190)) = (r;(cid:190)); (12) we can use the Euler scheme to compute the moments of transition densities of the AL model. It is useful to flrst consider whether or not the transition densities appear to be converging in location and scale. To do so, I use Monte Carlo simulations to generate moments of the transition densities of the model. From each of 25 difierent starting points, equally dispersed on the square of values: n o (r;(cid:190)) : 0:02 • r • 0:20;¡7:00 • log(cid:190) 2 • ¡5:00 (13) I simulate 1,000 batches of 100 trajectories. The last point of each trajectory is saved, forming a batch of 100 draws from the transition density deflned by the starting point and the length of the trajectories. I compute the mean and variance of each batch of saved points. Thus, at the end of a run, we have 1,000 independent draws of the flrst two moments of each of the 25 transitiondensities. Eightsuchrunsarecompleted, theflrstwithtrajectories one year in length, the second with flve year trajectories, and so on for ten, twenty, thirty, forty, flfty, and flnally sixty year trajectories. The parameters employed are shown in table I, and ¢ = 1 .5 52 Table II displays univariate statistics for the pooled data (N = 25;000), with which we can perform some unscientiflc \eyeball tests" for convergence. If the null hypothesis of convergence is correct, the moments of the transition densities should converge to the moments of the stationary density. The means should converge as follows: lim E[r ] = „ = 0:0596; (14) T T!1 lim E[(cid:190) ] = (cid:181) = ¡6:3599: (15) T T!1 7

Examining the values in the second column (labeled ‘Mean’) of table II, it’s clear that the flrst moments (E[¢] values) of the transition densities are converging to these values. The interest rate mean hits the value in (14) at around thirty years, and then bounces around within a narrow confldence interval. Thevolatilitymeanconvergesquiterapidlyandverypreciselytothe value in (15), re(cid:176)ecting the higher degree of mean reversion in the volatility drift function. 6 The second moments should converge approximately as follows: lim Var[r ] … 0:00032 (16) T T!1 lim Var[(cid:190) ] … 0:7780; (17) T T!1 The approximate value for the second moment of r is calculated as the variance of the stationary density of a square{root process: p dr = • („¡r )dt+(cid:190) r dW (18) t 1 t t t with (cid:190) flxed at e(cid:181). The variance is given by e(cid:181)„ . The approximate value for 2• 1 the second moment of (cid:190) is calculated as the stationary variance of a constant difiusion process: d(cid:190) = • ((cid:181)¡(cid:190) )dt+»dW : (19) t 2 t t »2 The variance of this process is given by . From column seven of table II, 2• 2 it appears that the variances (Var[¢] values) are converging to neighborhoods of the values in (16) and (17). In the case of the interest rate process, we would probably reject the null hypothesis that the variance is equal to the value in (16), even for the sixty year trajectories. Of course, this is because the process is not really the square{root process that we used to compute the variance. For the volatility process, we would probably accept the null hypothesis that the variance is equal to the value in (17). The means of Var[(cid:190)] are close to the value in (17), and the standard deviation around the ¢ means is relatively large. The variance of the volatility process converges to the value in (17), while the variance of the interest rate process does not converge to (16), because the dependence between the interest rate and volatility processes is expressed in the difiusion function of the interest rate process. The volatility process does in fact evolve like the Vasicek process that we used to compute the variance in (17). While the transition densities appear to be converging in the flrst two moments, they still might have difierent distribution functions. Moreover, 8

it’s hard to assess joint signiflcance using table II. Assuming that a solution to the system exists, we would like to show that the system is stationary, deflned to mean that the transition densities converge to a common density with flnite moments, as the length of the time interval increases: lim …(r ;(cid:190) jr ;(cid:190) ) !d …(r;(cid:190)); (20) T T 0 0 T!1 for r 2 <++ and (cid:190) 2 <, and where …(r ;(cid:190) jr ;(cid:190) ) is the transition density 0 0 T T 0 0 between times 0 and T, and …(r;(cid:190)) is the stationary density. If we use the discrete system in (8)-(9) to make draws from the transition densities deflnedbydifierentstartingpoints(r ;(cid:190) )andtimeintervals[0;T], andthese 0 0 densities exhibit convergence as T increases, then we can interpret this as evidence supporting our hypothesis that the system has a stationary density at the parameter values in table I.4 To rigorously test for convergence in distribution when the true distribution is unknown, we can use an adaptation of the Kolmogorov- Smirnov (KS) test for bivariate densities, due to Fasano and Franceschini (1987). The one dimensional KS test statistic is based on the maximum value of the absolute difierence between two cumulative distribution functions. A direct generalization of this statistic to higher dimensions is not possible because cumulative probability is not well deflned in more than one dimension. However, an analogous statistic can be based on the integrated probabilities in each the four natural quadrants at a given point (r ;(cid:190) ). The analog to the i i KS statistic is the maximum difierence over the data points and over the quadrants of the integrated probabilities. In essence, the algorithm for computing the statistic searches through the data for the point at which the difierence in the proportions of data in one of the four natural quadrants formed by the point is maximized. Fasano and Franceschini (1987) work out an approximation to the probability of realizing the observed maximum difference in proportions, under the null hypothesis that the two densities are identical.7 The transition densities of the discrete system in (8)-(9p) converge to the transition densities of the continuous{time system at rate ¢ (see Kloeden and Platen (1995) or Brandt and Santa{Clara (1999) for proofs). Thus, the discretesystemcanbeusedtodrawrandomsamplesfromtransitiondensities that closely approximate the densities of the continuous{time system. To carry out the convergence test, I use two starting values that are widely 9

apart on the (r;(cid:190)) plane. The points that I use are: f(„+2(cid:190)^ ;(cid:181) +2(cid:190)^ );(„¡2(cid:190)^ ;(cid:181)¡2(cid:190)^ )g: (21) r (cid:190) r (cid:190) The points are two standard deviations away from the long-run means of the processes, and four standard deviations from one another.8 The standard deviations(cid:190)^ and(cid:190)^ areapproximatedusingthesquarerootsofthevaluesfor r (cid:190) Var[r ] and Var[(cid:190) ] from table II, respectively. From each of these points, 60 60 I use the discrete system in (8)-(9) to simulate 20,000 trajectories, saving the last point on each trajectory. The two sets of points form large samples of the two transition densities. The bivariate KS test is applied to the two samples to test whether or not they are drawn from identical distributions. I repeat this exercise for trajectories of lengths between one and forty years. The parameterization of the system and the length of the time step are the same as before.9 Table III displays the results. The flrst column gives the trajectory lengths in years. The second and third columns display the bivariate KS test statistic and the approximate p{value, respectively. From the results, we can conclude that the two distribution functions become indistinguishable after forty years. The approximation to the p{value becomes imprecise for values above 0:2. However, given the large sample sizes, and the results from table II, we can conclude with a high degree of confldence that the system does in fact have a stationary density. The length of time at which the transition densities appear to converge is consistent with the behavior of the system reported in Andersen and Lund (1997a). In order to simulate draws from the stationary density, Andersen and Lund (1997a) ran the Euler simulator for approximately thirty-eight years. The authors found that using longer trajectories had no signiflcant efiects on their results. Their results are consistent with the flnding here that the distributions converge at trajectories of around forty years in length.10 To sum up, it is reasonable to conclude that, at the parameter values considered here, the AL model is stable and has a stationary density. Both ofthesefeaturesareprerequisitesfortheconsistencyoftheBRSWestimator, and we will make use of some of the results in table II in what follows. In the next section, we turn to considering the behavior of the BRSW estimator in flnite samples. 10

II Nonparametric Estimation Assume that the term structure is determined by two state variables, the short rate r and the volatility of the short rate (cid:190): dr = fi (r ;(cid:190) )dt+fl (r ;(cid:190) )dW ; (22) t r t t r t t r;t d(cid:190) = fi (r ;(cid:190) )dt+fl (r ;(cid:190) )dW ; (23) t (cid:190) t t (cid:190) t t (cid:190);t where W and W are independent Wiener processes, and suppose that r;t (cid:190);t we observe data generated from these processes at discrete time intervals of length ¢. The Euler method of the previous section is one way to relate our discrete observations to the drift and difiusion functions of the continuous{ time processes. The Euler discretization for this system is given by:11 p r ¡r = fi ¢+fl ¢· ; (24) t+1 t r rp r;t+1 (cid:190) ¡(cid:190) = fi ¢+fl ¢· ; (25) t+1 t (cid:190) (cid:190) (cid:190);t+1 where, as before, · and · are independent standard normal deviates. It’s r (cid:190) easy to see that the observations in equations (24) and (25) satisfy the following relationships: 1 E[r ¡r jF ] = fi +O(¢); (26) t+1 t t r ¢ 1 E[(cid:190) ¡(cid:190) jF ] = fi +O(¢); (27) t+1 t t (cid:190) ¢ h i 1 E (r ¡r ) 2jF = fl 2 +O(¢); (28) t+1 t t r ¢ h i 1 E ((cid:190) ¡(cid:190) ) 2jF = fl 2 +O(¢); (29) t+1 t t (cid:190) ¢ where O(¢) means terms for which it is true that lim O(¢) < 1, and ¢!0 ¢ F denotes the information set at time t. The methodological innovation of t Boudoukh et al. (1998) is to note that, if we compute estimates of the flrst and second conditional moments on the left hand sides of equations (26) - (29), we will have estimates of the drift and difiusion functions accurate to O(¢). In order to estimate the conditional moments in equations (26)-(29) with minimal a priori structure on the drift and difiusion functions, a kernel regression method is used. First, we deflne a grid of interest rate and volatility values at which to estimate the conditional moments. Then, at each grid 11

value (r ;(cid:190) ), the estimates of the conditional moments are computed as i j follows: TX¡1 E[r i;t+1 ¡r i;t j(r i ;(cid:190) j)] = W(t)(r t+1 ¡r t) (30) t=1 TX¡1 E[(cid:190) i;t+1 ¡(cid:190) i;t j(r i ;(cid:190) j)] = W(t)((cid:190) t+1 ¡(cid:190) t) (31) t=1 h i TX¡1 E (r i;t+1 ¡r i;t) 2j(r i ;(cid:190) j) = W(t)(r t+1 ¡r t) 2;and (32) t=1 h i TX¡1 E (r i;t+1 ¡r i;t) 2j(r i ;(cid:190) j) = W(t)((cid:190) t+1 ¡(cid:190) t) 2; (33) t=1 where W(t) is the Nadaraya{Watson product weight function: K (r ¡r )K ((cid:190) ¡(cid:190) ) h i t h j t W(t) = i;j i;j ; (34) PT K (r ¡r )K ((cid:190) ¡(cid:190) ) h i t h j t i;j i;j t=1 and ‡ · 2 1 ¡1 x K (x) = p e 2 hi;j (35) h i;j 2… is the Gaussian kernel, and i;j = 1;2;:::;N. The smoothing parameters h , or \bandwidths," are the way one trades ofi bias against variance in i;j the flt. Large bandwidths reduce local variation, but increase bias. Small bandwidths flt local phenomena, at the cost of increased variance. Theoretic results for kernel regression estimators show that the optimal bandwidths will be proportional to T¡ 6 1 . However, the constant of proportionality is a complicated function of the joint density and its derivatives, the function to be estimated and its derivatives, the bandwidths, and the properties of the kernel function. Since under the AL model the joint density function is not known, it is not possible to derive a closed{form expression for the optimal bandwidth. Instead, one must rely on numerical procedures. I conduct a search over a grid of bandwidth values in order to arrive at an optimal bandwidth for data generated by the AL model using the Euler approximations.12 For the interest rate drift function, I search over scaling factors ` r = 1;2;4;6;8;10;12 for the bandwidth ` r (cid:190)^ r T¡1 6 that minimizes the sum of squared errors (SSE), computed as the sum over the estimation grid 12

of the squared deviations of the estimated surface from the true surface. For the interest rate difiusion, I search over a 7£7 grid of integer scaling factors ` r and ` (cid:190) to flnd the bandwidth vector (` r (cid:190)^ r ;` (cid:190) (cid:190)^ (cid:190) )T¡ 6 1 that minimizes the SSE. For the volatility process functions, I search for scaling factors in the same way as for the interest rate functions. Table IV displays the optimal scaling factors and the associated SSEs. The results in table IV show that, for the drift functions, the more highly autocorrelated interest rate data require relatively more smoothing. This is because a wider bandwidth leads to more cancellation of biases, and the biases tend to be more serious with more highly autocorrelated data, as will be discussed shortly. The large SSE on fi re(cid:176)ects a high degree of bias at (cid:190) extreme values of (cid:190). If along the volatility dimension the solution grid were restricted to values in the range (¡4:9;¡6:8), for example, the SSE on fi (cid:190) would be two orders of magnitude smaller.13 In the following discussion, I report pointwise averages for flts of the drift and difiusion functions over a 25 £ 25 grid of equally{spaced values on the square deflned by14: f(r;(cid:190)) : 0:02651 < r < 0:16731;¡7:0 < (cid:190) < ¡4:6g: (36) The pointwise averages are computed over 1000 simulations from the AL model. The \true" functions are parameterized using the values shown in table I in the previous section. The simulated data are drawn at a weekly frequency, withtwenty{flveinter{weekdraws.15 Eachtrajectoryisfortyyears in length. I run ofi flfty years of data before drawing simulated values, in view of the results from the previous section.16 Figures 1 and 2 display the fltted and true surfaces, as well as 95% pointwise confldence surfaces, for the flts obtained using the bandwidth scaling factors in table IV. In general, the fltted surfaces exhibit signiflcant biases near the boundaries of the data, but the sampling variances are so high that the biases are likely to be irrelevant from the point of view of hypothesis testing. Only in a few small regions do the true surfaces \break through" the 95% confldence region. The quality of the flts is in general better for the volatility process, re(cid:176)ecting the higher degree of mean reversion for this process. As discussed in Chapman and Pearson (1999), two efiects induce bias in theestimatedsurfaces. Neartheboundariesofthedata,thekernelfunctionis truncated, andsinceitis symmetric, thisskews theweights towardthecenter 13

of the data. This can have predictable efiects on the estimates. Taking the interest rate drift as an example, near the lower boundary of r, the weights will be biased toward higher values of r where the observed drifts tend to be less positive, or even negative. This biases the estimates near the lower boundary downward. The opposite is true for high values of r. Similar reasoningfollowsalongthevolatilitydimension,becausethevolatilityprocess is also mean-reverting. The second form of bias results from the correlation of the residuals with the regressors near the edges of the data. The nonparametric regression model for the drifts is given by: r ¡r = fi +† (37) t+¢ t r r;t+¢ (cid:190) ¡(cid:190) = fi +† (38) t+¢ t (cid:190) (cid:190);t+¢ where the † are disturbances. Unbiased estimation requires that: ¢;t+¢ E[† jr ;(cid:190) ] = 0; and (39) r;t+¢ t t E[† jr ;(cid:190) ] = 0: (40) (cid:190);t+¢ t t Bias arises because, in fact, the nonparametric estimator works with a flnite data set for which (39) and (40) don’t necessarily hold at the boundaries of the data. For example, at the data point where: (r t ;(cid:190) t ) = (rmax;(cid:190)); (41) it must be the case that: r t+¢ ¡r t • rmax ¡r t : (42) In other words, at the upper boundary of the observations on r, the residual in equation (37) must be negative, and ceterus paribus this causes downward bias in the point estimate of the drift function of the interest rate process. Moreover, to the extent that the residuals † and † are correlated, bias will r (cid:190) also be induced in the drift of the volatility process. This form of bias does not afiect the difiusion estimates, because the sign of (r ¡r )2 is always t+¢ t positive. Returning to flgures 1 and 2, we see that, for high interest rates, the interest rate drift function estimate is biased upward, indicating that the efiect of truncation bias is dominant. The opposite pattern holds for the estimates 14

of the volatility drift function. The estimates of the difiusion function of the interest rate exhibit complicated patterns of bias, as illustrated in the lower panel of flgure 1. This is because the interest rate difiusion is a function of both state variables, and in addition, the interest rate data are highly persistent. The function is well estimated at the center of the data, but toward the corners of the surface, signiflcant biases are in evidence. Looking at the lower panel of flgure 2, we flnd that the surface is estimated with much less bias. It is useful to compute numerical measures of error, both for diagnostic purposes and as a prelude to the test statistics used below. I compute three error measures, based on the L , L and L norms. To \estimate" the L 1 2 1 1 norm, I use the simple formula XX L ^ = jf ^ ¡f j; (43) 1 i;j i;j i j ^ where i and j run over the solution grid, f denotes the estimated function i;j value at (r ;(cid:190) ), and f denotes the true value.17 The L norm is similar, i j i;j 2 except that we \integrate" the squared errors over the solution surface: XX L ^ = (f ^ ¡f ) 2 : (44) 2 i;j i;j i j Finally, inspired by the Kolmogorov{Smirnov test of the previous section, I compute an estimate of the L norm: 1 L ^ = maxjf ^ ¡f j: (45) 1 i;j i;j i;j Table V displays these error measures for the surfaces shown in flgures 1 and 2. ^ ^ Examining the results in table V, we see that the measures L and L are 1 2 ^ driven by extreme errors. This can be deduced from the fact that the L 1 ^ measure tends to be large relative to the L measure. The error measures 1 for fl and fl underscore the success of the kernel method for estimating r (cid:190) ^ ^ difiusion functions. In both cases, the L and L measures are at least an 1 2 order of magnitude smaller than the corresponding measures on the drift functions. The relatively large values of the error measures on fi highlights (cid:190) the in(cid:176)uence of the choice of the solution set on the estimator, noted earlier. The ine–ciency of the estimator can be measured by integrating the region between the upper and lower 95% confldence surfaces. The measure 15

that I compute is given by: XX EFF = (f ^(+) ¡f (¡) ); (46) i;j i;j i j where f ^(+) denotes a point on the upper surface, and f ^(¡) on the lower surface. Thus, a larger value for EFF indicates greater ine–ciency, the confldence surfaces being farther apart. Table VI displays the calculations for the surfaces in flgures 1 and 2. The ine–ciency measures in table VI are primarily useful for comparisons between estimators. I defer a discussion of these results until the next section, where I consider the performance of the BRSW estimator when the model is misspecifled. III Misspeciflcation The estimates in the previous section were computed for the unrealistic case where we assumed a priori knowledge of the arguments to the drift and difiusion functions, and could thus use the correct conditioning variables in the kernel regressions. In other words, we estimated the following system: dr = fi (r )dt+fl (r ;(cid:190) )dW (47) t r t r t t r;t d(cid:190) = fi ((cid:190) )dt+fl dW ; (48) t (cid:190) t (cid:190) (cid:190);t in which all the arguments coincide with the arguments of the corresponding functions in the AL model. Suppose we were to estimate the more general system in (22)-(23). In this case, the drift functions and the difiusion function of the volatility process are misspecifled. The drift function for the interest rate process depends only on the level of the interest rate, as shown in (47), but under the more generalmodelwewillconditiononthelevelsoftheinterestrateandvolatility. Similarly, for the volatility drift, we’ll condition on both state variables when in fact the drift only depends on the level of volatility, as shown in (48). The volatilitydifiusionwillbehighlymisspecifled. Forthisfunction, wecondition on both state variables when in fact the difiusion is constant. It is interesting to look at how these forms of misspeciflcation afiect the estimator. Figures 3 and 4 display the various estimated surfaces. Introducing irrelevant conditioning variables introduces additional biases in the estimates due to the correlations in the residuals at the data boundaries, as discussed 16

above. Starting with the top panel of flgure 3, the surface has a distinct slope along the volatility dimension for high values of r. For low values of r, the surface also has a non{zero slope along the (cid:190) axis, although it is less pronounced. Comparing the top panel of flgure 4 to the top panel of flgure 2, we see that for the volatility drift, the irrelevant conditioning information leads mainly to a loss of e–ciency. There is only slight evidence of increased bias. The results for the volatility difiusion function are similar. Table VII shows the error measures for the correct and misspecifled flts. In general, the errors increase, although there are some important excep- ^ ^ tions. For fi , both the L and L measures improve under the misspecifled r 1 2 model, showing that the introduction of the irrelevant conditioning variable facilitatedadditionalbiascancellations. Thedifiusionfunctionfl iscorrectly r specifled under both models and thus the error measures don’t change. For the volatility process, the irrelevant conditioning information signiflcantly worsens the flt for both the drift and difiusion functions. In sum, the results here and above show that irrelevant conditioning information has an ambiguous efiect on the magnitude and sign of bias. As we would expect, the inclusion of irrelevant conditioning variables results in greater ine–ciency. Table VIII displays the ine–ciency measure given by equation (46) for the misspecifled model. Comparing these values to the values in table VI, we see that the value of EFF is in general greater under the misspecifled model. The e–ciency loss is greatest for the volatility difiusion, where we have introduced two irrelevant variables. The value of EFFjumpsfrom54:4to90:0. ThevalueofEFFforfl doesn’tchangebecause r in both cases we’ve estimated the function with both conditioning variables. The main points to take away from the results of this section and the previous section are that the kernel regression estimator has signiflcant flnite sample biases, but that the variance of the estimator is high enough that there is reason to doubt that the biases are relevant for hypothesis testing. In a real{data situation, of course, one can’t know the sampling variance, or the degree of bias in the estimator. In light of these facts, a question that plays to the strengths of the kernel estimator is to ask if the estimator produces estimates of the drift and difiusion functions that are statistically distinguishable from a known parametric estimator. In other words, does the more general kernel estimator \pick up" anything in the data that the parametric estimator might be missing? In this context, Monte Carlo methods can be used to bootstrap the flnite{sample distributions of statistics based 17

on the nonparametric estimator. IV Hypothesis Tests In this section, I use Treasury bill data to test the hypothesis that the BRSW estimator produces estimated surfaces that are statistically indistinguishable from the surfaces implied by the estimates in table I. The test proceeds in two stages. First, the quantiles of three difierent test statistics for the BRSW estimator are bootstrapped under the null hypothesis that the AL model is the true data generating process. Second, the BRSW estimator is applied to the Treasury data, and the values of the test statistics are computed. Finally, the values of the test statistics computed for the Treasury data are compared to the bootstrapped quantiles. The Treasury data used to proxy the riskless short rate are the same data that are used by Andersen and Lund (1997a). I use the three-month Treasury{bill yield, at weekly (Wednesday) frequency from 1962-1999.18 The data are obtained from the H.15 release of the Federal Reserve System. I convert the series from a bank discount basis to an investment basis prior to analysis, and Tuesday values are substituted for Wednesday values when the Wednesday value is missing. I also make use of data on the slope of the term structure. The data used to form the slope of the term structure are the same data used in Boudoukh et al. (1998). I use the yields on Treasury securities at constant, ten{year maturities, again from the H.15 release. The slope of the term structure is computed as the difierence between the ten{year rate and the three{month rate. The slope of the term structure is used in the estimation procedure becausethevolatilityprocessisnotdirectlyobservable. Estimatesofthevolatilities are obtained by flrst fltting the level and slope data using the BRSW estimator, and using the estimates of the interest rate difiusion function from this flrst stage to compute the implied volatilities. The three{month rates and implied volatilities are used in the estimation of the functions of the \true" processes.19 In the flrst stage, I estimate fl (r ;S ) from the following r t t system: dr = fi (r ;S )dt+fl (r ;S )dW (49) t r t t r t t r;t dS = fi (r ;S )dt+fl (r ;S )dW ; (50) t S t t S t t S;t 18

^ where S is the slope of the term structure at time t. The estimate fl (r;S) t r is then used to infer the volatility process observations. An observation s is t obtained by plugging in (interpolating where necessary) the observed values (r ;S ) to obtain fl ^ (r ;S ) = s .20 Finally, to make the volatility process t t r t t t consistent with the AL model, I make the transformation (cid:190) = ln(s2=r ). t t t The series of implied (cid:190) values has the same unconditional mean as the volatility process estimated by Andersen and Lund (1997a). The estimated unconditional mean for the AL model is reported in table I as ¡6:3599. The unconditional mean of the volatility values inferred using the BRSW estimator and observations on the level and slope of the term structure is ¡6:3557. It is reassuring that two estimators agree on this parameter. Using the Monte Carlo methods of the previous section, I bootstrap the distributionofthreedifierentstatistics,underthenullhypothesisthattheAL model is the \true" data generating process. The test statistics are the mean squared error (MSE), mean absolute error (MAE) and maximum absolute deviation (MAD), deflned as follows: –(r;(cid:190)) ^ MSE = L (51) N2 2 –(r;(cid:190)) ^ MAE = L (52) N2 1 ^ MAD = –(r;(cid:190))L ; (53) 1 ^ ^ ^ where L , L and L are deflned in equations (43)-(45) in the previous 1 2 1 section, and –(r;(cid:190)) is a \trimming function" used to reduce the efiect of boundary biases on the statistics. I used –(¢) to trim the solution grid to a 21 £ 21 square, thus removing the outer two rings of data. The quantiles of the statistics are found by compiling the values of the statistics for 1,000 simulated draws from the AL model using the Euler method of the previous section. Table IX displays the 90% and 95% quantiles for the three statistics. If the null hypothesis is true, when we apply the BRSW estimator to the Treasury data and compute the statistics on the resulting estimated surfaces, we should obtain values for the statistics that fall into the middle of the bootstrapped distributions. If the null hypothesis is false, the statistics will fall into the upper tails of the distributions, and we can conclude that the kernel estimator is \picking up" something in the data that is missed by the parametric estimator. The distributions of the test statistics are computed under the misspecifled model; this allows the for the best chance of picking up something in 19

the data that the parametric model might miss. For each function and each statistic, I search over an 18£18 grid for the pair of integer scaling values (` r ;` (cid:190) ) that produce bandwidths (` r (cid:190)^ r ;` (cid:190) (cid:190)^ (cid:190) )T¡ 6 1 that minimize the statistic in question. This approach flnds the bandwidth values that minimize the statistics for the model that maximizes the likelihood of flnding signiflcant difierences between the nonparametric and parametric estimates. The statistic{minimizing bandwidths are shown in table X. Figures 5 and 6 display the fltted surfaces for the bandwidth values that minimize the MSE criterion, as well as the surfaces under the AL model. The observed statistics are displayed in table XI. Except for the interest rate drift function, the null hypothesis is rejected at the 95% level for each function and statistic. For the interest rate drift function (fi ), the r th 90 quantiles are 0:000048, 0:0057 and 0:013 for the MSE, MAE, and MAD statistics, respectively. From table XI, we see that the observed statistics are 0:000033, 0:0049, and 0:0089, respectively { all less than the associated quantile values and thus within the 90% acceptance region. For the interest rate difiusion (fl ), we see that the observed statistic values are greater than r the 95% quantiles for each statistic, indicating rejection of the hypothesis that the Treasury data are drawn from the distribution implied by the AL model. Similarly, the observed statistics for the volatility process functions (fi and fl ) indicate rejection of the null hypothesis. (cid:190) (cid:190) In sum, the results support the conclusion that the Treasury data are not generated by the AL model. However, the results do not support the conclusion of nonlinearities in the interest rate drift function. The results indicate that the interest rate difiusion and the volatility process drift and difiusion functions exhibit nonlinearities that are not captured by the AL model. It is important to emphasize that these hypothesis test results are robust against any residual kernel biases that may be present in the estimates, because we have bootstrapped the flnite sample distributions of the statistics under the null hypothesis that the parametric model is the true data generating process. The quantiles that are reported in table IX are thus \corrected" for kernel bias by the bootstrap. 20

V Conclusion In this essay, I used Monte Carlo simulations from the Andersen and Lund (1997a) stochastic volatility model of interest rates to study the flnite sample properties of the BRSW estimator. The estimator exhibited complicated patterns of bias and a high sampling variance. The introduction of irrelevant conditioning information resulted in increased ine–ciency in all cases, and increased bias in most cases. I tested whether the BRSW estimates were statistically distinguishable from the parametric estimates, and found that the BRSW estimator indeed appeared to be picking up dynamics in the data that the parametric estimator missed. As part of this research, I worked out a method to test whether or not a system of stochastic difierential equations is stationary. The algorithm that I used for performing the test involved the flrst{order Euler discretization scheme for simulating trajectories from the model, and an extension of the Kolmogorov{Smirnov test. As mentioned earlier, it would be useful to extend the bivariate Kolmogorov{Smirnov test to the case of k{samples. It is possible that the k{sample generalization can be derived much the same way that the univariate k{sample KS test is derived from its two sample analogue. While the full k{sample bivariate statistic would be computationally burdensome to calculate, the wide range of applications for which it would be useful would seem to justify its development. In the econometrics literature, and in the research pipeline, there are many difierent estimators for the drift and difiusion functions of continuous time stochastic processes. For example, one can turn to the e–cient method of moments estimator of Gallant and Tauchen (1996) or the simulated likelihood method of Brandt and Santa{Clara (1999). It would be useful to compare the flnite sample properties of these estimators against a common benchmark, such as the maximum likelihood estimator for a model in which the transition densities are known in closed form. To date, little work has been done to understand the relative performance of the difierent estimators. 21

Appendix Kernelregression,particularlyinmultipledimensions,isnecessarilyacomputationally intensive procedure. However, a parallel computer can make short work of even fairly large problems, because kernel regression lends itself easily to parallelization. In this appendix, I discuss a very simple algorithm that I’ve developed for doing kernel regression on a parallel computer. In two dimensions, kernel regression using the Nadaraya-Watson estimator essentially boils down to computing the following formula repeatedly over a grid of solution points: XT ^ f(x ;y ) = W(t)g(x ;y ;x ;y ); (54) i j t t i j t=1 where W(t) is the weighting function from equation (34) in the body of the paper, and g(¢) is a known function of the data and the solution point. We compute this equation for fx ;y gN . i j i;j=1 A naive parallel algorithm for this problem is to simply break up the solutiongridintochunks,andtoassignthechunkstotheavailableprocessors. This algorithm is in general ine–cient unless one also works out an algorithm for balancing the load across the processors, which is a di–cult problem, particularly on a shared machine. A more e–cient approach is to rely on the operating system for load balancing, and to assign small bits of the task (single grid points) to lightweight processes for execution. The bit of pseudo{ code below shows how I implemented such an algorithm using the pthreads library on a Sun workstation running the Sun Solaris 2.6 operating system. The outer while loop checks the completion condition, where the size of the problem is given by the parameter n = N. The if{statement inside the while loop ensures that a limited number of threads are running at one time, where the maximum number of threads is given by nt. This mechanism prevents the program from loading the machine with so many lightweight processes that they begin to compete with one another for resources, degrading performance. When the limit nt is reached, the algorithm waits for threads to join (terminate), and then flres ofi more threads as needed. The routine Kernel Thread is the routine in which the actual computations are done. 22

i = 0; count = 0; while ( i < n ) { if ( count < nt ) { if ( pthread_create((pthread_t *) &thread_id, (pthread_attr_t *) &thread_attributes, Kernel_Thread, (void *) (thread_data + i)) ) { perror("pthread_create"); return; } count++; i++; } else { thr_join((thread_t) 0, (thread_t *) &thread_id, (void **) NULL); count--; } } 23

The algorithm is e–cient, driving a Sun Ultrasparc with three processors to around 80% of maximum e–ciency in terms of cpu utilization. Over a solution grid with 144 points, using 2,080 data points, the algorithm computed 4,000 iterations of the BRSW estimator for the AL model in approximately eleven minutes. When the number of data points was increased to 208,000, the program drove the machine to nearly maximum e–ciency, and ran in one hour, forty minutes. 24

References Ait{Sahalia, Y.: 1996, Testing continuous{time models of the spot interest rate, The Review of Financial Studies 9, 385{426. Andersen, T. G. and Lund, J.: 1997a, Estimating continuous-time stochastic volatilitymodelsoftheshortterminterestrate,Journal of Econometrics 77(2), 343{377. Andersen, T. G. and Lund, J.: 1997b, Stochastic volatility and mean drift in the short rate difiusion: Sources of steepness, level and curvature in the yield curve. Working paper. Boudoukh, J., Richardson, M., Stanton, R. and Whitelaw, R. F.: 1998, The stochastic behavior of interest rates: Implications from a multifactor, nonlinear continuous-time model. Working paper. Brandt, M.W.andSanta{Clara, P.: 1999, Simulatedlikelihoodestimationof multivariatedifiusionswithanapplicationtointerestratesandexchange rates with stochastic volatility. Working paper. Chapman, D. A. and Pearson, N. D.: 1999, Is the short rate drift actually nonlinear? Forthcoming in Journal of Finance. Du–e, D. and Kan, R.: 1996, A yield{factor model of interest rates, Mathematical Finance 6, 379{406. Fasano, G. and Franceschini, A.: 1987, A multidimensional version of the kolmogorov{smirnov test, Monthly Notices of the Royal Astronomical Society 225, 155{170. Gallant, A. and Tauchen, G.: 1996, Which moments to match?, Econometric Theory 12, 657{681. H˜ardle, W.: 1990, Applied Nonparametric Regression, Cambridge University Press. Karatzas, I. and Shreve, S. E.: 1991, Brownian Motion and Stochastic Calculus, Springer{Verlag, New York, NY. Kloeden, P. E. and Platen, E.: 1995, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, Berlin. 25

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P.: 1994, Numerical Recipes in C, second edn, Cambridge University Press, Cambridge. Pritsker, M.: 1998, Nonparametricdensityestimationandtestsofcontinuous time interest rate models, Review of Financial Studies 11(3), 449{487. Stanton, R.: 1997, A nonparametric model of term structure dynamics and the market price of interest rate risk, The Journal of Finance 52, 1973{ 2002. 26

Footnotes 1. In what follows, I refer to the the Boudoukh et al. (1998) estimator for multifactor models as the \BRSW estimator." 2. The Ait{Sahalia (1996) estimator is di–cult to adapt to multivariate models, so I do not consider it here. 3. One can verify that (6)-(7) are equivalent to (1)-(2) using Ito’s Lemma andthetransformation(cid:190)^ = log(cid:190)2. Inequations(6)-(7),Ihaveomitted t t the ‘^’ symbol on (cid:190) for notational brevity. t 4. It is important to keep in mind our maintained hypothesis that the system has a unique solution. We might conclude that the system is stationary, but if our maintained hypothesis is in error, the transition densities could be converging to the stationary density of a difierent system! This is similar to the problems that can arise when solving a partial difierential equation with a flnite difierence algorithm that is inconsistent. However, as we’ll see below, the transition densities appear to converge, and there is no evidence of convergence to the \wrong" density. 5. In private communications, the authors indicated that the parameters reported in Andersen and Lund (1997a) re(cid:176)ect rescalings of the difiusion function. The parameter values in table I are from Andersen and Lund (1997b), in which the authors correct the values for the rescaling. In tests similar to those reported here, I found that the system was borderline stationary, perhaps even nonstationary, at the values actually published in Andersen and Lund (1997a). 6. Thestandarddeviationsarereportedatzeroduetorounding. Inreality they are on the order of 10¡14. The tight standard deviations re(cid:176)ect the use of the antithetic variance reduction technique. 7. Unlike the standard one dimensional KS test statistic, the bivariate statistic is slightly distribution{dependent. In future work, I plan to study the test statistic a little more closely. For more information on the test statistic, see the paper cited in the text and Press, Teukolsky, Vetterling and Flannery (1994). 27

8. Picking points farther out in the tails of the distribution will bias the test toward flnding convergence at longer trajectories. On the other hand, from the results in table II, we can make some assessment of the probability of observing the points that are chosen for the test. One should pick points far enough out in the tails so that the probability of observing points that could generate difierent results is very low, but not so far out that the test becomes computationally infeasible. 9. It would be useful to have a k{sample bivariate Kolmogorov{Smirnov test, with which one could simultaneously test the convergence of bivariate transition densities deflned by a surface of k starting points. To my knowledge, no such test has been developed. 10. It’s unclear how the e–cient method of moments estimator used in Andersen and Lund (1997a), or other simulation estimators, are afiected when the flrst draws of simulated trajectories are not drawn from the stationary density of the process. To my knowledge, a formal study of the issue has not been completed. In related work, Brandt and Santa{ Clara (1999) report that flxing the flrst observation has little efiect on thesimulatedmaximumlikelihoodestimatorthattheydevelop, butthe extent to which this flnding generalizes to other estimators is unknown. Of course, the efiects must be limited in a large sample, simply because the efiect of any single observation on the likelihood function will be limited. In the main, it is a small sample issue. 11. When no confusion will arise, in what follows I omit the arguments to the drift and difiusion functions. They are to be understood. 12. The cross-validation approach to bandwidth selection is not useful for highly autocorrelated data. See Ha˜rdle (1990) for a short discussion, and Pritsker (1998) for a more in{depth discussion of the problems. 13. The solution grid was chosen to be consistent with the hypothesis tests in section four. 14. The set of valid (r;(cid:190)) values is deflned by the observed Treasury bill data in section four. The set contains all of the observed data points. 15. The inter-week draws ensure that, during the simulations, the discretized process for the interest rate never takes on negative values. 28

In addition, with the inter{week draws, the data are simulated at a degree of accuracy that is greater than the accuracy of the nonparametric estimator. Thus, the accuracy of the weak solution does not bound the accuracy of the estimator. 16. A parallel kernel estimator is used in order to manage the computational load. The parallel kernel estimator is discussed in the appendix. 17. To be precise, one ought to compute the L norm using a quadrature 1 integration method or the like, especially if the function surfaces exhibit radical gradients. Because our surfaces are very well{behaved, the simple formulas used here su–ce for our purposes. 18. Andersen and Lund (1997a) use data for the period from 1954-1995; in all other respects the series are the same. 19. See Du–e and Kan (1996) for a discussion. 20. To estimate the interest rate difiusion, I used bandwidths ((cid:190)^ r ;(cid:190)^ (cid:190) )T¡1 6 = (7:384985e¡03;3:579158e¡03), where the (cid:190)^ symbols denote sample standard deviations. 29

Table I: Parameter Values ThistableliststheparametervaluesusedintheMonteCarlosimulationsthroughout the paper. The parameters are taken from Andersen and Lund (1997b). The stochastic system is given by: p dr t = • 1(„¡r t)dt+(cid:190) t r t dW 1;t dlog(cid:190) t 2 = • 2((cid:181)¡log(cid:190) t 2 )dt+»dW 2;t Parameter Value • 0.1633 1 „ 0.0595 • 1.0397 2 (cid:181) -6.3599 » 1.2719 30

Table II: Simulation Results This table reports the results of Monte Carlo simulations to generate moments of the transition densities of the AL model. From each of 25 difierent starting points, 1,000 batches of 100 trajectories are simulated. The last point of each trajectory is saved, formingabatchof100drawsfromthetransitiondensitydeflnedbythestartingpointand the length of the trajectory. The mean and variance of each batch of saved points is then computed. At the end of a run, the procedure produces 1,000 independent draws of the flrst two moments of each of the 25 transition densities. Eight such runs are completed, the flrst with trajectories one year in length, the second with flve year trajectories, and so on for ten, twenty, thirty, forty, flfty, and flnally sixty year trajectories. In the table, the ’Mean’ columns show the average over the 25 densities of the moment in question, and the ’Std Dev’ columns show the dispersion of this moment over the 25 densities. The ’Min’and’Max’columnsshowtheminimumsandmaximumsofeachmomentoverthe25 densities, respectively. Moment Mean Std Dev Min Max Moment Mean Std Dev Min Max E[r 1] 0.0769 0.0420 0.0163 0.1401 Var[r 1] 0.0002 0.0001 0.0000 0.0011 E[r 5] 0.0685 0.0218 0.0353 0.1050 Var[r 5] 0.0004 0.0002 0.0000 0.0020 E[r 10] 0.0634 0.0097 0.0462 0.0830 Var[r 10] 0.0005 0.0001 0.0001 0.0018 E[r 20] 0.0602 0.0022 0.0529 0.0682 Var[r 20] 0.0004 0.0001 0.0001 0.0019 E[r 30] 0.0596 0.0013 0.0547 0.0658 Var[r 30] 0.0004 0.0001 0.0002 0.0016 E[r 40] 0.0595 0.0012 0.0547 0.0647 Var[r 40] 0.0004 0.0001 0.0001 0.0016 E[r 50] 0.0594 0.0012 0.0549 0.0651 Var[r 50] 0.0004 0.0001 0.0001 0.0015 E[r 60] 0.0594 0.0012 0.0547 0.0651 Var[r 60] 0.0004 0.0001 0.0001 0.0016 E[(cid:190) 1] -6.2339 0.2473 -6.5838 -5.8841 Var[(cid:190) 1] 0.6971 0.1399 0.2701 1.4404 E[(cid:190) 5] -6.3580 0.0037 -6.3632 -6.3527 Var[(cid:190) 5] 0.7932 0.1590 0.2567 1.6098 E[(cid:190) 10] -6.3598 0.0000 -6.3599 -6.3598 Var[(cid:190) 10] 0.7936 0.1596 0.3470 1.5934 E[(cid:190) 20] -6.3599 0 -6.3599 -6.3599 Var[(cid:190) 20] 0.7945 0.1591 0.2875 1.5405 E[(cid:190) 30] -6.3599 0 -6.3599 -6.3599 Var[(cid:190) 30] 0.7950 0.1589 0.2838 1.6439 E[(cid:190) 40] -6.3599 0 -6.3599 -6.3599 Var[(cid:190) 40] 0.7942 0.1589 0.2723 1.6174 E[(cid:190) 50] -6.3599 0 -6.3599 -6.3599 Var[(cid:190) 50] 0.7950 0.1597 0.3147 1.5621 E[(cid:190) 60] -6.3599 0 -6.3599 -6.3599 Var[(cid:190) 60] 0.7934 0.1593 0.3054 1.5795 31

Table III: Bivariate KS Test Results This table displays the results of the bivariate Kolmogorov-Smirnov test for convergence in distribution of the transition densities of the AL system. Two transition densities are tested for convergence. The densities are deflned by starting points that are two standard deviations away from the long{run means of each process, and about four standard deviations away from one another, and by the length of the trajectories. The flrst column displays the length of the trajectories, the second column shows the test statistic, and the flnal column shows the p{Value. Years KS p{Value 1 0.9991 0.0000 5 0.8735 0.0000 10 0.4928 0.0000 20 0.1061 0.0000 30 0.0240 0.0012 40 0.0100 0.5327 32

Table IV: Scaling Factors and Sum of Squared Errors Thistablereportstheresultsofagridsearchfortheoptimalscalingfactorson the bandwidths of the kernel estimator for each function of the system. The flrst column lists the function, and the second and third columns display the relevant scaling factors that minimized the sum{of{squared error criterion. The flnal column displays the resulting SSE value. Function ` ` SSE r (cid:190) fi 6.0 { 0.00273 r fl 2.0 1.0 0.00277 r fi { 1.0 0.14943 (cid:190) fl { 2.0 0.00829 (cid:190) 33

Table V: Error Measures This table reports measures of error in flt for the kernel estimates displayed in flgures 1 and 2. The error measures are deflned as: XX L ^ = jf ^ ¡f j; 1 i;j i;j i j XX L ^ = (f ^ ¡f ) 2 ; 2 i;j i;j i j L ^ = maxjf ^ ¡f j; 1 i;j i;j i;j ^ where f denotes the kernel estimate at point i;j on the solution grid, and i;j f denotes the true value. ^ ^ ^ Function L L L 1 2 1 fi 1.744913e-02 2.799722e+00 9.110867e-03 r fl 4.321830e-03 9.906933e-01 1.201149e-02 r fi 4.221677e-01 1.005452e+01 1.003910e-01 (cid:190) fl 9.791103e-03 2.473750e+00 3.958000e-03 (cid:190) 34

Table VI: Ine–ciency Measure This table reports the value of an ine–ciency measure for the estimates displayed in flgures 1 and 2. The ine–ciency measure is deflned as: XX EFF = (f ^(+) ¡f ^(¡) ); i;j i;j i j ^(+) where f denotes the upper 95% confldence value at point (r ;(cid:190) ) on the i;j i j ^(¡) solution surface, and f denotes the lower 95% value. i;j Function EFF fi 6.56 r fl 9.06 r fi 1068.59 (cid:190) fl 54.44 (cid:190) 35

Table VII: Error Measures under Misspeciflcation This table reports measures of error in flt for the kernel estimates displayed in flgures 3 and 4. The error measures are deflned as: XX L ^ = jf ^ ¡f j; 1 i;j i;j i j XX L ^ = (f ^ ¡f ) 2 ; 2 i;j i;j i j L ^ = maxjf ^ ¡f j; 1 i;j i;j i;j ^ where f denotes the kernel estimate at point i;j on the solution grid, and i;j f denotes the true value. ^ ^ ^ Function L L L 1 2 1 fi 1.718060e-02 2.773228e+00 9.949927e-03 r fl 4.321830e-03 9.906933e-01 1.201149e-02 r fi 5.777287e-01 1.257005e+01 1.707380e-01 (cid:190) fl 3.256718e-02 4.120876e+00 1.014200e-02 (cid:190) 36

Table VIII: Ine–ciency Measure under Misspeciflcation This table reports the value of an ine–ciency measure for the estimates displayed in flgures 3 and 4. The ine–ciency measure is deflned as: XX EFF = (f ^(+) ¡f ^(¡) ); i;j i;j i j ^(+) where f denotes the upper 95% confldence value at point (r ;(cid:190) ) on the i;j i j ^(¡) solution surface, and f denotes the lower 95% value. i;j Function EFF fi 6.67 r fl 9.06 r fi 1201.86 (cid:190) fl 90.09 (cid:190) 37

Table IX: Bootstrapped Quantiles This table reports the bootstrapped quantiles of three difierent statistics, computed under the null hypothesis that the AL model is the \true" data generating process. The test statistics are the mean absolute error (MAE), mean squared error (MSE), and maximum absolute deviation (MAD), deflned as follows: –(r;(cid:190)) ^ MAE = L (55) N2 1 –(r;(cid:190)) ^ MSE = L (56) N2 2 ^ MAD = –(r;(cid:190))L ; (57) 1 ^ ^ ^ where L , L and L are deflned as: 1 2 1 XX L ^ = jf ^ ¡f j; 1 i;j i;j i j XX L ^ = (f ^ ¡f ) 2 ; 2 i;j i;j i j L ^ = maxjf ^ ¡f j; 1 i;j i;j i;j ^ where f denotes the kernel estimate at point i;j on the solution grid, and i;j f denotes the true value. The function –(r;(cid:190)) is a \trimming function" used to reduce the efiect of boundary biases on the statistics. I used –(¢) to trim the solution grid to a 21£21 square, thus removing the outer two rings of data. The quantiles of the statistics are found by compiling the values of the statistics for 1,000 simulated draws from the AL model using the Euler method. th th 95 Quantiles 90 Quantiles Function MSE MAE MAD Function MSE MAE MAD fi 0.000054 0.006009 0.013931 fi 0.000048 0.005736 0.013033 r r fl 0.000034 0.003841 0.020105 fl 0.000027 0.003423 0.018463 r r fi 1.305474 0.406894 1.453917 fi 1.128532 0.355319 1.193143 (cid:190) (cid:190) fl 0.001622 0.040272 0.042565 fl 0.001057 0.032504 0.035641 (cid:190) (cid:190) 38

Table X: Bandwidth Scalings for Observed Statistic Values This table reports the bandwidth scalings for the kernel estimates based on Treasury data. For each function and each statistic, I search over an 18£18 grid for the pair of integer scaling values (` ;` ) that produce bandwidths r (cid:190) (` r (cid:190)^ r ;` (cid:190) (cid:190)^ (cid:190) )T¡ 6 1 that minimize the statistic in question. This approach flnds the bandwidth values that minimize the statistic that maximizes the probability of flnding signiflcant difierences between the BRSW and EMM estimators. MSE MAE MAD Function ` ` ` ` ` ` r (cid:190) r (cid:190) r (cid:190) fi 4.0 12.0 4.0 12.0 4.0 12.0 r fl 1.0 1.0 2.0 1.0 1.0 12.0 r fi 1.0 10.0 1.0 12.0 12.0 1.0 (cid:190) fl 6.0 8.0 6.0 6.0 12.0 12.0 (cid:190) 39

Table XI: Observed Statistic Values This table reports statistic values computed on Treasury data. The statistics are deflned as: –(r;(cid:190)) ^ MAE = L (58) N2 1 –(r;(cid:190)) ^ MSE = L (59) N2 2 ^ MAD = –(r;(cid:190))L ; (60) 1 ^ ^ ^ where L , L and L are deflned as: 1 2 1 XX L ^ = jf ^ ¡f j; 1 i;j i;j i j XX L ^ = (f ^ ¡f ) 2 ; 2 i;j i;j i j L ^ = maxjf ^ ¡f j; 1 i;j i;j i;j ^ where f denotes the kernel estimate at point i;j on the solution grid, and i;j f denotes the value implied by the AL model. The function –(r;(cid:190)) is a \trimming function" used to reduce the efiect of boundary biases on the statistics. I used –(¢) to trim the solution grid to a 21 £ 21 square, thus removing the outer two rings of data. The bootstrapped quantiles of the statistics are displayed in table IX. Function MSE MAE MAD fi 0.000033 0.004941 0.008971 r fl 0.004405 0.041261 0.204983 r fi 10.724300 2.713032 5.731216 (cid:190) fl 0.031650 0.150124 0.318428 (cid:190) 40

Figure 1: Estimates for Interest Rate Process Interest Rate Drift true lower mean upper 0.01 0.005 0 -0.005 -0.01 -0.015 -0.02 -0.025 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 Interest Rate Diffusion true lower mean upper 0.06 0.05 0.04 0.03 0.02 0.01 0 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 41

Figure 2: Estimates for Volatility Process Volatility Drift true lower mean upper 1.5 1 0.5 0 -0.5 -1 -1.5 -2 -2.5 -3 -3.5 -4 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 Volatility Diffusion true lower mean upper 1.32 1.31 1.3 1.29 1.28 1.27 1.26 1.25 1.24 1.23 1.22 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 42

Figure 3: Estimates for Misspecifled Interest Rate Process Interest Rate Drift true lower mean upper 0.01 0.005 0 -0.005 -0.01 -0.015 -0.02 -0.025 -0.03 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 Interest Rate Diffusion true lower mean upper 0.06 0.05 0.04 0.03 0.02 0.01 0 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 43

Figure 4: Estimates for Misspecifled Volatility Process Volatility Drift true lower mean upper 2 1 0 -1 -2 -3 -4 -5 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 Volatility Diffusion true lower mean upper 1.31 1.3 1.29 1.28 1.27 1.26 1.25 1.24 1.23 1.22 0.09 0.08 0.07 0.05 0.06 sigma 0.05 0.1 0.04 r 0.15 44

Figure 5: Interest Rate Process Interest Rate Drift parametric kernel 0.01 0.005 0 -0.005 -0.01 -0.015 -0.02 -0.025 0.8 0.7 0.6 0.5 0.05 0.4 sigma 0.3 0.1 0.2 r 0.1 0.15 Interest Rate Diffusion parametric kernel 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 0.8 0.7 0.6 0.5 0.05 0.4 sigma 0.3 0.1 0.2 r 0.1 0.15 45

Figure 6: Volatility Process Volatility Drift parametric kernel 2 1 0 -1 -2 -3 -4 -5 -6 -7 0.8 0.7 0.6 0.5 0.05 0.4 sigma 0.3 0.1 0.2 r 0.1 0.15 Volatility Diffusion parametric kernel 1.4 1.3 1.2 1.1 1 0.9 0.8 0.8 0.7 0.6 0.5 0.05 0.4 sigma 0.3 0.1 0.2 r 0.1 0.15 46

Cite this document
APA
Chris Downing (1999). Nonparametric Estimation of Multifactor Continuous-Time Interest Rate Models (FEDS 1999-62). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_1999-62
BibTeX
@techreport{wtfs_feds_1999_62,
  author = {Chris Downing},
  title = {Nonparametric Estimation of Multifactor Continuous-Time Interest Rate Models},
  type = {Finance and Economics Discussion Series},
  number = {1999-62},
  institution = {Board of Governors of the Federal Reserve System},
  year = {1999},
  url = {https://whenthefedspeaks.com/doc/feds_1999-62},
  abstract = {This paper studies the finite sample properties of the kernel regression method of Boudoukh et al. (1998) for estimating multifactor continuous-time term structure models. Monte Carlo simulations are employed, with a grid-search technique to find the optimal kernel bandwidth. The estimator exhibits truncation and correlated residuals biases near the boundaries of the data. However, the variance of the estimator is so high that the biases are unlikely to be relevant from a hypothesis testing point of view. The performance of the estimator is also studied under model misspecification. Irrelevant regressors reduce efficiency and induce additional biases in the estimates. Using Treasury bill data, I test whether the estimates produced by the nonparametric estimator are statistically distinguishable from estimates obtained under a parametric model. The kernel regressions pick up nonlinearities in the data that the parametric model cannot capture.},
}