ifdp · January 31, 2008

On the Application of Automatic Differentiation to the Likelihood Function for Dynamic General Equilibrium Models

Abstract

A key application of automatic differentiation (AD) is to facilitate numerical optimization problems. Such problems are at the core of many estimation techniques, including maximum likelihood. As one of the first applications of AD in the field of economics, we used Tapenade to construct derivatives for the likelihood function of any linear or linearized general equilibrium model solved under the assumption of rational expectations. We view our main contribution as providing an important check on finite-difference (FD) numerical derivatives. We also construct Monte Carlo experiments to compare maximum-likelihood estimates obtained with and without the aid of automatic derivatives. We find that the convergence rate of our optimization algorithm can increase substantially when we use AD derivatives.

Board of Governors of the Federal Reserve System
International Finance Discussion Papers
Number 920
February 2008

Houtan Bastani
Luca Guerrieri

NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from Social Science Research Network electronic library at http://www.ssrn.com/.

Houtan Bastani and Luca Guerrieri∗∗
Federal Reserve Board
February 2008

Keywords: General Equilibrium Models, Kalman Filter, Maximum Likelihood
JEL Classification: C52, C61, C63

The authors thank Chris Gust, Alejandro Justiniano, and seminar participants at the 2007 meeting of the Society for Computational Economics. We are particularly grateful to Gary Anderson for countless helpful conversations, as well as for generously sharing his programs with us. The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any other person associated with the Federal Reserve System.

∗∗ Corresponding author. Telephone (202) 452-2550. E-mail Luca.Guerrieri@frb.gov.

1 Introduction

While applications of automatic differentiation (AD) have spread across many different disciplines, they have remained less common in the field of economics.¹ Based on the successes reported in facilitating optimization exercises in other disciplines, we deployed AD techniques to assist with the estimation of dynamic general equilibrium (DGE) models. These models are becoming a standard tool that central banks use to inform monetary policy decisions. However, the estimation of these models is complicated by the many parameters of interest, so the optimization method of choice typically makes use of derivatives. Yet the complexity of the models does not afford a closed-form representation of the likelihood function, and finite-difference (FD) methods have been the standard practice for obtaining numerical derivatives in this context.

Using Tapenade (see (Hascoët 2004), (Hascoët, Greborio, and Pascual 2005), (Hascoët, Pascual, and Dervieux 2005)), we constructed derivatives for a general formulation of the likelihood function, which takes as its essential input the linear representation of the model's conditions for an equilibrium. The programming task was complicated by the fact that the numerical solution of a DGE model under rational expectations relies on fairly complex algorithms.² We use Lapack routines to implement the solution algorithm. In turn, our top-level Lapack routines make use of a large number of Blas routines. A by-product of our project has been the implementation of AD derivatives for numerous routines in the double-precision subset of Blas. Table 1 lists the routines involved.

In the remainder of this paper, Section 2 lays out the general structure of a DGE model and describes our approach to setting up the model's likelihood function.

¹ Examples of AD contributions to the computational finance literature are (Bischof, Bücker, and Lang 2002), (Giles and Glasserman 2006), and (Giles 2007).
² In this paper we focus on the first-order approximation to the solution of a DGE model. Many alternative approaches have been advanced. We use the algorithm described by (Anderson and Moore 1985), which has the marked advantage of not relying on complex decompositions.

Section 3 outlines the steps we took to implement the AD derivatives and how we built confidence in our results. Section 4 gives an example of a DGE model that we used to construct Monte Carlo experiments, reported in Section 5, which compare maximum-likelihood estimates that rely alternatively on AD or FD derivatives. Section 6 concludes.

2 General Model Description and Estimation Strategy

The class of DGE models that is the focus of this paper takes the general form:

$$ H(\theta) \begin{bmatrix} E_t X_{t+1} \\ X_t \\ X_{t-1} \end{bmatrix} = 0. \tag{1} $$

In the equation above, $H$ is a matrix whose entries are functions of the structural parameter vector $\theta$, while $X_t$ is a vector of the model's variables (including the stochastic innovations to the shock processes). The term $E_t$ is an expectation operator, conditional on information available at time $t$ and on the model's structure as in equation (1). Allowing for only one lead and one lag of $X_t$ in the equation above implies no loss of generality. The model's solution takes the form:

$$ X_t = S(H(\theta)) X_{t-1}, \tag{2} $$

thus, given knowledge of the model's variables at time $t-1$, a solution determines the model's variables at time $t$ uniquely. The entries of the matrix $S$ are themselves functions of the matrix $H$ and, in turn, of the parameter vector $\theta$.

Partitioning $X_t$ such that $X_t = \begin{bmatrix} x_t \\ \epsilon_t \end{bmatrix}$, where $\epsilon_t$ is a collection of all the innovations to the exogenous shock processes (and possibly rearranging the system), it is convenient to rewrite the model's solution as

$$ x_t = A(H(\theta)) x_{t-1} + B(H(\theta)) \epsilon_t. \tag{3} $$

Again, the entries in the matrices $A$ and $B$ are fundamentally functions of the parameter vector $\theta$. Treating a subset of the entries in $x_t$ as observable, and calling these entries $y_t$, the state-space representation of the system takes the form:

$$ x_t = A(H(\theta)) x_{t-1} + B(H(\theta)) \epsilon_t, \tag{4} $$

$$ y_t = C x_t. \tag{5} $$

Without loss of generality, we restrict the matrix $C$ to be a selector matrix, which picks the relevant entries of $x_t$. Using the Kalman filter recursions, we can express the likelihood function for the model as:

$$ L = L(A(\theta), B(\theta), C, y_{t-h}, \ldots, y_t), \tag{6} $$

where $y_{t-h}$ and $y_t$ are, respectively, the first and last observation points available. Given an input $H(\theta)$, the routines we developed produce the derivative of the likelihood function with respect to the structural parameters, $\partial L / \partial \theta$, and, as an intermediate product, $\partial A / \partial \theta$, the derivative of the model's reduced-form parameters with respect to the structural parameters.

3 Implementing AD Derivatives

To obtain AD derivatives of the likelihood function, we used Tapenade in tangent mode. Tapenade required limited manual intervention on our part. This is remarkable given that the code to be differentiated consisted of approximately 80 subroutines totaling over 17,000 lines of code. The derivative-augmented code produced by Tapenade covers approximately 25,000 lines (the original code has a size of 554 kilobytes and the differentiated code is 784 kilobytes in size).

Recoding became necessary when the Lapack or Blas routines we called did not explicitly declare the sizes of the arguments in the calling structure and instead allowed for arbitrary sizing (possibly exceeding the storage requirements). A more limited recoding was required when we encountered "GOTO" statements in the Fortran 77 code of the Blas library, which Tapenade could not process.

More substantially, two of the decompositions involved in the model solution, the real Schur decomposition and the singular value decomposition, are not always unique. Parametric restrictions of the models we tested could ensure uniqueness of these decompositions. In those cases, we verified that the AD derivatives obtained through Tapenade satisfied some basic properties of the decompositions that we derived analytically, but our test failed whenever we relaxed those parametric restrictions to allow for more general model specifications.

In particular, we relied on the Lapack routine DGEESX to implement the real Schur decomposition. For a given real matrix $E$, this decomposition produces a unitary matrix $X$ such that $T = X^H E X$ is quasi-triangular. Given $\partial E / \partial \theta$, we need the derivative $\partial X / \partial \theta$ to satisfy

$$ \frac{\partial T}{\partial \theta} = \frac{\partial X^H}{\partial \theta} E X + X^H \frac{\partial E}{\partial \theta} X + X^H E \frac{\partial X}{\partial \theta}, $$

where $\partial T / \partial \theta$ is itself quasi-triangular. Our AD derivatives failed to meet this property when our choice of $E$ implied a non-unique Schur decomposition. To obviate this problem, we replaced the AD derivative of the DGEESX routine with the analytical derivative of the Schur decomposition as outlined in (Anderson 1987).

Similarly, given a real matrix $E$, the singular value decomposition, implemented through the DGESVD routine in the Lapack library, produces unitary matrices $U$ and $V$ and a diagonal matrix $D$ such that $E = U D V^T$. Given $\partial E / \partial \theta$, it can be shown that

$$ U^T \frac{\partial E}{\partial \theta} V = U^T \frac{\partial U}{\partial \theta} D + \frac{\partial D}{\partial \theta} + D \frac{\partial V^T}{\partial \theta} V, $$

where $\partial D / \partial \theta$ is diagonal and $U^T \, \partial U / \partial \theta$ and $(\partial V^T / \partial \theta) \, V$ are both antisymmetric. Our AD derivative of the routine DGESVD failed to satisfy this property when the matrix $E$ had repeated singular values (making the decomposition non-unique).
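Equation (6) leaves the Kalman filter recursions implicit. The following is a minimal NumPy sketch of the Gaussian log-likelihood for the state-space form (4)–(5), not the authors' code (Tapenade differentiated their own implementation). It assumes, beyond what the text states, that $\epsilon_t \sim N(0, I)$, that the states have zero mean and are stationary, that the filter starts from the unconditional distribution of $x_t$, and that the forecast-error covariance $C P C'$ is nonsingular. The function name `kalman_loglik` is hypothetical; in the paper, $A$ and $B$ would come from the solution $S(H(\theta))$.

```python
import numpy as np


def kalman_loglik(A, B, C, Y):
    """Gaussian log-likelihood of Y (T x p) under x_t = A x_{t-1} + B eps_t, y_t = C x_t.

    Assumes eps_t ~ N(0, I), zero-mean stationary states (eigenvalues of A inside
    the unit circle), and a nonsingular forecast-error covariance C P C'.
    """
    n = A.shape[0]
    Q = B @ B.T
    # Unconditional state covariance: P = A P A' + Q, solved via
    # vec(P) = (I - A kron A)^{-1} vec(Q) with row-major vec (fine for small n).
    P = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.reshape(-1)).reshape(n, n)
    x = np.zeros(n)                        # predicted state x_{t|t-1}
    loglik = 0.0
    for y in Y:
        v = y - C @ x                      # forecast error
        F = C @ P @ C.T                    # forecast-error covariance
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (y.size * np.log(2.0 * np.pi) + logdet + v @ np.linalg.solve(F, v))
        K = P @ C.T @ np.linalg.inv(F)     # Kalman gain
        x = A @ (x + K @ v)                # update, then predict x_{t+1|t}
        P = A @ (P - K @ C @ P) @ A.T + Q  # updated covariance, then predicted
    return loglik


# Usage: a scalar AR(1) observed without error (C is a 1x1 selector).
rng = np.random.default_rng(0)
y_obs = rng.standard_normal(50).reshape(-1, 1)
ll = kalman_loglik(np.array([[0.9]]), np.array([[0.5]]), np.array([[1.0]]), y_obs)
```

For this scalar case the filter collapses to the exact AR(1) likelihood (the first observation drawn from the stationary distribution, the rest from $y_t \mid y_{t-1}$), which gives a convenient hand check. The Kronecker-product initialization is chosen for brevity; a dedicated discrete Lyapunov solver would scale better.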
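The singular value identity stated above pins down the tangent of the decomposition only when the singular values are distinct. The NumPy sketch below is a minimal illustration under that assumption, not the routine the authors used. For a square matrix it reads $\partial D$ off the diagonal of $U^T \partial E \, V$, solves a 2×2 linear system for each pair of singular values to recover the antisymmetric matrices $U^T \partial U$ and $\partial V^T V$, and raises an error when two singular values coincide, the case in which the AD derivative failed. The function name `svd_tangent` and the test matrices are hypothetical.

```python
import numpy as np


def svd_tangent(E, dE):
    """Tangent of the SVD E = U diag(d) V^T of a square matrix in direction dE.

    Uses U^T dE V = Om_U D + dD + D Om_V with Om_U = U^T dU and Om_V = dV^T V
    antisymmetric. Raises ValueError if two singular values coincide, because
    the 2x2 system for that pair is then singular and the tangent is not unique.
    """
    U, d, Vt = np.linalg.svd(E)
    V = Vt.T
    n = d.size
    P = U.T @ dE @ V
    dd = np.diag(P).copy()               # Om_U D and D Om_V have zero diagonals
    om_u = np.zeros((n, n))
    om_v = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = d[i] ** 2 - d[j] ** 2
            if abs(denom) < 1e-12 * max(1.0, d[0] ** 2):
                raise ValueError("repeated singular values: SVD tangent not unique")
            # Solve P[i,j] = om_u[i,j] d_j + d_i om_v[i,j]
            #   and P[j,i] = -om_u[i,j] d_i - d_j om_v[i,j].
            om_u[i, j] = -(d[j] * P[i, j] + d[i] * P[j, i]) / denom
            om_v[i, j] = (d[i] * P[i, j] + d[j] * P[j, i]) / denom
            om_u[j, i] = -om_u[i, j]
            om_v[j, i] = -om_v[i, j]
    dU = U @ om_u                        # since U^T dU = Om_U
    dV = V @ om_v.T                      # since dV^T V = Om_V
    return U, d, V, dU, dd, dV


rng = np.random.default_rng(1)
E = rng.standard_normal((4, 4))
dE = rng.standard_normal((4, 4))
U, d, V, dU, dd, dV = svd_tangent(E, dE)
# The tangent must reproduce dE through the product rule applied to E = U D V^T.
recon = dU @ np.diag(d) @ V.T + U @ np.diag(dd) @ V.T + U @ np.diag(d) @ dV.T
# Singular values are unique, so dd can also be checked with a centered FD.
h = 1e-6
fd = (np.linalg.svd(E + h * dE, compute_uv=False)
      - np.linalg.svd(E - h * dE, compute_uv=False)) / (2.0 * h)
print(np.allclose(recon, dE), np.allclose(fd, dd, atol=1e-6))
```

The determinant of each 2×2 system is $d_i^2 - d_j^2$, which makes explicit why repeated singular values leave the derivative undetermined; calling `svd_tangent` on an identity matrix triggers the error.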
For the DGESVD routine, we replaced our AD derivative with the analytical derivative derived by (Papadopoulo and Lourakis 2000).

To test the derivative of the likelihood function, we used a two-pronged approach. For special cases of our model that could be simplified enough to yield a closed-form analytical solution, we computed analytical derivatives and found them in agreement with our AD derivatives, up to numerical imprecision. To test the derivatives for more complex models that we could not solve analytically, we relied on comparisons with centered FD derivatives. With a step size of $10^{-8}$, we generally found broad agreement between our AD and FD derivatives. Plotting AD and FD derivatives side by side, and varying the value at which the derivatives were evaluated, we noticed that the FD derivatives appeared noisier than the AD derivatives. We quantify the "noise" we observed in an example below.

4 Example Application

As a first application of our derivatives, we consider a real business cycle model augmented with sticky prices and sticky wages, as well as several real rigidities, following the work of (Smets and Wouters 2003). Below, we give a brief description of the optimization problems solved by agents in the model, which allows us to interpret the parameters estimated in the Monte Carlo exercises that follow.

There is a continuum of households of measure 1, indexed by $h$, whose objective is to maximize a discounted stream of utility according to the following setup:

$$
\begin{aligned}
\max_{\{C_t(h), W_t(h), I_t(h), K_{t+1}(h), B_{t+1}(h)\}} \; E_t \sum_{j=0}^{\infty} \Big\{ & \beta^j \big[ U(C_{t+j}(h), C_{t+j-1}(h)) + V(L_{t+j}(h)) \big] \\
& + \beta^j \lambda_{t+j}(h) \Big[ \Pi_{t+j}(h) + T_{t+j}(h) + (1-\tau_{L,t+j}) W_{t+j}(h) L_{t+j}(h) \\
& \qquad + (1-\tau_{K,t+j}) R_{K,t+j} K_{t+j}(h) - \frac{1}{2} \psi_I P_{t+j} \frac{(I_{t+j}(h) - I_{t+j-1}(h))^2}{I_{t+j-1}(h)} \\
& \qquad - P_{t+j} C_{t+j}(h) - P_{t+j} I_{t+j}(h) - \int_s \psi_{t+j+1,t+j} B_{t+j+1}(h) + B_{t+j}(h) \Big] \\
& + \beta^j Q_{t+j}(h) \big[ (1-\delta) K_{t+j}(h) + I_{t+j}(h) - K_{t+j+1}(h) \big] \Big\}.
\end{aligned}
$$

The utility function depends on consumption $C_t(h)$ and labor supplied $L_t(h)$. The parameter $\beta$ is a discount factor for future utility. Households choose streams for consumption $C_t(h)$, wages $W_t(h)$, investment $I_t(h)$, capital $K_{t+1}(h)$, and bond holdings $B_{t+1}(h)$, subject to the budget constraint, whose Lagrange multiplier is $\lambda_t(h)$; the capital accumulation equation, whose Lagrange multiplier is $Q_t(h)$; and the labor demand schedule $L_t(h) = L_t \left( \frac{W_t(h)}{W_t} \right)^{-\frac{1+\theta_w}{\theta_w}}$. Households rent to firms (described below) both capital, at the rental rate $R_{K,t}$, and labor, at the rate $W_t(h)$, subject to labor taxes at the rate $\tau_{L,t}$ and to capital taxes at the rate $\tau_{K,t}$. There are quadratic adjustment costs for investment, governed by the parameter $\psi_I$, and capital depreciates at a per-period rate $\delta$.

We introduce Calvo-type contracts for wages following (Erceg, Henderson, and Levin 2000). According to these contracts, the ability to reset wages for a household $h$ in any period $t$ follows a Poisson distribution. A household is allowed to reset wages with probability $1 - \psi_W$. If the wage is not reset, it is updated according to $W_{t+j}(h) = W_t(h) \pi^j$ (where $\pi$ is the steady-state inflation rate), as in (Yun 1996). Finally, $T_t(h)$ and $\Pi_t(h)$ represent, respectively, net lump-sum transfers from the government and an aliquot share of the profits of firms.

In the production sector, we have a standard Dixit-Stiglitz setup with nominal rigidities. Competitive final producers aggregate intermediate products for resale. Their production function is

$$ Y_t = \left[ \int_0^1 Y_t(f)^{\frac{1}{1+\theta_p}} \, df \right]^{1+\theta_p}, \tag{7} $$

and from the zero-profit condition the price for final goods is

$$ P_t = \left[ \int_0^1 P_t(f)^{-\frac{1}{\theta_p}} \, df \right]^{-\theta_p}, \tag{8} $$

where $P_t(f)$ is the price for a unit of output of the intermediate firm $f$.

Intermediate firms are monopolistically competitive. There is complete mobility of capital and labor across firms. Their production technology is given by

$$ Y_t(f) = A_t K_t(f)^{\alpha} L^d_t(f)^{1-\alpha}. \tag{9} $$

Intermediate firms take input prices as given. $L^d_t(f)$, which enters the intermediate firms' production function, is an aggregate over the skills supplied by each household, and takes the form $L^d_t(f) = \left[ \int_h L_t(h)^{\frac{1}{1+\theta_w}} \, dh \right]^{1+\theta_w}$. $A_t$ is the technology level and evolves according to an autoregressive (AR) process:

$$ A_t - A = \rho_A (A_{t-1} - A) + \epsilon_{A,t}, \tag{10} $$

where $\epsilon_{A,t}$ is an iid innovation with standard deviation $\sigma_A$, and $A$ is the steady-state level of technology. Intermediate firms set their prices $P_t(f)$ according to Calvo-type contracts with reset probability $1-\psi_P$. When prices are not reset, they are updated according to $P_{t+j}(f) = P_t(f) \pi^j$.

Finally, the government sector sets a nominal risk-free interest rate according to the reaction function:

$$ i_t = \frac{\pi}{\beta} - 1 + \gamma_\pi (\pi_t - \pi) + \gamma_Y \big( \log(Y_t) - \log(Y_{t-1}) \big) + \epsilon_{i,t}, \tag{11} $$

where inflation $\pi_t \equiv P_t / P_{t-1}$, and $\epsilon_{i,t}$ is itself an AR process of order 1. For this process, we denote the AR coefficient by $\rho_i$; the stochastic innovation is iid with standard deviation $\sigma_i$. In this setting, households are Ricardian; hence the time profile of net lump-sum transfers is not distortionary. We assume that these transfers are set according to:

$$ \tau_{L,t} W_t L_t + \tau_{K,t} R_{K,t} K_t = G_t + T_t. \tag{12} $$

Labor taxes, $\tau_{L,t}$, and capital taxes, $\tau_{K,t}$, follow exogenous AR processes

$$ \tau_{L,t} - \tau_L = \rho_L (\tau_{L,t-1} - \tau_L) + \epsilon_{L,t}, \tag{13} $$

$$ \tau_{K,t} - \tau_K = \rho_K (\tau_{K,t-1} - \tau_K) + \epsilon_{K,t}, \tag{14} $$

as does government spending (expressed as a share of output):

$$ \frac{G_t}{Y_t} - \frac{G}{Y} = \rho_G \left( \frac{G_{t-1}}{Y_{t-1}} - \frac{G}{Y} \right) + \epsilon_{G,t}. \tag{15} $$

In the equations above, the exogenous innovations $\epsilon_{L,t}$, $\epsilon_{K,t}$, $\epsilon_{G,t}$ are iid with standard deviations $\sigma_L$, $\sigma_K$, and $\sigma_G$, respectively. The parameters $\tau_L$, $\tau_K$, and $G/Y$, without a time subscript, denote steady-state levels.
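Equations (10) and (13)–(15), as well as the policy shock $\epsilon_{i,t}$, share the same AR(1) form, and equation (11) is a simple linear rule. The sketch below is illustrative only: it assumes Gaussian innovations (the text says only that they are iid), simulates deviations from steady state (so the steady-state level is set to zero), borrows $\rho_A$, $\sigma_A$, $\rho_i$, and $\sigma_i$ from Table 2 without taking a position on their units, and assumes gross steady-state inflation $\pi = 1$. The helper names `simulate_ar1` and `policy_rate` are hypothetical.

```python
import numpy as np


def simulate_ar1(rho, sigma, mean, T, rng, x0=None):
    """Simulate x_t - mean = rho * (x_{t-1} - mean) + eps_t for t = 1..T.

    eps_t is drawn iid N(0, sigma^2) (Gaussianity is an assumption); the
    process starts at x0, or at its steady-state level `mean` if x0 is None.
    """
    x = np.empty(T)
    prev = mean if x0 is None else x0
    for t in range(T):
        prev = mean + rho * (prev - mean) + sigma * rng.standard_normal()
        x[t] = prev
    return x


def policy_rate(pi_t, pi_bar, beta, gamma_pi, gamma_y, dlog_y, eps_i):
    """Interest-rate reaction function of equation (11)."""
    return pi_bar / beta - 1.0 + gamma_pi * (pi_t - pi_bar) + gamma_y * dlog_y + eps_i


rng = np.random.default_rng(0)
# Technology in deviation from steady state, with rho_A and sigma_A from Table 2.
a_dev = simulate_ar1(rho=0.95, sigma=0.94, mean=0.0, T=200, rng=rng)
# Policy shock eps_i: itself an AR(1) with rho_i and sigma_i from Table 2.
eps_i = simulate_ar1(rho=0.95, sigma=0.11, mean=0.0, T=200, rng=rng)
# At the steady state (pi_t = pi, no output growth, no shock) the rule gives pi/beta - 1.
i_ss = policy_rate(1.0, 1.0, 0.997, 1.5, 0.5, 0.0, 0.0)
```

With $\beta = 0.997$ and the assumed $\pi = 1$, the steady-state rate $\pi/\beta - 1$ is roughly 0.3 percent per quarter. In the model itself these processes enter the linearized system (1) rather than being simulated in isolation.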

The calibration strategy follows (Erceg, Guerrieri, and Gust 2005), and parameter values are reported in Table 2. By linearizing the necessary conditions for the solution of the model, we can express them in the format of equation (1).

5 Monte Carlo Results

Using the model described in Section 4 as the data-generating process, we set up a Monte Carlo experiment to compare maximum-likelihood estimates obtained through two different optimization methods. One of the methods relies on our AD derivative of the model's likelihood function. The alternative method uses a two-point, centered, finite-difference approximation to the derivative. In setting up the likelihood function, we limit our choices for the observed variables in the vector $y_t$ of equation (5) to four series, namely: the growth rate of output, $\log(Y_t) - \log(Y_{t-1})$; price inflation, $\pi_t$; wage inflation, $\omega_t \equiv W_t / W_{t-1}$; and the policy interest rate, $i_t$. For each Monte Carlo sample, we generate 200 observations, equivalent to 50 years of data given our quarterly calibration, a sample length often used in empirical studies. We attempt to estimate the parameters $\rho_i$ and $\sigma_i$, governing the exogenous shock process for the interest rate reaction function; $\psi_P$ and $\psi_W$, the Calvo contract parameters for prices and wages; and $\gamma_\pi$ and $\gamma_Y$, the weights in the monetary policy reaction function for inflation and activity. In the estimation exercises, we kept the remaining parameters at their values in the data-generating process, as detailed in Table 2. We considered 1,000 Monte Carlo samples.³ The two experiments described below differ only insofar as we chose two different initialization points for the optimization routines we used to maximize the likelihood function.

Figure 1 shows the sampling distribution of the parameter estimates from our Monte Carlo exercise when we initialize the optimization routine at the true parameter values used in the data-generating process. The black bars in the various panels denote the estimates that rely on AD derivatives, while the white bars denote the estimates obtained with FD derivatives. The optimization algorithm converged for all of the 1,000 Monte Carlo samples.⁴ We verified that the optimization routine did move away from the initial point towards higher likelihood values, so that the clustering of the estimates around the truth does not merely reflect the initialization point. For our experiment, the figure makes clear that when the optimization algorithm is initialized at the true values of the parameters of interest, reliance on FD derivatives only minimally affects the maximum-likelihood estimates for those parameters.⁵ Of course, the true values of the parameters do not necessarily coincide with the ML parameter estimates in small samples.

Yet it is unrealistic to assume that a researcher would happen upon such good starting values. Figure 2 reports the sampling distribution of estimates obtained when we initialize the optimization algorithm at arbitrary values for the parameters being estimated, away from their true values. For the estimates reported in Figure 2, we chose $\rho_i = 0.6$, $\sigma_i = 0.4$, $\psi_P = 0.5$, $\psi_W = 0.5$, $\gamma_\pi = 3$, and $\gamma_Y = 0.15$. The bars in Figure 2 show the frequency of estimates in a given range as a percentage of the 1,000 experiments we performed. We excluded results for which our optimization algorithm failed to converge. The figure makes clear that the convergence rate is much higher when using AD derivatives (47.2% versus 28.3% for FD derivatives). Moreover, it is remarkable that the higher convergence rate is not accompanied by a deterioration of the estimates (the increased height of the black bars in the figure is proportional to that of the white bars).

To quantify the difference between AD and FD derivatives of the likelihood function for one of our samples, we varied the parameters we estimated one at a time. Figure 3 shows the percentage difference in the magnitude of the AD and FD derivatives for $\rho_i$ and $\sigma_i$. We discretized the ranges shown using a grid of 1,000 equally spaced points. The differences are generally small in percentage terms, although on occasion they spike up, or creep up as we move away from the true value, as in the case of $\sigma_i$. For the other parameters we estimated, we did not observe differences in the magnitudes of the AD and FD derivatives larger than $10^{-4}$ over ranges consistent with the existence of a rational expectations equilibrium for our model.

³ Our maximum-likelihood estimates were constructed using the MATLAB optimization routine FMINUNC. When the optional argument "LargeScale" is set to "OFF", this routine uses a limited-memory quasi-Newton conjugate-gradient method, which takes as input first derivatives of the objective function, or an acceptable FD approximation.
⁴ For our MATLAB optimization routine, we set the convergence criterion to require a change in the objective function smaller than $10^{-4}$, implying 6 significant figures for our specific likelihood function. This choice seemed appropriate given the limited precision of observed series in practical applications.
⁵ We experimented with a broad set of Monte Carlo experiments by varying the choice of estimation parameters, so as to encompass the near totality of parameters in the calibration table, or so as to study individual parameters in isolation. We found results broadly in line with the particular Monte Carlo experiments we report. Our results also appear robust to broad variation in the calibration choice.

6 Conclusion

Given that the approximation error of a first derivative of the likelihood function of a DGE model computed through FD methods depends on the size of higher-order derivatives (the second derivative for a one-sided difference, the third for a centered one), which are themselves subject to approximation error, we view having an independent check in the form of automatic derivatives as a major contribution of our work. As an example application, we showed that AD derivatives can facilitate the computation of maximum-likelihood estimates for the parameters of a DGE model.
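The contrast between tangent-mode AD and centered FD derivatives can be illustrated on a problem small enough to check by hand. The Python sketch below is hypothetical throughout: it uses a minimal dual-number class (operator overloading, whereas Tapenade works by source transformation) to propagate the derivative of the exact log-likelihood of a stationary Gaussian AR(1) with respect to $\rho$, and compares it with centered FD derivatives at two step sizes. The AR(1) model, the simulated data, and the percentage-difference definition $100(|FD| - |AD|)/|AD|$ are assumptions for illustration, not the paper's DGE likelihood or its exact metric.

```python
import math
import random


class Dual:
    """Minimal forward-mode (tangent) AD number: value plus directional derivative."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def dlog(x):
    """Natural log that also propagates a tangent when given a Dual."""
    if isinstance(x, Dual):
        return Dual(math.log(x.val), x.dot / x.val)
    return math.log(x)


def ar1_loglik(rho, y, sigma):
    """Exact log-likelihood of a stationary, zero-mean Gaussian AR(1) as a function of rho."""
    c = 0.5 / sigma ** 2
    ll = -0.5 * len(y) * math.log(2.0 * math.pi * sigma ** 2)
    ll = ll + 0.5 * dlog(1.0 - rho * rho) - c * (1.0 - rho * rho) * y[0] ** 2
    for t in range(1, len(y)):
        e = y[t] - rho * y[t - 1]
        ll = ll - c * (e * e)
    return ll


def ad_derivative(rho, y, sigma):
    # Seed the tangent of rho with 1 and read off dL/drho.
    return ar1_loglik(Dual(rho, 1.0), y, sigma).dot


def fd_derivative(rho, y, sigma, h):
    # Two-point centered difference: truncation error O(h^2), round-off roughly O(eps / h).
    return (ar1_loglik(rho + h, y, sigma) - ar1_loglik(rho - h, y, sigma)) / (2.0 * h)


# Hypothetical data: 200 draws from an AR(1) with rho = 0.95 and sigma = 0.11.
rng = random.Random(0)
rho_true, sigma = 0.95, 0.11
y = [sigma / math.sqrt(1.0 - rho_true ** 2) * rng.gauss(0.0, 1.0)]
for _ in range(199):
    y.append(rho_true * y[-1] + sigma * rng.gauss(0.0, 1.0))

for rho in (0.5, 0.8, 0.9):
    ad = ad_derivative(rho, y, sigma)
    for h in (1e-4, 1e-8):
        fd = fd_derivative(rho, y, sigma, h)
        print(f"rho={rho} h={h:g} AD={ad:.6f} pct diff={100.0 * (abs(fd) - abs(ad)) / abs(ad):.2e}")
```

The AD value matches the hand-derived derivative $-\rho/(1-\rho^2) + \rho y_0^2/\sigma^2 + \sum_t (y_t - \rho y_{t-1}) y_{t-1}/\sigma^2$ to rounding error, while the FD value trades truncation error at large steps against round-off at small ones. Because the AD derivative passes through zero at a maximum of the likelihood, a relative metric like this one can spike even when the absolute FD error is small.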

References

Anderson, G. (1987). A Procedure for Differentiating Perfect-Foresight-Model Reduced-Form Coefficients. Journal of Economic Dynamics and Control 11, 465–81.

Anderson, G. and G. Moore (1985). A Linear Algebraic Procedure for Solving Linear Perfect Foresight Models. Economics Letters 17, 247–52.

Bischof, C. H., H. M. Bücker, and B. Lang (2002). Automatic Differentiation for Computational Finance. In E. J. Kontoghiorghes, B. Rustem, and S. Siokos (Eds.), Computational Methods in Decision-Making. Springer.

Erceg, C. J., L. Guerrieri, and C. Gust (2005). Can Long-Run Restrictions Identify Technology Shocks? Journal of the European Economic Association 3(6), 1237–1278.

Erceg, C. J., D. W. Henderson, and A. T. Levin (2000). Optimal Monetary Policy with Staggered Wage and Price Contracts. Journal of Monetary Economics 46(2), 281–313.

Giles, M. (2007). Monte Carlo Evaluation of Sensitivities in Computational Finance. In E. A. Lipitakis (Ed.), HERCMA The 8th Hellenic European Research on Computer Mathematics and its Applications Conference.

Giles, M. and P. Glasserman (2006). Smoking Adjoints: Fast Monte Carlo Greeks. Risk Magazine 19, 88–92.

Hascoët, L. (2004). TAPENADE: a tool for Automatic Differentiation of programs. In Proceedings of 4th European Congress on Computational Methods, ECCOMAS'2004, Jyväskylä, Finland.

Hascoët, L., R.-M. Greborio, and V. Pascual (2005). Computing Adjoints by Automatic Differentiation with TAPENADE. Springer. Forthcoming.

Hascoët, L., V. Pascual, and D. Dervieux (2005). Automatic Differentiation with TAPENADE. Springer. Forthcoming.

Papadopoulo, T. and M. I. A. Lourakis (2000). Estimating the Jacobian of the Singular Value Decomposition: Theory and Applications. Lecture Notes in Computer Science 1842/2000, 554–570.

Smets, F. and R. Wouters (2003). An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area. Journal of the European Economic Association 1(5), 1123–1175.

Yun, T. (1996). Nominal Price Rigidity, Money Supply Endogeneity, and Business Cycles. Journal of Monetary Economics 37, 345–370.

Table 1: List of library functions

Blas functions:
daxpy.f, dcopy.f, ddot.f, dgemm.f, dgemv.f, dger.f, dnrm2.f, drot.f, dscal.f, dswap.f, dtrmm.f, dtrmv.f, dtrsm.f

Lapack functions:
dgebak.f, dgebal.f, dgeesx.f, dgehd2.f, dgehrd.f, dgeqp3.f, dgeqr2.f, dgeqrf.f, dgesv.f, dgetf2.f, dgetrf.f, dgetrs.f, dhseqr.f, dlacn2.f, dlacpy.f, dladiv.f, dlaexc.f, dlahqr.f, dlahr2.f, dlaln2.f, dlange.f, dlanv2.f, dlapy2.f, dlaqp2.f, dlaqps.f, dlaqr0.f, dlaqr1.f, dlaqr2.f, dlaqr3.f, dlaqr4.f, dlaqr5.f, dlarfb.f, dlarf.f, dlarfg.f, dlarft.f, dlarfx.f, dlartg.f, dlascl.f, dlaset.f, dlassq.f, dlaswp.f, dlasy2.f, dorg2r.f, dorghr.f, dorgqr.f, dorm2r.f, dormqr.f, dtrexc.f, dtrsen.f, dtrsyl.f, dtrtrs.f

Table 2: Calibration

Parameters governing households' and firms' behavior
$\beta = 0.997$: discount factor
$\psi_I = 3$: investment adjustment cost
$\tau_L = 0.28$: steady-state labor tax rate
$\tau_K = 0$: steady-state capital tax rate
$\psi_P = 0.75$: Calvo price parameter
$\psi_W = 0.75$: Calvo wage parameter
$\delta = 0.025$: depreciation rate

Monetary policy reaction function
$\gamma_\pi = 1.5$: inflation weight
$\gamma_Y = 0.5$: output weight

Exogenous processes (AR(1) coefficient, standard deviation of the innovation)
$\rho_L = 0.98$, $\sigma_L = 3.88$: labor tax rate
$\rho_K = 0.97$, $\sigma_K = 0.80$: capital tax rate
$\rho_G = 0.98$, $\sigma_G = 0.30$: government spending
$\rho_i = 0.95$, $\sigma_i = 0.11$: monetary policy
$\rho_A = 0.95$, $\sigma_A = 0.94$: technology

[Figure 1: Sampling Distribution of Parameter Estimates; the Initial Guesses Coincided with the True Values in the Data-Generating Process. Panels: $\rho_i$, $\sigma_i$, $\gamma_\pi$, $\gamma_Y$, $\psi_P$, $\psi_W$; vertical axis: percent. Legend: estimates obtained with AD derivatives; estimates obtained with FD derivatives; true parameter value.]

[Figure 2: Sampling Distribution of Parameter Estimates; the Initial Guesses Did Not Coincide with the True Values in the Data-Generating Process. Panels: $\rho_i$, $\sigma_i$, $\gamma_\pi$, $\gamma_Y$, $\psi_P$, $\psi_W$; vertical axis: percent. Legend: estimates obtained with AD derivatives; estimates obtained with FD derivatives; true parameter value.]

[Figure 3: Percentage Difference Between AD and FD Derivatives. Panels: $\rho_i$ and $\sigma_i$, each showing the percentage difference in the magnitude of AD and FD derivatives; the horizontal axis spans −1 to 1 for $\rho_i$ and 0 to 10 for $\sigma_i$.]
