feds · August 31, 2016

Tempered Particle Filtering

Abstract

The accuracy of particle filters for nonlinear state-space models crucially depends on the proposal distribution that mutates time t-1 particle values into time t values. In the widely-used bootstrap particle filter this distribution is generated by the state-transition equation. While straightforward to implement, the practical performance is often poor. We develop a self-tuning particle filter in which the proposal distribution is constructed adaptively through a sequence of Monte Carlo steps. Intuitively, we start from a measurement error distribution with an inflated variance, and then gradually reduce the variance to its nominal level in a sequence of steps that we call tempering. We show that the filter generates an unbiased and consistent approximation of the likelihood function. Holding the run time fixed, our filter is substantially more accurate in two DSGE model applications than the bootstrap particle filter.

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Tempered Particle Filtering Edward Herbst and Frank Schorfheide 2016-072 Please cite this paper as: Herbst, Edward and Frank Schorfheide (2016). “Tempered Particle Filtering,” Finance and Economics Discussion Series 2016-072. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2016.072. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Tempered Particle Filtering Edward Herbst Frank Schorfheide ∗ Federal Reserve Board University of Pennsylvania PIER, CEPR, and NBER August 25, 2016 Abstract The accuracy of particle filters for nonlinear state-space models crucially depends on the proposal distribution that mutates time t 1 particle values into time t values. − In the widely-used bootstrap particle filter this distribution is generated by the statetransition equation. While straightforward to implement, the practical performance is often poor. Wedevelopa self-tuningparticle filter inwhich the proposal distributionis constructed adaptively through a sequence of Monte Carlo steps. Intuitively, we start from a measurement error distribution with an inflated variance, and then gradually reduce the variance to its nominal level in a sequence of steps that we call tempering. We show that the filter generates an unbiased and consistent approximation of the likelihoodfunction. Holdingtheruntimefixed,ourfilterissubstantiallymoreaccurate in two DSGE model applications than the bootstrap particle filter. JEL CLASSIFICATION: C11, C15, E10 KEYWORDS:BayesianAnalysis, DSGEModels, NonlinearFiltering, MonteCarloMethods ∗Correspondence: E. Herbst: Board of Governors of the Federal Reserve System, 20th Street and Constitution Avenue N.W., Washington, D.C. 20551. Email: edward.p.herbst@frb.gov. F. Schorfheide: Department of Economics, 3718 Locust Walk, University of Pennsylvania, Philadelphia, PA 19104-6297. We thankseminarparticipantsattheBoardofGovernorsandtheFederalReserveBankofClevelandforhelpful comments. Email: schorf@ssc.upenn.edu. Schorfheide gratefully acknowledges financial support from the National Science Foundation under the grant SES 1424843. The views expressed in this paper are those of theauthorsanddonotnecessarilyreflecttheviewsoftheFederalReserveBoardofGovernorsortheFederal Reserve System.

1 1 Introduction Estimated dynamic stochastic general equilibrium (DSGE) models are now widely used by academics to conduct empirical research in macroeconomics as well as by central banks to interpret the current state of the economy, to analyze the impact of changes in monetary or fiscal policies, and to generate predictions for macroeconomic aggregates. In most applications, the estimation utilizes Bayesian techniques, which require the evaluation of the likelihood function of the DSGE model. If the model is solved with a (log)linear approximation technique and driven by Gaussian shocks, then the likelihood evaluation can be efficiently implemented with the Kalman filter. If, however, the DSGE model is solved using a nonlinear technique, the resulting state-space representation is nonlinear and the Kalman filter can no longer be used. Fern´andez-VillaverdeandRubio-Ram´ırez(2007)proposedtouseaparticlefiltertoevaluate the likelihood function of a nonlinear DSGE model and many other papers have followed this approach since. However, configuring the particle filter so that it generates an accurate approximation of the likelihood function remains a key challenge. The contribution of this paper is to propose a self-tuning particle filter, which we call a tempered particle filter, that in our applications is substantially more accurate than the widely-used bootstrap particle filter. Our starting point is the state-space representation of a potentially nonlinear DSGE model given by a measurement equation and a state-transition equation the form (cid:0) (cid:1) y = Ψ(s ,t;θ)+u , u N 0,Σ (θ) (1) t t t t u ∼ s = Φ(s ,(cid:15) ;θ), (cid:15) F ( ;θ). t t−1 t t (cid:15) ∼ · The functions Ψ(s ,t;θ) and Φ(s ,(cid:15) ;θ) are generated numerically when solving the DSGE t t−1 t model. Here y is a n 1 vector of observables, u is a n 1 vector of normally distributed t y t y × × measurement errors, and s is an n 1 vector of hidden states. In order to obtain the t s × likelihood increments p(y Y ,θ), where Y = y ,...,y , it is necessary to integrate out t+1 1:t 1:t 1 t | { } the latent states: (cid:90) (cid:90) p(y Y ,θ) = p(y s ,θ)p(s s ,θ)p(s Y ,θ)ds ds , (2) t+1 1:t t+1 t+1 t+1 t t 1:t t+1 t | | | | which can be done recursively with a filter.

2 Particle filters represent the distribution of the hidden state vector s conditional on time t t information Y = y ,...,y through a swarm of particles s j ,W j M such that 1:t { 1 t } { t t }j=1 M (cid:90) 1 (cid:88) h(s j)W j h(s )p(s Y ,θ). (3) M t t ≈ t t | 1:t j=1 The approximation here is in the sense of a strong law of large numbers (SLLN) or a central limit theorem (CLT). The approximation error vanishes as the number of particles M tends to infinity. The filter recursively generates approximations of p(s Y ,θ) for t = 1,...,T t 1:t | andproducesapproximationsofthelikelihoodincrementsp(y Y ,θ)asaby-product. There t 1:t | exists a large literature on particle filters. Surveys and tutorials are provided, for instance, by Arulampalam, Maskell, Gordon, and Clapp (2002), Capp´e, Godsill, and Moulines (2007), Doucet and Johansen (2011), Creal (2012), and Herbst and Schorfheide (2015). Textbook treatments of the statistical theory underlying particle filters can be found in Liu (2001), Capp´e, Moulines, and Ryden (2005), and Del Moral (2013). The conceptually most straightforward version of the particle filter is the bootstrap particle filter proposed by Gordon, Salmond, and Smith (1993). This filter uses the statetransition equation to turn s j particles onto s j particles, which are then reweighted based t−1 t on their success in predicting the time t observation, measured by p(y s j ,θ). While the t t | bootstrap particle filter is easy to implement, it relies on the state-space model’s ability to accurately predict y by forward simulation of the state-transition equation. In general, the t lower the average density p(y s j ,θ), the more uneven the distribution of the updated particle t t | weights, and the less accurate the approximation in (3). Ideally, the proposal distribution for s j should not just be based on the state-transition t equation p(s s ,θ), but also account for the observation y . In fact, conditional on s j t t−1 t t−1 | the optimal proposal distribution is the posterior p(s y ,s j ,θ) p(y s ,θ)p(s s j ,θ), t t t−1 t t t t−1 | ∝ | | where denotes proportionality. Unfortunately, in a generic nonlinear state-space model, it ∝ is not possible to directly sample from this distribution. Constructing an approximation for p(s y ,s j ,θ) in a generic state-space model is difficult and often involves tedious modelt t t−1 | specific calculations that have to be executed by the user of the algorithm prior to its implementation.1 Theinnovationinourpaperistogeneratethisapproximationinasequence 1Attempts include approximations based on the one-step Kalman filter updating formula applied to a

3 of Monte Carlo steps. The starting point is the observation, that the larger the measurement errorvariancethemoreaccuratethefilterbecomes, becauseholdingeverythingelseconstant, the variance of the particle weights decreases. Building on this insight, in each period t, we generate s j by forward simulation, but then update the particle weights based on a t density p (y s ,θ) with an inflated measurement error variance. In a sequence of steps that 1 t t | we call tempering, we reduce this inflated measurement error variance to its nominal level. These steps mimic a sequential Monte Carlo algorithm designed for a static parameter. Such algorithms have been successfully used to approximate posterior distributions for parameters of econometric models.2 Weshowthatourproposedtemperedparticlefilterproducesavalidapproximationofthe likelihood function and substantially reduces the Monte Carlo error relative to the bootstrap particle filter, even after controlling for computational time. Our algorithm can be embedded into particle Markov chain Monte Carlo algorithms that replace the true likelihood by a particle-filter approximation; see, for instance, Fern´andez-Villaverde and Rubio-Ram´ırez (2007) for DSGE model applications and Andrieu, Doucet, and Holenstein (2010) for the underlying statistical theory. The remainder of the paper is organized as follows. The proposed tempered particle filter is presented in Section 2. We provide a SLLN for the particle filter approximation of the likelihood function in Section 3 and show that the approximation is unbiased. Here we are focusing of a version of the filter that is non-adaptive. The filter is applied to a smallscale New Keynesian DSGE model and the Smets-Wouters model in Section 4 and Section 5 concludes. Theoretical derivations, computational details, DSGE model descriptions, and data sources are relegated to the Online Appendix. 2 The Tempered Particle Filter A key determinant of the behavior of a particle filter is the distribution of the normalized weights w˜j W j W ˜ j = t t−1 , t 1 (cid:80)M w˜j W j M i=1 t t−1 linearized version of the DSGE model. Alternatively, one could use the updating step of an approximate filter, e.g., the ones developed by Andreasen (2013) or Kollmann (2015). 2Chopin (2002) first showed how to use sequential Monte Carlo methods to conduct inference on a parameter that does not evolve over time. Applications to the estimation of DSGE model parameters have been considered in Creal (2007) and Herbst and Schorfheide (2014). Durham and Geweke (2014) and Bognanni and Herbst (2015) provide applications to the estimation of other time-series models.

4 where W j is the (normalized) weight associated with the jth particle at time t 1, w˜j t−1 t − incremental weight after observing y , and W ˜ j is the normalized weight accounting for this t t new observation.3 For the bootstrap particle filter, the incremental weight is simply the likelihood of observing y given the jth particle, p(y s j ,θ). t t t | One can show that, holding the observations fixed, the bootstrap particle filter becomes more accurate as the measurement error variance increases because the variance of the particle weights W ˜ M decreases. Consider the following stylized example which examines { t }j=1 an approximate population analogue for W ˜ j. Suppose that y is scalar, the measurement t t errors are distributed as u N(0,σ2), W = 1, and let δ = y Ψ(s ,t;θ). Moreover, t ∼ u t−1 t t − t assume that in population the δ ’s are distributed according to a N(0,1). In this case, we t can define the weights v(δ ) normalized under the population distribution of δ as (omitting t t t subscripts): (cid:110) (cid:111) v(δ) = exp −2σ 1 u 2 δ2 = (cid:18) 1+ 1 (cid:19)1/2 exp (cid:26) 1 δ2 (cid:27) . (cid:110) (cid:16) (cid:17) (cid:111) (2π)−1/2 (cid:82) exp 1 1+ 1 δ2 dδ σ u 2 −2σ u 2 −2 σ u 2 By virtue of the normalization, the mean of the weights is equal to one: E[v(δ)] = 1. The population variance of the weights v(δ) is given by (cid:90) 1 1+σ2 V[v(δ)] = v2(δ)dδ 1 = u 1. (cid:112) − σ u 2+σ u 2 − Note that V[v(δ)] as σ 0. Moreover, V[v(δ)] 0 as σ . By differentiu u −→ ∞ −→ −→ −→ ∞ ating with respect to σ one can show that the variance is monotonically decreasing in the u measurement error variance σ2. This heuristic illustrates that the larger the measurement u error variance in the state-space model (holding the observations fixed), the smaller the variance of the particle weights. Because the variance of an importance sampling approximation is an increasing function of the variance of the particle weights, increasing the measurement error variance tends to raise the accuracy of the particle filter approximation. We use this insight to construct a tempered particle filter in which we generate proposed particle values s˜j sequentially, by reducing the measurement error variance from an inflated t 3In the notation developed subsequently, the tilde on W˜j indicates that this is weight associated with t particle j before any resampling of the particles.

5 initial level Σ (θ)/φ to the nominal level Σ (θ). Formally, define u 1 u (cid:26) (cid:27) 1 p (y s ,θ) φd/2 Σ (θ) −1/2exp (y Ψ(s ,t;θ))(cid:48)φ Σ−1(θ)(y Ψ(s ,t;θ)) , (4) n t | t ∝ n | u | −2 t − t n u t − t where: φ < φ < ... < φ = 1. 1 2 N φ Here φ scales the inverse covariance matrix of the measurement error and can therefore be n interpreted as a precision parameter. The reduction of the measurement error variance is achieved by a sequence of Monte Carlo steps that we borrow from the literature of SMC approximations for posterior moments of static parameters (see Chopin (2002) and, for instance, the treatment in Herbst and Schorfheide (2015)). By construction, p (y s ,θ) = p(y s ,θ). Based on p (y s ,θ) we can define the bridge N t t t t n t t φ | | | distributions p (s y ,s ,θ) p (y s ,θ)p(s s ,θ). (5) n t t t−1 n t t t t−1 | ∝ | | Integrating out s under the distribution p(s Y ,θ) yields the bridge posterior density t−1 t−1 1:t−1 | for s conditional on the observables: t (cid:90) p (s Y ,θ) = p (s y ,s ,θ)p(s Y ,θ)ds . (6) n t 1:t n t t t−1 t−1 1:t−1 t−1 | | | In the remainder of this section we describe the proposed tempered particle filter. We do so in two steps: Section 2.1 presents the main algorithm that iterates over periods t = 1,...,T to approximate the likelihood increments p(y Y ,θ) and the filtered states p(s Y ,θ). t 1:t−1 t 1:t | | In Section 2.2 we focus on the novel component of our algorithm, which in every period t uses N steps to reduce the measurement error variance from Σ (θ)/φ to Σ (θ). φ u 1 u 2.1 The Main Iterations The tempered particle filter has the same structure as the bootstrap particle filter. In every period t, we use the state-transition equation to simulate the state vector forward, we update the particle weights, and we resample the particles. The key innovation is to start out with a fairly large measurement error variance, which is then iteratively reduced to the nominal measurement error variance Σ (θ). As the measurement error variance is reduced u

6 (tempering), we adjust the innovations to the state-transition equation as well as the particle weights. The algorithm is essentially self-tuning. The user only has to specify the overall number of particles M and two tuning parameters for the tempering steps. The tempering sequence φ ,...,φ can be chosen adaptively. We will provide more details in Section 2.2. 1 N φ Algorithm 1 summarizes the iterations over periods t = 1,...,T. Algorithm 1 (Tempered Particle Filter) 1. Period t = 0 initialization. Draw the initial particles from the distribution s j iid 0 ∼ p(s θ) and set N = 1, s j,N φ = s j, and W j,N φ = 1, j = 1,...,M. 0 φ 0 0 0 | 2. Period t Iterations. For t = 1,...,T: (a) Particle Initialization. i. Starting from s j,N φ,W j,N φ , generate (cid:15)˜j,1 F ( ;θ) and define t−1 t−1 t (cid:15) { } ∼ · s˜j,1 = Φ(s j,N φ,(cid:15)˜j,1;θ). t t−1 t ii. Compute the incremental weights: w˜j,1 = p (y s˜j,1 ,θ) (7) t 1 t t | = (2π)−d/2φ d/2 Σ (θ) −1/2 1 u | | (cid:20) (cid:26) (cid:27)(cid:21) 1 exp (cid:0) y Ψ(s˜j,1 ,t;θ) (cid:1)(cid:48) φ Σ−1(θ) (cid:0) y Ψ(s˜j,1 ,t;θ) (cid:1) . × − 2 t − t 1 u t − t iii. Normalize the incremental weights: w˜j,1 W j,N φ W ˜ j,1 = t t−1 (8) t 1 (cid:80)M w˜j,1 W j,N φ M j=1 t t−1 to obtain the particle swarm s˜j,1 ,(cid:15)˜j,1 ,s j,N φ,W ˜ j,1 , which leads to the approxt t t−1 t { } imation M (cid:90) 1 (cid:88) h ˜1 = h(s˜j,1)W ˜ j,1 h(s )p (s Y ,θ)ds . (9) t,M M t t ≈ t 1 t | 1:t t j=1 Moreover M 1 (cid:88) w˜j,1 W j,N φ p (y Y ,θ). (10) M t t−1 ≈ 1 t | 1:t−1 j=1

7 iv. Resample the particles: s˜j,1 ,(cid:15)˜j,1 ,s j,N φ,W ˜ j,1 s j,1 ,(cid:15) j,1 ,s j,N φ,W j,1 , t t t−1 t t t t−1 t { } (cid:55)→ { } where W j,1 = 1 for j = 1,...,N. This leads to the approximation t M (cid:90) 1 (cid:88) h ¯1 = h(s j,1)W j,1 h(s )p (s Y )ds . (11) t,M M t t ≈ t 1 t | 1:t t j=1 (b) Tempering Iterations: Execute Algorithm 2 to i. convert the particle swarm s j,1 ,(cid:15) j,1 ,s j,N φ,W j,1 s j,N φ,(cid:15) j,N φ,s j,N φ,W j,N φ t t t−1 t t t t−1 t { } (cid:55)→ { } to approximate M (cid:90) 1 (cid:88) h ¯N φ = h(s j,N φ)W j,N φ h(s )p(s Y ,θ)ds ; (12) t,M M t t ≈ t t | 1:t t j=1 ii. compute the approximation pˆ (y Y ,θ) of the likelihood increment. M t 1:t−1 | 3. Likelihood Approximation T (cid:89) pˆ (Y θ) = pˆ (y Y ,θ). (cid:4) (13) M 1:T M t 1:t−1 | | t=1 If we were to set φ = 1, N = 1, and omit Step 2.(b), then Algorithm 1 is exactly 1 φ identical to the bootstrap particle filter: the s j particle values are simulated forward using t−1 the state-transition equation, the weights are then updated based on how well the new state s˜j predicts the time t observations, measured by the predictive density p(y s˜j), and finally t t t | the particles are resampled using a standard resampling algorithm, such as multinominal resampling, or systematic resampling.4 The drawback of the bootstrap particle filter is that the proposal distribution for the innovation (cid:15)˜j F ( ;θ) is “blind,” in that it is not adapted to the period t observation y . t (cid:15) t ∼ · Thistypicallyleadstoalargevarianceintheincrementalweightsw˜j, whichinturntranslates t 4Detailed textbook treatments of resampling algorithms can be found in the by Liu (2001) and Capp´e, Moulines, and Ryden (2005).

8 into inaccurate Monte Carlo approximations. Taking the states s j M as given and { t−1 }j=1 assuming that a t 1 resampling step has equalized the particle weights, that is, W j = 1, t−1 − the conditionally optimal choice for the proposal distribution is p((cid:15)˜j s j ,y ,θ). However, t t−1 t | because of the nonlinearity in state-transition and measurement equation, it is not possible to directly generate draws from this distribution. The main idea of our algorithm is to sequentially adapt the proposal distribution for the innovations to the current observation y by raising φ from a small initial value to φ = 1.5 This is done in Step 2(b), which is t n N φ described in detail in Algorithm 2 in the next section. In general, we could replace the draws of (cid:15)˜j,1 from the innovation distribution F ( ;θ) in t (cid:15) · Step 2(a)i of Algorithm 2 with draws from a tailored distribution with density g1((cid:15)˜j,1 s j,N φ) t t | t−1 and then adjust the incremental weight ω˜j,1 by the ratio p ((cid:15)˜j,1)/g1((cid:15)˜j,1 s j,N φ), as it is done t (cid:15) t t t | t−1 in the generalized version of the particle filter. Here the g ( ) density might be constructed t · based on a linearized version of the DSGE model or be obtained through the updating steps ofaconventionalnonlinearfilter, suchasanextendedKalmanfilter, unscentedKalmanfilter, or a Gaussian quadrature filter. Thus, the proposed tempering steps can be used either to relieve the user from the burden of having to construct a g1((cid:15)˜j,1 s j,N φ) in the first place; or it t t | t−1 could be used to improve upon the accuracy obtained with a readily-available g1((cid:15)˜j,1 s j,N φ). t t | t−1 2.2 Tempering the Measurement Error Variance The tempering iterations build on the sequential Monte Carlo (SMC) algorithms that have been developed for static parameters. In these algorithms (see, for instance, Chopin (2002), Durham and Geweke (2014), Herbst and Schorfheide (2014, 2015)), the goal is to generate draws from a posterior distribution p(θ Y) by sampling from a sequence of bridge posteriors | p (θ Y) (cid:2) p(Y θ) (cid:3)φnp(θ). Note that the bridge posterior is equal to the actual posterior n | ∝ | for φ = 1. At each iteration, the algorithm cycles through three stages: particle weights n are updated in the correction step; the particles are being resampled and particle weights are equalized in the selection step; and particle values are changed in the mutation step. The analogue of (cid:2) p(Y θ) (cid:3)φn in our algorithm is p (y s ,θ) given in (4), which reduces to n t t | | p(y s ,θ) for φ = 1. Algorithm 2 summarizes the correction, selection, and mutation steps t t n | for tempering iterations n = 2,...,N . φ 5The number of iterations that we are using depends on the period t, but to simplify the notation somewhat, we dropped the t subscript and write N rather than N . φ φ,t

9 Algorithm 2 (Tempering Iterations) Thisalgorithmreceivesasinputtheparticleswarm s j,1 ,(cid:15) j,1 ,s j,N φ,W j,1 and returns as output the particle swarm s j,N φ,(cid:15) j,N φ,s j,N φ,W j,N φ and t t t−1 t t t t−1 t { } { } the likelihood increment pˆ (y Y ,θ). Set n = 2 and N = 0. M t 1:t−1 φ | 1. Do until n = N : φ (a) Correction: i. For j = 1,...,M define the incremental weights p (y s j,n−1 ,θ) w˜j,n(φ ) = n t | t (14) t n p (y s j,n−1 ,θ) n−1 t t | = (cid:18) φ n (cid:19)d/2 exp (cid:26) 1 (cid:2) y Ψ(s j,n−1 ,t;θ) (cid:3)(cid:48) φ − 2 t − t n−1 (cid:27) (φ φ )Σ−1 (cid:2) y Ψ(s j,n−1 ,t;θ) (cid:3) . × n − n−1 u t − t ii. Define the normalized weights w˜j,n(φ )W j,n−1 W ˜ j,n(φ ) = t n t , (15) t n 1 (cid:80)M w˜j,n(φ )W j,n−1 M j=1 t n t (W j,n−1 = 1, because the resampling step was executed in iteration n 1), t − and the inefficiency ratio M InEff(φ ) = 1 (cid:88)(cid:0) W ˜ j,n(φ ) (cid:1)2 . (16) n M t n j=1 iii. If InEff(φ = 1) r∗, then set φ = 1, N = n, and W ˜ j,n = W ˜ j,n(φ = 1). n n φ t t n ≤ Otherwise, let n = n+1, φ∗ be the solution to InEff(φ∗) = r∗, and W ˜ j,n = n n t W ˜ j,n(φ = φ∗). t n n iv. The particle swarm s j,n−1 ,(cid:15) j,n−1 ,s j,N φ,W ˜ j,n approximates t t t−1 t { } M (cid:90) 1 (cid:88) h ˜n = h(s j,n−1)W ˜ j,n h(s )p (s Y ,θ)ds . (17) t,M M t t ≈ t n t | 1:t t j=1 (b) Selection: Resample the particles: s j,n−1 ,(cid:15) j,n−1 ,s j,N φ,W ˜ j,n sˆj,n ,(cid:15)ˆj,n ,s j,N φ,W j,n , t t t−1 t t t t−1 t { } (cid:55)→ { }

10 where W j,n = 1 for j = 1,...,N. Keep track of the correct ancestry information t such that sˆj,n = Φ(s j,N φ,(cid:15)ˆj,n;θ) t t−1 t for each j. This leads to the approximation M (cid:90) 1 (cid:88) h ˆn = h(sˆj,n)W j,n h(s )p (s Y ,θ)ds . (18) t,M M t t ≈ t n t | 1:t t j=1 (c) Mutation: Use a Markov transition kernel K (s sˆ;s ) with the invariance n t t t−1 | property (cid:90) p (s y ,s ,θ) = K (s sˆ;s )p (sˆ y ,s ,θ)dsˆ (19) n t t t−1 n t t t−1 n t t t−1 t | | | to mutate the particle values (see Algorithm 3 for an implementation). This leads to the particle swarm s j,n ,(cid:15) j,n ,s j,N φ,W j,n , which approximates t t t−1 t { } M (cid:90) 1 (cid:88) h ¯n = h(s j,n)W j,n h(s )p (s Y ,θ)ds . (20) t,M M t t ≈ t n t | 1:t t j=1 2. Approximate the likelihood increment: (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) pˆ (y Y ,θ) = w˜j,n W j,n−1 (21) M t | 1:t−1 M t t n=1 j=1 with the understanding that W j,0 = W j,N φ. (cid:4) t t−1 The correction step adapts the stage n 1 particle swarm to the reduced measurement − errorvarianceinstagenbyreweightingtheparticles. Theincrementalweightsin(14)capture the change in the measurement error variance from Σ (θ)/φ to Σ (θ)/φ and yield an n n−1 n n importance sampling approximation of p (s Y ,θ) based on the stage n 1 particle values. n t 1:t | − N Rather than relying on a fixed exogenous tempering schedule φ φ , we choose φ to n n=1 n { } achieve a targeted inefficiency ratio r∗ > 1, an approach that has proven useful in the context of global optimization of nonlinear functions. Geweke and Frischknecht (2014) develop an adaptive SMC algorithm incorporating targeted tempering to solve such problems. To relate the inefficiency ratio to φ , we begin by defining n 1 e = (y Ψ(s j,n−1 ,t;θ))(cid:48)Σ−1(y Ψ(s j,n−1 ,t;θ)). j,t 2 t − t u t − t

11 Assuming that the particles were resampled in iteration n 1 and W j,n−1 = 1, we can then t − express the inefficiency ratio as InEff(φ ) = 1 (cid:88) M (cid:0) W ˜ j,n(φ ) (cid:1)2 = M 1 (cid:80)M j=1 exp[ − 2(φ n − φ n−1 )e j,t ] . (22) n M j=1 t n (cid:16) M 1 (cid:80)M j=1 exp[ − (φ n − φ n−1 )e j,t ] (cid:17)2 It is straightforward to verify that for φ = φ the inefficiency ratio InEff(φ ) = 1 < r∗. n n−1 n Moreover, we show in the Online Appendix that the function is monotonically increasing on the interval [φ ,1], which is the justification for Step 1(a)iii of Algorithm 3. Thus, we n−1 are raising φ as closely to one as we can without exceeding a user-defined bound on the n variance of the particle weights. Note that we can use the same approach to set the initial scaling factor φ in Algorithm 1. 1 The selection step is executed in every iteration n to ensure that we can find a unique φ based on (22) in the subsequent iteration. The equalization of the particle weights n+1 allows us to characterize the properties of the function InEff(φ ). Finally, in the mutation n step we are using a Markov transition kernel to change the particle values (s j,n ,(cid:15) j,n) in a t t way to maintain an approximation of p (s Y ,θ). In the absence of the mutation step the n t 1:t | initial particle values (s j,1 ,(cid:15) j,1) generated in Step 2(a) of Algorithm 2 would never change t t and we would essentially reproduce the bootstrap particle filter by computing p(y s˜j ,θ) t t | sequentially under a sequence of measurement error covariance matrices that converges to Σ (θ). The mutation can be implemented with N steps of a random walk Metropolisu MH Hastings (RWMH) algorithm. Algorithm 3 (RWMH Mutation Step) Thisalgorithmreceivesasinputtheparticleswarm sˆj,n ,(cid:15)ˆj,n ,s j,N φ,W j,n and returns as output the particle swarm s j,n ,(cid:15) j,n ,s j,N φ,W j,n . t t t−1 t t t t−1 t { } { } 1. Tuning of Proposal Distribution: Compute M M 1 (cid:88) 1 (cid:88) µ(cid:15) = (cid:15)ˆj,n W j,n , Σ(cid:15) = (cid:15)ˆj,n((cid:15)ˆj,n)(cid:48)W j,n µ(cid:15)(µ(cid:15))(cid:48). n M t t n M t t t − n n j=1 j=1 2. Execute N Metropolis-Hastings Steps for Each Particle: For j = 1,...M: MH (a) Set (cid:15)ˆj,n,0 = (cid:15)ˆj,n. Then, for l = 1,...,N : t t MH

12 i. Generate a proposed innovation: e j N (cid:0) (cid:15)ˆj,n,l−1 ,c2Σ(cid:15) (cid:1) . t ∼ t n n ii. Compute the acceptance rate: (cid:40) (cid:41) p (y e j ,s j,N φ,θ)p (e j) α(e j (cid:15)ˆj,n,l−1) = min 1, n t | t t−1 (cid:15) t . t | t p (y (cid:15)ˆj,n,l−1 ,s j,N φ,θ)p ((cid:15)ˆj,n,l−1) n t t t−1 (cid:15) t | iii. Update particle values: (cid:40) e j with prob. α(e j (cid:15)ˆj,n,l−1) (cid:15)ˆj,n,l = t t | t t (cid:15)ˆj,n,l−1 with prob. 1 α(e j (cid:15)ˆj,n,l−1) t t t − | (b) Define (cid:15) j,n = (cid:15)ˆj,n,NMH, s j,n = Φ(s j,N φ,(cid:15) j,n;θ). (cid:4) t t t t−1 t To tune the RWMH steps, we use the (cid:15)ˆj,n ,W j,n particles (this is the output from the t t { } selection step in Algorithm 2) to compute a covariance matrix for the Gaussian proposal distribution used in Step 2.(a) of Algorithm 3. We scale the covariance matrix adaptively by c to achieve a desired acceptance rate. In particular, we compute the average empirical n ˆ rejection rate R (c ), based on the Mutation phase in iteration n 1. The average is n−1 n−1 − computed across the N RWMH steps. We set c = c∗ and for n > 2 adjust the scaling MH 1 factor according to (cid:0) ˆ (cid:1) e20(x−0.40) c = c f 1 R (c ) , f(x) = 0.95+0.10 . (23) n n−1 − n−1 n−1 1+e20(x−0.40) This function is designed to increase the scaling factor by 5 percent if the acceptance rate is well above 0.40, and decrease the scaling factor by 5 percent if the acceptance rate is well below 0.40. For acceptance rates near 0.40, the increase (or decrease) of c is attenuated by n the logistic component of the function above. In our empirical applications, the performance of the filter was robust to variations on the rule. In order to run Algorithm 3 the user has to specify the initial scaling of the proposal covariance matrix c , as well as the number of Metropolis-Hastings steps. In principle, the ∗ user can also adjust the target acceptance rate and the speed of adjustment in (23).

13 3 Theoretical Properties of the Filter We will now examine asymptotic (with respect to the number of particles M) and finite sample properties of the particle filter approximation of the likelihood function. Section 3.1 providesaSLLNandSection3.2showsthatthelikelihoodapproximationisunbiased. Throughout this section, we will focus on a version of the filter that is not self-tuning. This version of the filter replaces Algorithm 2 by Algorithm 4 and Algorithm 3 by Algorithm 5: Algorithm 4 (Tempering Iterations – Not Self-Tuning) Thisalgorithmisidenticalto N Algorithm 2, with the exception that the tempering schedule φ φ is pre-determined. The n n=1 { } Do until n = N -loop is replaced by a For n = 1 to N -loop and Step 1(a)iii is eliminated. (cid:4) φ φ Algorithm 5 (RWMH Mutation Step – Not Self-Tuning) This algorithm is identical to Algorithm 3 with the exception that the sequences c ,Σ(cid:15) N φ are pre-determined. { n n} n=1 (cid:4) Extensions of the asymptotic results to self-tuning sequential Monte Carlo algorithms are discussed, for instance, in Herbst and Schorfheide (2014) and Durham and Geweke (2014). 3.1 Asymptotic Properties Under suitable regularity conditions the Monte Carlo approximations generated by a particle filter satisfy a SLLN and a CLT. Proofs for a generic particle filter are provided in Chopin (2004). We will subsequently establish a SLLN for the tempered particle filter, by modifying the recursive proof developed by Chopin (2004) to account for the tempering iterations of Algorithm 4. To simplify the notation we drop θ from the conditioning set of all densities. In this paper we are primarily interested in establishing an almost-sure limit for the Monte Carlo approximation of the likelihood function:   (cid:89) T a.s. (cid:89) T (cid:89) N φ p n (y t Y 1:t−1 ) pˆ M (Y 1:T ) = pˆ M (y t Y 1:t−1 ) p 1 (y t Y 1:t−1 ) |  = p(Y 1:T ). (24) | −→ | p (y Y ) n−1 t 1:t−1 t=1 t=1 n=2 | Here the last equality follows because p (y Y ) = p(y Y ) by definition. The limit N t 1:t−1 t 1:t−1 φ | | is obtained by letting the number of particles M . We assume that the length of the −→ ∞

14 sample T is fixed. As a by-product, we also derive an almost-sure limit for Monte Carlo approximations of moments of the filtered states: M (cid:90) (cid:90) h ¯n = 1 (cid:88) h(s j,n ,s j,N φ)W j,n a.s. h(s ,s )p (s ,s Y )ds ds . (25) t,M M t t−1 t −→ t t−1 n t t−1 | 1:t t t−1 j=1 We use h( ) to denote a generic function of both s and s for technical reasons that will t t−1 · be explained below.6 Of course, a special case is a function that is constant with respect to s . For short, we simply denote such functions by h(s ). t−1 t In order to guarantee the almost-sure convergence, we need to impose some regularity conditions on the functions h(s ,s ). We define the following classes of functions: t t−1 (cid:26) (cid:12) (cid:90) 1 = h(s ,s ) (cid:12) (cid:12) E [ h(s ,s ) ]p(s Y )ds < , (26) Ht t t−1 (cid:12) p(·|st−1) | t t−1 | t−1 | 1:t−1 t−1 ∞ (cid:20) (cid:21) δ > 0 s.t. E (cid:12) (cid:12)h(s ,s ) E [h] (cid:12) (cid:12) 1+δ < C < , ∃ p(·|st−1) t t−1 − p(·|st−1) ∞ (cid:27) E [h] N φ p(·|st−1) ∈ H t−1 and for n = 2,...,N : φ (cid:26) (cid:12) (cid:12) n = h(s ,s )(cid:12)h(s ,s ) n−1, (27) Ht t t−1 (cid:12) t t−1 ∈ Ht (cid:20) (cid:21) δ > 0 s.t. E (cid:12) (cid:12)h(s ,s ) E [h] (cid:12) (cid:12) 1+δ < C < , ∃ Kn(·|sˆt,st−1) t t−1 − Kn(·|sˆt,st−1) ∞ (cid:27) E [h(s ,s )] n−1 . Kn(·|sˆt,st−1) t t−1 ∈ Ht By definition n˜ n for n˜ > n. The classes n are chosen such that the moment bounds Ht ⊆ Ht Ht that guarantee the almost sure convergence of Monte Carlo averages of h(s j,n ,s j,N φ) are t t−1 satisfied. The key assumption here is that there exists a uniform bound for the centered 1+δ conditional moment of the function h(s ,s ) under the state-transition density p(s s ) t t−1 t t−1 | and the transition kernel of the mutation step of Algorithm 5, K (s sˆ,s ). This will allow n t t t−1 | us to apply a SLLN to the particles generated by the forward simulation of the model and the mutation step in the tempering iterations. 6Spoiler alert: we need the s because the Markov transition kernel generated by Algorithm 4 (or t−1 Algorithm 2 is invariant under the distribution p (s y ,s ), which is conditioned on s , instead of the n t t t−1 t−1 | distribution p (s Y ). n t 1:t |

15 For the class 1 to be properly defined according to (26) we need to define N φ. Let H1 H 0 = N φ and note that E [h] is a function of s only. Thus, we define H 0 H 0 p(·|s0) 0 (cid:26) (cid:12) (cid:90) (cid:27) (cid:12) = h(s ) (cid:12) h(s ) p(s )ds < . (28) 0 0 0 0 0 H (cid:12) | | ∞ Under the assumption that the initial particles are generated by iid sampling from p(s ), 0 the integrability conditions ensures that we can apply Kolmogorov’s SLLN. Throughout this paper, we use C to denote a generic constant. Notice that any bounded function h( ) < h is an element of n for all t and n. Under the assumption that the measurement | · | Ht errors have a multivariate normal distribution, the densities p (y s ) and the density ratios n t t | p (y s )/p (y s ) are bounded uniformly in s , which means that these functions are n t t n−1 t t t | | elements of all n. Ht Bychangingthedefinitionoftheclasses n andrequiringmomentsoforder2+δ toexist, Ht the subsequent theoretical results can be extended to a CLT following arguments in Chopin (2004) and Herbst and Schorfheide (2014). The CLT provides a justification for computing numerical standard errors from the variation of Monte Carlo approximations across multiple independentrunsofthefilter, buttheformulasfortheasymptoticvarianceshaveanawkward recursive form that makes it infeasible to evaluate them. Thus, they are of limited use in practice. 3.1.1 Algorithm 1 In order to prove the convergence of the Monte Carlo approximations generated in Step 2(a) of Algorithm 1 we can use well established arguments for the bootstrap particle filter, which a.s. we adapt from the presentation in Herbst and Schorfheide (2015). We use to denote −→ almost-sure convergence as M . Starting point is the following recursive assumption: −→ ∞ j,N j,N Assumption 1 The particle swarm s φ,W φ generated by the period t 1 iteration of t−1 t−1 { } − Algorithm 1 approximates: M (cid:90) h ¯N φ = 1 (cid:88) h(s j,N φ)W j,N φ a.s. h(s )p(s Y )ds . (29) t−1,M M t−1 t−1 −→ t−1 t−1 | 1:t−1 t−1 j=1 N for functions h(s ) φ . t−1 t−1 ∈ H

16 In our statement of the recursive assumption we only consider functions that vary with s , which is why we write h(s ) (instead of h(s ,s )). As discussed above, if the filter t−1 t−1 t−1 t−2 is initialized by direct sampling from p(s ), then the recursive assumption is satisfied for 0 t = 1. Conditional on the recursive assumption, we can obtain the following convergence result: Lemma 1 Suppose that Assumption 1 is satisfied. Then for h 1: ∈ Ht M (cid:90) (cid:90) h ˆ1 = 1 (cid:88) h(s˜j,1 ,s j,N φ)W j,N φ a.s. h(s ,s )p(s ,s Y )ds ds (30) t,M M t t−1 t−1 −→ t t−1 t t−1 | 1:t−1 t t−1 j=1 1 (cid:80)M h(s˜j,1 ,s j,N φ)w˜j,1 W j,N φ (cid:90) (cid:90) h ˜1 = M j=1 t t−1 t t−1 a.s. h(s ,s )p (s ,s Y )ds ds (31) t,M 1 (cid:80)M w˜j,1 W j,N φ −→ t t−1 1 t t−1 | 1:t t t−1 M j=1 t t−1 M (cid:90) (cid:90) h ¯1 = 1 (cid:88) h(s j,1 ,s j,N φ)W j,1 a.s. h(s ,s )p (s ,s Y )ds ds . (32) t,M M t t−1 t −→ t t−1 1 t t−1 | 1:t t t−1 j=1 Moreover, M (cid:90) pˆ (y Y ) = 1 (cid:88) w˜j,1 W j,N φ a.s. p (y s )p (s Y )ds . (33) 1 t | 1:t−1 M t t−1 −→ 1 t | t 1 t | 1:t−1 t j=1 A formal proof of Lemma 1 that verifies the moment conditions required for the almostsure convergence is provided in the Online Appendix. Subsequently, we provide the key steps of the argument. Because we need to keep track of joint densities (s ,s ), we define t t−1 p (y s )p(s s )p(s Y ) n t t t t−1 t−1 1:t−1 p n (s t ,s t−1 Y 1:t ) = (cid:82) (cid:2)(cid:82) | | | (cid:3) | p (y s ) p(s s )p(s Y )ds ds n t t t t−1 t−1 1:t−1 t−1 t | | | Here we used the fact that according to the state-space model the distribution of y condit tional on s does not depend on (s ,Y ). Moreover, the distribution of s conditional t t−1 1:t−1 t on s does not depend on Y . Thus, integrating with respect to s yields t−1 1:t−1 t−1 (cid:90) p (s ,s Y )ds = p (s Y ), n t t−1 1:t t−1 n t 1:t | | where p (s Y ) was previously introduced in (6). n t 1:t | The forward iteration of the state-transition equation amounts to drawing s j from the t density p(s s j,N φ). Use E [h] to denote expectations under this density and consider t | t−1 p(·|s j,Nφ) t−1

17 the decomposition: (cid:90) (cid:90) h ˆ1 h(s ,s )p(s ,s Y )ds ds (34) t,M − t t−1 t t−1 | 1:t−1 t t−1 M (cid:18) (cid:19) 1 (cid:88) = h(s˜j,1 ,s N φ ) E [h] W j,N φ M t t−1 − p(·|s j,Nφ) t−1 t−1 j=1 M (cid:18) (cid:90) (cid:90) (cid:19) 1 (cid:88) + E [h]W j,N φ h(s ,s )p(s ,s Y )ds ds M p(·|s j,Nφ) t−1 − t t−1 t t−1 | 1:t−1 t t−1 t−1 j=1 = I +II, j,N j,N say. Both terms converge to zero. First, conditional on the particles s φ,W φ the t−1 t−1 { } j,N weights W φ are known and term I is an average of mean-zero random variables that are t−1 independently distributed. Second, the definition of 1 implies that E [h] N φ . Ht p(·|s j,Nφ) ∈ H t−1 t−1 Thus, we can deduce from Assumption 1 that term II converges to zero. This delivers (30). In slight abuse of notation we can now set h( ) to either h(s )p (y s ) or p (y s ) to deduce t 1 t t 1 t t · | | the convergence result required to justify the approximations in (31) and (33). Finally, the SLLN is preserved by the resampling step, which delivers (32). 3.1.2 Algorithm 4 The convergence results for the tempering iterations rely on the following recursive assumption, which according to Lemma 1 is satisfied for n = 2. Assumption 2 The particle swarm s j,n−1 ,s j,N φ,W j,n−1 generated by iteration n 1 of t t−1 t { } − Algorithm 4 approximates: M (cid:90) (cid:90) h ¯n−1 = 1 (cid:88) h(s j,n−1 ,s j,N φ)W j,n−1 a.s. h(s ,s )p (s ,s Y )ds ds . (35) t,M M t t−1 t −→ t t−1 n−1 t t−1 | 1:t t t−1 j=1 for functions h n. ∈ Ht The convergence results are stated in the following Lemma:

18 Lemma 2 Suppose that Assumption 2 is satisfied. Then for h n−1: t ∈ H 1 (cid:80)M h(s j,n−1 ,s j,N φ)w˜j,n W j,n−1 h ˜n = M j=1 t t−1 t t (36) t,M 1 (cid:80)M w˜j,n W j,n−1 M j=1 t t (cid:90) (cid:90) a.s. h(s ,s )p (s ,s Y )ds ds t t−1 n t t−1 1:t t t−1 −→ | M (cid:90) (cid:90) h ˆn = 1 (cid:88) h(sˆj,n ,s j,N φ)W j,n a.s. h(s ,s )p (s ,s Y )ds ds . (37) t,M M t t−1 t −→ t t−1 n t t−1 | 1:t t t−1 j=1 Moreover, (cid:18) (cid:92) (cid:19) M p n (y t | Y 1:t−1 ) = 1 (cid:88) w˜j,n W j,n−1 a.s. p n (y t | Y 1:t−1 ) (38) p (y Y ) M t t −→ p (y Y ) n−1 t 1:t−1 n−1 t 1:t−1 | j=1 | and for h n ∈ Ht M (cid:90) (cid:90) h ¯n = 1 (cid:88) h(s j,n ,s j,N φ)W j,n a.s. h(s ,s )p (s ,s Y )ds ds . (39) t,M M t t−1 t −→ t t−1 n t t−1 | 1:t t t−1 j=1 The convergence in (39) implies that the recursive Assumption 2 is satisfied for iteration n + 1 of Algorithm 4. Thus, we deduce that the convergence in (39) holds for n = N . φ This, in turn, implies that if the recursive Assumption 2 for Algorithm 1 is satisfied at the beginning of period t it will also be satisfied at the beginning of period t+1. A formal proof of Lemma 2 is provided in the Online Appendix. We will provide an outline of the argument below. Correction and Selection Steps. The convergence of the approximations in (36) and (38), obtained after executing the correction step, follows from the recursive Assumption 2 and the fact that h(s j,n−1 ,s j,N φ) n−1 and h(s j,n−1 ,s j,N φ)w˜j,n n−1. Furthermore, it t t−1 t t t−1 t t ∈ H ∈ H relies on the following calculation: (cid:82) (cid:82) h(s ,s ) pn(yt|st) p (s ,s Y )ds ds t t−1 pn−1(yt|st) n−1 t t−1 | 1:t t t−1 (40) (cid:82) pn(yt|st) p (s Y )ds pn−1(yt|st) n−1 t | 1:t t (cid:82) (cid:82) h(s ,s ) pn(yt|st) pn−1(yt|st)p(st,st−1|Y1:t−1) ds ds = t t−1 pn−1(yt|st) pn−1(yt|Y1:t−1) t t−1 (cid:82) pn(yt|st) pn−1(yt|st)p(st|Y1:t−1) ds pn−1(yt|st) pn−1(yt|Y1:t−1) t (cid:82) (cid:82) h(s ,s )p (y s )p(s ,s Y )ds ds t t−1 n t t t t−1 1:t−1 t t−1 = (cid:82) | | p (y s )p(s Y )ds n t t t 1:t−1 t | | (cid:90) (cid:90) = h(s ,s )p (s ,s Y )ds ds . t t−1 n t t−1 1:t t t−1 |

19 The first equality is obtained by reversing Bayes Theorem and expressing the “posterior” p (s ,s Y ) as the product of “likelihood” p (y s ) and “prior” p(s ,s Y ) din−1 t t−1 1:t n−1 t t t t−1 1:t−1 | | | vided by the “marginal likelihood” p (y Y ). We then cancel the p (y s ) and the n−1 t 1:t−1 n−1 t t | | marginal likelihood terms to obtain the second equality. Finally, an application of Bayes Theorem leads to the third equality. Recall that p (y Y ) = p(y Y ) by construction and that an approximation of N t 1:t−1 t 1:t−1 φ | | p (y Y ) is generated in Step 2.(a)iii of Algorithm 1. Together, this leads to the approxi- 1 t 1:t−1 | mation of the likelihood increment p(y Y ) in (21). Resampling after the correction step t 1:t−1 | preserves the SLLN, which delivers (37). Mutation Step. We now outline how to establish (39). Let (cid:90) E [h] = h(s ,s )K (s sˆ;s )ds , Kn(·|sˆt;st−1) t t−1 n t | t t−1 t which is a function of (sˆ,s ). We can decompose the Monte Carlo approximation from t t−1 the mutation step as follows: M (cid:90) (cid:90) 1 (cid:88) h(s j,n ,s j,N φ)W j,n h(s ,s )p (s ,s Y )ds ds (41) M t t−1 t − t t−1 n t t−1 | 1:t t t−1 j=1 M (cid:18) (cid:19) 1 (cid:88) = h(s j,n ,s j,N φ) E [h] W j,n M t t−1 − Kn(·|sˆj t ,n;s j t− ,N 1 φ) t j=1 M (cid:18) (cid:90) (cid:90) (cid:19) 1 (cid:88) + E [h] h(s ,s )p (s ,s Y )ds ds W j,n M Kn(·|sˆj t ,n;s j t− ,N 1 φ) − t t−1 n t t−1 | 1:t t t−1 t j=1 = I +II, say. By construction, conditional on the particles sˆj,n ,s j,N φ,W j,n term I is an average of indet t−1 t { } pendent mean-zero random variables, which converges to zero. The analysis of term II is more involved for two reasons. First, as highlighted above, E [h] is a function not only of sˆ but also of s . Second, while the invariance Kn(·|sˆj t ,n;s j t− ,N 1 φ) t t−1

20 property (19) implies that (cid:90) E [h]p (sˆ y ,s )dsˆ (42) Kn(·|sˆt;st−1) n t | t t−1 t (cid:90) (cid:18)(cid:90) (cid:19) = h(s ,s )K (s sˆ;s )ds p (sˆ y ,s )dsˆ t t−1 n t t t−1 t n t t t−1 t | | (cid:90) (cid:18)(cid:90) (cid:19) = h(s ,s ) K (s sˆ;s )p (sˆ y ,s )dsˆ ds t t−1 n t t t−1 n t t t−1 t t | | (cid:90) = h(s ,s )p (s y ,s )ds , t t−1 n t t t−1 t | the summation over (sˆj,n ,s j,N φ,W j,n) generates an integral with respect to p (s ,s Y ) t t−1 t n t t−1 1:t | instead of p (s y ,s ); see (37). n t t t−1 | To obtain the expected value of E [h] under the distribution p (sˆ,s Y ), Kn(·|sˆt;st−1) n t t−1 | 1:t notice that p (s ,s Y ) = p (s ,s y ,Y ) (43) n t t−1 1:t n t t−1 t 1:t−1 | | = p (s s ,y ,Y )p(s y ,Y ) n t t−1 t 1:t−1 t−1 t 1:t−1 | | = p (s s ,y )p(s y ,Y ). n t t−1 t t−1 t 1:t−1 | | The last equality holds because, using the first-order Markov structure of the state-space model, we can write p (y s ,s ,Y )p(s s ,Y ) n t t t−1 1:t−1 t t−1 1:t−1 p n (s t y t ,s t−1 ,Y 1:t−1 ) = (cid:82) | | | p (y s ,s ,Y )p(s s ,Y )ds st n t | t t−1 1:t−1 t | t−1 1:t−1 t p (y s )p(s s ) n t t t t−1 = (cid:82) | | p (y s )p(s s )ds st n t | t t | t−1 t = p (s y ,s ). n t t t−1 | Therefore, we obtain (cid:90) (cid:90) E [h]p (sˆ,s Y )dsˆds (44) Kn(·|sˆt;st−1) n t t−1 | 1:t t t−1 (cid:90) (cid:18)(cid:90) (cid:19) = E [h]p (sˆ y ,s )dsˆ p (s y ,Y )ds Kn(·|sˆt;st−1) n t | t t−1 t n t−1 | t 1:t−1 t−1 (cid:90) (cid:18)(cid:90) (cid:19) = h(s ,s )p (s y ,s )ds p (s y ,Y )ds t t−1 n t t t−1 t n t−1 t 1:t−1 t−1 | | (cid:90) (cid:90) = h(s ,s )p (s ,s Y )ds ds . t t−1 n t t−1 1:t t t−1 |

21 The first equality uses (43). The second equality follows from the invariance property (42) and for the third equality we used (43) again. Thus, under suitable regularity conditions, term II also converges to zero almost surely, which leads to the convergence in (39). We can deduce from Lemmas 1 and 2 that we obtain almost-sure approximations of the likelihood increment for every period t = 1,...,T. Because T is fixed and p (y Y ) = N t 1:t−1 φ | p(y Y ), we obtain the following Theorem: t 1:t−1 | Theorem 1 Consider the nonlinear state-space model (1) with Gaussian measurement errors. Suppose that the initial particles are generated by iid sampling from p(s ). Then the 0 Monte Carlo approximation of the likelihood function generated by Algorithms 1, 4, 5 is consistent:   (cid:89) T a.s. (cid:89) T (cid:89) N φ p n (y t Y 1:t−1 ) pˆ M (Y 1:T ) = pˆ M (y t Y 1:t−1 ) p 1 (y t Y 1:t−1 ) |  = p(Y 1:T ). (45) | −→ | p (y Y ) n−1 t 1:t−1 t=1 t=1 n=2 | 3.2 Unbiasedness Particle filter approximations of the likelihood function are often embedded into posterior samplers for the parameter vector θ, e.g., a Metropolis-Hastings algorithm or a sequential Monte Carlo algorithm; see Herbst and Schorfheide (2015) for a discussion and further references in the context of DSGE models. A necessary condition for the convergence of the posterior sampler is that the likelihood approximation of the particle filter is unbiased. Theorem 2 Suppose that the tempering schedule is deterministic and that the number of stages N is the same for each time period t 1. Then, the particle filter approximation of φ ≥ the likelihood generated by Algorithm 1 is unbiased:    E(cid:2) pˆ M (Y 1:T | θ) (cid:3) = E  (cid:89) T  (cid:89) N φ (cid:32) M 1 (cid:88) M w˜ t j,n W t j,n−1 (cid:33)  = p(Y 1:T | θ). (46) t=1 n=1 j=1 A proof of Theorem 2 unbiasedness is provided in the Online Appendix. Our proof exploits the recursive structure of the algorithm and extends the proof by Pitt, Silva, Giordani, and Kohn (2012) to account for the tempering iterations.

22 4 DSGE Model Applications In this section, we assess the performance of the tempered particle filter (TPF) and the bootstrap particle filter (BSPF). The principle point of comparison is the accuracy of the approximation of the likelihood function, though we will also assess each filter’s ability to track the filtered states. We consider two models in the subsequent analysis. The first is a small-scale New Keynesian DSGE model that comprises a consumption Euler equation, a New Keynesian Phillips curve, a monetary policy rule, and three exogenous shock processes. The second model is the medium-scale DSGE model by Smets and Wouters (2007), which is the core of many of the models that are being used in academia and at central banks. While the exposition of the algorithms in this paper focuses on the nonlinear state-space model (1), the numerical illustrations are based on linearized versions of the DSGE models. Linearized DSGE models (with normally distributed innovations) lead to a linear Gaussian state-space representation. This allows us to use the Kalman filter to compute the exact values of the likelihood function p(Y θ) and the filtered states E[s Y ,θ]. 1:T t 1:t | | We assess the accuracy of the particle filter approximations by running the filters repeatedly and studying the sampling distribution of their output across independent runs. To evaluate the accuracy of pˆ (Y θ) we consider two statistics. The first is the log likelihood M 1:T | approximation error, ˆ ∆ = lnpˆ (Y θ) lnp(Y θ). (47) 1 M 1:T 1:T | − | Because the particle filter approximation of the likelihood function is unbiased (see Theorem 2), Jensen’s inequality applied to the concave logarithmic transformation implies that the expected value of ∆ ˆ is negative. Second, we consider the following statistic:7 1 pˆ (Y θ) ˆ M 1:T ∆ = | 1 = exp[lnpˆ (Y θ) lnp(Y θ)] 1. (48) 2 M 1:T 1:T p(Y θ) − | − | − 1:T | ˆ The computation of ∆ requires us to exponentiate the difference in log-likelihood values, 2 which is feasible if the particle filter approximation is reasonably accurate. The unbiasedness ˆ result implies that the sampling mean of ∆ should be close to zero. 2 In our experiments, we run the filters N = 100 times and examine the sampling run ˆ ˆ properties of the discrepancies ∆ and ∆ . Because there is always a trade-off between 1 2 accuracy and speed, we also assess the run-time of the filters. The run-time of any particle 7Assessing the bias of the likelihood function pˆ (Y θ) directly, is numerically challenging, because M 1:T | exponentiating a log-likelihood value of around 300 leads to a missing value using standard software. −

23 filter is sensitive to the exact computing environment used. Thus, we provide details about the implementation in the Online Appendix. In this regard, it is important to note that the tempered particle filter is designed to work with a small number of particles (i.e., on a desktop computer.) Therefore we restrict the computing environment to a single machine and we do not try to leverage large-scale parallelism via a computing cluster, as in Gust, Herbst, Lopez-Salido, and Smith (2016). Results for the small-scale New Keynesian DSGE model are presented in Section 4.1. In Section 4.2 the tempered particle filter is applied to the Smets-Wouters model. 4.1 A Small Scale DSGE Model We first use the BSPF and the TPF to evaluate the likelihood function associated with the small-scale New Keynesian DSGE model used in Herbst and Schorfheide (2015). The details about the model can be found in the Online Appendix. From the perspective of the particle filter, the key feature of the model is that it has three observables (output growth, inflation, and the federal funds rate). To facilitate the use of particle filters, we augment the measurement equation of the DSGE model by independent measurement errors, whose standard deviations we set to be 20% of the standard deviation of the observables.8 Great Moderation Sample. The data span 1983Q1 to 2002Q4, for a total of 80 observations for each series. We assess the performance of the particle filters for two parameter vectors, which are denoted by θm and θl and tabulated in Table 1. The value θm is chosen as a high likelihood point, close the posterior mode of the model. The log likelihood at θm is lnp(Y θm) = 306.49. The second parameter value, θl, is chosen to be associated with a | − lower log-likelihood value. Based on our choice, lnp(Y θl) = 313.36. The sample and the | − parameter values are identical to those used in Chapter 8 of Herbst and Schorfheide (2015). We compare the BSPF with two variants of the TPF which differ with respect to the targeted inefficiency ratio: r∗ = 2 and r∗ = 3. For the BSPF we use M = 40,000 particles and for the TPF we consider M = 4,000 and M = 40,000 particles, respectively. In Algorithm 3 we use N = 1 Metropolis-Hastings steps and set the initial scale of the MH proposal covariance matrix to c = 0.3. ∗ ˆ Figure 1 displays density estimates for the sampling distribution of ∆ associated with 1 each particle filter for θ = θm (left panel) and θ = θl (right panel). For θ = θm, the 8Themeasurementerrorstandarddeviationsare0.1160foroutputgrowth,0.2942forinflation,and0.4476 for the interest rates.

24 Table 1: Small-Scale Model: Parameter Values Parameter θm θl Parameter θm θl τ 2.09 3.26 κ 0.98 0.89 ψ 2.25 1.88 ψ 0.65 0.53 1 2 ρ 0.81 0.76 ρ 0.98 0.98 r g ρ 0.93 0.89 r(A) 0.34 0.19 z π(A) 3.16 3.29 γ(Q) 0.51 0.73 σ 0.19 0.20 σ 0.65 0.58 r g σ 0.24 0.29 lnp(Y θ) -306.5 -313.4 z | Figure 1: Small-Scale Model: Distribution of Log-Likelihood Approximation Errors θ = θm θ = θl 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 10 8 6 4 2 0 2 4 − − − − − ytisneD 0.6 0.5 TPF(r∗=2),M=40000 0.4 TPF(r∗=3),M=40000 0.3 TPF(r∗=2),M=4000 0.2 TPF(r∗=3),M=4000 0.1 BSPF,M=40000 0.0 15 10 5 0 − − − ytisneD TPF(r∗=2)M=40000 TPF(r∗=3),M=40000 TPF(r∗=2),M=4000 TPF(r∗=3),M=4000 BSPF,M=40000 Notes: Density estimate of ∆ ˆ = lnpˆ(Y θm) lnp(Y θm) based on N = 100 runs of 1 1:T 1:T run | − | the particle filter. TPF(r∗ = 2) with M = 40,000 (the green line) is the most accurate of all the filters ˆ ˆ considered, with ∆ distributed tightly around zero. The distribution of ∆ associated with 1 1 TPF(r∗ = 3) with M = 40,000 is slightly more disperse, with a larger left tail, as the higher tolerance for particle inefficiency translates into a higher variance for the likelihood estimate. Reducing the number of particles to M = 4,000 for both of these filters, results in a higher variance estimate of the likelihood. The most poorly performing TPF (with r∗ = 3 and ˆ M = 4,000) is associated with a distribution for ∆ that is similar to the one associated 1 with the BSPF which uses M = 40,000. Overall, the TPF compares favorably with the BSPF when θ = θm. The performance differences become even more stark when we consider θ = θl; depicted

25 Table 2: Small-Scale Model: PF Summary Statistics BSPF TPF Number of Particles M 40,000 4,000 4,000 40,000 40,000 Target Ineff. Ratio r∗ 2 3 2 3 High Posterior Density: θ = θm ˆ Bias ∆ -1.44 -0.88 -1.53 -0.31 -0.05 1 ˆ StdD ∆ 1.92 1.36 1.69 0.44 0.60 1 ˆ Bias ∆ -0.11 0.10 -0.37 0.05 -0.12 2 T−1(cid:80)T N 1.00 4.31 3.24 4.31 3.23 t=1 φ,t Average Run Time (s) 0.81 0.43 0.34 3.98 3.30 Low Posterior Density: θ = θl ˆ Bias ∆ -6.52 -2.05 -3.12 -0.32 -0.64 1 ˆ StdD ∆ 5.25 2.10 2.58 0.75 0.98 1 ˆ Bias ∆ 2.97 0.36 0.71 -.004 -0.11 2 T−1(cid:80)T N 1.00 4.36 3.29 4.35 3.28 t=1 φ,t Average Run Time (s) 1.56 0.41 0.33 3.66 2.87 Notes: The results are based on N = 100 independent runs of the particle filters. The run ˆ ˆ likelihood discrepancies ∆ and ∆ are defined in (47) and (48). 1 2 in the right panel of Figure 1. While the sampling distributions indicate that the likelihood estimates are less accurate for all the particles filters, the BSPF deteriorates by the largest amount. The TPF, by targeting an inefficiency ratio, adaptively adjust to account for the for relatively worse fit of θl. The results are also born out in Table 2, which displays summary statistics for the two types of likelihood approximation errors as well as information about the average number of ˆ stages and run time of each filter. The results for ∆ convey essentially the same story as 1 ˆ Figure 1. The bias associated with ∆ highlights the performance deterioration associated 2 with the BSPF when considering θ = θl. The bias of almost 3 is substantially larger than for any of the TPFs. The row labeled T−1(cid:80)T N shows the average number of tempering iterations assot=1 φ,t ciated with each particle filter. The BSPF has, by construction, always an average of one. When r∗ = 2, the TPFs use about 4 stages per time period. With a higher tolerance for inefficiency, when r∗ = 3, that number falls to just above 3. Note that when considering θl, the TPF always uses a greater number of stages, reflecting the relatively worse fit of the model under θ = θl compared to θ = θm. Table 2 also displays the average run time of each

26 Figure 2: Small-Scale Model: Accuracy of Filtered State 3.5 3.0 2.5 2.0 1.5 1.0 0.5 BSPF, M=40k TPF(r*=2), M=4k TPF(r*=2), M=40k TPF(r*=3), M=4k TPF(r*=3), M=40k 0.0 1984 1989 1994 1999 ˆ Notes: ThefiguredepictsRMSEsassociatedwithE[gˆ Y ]. ResultsarebasedonN = 100 t 1:t run | independent runs of the particle filters. filter (in seconds).9 When using the same number of particles, the BSPF runs much more quickly than the TPFs, reflecting the fact that the additional tempering iterations require many more likelihood evaluations, in addition to the computational costs associated with the mutation phase. For a given level of accuracy, however, the TPF requires many fewer particles. For instance, using M = 4,000, the TPF yields more precise likelihood estimates than the BSPF using M = 40,000 and takes about half as much time to run. Finally, we consider the accuracy of the filtered state estimates. We consider the latent government spending shock as a prototypical hidden state. Using the Kalman filter we can compute E[gˆ Y ], which we compare to the particle filter approximation, denoted by t 1:T | Eˆ [gˆ Y ]. Figure 2 plots root-mean-squared errors (RMSEs) for Eˆ [gˆ Y ]. The ranking of t 1:T t 1:T | | the filters is consistent with the ranking based on the accuracy of the likelihood approximations. The BSPF performs the worst. Using the TPF with M = 40,000 particles reduces the RMSE roughly by a factor of three. Great Recession Sample. It is well known that the BSPF is very sensitive to outliers. To examine the extent to which this is also true for the tempered particle filter, we re-run the above experiments on the sample 2003Q1 to 2013Q4. This period includes the Great Recession, which was a large outlier from the perspective of the small-scale DSGE model 9Theruntimesarestochastic. Inprinciple,thereshouldbenodifferenceinrunningtheBSPFonthetwo parameters. However, in practice, there were a few runs that took significantly longer, which contaminated the average reported in the table.

27 Figure 3: Small-Scale Model: Distribution of Log-Likelihood Approximation Errors, Great Recession Sample θ = θm θ = θl 0.30 0.25 0.20 0.15 0.10 0.05 0.00 350 300 250 200 150 100 50 0 − − − − − − − ytisneD 0.25 TPF(r∗=2),M=40000 0.20 0.15 TPF(r∗=3),M=40000 0.10 TPF(r∗=2),M=4000 0.05 TPF(r∗=3),M=4000 BSPF,M=40000 0.00 350 300 250 200 150 100 50 0 − − − − − − − ytisneD TPF(r∗=2),M=40000 TPF(r∗=3),M=40000 TPF(r∗=2),M=4000 TPF(r∗=3),M=4000 BSPF,M=40000 Notes: Density estimate of ∆ ˆ = lnpˆ(Y θm) lnp(Y θm) based on N = 100 runs of 1 1:T 1:T run | − | the particle filters. (and most other econometric models). Figure 3 plots the density of the approximation errors of the log likelihood estimates associated with each of the filters. The difference in the distribution of approximation errors between the BSPF and the TPFs is massive. For θ = θm and θ = θl, the approximation errors associated with the BSPF are concentrated in the range of -200 to -300, almost two orders of magnitude larger than the errors associated with the TPFs. This happens because the large drop in output in 2008Q4 is not predicted by the forward simulation in the BSPF. In turn, the filter collapses, in the sense that the likelihood increment in that quarter is estimated using essentially only one particle. Table 3 tabulates the results for each of the filters. Consistent with Figure 3 the bias associated with the log likelihood estimate is 215 and 279 for θ = θm and θ = θl, − − respectively, compared to about 8 and 10 for the worst performing TPF. For θ = θm, − − the TPF(r∗ = 2) with M = 40,000 has a bias only of 2.8 with a standard deviation of − 1.5, which is about 25 times smaller than the BSPF. It is true that this variant of the filter takes about 6 times longer to run than the BSPF, but even when considering M = 4,000 particles, the TPF estimates are still overwhelmingly more accurate – and are computed more quickly – than the BSPF estimates. A key driver of this result is the adaptive nature of the tempered particle filter. While the average number of stages used is about 5 for r∗ = 2 and 4 for r∗ = 3, for t = 2008Q4 – the period with the largest outlier – the tempered particle

28 Table 3: Small-Scale Model: PF Summary Statistics – The Great Recession BSPF TPF Number of Particles M 40,000 4,000 4,000 40,000 40,000 Target Ineff. Ratio r∗ 2 3 2 3 High Posterior Density: θ = θm ˆ Bias ∆ -215.63 -5.93 -7.91 -2.84 -4.27 1 ˆ StdD ∆ 36.74 3.01 3.36 1.55 1.80 1 ˆ Bias ∆ -1.00 -0.85 -0.95 -0.71 -0.91 2 T−1(cid:80)T N 1.00 5.12 3.87 5.08 3.86 t=1 φ,t Average Run Time (s) 0.38 0.28 0.18 2.28 2.09 Low Posterior Density: θ = θl ˆ Bias ∆ -279.12 -7.26 -9.98 -3.81 -5.82 1 ˆ StdD ∆ 41.74 3.44 4.22 1.68 2.15 1 ˆ Bias ∆ -1.00 -0.89 -0.99 -0.86 -0.98 2 T−1(cid:80)T N 1.00 5.36 4.04 5.33 4.03 t=1 φ,t Average Run Time (s) 0.37 0.29 0.23 2.40 2.10 Notes: The results are based on N = 100 independent runs of the particle filters. The run ˆ ˆ likelihood discrepancies ∆ and ∆ are defined in (47) and (48). 1 2 Figure 4: Small-Scale Model: BSPF versus TPF in 2008Q4 BSPF TPF 3.5 3.5 Forecast Density 3.0 BSPF Filtered Density 3.0 True Filtered Density 2.5 2.5 2.0 2.0 1.5 1.0 1.5 0.5 1.0 0.0 1.0 0 0 . . 0 5 4 3 2 1 0 1 2 0.8 φ 0 n .6 0.4 0.2 0.0 4 − 3 − 2 − 1 0 1 2 − Notes: Left panel: forecast density pˆ(s Y ), BSPF filtered density pˆ(s Y ), and true t 1:t−1 t 1:t | | filtered density p(s Y ) where s equals model-implied output growth and t = 2008Q4. t t−1 t | Right panel: forecast density pˆ(s Y ) (blue), waterfall plot of density estimates pˆ (s Y ) t 1:t−1 n t 1:t | | for n = 1,...,N , and true filtered density p(s Y ) (red). φ t 1:t | filter uses about 15 stages, on average.

29 Figure4providesanillustrationofwhytheTPFprovidesmuchmoreaccurateapproximationsthantheBSPF.Wefocusononeparticularstate, namelymodel-impliedoutputgrowth, whichisobservedoutputgrowthminusitsmeasurementerror. Wefocusont = 2008Q4. The left panel depicts the BSPF approximations pˆ(s Y ) and pˆ(s Y ) as well as the “true” t 1:t−1 t 1:t | | densityp(s Y ). TheBSPFessentiallygeneratesdrawsfromtheforecastdensitypˆ(s Y ) t 1:t t 1:t−1 | | and reweights them to approximate the density p(s Y ). In 2008Q4, these densities have t 1:t | very little overlap. This implies that essentially one draw from the forecast density receives all the weight and the BSPF filtered density is a point mass. This point mass provides a poor approximation of p(s Y ). t 1:t | The right panel of Figure 4 displays a waterfall plot of density estimates pˆ (s Y ) for n t 1:t | n = 1,...,N = 15. The densities are placed on the y-axis at the corresponding value of φ φ . The first iteration in the tempering phase has φ = 0.002951, which corresponds to an n 1 inflation of the measurement error variance by a factor over 300. This density looks similar to the predictive distribution p(s Y ), with a 1-step-ahead prediction for output growth t 1:t−1 | of about 1% (in quarterly terms). As we move through the iterations, φ increases slowly n − at first and pˆ (s Y ) gradually adds more density where s 2.5. Each correction step of n t 1:t t | ≈ − Algorithm 2 requires only modest reweighting of the particles and the mutation steps refresh the particle values. The filter begins to tolerate relatively large changes from φ to φ , n n+1 as more particles lie in this region, needing only three stages to move from φ 0.29 to n ≈ φ = 1. Alongside pˆ (s Y ) we also show the true filtered density in red, obtained from N N t 1:t φ | the Kalman filter recursions. The TPF approximation at n = N matches the true density φ extremely well. 4.2 The Smets-Wouters Model We next assess the performance of the tempered particle filter for the Smets and Wouters (2007), henceforth SW, model. This model forms the core of the latest vintage of DSGE models. While we leave the details of the model to the Online Appendix, it is important to note that the SW model is estimated over the period 1966Q1 to 2004Q4 using seven observables: the real per capita growth rates of output, consumption, investment, wages; hours worked, inflation, and the federal funds rate. Moreover, the SW model has a highdimensional state space s with more than a dozen state variables. The performance of the t BSPF deteriorates quickly due to the increased state space and the fact that it is much more difficult to predict seven observables than it is to predict three observables with a DSGE

30 Table 4: SW Model: Parameter Values θm θl θm θl ˜ β 0.159 0.182 π¯ 0.774 0.571 ¯ l 1.078 0.019 α 0.181 0.230 − σ 1.016 1.166 Φ 1.342 1.455 ϕ 6.625 4.065 h 0.597 0.511 ξ 0.752 0.647 σ 2.736 1.217 w l ξ 0.861 0.807 ι 0.259 0.452 p w ι 0.463 0.494 ψ 0.837 0.828 p r 1.769 1.827 ρ 0.855 0.836 π r 0.090 0.069 r 0.168 0.156 y ∆y ρ 0.982 0.962 ρ 0.868 0.849 a b ρ 0.962 0.947 ρ 0.702 0.723 g i ρ 0.414 0.497 ρ 0.782 0.831 r p ρ 0.971 0.968 ρ 0.450 0.565 w ga µ 0.673 0.741 µ 0.892 0.871 p w σ 0.375 0.418 σ 0.073 0.075 a b σ 0.428 0.444 σ 0.350 0.358 g i σ 0.144 0.131 σ 0.101 0.117 r p σ 0.311 0.382 lnp(Y θ) 943.0 956.1 w | − − Notes: β ˜ = 100(β−1 1). − model. As a consequence, the estimation of nonlinear variants of the SW model has proven extremely difficult. We compute the particle filter approximations conditional on two sets of parameter values, θm and θl, which are summarized in Table 4. θm is the parameter vector associated with the highest likelihood value among the draws that we generated with a posterior sampler. θl is a parameter vector that attains a lower likelihood value. The log-likelihood difference between the two parameter vectors is approximately 13. The standard deviations of the measurement errors are chosen to be approximately 20% of the sample standard deviation of the time series.10 For comparison purposes, the parameter values and the data are identical to the ones used in Chapter 8 of Herbst and Schorfheide (2015). We run each filter N = 100 times. run Figure 5 displays density estimates of the approximation errors associated with the log 10The standard deviations for the measurement errors are: 0.1731 (output growth), 0.1394 (consumption growth), 0.4515 (investment growth), 0.1128 (wage growth), 0.5838 (log hours), 0.1230 (inflation), 0.1653 (interest rates).

31 Figure 5: Smets-Wouters Model: Distribution of Log-Likelihood Approximation Errors θ = θm θ = θl 0.018 0.016 0.014 0.012 0.010 0.008 0.006 0.004 0.002 0.000 700 600 500 400 300 200 100 0 100 − − − − − − − ytisneD 0.016 TPF(r∗=2),M=40000 0.014 TPF(r∗=3),M=40000 0.012 0.010 TPF(r∗=2),M=4000 0.008 TPF(r∗=3),M=4000 0.006 0.004 BSPF,M=40000 0.002 0.000 600 500 400 300 200 100 0 100 200 − − − − − − ytisneD TPF(r∗=2),M=40000 TPF(r∗=3),M=40000 TPF(r∗=2),M=4000 TPF(r∗=3),M=4000 BSPF,M=40000 Notes: Density estimate of ∆ ˆ = lnpˆ(Y θm) lnp(Y θm) based on N = 100 runs of 1 1:T 1:T run | − | the particle filters. likelihood estimates under θ = θm and θ = θl. We use M = 40,000 particles for the BSPF. For the TPF we use M = 4,000 or M = 40,000 and consider r∗ = 2 and r∗ = 3. Moreover, in the mutation step (Algorithm 3) we set N = 1 and c = 0.3. Under both parameter MH ∗ values, the BSPF exhibits the most bias, with its likelihood estimates substantially below the true likelihood value. The distribution of the bias falls mainly between -400 and -100. This means that eliciting the posterior distribution of the SW model using, for example, a particle Markov chain Monte Carlo algorithm with likelihood estimates from the bootstrap particle filter would be nearly impossible. The TPFs perform better, although they also underestimate the likelihood by a large amount. Table5underscorestheresultsinFigure5. Thebest-performingTPF,whilethreetofour timesmoreaccuratethantheBSPF,stillgeneratesabiasinthelog-likelihoodapproximation of about 55 and a standard deviation of about 21 for θ = θm. Moreover, this increased − performance comes at a cost: the TPF(r∗ = 2),M = 40,000 filter takes about 29 seconds, while the BSPF takes only 4 seconds. Even the variants of the TPF, which run more quickly than the BSPF, still have wildly imprecise estimates of the likelihood; though, to be sure, these estimates are in general better than those of the BSPF. It is well known that in sequential Monte Carlo algorithms for static parameters the mutation phase is crucial. For example, Bognanni and Herbst (2015) show that tailoring the mutation step to model can substantially improve performance. The modification of the mutation step is not immediately obvious. One clear way to allow the particles to better

32 Table 5: SW Model: PF Summary Statistics BSPF TPF Number of Particles M 40,000 4,000 4,000 40,000 40,000 Target Ineff. Ratio r∗ 2 3 2 3 High Posterior Density: θ = θm ˆ Bias ∆ -235.50 -126.09 -144.57 -55.71 -65.94 1 ˆ StdD ∆ 60.30 46.55 44.32 20.73 23.81 1 ˆ Bias ∆ -1.00 -1.00 -1.00 -1.00 -1.00 2 T−1(cid:80)T N 1.00 6.19 4.75 6.14 4.71 t=1 φ,t Average Run Time (s) 4.28 2.75 2.11 28.83 22.40 Low Posterior Density: θ = θl ˆ Bias ∆ -263.31 -138.69 -168.76 -66.92 -83.08 1 ˆ StdD ∆ 78.14 48.18 50.15 24.26 29.14 1 ˆ Bias ∆ -1.00 -1.00 -1.00 -1.00 -1.00 2 T−1(cid:80)T N 1.00 6.25 4.81 6.21 4.78 t=1 φ,t Average Run Time (s) 4.17 2.34 2.16 26.01 20.14 ˆ ˆ Notes: The likelihood discrepancies ∆ and ∆ are defined in (47) and (48). Results are 1 2 based on N = 100 runs of the particle filters. run adapt to the current density is to increase the number of Metropolis-Hastings steps. While all of the previous results are based on N = 1, we now consider N = 10. Table MH MH 6 displays the results associated with this choice for variants of the TPF, along with the BSPF, which is unchanged the previous exercise. The bias shrinks dramatically. For the TPF(r∗ = 2),M = 40,000, when θ = θm, the bias falls from about 55 to about 6, with the standard deviation of the estimator decreasing − − by a factor of 6. Of course this increase in performance comes at a computational cost. Each filter takes about three times longer than their N = 1 counterpart. Note that this MH is less than you might expect, given the fact the number of Metropolis-Hastings steps at each iteration has increased by 10. This reflects two things. First, the mutation phase is easily parallelizable on a multi-core desktop computer. Second, a substantial fraction of computational time is spent during the resampling (selection) phase, which is not affected by increasing the number of Metropolis-Hastings steps.

33 Table 6: SW Model: PF Summary Statistics (N = 10) MH BSPF TPF Number of Particles M 40,000 4,000 4,000 40,000 40,000 Target Ineff. Ratio r∗ 2 3 2 3 High Posterior Density: θ = θm ˆ Bias ∆ -235.50 -21.06 -25.20 -6.45 -9.00 1 ˆ StdD ∆ 60.30 10.55 11.92 4.01 5.55 1 ˆ Bias ∆ -1.00 -1.00 -1.00 1.32 -0.61 2 T−1(cid:80)T N 1.00 6.11 4.69 6.07 4.69 t=1 φ,t Average Run Time (s) 3.92 8.45 6.07 81.70 62.33 Low Posterior Density: θ = θl ˆ Bias ∆ -263.31 -26.41 -34.48 -9.66 -13.72 1 ˆ StdD ∆ 78.14 10.85 12.66 5.51 6.31 1 ˆ Bias ∆ -1.00 -1.00 -1.00 0.17 -0.66 2 T−1(cid:80)T N 1.00 6.16 4.74 6.14 4.71 t=1 φ,t Average Run Time (s) 3.69 7.76 6.56 80.52 62.97 ˆ ˆ Notes: The likelihood discrepancies ∆ and ∆ are defined in (47) and (48). Results are 1 2 based on N = 100 runs of the particle filters. run 5 Conclusion We developed a particle filter that automatically adapts the proposal distribution for the particles s j to the current observation y . We start with a forward simulation of the statet t transition equation under an inflated measurement error variance and then gradually reducing the variance to its nominal level. In each step, the particle values and weights change so that the distribution slowly adapts to p(s j y ,s j ). We demonstrate that the algorithm t t t−1 | improves upon the standard bootstrap particle filter, in particular in instances in which the model generates very inaccurate one-step-ahead predictions of y . The tempering iterations t can also be used to improve a particle filter with a more general initial proposal distribution than the BSPF. Morever, our filter can be easily embedded in particle MCMC algorithms. References Andreasen, M. M.(2013): “Non-LinearDSGEModelsandtheCentralDifferenceKalman Filter,” Journal of Applied Econometrics, 28(6), 929–955.

34 Andrieu, C., A. Doucet, and R. Holenstein (2010): “Particle Markov Chain Monte Carlo Methods,” Journal of the Royal Statistical Society Series B, 72(3), 269–342. Arulampalam, S., S. Maskell, N. Gordon, and T. Clapp (2002): “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, 50(2), 174–188. Bognanni, M., and E. P. Herbst (2015): “Estimating (Markov-Switching) VAR Models without Gibbs Sampling: A Sequential Monte Carlo Approach,” FEDS, 2015(116), 154. Cappe´, O., S. J. Godsill, and E. Moulines (2007): “An Overview of Existing Methods and Recent Advances in Sequential Monte Carlo,” Proceedings of the IEEE, 95(5), 899– 924. Cappe´, O., E. Moulines, and T. Ryden (2005): Inference in Hidden Markov Models. Springer Verlag. Chopin, N. (2002): “A Sequential Particle Filter for Static Models,” Biometrika, 89(3), 539–551. (2004): “Central Limit Theorem for Sequential Monte Carlo Methods and its Application to Bayesian Inference,” Annals of Statistics, 32(6), 2385–2411. Creal, D. (2007): “Sequential Monte Carlo Samplers for Bayesian DSGE Models,” Manuscript, University Chicago Booth. (2012): “ASurveyofSequentialMonteCarloMethodsforEconomicsandFinance,” Econometric Reviews, 31(3), 245–296. Del Moral, P. (2013): Mean Field Simulation for Monte Carlo Integration. Chapman & Hall/CRC. Doucet, A., and A. M. Johansen (2011): “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later,” in Handook of Nonlinear Filtering, ed. by D. Crisan, and B. Rozovsky. Oxford University Press. Durham, G., and J. Geweke (2014): “Adaptive Sequential Posterior Simulators for Massively Parallel Computing Environments,” in Advances in Econometrics, ed. by I. Jeliazkov, and D. Poirier, vol. 34, chap. 6, pp. 1–44. Emerald Group Publishing Limited.

35 Ferna´ndez-Villaverde, J., and J. F. Rubio-Ram´ırez (2007): “Estimating Macroeconomic Models: A Likelihood Approach,” Review of Economic Studies, 74(4), 1059–1087. Geweke, J., and B. Frischknecht (2014): “Exact Optimization By Means of Sequentially Adaptive Bayesian Learning,” Mimeo. Gordon, N., D. Salmond, and A. F. Smith(1993): “NovelApproachtoNonlinear/Non- Gaussian Bayesian State Estimation,” Radar and Signal Processing, IEE Proceedings F, 140(2), 107–113. Gust, C., E. Herbst, D. Lopez-Salido, and M. E. Smith (2016): “The Empirical Implications of the Interest-Rate Lower Bound,” Manuscript, Federal Reserve Board. Herbst, E., and F. Schorfheide (2014): “Sequential Monte Carlo Sampling for DSGE Models,” Journal of Applied Econometrics, 19(7), 1073–1098. (2015): Bayesian Estimation of DSGE Models. Princeton University Press, Princeton. Kollmann, R. (2015): “Tractable Latent State Filtering for Non-Linear DSGE Models Using a Second-Order Approximation and Pruning,” Computational Economics, 45(2), 239–260. Liu, J. S. (2001): Monte Carlo Strategies in Scientific Computing. Springer Verlag. Pitt, M. K., R. d. S. Silva, P. Giordani, and R. Kohn (2012): “On Some Properties of Markov Chain Monte Carlo Simulation Methods Based on the Particle Filter,” Journal of Econometrics, 171, 134–151. Smets, F., and R. Wouters (2007): “Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach,” American Economic Review, 97, 586–608. White, H. (2001): Asymptotic Theory For Econometricians. Academic Press.

Online Appendix A-1 Online Appendix for Tempered Particle Filtering Edward Herbst and Frank Schorfheide A Theoretical Derivations A.1 Monotonicity of Inefficiency Ratio Recall the definitions 1 e = (y Ψ(s j,n−1 ,t;θ))(cid:48)Σ−1(y Ψ(s j,n−1 ,t;θ)) j,t 2 t − t u t − t and (cid:18) (cid:19)d/2 φ w˜j,n(φ ) = n exp[ (φ φ )e ]. t n φ − n − n−1 j,t n−1 Provided that the particles had been resampled and W j,n−1 = 1, the inefficiency ratio can t be manipulated as follows: 1 (cid:80)M (cid:0) w˜j,n(φ ) (cid:1)2 InEff(φ ) = M j=1 t n n (cid:16) (cid:17)2 1 (cid:80)M w˜j,n(φ ) M j=1 t n (cid:16) (cid:17)d 1 (cid:80)M φn exp[ 2(φ φ )e ] = M j=1 φn−1 − n − n−1 j,t (cid:18) (cid:16) (cid:17)d/2 (cid:19)2 1 (cid:80)M φn exp[ (φ φ )e ] M j=1 φn−1 − n − n−1 j,t 1 (cid:80)M exp[ 2(φ φ )e ] = M j=1 − n − n−1 j,t (cid:16) (cid:17)2 1 (cid:80)M exp[ (φ φ )e ] M j=1 − n − n−1 j,t A (φ ) 1 n = . A (φ ) 2 n Note that for φ = φ we obtain ESS(φ ) = 1. We now will show that the inefficiency n n−1 n ratio is monotonically increasing on the interval [φ ,1]. Differentiating with respect to φ n−1 n

Online Appendix A-2 yields A(1)(φ )A (φ ) A (φ )A (1)(φ ) InEff(1)(φ ) = n 2 n − 1 n 2 n , n [A (φ )]2 2 n where M 2 (cid:88) A(1)(φ ) = e exp[ 2(φ φ )e ] n j,t n n−1 j,t −M − − j=1 (cid:32) (cid:33)(cid:32) (cid:33) M M 2 (cid:88) 1 (cid:88) A(2)(φ ) = exp[ (φ φ )e ] e exp[ (φ φ )e ] . n n n−1 j,t j,t n n−1 j,t M − − −M − − j=1 j=1 ThedenumeratorinInEff(1)(φ )isalwaysnon-negativeandstrictlydifferentfromzero. Thus, n we can focus on the numerator: A(1)(φ )A (φ ) A (φ )A (1)(φ ) n 2 n 1 n 2 n − (cid:32) (cid:33)(cid:32) (cid:33)2 M M 2 (cid:88) 1 (cid:88) = e exp[ 2(φ φ )e ] exp[ (φ φ )e ] j,t n n−1 j,t n n−1 j,t −M − − M − − j=1 j=1 (cid:32) (cid:33)(cid:32) (cid:33) M M 1 (cid:88) 2 (cid:88) exp[ 2(φ φ )e ] exp[ (φ φ )e ] n n−1 j,t n n−1 j,t − M − − M − − j=1 j=1 (cid:32) (cid:33) M 1 (cid:88) e exp[ (φ φ )e ] j,t n n−1 j,t × −M − − j=1 (cid:32) (cid:33) M 1 (cid:88) = 2 exp[ (φ φ )e ] n n−1 j,t M − − j=1 (cid:32) (cid:33)(cid:32) (cid:33) (cid:20) M M 1 (cid:88) 1 (cid:88) e exp[ (φ φ )e ] exp[ 2(φ φ )e ] j,t n n−1 j,t n n−1 j,t × M − − M − − j=1 j=1 (cid:32) (cid:33)(cid:32) (cid:33) M M (cid:21) 1 (cid:88) 1 (cid:88) e exp[ 2(φ φ )e ] exp[ (φ φ )e ] j,t n n−1 j,t n n−1 j,t − M − − M − − j=1 j=1 To simplify the notation we now define x = exp[ (φ φ )e ]. j,t n n−1 j,t − −

Online Appendix A-3 Note that 0 < x 1, which implies that x2 x . Moreover, e 0. We will use these j,t ≤ j,t ≤ j,t j,t ≥ properties to establish the following bound: A(1)(φ )A (φ ) A (φ )A (1)(φ ) n 2 n 1 n 2 n − (cid:32) (cid:33)(cid:34)(cid:32) (cid:33)(cid:32) (cid:33) (cid:32) (cid:33)(cid:32) (cid:33)(cid:35) M M M M M 1 (cid:88) 1 (cid:88) 1 (cid:88) 1 (cid:88) 1 (cid:88) = 2 x e x x2 e x2 x M j,t M j,t j,t M j,t − M j,t j,t M j,t j=1 j=1 j=1 j=1 j=1 (cid:32) (cid:33)(cid:34)(cid:32) (cid:33)(cid:32) (cid:33) (cid:32) (cid:33)(cid:32) (cid:33)(cid:35) M M M M M 1 (cid:88) 1 (cid:88) 1 (cid:88) 1 (cid:88) 1 (cid:88) 2 x e x x2 e x2 x2 ≥ M j,t M j,t j,t M j,t − M j,t j,t M j,t j=1 j=1 j=1 j=1 j=1 (cid:32) (cid:33)(cid:32) (cid:33)(cid:34)(cid:32) (cid:33) (cid:32) (cid:33)(cid:35) M M M M 1 (cid:88) 1 (cid:88) 1 (cid:88) 1 (cid:88) = 2 x x2 e x e x2 M j,t M j,t M j,t j,t − M j,t j,t j=1 j=1 j=1 j=1 0. ≥ We conclude that the inefficiency ratio InEff(φ ) in increasing in φ . (cid:4) n n A.2 Proofs for Section 3.1 The proofs in this section closely follow Chopin (2004) and Herbst and Schorfheide (2014). Throughout this section we will assume that h(θ) is scalar and we use absolute values h | | instead of a general norm h . Extensions to vector-valued h functions are straightforward. (cid:107) (cid:107) We use C to denote a generic constant. We will make repeated use of the following moment bound for r > 1 E(cid:2)(cid:12) (cid:12)X E[X] (cid:12) (cid:12) r(cid:3) C (cid:0)E(cid:2)(cid:12) (cid:12)X (cid:12) (cid:12) r(cid:3) + (cid:12) (cid:12) E[X] (cid:12) (cid:12) r(cid:1) (A.1) r − ≤ 2C E(cid:2)(cid:12) (cid:12)X (cid:12) (cid:12) r(cid:3) , r ≤ where C = 1 for r = 1 and C = 2r−1 for r > 1. The first inequality follows from the C r r r inequality and the second inequality follows from Jensen’s inequality. We will make use of the following SLLN (Markov, see White (2001) p. 33): Let Z be j { } a sequence of independent random variables with finite means µ = E[Z ]. If for some δ > 0, j j (cid:80)∞ E(cid:2) Z µ 1+δ (cid:3) /j1+δ < , then 1 (cid:80)M Z µ a.s 0. Note that E(cid:2) Z µ 1+δ (cid:3) < j=1 | j − j | ∞ M j=1 j − j −→ | j − j | C < implies that the summability condition is satisfied because (cid:80)∞ (1/j)1+δ < . ∞ j=1 ∞

Online Appendix A-4 Recall that under a multivariate Gaussian measurement error distribution, the density p (y s ) can be bounded from above. Thus, p (y s ) n˜ and ω˜j,n n˜ for any t˜and n˜. n t | t n t | t ∈ Ht˜ t ∈ Ht˜ Moreover, for any h(s ,s ) n we can deduce that h ˜ ( ) = h( )ω˜j,n n. t t−1 ∈ Ht · · t ∈ Ht Proof of Lemma 1. (i) (30) can be established as follows. Consider the summands I and II in (34). Recall that M 1 (cid:88)(cid:16) (cid:17) I = h(s˜j ,s j,N φ) E [h] W j,N φ. M t t−1 − p(·|sj t−1 ) t−1 j=1 j,N j,N Conditional on the particles s φ,W φ the summands in term I form a triangular array t−1 t−1 { } of mean-zero random variables that within each row are independently distributed. In Algorithm 4 the particles were resampled during the t 1 tempering iteration N , such that φ − N W φ = 1. Using the bound t−1 (cid:20) (cid:21) E (cid:12) (cid:12)h(s ,s ) E [h(s ,s )] (cid:12) (cid:12) 1+δ < C < p(·|st−1) t t−1 − p(·|st−1) t t−1 ∞ implied by the definition of 1 we can apply the SLLN to deduce that term I a.s. 0. The Ht −→ second term in (34) takes the form M (cid:18) (cid:90) (cid:90) (cid:19) 1 (cid:88) II = E [h] h(s ,s )p(s ,s Y )ds ds . M p(·|sj t−1 ) − t t−1 t t−1 | 1:t−1 t t−1 j=1 By assumption, E [h] N φ . The almost-sure convergence to zero of term II can now p(·|st−1) ∈ H t−1 be deduced from the recursive Assumption (29). By combining the convergence results for terms I and II we have established (30). (ii) The convergences in (31) and (33) follow immediately from the fact that p (y s ) 1 t t | ∈ 1. Moreover, if h(s ,s ) 1, then h(s ,s )p (y s ) 1. Finally, p (y s ) > 0, Ht t t−1 ∈ Ht t t−1 1 t | t ∈ Ht 1 t | t which implies that the almost-sure limit of the denominator in (31) is strictly positive. (iii)Toverify(32),letF ˜ denotetheσ-algebrageneratedbytheparticles s˜j,1 ,s j,N φ,W j,1 . t,1,M t t−1 t { }

Online Appendix A-5 Moreover, define M 1 (cid:88) E[h F ˜ ] = h(s˜j,1 ,s j,N φ)W ˜ j,1 , | t,1,M M t t−1 t j=1 and write (cid:90) h ¯1 h(s ,s )p (s ,s Y )ds (A.2) t,M − t t−1 1 t t−1 | 1:t t M = 1 (cid:88)(cid:0) h(s j,1 ,s j,N φ) E[h F ˜ ] (cid:1) M t t−1 − | t,1,M j=1 (cid:18) M (cid:90) (cid:90) (cid:19) 1 (cid:88) + h(s˜j,1 ,s N φ )W ˜ j,1 h(s ,s )p (s ,s Y )ds ds M t t−1 t − t t−1 1 t t−1 | 1:t t t−1 j=1 M = 1 (cid:88)(cid:0) h(s j,1 ,s j,N φ) E[h F ˜ ] (cid:1) M t t−1 − | t,1,M j=1 (cid:18) (cid:90) (cid:90) (cid:19) + h ˜1 h(s ,s )p (s ,s Y )ds ds t,M − t t−1 1 t t−1 | 1:t t t−1 = I +II. From (31), we can immediately deduce that term II converges to zero almost surely. Recall that we are resampling the pairs (s˜j,n ,s j,N φ). Thus, under multinomial resampling the t t−1 h(s j,1 ,s j,N φ)’s form a triangular array of iid random variables conditional on F ˜ . Using t t t,1,M Kolmogorov’s SLLN for iid sequences, it suffices to verify for that (cid:20) (cid:12) (cid:21) E (cid:12) (cid:12)h(s j,1 ,s j,N φ) (cid:12) (cid:12) (cid:12) (cid:12) F ˜ < . t t t,1,M (cid:12) ∞ We can manipulate the moment as follows (cid:20) (cid:12) (cid:21) M E (cid:12) (cid:12)h(s j,1 ,s j,N φ) (cid:12) (cid:12) (cid:12) (cid:12) F ˜ = 1 (cid:88)(cid:12) (cid:12)h(s˜j,1 ,s j,N φ) (cid:12) (cid:12)W ˜ j,1 t t−1 (cid:12) t,1,M M t t−1 t j=1 (cid:90) a.s (cid:12) (cid:12) (cid:12)h(s ,s )(cid:12)p (s ,s Y )ds ds < . t t−1 1 t t−1 1:t t t−1 −→ | ∞ Because h 1 implies h 1, we obtain the almost sure convergence from (31). (cid:4) ∈ Ht | | ∈ Ht Proof of Lemma 2. (i) We begin with the correction step. Recall that for any h(s ,s ) t t−1 ∈

Online Appendix A-6 n−1: t H p (y s )/p (y s ) n−1 and h(s ,s )p (y s )/p (y s ) n−1. n t | t n−1 t | t ∈ Ht t t−1 n t | t n−1 t | t ∈ Ht Thus, the recursive Assumption 2 yields the almost-sure convergence in (36) and (38). (ii) We proceed with the selection step. The convergence in (37) can be established with an argument similar to the one used in Step (iii) of the proof of Lemma 1. (iii) Finally, consider the mutation step. To establish the convergence in (39) we need to construct moment bounds for the terms I and II that appear in (41). Under the assumption that the resampling step is executed at every iteration n, term I takes the form: M (cid:18) (cid:19) 1 (cid:88) I = h(s j,n ,s j,N φ) E [h] . M t t−1 − Kn(·|sˆj t ,n;s j t− ,N 1 φ) j=1 Conditional on the σ-algebra generated by the particles of the selection step, term I is an average of independently distributed mean-zero random variables. By virtue of h n, we ∈ Ht can deduce that, omitting the j and n superscripts, E (cid:2)(cid:12) (cid:12)h(s ,s ) E [h] (cid:12) (cid:12) 1+δ(cid:3) < C < . Kn(·|sˆt;st−1) t t−1 − Kn(·|sˆt;st−1) ∞ a.s. Therefore, the SLLN implies that I 0. For term II, we have −→ M (cid:18) (cid:90) (cid:90) (cid:19) 1 (cid:88) II = E [h] h(s ,s )p (s ,s Y )ds ds . M Kn(·|sˆj t ,n;s j t− ,N 1 φ) − t t−1 n t t−1 | 1:t t t−1 j=1 Now define ψ(sˆ,s ) = E [h]. t t−1 Kn(·|sˆt;st−1) The definition of n implies that ψ(sˆ,s ) n−1. Thus, we can deduce from (37) that Ht t t−1 ∈ H t II a.s. 0. (cid:4) −→

Online Appendix A-7 A.3 Proofs for Section 3.2 The subsequent proof of the unbiasedness of the particle filter approximation utilizes Lemmas 3 and 5 below. Throughout this section, we use the convention that W j,0 = W j,N φ. t t−1 Moreover, we often use the j subscript to denote a fixed particle as well as a running index in a summation. That is, we write aj/ (cid:80)M aj instead of aj/ (cid:80)M al. We also define the j=1 l=1 information set = (cid:8) (s j,N φ,W j,N φ),(s j,1 ,W j,1),...,(s j,N φ,W j,N φ),..., (A.3) t−1,n,M 0 0 1 1 1 1 F (s j,1 ,W j,1),...,(s j,n ,W j,n) (cid:9)M . t−1 t−1 t−1 t−1 j=1 A.3.1 Additional Lemmas Lemma 3 Suppose that the incremental weights w˜j,n are defined as in (7) and (14) and that t there is no resampling. Then   (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) 1 (cid:88) M (cid:89) N φ w˜j,n W j,n−1 =  w˜j,n W j,N φ (A.4) M T T M T T−1 n=1 j=1 j=1 n=1 and   (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) (cid:89) N φ W j,N φ w˜j,n W j,n−1 =  w˜j,n W j,N φ . (A.5) T−h−1 M T−h−1 T−h−1 T−h−1 T−h−2 n=1 j=1 n=1 Proof of Lemma 3. The lemma can be proved by induction. If there is no resampling, then W j,n = W ˜ j,n. t t Part 1. The inductive hypothesis to show (A.4) takes the form   (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) 1 (cid:88) M (cid:89) N φ w˜j,n W j,n−1 =  w˜j,n W j,n∗−1 . (A.6) M T T M T T n=n∗ j=1 j=1 n=n∗

Online Appendix A-8 If the hypothesis is correct, then (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) w˜j,n W j,n−1 (A.7) M T T n=n∗−1 j=1     1 (cid:88) M (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) =   w˜j,n W j,n∗−1  w˜j,n∗−1 W j,n∗−2 M T T M T T j=1 n=n∗ j=1     1 (cid:88) M (cid:89) N φ w˜j,n∗−1 W j,n∗−2 (cid:32) 1 (cid:88) M (cid:33) =   w˜j,n  T T  w˜j,n∗−1 W j,n∗−2 M T 1 (cid:80)M w˜j,n∗−1 W j,n∗−2 M T T j=1 n=n∗ M j=1 T T j=1   1 (cid:88) M (cid:89) N φ =  w˜j,n W j,n∗−2 . M T T j=1 n=n∗−1 The first equality follows from (A.6) and the second equality is obtained by using the definition of W j,n∗−1. T It is straightforward to verify that the inductive hypothesis (A.6) is satisfied for n = N . ∗ φ Setting n = 1 in (A.6) and noticing that W j,0 = W j,N φ leads the desired result. ∗ T T−1 Part 2. To show (A.5), we can use the inductive hypothesis   (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) (cid:89) N φ W j,N φ w˜j,n W j,n−1 =  w˜j,n W j,n∗−1 . (A.8) T−h−1 M T−h−1 T−h−1 T−h−1 T−h−1 n=n∗ j=1 n=n∗ If the inductive hypothesis is satisfied, then (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) W j,N φ w˜j,n W j,n−1 (A.9) T−h−1 M T−h−1 T−h−1 n=n∗−1 j=1 (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33)(cid:32) 1 (cid:88) M (cid:33) = W j,N φ w˜j,n W j,n−1 w˜j,n∗−1 W j,n∗−2 T−h−1 M T−h−1 T−h−1 M T−h−1 T−h−1 n=n∗ j=1 j=1   (cid:89) N φ w˜j,n∗−1 W j,n∗−2 (cid:32) 1 (cid:88) M (cid:33) =  w˜j,n  T−h−1 T−h−1 w˜j,n∗−1 W j,n∗−2 T−h−1 1 (cid:80)M w˜j,n∗−1 W j,n∗−2 M T−h−1 T−h−1 n=n∗ M j=1 T−h−1 T−h−1 j=1   N φ (cid:89) =  w˜j,n W j,n∗−2 . T−h−1 T−h−1 n=n∗−1

Online Appendix A-9 For n = N the validity of the inductive hypothesis can be verified as follows: ∗ φ (cid:32) (cid:33) M 1 (cid:88) j,N j,N j,N −1 W φ w˜ φ W φ (A.10) T−h−1 M T−h−1 T−h−1 j=1 (cid:32) (cid:33) w˜ j,N φ W j,N φ −1 1 (cid:88) M = T−h−1 T−h−1 w˜ j,N φ W j,N φ −1 1 (cid:80)M w˜ j,N φ W j,N φ −1 M T−h−1 T−h−1 M j=1 T−h−1 T−h−1 j=1 j,N j,N −1 = w˜ φ W φ . T−h−1 T−h−1 Setting n = 1 in (A.8) leads to the desired result. (cid:4) ∗ The following lemma simply states that the expected value of a sum is the sum of the expected values, but it does so using a notation that we will encounter below. Lemma 4 Supposesj, j = 1,...,M, isasequenceofrandomvariableswithdensity (cid:81)M p(sj), j=1 then (cid:90) (cid:90) (cid:18) M (cid:19)(cid:18) M (cid:19) M (cid:90) 1 (cid:88) (cid:89) 1 (cid:88) f(sj) p(sj) ds1 dsM = f(sj)p(sj)dsj. ··· M ··· M j=1 j=1 j=1 Proof of Lemma 4. The statement is trivially satisfied for M = 1. Suppose that it is true for M 1, then − (cid:90) (cid:90) (cid:18) M (cid:19)(cid:18) M (cid:19) 1 (cid:88) (cid:89) f(sj) p(sj) ds1 dsM (A.11) ··· M ··· j=1 j=1 (cid:90) (cid:90) (cid:18) M−1 (cid:19)(cid:18) M−1 (cid:19) 1 M 1 1 (cid:88) (cid:89) = f(sM)+ − f(sj) p(sM) p(sj) ds1 dsM ··· M M M 1 ··· − j=1 j=1 (cid:18) (cid:90) (cid:19)M−1(cid:90) 1 (cid:89) = f(sM)p(sM)dsM p(sj)dsj M j=1 (cid:18) M−1(cid:90) (cid:19)(cid:90) M 1 1 (cid:88) + − f(sj)p(sj)dsj p(sM)dsM M M 1 − j=1 M (cid:90) 1 (cid:88) = f(sj)p(sj)dsj, (A.12) M j=1 which verifies the claim for all M by induction. (cid:4)

Online Appendix A-10 Lemma 5 Suppose that the incremental weights w˜j,n are defined as in (7) and (14) and t that the selection step is implemented by multinomial resampling for a predetermined set of iterations n . Then ∈ N   E  (cid:89) N φ (cid:32) M 1 (cid:88) M w˜ T j,n W T j,n−1 (cid:33) (cid:12) (cid:12) (cid:12) (cid:12)F T−1,N φ ,M = M 1 (cid:88) M p(y T | s j T ,N − φ 1 )W T j, − N 1 φ (A.13) n=1 j=1 j=1 and   M 1 (cid:88) M E p(Y T−h:T | s j T ,N − φ h−1 )W T j, − N h φ −1 (cid:89) N φ (cid:32) M 1 (cid:88) M w˜ T j,n −h−1 W T j, − n− h− 1 1 (cid:33) (cid:12) (cid:12) (cid:12) (cid:12)F T−h−2,N φ ,M (A.14) j=1 n=1 j=1 M 1 (cid:88) j,N j,N = p(Y s φ )W φ . M T−h−1:T | T−h−2 T−h−2 j=1 Proof of Lemma 5. We first prove the Lemma under the assumption of no resampling, i.e., = . We then discuss how the proof can be modified to allow for resampling. N ∅ Part 1 (No Resampling). We deduce from Lemma 3 that   E (cid:20) (cid:89) N φ (cid:32) M 1 (cid:88) M w˜ T j,n W T j,n−1 (cid:33) (cid:12) (cid:12) (cid:12) (cid:12)F T−1,N φ ,M (cid:21) = M 1 (cid:88) M E (cid:20)  (cid:89) N φ w˜ T j,n W T j, − N 1 φ (cid:12) (cid:12) (cid:12) (cid:12)F T−1,N φ ,M (cid:21) . n=1 j=1 j=1 n=1 (A.15) The subsequent derivations focus on the evaluation of the expectation on the right-handside of this equation. We will subsequently integrate over the particles s 1:M,1 ,...,s 1:M,N φ −1 , T T which enter the incremental weights w˜j,n. We use s 1:M,n to denote the set of particle values T T s 1,n ,...,s M,n . Because W j,N φ it suffices to show that { T T } T−1 ∈ F T−1,N φ ,M   (cid:20) N φ (cid:12) (cid:21) E  (cid:89) w˜ T j,n  (cid:12) (cid:12) (cid:12)F T−1,N φ ,M = p(y T | s j T ,N − φ 1 ). (A.16) n=1 Recall that the initial state particle s j,1 is generated from the state-transition equation T

Online Appendix A-11 j,N p(s s φ). The first incremental weight is defined as T | T−1 w˜j,1 = p (y s j,1). T 1 T | T The incremental weight in tempering iteration n is given by p (y s j,n−1) w˜j,n = n T | T . T p (y s j,n−1) n−1 T | T Becauseweareomittingtheselectionstep,thenewparticlevalueisgeneratedinthemutation step by sampling from the Markov transition kernel s j,n K (sn s j,n−1 ,s j,N φ), (A.17) T ∼ n T| T T−1 which has the invariance property (cid:90) p (s y ,s ) = K (s sˆ ;s )p (sˆ y ,s )dsˆ . (A.18) n T T T−1 n T T T−1 n T T T−1 T | | | Using the above notation, we can write    N φ (cid:12) E  (cid:89) w˜ T j,n  (cid:12) (cid:12) (cid:12)F T−1,N φ ,M (A.19) n=1   (cid:90) (cid:90) (cid:89) N φ p (y s j,n−1) = ···  p n (y T | s T j,n−1) K n−1 (s j T ,n−1 | s j T ,n−2 ,s j T ,N − φ 1 ) n=3 n−1 T | T p (y s j,1) 2 T | T p (y s j,1)p(s j,1 s j,N φ)ds j,1 ds j,N φ −1 . ×p (y s j,1) 1 T | T T | T−1 T ··· T 1 T | T The bridge posterior densities were defined as (cid:90) p (y s )p(s s ) n t t t t−1 p (s y ,s ) = | | , p (y s ) = p (y s )p(s s )ds . (A.20) n t t t−1 n t t−1 n t t t t−1 t | p (y s ) | | | n t t−1 | Using the invariance property of the transition kernel in (A.18) and the definition of the

Online Appendix A-12 bridge posterior densities, we deduce that (cid:90) K (s j,n−1 s j,n−2 ,s j,N φ)p (y s j,n−2)p(s j,n−2 s j,N φ)ds j,n−2 (A.21) n−1 T | T T−1 n−1 T | T T | T−1 T (cid:90) = K (s j,n−1 s j,n−2 ,s j,N φ)p (s j,n−2 y ,s j,N φ)p (y s j,N φ)ds j,n−2 n−1 T | T T−1 n−1 T | T T−1 n−1 T | T−1 T = p (s j,n−1 y ,s j,N φ)p (y s j,N φ) n−1 T | T T−1 n−1 T | T−1 = p (y s j,n−1)p(s j,n−1 s j,N φ). n−1 T | T T | T−1 The first equality follows from Bayes Theorem in (A.20). The second equality follows from the invariance property of the transition kernel. The third equality uses Bayes Theorem again. We can now evaluate the integrals in (A.19). Consider the terms involving s j,1 T (cid:90) p (y s j,1) K (s j,2 s j,1 ,s j,N φ) 2 T | T p (y s j,1)p(s j,1 s j,N φ)ds j,1 (A.22) 2 T | T T−1 p (y s j,1) 1 T | T T | T−1 T (cid:90) 1 T | T = K (s j,2 s j,1 ,s j,N φ)p (y s j,1)p(s j,1 s j,N φ)ds j,1 2 T | T T−1 2 T | T T | T−1 T = p (y s j,2)p(s j,2 s j,N φ). 2 T | T T | T−1 Thus,    N φ (cid:12) E  (cid:89) w˜ T j,n  (cid:12) (cid:12) (cid:12)F T−1,N φ ,M (A.23) n=1   (cid:90) (cid:90) (cid:89) N φ p (y s j,n−1) = ···  p n (y T | s T j,n−1) K n−1 (s j T ,n−1 | s j T ,n−2 ,s j T ,N − φ 1 ) n=4 n−1 T | T p (y s j,2) 3 T | T p (y s j,2)p(s j,2 s j,N φ)ds j,2 ds j,N φ −1 ×p (y s j,2) 2 T | T T | T−1 T ··· T 2 T | T (cid:90) p (y s j,N φ −1 ) = N φ T | T p (y s j,N φ −1 )p(s j,N φ −1 s j,N φ)ds j,N φ −1 p (y s j,N φ −1 ) N φ −1 T | T T | T−1 T N φ −1 T | T j,N = p(y s φ). T | T−1 The first equality follows from (A.22). The second equality is obtained by sequentially integrating out s j,2 ,...,s j,N φ−2, using a similar argument as for s j,1. This proves the first T T T

Online Appendix A-13 part of the Lemma. Part 2 (No Resampling). Using Lemma 3 we write E (cid:20) p(Y s j,N φ ,θ)W j,N φ (cid:89) N φ (cid:18) 1 (cid:88) M w˜j,n W j,n−1 (cid:19)(cid:12) (cid:12) (cid:12) (cid:21) T−h:T | T−h−1 T−h−1 M T−h−1 T−h−1 (cid:12)F T−h−2,N φ ,M n=1 j=1   (cid:20) N φ (cid:12) (cid:21) = E p(Y T−h:T | s j T ,N − φ h−1 ,θ) (cid:89) w˜ T j,n −h−1 W T j, − N h φ −2 (cid:12) (cid:12) (cid:12)F T−h−2,N φ ,M (A.24) n=1 To prove the second part of the Lemma we slightly modify the last step of the integration in (A.23):     N φ (cid:12) E p(Y T−h:T | s j T ,N − φ h−1 ) (cid:89) w˜ T j,n −h−1  (cid:12) (cid:12) (cid:12)F T−2,N φ ,M (A.25) n=1 (cid:90) j,N j,N −1 j,N −1 j,N j,N −1 = p(Y s φ )p (y s φ )p(s φ s φ )ds φ T−h:T | T−h−1 N φ T−h−1 | T−h−1 T−h−1| T−h−2 T−h−1 j,N = p(Y s φ ), T−h−1:T | T−h−2 as required. Part 1 (Resampling in tempering iteration n¯). We now assume that the selection step is executed once, in iteration n¯, i.e., = n¯ . For reasons that will become apparent subse- N { } quently, we will use i subscripts for particles in stages n = 1,...,n¯ 1. Using Lemma 3, we − deduce that it suffices to show: (cid:20)(cid:18)n¯−1(cid:18) M (cid:19)(cid:19)(cid:18) M (cid:19) (cid:89) 1 (cid:88) 1 (cid:88) E w˜i,n W i,n−1 w˜j,n¯ W j,n¯−1 (A.26) M T T M T T n=1 i=1 j=1 (cid:18) 1 (cid:88) M (cid:18) (cid:89) N φ w˜j,n (cid:19) W j,n¯ (cid:19)(cid:12) (cid:12) (cid:12) (cid:21) × M T T (cid:12)F T−1,N φ ,M j=1 n=n¯+1 M 1 (cid:88) j,N j,N = p(y s φ)W φ. M T | T−1 T−1 j=1 To evaluate the expectation, we need to integrate over the particles s 1:M,1 ,...,s 1:M,N φ as well T T

Online Appendix A-14 as the particles sˆ1:M,n¯ generated during the selection step. We have to distinguish two cases: T Case 1, n = n¯ : s j,n K (s j,n s j,n−1 ,s j,N φ), j = 1,...,M (cid:54) T ∼ n T | T T Case 2, n = n¯ : s j,n K (s j,n sˆj,n ,s j,N φ), j = 1,...,M; T ∼ n T | T T sˆj,n MN (cid:0) s 1:M,n−1 ,W ˜ 1:M,n(cid:1) , j = 1,...,M T ∼ T T where MN( ) here denotes the multinomial distribution. · Inapreliminarystep, weareintegratingouttheparticlessˆ1:M,n¯. Theseparticlesenterthe T Markov transition kernel K (s j,n¯ sˆj,n¯ ,s j,N φ) as well as the conditional density p(sˆj,n¯ s 1:M,n¯−1). n¯ T | T T−1 T | T Under the assumption that the resampling step is executed using multinomial resampling, M 1 (cid:88) p(sˆj,n¯ s 1:M,n¯−1) = W ˜ j,n¯ δ(sˆj,n¯ s i,n¯−1), T | T M T T − T i=1 (cid:82) where δ(x) is the Dirac function with the property that δ(x) = 0 for x = 0 and δ(x)dx = 1. (cid:54) Integrating out the resampled particles yields p(s 1:M,n¯ s 1:M,n¯−1) (A.27) T | T (cid:90) M (cid:20) M (cid:21) (cid:89) 1 (cid:88) = K (s j,n¯ sˆj,n¯ ,s j,N φ) W ˜ i,n¯ δ(sˆj,n¯ s i,n¯−1) dsˆ1:M,n¯ n¯ T | T T−1 M T T − T T j=1 i=1 M (cid:90) (cid:20) M (cid:21) (cid:89) 1 (cid:88) = K (s j,n¯ sˆj,n¯ ,s j,N φ) W ˜ i,n¯ δ(sˆj,n¯ s i,n¯−1) dsˆj,n¯ n¯ T | T T−1 M T T − T T j=1 i=1 M (cid:20) M (cid:21) (cid:89) 1 (cid:88) = W ˜ i,n¯ K (s j,n¯ s i,n¯−1 ,s i,N φ) . M T n¯ T | T T−1 j=1 i=1 In the last equation, the superscript for s changes from j to i because during the resam- T−1 pling, we keep track of the history of the particle. Thus, if for particle j = 1 the value sˆ1,n¯ T is set to s 3,n¯−1, we also use s 3,N φ for this particle. T T−1 We can now express the expected value, which we abbreviate as , as the following E

Online Appendix A-15 integral: (cid:20)(cid:18)n¯−1(cid:18) M (cid:19)(cid:19)(cid:18) M (cid:19) (cid:89) 1 (cid:88) 1 (cid:88) = E w˜i,n W i,n−1 w˜j,n¯ W j,n¯−1 (A.28) E M T T−1 M T T n=1 i=1 j=1 (cid:18) 1 (cid:88) M (cid:18) (cid:89) N φ w˜j,n (cid:19) W j,n¯ (cid:19)(cid:12) (cid:12) (cid:12) (cid:21) × M T T (cid:12)F T−1,N φ ,M j=1 n=n¯+1 (cid:90) (cid:90) (cid:18)n (cid:89) ¯−1(cid:18) 1 (cid:88) M (cid:19)(cid:19)(cid:18) 1 (cid:88) M (cid:19)(cid:18) 1 (cid:88) M (cid:18) (cid:89) N φ (cid:19)(cid:19) = w˜i,n W i,n−1 w˜j,n¯ W j,n¯−1 w˜j,n ··· M T T−1 M T T M T n=1 i=1 j=1 j=1 n=n¯+1 (cid:18)n¯−1 M (cid:19)(cid:18) M (cid:20) M (cid:21)(cid:19) (cid:89)(cid:89) (cid:89) 1 (cid:88) K (s i,n s i,n−1 ,s i,N φ) W ˜ i,n¯ K (s j,n¯ s i,n¯−1 ,s i,N φ) × n T | T T−1 M T n¯ T | T T−1 n=1j=1 j=1 i=1 (cid:18) N φ −1 M (cid:19) (cid:89) (cid:89) K (s j,n s j,n−1 ,s j,N φ) ds 1:M,1 ds 1:M,N φ −1 . × n T | T T−1 T ··· T n=n¯+1j=1 For the second equality we used the fact that W j,n¯ = 1. T Using Lemma 4, we can write (cid:90) (cid:90) (cid:18) 1 (cid:88) M (cid:18) (cid:89) N φ (cid:19)(cid:18) N (cid:89) φ −1 (cid:89) M (cid:19) w˜j,n K (s j,n s j,n−1 ,s j,N φ) ds 1:M,n¯+1 ds 1:M,N φ −1 ··· M T n T | T T−1 T ··· T j=1 n=n¯+1 n=n¯+1j=1 1 (cid:88) M (cid:90) (cid:90) (cid:18) (cid:89) N φ (cid:19)(cid:18) N (cid:89) φ −1 (cid:19) = w˜j,n K (s j,n s j,n−1 ,s j,N φ) ds j,n¯+1 ds j,N φ −1 M ··· T n T | T T−1 T ··· T j=1 n=n¯+1 n=n¯+1 M 1 (cid:88) = F(s j,n¯ ,s j,N φ). (A.29) M T T−1 j=1

Online Appendix A-16 Now consider the following integral involving terms that depend on s 1:M,n¯: T (cid:90) (cid:18) M (cid:19)(cid:18) M (cid:19) I = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) w˜j,n¯ W j,n¯−1 (A.30) 1 M T T−1 M T T j=1 j=1 M (cid:20) M (cid:21) (cid:89) 1 (cid:88) W ˜ i,n¯ K (s j,n¯ s i,n¯−1 ,s i,N φ) ds 1:M,n¯ × M T n¯ T | T T−1 T j=1 i=1 (cid:18) M (cid:90) (cid:20) M (cid:21) (cid:19) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) W ˜ i,n¯ K (s j,n¯ s i,n¯−1 ,s i,N φ) ds j,n¯ M T T−1 M T n¯ T | T T−1 T j=1 i=1 (cid:18) M (cid:19) 1 (cid:88) w˜j,n¯ W j,n¯−1 × M T T j=1 M (cid:90) (cid:20) M (cid:21) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) w˜i,n¯ W i,n¯−1 K (s j,n¯ s i,n¯−1 ,s i,N φ) ds j,n¯ . M T T−1 M T T n¯ T | T T−1 T j=1 i=1 The first equality is the definition of I . The second equality is a consequence of Lemma 4. 1 The last equality is obtained by recalling that w˜i,n¯ W i,n¯−1 W ˜ i,n¯ = T T . T 1 (cid:80)M w˜i,n¯ W i,n¯−1 M i=1 T T We proceed in the evaluation of the expected value by integrating over the particle E values s 1:M,1 ,...,s 1:M,n¯−1: T T (cid:90) (cid:90) (cid:18)n¯−1(cid:18) M (cid:19)(cid:19) (cid:89) 1 (cid:88) = I w˜i,n W i,n−1 (A.31) E ··· 1 · M T T n=1 i=1 (cid:18)n¯−1 M (cid:19) (cid:89)(cid:89) K (s i,n s i,n−1 ,s i,N φ) ds 1:M,1 ds 1:M,n¯−1 , × n T | T T−1 T ··· T n=1j=1

Online Appendix A-17 where (cid:18)n¯−1(cid:18) M (cid:19)(cid:19) (cid:89) 1 (cid:88) I w˜i,n W i,n−1 1 · M T T n=1 i=1 (cid:18) M (cid:90) (cid:20) M (cid:21) (cid:19) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) w˜i,n¯ W i,n¯−1 K (s j,n¯ s i,n¯−1 ,s i,N φ) ds j,n¯ M T T−1 M T T n¯ T | T T−1 T j=1 i=1 (cid:18)n¯−1(cid:18) M (cid:19)(cid:19) (cid:89) 1 (cid:88) w˜i,n W i,n−1 × M T T n=1 i=1 M (cid:90) (cid:20) M (cid:18)n¯−1(cid:18) M (cid:19)(cid:19) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) w˜i,n¯ W i,n¯−1 (cid:89) 1 (cid:88) w˜i,n W i,n−1 M T T−1 M T T M T T j=1 i=1 n=1 i=1 (cid:21) K (s j,n¯ s i,n¯−1 ,s i,N φ) ds j,n¯ × n¯ T | T T−1 T M (cid:90) (cid:20) M (cid:18)n¯−1 (cid:19) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) w˜i,n¯ (cid:89) w˜i,n W i,N φ M T T−1 M T T T−1 j=1 i=1 n=1 (cid:21) K (s j,n¯ s i,n¯−1 ,s i,N φ) ds j,n¯ . × n¯ T | T T−1 T The last equality follows from the second part of Lemma 3. Notice the switch from j to i superscript for functions of particles in stages n < n¯. Thus, M (cid:90) (cid:90) (cid:90) (cid:20) M (cid:18)n¯−1 (cid:19) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) w˜i,n¯ (cid:89) w˜i,n W i,N φ (A.32) E M T T−1 ··· M T T T−1 j=1 i=1 n=1 (cid:21)(cid:18)n¯−1 M (cid:19) (cid:89)(cid:89) K (s j,n¯ s i,n¯−1 ,s i,N φ) K (s i,n s i,n−1 ,s i,N φ) ds 1:M,1 ds 1:M,n¯−1 ds j,n¯ × n¯ T | T T−1 n T | T T−1 T ··· T T n=1i=1 M (cid:90) (cid:20) M (cid:90) (cid:90) (cid:18) n¯ (cid:19) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) (cid:89) w˜i,n W i,N φ M T T−1 M ··· T T−1 j=1 i=1 n=1 n¯−1 (cid:21) (cid:89) K (s j,n¯ s i,n¯−1 ,s i,N φ) K (s i,n s i,n−1 ,s i,N φ)ds i,1 ds i,n¯−1 ds j,n¯ × n¯ T | T T−1 n T | T T−1 T ··· T T n=1 The second equality follows from Lemma 4. The calculations in (A.23) imply that (cid:90) (cid:90) (cid:18) n¯ (cid:19) n¯−1 (cid:89) (cid:89) w˜i,n W i,N φ K (s i,n s i,n−1 ,s i,N φ)ds i,1 ds i,n¯−2 (A.33) ··· T T−1 n T | T T−1 T ··· T n=1 n=1 = p (y s i,n¯−1)p(s i,n¯−1 s i,N φ)W i,N φ n¯−1 T | T T | T−1 T−1

Online Appendix A-18 In turn, M (cid:90) (cid:20) M (cid:90) = 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) 1 (cid:88) K (s j,n¯ s i,n¯−1 ,s i,N φ) E M T T−1 M n¯ T | T T−1 j=1 i=1 (cid:21) p (y s i,n¯−1)p(s i,n¯−1 s i,N φ)W i,N φds i,n¯−1 ds j,n¯ × n¯ T | T T | T−1 T−1 T T M (cid:20) M (cid:90) = 1 (cid:88) 1 (cid:88) F (cid:0) s j,n¯ ,s j,N φ (cid:1) K (s j,n¯ s i,n¯−1 ,s i,N φ)ds j,n¯ (A.34) M M T T−1 n¯ T | T T−1 T i=1 j=1 (cid:21) p (y s i,n¯−1)p(s i,n¯−1 s i,N φ)W i,N φds i,n¯−1 × n¯ T | T T | T−1 T−1 T M (cid:90) = 1 (cid:88) F (cid:0) s i,n¯ ,s i,N φ (cid:1) p (y s i,n¯)p(s i,n¯ s i,N φ)W i,N φds i,n¯ M T T−1 n¯ T | T T | T−1 T−1 T i=1 1 (cid:88) M (cid:90) (cid:90) (cid:18) (cid:89) N φ (cid:19)(cid:18) N (cid:89) φ −1 (cid:19) = w˜j,n K (s j,n s j,n−1 ,s j,N φ) M ··· T n T | T T−1 j=1 n=n¯+1 n=n¯+1 p (y s j,n¯)p(s j,n¯ s j,N φ)W j,N φds j,n¯+1 ds j,N φ −1 p (y s j,n¯)p(s j,n¯ s j,N φ)ds j,n¯ × n¯ T | T T | T−1 T−1 T ··· T n¯ T | T T | T−1 T M 1 (cid:88) j,N j,N = p(y s φ)W φ. M T | T−1 T−1 j=1 The second equality is obtained by changing the order of two summations. To obtain the third equality we integrate out the s i,n¯−1 terms along the lines of (A.21). Notice that the T value of the integral is identical for all values of the j superscript. Thus, we simply set j = i and drop the average. For the fourth equality, we plug in the definition of F (cid:0) s i,n¯ ,s i,N φ (cid:1) and T T−1 replace the i index with a j index. The last equality follows from calculations similar to those in (A.23). This completes the analysis of Part 1. Part 2 (Resampling in tempering iteration n¯). A similar argument as for Part 1 can be used to extend the result for Part 2. Resampling in multiple tempering iterations. The previous analysis can be extended to the case in which the selection step is executed in multiple tempering iterations n , assuming ∈ N that the set does not itself depend on the particle system. (cid:4) N

Online Appendix A-19 A.3.2 Proof of Main Theorem Proof of Theorem 2. Suppose that for any h such that 0 h T 1 ≤ ≤ − M E(cid:2) pˆ(Y Y ,θ) (cid:3) = 1 (cid:88) p(Y s j,N φ ,θ)W j,N φ , (A.35) T−h:T | 1:T−h−1 |F T−h−1,N φ ,M M T−h:T | T−h−1 T−h−1 j=1 where   (cid:89) T (cid:89) N φ (cid:32) 1 (cid:88) M (cid:33) pˆ(Y T−h:T | Y 1:T−h−1 ,θ) =  M w˜ t j,n W t j,n−1 . t=T−h n=1 j=1 Then, by setting h = T 1, we can deduce that − M E(cid:2) pˆ(Y θ) (cid:3) = 1 (cid:88) p(Y s j,N φ,θ)W j,N φ. (A.36) 1:T | |F 0,N φ ,M M 1:T | 0 0 j=1 Recall that for period t = 0 we adopted the convention that N = 1 and assumed that the φ j,N j,N states were initialized by direct sampling: s φ p(s ) and W φ = 1. Thus, 0 0 0 ∼ (cid:20) (cid:21) E(cid:2) pˆ(Y θ) (cid:3) = E E(cid:2) pˆ(Y θ) (cid:3) (A.37) 1:T 1:T 0,N ,M | | |F φ (cid:20) M (cid:21) 1 (cid:88) = E p(Y s j,N φ,θ)W j,N φ M 1:T | 0 0 j=1 (cid:90) = p(Y s ,θ)p(s )ds 1:T 0 0 0 | = p(Y θ), 1:T | as desired. In the remainder of the proof we use an inductive argument to establish (A.35). If (A.35)

Online Appendix A-20 holds for h, it also has to hold for h+1: E(cid:2) pˆ(Y Y ,θ) (cid:3) T−h−1:T 1:T−h−2 T−h−2,N ,M | |F φ (cid:20) (cid:12) (cid:21) = E E(cid:2) pˆ(Y Y ,θ) (cid:12) (cid:12) (cid:3) pˆ(y Y ,θ) (cid:12) (cid:12) T−h:T 1:T−h−1 T−h−1,N ,M T−h−1 1:T−h−2 T−h−2,N ,M | F φ | (cid:12)F φ M = 1 (cid:88) E(cid:2) p(Y s j,N φ ,θ)W j,N φ pˆ(y Y ,θ) (cid:12) (cid:12) (cid:3) M T−h:T | T−h−1 T−h−1 T−h−1 | 1:T−h−2 F T−h−2,N φ ,M j=1     = M 1 (cid:88) M E p(Y T−h:T | s j T ,N − φ h−1 ,θ)W T j, − N h φ −1  (cid:89) N φ (cid:32) M 1 (cid:88) M w˜ T j,n −h−1 W T j, − n− h− 1 1 (cid:33)  (cid:12) (cid:12) (cid:12) (cid:12)F T−h−2,N φ ,M j=1 n=1 j=1 M 1 (cid:88) j,N j,N = p(Y s φ ,θ)W φ M T−h−1:T | T−h−2 T−h−2 j=1 Note that . Thus, the first equality follows from the law of iter- T−h−2,N ,M T−h−1,N ,M F φ ⊂ F φ ated expectations. The second equality follows from the inductive hypothesis (A.35). The third equality uses the definition of the period-likelihood approximation in (21) of Algorithm 2. The last equality follows from the second part of Lemma 5. We now verify that the inductive hypothesis (A.35) holds for h = 0. Using the definition of pˆ(y Y ,θ), we can write T 1:T−1 |   E(cid:2) pˆ(y T | Y 1:T−1 ,θ) |F T−1,N φ ,M (cid:3) = E  (cid:89) N φ (cid:32) M 1 (cid:88) M w˜ T j,n W T j,n−1 (cid:33) (cid:12) (cid:12) (cid:12) (cid:12)F T−1,N φ ,M (A.38) n=1 j=1 M 1 (cid:88) j,N j,N = p(y s φ)W φ. M T | T−1 T−1 j=1 The second equality follows from the first part of Lemma 5. Thus, we can deduce that (A.35) holds for h = T 1 as required. This completes the proof. (cid:4) − B Computational Details The code for this project is available at http://github.com/eph/tempered_pf. The applications in Section 4 were coded in Fortran and compiled using the Intel Fortran Compiler

Online Appendix A-21 (version: 13.0.0), including the math kernel library. The calculations in Algorithm 1, part 2(a)ii, Algorithm 2, part 1(a)i, and Algorithm 2, part 2(c) were implemented using OpenMP (shared memory) multithreading. C DSGE Models and Data Sources C.1 Small-Scale DSGE Model C.1.1 Equilibrium Conditions We write the equilibrium conditions by expressing each variable in terms of percentage deviations from its steady state value. Let xˆ = ln(x /x) and write t t (cid:104) (cid:105) 1 = βE e−τcˆt+1+τcˆt+Rˆ t−zˆt+1−πˆt+1 (A.39) t (cid:20)(cid:18) (cid:19) (cid:21) 1 1 (cid:0) (cid:1) 0 = eπˆt 1 1 eπˆt + (A.40) − − 2ν 2ν βE (cid:2)(cid:0) eπˆt+1 1 (cid:1) e−τcˆt+1+τcˆt+yˆt+1−yˆt+πˆt+1 (cid:3) t − − 1 ν (cid:0) (cid:1) + − 1 eτcˆt νφπ2 − ecˆt−yˆt = e−gˆt φπ2g (cid:0) eπˆt 1 (cid:1)2 (A.41) − 2 − ˆ ˆ R = ρ R +(1 ρ )ψ πˆ (A.42) t R t−1 R 1 t − +(1 ρ )ψ (yˆ gˆ)+(cid:15) R 2 t t R,t − − gˆ = ρ gˆ +(cid:15) (A.43) t g t−1 g,t zˆ = ρ zˆ +(cid:15) . (A.44) t z t−1 z,t Log-linearization and straightforward manipulation of Equations (A.39) to (A.41) yield the following representation for the consumption Euler equation, the New Keynesian Phillips

Online Appendix A-22 curve, and the monetary policy rule: (cid:18) (cid:19) 1 yˆ = E [yˆ ] R ˆ E [πˆ ] E [zˆ ] (A.45) t t t+1 t t t+1 t t+1 − τ − − +gˆ E [gˆ ] t t t+1 − πˆ = βE [πˆ ]+κ(yˆ gˆ) t t t+1 t t − ˆ ˆ R = ρ R +(1 ρ )ψ πˆ +(1 ρ )ψ (yˆ gˆ)+(cid:15) t R t−1 R 1 t R 2 t t R,t − − − where 1 ν κ = τ − . (A.46) νπ2φ In order to construct a likelihood function, we have to relate the model variables to a set of observables y . We use the following three observables for estimation: quarter-to-quarter t per capita GDP growth rates (YGR), annualized quarter-to-quarter inflation rates (INFL), and annualized nominal interest rates (INT). The three series are measured in percentages and their relationship to the model variables is given by the following set of equations: YGR = γ(Q) +100(yˆ yˆ +zˆ) (A.47) t t t−1 t − INFL = π(A) +400πˆ t t INT = π(A) +r(A) +4γ(Q) +400R ˆ . t t The parameters γ(Q), π(A), and r(A) are related to the steady states of the model economy as follows: γ(Q) 1 π(A) γ = 1+ , β = , π = 1+ . 100 1+r(A)/400 400 Thestructuralparametersarecollectedinthevectorθ. Sinceinthefirst-orderapproximation the parameters ν and φ are not separately identifiable, we express the model in terms of κ, defined in (A.46). Let θ = [τ,κ,ψ ,ψ ,ρ ,ρ ,ρ ,r(A),π(A),γ(Q),σ ,σ ,σ ](cid:48). 1 2 R g z R g z

Online Appendix A-23 C.1.2 Data Sources 1. Per Capita Real Output Growth Take the level of real gross domestic product, (FRED mnemonic “GDPC1”), call it GDP . Take the quarterly average of the Civilian t Non-institutionalPopulation(FREDmnemonic“CNP16OV”/BLSseries“LNS10000000”), call it POP . Then, t Per Capita Real Output Growth (cid:20) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) GDP GDP t t−1 = 100 ln ln . POP − POP t t−1 2. Annualized Inflation. Take the CPI price level, (FRED mnemonic “CPIAUCSL”), call it CPI . Then, t (cid:18) (cid:19) CPI t Annualized Inflation = 400ln . CPI t−1 3. Federal Funds Rate. Take the effective federal funds rate (FRED mnemonic “FED- FUNDS”), call it FFR . Then, t Federal Funds Rate = FFR . t C.2 The Smets-Wouters Model C.2.1 Equilibrium Conditions The log-linearized equilibrium conditions of the Smets and Wouters (2007) model take the following form:

Online Appendix A-24 yˆ = c cˆ +i ˆ i +z zˆ +ε g (A.48) t y t y t y t t h/γ 1 cˆ = cˆ + E cˆ (A.49) t t−1 t t+1 1+h/γ 1+h/γ wl (σ 1) + c c − ( ˆ l E ˆ l ) t t t+1 σ (1+h/γ) − c 1 h/γ 1 h/γ − (rˆ E πˆ ) − εb −(1+h/γ)σ t − t t+1 − (1+h/γ)σ t c c 1 βγ(1−σc) ˆ i = ˆ i + Eˆ i (A.50) t 1+βγ(1−σc) t−1 1+βγ(1−σc) t t+1 1 + qˆ +εi ϕγ2(1+βγ(1−σc)) t t qˆ = β(1 δ)γ−σc E qˆ rˆ +E πˆ (A.51) t t t+1 t t t+1 − − +(1 β(1 δ)γ−σc)E rˆk εb − − t t+1 − t yˆ = Φ(αk ˆs +(1 α) ˆ l +εa) (A.52) t t − t t k ˆs = k ˆ +zˆ (A.53) t t−1 t 1 ψ zˆ = − rˆk (A.54) t ψ t

Online Appendix A-25 (1 δ) k ˆ = − k ˆ +(1 (1 δ)/γ)ˆ i (A.55) t t−1 t γ − − +(1 (1 δ)/γ)ϕγ2(1+βγ(1−σc))εi − − t µˆp = α(k ˆs ˆ l ) wˆ +εa (A.56) t t − t − t t βγ(1−σc) ι πˆ = E πˆ + p πˆ (A.57) t 1+ι p βγ(1−σc) t t+1 1+βγ(1−σc) t−1 (1 βγ(1−σc)ξ )(1 ξ ) − p − p µˆp +ε p −(1+ι p βγ(1−σc))(1+(Φ 1)ε p )ξ p t t − rˆk = ˆ l +wˆ k ˆs (A.58) t t t − t 1 µˆw = wˆ σ ˆ l (cˆ h/γcˆ ) (A.59) t t − l t − 1 h/γ t − t−1 − βγ(1−σc) wˆ = (E wˆ (A.60) t 1+βγ(1−σc) t t+1 1 +E πˆ )+ (wˆ ι πˆ ) t t+1 1+βγ(1−σc) t−1 − w t−1 1+βγ(1−σc)ι w πˆ − 1+βγ(1−σc) t (1 βγ(1−σc)ξ )(1 ξ ) − w − w µˆw +εw −(1+βγ(1−σc))(1+(λ w 1)(cid:15) w )ξ w t t − rˆ = ρrˆ +(1 ρ)(r πˆ +r (yˆ yˆ∗)) (A.61) t t−1 − π t y t − t +r ((yˆ yˆ∗) (yˆ yˆ∗ ))+εr. ∆y t − t − t−1 − t−1 t The exogenous shocks evolve according to

Online Appendix A-26 εa = ρ εa +ηa (A.62) t a t−1 t εb = ρ εb +ηb (A.63) t b t−1 t ε g = ρ ε g +ρ ηa +η g (A.64) t g t−1 ga t t εi = ρ εi +ηi (A.65) t i t−1 t εr = ρ εr +ηr (A.66) t r t−1 t ε p = ρ ε p +η p µ η p (A.67) t r t−1 t p t−1 − εw = ρ εw +ηw µ ηw . (A.68) t w t−1 t − w t−1

Online Appendix A-27 The counterfactual no-rigidity prices and quantities evolve according to yˆ∗ = c cˆ∗ +i ˆ i∗ +z zˆ∗ +ε g (A.69) t y t y t y t t h/γ 1 cˆ∗ = cˆ∗ + E cˆ∗ (A.70) t 1+h/γ t−1 1+h/γ t t+1 wl (σ 1) + c c − ( ˆ l∗ E ˆ l∗ ) σ (1+h/γ) t − t t+1 c 1 h/γ 1 h/γ − r∗ − εb −(1+h/γ)σ t − (1+h/γ)σ t c c 1 βγ(1−σc) ˆ i∗ = ˆ i∗ + Eˆ i∗ t 1+βγ(1−σc) t−1 1+βγ(1−σc) t t+1 1 + qˆ∗ +εi (A.71) ϕγ2(1+βγ(1−σc)) t t qˆ∗ = β(1 δ)γ−σc E qˆ∗ r∗ (A.72) t − t t+1 − t +(1 β(1 δ)γ−σc)E rk∗ εb − − t t+1 − t yˆ∗ = Φ(αks∗ +(1 α) ˆ l∗ +εa) (A.73) t t − t t k ˆs∗ = k∗ +z∗ (A.74) t t−1 t 1 ψ zˆ∗ = − rˆk∗ (A.75) t ψ t (1 δ) k ˆ∗ = − k ˆ∗ +(1 (1 δ)/γ)ˆ i (A.76) t γ t−1 − − t +(1 (1 δ)/γ)ϕγ2(1+βγ(1−σc))εi − − t wˆ∗ = α(k ˆs∗ ˆ l∗)+εa (A.77) t t − t t rˆk∗ = ˆ l∗ +wˆ∗ k ˆ∗ (A.78) t t t − t 1 wˆ∗ = σ ˆ l∗ + (cˆ∗ +h/γcˆ∗ ). (A.79) t l t 1 h/γ t t−1 −

Online Appendix A-28 The steady state (ratios) that appear in the measurement equation or the log-linearized equilibrium conditions are given by γ = γ¯/100+1 (A.80) π∗ = π¯/100+1 (A.81) r¯ = 100(β−1γσcπ∗ 1) (A.82) − rk = γσc/β (1 δ) (A.83) ss − − (cid:18) αα(1 α)(1−α)(cid:19) 1− 1 α w = − (A.84) ss Φrk α ss i = (1 (1 δ)/γ)γ (A.85) k − − 1 α rk l = − ss (A.86) k α w ss k = Φl (α−1) (A.87) y k i = (γ 1+δ)k (A.88) y y − c = 1 g i (A.89) y y y − − z = rk k (A.90) y ss y 1 1 αrk k wl = − ss y . (A.91) c λ α c w y The measurement equations take the form: YGR = γ¯ +yˆ yˆ (A.92) t t t−1 − INF = π¯ +πˆ t t ˆ FFR = r¯+R t t CGR = γ¯ +cˆ cˆ t t t−1 − IGR = γ¯ +ˆ i ˆ i t t t−1 − WGR = γ¯ +wˆ wˆ t t t−1 − ¯ ˆ HOURS = l+l . t t

Online Appendix A-29 C.2.2 Data The data cover 1966:Q1 to 2004:Q4. The construction follows that of Smets and Wouters (2007). Output data come from the NIPA; other sources are noted in the exposition. 1. Per Capita Real Output Growth. Take the level of real gross domestic product (FRED mnemonic “GDPC1”), call it GDP . Take the quarterly average of the Civilian t Non-institutional Population (FRED mnemonic “CNP16OV” / BLS series “LNS10000000”), normalized so that its 1992Q3 value is 1, call it POP . Then, t Per Capita Real Output Growth (cid:20) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) GDP GDP t t−1 = 100 ln ln . POP − POP t t−1 2. Per Capita Real Consumption Growth. Take the level of personal consumption expenditures (FRED mnemonic “PCEC”), call it CONS . Take the level of the GDP t price deflator (FRED mnemonic “GDPDEF”), call it GDPP . Then, t Per Capita Real Consumption Growth (cid:20) (cid:18) (cid:19) CONS t = 100 ln GDPP POP t t (cid:18) (cid:19)(cid:21) CONS t−1 ln . − GDPP POP t−1 t−1 3. Per Capita Real Investment Growth. Take the level of fixed private investment (FRED mnemonic “FPI”), call it INV . Then, t Per Capita Real Investment Growth (cid:20) (cid:18) (cid:19) INV t = 100 ln GDPP POP t t (cid:18) (cid:19)(cid:21) INV t−1 ln . − GDPP POP t−1 t−1

Online Appendix 30 4. Per Capita Real Wage Growth. Take the BLS measure of compensation per hour for the nonfarm business sector (FRED mnemonic “COMPNFB” / BLS series “PRS85006103”), call it W . Then, t Per Capita Real Wage Growth (cid:20) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) W W t t−1 = 100 ln ln . GDPP − GDPP t t−1 5. Per Capita Hours Index. Take the index of average weekly nonfarm business hours (FRED mnemonic / BLS series “PRS85006023”), call it HOURS . Take the number of t employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992Q3 value is 1, call it EMP . Then, t (cid:18) (cid:19) HOURS EMP t t Per Capita Hours = 100ln . POP t The series is then demeaned. 6. Inflation. Take the GDP price deflator, then (cid:18) (cid:19) GDPP t Inflation = 100ln . GDPP t−1 7. Federal Funds Rate. Take the effective federal funds rate (FRED mnemonic “FED- FUNDS”), call it FFR . Then, t Federal Funds Rate = FFR /4. t

Cite this document

APA

Edward Herbst and Frank Schorfheide (2016). Tempered Particle Filtering (FEDS 2016-072). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2016-072

BibTeX

@techreport{wtfs_feds_2016_072,
  author = {Edward Herbst and Frank Schorfheide},
  title = {Tempered Particle Filtering},
  type = {Finance and Economics Discussion Series},
  number = {2016-072},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2016},
  url = {https://whenthefedspeaks.com/doc/feds_2016-072},
  abstract = {The accuracy of particle filters for nonlinear state-space models crucially depends on the proposal distribution that mutates time t-1 particle values into time t values. In the widely-used bootstrap particle filter this distribution is generated by the state-transition equation. While straightforward to implement, the practical performance is often poor. We develop a self-tuning particle filter in which the proposal distribution is constructed adaptively through a sequence of Monte Carlo steps. Intuitively, we start from a measurement error distribution with an inflated variance, and then gradually reduce the variance to its nominal level in a sequence of steps that we call tempering. We show that the filter generates an unbiased and consistent approximation of the likelihood function. Holding the run time fixed, our filter is substantially more accurate in two DSGE model applications than the bootstrap particle filter.},
}