ifdp · February 24, 2020

Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints

Abstract

We develop an algorithm to construct approximate decision rules that are piecewise-linear and continuous for DSGE models with an occasionally binding constraint. The functional form of the decision rules allows us to derive a conditionally optimal particle filter (COPF) for the evaluation of the likelihood function that exploits the structure of the solution. We document the accuracy of the likelihood approximation and embed it into a particle Markov chain Monte Carlo algorithm to conduct Bayesian estimation. Compared with a standard bootstrap particle filter, the COPF significantly reduces the persistence of the Markov chain, improves the accuracy of Monte Carlo approximations of posterior moments, and drastically speeds up computations. We use the techniques to estimate a small-scale DSGE model to assess the effects of the government spending portion of the American Recovery and Reinvestment Act in 2009 when interest rates reached the zero lower bound.

K.7 Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints Aruoba, Borağan S., Pablo Cuba-Borda, Kenji Higa-Flores, Frank Schorfheide, and Sergio Villalvazo Please cite paper as: Aruoba, Borağan S., Pablo Cuba-Borda, Kenji Higa-Flores, Frank Schorfheide, and Sergio Villalvazo (2020). Piecewise- Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints. International Finance Discussion Papers 1272. https://doi.org/10.17016/IFDP.2020.1272 International Finance Discussion Papers Board of Governors of the Federal Reserve System Number 1272 February 2020

Board of Governors of the Federal Reserve System International Finance Discussion Papers Number 1272 February 2020 Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints S. Borağan Aruoba, Pablo Cuba-Borda, Kenji Higa-Flores, Frank Schorfheide, and Sergio Villalvazo NOTE: International Finance Discussion Papers (IFDPs) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the International Finance Discussion Papers Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers. Recent IFDPs are available on the Web at www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at www.ssrn.com.

Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints ∗ S. Bora˘gan Aruoba Pablo Cuba-Borda Kenji Higa-Flores University of Maryland Federal Reserve Board University of Maryland Frank Schorfheide Sergio Villalvazo University of Pennsylvania University of Pennsylvania CEPR, NBER, PIER Current Version: February 10, 2020 Abstract Wedevelopanalgorithmtoconstructapproximatedecisionrulesthatarepiecewiselinear and continuous for DSGE models with an occasionally binding constraint. The functional form of the decision rules allows us to derive a conditionally optimal particle filter (COPF) for the evaluation of the likelihood function that exploits the structure of the solution. We document the accuracy of the likelihood approximation and embed it into a particle Markov chain Monte Carlo algorithm to conduct Bayesian estimation. Comparedwithastandardbootstrapparticlefilter,theCOPFsignificantlyreducesthe persistenceoftheMarkovchain, improvestheaccuracyofMonteCarloapproximations of posterior moments, and drastically speeds up computations. We use the techniques to estimate a small-scale DSGE model to assess the effects of the government spending portion of the American Recovery and Reinvestment Act in 2009 when interest rates reached the zero lower bound. JEL CLASSIFICATION: C5, E4, E5 KEY WORDS: Bayesian Estimation, Nonlinear Filtering, Nonlinear Solution Methods, Particle MCMC, ZLB ∗ Correspondence: B. Aruoba (aruoba@umd.edu), K. Higa-Flores (kenjihf@umd.edu): Department of Economics, University of Maryland, College Park, MD 20742. P. Cuba-Borda (pablo.a.cubaborda@frb.gov): DivisionofInternationalFinance,FederalReserveBoard,20thStreet&ConstitutionAve.,NW,Washington, DC 20551. F. Schorfheide (schorf@ssc.upenn.edu), S. Villalvazo (vsergio@sas.upenn.edu): Department of Economics, University of Pennsylvania, 133 S. 36th Street, Philadelphia, PA 19104. We are thankful for helpful comments and suggestions from participants of the 2018 and 2019 MFM conferences, the 2019 conferenceoftheSocietyforNonlinearDynamics, andtheAlejandroJustinianoMemorialconference. Higa- FloresandVillalvazogratefullyacknowledgefinancialsupportfromtheBecker-FriedmanInstituteunderthe Macro-FinancialModelingInitiative. AruobaandSchorfheidegratefullyacknowledgefinancialsupportfrom theNationalScienceFoundationunderGrantSES1851634. Theviewsexpressedinthispaperaresolelythe responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of anyone else associated with the Federal Reserve System.

This Version: February 10, 2020 1 1 Introduction Dynamic stochastic general equilibrium (DSGE) models with financial frictions are widely used in central banks, by regulators, and in academia to study the effects of monetary and macroprudential policies and the propagation of shocks in the macro economy. The most recent vintage of these models involves occasionally binding constraints arising from financial frictions and the zero lower bound (ZLB) on nominal interest rates. In order for these models to be useable for a quantitative analysis, they need to be solved numerically, and their parameters need to be estimated based on historical data. Two types of solution approaches for models with occasionally binding constraints have been used in the literature. The first group of solution algorithms can be broadly classified as global methods. Agents’ decision rules (or value functions associated with optimization problems) are represented by a family of flexible functions—for example, Chebyshev polynomials—or by a discrete mapping on a finite state-space domain. The flexible functions are parameterized by coefficients that are chosen such that the resulting decision rules (approximately)satisfythemodel’sequilibriumconditionsandsolvetheunderlyingintertemporal optimization problems. The second type of solution approaches are variants of the extended perfect-foresight path (EPFP) method that build on Fair and Taylor (1983). These algorithms rely on the assumption that, after H periods, the system reverts back to the steady state in which the constraint, say, is non-binding. With an initial guess about whether the constraint is binding in periods t+h, h = 1,...,H, it is possible to solve the dynamic system for the values of the endogenous variables. We can then compares the initial guess about the duration of the binding regime to the backward solution and iterates until consistency is achieved. Because the computations are based on the initial state, the previously described steps need to be repeated for every t in a multi-period simulation. From the model solution, we can construct a state-space representation for an estimable empirical model. The solution itself generates the state transition equations. A set of measurement equations can then be specified that links the state variables with the observables. Because the model solution is nonlinear, so is the state-space representation. Thus, a nonlinear filter is required to compute the likelihood function. For instance, in the context of DSGE models with a ZLB contraint, Gust et al. (2017) and Aruoba et al. (2018) use a particle filter in combination with a global solution to construct likelihood functions. Guerrieri and Iacoviello (2017) use an EPFP solution for a model in which the number of observables

This Version: February 10, 2020 2 equals the number of structural shocks and combine it with an inversion filter that essentially solves for the innovations as a function of the observables conditional on an initial state. The contribution of our paper is to construct an alternative model solution that (i) is able to capture an important aspect of the decision rule nonlinearity generated by an occasionally bindingconstraint, (ii)canbesolvedquickly, and(iii)allowsustoderiveanaccurateandfast filter for the evaluation of the likelihood function that exploits the structure of the solution. This likelihood approximation can then be embedded into an estimation procedure. One of our goals is to make the procedure efficient enough that it can be run on a desktop computer in a reasonable amount of time. The small-scale New Keynesian model in our empirical application is estimated using U.S. data from 1984 to 2018 in about 13.5 hours on a single core. The basic idea of the proposed solution method is to approximate agents’ decision rules globally by piecewise-linear functions that are continuous but have a kink in the region of the state space in which a constraint becomes binding. The decision rules account for the fact that, in the next period, the constraint could either be binding or non-binding and thus they capture precautionary behavior. Moreover, the decision rules only have to be computed once (as opposed to for each period t separately as in the EPFP approach.) Aruoba et al. (2018) used more flexible approximations for the decision rules, stitching together higher-order Chebyshev polynomials along the locus in the state-space where the ZLB constraint becomes binding. However, the decision rules on both sides of the kink remained approximately linear. On the one hand, our solution generalizes the widely used log-linearization techniques, which generate log-linear decision rules. On the other hand, compared with higher-order Chebyshev polynomials, the piecewise-linearity and continuity at the kink drastically reduce the number of coefficients that need to be determined and hence simplify computations. The resulting solution can be cast into a two-regime vector autoregressive representation (VAR) with an endogenous regime switching that is triggered when structural shock innovations cross a particular threshold. To solve the nonlinear filtering problem, we derive a conditionallyoptimalproposaldistributionforaparticlefilter. Aparticlefilterapproximates the distribution of a vector of hidden states s conditional on the sequence of observations t Y available in time t by a swarm of M particle values and weights {sj,Wj}M . The filter 1:t t t j=1 operates recursively and turns the swarm {sj ,Wj }M into a swarm {sj,Wj}M , which t−1 t−1 j=1 t t j=1 requires a change in the particle values (mutation) and a change in the particle weights (correction). It has been shown, that conditionally on {sj ,Wj }M it is optimal to sample t−1 t−1 j=1

This Version: February 10, 2020 3 sj ∼ p(s |y ,sj )—see Herbst and Schorfheide (2015) for a more detailed exposition. If t t t t−1 draws from this distribution can be obtained by direct sampling, the filter becomes very efficient. We show that for state transitions that are piecewise-linear, the conditionally optimal proposal is a mixture of truncated normal distributions that allows for direct sampling. In a sequence of numerical illustrations based on a small-scale New Keynesian DSGE model with a ZLB constraint, we document important properties of our solution algorithm and the conditionally optimal particle filter (COPF) likelihood approximation. We show that, compared to a naive bootstrap particle filter (BSPF), which samples sj from the statet transition density p(s |sj ), our COPF drastically reduces the variance of the likelihood t t−1 approximation holding the runtime fixed. In practice, this allows us to run the COPF with far fewer particles than the BSPF (150 in our empirical application), which in turn speeds up the computations. When we embed the more accurate COPF into a random walk Metropolis–Hastings (RWMH) algorithm, we are able to significantly reduce the persistence of the resulting Markov chain and therefore improve the accuracy of Monte Carlo approximations of moments of the posterior distribution. Using U.S. data from 1984 to 2018 on output growth, inflation, interest rates, and the government-spending to GDP ratio, we estimate the small-scale DSGE model using our proposed piecewise-linear and continuous (PLC) solution in combination with the COPF. From the estimated model, we compute dollar-for-dollar government spending multipliers associated with the increase in government spending that was part of the 2009 American Recovery and Reinvestment Act (ARRA). The counterfactual output levels are computed by lowering the exogenous government spending process in the model by an amount that is commensurable to the ARRA intervention and keeping all other exogenous processes at their historical levels. We find that the ex post ARRA multiplier, computed by comparing realized output and government spending with counterfactual levels, is approximately 0.8. Our paper is related to several strands of the literature. Global solutions for models with occasionally binding constraints are computed, for instance, in Christiano and Fisher (2000), Ferna´ndez-Villaverde et al. (2015), Maliar and Maliar (2015), Gust et al. (2017), and Aruoba et al. (2018). The decision rules can also be obtained iteratively through value function or policy function iteration. Examples of the latter methodology include Adam and Billi (2007), Nakata (2016), and Atkinson et al. (2020). Mendoza and Villalvazo (2019) propose a fixed-point iteration algorithm (FiPIt) that iterates on intertemporal equilibrium conditions, which is faster than traditional global solution methods. Under all of these

This Version: February 10, 2020 4 approaches, the decision rules only have to be computed once conditional on a vector of model parameters and can then easily be used for multi-step model simulation. EPFP solutions have been developed by Eggertsson and Woodford (2003), Christiano et al. (2015), and Guerrieri and Iacoviello (2015), who provide a software package called OccBin. The algorithms are typically designed for models in which the equilibrium conditions, withtheexceptionoftheoccasionallybindingconstraint, arelinearized.1 Kulishetal.(2017) focus on the ZLB application and treat the duration of the fixed-interest-rate regime as a parameter to be estimated. Holden (2019) implements the EPFP method using endogenous news shocks and provides a software called DynareOBC. Boehl (2019) transforms the linearized equilibrium conditions into an extended reduced-form system that depends only on the initial states and the expected number of periods at the constraint and allows for a more efficient computation of the solution. In approaches that utilize EPFP, a multi-step simulation requires running the shooting algorithm for every period, making the implementation of a standard particle filter difficult and time consuming. de Groot et al. (2019) compare the performance of local approximations and EPFP methods with the global solution of open-economy models with incomplete markets and occasionallybindingborrowingconstraints. Theirresultsshowthatsimulatedmomentsfrom local and EPFP methods can differ substantially from those obtained using global solutions and advise in favor of solving this class of models using the original nonlinear equilibrium conditions. Our methodology follows a similar strategy, with the approximated piecewiselinear decision rules pinned down by the nonlinear system of the equilibrium conditions. Our solution takes the form of an endogenous regime-switching VAR. Chen (2017) imposes the regime-switching structure in the model specification. In her setup, an exogenous two-state Markov process determines two policy regimes: a normal regime in which the interest rate is positive and follow a Taylor-type monetary policy rule, and a regime in which the interest rate is constrained by the ZLB. Because the regimes are exogenous, the model can be solved using the tools proposed by Farmer et al. (2011). Similarly, Bianchi and Melosi (2017) use an exogenous regime-switching process to characterize the ZLB dynamics during the Great Recession. Benigno et al. (2016) endogenize the regime-switching probability in a model of financial crisis, but, unlike in our paper, the transition from one to the other regime remains partly decoupled from the realization of the fundamental shocks. 1The resulting solution is often called piecewise-linear which is a bit misleading. It is the equilibrium conditions that are piecewise-linear. In contrast, our approach generates piecewise-linear decision rules.

This Version: February 10, 2020 5 A key feature of our paper is that it integrates model solution, likelihood approximation, and Bayesian estimation. There are a few papers that assess the interplay of existing model solution and likelihood evaluation techniques in Monte Carlo experiments. Cuba-Borda et al. (2019) take a simple consumption-savings model subject to a borrowing constraint. They illustrate that less accurate solution methods affect inference even when the inversion filter is available. They also show that, as one increases the measurement error variance in the BSPF, the likelihood misspecification becomes more problematic, making it harder to retrieve the parameter values that govern the DGP. In their setting, measurement error and solution approximation error make it difficult for the econometrician to identify the model regime that generates the data, and this incorrect classification of regimes leads to a bias in parameter estimates. In our empirical application, one of the observed time series allows us to exactly identify the regime, and we modify the COPF to capture this feature. Atkinsonetal.(2020)comparetheperformanceofafullynonlinearsolutionandavariant of the BSPF for estimation, with the approximated solution using OccBin and the inversion filter. They simulate data from a DSGE model that includes more frictions and shocks than the model used for estimation, and the latter is close to the model we use in this paper in terms of size. As such, their estimated model is misspecified with respect to the data generating process. Their results show that the nonlinear approach performs slightly better than the OccBin approach, but the differences are small. Moreover, relative to the pseudotrue parameters, the estimates from both approaches show biases in some key parameters, such as the degree of price rigidities. Since the OccBin-Inversion Filter approach can be scaled up easily and is faster, they argue that building a bigger and less misspecified model using this approach may be preferable. Similarly, our method offers scalability even without multicore processing or distributed computing. Boehl (2019) combines his model solution with a variant of an ensemble Kalman filter. His paper does not focus on accuracy comparisons of solution and estimation methods. Instead, it applies the proposed techniques to estimate a small-scale New Keynesian model on U.S. data. Similarly, our paper uses U.S. data to estimate the parameters of a small-scale New Keynesian model and quantifies the role of fiscal policy in the aftermath of the Great Recession. There is a large literature that studies fiscal spending multipliers using methodologies ranging from using DSGE models, structural vector autoregressions (SVARs), to more traditionalmacroeconometricmodels—seeBatinietal.(2014)forasurveyandreferences. Results from DSGE models tend to be quite sensitive to the size of the fiscal shock and modeling

This Version: February 10, 2020 6 assumptions with respect to monetary and fiscal policy rules and labor supply (Hills and Nakata (2018)). A fairly robust finding across most studies using DSGE models is that multipliers are larger at the ZLB than away from the ZLB. See, for example, Eggertsson (2011) and Christiano and Eichenbaum (2012). Using a fairly stylized New Keynesian model with passive fiscal policy and preferences that are additively separable in terms of consumption and leisure, we find that the ex post multiplier during the Great Recession, when the United States was at the ZLB, was larger than in normal times when interest rates were positive. One novel result we show through counterfactuals is that, in 2009 and 2010, there was very little room for the Federal Reserve to stimulate the economy with conventional monetary policy over and above what the policy rule implied, because adverse shocks kept the desired interest rate near zero despite the large expansionary fiscal policy due to ARRA. The remainder of the paper is organized as follows. Section 2 describes the small-scale New Keynesian DSGE model with ZLB constraint used in the subsequent analysis. In Section 3, we describe how to impose continuity on piecewise-linear decision rules and derive a canonical form for the DSGE model solution. Section 4 discusses how the decision rule coefficients are determined to approximately satisfy the model’s equilibrium conditions. The COPF is derived in Section 5. Section 6 presents some numerical experiments to document the accuracy of the likelihood approximation through the COPF, and Section 7 contains the empirical analysis. Finally, Section 8 concludes. Derivations and further implementation details are provided in the Online Appendix. 2 A Prototypical New Keynesian DSGE Model We will illustrate our solution and filtering methods based on a prototypical New Keynesian DSGE model. The model is identical to the one used in Aruoba et al. (2018). Variants of this model have been widely studied in the literature, and its properties are discussed in detail in Woodford (2003). The model economy consists of perfectly competitive final-goodsproducing firms, a continuum of monopolistically competitive intermediate goods producers, a continuum of identical households, and a government that engages in monetary and fiscal policy. To make this paper self-contained and introduce some important notation, we describe the preferences and technologies of the agents in Section 2.1 and summarize the equilibrium conditions in Section 2.2.

This Version: February 10, 2020 7 2.1 Preferences and Technologies Households. Households derive utility from consumption C relative to an exogenous habit t stock and disutility from hours worked H . We assume that the habit stock is given by the t leveloftechnologyA , whichensuresthattheeconomyevolvesalongabalancedgrowthpath. t We also assume that the households value transaction services from real money balances, detrended by A , and include them in the utility function. The households maximize t (cid:34) (cid:32) (cid:33)(cid:35) (cid:88) ∞ (C /A )1−τ −1 H1+1/η (cid:18) M (cid:19) E βsd t+s t+s −χ t+s +χ V t+s , (1) t t+s H M 1−τ 1+1/η P A t+s t+s s=0 subject to the budget constraint P C +T +M +B = P W H +M +R B +P D +P SC . t t t t t t t t t−1 t−1 t−1 t t t t Here β is the discount factor, d is a shock to the discount factor, 1/τ is the intertemporal t elasticity of substitution, η is the Frisch labor supply elasticity, and P is the price of the final t good. The shock d captures frictions that affect intertemporal preferences in a reduced-form t way. Fluctuations in d affect households patience and their desire to postpone consumption. t As is commonly exploited in the literature, a sufficiently large negative shock to d makes the t central bank cut interest rates all the way to the ZLB. The households supply labor services to the firms in a perfectly competitive labor market, taking the real wage W as given. At t the end of period t, households hold money in the amount of M . They have access to a t bond market where nominal government bonds B that pay gross interest R are traded. t t Furthermore, the households receive profits D from the firms and pay lump-sum taxes T . t t SC is the net cash inflow from trading a full set of state-contingent securities. t Firms. The final-goods producers aggregate intermediate goods, indexed by j ∈ [0,1], using the technology (cid:18)(cid:90) 1 (cid:19) 1− 1 ν Y = Y (j)1−νdj . t t 0 The firms take input prices P (j) and output prices P as given. Profit maximization implies t t that the demand for inputs is given by (cid:18) P (j) (cid:19)−1/ν t Y (j) = Y . t t P t Undertheassumptionoffreeentryintothefinal-goodsmarket,profitsarezeroinequilibrium,

This Version: February 10, 2020 8 and the price of the aggregate good is given by (cid:18)(cid:90) 1 (cid:19) ν− ν 1 ν−1 P = P (j) dj . (2) t t ν 0 We define inflation as π = P /P . t t t−1 Intermediate good j is produced by a monopolist who has access to the production technology Y (j) = A H (j), (3) t t t where A is an exogenous productivity process that is common to all firms and H (j) is the t t firm-specific labor input. Intermediate-goods-producing firms face quadratic price adjustment costs of the form φ (cid:18) P (j) (cid:19)2 t AC (j) = −π¯ Y (j), t t 2 P (j) t−1 where φ governs the price stickiness in the economy and π¯ is a baseline rate of price change that does not require the payment of any adjustment costs. In our quantitative analysis, we set π¯ = π , where π is the target inflation rate of the central bank. Firm j chooses its labor ∗ ∗ input H (j) and the price P (j) to maximize the present value of future profits t t (cid:34) (cid:35) ∞ (cid:18) (cid:19) (cid:88) P (j) E βsQ t+s Y (j)−W H (j)−AC (j) . (4) t t+s|t t+s t+s t+s t+s P t+s s=0 Here, Q is the time t value to the household of a unit of the consumption good in period t+s|t t+s, which is treated as exogenous by the firm. Government Policies. Monetary policy is described by an interest rate feedback rule. Because the ZLB constraint is an important part of our analysis we introduce it explicitly as follows: R = max{1, R∗eσR(cid:15)R,t}. (5) t t Here R∗ is the systematic part of monetary policy which reacts to the current state of the t economy (cid:34) (cid:18) π (cid:19)ψ1 (cid:18) Y (cid:19)ψ2 (cid:35)1−ρR R∗ = rπ t t RρR , t ∗ π γY t−1 ∗ t−1 where r is the steady-state real interest rate and π is the target-inflation rate. (cid:15) is a ∗ R,t monetary policy shock. Provided that the ZLB is not binding, the central bank reacts to deviations of inflation from the target rate π and deviations of output growth from its ∗

This Version: February 10, 2020 9 long-run value γ. The government consumes a stochastic fraction of aggregate output. We assume that government spending evolves according to (cid:18) (cid:19) 1 G = 1− Y (6) t t g t where g is an exogenous process. The government levies a lump-sum tax T (or provides a t t subsidy if T is negative) to finance any shortfalls in government revenues (or to rebate any t surplus). Its budget constraint is given by P G +M +R B = T +M +B . (7) t t t−1 t−1 t−1 t t t Exogenous shocks. The model economy is perturbed by four exogenous processes. Aggregate productivity evolves according to logA = logγ +logA +logz , where logz = ρ logz +σ (cid:15) . (8) t t−1 t t z t−1 z z,t Thus, on average, the economy grows at the rate γ, and z generates exogenous stationary t fluctuations of the technology growth rate around this long-run trend. We assume that the government spending shock follows the AR(1) law of motion logg = (1−ρ )logg +ρ logg +σ (cid:15) . (9) t g ∗ g t−1 g g,t The shock to the discount factor evolves according to logd = ρ logd +σ (cid:15) (10) t d t−1 d d,t The monetary policy shock (cid:15) is assumed to be serially uncorrelated. We stack the four R,t innovations into the vector (cid:15) = [(cid:15) ,(cid:15) ,(cid:15) ,(cid:15) ](cid:48) and assume that (cid:15) ∼ iidN(0,I). t z,t g,t d,t R,t t 2.2 Equilibrium Conditions Because the exogenous productivity process has a stochastic trend, it is convenient to characterize the equilibrium conditions of the model economy in terms of detrended consumption c ≡ C /A and detrended output y ≡ Y /A . t t t t t t

This Version: February 10, 2020 10 It is well known that the New Keynesian model features multiple equilibria; see Aruoba and Schorfheide (2016). In the remainder of the paper we focus on what the literature has called the targed-inflation equilibrium. This is essentially the equilibrium that arises in linearizedDSGEmodels, adjustedforthepresenceoftheZLBconstraint. Thecorresponding steady state is given by (cid:18) (cid:19) 1 γ 1−ν τ+1/η y π , r = , R = r π , y = gτ , c = ∗ . ∗ ∗ β ∗ ∗ ∗ ∗ χ ∗ ∗ g H ∗ Without loss of generality, for any variable x we can define the percentage deviations from t the steady state as xˆ = lnx −lnx . Using this notation we can substitute x by x exˆt.2 Our t t ∗ t ∗ goal is to write the equilibrium conditions as a system of expectational difference equations of the form E (cid:2) R(yˆ,cˆ,πˆ ,R ˆ ,yˆ ,cˆ ,πˆ ,R ˆ ,...) (cid:3) = 0, (11) t t t t t t+1 t+1 t+1 t+1 where R(·) captures residuals in the equilibrium conditions. The residual function comprises of the following elements. The consumption Euler equation leads to ˆ ˆ ˆ R (·) = d −d −τ(cˆ −cˆ)+R −πˆ −zˆ . (12) c t+1 t t+1 t t t+1 t+1 In a symmetric equilibrium, in which all firms set the same price P (j), the price-setting t decision of the firms leads to (cid:20) (cid:18) (cid:18) (cid:19) (cid:19) (cid:20)(cid:18) (cid:19) (cid:21) 1 1 1 1 1 (cid:0) (cid:1) R (·) = ln + 1− eτcˆt+yˆt/η −φπ2 eπˆt −1 1− eπˆt + (13) π ν ν ν ∗ 2ν 2ν (cid:21) (cid:16) (cid:17) +φβπ2 edˆ t+1−dˆ t (cid:0) e−τ(cˆt+1−cˆt) (cid:1)(cid:0) eyˆt+1−yˆt (cid:1)(cid:0) eπˆt+1 −1 (cid:1) eπˆt+1 . ∗ The aggregate resource constraint leads to (cid:20) (cid:21) 1 φ R (·) = yˆ −cˆ +ln − g (cid:0) π eπˆt −π¯ (cid:1)2 . (14) y t t egˆt 2 ∗ ∗ It reflects both government spending as well as the resource cost (in terms of output) caused by price changes. The monetary policy rule generates the residual function ˆ (cid:8) (cid:2) (cid:3) ˆ (cid:9) R (·) = R −max (1−ρ ) ψ πˆ +ψ (yˆ −yˆ +zˆ) +ρ R +σ (cid:15) , −log(rπ ) . (15) R t R 1 t 2 t t−1 t R t−1 R R,t ∗ 2Introducingxˆ doesnotimplythatwearelog-linearizingalloftheequilibriumconditions. Itisforemost t areparameterization. However,inourmodelithappenstobethecasethattheconsumptionEulerequation and (abstracting from the max operator) the monetary policy rule are log-linear.

This Version: February 10, 2020 11 We do not use a measure of money in the subsequent analysis and therefore drop the equilibrium condition that determines money demand. We stack the residual functions for the exogenous shocks as follows:   zˆ −ρ zˆ −σ (cid:15) t z t−1 z z,t  ˆ ˆ  d −ρ d −σ (cid:15)  t d t−1 d d,t  R (·) =  . (16) exo  gˆ −ρ gˆ −σ (cid:15)   t g t−1 g g,t  e −σ (cid:15) R,t R R,t 3 PLC Decision Rules and the Canonical Form Throughout this paper, we consider solutions for models with a single occasionally binding constraint. In this section, we show how we construct our piecewise-linear decision rules and how we impose continuity at the kink, where the constraint changes from being slack to being binding. In Sections 3.1 and 3.2, we do so using a generic notation that can be used with any model with an occasionally binding constraint. We provide an illustration in the context of the New Keynesian DSGE model in Section 3.3. Finally, in Section 3.4 we construct what we call the canonical form of the model, which will make the subsequent discussion about filtering and estimation easier. 3.1 PLC Decision Rules Let X = [x ,X(cid:48)] ∈ X be an n×1 vector of state variables. Here x is one particular element 1 2 1 of X and we assume that the vector of the remaining state variables, X , also contains a 2 constant. Let y denote a k × 1 vector of control variables. We assume that there is a (cid:0) (cid:1) linear(ized) function h x ,X ,y that determines whether the constraint is binding: 1 2 (cid:40) > 0 if constraint is non-binding (n) (cid:0) (cid:1) h x ,X ,y = . (17) 1 2 ≤ 0 if constraint is binding (b) The h(.) function may depend on the state variables (x ,X ) and some of the elements in y. 1 2 Because the function is assumed to be linear, we write it as h(x ,X ,y) = γ x +γ(cid:48)X +γ(cid:48)y. (18) 1 2 1 1 2 2 y

This Version: February 10, 2020 12 The γ’s are not free coefficients. They are obtained from the equilibrium conditions of the DSGE model. We define the kink function x = (cid:96)(X ) such that [(cid:96)(X ),X(cid:48)](cid:48) ∈ X 1 2 2 2 (cid:0) (cid:1) characterizesthelocusofpointsinthestatespaceforwhichh (cid:96)(X ),X ,y((cid:96)(X ),X ) = 0— 2 2 2 2 that is, the constraint is just binding. The linearity of h(·) in (18) and the assumed piecewise-linearity of the decision rules for y imply that (cid:96)(X ) is a linear function and we parameterize it as 2 (cid:96)(X ) = δ(cid:48)X . (19) 2 2 Here δ is a (n − 1)-dimensional vector. The δ coefficients are currently free and will be determined as functions of the decision rule coefficients and the coefficients of the constraint function h(.) as we explain below. Note that so far we have not yet made a determination whether the constraint is slack if x < δ(cid:48)X . 1 2 Returning to the control variables, we assume the decision rules for each yi are of the piecewise-linear form (cid:40) αi x +αi (cid:48) X if x ≥ (cid:96)(X ) yi(x ,X ) = 1,1 1 1,2 2 1 2 i = 1,...,k , (20) 1 2 αi x +αi (cid:48) X if x < (cid:96)(X ) 2,1 1 2,2 2 1 2 where each decision rule has 2n unknown coefficients. The decision rules are exactly linear if αi = αi and αi = αi . The specification in (20) makes the benefit of using the kink 1,1 2,1 1,2 2,2 function (cid:96)(.) clear: given the state variables x and X , we can easily determine which side 1 2 of the constraint we need to be, even when the constraint contains some control variables. 3.2 Imposing Continuity on Piecewise-Linear Decision Rules We now turn to imposing continuity on the decision rules at the kink, which means we impose the restriction that the two parts of each decision rule are equal to each other along the kink. Doing so will restrict a subset of the unknown α and δ coefficients. More specifically, continuity at x = δ(cid:48)X requires that for each i = 1,...,k 1 2 αi δ(cid:48)X +αi (cid:48) X = αi δ(cid:48)X +αi (cid:48) X ∀X , 1,1 2 1,2 2 2,1 2 2,2 2 2

This Version: February 10, 2020 13 which generates (n−1) restrictions for each i: αi δ(cid:48) +αi (cid:48) = αi δ(cid:48) +αi (cid:48) . (21) 1,1 1,2 2,1 2,2 Next, we impose restrictions that make the (cid:96)(.) and y(.) functions consistent with the (cid:0) (cid:1) constraint in (17). The condition h[g(X ),X ,y (cid:96)(X ),X ] = 0, which represents the kink 2 2 2 2 in terms of the h(.) function, can be written as k γ δ(cid:48)X +γ(cid:48)X + (cid:88) γi(αi δ(cid:48)X +αi (cid:48) X ) = 0 ∀X , 1 2 2 2 y 1,1 2 1,2 2 2 i=1 which leads to another set of (n−1) restrictions: k γ δ(cid:48) +γ(cid:48) + (cid:88) γi(αi δ(cid:48) +αi (cid:48) ) = 0. (22) 1 2 y 1,1 1,2 i=1 Counting all unknowns and restrictions, we have k(n+1) degrees of freedom.3 Let us assume thecoefficientsαi , αi , andαi foreachdecisionrulearefreeandcollecttheminthevector 1,1 1,2 2,1 ϑ of size k(n+1) ϑ = [α1 ,...,αk ,α1 (cid:48) ,...,αk (cid:48) ,α1 ,...,αk ](cid:48). (23) 1,1 1,1 1,2 1,2 2,1 2,1 In other words, we treat all the decision rule coefficients for the “1” regime and the coefficient in front of x in the “2” regime as free. The remaining decision rule coefficients in the “2” 1 regime, αi , i = 1,...,k, as well as all of the δ coefficients are determined as functions of 2,2 these free coefficients, which we now turn to. Conditional on ϑ, we can rewrite (22) as (cid:32) (cid:33) (cid:32) (cid:33) k k (cid:88) (cid:88) γ + γiαi δ(cid:48)(ϑ)+ γ + γiαi = 0 1 y 1,1 2 y 1,2 i=1 i=1 (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) a(ϑ) −B(cid:48)(ϑ) and solve for δ as 1 δ(cid:48)(ϑ) = B(cid:48)(ϑ), (24) a(ϑ) 3There are 2n α coefficients for each decision rule and (n−1) δ coefficients, which yield 2nk+n−1 unknowns. With (n−1) restrictions for each decision rule as derived in (21) and the (n−1) restrictions in (22), we get (k + 1)(n − 1) restrictions. Subtracting the number of restrictions from the number of unknowns, we get k(n+1).

This Version: February 10, 2020 14 where a(ϑ) is a scalar and b(ϑ) is a (n−1)-dimensional vector. By combining (24) with (21) we obtain an expression for the constrained decision rule coefficients αi : 2,2 (cid:18) (cid:19) 1 αi (cid:48) (ϑ) = (αi −αi ) B(ϑ) +αi (cid:48) . (25) 2,2 1,1 2,1 a(ϑ) 1,2 The last step is to determine which part of the decision rule in (20) corresponds to the part of the state space where the constraint is slack and which part where the constraint is (cid:0) (cid:1) binding. Take h x ,X ,y(x ,X ) for some x and X . Let us derive how its sign depends 1 2 1 2 1 2 on the sign of (x −δ(ϑ)(cid:48)X ). First, assume x > δ(cid:48)(ϑ)X , then 1 2 1 2 k h (cid:2) x ,X ,y(x ,X ) (cid:3) = γ x +γ(cid:48)X + (cid:88) γi(αi x +αi (cid:48) X ) (26) 1 2 1 2 1 1 2 2 y 1,1 1 1,2 2 i=1 (cid:32) (cid:33) (cid:32) (cid:33) k k = γ + (cid:88) γiαi x + γ(cid:48) + (cid:88) γiαi (cid:48) X , 1 y 1,1 1 2 y 1,2 2 i=1 i=1 (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) c(ϑ) D(cid:48)(ϑ) where c(ϑ) is a scalar and D(ϑ)(cid:48) is a (n − 1)-dimensional vector, which can be evaluated given model parameters and free decision-rule coefficients. We can rewrite (22) as (cid:34) (cid:35) (cid:34) (cid:35) k k γ + (cid:88) γiαi δ(cid:48) + γ(cid:48) + (cid:88) γiαi (cid:48) = 0, (27) 1 y 1,1 2 y 1,2 i=1 i=1 or simply c(ϑ)δ(cid:48)(ϑ)+D(cid:48)(ϑ) = 0, which implies D(cid:48)(ϑ) = −c(ϑ)δ(cid:48)(ϑ). Using this on (26), we get h[x ,X ,y(x ,X )] = c(ϑ)[x −δ(cid:48)(ϑ)X ]. (28) 1 2 1 2 1 2 Since we assumed x > δ(cid:48)(ϑ)X above in the derivations, we conclude that h(.) > 0 if and 1 2 only if c(ϑ) > 0. In other words, if c(ϑ) > 0, then the “1” regime in (20) corresponds to the constraint being slack. 3.3 Example: New Keynesian Model In this section we adapt the generic notation in the previous sections to the New Keynesian ˆ model with the ZLB outlined in Section 2. We partition the state space as x = R 1,t t−1

This Version: February 10, 2020 15 and X = [1,yˆ ,zˆ,d ˆ ,gˆ,e ](cid:48), and we have n = 7.4 As for the choice of control vari- 2,t t−1 t t t R,t ables to approximate, we have a few options. We approximate the decision rules πˆ(x ,X ) 1 2 ˆ and yˆ(x ,X ) directly and let the remaining control variables cˆ and R follow exactly from 1 2 the equilibrium conditions. Specifically, given x , X , yˆ = yˆ(x ,X ) and πˆ = πˆ(x ,X ), 1 2 t 1 2 t 1 2 ˆ cˆ(x ,X ) follows from solving for cˆ in R (·) in (14) and R(x ,X ) follows from solving for 1 2 t y 1 2 ˆ (cid:2) (cid:3)(cid:48) R in R (·) in (15). Thus, with a slight abuse of notation, we set y(·) = yˆ(·),πˆ(·) and t R k = 2. The constraint in this problem is the ZLB constraint which can be written in terms of ˆ the variables defined so far as R +log(r π ) ≥ 0, which leads to the h(.) function t ∗ ∗ (cid:0) (cid:1) (cid:2) (cid:3) ˆ h x ,X ,y (·) = (1−ρ) ψ πˆ(·)+ψ (yˆ(·)−yˆ +zˆ) +ρR +e +log(r π ). (29) 1,t 2,t t 1 2 t−1 t t−1 R,t ∗ ∗ Thus, the γ coefficients in (18) are γ = ρ, γ(cid:48) = [log(r π ), −(1−ρ)ψ , (1−ρ)ψ , 0, 0], γ(cid:48) = [(1−ρ)ψ , (1−ρ)ψ ], 1 2 ∗ ∗ 2 2 y 1 2 and we can write c(ϑ) in (26) as c(ϑ) = ρ+(1−ρ)ψ απ +(1−ρ)ψ αy . 1 1,1 2 1,1 If απ and αy are both positive, then c(ϑ) is also positive because the remaining struc- 1,1 1,1 tural parameters are positive under standard parameterizations. Thus, we label the “1” regime as the regime where the ZLB is slack, ’n’ (non-binding), and the “2” regime as the ’b’ (binding) regime. The decision rules in the regime when the ZLB is slack are not identical but closely resemble the linear decision rules of the model linearized around the steady state. ˆ These decision rules have the property that the coefficients on R in the π and y decision t−1 rules are positive. In our implementation, we check that c(ϑ) is indeed positive every time we solve the model. 4Inprinciple,oneoftheotherstatevariablescouldhavebeenchosenasx . However,wefounditnatural 1 to use the lagged interest rates, because all else being equal, higher lagged interest rates move the economy away from the ZLB constraint.

This Version: February 10, 2020 16 3.4 Canonical Form The final step in preparing the model solution for filtering is to cast the solution in terms of the following canonical form: (cid:40) Φ (n)+Φ (n)s +Φ (n)η if η < ζ(s ) 0 1 t−1 η t 1,t t−1 s = (30) t Φ (b)+Φ (b)s +Φ (b)η otherwise, 0 1 t−1 η t which is a vector autoregressive representation for a vector of model variables s . It will serve t as a transition equation in a state-space model. Thus, the vector s needs to include all varit ables (whether or not they are directly approximated) that are necessary for the construction of the measurement equations and all variables necessary to determine the transition of such variables, which are all the state variables. The canonical form resembles a regime-switching VAR with a “binding” (b) and “non-binding” (n) regime. However, the regime transition is not determined by an exogenous Markov process. Instead, it is determined by the realization of the shock innovations. The vector η is a linear transformation of the structural shock t innovations of the DSGE model. Because the definition of s is model and application specific, we outline the construction t ofthecanonicalforminthecontextoftheNewKeynesianDSGEmodelwithZLBconstraint. Define the vectors s t (cid:2) ˆ ˆ (cid:3)(cid:48) s = yˆ,πˆ ,R ,zˆ,d ,gˆ,e (31) t t t t t t t R,t (cid:2) (cid:3)(cid:48) and recall that (cid:15) = (cid:15) ,(cid:15) ,(cid:15) ,(cid:15) . We will begin by expressing the law of motion of s t z,t d,t g,t R,t t as a function of the innovations (cid:15) and then, later on, we transform the (cid:15) ’s into η’s: t t s = Φ (·)+Φ (·)s +Φ (·)(cid:15) . t 0 1 t−1 (cid:15) t Rather than providing detailed algebraic expressions for the elements of the Φ(·) matrices, we will provide an outline of how the expressions can be derived. Output, inflation, and interest rates. We use the first three rows of the Φ(·) matrices ˆ to represent the decision rules for yˆ and πˆ and the monetary policy rule that determines R . t t t Notethatthedecisionrulesin(20)areexpressedintermsofX = [R ˆ ,1,yˆ ,zˆ,d ˆ ,gˆ,e ](cid:48), t t−1 t−1 t t t R,t whereas the canonical form is written in terms of s —see (31). Thus, in order to generate t the equations for yˆ, πˆ , and R ˆ for the canonical form, we have to express X as a linear t t t t function of s and (cid:15) . t−1 t

This Version: February 10, 2020 17 Exogenous shocks. The remaining four rows of the Φ(·) matrices reproduce the law of motion of the exogenous shock processes in (16). From (cid:15) ’s to η ’s and defining the threshold condition. To express the threshold t t condition in the canonical form and transform the (cid:15) into η innovations, define t t 1 ζ(s ) = log(r π )+φ (n)+φ(cid:48)(n)s , η = − φ(cid:48)(n)(cid:15) t−1 ∗ ∗ 0 1 t−1 1,t (cid:107)φ (n)(cid:107) (cid:15) t (cid:15) such that the ZLB constraint is non-binding if and only if η < ζ(s ) 1,t t−1 as in (30). Let Null(x) be an orthogonal basis for the null space for the vector x. We define the vector η as t (cid:34) (cid:35) φ(cid:48)(n)/(cid:107)φ (n)(cid:107) η = (cid:15) (cid:15) (cid:15) . t (cid:0) (cid:1)(cid:48) t Null φ (n)/(cid:107)φ (n)(cid:107) (cid:15) (cid:15) The transformation has the property that, if E[(cid:15) (cid:15)(cid:48)] = I, then E[η η(cid:48)] = I as well. The t t t t definition of η as function of (cid:15) allows us to convert the Φ (·) matrix into Φ (·) and completes t t (cid:15) η the derivation of the canonical form. 3.5 Measurement Equations ThekeyrequirementfortheconditionallyoptimalparticlefilterthatisdevelopedinSection5 is that the conditional mean function (given s ) of the observables is piecewise-linear. t−1 This is guaranteed if the state-transition equation has the canonical form (30) and the measurement equation is linear in s as in t yo = A +A s +u , u ∼ N(0,ςΣ ), (32) t 0 s t t t u where yo is the vector of observable variables, u is a vector of measurement errors, and the t t constant ς allows us to scale the measurement error covariance. The small-scale New Keynesian DSGE model is typically estimated using output growth yo , inflation πo, and interest rates Ro. In addition, we will include a measure of the gr,t t t consumption-output ratio. Starting from the definition of s given in (31), we define the t augmented vector s˜ = [s(cid:48),yˆ ](cid:48) and add the trivial equation yˆ = yˆ to the canonical t t t−1 t−1 t−1

This Version: February 10, 2020 18 form in (30). Because the yˆ identity is linear, the structure of the canonical form is pret−1 served. Assuming that output growth is measured in quarter-on-quarter percentages, and inflation and interest rates are measured in annualized percentages, the system of measurement equations is yo = 100log(γ)+100(yˆ −yˆ +zˆ)+σ u gr,t t t−1 t u,y y,t πo = 400log(π )+400πˆ +σ u (33) t ∗ t u,π π,t Ro = 400log(R )+400R ˆ +σ u . t ∗ t u,R R,t We use data on government spending G to construct a measure of the consumption output t ratio: C /Y = 1−G /Y . We define cyo as linearly detrended 100·log(1−G /Y ). Because t t t t t t t in our model 1−G /Y = 1/g , we obtain the additional measurement equation t t t cyo = −100logg −100gˆ +σ u . (34) t ∗ t u,c c,t Thus, we are treating the exogenous process gˆ as observed in the estimation. Because the t law of motion of gˆ is linear, the PLC structure of the empirical model is maintained. t 4 PLC Model Solution In this section, we describe how the free coefficients ϑ in the PLC decision rules, as defined in (23), are determined. In a generic global approximation, the decision rules are assumed to be linear combinations of some known, flexible functions called basis functions, which are in turn nonlinear functions of states. The decision rules are parameterized by some coefficients that are the weights attached to each of the basis functions. There are two popular ways to choose these coefficients. In the collocation approach, these coefficients are chosen such that the residuals, the errors in equilibrium conditions, are set to exactly zero at some carefully selectedpointsinthestatespace. Thesepointstypicallycomefromagridthatisconstructed using a tensor product of grids for each state variable, which in turn are constructed using the roots of a set of complete polynomials. It is well known that tensor product grids used to approximate the solution of nonlinear models suffer from the curse of dimensionality. Maliar and Maliar (2014, 2015) propose a series of techniques based on stochastic simulations to construct lower dimensional grids that represent the ergodic distribution of the model. However, these simulation-based methods

This Version: February 10, 2020 19 require a time-consuming iterative procedure, and, in general, there does not seem to be a guarantee for the convergence of the grid and the approximate solution. For our application, where we need to solve the model with different parameters tens of thousands of times, neither the collocation approach that uses tensor grids, nor the iterative approach that uses the ergodic distribution seem feasible. Coleman et al. (2018) propose the use of random and quasi-random grids on a fixed hypercube, because they are easier and fastertoconstructbutlackthedimensionalityreduction. Smolyakgrids(KruegerandKubler (2004), Malin et al. (2011), Judd et al. (2014)) offer a balance in this trade-off, combining the advantages of a fixed and predetermined domain and the dimensionality reduction of sparse grid methods. We follow this approach and build a sparse Smolyak grid and minimize the sum of squared residuals over this grid to find the decision rule coefficients.5 Equilibrium Conditions. More formally, let us denote the generic equilibrium conditions as H[f (·),X] = 0, ∀ X ∈ X, (35) 0 where f (X) corresponds to the optimal decision rules. To simplify the notation, we dropped 0 the vector of DSGE model parameters θ from the conditioning set. For instance, for the New Keynesian DSGE model (35) becomes  (cid:16) (cid:17)  R yˆ (X ),cˆ (X ),πˆ (X ),R ˆ (X ),yˆ (X ),cˆ (X ),πˆ (X ),R ˆ (X ),... c 0 t 0 t 0 t 0 t 0 t+1 0 t 0 t+1 0 t+1 E t (cid:16) (cid:17)  = 0, R yˆ (X ),cˆ (X ),πˆ (X ),R ˆ (X ),yˆ (X ),cˆ (X ),πˆ (X ),R ˆ (X ),... π 0 t 0 t 0 t 0 t 0 t+1 0 t 0 t+1 0 t+1 ˆ whereweexplainedhowweconstructthedecisionrulesyˆ(.),cˆ(.),πˆ(.),andR(.)inSection3.3. Expectations over X can be evaluated by using the law of motion of the exogenous shocks t+1 in (16) and noting the the first three elements of X , R ˆ , 1, and yˆ, are known at time t. t+1 t t Thus, the equilibrium conditions only depend on the two decision rules y (.) and π (.) and 0 0 the current states X , just like (35) requires. t We approximate f (X) by PLC decision rules g(X;ϑ) ∈ G, where ϑ contains the free 0 coefficients that are necessary to characterize the PLC function and G is the set of all PLC functions. Todetermineϑ, weminimizethenormofthevector-valuedfunctionH[g(X;ϑ),X] 5One interpretation of our approach is that we are using the sum squared residual over the Smolyak grid as a proxy for integrating the squared residual function over the ergodic distribution. Monte Carlo experiments in Heiss and Winschel (2008) in the context of the calculation of the likelihood function of a mixed logit model, which also involves evaluating an integral without a closed-form expression, show that using a Smolyak grid provides superior performance over simulation techniques.

This Version: February 10, 2020 20 over a set of M grid points S obtained using a sparse Smolyak grid: 1 (cid:88) ϑ = argmin (cid:107)H[g(X;ϑ),X;θ](cid:107)2. ϑ M X∈S Solution Grid. In constructing the grid S, we follow Judd et al. (2014). The Smolyak grid is a sparse grid defined on the interval [−1,1]. To use it in an application, it has to be scaled so that it represents the space of X . The scaling of the grid amounts to picking minimum t and maximum values for each state variable. The extrema correspond to −1 and 1 in the original domain of the Smolyak grid, respectively. One of the properties of the Smolyak grid is it places grid points at the edges of the domain – at −1 and 1. Thus, we recommend picking values for the scaling that are not too extreme in order to have some mass on both sides of the grid point. In the context of the New Keynesian DSGE model, we proceed as follows. For the exogenous state variables in X , we linearly scale the grid so that it starts from the 10th t percentile to the 90th percentile of the distribution of each state variable. For the endogenous state variable yˆ , we use the same scaling as the exogenous state zˆ, since we have verified t−1 t ˆ that they have similar dispersion when simulating the model. Finally, for R , we use the t−1 observed nominal interest rate data. Because we want to analyze the ZLB, the grid is scaled ˆ so that it’s minimum value matches the ZLB with R = −log(r π ), which happens to t−1 ∗ ∗ be the 10th percentile. The maximum value is matched to the 90th percentile of R in the t data. For this variable in particular, we scale the grid so that the middle of the Smolyak ˆ grid coincides with the steady state at R = 0. t−1 Integration and Minimization. Expectations in the residual functions as in (11) are computed using the monomial integration rule M2 as in Judd et al. (2010). For a generic expression E [v(x )], our implementation with four random variables requires computing v(.) t t+1 at 33 nodes and taking a weighted average. In our experience, this method produces results that are very similar to using a Gauss-Hermite integration for each random variable. As an example, with 5 nodes per random variable, the latter approach would make it necessary to evaluate v(.) at 625 nodes and increase running time considerably. To minimize the objective function, we utilize a gradient-based nonlinear solver with Jacobians evaluated analytically. As an initial guess for the solver, we use the log-linear approximation. Because the log-linear decision rules are a special case of the PLC decision rules—recall that we defined the model variables in log-deviations from the steady state—we

This Version: February 10, 2020 21 candenotethembyg(0)(X). Wefindthefreecoefficientsϑ (withαi = αi andαi = αi for 0 11 21 12 22 alli = 1,...,k)thatgeneratethesamedecisionrulesandusethistoinitializetheminimization algorithm. Interpretation of PLC Decision Rules. WeoffertwointerpretationsofthePLCdecision rules. First, they can be viewed as an approximation to the optimal decision rules f (X). 0 In fact, our motivation for constructing PLC rules was that the decision rules computed in Aruoba et al. (2018) with Chebyshev polynomials for a New Keynesian DSGE model that is essentially identical to the model in Section 2, appeared to be almost piecewise-linear. While in any given model, the PLC decision rules may or may not provide accurate approximations of the optimal decision rules, there is no sense in which the PLC rules become more accurate “asymptotically.” Second, the PLC rules can be viewed as describing the behavior of boundedly rational agents. In principle, bounded rationality can take many forms. The basic notion is that decision making is constrained by agents’ abilities to gather, retain, and process decisionrelevant information. Boundedly rational agents may also be unable to solve a complicated mathematical problem. Under this interpretation, the PLC rules can be viewed as more easily computable decisions that have the additional benefit of being linear, except when the constraint in the model becomes binding. 5 Particle Filters for PLC Models Econometricinferenceisbasedonthestate-spacerepresentationthatcomprisesthenonlinear transitionequation(30)andthelinearmeasurementequation(32). Inthefollowingformulas, we write y instead of yo and represent the transition and measurement equations through t t the densities p(s |s ,θ) and p(y |s ,θ), respectively. We will subsequently use Y and t t−1 t t t1:t2 S to denote the sequences y ,...,y and s ,...,s . Moreover, we use θ to denote the t1:t2 t1 t2 t1 t2 vector of model parameters. The state-space model provides a joint density for the states s and the observations y : t t T (cid:89) p(Y ,S |θ) = p(y |s ,θ)p(s |s ,θ). (36) 1:T 1:T t t t t−1 t=1

This Version: February 10, 2020 22 Of particular interest are the sequence of estimates p(s |Y ) of the state vector and the t 1:t likelihood function, which is defined as T T (cid:90) (cid:90) (cid:89) (cid:89) p(Y |θ) = p(y |Y ,θ) = p(y |s ,θ)p(s |s ,θ)p(s |Y ,θ)ds ds . (37) 1:T t 1:t−1 t t t t−1 t−1 1:t−1 t t−1 t=1 t=1 These objects can be obtained from a nonlinear filter. We will use an algorithm that belongs to the class of sequential Monte Carlo filters to approximate p(s |Y ,θ) and p(y |Y ,θ), t 1:t t 1:t−1 also known as particle filters. In Section 5.1, we present a generic particle filtering algorithm that can be configured in many different ways by choosing a proposal density that mutates particles representing the state s into particles representing s . If the mutation is based on forward simulation of t−1 t the state-transition equation, one obtains the bootstrap particle filter (BSPF) described in Section 5.2. Our key contribution is to derive the proposal density that is tailored toward the piecewise-linear canonical form of the state-space model and leads to the conditionally optimal particle filter discussed in Section 5.3. 5.1 Generic Particle Filter There exists a large literature on particle filters. Surveys and tutorials can be found, for instance,inArulampalametal.(2002),Capp´eetal.(2007),DoucetandJohansen(2011),and Creal(2012). Kantasetal.(2014)discussusingparticlefiltersinthecontextofestimatingthe parameters of state-space models. A particle filter represents the density p(s |Y ,θ) through t 1:t a swarm of particles {sj,Wj}M with the property that posterior expectations E[h(s )|Y ,θ] t t j=1 t 1:t can be approximated by Monte Carlo averages of the form M 1 (cid:88) h(sj)Wj. M t t j=1 The approximation typically holds in form of a Law of Large Numbers or a Central Limit Theorem. Textbook treatments of the statistical theory underlying particle filters can be found in Capp´e et al. (2005), Liu (2001), and Del Moral (2013). By following the exposition in Herbst and Schorfheide (2015), the particle filter can be implemented using the following algorithm:

This Version: February 10, 2020 23 Algorithm 1 (Generic Particle Filter) 1. Initialization. Draw the initial particles from the distribution sj i ∼ id p(s |θ) and set 0 0 Wj = 1, j = 1,...,M. 0 2. Recursion. For t = 1,...,T: (a) Forecasting s . Draw s˜j from density g (s˜|sj ,θ) and define the importance t t t t t−1 weights p(s˜j|sj ,θ) ωj = t t−1 . (38) t g (s˜j|sj ,θ) t t t−1 (b) Forecasting y . Define the incremental weights t w˜j = p(y |s˜j,θ)ωj. (39) t t t t The predictive density p(y |Y ,θ) can be approximated by t 1:t−1 M 1 (cid:88) pˆ(y |Y ,θ) = w˜jWj . (40) t 1:t−1 M t t−1 j=1 (c) Define the normalized weights w˜jWj W ˜ j = t t−1 . (41) t 1 (cid:80)M w˜jWj M j=1 t t−1 (d) Selection. Resample the particles, for instance, via multinomial resampling. Let {sj}M denote M iid draws from a multinomial distribution characterized by supt j=1 port points and weights {s˜j,W ˜ j} and set Wj = 1 for j =,1...,M. An approxit t t mation of E[h(s )|Y ,θ] is given by t 1:t M 1 (cid:88) h ¯ = h(sj)Wj. (42) t,M M t t j=1 3. Likelihood Approximation. The approximation of the log-likelihood function is given by (cid:32) (cid:33) T M (cid:88) 1 (cid:88) lnpˆ(Y |θ) = ln w˜jWj . (43) 1:T M t t−1 t=1 j=1

This Version: February 10, 2020 24 The most important choice in the configuration of the algorithm is the proposal density g (s˜|sj ,θ). Differentchoicesoftheproposaldensityleadtodifferentversionsoftheparticle t t t−1 filter. 5.2 Bootstrap Particle Filter The BSPF was originally proposed by Gordon and Salmond (1993). It uses the statetransition equation as the proposal density, that is, g (s˜|sj ,θ) = p(s˜|sj ,θ). This choice t t t−1 t t−1 is attractive because it is straightforward to implement the forecasting step by forward simulation of the transition equation and the importance weights simplify to ωj = 1. A t well-known disadvantage is that the proposal distribution is blind and hence ignores information about s contained in the current observation y . This can lead to a large variance t t of the incremental weights ω˜j. This problem is exacerbated if the measurement error varit ance is small and p(y |s˜j) has thin tails or if the model is inappropriately parameterized or t t misspecified and therefore has difficulties predicting y one-step ahead. Because the BSPF t has been used in the DSGE model literature (see Ferna´ndez-Villaverde and Rubio-Ram´ırez (2007), An and Schorfheide (2007) and Herbst and Schorfheide (2015)), we will include it as a benchmark. 5.3 Conditionally Optimal Particle Filter The proposal density for the COPF utilizes information in y with the goal of minimizing t the variance of the incremental weights ω˜j. It is given by t g∗(s˜|sj ,θ) = p(s˜|y ,sj ,θ) ∝ p(y |s˜,θ)p(s˜|sj ,θ). (44) t t t−1 t t t−1 t t t t−1 Combining the formula for g∗(s˜|sj ,θ) with the expressions for the importance weights ωj t t t−1 t in (38) and the incremental weights ω˜j in (39), we obtain t p(y |s˜j,θ)p(s˜j|sj ,θ) ω˜j = t t t t−1 = p(y |sj ). (45) t p(s˜j|y ,sj ,θ) t t−1 t t t−1 The second equality follows from Bayes Theorem. It can be shown that conditional on {sj } the proposal density g∗(s˜|sj ,θ) minimizes the variance of the incremental weight t−1 t t t−1 w˜j in (39). t

This Version: February 10, 2020 25 While direct sampling from the conditionally optimal proposal density is elusive for most nonlinear state-space models, we can derive a convenient formula for the piecewiselinear state-transition equation (30). Note that, conditional on s , the current state s is t−1 t determined by η . It turns out, that it is more convenient to derive a conditionally optimal t proposal density for η , denoted by g∗(η |sj ,θ). t t t t−1 In order to state the result, we have to define the following objects: νj(·) = y −A −A (cid:0) Φ (·)−Φ (·)sj (cid:1) (46) t t 0 s 0 1 t−1 η¯j(·) = (cid:0) ςI +Φ(cid:48)(·)A(cid:48)Σ−1A Φ (·) (cid:1)−1 Φ(cid:48)(·)A(cid:48)Σ−1νj(·) t η s u s η η s u t Ω ¯ (·) = ς (cid:0) ςI +Φ(cid:48)(·)A(cid:48)Σ−1A Φ (·) (cid:1)−1 . η s u s η We use the argument (·) to indicate that the expressions are obtained either based on (cid:0) Φ (n),Φ (n),Φ (n) (cid:1) or (cid:0) Φ (b),Φ (b),Φ (b) (cid:1) . Here νj(·) is the error made in forecasting 0 1 η 0 1 η t y based on sj . η¯j(·), and Ω ¯ (·) are the posterior mean vector and covariance matrix of t t−1 t η |(y ,sj ) absent any truncation—that is, for ζ(sj ) being +∞ or −∞. Moreover, let t t t−1 t−1 Dj(n) = (2π)−ny/2|Σ |−1/2|ςI +Φ (n)(cid:48)A(cid:48)Σ−1A Φ (n)|1/2 (47) t u η s u s η (cid:26) (cid:27) 1 ×exp − νj(n)(cid:48)(ςΣ +A Φ (n)Φ(cid:48)(n)A(cid:48))−1νj(n) 2 t u s η η s t (cid:113) ×Φ (cid:0) (ζ(s )−η¯j (n)/ Ω ¯ (n) (cid:1) , N t−1 1,t 11 Dj(b) = (2π)−ny/2|Σ |−1/2|ςI +Φ (b)(cid:48)A(cid:48)Σ−1A Φ (b)|1/2 t u η s u s η (cid:26) (cid:27) 1 ×exp − νj(b)(cid:48)(ςΣ +A Φ (b)Φ(cid:48)(b)A(cid:48))−1νj(b) 2 t u s η η s t (cid:18) (cid:18) (cid:113) (cid:19)(cid:19) 1−Φ (ζ(s )−η¯j (b))/ Ω ¯ (b) . N t−1 1,t 11 It can be shown that p(y |sj ) = Dj(n)+Dj(b). t t−1 t t The characterization of the conditionally optimal proposal density is summarized in Proposition 1. A proof of the proposition is provided in the Online Appendix. Proposition 1 Suppose the state-transition equation is given by (30), η ∼ N(0,I), η is t 1,t a scalar, and the measurement equation is given by (32), then draws from the conditionally optimal proposal densities g∗(s˜|sj ,θ), j = 1,...,M, defined in (44) can be generated as t t t−1 follows:

This Version: February 10, 2020 26 1. Let (cid:40) ‘n’ with prob. λj Dj(n) ξj = t , where λj = t . t ‘b’ with prob. 1−λj t Dj(n)+Dj(b) t t t 2. If ξj = ‘n’ then generate η from the distribution t t ηj ∼ N (cid:0) η¯j (n),Ω ¯ (n) (cid:1)I{ηj ≤ ζ(sj )}, ηj |ηj ∼ N(η¯j (n,ηj ),Ω ¯ (n)) (48) 1,t 1,t 11 1,t t−1 2,t 1,t 2|1 1,t 2|1 and let s˜j = Φ (n)+Φ (n)sj +ηj. t 0 1 t−1 t If ξj = ‘b’ then generate ηj from the distribution t t ηj ∼ N (cid:0) η¯j(b),Ω ¯ (b) (cid:1)I{ηj > ζ(sj )}, ηj |ηj ∼ N (cid:0) η¯j (b,ηj ),Ω ¯ (b) (cid:1) (49) 1,t 1 11 1,t t−1 2,t 1,t 2|1 1,t 2|1 and s˜j = Φ (b)+Φ (b)sj +ηj. t 0 1 t−1 t 3. The incremental particle weight is ω˜j = D(n)+D(b). (cid:4) t Vanishing Measurement Errors. It is instructive to examine what happens as ς −→ ∞. For the conditional density of y |s to be nonsingular in the limit, it has to be the case t t−1 that the number of rotated structural innovations is at least n . Formally, the covariance y matrices A Φ (·)Φ(cid:48)(·)A(cid:48) have to be non-singular. For expositional purposes, we consider the s η η s case in which n = n and (A Φ (·)) are invertible n ×n matrices. This means, ignoring y η s η y y the truncation, under the invertibility assumption, we can solve for the innovations η as a t function of (y ,sj ): t t−1 ηj (·) = (A Φ (·))−1(y −A (Φ (·)+Φ (·)s )). t∗ s η t s 0 1 t−1 Now consider what happens if we let the measurement error variance converge to zero. First, the expressions in (46) remain well defined in the limit: η¯j −→ ηj (·), Ω ¯ (·) −→ 0. t t∗ The posterior variance converges to zero and the posterior mean converges to the innovation ηj (·) that generates the observed y conditional on sj . For the limit behavior on Dj(n), t∗ t t−1 t

This Version: February 10, 2020 27 the crucial term is (cid:40) (cid:113) 1 if ζ(sj )−η¯j (n) ≥ 0 lim Φ (cid:0) (ζ(sj )−η¯j (n))/ Ω ¯ (n) (cid:1) = t−1 1,t . N t−1 1,t 11 ς−→0 0 otherwise This term measures whether it is possible to explain y using the (n) coefficients, accounting t for the fact that the n regime is only active if η¯j (n) ≤ ζ(sj ). A similar analysis can 1,t t−1 be conducted for the term Dj(b). Thus, for each particle j, there are four possible cases t (ignoring equalities): Case 1 : η¯j (n) < ζ(sj ), η¯j (b) < ζ(sj ) 1,t t−1 1,t t−1 Case 2 : η¯j (n) ≥ ζ(sj ), η¯j (b) ≥ ζ(sj ) 1,t t−1 1,t t−1 Case 3 : η¯j (n) < ζ(sj ), η¯j (b) ≥ ζ(sj ) 1,t t−1 1,t t−1 Case 4 : η¯j (n) ≥ ζ(sj ), η¯j (b) < ζ(sj ). 1,t t−1 1,t t−1 In Case 1 Dj(b) = 0 and λj = 1. Here, only the (n) decision rules can rationalize the t t data conditional on sj . Case 2 is the opposite: Dj(n) = 0, λj = 0, and only the (b) t−1 t t decision rules can rationalize the data. Under Case (3), both Dj(n) and Dj(b) are strictly t t positive, 0 < λj < 1, and both decision rules could explain the data. Finally, in Case 4 t y is inconsistent with sj , and none of the decision rules can explain the data. If each t t−1 j = 1,...,M falls into Case 4, then the particle-filter based likelihood approximation for this particular parameterization of the DSGE model will be zero. Note that, if the measurement error variance is strictly greater than zero, (potentially very large) measurement errors could also rationalize the data under Case 4. The previous calculations highlight that, unlike for the BSPF, the weights of the COPF do not degenerate if one decreases the measurement error variance. In this case, if A Φ (·) is s η a square matrix, the COPF specializes to the inversion filter that solves for the innovations as a function of y and sj . Because our model is piecewise-linear, this inversion may have t t−1 one, two, or no solution(s). Perfectly Observed Regimes. Our ZLB application has the special feature that the observationy identifiestheregime. Lety = [y(cid:48) ,y ](cid:48), wherey correspondstothenominal t t 1,t 2,t 2,t interest rate. Suppose the ZLB is binding in the b regime and non-binding in the n regime. Then, the ex post regime probability λj is independent of sj and given by the indicator t−1 function. λj = I{y > c}. (50) 2,t

This Version: February 10, 2020 28 While the distribution of y is continuous in the n regime, for the binding regime the cont tinuous part of the distribution concentrates in the lower-dimensional subspace defined by y = c. Thus, the formulas for Dj(b) in (47) and the moments of the distribution of ηj in 2,t t t Proposition 1 in the b regime have to be adjusted to account for the reduced dimensionality of the continuous part of the y distribution. Further details are provided in the Online t Appendix. 6 Numerical Illustrations We now illustrate the proposed filtering method based on data simulated from the DSGE model of Section 2. In Section 6.1, we compare the accuracy of log-likelihood approximations obtainedwiththeproposedCOPFandanaivebootstrapparticlefilter(BSPF).InSection6.2 we embed the particle-filter approximations of the DSGE model likelihood function into an otherwise standard random walk Metropolis-Hastings (RWMH) algorithm and assess the improvements attainable with the COPF. 6.1 Accuracy of Likelihood Approximation We simulate a sample of 1,000 observations from the DSGE model loosely parameterized based on the empirical estimates reported in Section 7 and select a subsample of T = 140 observations, which is the sample size in the empirical application in Section 7. In order to increase the likelihood of hitting the ZLB in the simulation, we lower the target inflation rate π to an annualized rate of 0.5%. In the selected subsample, the ZLB binds 22% of ∗ the periods which is roughly consistent with our actual sample. The parameter values are summarized in the second column of Table 1. We begin by comparing the accuracy of the particle-filter-based likelihood approximation using the same parameter values that were used to generate the data. For the COPF, we set M = 250, and, for the BSPF, we consider M = 1,000 and M = 10,000. The average run times for a likelihood evaluation on a single core are 2.6, 1.8, and 18.3 seconds, respectively. While the data are generated without measurement errors, the likelihood evaluation imposes non-zero measurement errors to facilitate the application of the BSPF. The measurement errorcapturesthediscrepancybetweenmodel anddata. The smaller themeasurement errors are, the larger the penalty for deviations between model-predicted and actual observations.

This Version: February 10, 2020 29 Table 1: DGP and Prior Parameter DGP Prior Distribution Density P(1) P(2) HPD Low HPD High τ 2.00 G 2.00 0.20 1.67 2.32 κ 0.13 B 0.10 0.05 0.02 0.17 ψ 2.60 fixed at 2.60 1 ψ 0.98 fixed at 0.98 2 ρ 0.80 B 0.80 0.10 0.65 0.96 R ρ 0.97 B 0.80 0.10 0.65 0.96 g ρ 0.91 B 0.80 0.10 0.65 0.96 d ρ 0.37 B 0.40 0.20 0.08 0.73 z √ σ 0.0019 IG 0.005 4.00 0.003 0.010 R √ σ 0.0025 IG 0.005 4.00 0.003 0.010 g √ σ 0.017 IG 0.01 4.00 0.005 0.020 d √ σ 0.0058 IG 0.01 4.00 0.005 0.020 z η 0.72 fixed at 0.72 ν 0.10 fixed at 0.10 χ 1.00 fixed at 1.00 H g∗ 1.27 G 1.20 0.20 0.88 1.63 rAnet 0.22 G 1.00 0.40 0.39 1.64 gamQnet 0.33 N 0.50 0.25 0.09 0.91 piAnet 0.50 N 2.50 1.00 0.87 4.14 Notes: G is Gamma distribution; B is Beta distribution; IG is Inverse Gamma distribution; and N is Normal distribution. P(1) and P(2) are mean and standard deviations for Beta, Gamma, and Normal distributions. The IG distribution is parameterized as scaled inverse χ2 distribution with density √ p(σ2|s2,ν) ∝ (σ2)−ν/2−1exp[−νs2/(2σ2)], where P(1) is s2 and P(2) is ν. The density of σ is obtained √ by the change of variables σ = σ2. HPD(Low,High) refers to the boundaries of 90% highest prior density intervals. Weusethefollowingparametertransformations: β =exp{−rANet/400},γ =exp{gamQnet/100}, and π =exp{piAnet/400}. ∗ Thus, small measurement errors can be provided as a stress test for the filter in case the model (or its parameterization) is at odds with the data. The likelihood evaluation is conditional on a vector of initial states which we take to be the “true” initial states associated with the simulated observations. When we subsequently estimate the model, we treat the initial states as unknown parameters and include them into the vector θ. The prior distribution for the initial states captures the uncertainty about the level of the states in period t = 0. As discussed in detailed in Section 5, particle filters are stochastic algorithms. We run the COPF and the BSPF N = 100 times, respectively. Holding the parameter value θ run constant, each time we run the filter, we obtain a slightly different log-likelihood approx-

This Version: February 10, 2020 30 Figure 1: Density of Log-Likelihood Approximations Measurement Error ς = 0.2 Measurement Error ς = 0.1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 -520 -515 -510 -505 -500 -465 -460 -455 -450 -445 Notes: Density plots are based on N = 100 runs of the BSPF (M = 1,000 is red solid and M = 10,000 run is red dashed) and COPF (M =250, blue solid), respectively. imation logpˆ(Y|θ). From repeated runs of the filter, we construct kernel estimates of the likelihood, which are depicted in Figure 1. We consider two values for the scaling parameter for the measurement errors, ς: 0.2 and 0.1. Because the BSPF ignores the information in y when generating the proposal draws t s˜j, the variance of the particle weights increases as ς falls. This, in turn, translates into t an increase in the variance of the log-likelihood approximation, which is clearly visible by comparing the solid red densities across panels for the two different values of ς. The precision of the COPF, on the other hand, increases as ς falls. When ς −→ 0, it is possible to uniquely determine the innovations ηj conditional on sj during non-ZLB periods, because t t−1 the number of observables equals the number of shocks. In order to achieve similar accuracy to the COPF, the number of particles for the BSPF has to be increased to M = 10,000, which implies that the BSPF likelihood evaluation takes roughly seven times as long as the COPF likelihood evaluation. Figure 2 depicts standard deviations of log-likelihood approximations as a function of the mean log-likelihood value across N = 100 runs of the filter for ς = 0.1 and ς = 0.05. run Each dot (or asterisk) in the two scatter plots corresponds to a parameter value θi generated from the particle Markov chain Monte Carlo (PMCMC) algorithm in Section 6.2. Two importantfindingsemerge. First,aswehaveseenalreadyfromFigure1,theCOPFlikelihood approximation is less dispersed than the BSPF approximation. The accuracy gain from the

This Version: February 10, 2020 31 Figure 2: Comparison of Log-Likelihood Approximations ς = 0.1 ς = 0.05 10 10 COPF BSPF 8 8 )) )) i i |Y 6 |Y 6 (p (p COPF n n BSPF l(D 4 l(D 4 d d tS tS 2 2 0 0 -470 -460 -450 -440 -430 -440 -430 -420 -410 -400 i i Mean(ln p(Y| )) Mean(ln p(Y| )) Notes: Standarddeviationsoflog-likelihoodapproximationsarebasedonN =100runsofthetwofilters. run Each dot (or asterisk) corresponds to a particular θi. We are using M =1,000 particles for the BSPF (red asterisks) and M =250 particles for the COPF (blue dots). conditionallyoptimalproposaldensityincreasesasthemeasurementerrorvariancedecreases, because the BSPF performance deteriorates. Second, while the accuracy of the COPF is independent of the log-likelihood value associated with the posterior draw θi, the accuracy of the BSPF approximation deteriorates the further the θi draw is in the tails of the posterior distribution. 6.2 Particle MCMC Wenowembedtheparticlefilterlikelihoodapproximationsinastandardsingle-blockRWMH algorithm. A textbook treatment of this algorithm can be found in Herbst and Schorfheide (2015). The particle RWMH algorithm operates on an enlarged probability space that includes all the random variables that are generated when running the particle filter. Write the particle filter approximation of the likelihood function as pˆ(Y|θ) = g(Y|θ,U), where U represents the random variables associated with the particle filter implementation. The sampler operates on an enlarged probability space that includes U and generates draws

This Version: February 10, 2020 32 from the joint posterior distribution p(θ,U|Y) ∝ g(Y|θ,U)p(θ)p(U). Because particle filter approximations of likelihood functions are unbiased, we deduce that the marginal posterior of θ is identical to the posterior that obtains if the exact likelihood is available: (cid:90) (cid:90) p(θ|Y) = p(θ,U|Y)dU ∝ g(Y|θ,U)p(θ)p(U)dU = p(Y|θ)p(θ). A formal analysis of PMCMC algorithms is provided in Andrieu et al. (2010). Unfortunately, there is no free lunch. The use of an enlarged probability space leads to an increase in the persistence of the Markov chain generated by the posterior sampler. This in turn, leads to less precise Monte Carlo approximations of posterior moments. In a nutshell, the higher the variance of the likelihood approximation, the larger the persistence of the resulting Markov chain. If the variance is too high, sampling from the posterior distribution will fail. The use of an accurate particle filter such as the COPF can alleviate this problem. The RWMH algorithm requires a covariance matrix for the proposal distribution (we use a multivariate normal distribution) that is constructed as follows. We start from a loglinearized version of the DSGE model that ignores the ZLB constraint. For this version, the likelihood function can be evaluated with the Kalman filter (KF). We conduct a preliminary MCMC run based on a diagonal covariance matrix with scaled prior variances on the diagonal. Based on the output of the preliminary run, we compute the posterior variance ˆ¯ covariance matrix for the linearized model that ignores the ZLB constraint, denoted by V . θ A version of this matrix that is scaled by c is used for all subsequent calculations. We set the scale of the measurement error variance to ς = 0.1. Thus, according to the left panel of Figure 2, the standard deviation of the log-likelihood approximation of the COPF is around 1 whereas the standard deviation for the COPF ranges from 3 to 5. We keep the number of particles for the BSPF at M = 1,000 and reduce the number of particles to M = 150 for the COPF to lower the runtime. The scale factor for the proposal covariance matrix of the RWMH algorithm is set to c = 0.1. This leads to an overall run time of 7.7 hours for the COPF-RWMH algorithm on a single core with an acceptance rate of 30%.6 6Herbst and Schorfheide (2015) documented that for the estimation of small-scale linearized DSGE models,anacceptanceratebetween15%and30%isassociatedwiththemostaccurateMonteCarloapproximations of posterior means.

This Version: February 10, 2020 33 Figure 3: Posterior Draws: Density and Autocorrelation Posterior τ Posterior σ d COPF COPF 2 BSPF 100 BSPF 80 1.5 60 1 40 0.5 20 0 0 1.5 2 2.5 0.01 0.015 0.02 0.025 0.03 0.035 0.04 ACF of τ Draws ACF of σ Draws d 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 COPF COPF BSPF BSPF 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Autocorrelations, All Parameters 1 1 Lag 10 Lag 30 Lag 20 Lag 40 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.5 0.6 0.7 0.8 0.9 1 0.5 0.6 0.7 0.8 0.9 1 Notes: BSPF configuration: number of particles M = 1,000, ME scale ς = 0.1, proposal covariance scale c = 0.1, run time is 10:10:27 (hh:mm:ss). COPF configuration: number of particles M = 150, ME scale ς =0.1,proposalcovariancescalec=0.1,runtimeis7:39:20(hh:mm:ss). Toprow: kerneldensityestimates of posterior distributions based on MCMC output. Vertical lines indicate true parameter values. Center row: autocorrelation functions of posterior draws based on COPF and BSPF. Bottom row: scatter plots of autocorrelations BSPF vs. COPF for various lags. Solid line is 45 degree line. ReplacingtheCOPFwiththecommonlyusedBSPFincreasestheruntimeoftheRWMH algorithm to 10.2 hours and lowers the acceptance rate to 4%, drastically increasing the

This Version: February 10, 2020 34 persistence of the Markov chain. Under some regularity conditions, the sequence of posterior draws generated from a RWMH algorithm satisfies a Central Limit Theorem (CLT) for dependent processes. The numerical accuracy of the Monte Carlo approximation of posterior means depends on the long-run covariance matrix of the sequence of parameter draws θi, i = 1,...,N. The larger the autocorrelation of these draws, the less precise the Monte Carlo approximation. The first row of Figure 3 compares posterior densities constructed from the output of the COPF- and BSPF-RWHM algorithms for two representative parameters: τ and σ . The d posteriordensitieslookverysimilarandbothpeaknearthe“true”parametervaluesdepicted by the solid vertical lines. The second row shows the autocorrelation functions for the τi and σi sequences. Here, stark differences emerge. While under the BSPF-based sampler, d the autocorrelation at lag 100 is still around 0.9; it is approximately 0.2 for draws from the COPF-RWMH. The last row of the figure compares autocorrelations for lags 10, 20, 30, and 40 for all estimated parameters. The solid lines are 45-degree lines. The two panels show that the COPF is able to reduce the autocorrelation for all estimated parameters, and hence it drastically improves the performance of the MCMC algorithm. Because the COPF can deliver accurate likelihood approximations with a relatively small number of particles, M = 150 in our numerical illustration, it is possible to accurately estimate a DSGE model with occasionally binding constraint without supercomputing capabilities in a relatively short amount of time. 7 Empirical Application We now estimate the small-scale New Keynesian DSGE model based on quarterly U.S. data andconductafiscalpolicyexperiment. TheestimationresultsaresummarizedinSection7.1, and the fiscal policy analysis appears in Section 7.2. 7.1 Estimation TheDSGEmodelisestimatedbasedondataonGDPgrowth(q-o-q%),thelogconsumption- GDPratio(scaledby100), GDPdeflatorinflation(annualized%), andnominalinterestrates

This Version: February 10, 2020 35 (annualized %) with data from 1984:Q1 to 2018:Q4. The data for the estimation was extracted from the FRB St. Louis FRED database (vintage 2019-10-30). Output growth is defined as real gross domestic product (GDPC1) growth converted into per capita terms. Our measure of population is Civilian Noninstitutional Population (CNP16OV). We compute population growth rates as log differences and apply an eight-quarter backward-looking moving average filter to the growth rates to smooth out abrupt changes in the population growth series. In constructing a measure of the consumption-GDP ratio, we define consumption as the difference between output and government spending. Thus, our measure of consumption includes investment and net exports. Government spending is constructed as real government consumption expenditures and gross investment (GCEC1). We remove a linear trend from the log consumption-GDP ratio to correct for different time trends in the price deflators of the GDP components. Inflation is defined as the log difference in the GDP deflator (GDPDEF), and the interest rate is the average effective federal funds rate (FED- FUNDS) within each quarter. During the period 2009:Q1 to 2015:Q4, when the effective federal funds rate was between 0 and 25 basis points, we set the interest rate exactly equal to zero and regard the ZLB as binding. The prior distribution used for the estimation is identical to the one in Table 1. We absorb the initial values of the latent state variables into the parameter vector and specify prior distributions over the initial states; see Table A-1 in the Online Appendix.7 We fix a number of parameters prior to estimation. Because our sample does not include observations on labor market variables, we fix the Frisch labor supply elasticity. Based on R´ıos-Rull et al. (2012), who provide a detailed discussion of parameter values that are appropriate for DSGE models of U.S. data, we set η = 0.72. The parameter ν, which captures the elasticity of substitution between intermediate goods, is not separately identifiable from the slope of the Phillips curve κ which in turn determines the adjustment cost parameter φ. We set ν = 0.1, which generates a markup of 10%. We fix the preference parameter at χ = 1. It determines H steady-state hours worked and is neither relevant for the model dynamics nor identifiable based on our observables. We also fix the monetary policy coefficients ψ and ψ at 2.60 and 1 2 0.98, respectively, which are values estimated in Aruoba and Schorfheide (2016). We start out by estimating the log-linearized version of the DSGE model that ignores the ZLB constraint, setting the measurement error variances of the state-space model to zero. Draws from the posterior are generated by a single-block RWMH algorithm. In an initial 7Thisapproachhastheadvantagethatuncertaintyabouttheinitialstatedoesnotaddtothevariability of the particle-filter-based likelihood approximation conditional on a parameter θi.

This Version: February 10, 2020 36 Table 2: Posterior Distribution (PLC / COPF) Parameter Mean MAP HPD Low HPD High τ 2.08 2.20 1.75 2.39 κ 0.12 0.17 0.08 0.17 ρ 0.82 0.82 0.79 0.85 R ρ 0.982 0.987 0.967 0.997 g ρ 0.95 0.97 0.93 0.97 d ρ 0.32 0.30 0.18 0.47 z σ 0.0018 0.0019 0.0016 0.0021 R σ 0.0026 0.0024 0.0023 0.0028 g σ 0.0270 0.0361 0.0170 0.0370 d σ 0.0060 0.0063 0.0052 0.0068 z g∗ 1.27 1.28 1.25 1.29 rAnet 0.64 0.83 0.27 1.04 gamQnet 0.36 0.32 0.23 0.48 piAnet 2.94 3.34 2.48 3.36 Notes: The estimation period is 1984:Q1 to 2018:Q4. The following parameter are fixed during the estimation: ψ = 2.6, ψ = 0.98, η = 0.72, ν = 0.10, and χ = 1.00. We use the following parameter 1 2 H transformations: β =exp{−rANet/400}, γ =exp{gamQnet/100}, and π =exp{piAnet/400}. MAP refers ∗ to the maximum posterior probability estimate. HPD(Low,High) refers to the boundaries of 90% highest posteriordensityintervals. COPFconfiguration: numberofparticlesM =150,MEscaleς =0.001,proposal covariance scale c = 0.2, N = 55,000 draws (drop first 10%), acceptance rate is 25%, run time is 13:27:23 (hh:mm:ss). run, we use a diagonal matrix with the prior variances to configure the covariance matrix of the proposal distribution. In the main run, we use the estimated posterior covariance matrix ˆ¯ V from the initial run to construct a proposal covariance matrix Σ with the scaling factor c = 0.2. We generate N = 110,000 draws, discarding the first 10,000. To estimate the DSGE model with PLC decision rules, we generate N = 55,000 draws from the posterior distribution of θ using the particle RWMH algorithm, discarding the first 5,000 draws. We use a scaling of c = 0.2 for the covariance matrix of the proposal distribution. The scale factor for the measurement error variances is set to ς = 0.001. The likelihood function is approximated using the COPF with M = 150 particles. We assume that, conditional on observing the nominal interest rate, it is known whether the ZLB is binding; see (50). The resulting acceptance rate of the particle RWMH algorithm is 25%, and the run time is 13.5 hours on a single core, which comes to about 0.9 seconds per draw. The parameter estimates are summarized in Table 2. The table reports lower and upper

This Version: February 10, 2020 37 endpointsofhighestposteriordensity(HPD)setsaswellasposteriormeans.8 Theparameter estimates are similar to the ones reported elsewhere in the literature for variants of the small-scale New Keynesian DSGE model. The estimated slope of the Phillips curve is κˆ = 0.12. The government spending shock is close to a unit-root process (ρˆ = 0.982), and g the estimated autoregressive parameter of the discount factor shock is ρˆ = 0.95. Thus, d innovations to these processes will have a long-lasting effect. One of the outputs of the estimation is the set of filtered values for the exogenous variables. We use these explicitly in our policy experiment and discuss them in the next section. 7.2 Fiscal Policy Analysis Therecentliteraturehasemphasizedthattheeffectsofexpansionaryfiscalpoliciesonoutput may be larger if the economy is at or near the ZLB. In the absence of the ZLB, a typical interest rate feedback rule implies that the central bank raises nominal interest rates in response to rising inflation and output caused by an increase in government spending. This monetary contraction raises the real interest rate, reduces private consumption, and overall dampens the stimulating effect of the fiscal expansion. If, however, the economy remains at the ZLB despite the expansionary fiscal policy, then the increase in inflation that results from the fiscal expansion reduces the real rate. In turn, current-period demand is stimulated, amplifying the positive effect on output. We will use our model to provide a quantitative assessment of this effect. ARRA Government Spending. Because our model solution is nonlinear, the effect of a fiscal intervention depends on the initial condition and the size of the intervention. We use the Great Recession and the subsequent period in the U.S. as our laboratory and consider a fiscal intervention that is calibrated to a portion of the ARRA of February 2009 as we explain below. Our analysis is conducted from an ex post perspective, where we extract the historical shocks that make our model match the realized U.S. data, which include both a fiscal and monetary intervention, and ask what would have happened if one or both of the policy interventions were not implemented. 8We compare these estimates with those obtained from a linearized model using the Kalman Filter and data that exclude the ZLB episode. The most noteworthy differences are in ρ and σ , both of which need d d to be larger when the ZLB episode is used in order to deliver large (negative) and persistent shocks that take the economy to the ZLB. We also find that rAnet and gamQnet estimates are somewhat smaller in the full sample, both of which are consistent with related results in the literature.

This Version: February 10, 2020 38 Figure 4: Ex Post Policy Analysis 4 2 Intervention Intervention No Intervention 1 No Intervention 2 0 0 -1 -2 -2 07Q4 08Q4 09Q4 10Q4 07Q4 08Q4 09Q4 10Q4 2 2 0 1 0 -1 -2 0 -2 -4 -1 -3 07Q4 08Q4 09Q4 10Q4 07Q4 08Q4 09Q4 10Q4 07Q4 08Q4 09Q4 10Q4 4 5 0 3 4 -2 2 3 2 1 -4 1 0 0 -6 -1 07Q4 08Q4 09Q4 10Q4 07Q4 08Q4 09Q4 10Q4 07Q4 08Q4 09Q4 10Q4 Notes: Theverticalredlinecorrespondsto2009:Q2,whichisthedateoftheARRAintervention. Intervention (black) versus no-intervention paths (blue). Along the no-intervention path, we set monetary policy shocks to zero and lower the innovation to the government spending shock in 2009:Q2 by the size of the ARRA intervention. Red dashed lines represent paths in the absence of exogenous shock innovations from 2009:Q2 onwards. The level processes in the second row of the figure are standardized by the unconditional standard deviations of the corresponding AR(1) processes. Inflation and the interest rate are expressed in terms of annualized percentage rates. ARRA of February 2009 consisted of a combination of tax cuts and benefits; entitlement programs; and funding for federal contracts, grants, and loans. We focus on the third component, becauseitcanbe interpretedasanincreaseing . WemodeltheARRAspending t as a one-period positive shock of δARRA to the demand shock process, where we calibrated δARRA = 0.0077 using data on the disbursement of ARRA funds, as we explain in the Online Appendix. This one-time shock is roughly 3.2σ , and, since gˆ is highly serially correlated, g t the effect of the shock will slowly decay over time. We assume that the ARRA innovation to government spending took place in the second quarter of 2009. Ex Post Policy Analysis. We use the COPF to obtain estimates of the exogenous shock

This Version: February 10, 2020 39 processes for the period 2009:Q2 through 2011:Q1. The subsequent results are based on the maximum posterior probability (MAP) estimator of the DSGE model parameters. The panels in the first two rows of Figure 4 show the filtered monetary policy and government spending innovations and the levels of the government spending, technology growth, and discount factor shock processes. Recall that, in the model (cid:15) and (cid:15) are N(0,1) ran- R,t g,t dom variables. The level processes in the second row of the figure are standardized by the unconditional standard deviations of the corresponding AR(1) processes. For the government spending and the monetary policy shocks, we distinguish between intervention and no-intervention paths. According to the estimated model, the drop in output during the Great Recession is generated by drastic falls in the technology growth and discount factor shock processes. Because of the stylized structure of the DSGE model, these two shocks also absorb the contribution of financial shocks and financial accelerator effects. While the technology shock is not very persistent (ρˆ = 0.32) and reverts back to zero by the end of 2009, the mean reversion of the z discount factor shock is very slow, and it remained below 2 standard deviations until the end of 2010. Meanwhile the government spending process is positive, indicating that fiscal policy startedtobecomeexpansionary(relativetothehistoricalaverage)in2008. Thefilteredmonetary policy innovations (cid:15)ˆ turn out to be negative past 2009:Q2, which captures an effort R,t|t by the Federal Reserve to keep the policy rate lower than what the policy rule implies. This is how our model that abstracts from explicitly modeling unconventional monetary policies implemented in this period (quantitative easing, forward guidance) handles the existence of these policies in the data. Because the actual path of the government spending shock already contains the effect of fiscal expansion due to ARRA, we compute the counterfactual path by subtracting the effect of ARRA from the filtered demand shock gˆ using t|t gˆC = gˆ −ρt−T∗δARRA for t = T ,T +1,...,T +7, (51) t|t t|t g ∗ ∗ ∗ where T corresponds to 2009:Q2, the period the ARRA intervention is implemented in. The ∗ magnitude of the ARRA intervention is reflected in the difference between the intervention (black) and no-intervention (blue) government spending innovation depicted in the top left panel of Figure 4. The ARRA intervention shifts gˆ persistently downward as shown in the t center left panel of the figure. To measure the effect of the combined fiscal and monetary policy, we set the counterfactual monetary policy shocks to zero.

This Version: February 10, 2020 40 Table 3: Ex Post Multipliers Intervention 1Q 4Q 8Q Fiscal Only 0.79 0.76 0.73 Fiscal + Monetary 0.76 0.81 0.86 Notes: Ex post analysis is conditional on filtered 2009:Q2 - 2011:Q1 shocks. The pure fiscal multipliers are obtainedbysettingthemonetarypolicyshockstozero,whereasthecombinedfiscalandmonetarymultipliers are based on leaving the monetary policy shocks at their filtered values. The main finding is depicted in the bottom panels. The ex post effect of the intervention is defined as Xo −XC, where Xo is the observed value of a generic variable and XC is the counterfactual path along which the policy intervention is removed.9 In the figure, the ex post effect is given by the gap between the no intervention and the intervention path. In the absence of the ARRA and monetary interventions, output and inflation would have been persistently lower than they actually were. In particular, annualized inflation would have been 40 basis points lower, and annualized output growth would have been 28 basis points lower, on average, over these eight quarters. Based on the output and government spending paths, we can also compute cumulative dollar-for-dollar multipliers (cid:80)H (Yo −YC ) µ = τ=1 T∗−1+τ T∗−1+τ where H = 1,...,8, H (cid:80)H (G0 −GC ) τ=1 T∗−1+τ T∗−1+τ which are reported in Table 3. The pure fiscal multipliers are obtained by setting the monetary policy shocks to zero, whereas the combined fiscal and monetary multipliers are based on leaving the monetary policy shocks at their filtered values. The ex post multipliers are around 0.8 according to our estimated model. We started out this section providing an explanation for why fiscal multipliers tend to be higher when the economy is at the ZLB. If wecomputefiscalmultipliersconditionalontheeconomybeinginastateinwhichthecentral bank would respond to rising output and inflation with an increase in interest rate, then the fiscal multipliers would be around 0.65, which is indeed lower than our ex post multiplier. However, the presence of the ZLB only generates a modest increase in the multiplier. The difference between the two types of multipliers reported in Table 3 is small, because as the no-intervention path of the nominal interest rate indicates, the adverse discount factor 9Details on the algorithm to compute the effects of the policy interventions are reported in the Online Appendix.

This Version: February 10, 2020 41 shock kept the economy close to the ZLB, leaving very little room for conventional monetary policy interventions ex post. The red dashed lines in the second-row panels of Figure 4 represent the post-2009:Q1 path of the exogenous shock processes in the absence of further innovations, which is the expectedpathconditionalon2009:Q1innovation. Subsequenttechnologyanddiscountfactor innovations had only small effects on the path of the respective exogenous processes, keeping them roughly in line with expectations. Thus, the low level of the discount factor shock also implied very low interest rates from an ex ante perspective, leaving little scope for the Fed to boost the effects of a fiscal expansion through a zero-interest-rate policy. 8 Conclusion Likelihood-basedestimationofnonlinearDSGEmodelsiscomputationallychallenging. While it is becoming easier for economists to access powerful computer clusters that enable massive parallel computation, the ability to solve and estimate models on a desktop computer remains useful and desirable. Computations can often be simplified and accelerated considerably by taking shortcuts in regard to model solution or estimation techniques. The goal of this paper has been to develop a new solution method that captures important aspects of the nonlinearity generated by occasionally binding constraints and, at the same time, allows for efficient filtering and likelihood-based estimation. The piecewise-linearity of the decision rules allows us to solve the model faster and to derive a conditionally optimal proposal distribution for a particle filter. This filter delivers a much more accurate likelihood approximation than a standard bootstrap particle filter and enables us to estimate a nonlinear New Keynesian DSGE model with a ZLB constraint in a relatively short amount of time on a single core processor. References Adam, K. and R. Billi (2007): “Discretionary Monetary Policy at the Zero Lower Bound on Nominal Interest Rates,” Journal of Monetary Economics, 54, 728–752. An, S. and F. Schorfheide (2007): “Bayesian Analysis of DSGE Models,” Econometric Reviews, 26, 113–172.

This Version: February 10, 2020 42 Andrieu, C., A. Doucet, and R. Holenstein (2010): “Particle Markov Chain Monte Carlo Methods,” Journal of the Royal Statistical Society Series B, 72, 269–342. Arulampalam, S., S. Maskell, N. Gordon, and T. Clapp (2002): “A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, 50, 174–188. Aruoba, S. B., P. Cuba-Borda, and F. Schorfheide (2018): “Macroeconomic Dynamics Near the ZLB: A Tale of Two Countries,” Review of Economic Studies, 85, 87–118. Aruoba, S. B. and F. Schorfheide (2016): “Inflation Dynamics During and After The Zero Lower Bound,” in Inflation Dynamics and Monetary Policy, 2015 Jackson Hole Symposium Volume Published by the Federal Reserve Bank of Kansas City. Atkinson, T., A. Richter, and N. A. Throckmorton (2020): “The Zero Lower Bound and Estimation Accuracy,” Journal of Monetary Economics, forthcoming. Batini, N., L. Eyraud, L. Forni, and A. Weber (2014): “Fiscal Multipliers; Size, Determinants, andUseinMacroeconomicProjections,” IMFTechnicalNotesandManuals 14/04, International Monetary Fund. Benigno, G., A. Foerster, C. Otrok, and A. Rebucci (2016): “Estimating Macroeconomic Models of Financial Crisis: An Endogenous Regime Switching Approach,” Manuscript, University of Missouri. Bianchi, F. and L. Melosi (2017): “Escaping the Great Recession,” American Economic Review, 107, 1030–58. Boehl, G. (2019): “Efficient Solution, Filtering and Estimation of Models with OBCs,” Working Paper. Cappe´, O., S. J. Godsill, and E. Moulines (2007): “An Overview of Existing Methods and Recent Advances in Sequential Monte Carlo,” Proceedings of the IEEE, 95, 899–924. Cappe´, O., E. Moulines, and T. Ryden (2005): Inference in Hidden Markov Models, Springer Verlag. Chen, H. (2017): “The effects of the near-zero interest rate policy in a regime-switching dynamic stochastic general equilibrium model,” Journal of Monetary Economics, 90, 176 – 192.

This Version: February 10, 2020 43 Christiano, L. J. and M. Eichenbaum (2012): “Notes on Linear Approximations, Equilibrium Multiplicity and E-learnability in the Analysis of the Zero Lower Bound,” Manuscript, Northwestern University. Christiano, L. J., M. Eichenbaum, and M. Trabandt (2015): “Understanding the Great Recession,” American Economic Journal: Macroeconomics, 7, 110–67. Christiano, L. J. and J. D. Fisher (2000): “Algorithms for Solving Dynamic Models with Occationally Binding Constraints,” Journal of Economic Dynamics & Control, 24, 1179–1232. Coleman, C., S. Lyon, L. Maliar, and S. Maliar (2018): “Matlab, Python, Julia: What to Choose in Economics?” CEPR Discussion Papers 13210, C.E.P.R. Discussion Papers. Creal, D. (2012): “A Survey of Sequential Monte Carlo Methods for Economics and Finance,” Econometric Reviews, 31, 245–296. Cuba-Borda, P., L. Guerrieri, M. Iacoviello, and M. Zhong (2019): “Likelihood Evaluation of Models with Occasionally Binding Constraints,” Journal of Applied Econometrics, 1–13. de Groot, O., B. Durdu, and E. Mendoza (2019): “Global v. Local Methods in the Analysis of Open-Economy Models with Incomplete Markets,” Tech. rep., National Bureau of Economic Research. Del Moral, P. (2013): Mean Field Simulation for Monte Carlo Integration, Chapman & Hall/CRC. Doucet, A. and A. M. Johansen (2011): “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later,” in Handook of Nonlinear Filtering, ed. by D. Crisan and B. Rozovsky, Oxford University Press. Eggertsson, G. B. (2011): “What fiscal policy is effective at zero interest rates?” in NBER Macroeconomics Annual 2010, ed. by D. Acemoglu and M. Woodford, University of Chicago Press, vol. 25, 59–112. Eggertsson, G. B. and M. Woodford (2003): “The Zero Bound on Interest Rates and Optimal Monetary Policy,” Brookings Papers on Economic Activity, 34, 139–235.

This Version: February 10, 2020 44 Fair, R. C. and J. B. Taylor (1983): “Solution and Maximum Likelihood Estimation of Dynamic Nonlinear Rational Expectations Models,” Econometrica, 51, 1169–1185. Farmer, R. E., D. F. Waggoner, and T. Zha(2011): “Minimalstatevariablesolutions to Markov-switching rational expectations models,” Journal of Economic Dynamics and Control, 35, 2150 – 2166. Ferna´ndez-Villaverde, J., G. Gordon, P. Guerro´n-Quintana, and J. F. Rubio- Ram´ırez (2015): “Nonlinear Adventures at the Zero Lower Bound,” Journal of Economic Dynamics and Control, 57, 182 – 204. Ferna´ndez-Villaverde, J. and J. F. Rubio-Ram´ırez (2007): “Estimating Macroeconomic Models: A Likelihood Approach,” Review of Economic Studies, 74, 1059–1087. Gordon, N. and D. Salmond (1993): “A Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation,” IEEE Proceedings-F, 140, 107–113. Guerrieri, L. and M. Iacoviello (2015): “OccBin: A Toolkit for Solving Dynamic Models with Occasionally Binding Constraints Easily,” Journal of Monetary Economics, 70, 22–38. ——— (2017): “Collateral constraints and macroeconomic asymmetries,” Journal of Monetary Economics, 90, 28 – 49. Gust, C., E. Herbst, D. Lopez-Salido, and M. E. Smith (2017): “The Empirical Implications of the Interest-Rate Lower Bound,” American Economic Review, 107, 1971– 2006. Heiss, F. and V. Winschel (2008): “Likelihood approximation by numerical integration on sparse grids,” Journal of Econometrics, 144, 62 – 80. Herbst, E. and F. Schorfheide (2015): Bayesian Estimation of DSGE Models, Princeton University Press. Hills, T. S. and T. Nakata (2018): “Fiscal Multipliers at the Zero Lower Bound: The Role of Policy Inertia,” Journal of Money, Credit and Banking, 50, 155–172. Holden, T. D. (2019): “Existence and uniqueness of solutions to dynamic models with occasionally binding constraints,” Manuscript, Deutsche Bundesbank.

This Version: February 10, 2020 45 Judd, K. L., L. Maliar, and S. Maliar (2010): “A Cluster-Grid Projection Method: Solving Problems with High Dimensionality,” NBER Working Paper, 15965. Judd, K. L., L. Maliar, S. Maliar, and R. Valero (2014): “Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain,” Journal of Economic Dynamics and Control, 44, 92–123. Kantas, N., A. Doucet, S. Singh, J. Maciejowski, and N. Chopin (2014): “On ParticleMethodsforParameterEstimationinState-SpaceModels,” arXiv Working Paper, 1412.8659v1. Krueger, D. and F. Kubler (2004): “Computing Equilibrium in OLG Models with Stochastic Production,” Journal of Economic Dynamics and Control, 28, 1411–1436. Kulish, M., J. Morley, and T. Robinson (2017): “Estimating DSGE Models with Zero Interest Rate Policy,” Journal of Monetary Economics, 88, 35–49. Liu, J. S. (2001): Monte Carlo Strategies in Scientific Computing, Springer Verlag. Maliar, L. and S. Maliar (2014): “Numerical Methods for Large-Scale Dynamic Economic Models,” in Handbook of Computational Economics Vol. 3, ed. by K. Schmedders and K. L. Judd, Elsevier, vol. 3, 325–477. ——— (2015): “Merging Simulation and Projection Approaches to Solve High-Dimensional Problems with an Application to a New Keynesian Model,” Quantitative Economics, 6, 1–47. Malin, B. A., D. Krueger, and F. Kubler (2011): “Solving the multi-country real businesscyclemodelusingaSmolyak-collocationmethod,”Journalof EconomicDynamics and Control, 35, 229–239. Mendoza, E. G. and S. Villalvazo (2019): “FiPIt: A Simple, Fast Global Method for Solving Models with Two Endogenous States and Occasionally Binding Constraints,” Manuscript, University of Pennsylvania. Nakata, T. (2016): “Optimal fiscal and monetary policy with occasionally binding zero bound constraints,” Journal of Economic Dynamics and Control, 73, 220 – 240. R´ıos-Rull, J.-V., F. Schorfheide, C. Fuentes-Albero, M. Kryshko, and R. Santaeulalia-Llopis (2012): “Methods versus Substance: Measuring the Effects of Technology Shocks,” Journal of Monetary Economics, 59, 826–846.

This Version: February 10, 2020 46 Woodford, M. (2003): Interest and Prices, Princeton University Press.

This Version: February 10, 2020 A-1 Online Appendix to “Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints” S. Borag˘an Aruoba, Pablo Cuba-Borda, Kenji Higa-Flores, Frank Schorfheide, and Sergio Villalvazo This Appendix consists of the following sections: A. Equilibrium Conditions for the Model of Section 2 B. Proofs and Derivations for Section 5 C. Additional Details for the Empirical Application

This Version: February 10, 2020 A-2 A Equilibrium Conditions for the Model of Section 2 In this section we sketch the derivation of the equilibrium conditions presented in Section 2. A.1 Households The representative household solves (cid:34) (cid:32) (cid:33)(cid:35) (cid:88) ∞ (C /A )1−τ −1 H1+1/η (cid:18) M (cid:19) max E βsd t+s t+s −χ t+s +χ V t+s , t t+s H M {Ct+s,Ht+s,Bt+s,Mt+s} 1−τ 1+1/η P t+s A t+s s=0 subject to: P C +T +B +M = P W H +M +R B +P D +P SC , t t t t t t t t t−1 t−1 t−1 t t t t Consumption and bond holdings. Let βsd λ be the Lagrange multiplier on the t+s t+s household budget constraint. Then the first-order condition with respect to consumption and bond holdings are given by: (cid:18) C (cid:19)−τ 1 t P λ = t t A A t t d t+1 λ = β R λ . t t t+1 d t Combining the two equation of the bond holding first order condition, leads to the consumption Euler equation: (cid:34) (cid:35) d (cid:18) C /A (cid:19)−τ 1 R 1 = βE t+1 t+1 t+1 t , t d C /A γz π t t t t+1 t+1 where γz = A /A . We define the stochastic discount factor as: t+1 t+1 t d (cid:18) C /A (cid:19)−τ 1 t+1 t+1 t+1 Q = . t+1|t d C /A γz t t t t+1 Labor-Leisure Choice. Takingfirst-orderconditionswithrespecttoH yieldsthestandard t

This Version: February 10, 2020 A-3 intratemporal optimality condition for the allocation of labor W (cid:18) C (cid:19)τ t = χ t H1/η. A H A t t t A.2 Intermediate Goods Firms Each intermediate good producer buys labor services H (j) at the real wage W . Firms face t t nominal rigidities in terms of price adjustment costs. The adjustment cost, expressed as a (cid:16) (cid:17) fraction of firms’ real output, is given by the function Φ Pt(j) . We assume that the p Pt−1(j) adjustment cost function is twice-continously differentiable, weakly increasing and weakly convex, Φ(cid:48) ≥ 0 and Φ(cid:48)(cid:48) ≥ 0. The firm maximizes expected discounted real profits with p p respect to H (j) and P (j): t t ∞ (cid:18) (cid:18) (cid:19) (cid:19) (cid:88) P (j) P (j) E βsQ t+s A H (j)−Φ t+s A H (j)−W H (j) , t t+s|t t+s t+s p t+s t+s t+s t+s P P (j) t+s t+s−1 s=0 subject to (cid:18) P (j) (cid:19)−1/ν t A H (j) = Y . t t t P t We use µ βsQ to denote the Lagrange multiplier associated with this constraint. In t+s t+s|t equilibrium, the firms use the households’ stochastic discount factor to discount future profits. Price setting decision. Setting Q = 1, the first-order condition with respect to P (j) is t|t t given by: A H (j) (cid:18) P (j) (cid:19) A H (j) µ (cid:18) P (j) (cid:19)−1/ν−1 Y 0 = t t −Φ(cid:48) t t t − t t t P p P (j) P (j) ν P P t t−1 t−1 t t (cid:20) (cid:18) (cid:19) (cid:21) P (j) P (j) +βE Q Φ(cid:48) t+1 A H (j) t+1 . t t+1|t p P (j) t+1 t+1 P2(j) t t Firms’ labor demand. Taking first-order conditions with respect to H (j) yields t (cid:18) (cid:19) P (j) P (j) t t W = A −Φ A −µ A . t t p t t t P P (j) t t−1 Symmetric equilibrium. We restrict attention to a symmetric equilibrium where all firms choose the same price P (j) = P ∀j. This assumption implies that in equilibrium all firms t t

This Version: February 10, 2020 A-4 face identical marginal costs and demand the same amount of labor input. Combining the firms’ price setting and labor demand first order conditions and assuming that the price adjustment costs are quadratic, i.e., (cid:18) P (j) (cid:19) φ (cid:18) P (j) (cid:19)2 t t Φ = −π¯ , p P (j) 2 P (j) t−1 t−1 we obtain: (cid:18) C (cid:19)τ φ (cid:18) P (cid:19)2 (1−ν)−χ t H1/η − t −π¯ + H A t 2 P t t−1 (cid:18) (cid:19) (cid:20) (cid:18) (cid:19) (cid:21) P P P P Y νφ t −π¯ t = νβE Q t+1 Φ(cid:48) t+1 t+1 . P P t t+1|t P p P Y t−1 t−1 t t t A.3 Equilibrium Conditions Resource constraint. The derivation of the aggregate resource constraint is straightforward. In equilibrium real profits by intermediate producers is given by: D = Y −Φ (π )Y −W H t t p t t t t Substituting this into the household budget constraint we obtain: (cid:20) (cid:21) T M B M R B t t t t−1 t−1 t−1 C + + + − − = W H +Y −Φ (π )Y −W H t t t t p t t t t P P P P P t t t t t From the government budget constraint in (7) we can see that the term in square brackets corresponds to real government expenditure G . Simplifying yields: t C +G = [1−Φ (π )]Y t t p t t The technology process introduces a long-run trend in the variables of the model. To make the model stationary we use the following transformations: y = Y /A , c = C /A , t t t t t t and note that Y /Y = yt γz . We also define the gross inflation rate π = P /P . The t t−1 yt−1 t t t t−1 equilibrium conditions shown in Section 2.2 of the main text follow immediately.

This Version: February 10, 2020 A-5 Steady States and Reparameterizations. Let τ(1−ν) 1 r = γ/β, φ = , b = . νπ κ 2ν ∗ The steady states are given by (cid:20) 1−ν +φν(1−β)π (π −π¯)−0.5φ∗(π −π¯)2(cid:21)(1/(τ+1/η)) ∗ ∗ ∗ c = ∗ χ ((1/(g ))−0.5φ(π −π¯)2)−1/η H ∗ ∗ c ∗ y = ∗ 1/g −0.5φ(π −π¯)2 ∗ ∗ R = π r ∗ ∗

This Version: February 10, 2020 A-6 B Derivations for Section 5 We provide a proof of Proposition 1 which contains the formulas for the terms λ , η¯ (·), t 1,1 ¯ ¯ Ω (·), η¯ (·), and Ω (·) that appear in the proposition. Throughout this section we set 11 2|1,t 2|1 theinterceptinthemeasurementequationA = 0andwedropthesubscriptfromthematrix 0 A . s Proof of Proposition 1. Conditional on s the current state s is determined by η . In t−1 t t order to derive g∗(s˜|sj ) (we are omitting θ from the conditioning set), we will work in the t t t−1 (η ,s ) space and derive (also omitting tildes and j superscripts) the conditionally optimal t t−1 proposal distribution g∗(η |y ,s ) = p(η |y ,s ) ∝ p(y |η ,s )p(η ) t t t t−1 t t t−1 t t t−1 t and the incremental particle weights (cid:90) ω˜j = p(y |s ) = p(y |η ,s )p(η )dη . t t t−1 t t t−1 t t Define yˆ (·) = A(Φ (·)+Φ (·)s )+AΦ (·)η , ν (·) = y −yˆ (·). t|t−1 0 1 t−1 η t t t t|t−1 We will denote the density of a N(µ,Σ) random variable Y by p (y;µ,Σ). Using this N notation, we write p(y |η ,s )p(η ) (A.1) t t t−1 t = p (cid:0) y ;yˆ (n), ςΣ (cid:1) p (η ;0,1)p (η ;0,I)I{η ≤ ζ(s )} N t t|t−1 u N 1,t N 2,t 1,t t−1 +p (cid:0) y ;yˆ (b), ςΣ (cid:1) p (η ;0,1)p (η ;0,I)I{η > ζ(s )} N t t|t−1 u N 1,t N 2,t 1,t t−1 = I +II.

This Version: February 10, 2020 A-7 We will begin by manipulating term I(n). Omitting the (n) arguments we obtain: I = I{η ≤ ζ ˜ (s )}(2π)−ny/2|ςΣ |−1/2(2π)−nη/2 1,t t−1 u (cid:26) (cid:27) (cid:26) (cid:27) 1 1 (cid:0) (cid:1) ×exp − (ν −AΦ η )(cid:48)(ςΣ )−1(ν −AΦ η ) exp − η2 +η(cid:48) η 2 t η t u t η t 2 1,t 2,t 2,t (cid:26) (cid:27) 1 = I{η ≤ ζ ˜ (s )}(2π)−ny/2|ςΣ |−1/2(2π)−nη/2exp − ν(cid:48)Σ−1ν 1,t t−1 u 2 t u t (cid:26) (cid:27) (cid:26) (cid:27) 1 1 (cid:0) (cid:1) ×exp − η(cid:48)Φ(cid:48)A(cid:48)(ςΣ )−1AΦ η +ν(cid:48)(ςΣ )−1AΦ η exp − η2 +η(cid:48) η . 2 t η u η t t u η t 2 1,t 2,t 2,t Note that term I takes the form of a product between “likelihood function” and “prior.” The prior covariance matrix of η is Ω = I, and the negative Hessian and the maximum of t the “log-likelihood” function are Ω ˆ−1 = Φ(cid:48)A(cid:48)(ςΣ )−1AΦ , ηˆ = Ω ˆ Φ(cid:48)A(cid:48)(ςΣ )−1ν . (A.2) η u η t η u t With this notation we can write (cid:26) (cid:27) 1 I = I{η ≤ ζ(s )}(2π)−ny/2|ςΣ |−1/2(2π)−nη/2|Ω|−1/2exp − ν(cid:48)Σ−1ν 1,t t−1 u 2 t u t (cid:26) (cid:27) (cid:26) (cid:27) 1 1 ×exp − (cid:0) η(cid:48)Ω ˆ−1η −2ηˆ(cid:48)Ω ˆ−1η (cid:1) exp − η(cid:48)Ωη . 2 t t t t 2 t t Now define the quasi posterior mean and covariance matrices for η |(y ,s ) t t t−1 Ω ¯ = (cid:0) Ω−1 +Ω ˆ−1 (cid:1)−1 , η¯ = Ω ¯ Ω ˆ−1ηˆ. (A.3) t t This leads to (cid:26) (cid:27) (cid:26) (cid:27) 1 1 I = (2π)−ny/2|ςΣ |−1/2|Ω|−1/2|Ω ¯ |1/2exp − ν(cid:48)Σ−1ν exp η¯(cid:48)Ω ¯−1η¯ u 2 t u t 2 t t (cid:26) (cid:27) 1 ×I{η ≤ ζ(s )}(2π)−nη/2|Ω ¯ |−1/2exp − (η −η¯)(cid:48)Ω ¯−1(η −η¯) . 1,t t−1 t t t t 2 We now decompose the kernel of the “posterior” of η |(y ,s ) in the second line of the t t t−1 preceding equation into a conditional and a marginal distribution. We use the “1” subscript to indicate the marginal posterior of η and the “2|1” subscript to indicate the conditional 1

This Version: February 10, 2020 A-8 mean and variance associated with the posterior of η given η : 2,t 1,t η¯ (η ) = η¯ +Ω ¯ Ω ¯−1(η −η¯ ), Ω ¯ = Ω ¯ −Ω ¯ Ω ¯−1Ω ¯ (A.4) 2|1,t 1,t 2,t 21 11 1,t 1,t 2|1,t 22 21 11 12 Thus, (cid:26) (cid:27) (cid:26) (cid:27) 1 1 I = (2π)−ny/2|ςΣ |−1/2(2π)−nη/2|Ω|−1/2|Ω ¯ |1/2exp − ν(cid:48)Σ−1ν exp η¯(cid:48)Ω ¯−1η¯ u 2 t u t 2 t t (cid:26) (cid:27) 1 ×(2π)−(nη−1)/2|Ω ¯ |−1/2exp − (η −η¯ )(cid:48)Ω ¯−1(η −η¯ ) (A.5) 2|1 2 2,t 2|1,t 2|1 2,t 2|1,t (cid:26) (cid:27) 1 ×I{η ≤ ζ(s )}(2π)−1/2|Ω ¯ |−1/2exp − (η −η¯ )(cid:48)Ω ¯−1(η −η¯ ) . 1,t t−1 11 2 1,t 1,t 11 1,t 1,t This is the final form for term I in (A.1). Integrating I in (A.5) with respect to (η ,η ) and re-introducing the (n) arguments 1,t 2,t yields (cid:90) (cid:90) D(n) = I(η ,η )dη dη (A.6) 1,t 2,t 2,t 1,t (cid:113) = (2π)−ny/2|ςΣ |−1/2|Ω|−1/2|Ω ¯ (n)|1/2Φ (cid:0) (ζ(s )−η¯ (n)/ Ω ¯ (n) (cid:1) u N t−1 1,t 11 (cid:26) (cid:27) (cid:26) (cid:27) 1 1 ×exp − ν (n)(cid:48)Σ−1ν (n) exp η¯(cid:48)(n)Ω ¯−1(n)η¯(n) . 2 t u t 2 t t The analysis of term II proceeds in almost identical manner, with the understanding that term I depends on Φ (n), Φ (n), and Φ (n), whereas term II depends on Φ (b), Φ (b), and 0 1 η 0 1 ˆ ¯ Φ (b). As a consequence the posterior coefficient matrices ηˆ, Ω, η¯, and Ω should also be η indexed by either (n) or (b). Because for term II the inequality in the indicator function is reversed, we obtain (cid:18) (cid:18) (cid:113) (cid:19)(cid:19) D(b) = (2π)−ny/2|ςΣ |−1/2|Ω|−1/2|Ω ¯ (b)|1/2 1−Φ (ζ(s )−η¯ (b))/ Ω ¯ (b) u N t−1 1,t 11 (cid:26) (cid:27) (cid:26) (cid:27) 1 1 ×exp − ν (n)(cid:48)Σ−1ν (n) exp η¯(cid:48)(b)Ω ¯−1(b)η¯(b) (A.7) 2 t u t 2 t t Using the formulas for I, II, D(n), and D(b), we can write the posterior density of η as t follows: p(y |η ,s )p(s ) I(n)+II(b) t t t−1 t−1 p(η |y ,s ) = = . (A.8) t t t−1 (cid:82) p(y |η ,s )p(s )dη D(n)+D(b) t t t−1 t−1 t

This Version: February 10, 2020 A-9 Thus, the resulting conditionally optimal proposal is given by the following mixture. Define D(n) λ = . (A.9) D(n)+D(b) Then, with probability λ η ∼ N(η¯ (n),Ω ¯ (n)I{η ≤ ζ(s )}, η |η ∼ N(η¯ (n,η ),Ω ¯ (n)) (A.10) 1,t 1,t 11 1,t t−1 2,t 1,t 2|1,t 1,t 2|1 and with probability 1−λ η ∼ N (cid:0) η¯ (b),Ω ¯ (b) (cid:1)I{η > ζ(s )}, η |η ∼ N (cid:0) η¯ (b,η ),Ω ¯ (b) (cid:1) . (A.11) 1,t 1,t 11 1,t t−1 2,t 1,t 2|1,t 1,t 2|1 The incremental weight is constant and given by the following formula: ω˜j = p(y |sj ) = D(n)+D(b). (A.12) t t t−1 This completes the proof of the proposition. (cid:4) In the remainder of this section we consider to special cases: (i) y identifies the regime t without error. This is the case, for instance, for a DSGE model with ZLB constraint if the interest rate is observed without error, at least when it hits the ZLB. (ii) Measurement errors that are zero or very close to zero. (i) Known Regime. Let y = [y(cid:48) ,y ] and partition A(cid:48) = [A(cid:48),A(cid:48)] so that the partitions t 1,t 2,t 1 2 of A conform with the partitions of y . Assume that the n-regime is active if and only if t y > c. In the b-regime y = c. Moreover, let δ(y ;c) denote the Dirac delta function with 2,t 2,t 2t (cid:82) the property that δ(y ;c) = 0 for y (cid:54)= c and δ(y ;c)dy = 1. Using the above notation, 2t 2t 2t 2t we can rewrite (A.1) as p(y |η ,s )p(η ) t t t−1 t = p (cid:0) y ;yˆ (n), ςΣ (cid:1) p (η ;0,1)p (η ;0,I)I{η ≤ ζ(s )} N t t|t−1 u N 1,t N 2,t 1,t t−1 +p (cid:0) y ;yˆ (b), ςΣ (cid:1) δ(y ;c)p (η ;0,1)p (η ;0,I)I{η > ζ(s )} N 1t 1t|t−1 u,11 2t N 1,t N 2,t 1,t t−1 = I +II The formula for D(n) in (A.6) remains unchanged. The formula for D(b) in (A.7) changes to ˜ D(b) = δ(y ;c)D(b) 2t

This Version: February 10, 2020 A-10 with the understanding that Σ in D(b) needs to be replaced by Σ . After having observed u u,11 y we know whether y = c so that we can define t 2,t λ = I{y > c}. 2,t Conditional on λ, we can simulate [η ,η(cid:48) ](cid:48) from (A.10) and (A.11), respectively. Finally, 1,t 2,t (cid:40) D(n) if y > c ω˜j = p(y |sj ) = 2,t , t t t−1 D(b) if y = c 2,t where here D(b) corresponds to (A.7) and does not include the Dirac function δ(y ;c). 2t (ii) Zero Measurement Errors. Consider a linear state-space model without regimes: y = As +u , s = Φ +Φ s +Φ η . t t t t 0 1 t−1 η t Ignoring the censoring and dropping the regime indicator, note that D = p(y |s ). We can t t−1 write y = A(Φ +Φ s )+AΦ η +u . t 0 1 t−1 η t t Note that (cid:0) (cid:1) AΦ η +u ∼ N 0,AΦ Φ(cid:48)A(cid:48) +ςΣ . η t t η η u Thus, we can deduce that the term D(n) in (A.6) can be rewritten as D(n) = (2π)−ny/2|AΦ Φ(cid:48)A(cid:48) +ςΣ |−1/2 η η u (cid:26) (cid:27) 1 × exp − (y −A(Φ +Φ s ))(cid:48) (cid:2) AΦ Φ(cid:48)A(cid:48) +ςΣ (cid:3)−1 (y −A(Φ +Φ s )) 2 t 0 1 t−1 η η u t 0 1 t−1 (cid:0) (cid:112) ¯ (cid:1) × Φ (ζ(s )−η¯ )/ Ω N t−1 1 11 A similar adjustment can be made to the term D(b) in (A.6). The advantage of this alternative expression is that we can take the limit ς −→ 0. The argument of the Gaussian cdf

This Version: February 10, 2020 A-11 behaves as follows: η¯ = Ω ¯ Ω ˆ−1ηˆ t = (cid:0) Ω−1 +Ω ˆ−1 (cid:1)−1 Φ(cid:48)A(cid:48)(ςΣ )−1ν η u t = ς−1 (cid:0) Ω−1 +ς−1Φ(cid:48)A(cid:48)Σ−1AΦ (cid:1)−1 Φ(cid:48)A(cid:48)Σ−1ν η u η η u t = (cid:0) ςΩ−1 +Φ(cid:48)A(cid:48)Σ−1AΦ (cid:1)−1 Φ(cid:48)A(cid:48)Σ−1ν , η u η η u t −→ (cid:0) Φ(cid:48)A(cid:48)Σ−1AΦ (cid:1)−1 Φ(cid:48)A(cid:48)Σ−1ν η u η η u t which eliminates divisions by ς. Moreover, Ω ¯ = ς (cid:0) ςΩ−1 +Φ(cid:48)A(cid:48)Σ−1AΦ (cid:1)−1 −→ 0. η u η Thus, (cid:40) (cid:0) (cid:112) ¯ (cid:1) 1 if ζ(s t−1 )−η¯ 1 ≥ 0 lim Φ (ζ(s )−η¯ )/ Ω = N t−1 1 11 ς−→0 0 otherwise Because the posterior covariances matrices are zero in the limit, the sampling in (A.10) and (A.11) is replaced by setting η and η equal to their means. 1,t 2,t

This Version: February 10, 2020 A-12 C Additional Details for the Empirical Application Prior. TableA-1summarizesthepriordistributionfortheinitialstatesusedintheempirical analysis. Table A-1: Prior Distributions for Initial States Parameter Density P(1) P(2) (cid:15) N 0.00 .002 R,0 gˆ N 0.00 .012 0 zˆ N 0.00 .008 0 ˆ d N 0.00 .170 0 πˆ N 0.00 .030 0 cˆ N 0.00 .030 0 ˆ R +0.01 G 0.01 .008 0 Notes: N isNormaldistribution;G isGammadistribution. P(1)andP(2)aremeanandstandarddeviations for Normal and Gamma distributions. We set yˆ =cˆ +gˆ and yˆ =yˆ 0 0 0 −1 0 Calibration of ARRA. Table A-2 summarizes the award and disbursements of funds for federal contracts, grants, and loans. We translate the numbers in the table into a one-period location shift of the distribution of (cid:15) below. g,t Table A-2: ARRA Funds for Contracts, Grant, and Loans Awarded Received Nominal GDP 2009:2 158 36 3488 2009:3 17 18 3533 2009:4 26 8 3568 2010:1 16 24 3603 2010:2 33 26 3644 2010:3 9 21 3684 2010:4 4 19 3704 2011:1 4 20 3751 2011:2 8 17 3791 2011:3 0 12 3830 2011:4 3 9 3870 2012:1 0 8 3899 Notes: Data were obtained from www.recovery.gov.

This Version: February 10, 2020 A-13 Note that a one-time innovation (cid:15)ARRA generates a response g,t gˆARRA = ρhσ (cid:15)ARRA. t+h g g g,t We use the log-linear approximation 1 ζ ˆARRA = gˆARRA, t g −1 t ∗ where ζ = G /Y . We connect ζ ˆARRA to the data in Table A-2 using the relationship t t t t (cid:18) GARRA/Y (cid:19) ζ ˆARRA = log t t . t G /Y ∗ ∗ Figure A-2 compares the time path of gˆARRA constructed from the impulse response to a t (cid:15)ARRA = 0.0077 (the red solid line) and the time path constructed from the disbursements g,t in Table A-2 (the blue dashed line).10 Figure A-1: Calibration of Fiscal Policy Intervention 0.014 Received Simulated 0.011 0.0080 0.0050 09Q2 09Q4 10Q2 10Q4 11Q2 11Q4 Computational Details for Fiscal Policy Experiment. The following algorithm describes how we compute the effect of a combined fiscal and monetary intervention. 10Recall that σ =0.0024, hence the ARRA impulse in our experiment is equal to 3.2×σ . g g

This Version: February 10, 2020 A-14 Algorithm 2 (Effect of Combined Fiscal and Monetary Policy Intervention) 1. Initialize the simulation by setting (R ,y ,z ,g ,d ) equal to the mean estimate obtained 0 0 0 0 0 with the particle filter 2. Generate a baseline trajectory that includes the intervention based on the sequence of innovations obtained from the COPF: {(cid:15) ,(cid:15) ,(cid:15) ,(cid:15) }H . z,T∗+s g,T∗+s d,T∗+s R,T∗+s s=0 3. Generate the innovation sequence for the counterfactual trajectories without intervention according to (cid:15) = (cid:15)I −δARRA; (cid:15) = (cid:15)I for s = 1,...,H; g,T∗ g,T∗ g,T∗+s g,T∗+s (cid:15) = (cid:15)I for s = 0,...,H; z,T∗+s z,t (cid:15) = (cid:15)I for s = 0,...,H; d,T∗+s d,t (cid:15) = 0 for s = 0,...,H; R,T∗+s 4. Conditional on (R ,y ,z ,g ,d ), compute {R ,y ,π }H and 0 0 0 0 0 T∗+s T∗+s T∗+s s=0 {RI ,yI ,πI }H based on {(cid:15) } and {(cid:15)I }, respectively, and let T∗+s T∗+s T∗+s s=0 T∗+s T∗+s IRF(x |(cid:15)I,(cid:15) ) = (lnxI −lnx ). (A.13) t t t t t We report results for δARRA = 0.0077 and H = 7 in the main text. When we consider only a fiscal policy, we set (cid:15)I = 0 for t = T∗,...,T∗ +7 as well. R,t

Cite this document
APA
S. Boragan Aruoba, Pablo Cuba-Borda, Kenji Higa-Flores, Frank Schorfheide, & and Sergio Villalvazo (2020). Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints (IFDP 2020-1272). Board of Governors of the Federal Reserve System, International Finance Discussion Papers. https://whenthefedspeaks.com/doc/ifdp_2020-1272
BibTeX
@techreport{wtfs_ifdp_2020_1272,
  author = {S. Boragan Aruoba and Pablo Cuba-Borda and Kenji Higa-Flores and Frank Schorfheide and and Sergio Villalvazo},
  title = {Piecewise-Linear Approximations and Filtering for DSGE Models with Occasionally Binding Constraints},
  type = {International Finance Discussion Papers},
  number = {2020-1272},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2020},
  url = {https://whenthefedspeaks.com/doc/ifdp_2020-1272},
  abstract = {We develop an algorithm to construct approximate decision rules that are piecewise-linear and continuous for DSGE models with an occasionally binding constraint. The functional form of the decision rules allows us to derive a conditionally optimal particle filter (COPF) for the evaluation of the likelihood function that exploits the structure of the solution. We document the accuracy of the likelihood approximation and embed it into a particle Markov chain Monte Carlo algorithm to conduct Bayesian estimation. Compared with a standard bootstrap particle filter, the COPF significantly reduces the persistence of the Markov chain, improves the accuracy of Monte Carlo approximations of posterior moments, and drastically speeds up computations. We use the techniques to estimate a small-scale DSGE model to assess the effects of the government spending portion of the American Recovery and Reinvestment Act in 2009 when interest rates reached the zero lower bound.},
}