feds · November 30, 2015

Estimating (Markov-Switching) VAR Models without Gibbs Sampling: A Sequential Monte Carlo Approach

Abstract

Vector autoregressions with Markov-switching parameters (MS-VARs) fit the data better than do their constant-parameter predecessors. However, Bayesian inference for MS-VARs with existing algorithms remains challenging. For our first contribution, we show that Sequential Monte Carlo (SMC) estimators accurately estimate Bayesian MS-VAR posteriors. Relative to multi-step, model-specific MCMC routines, SMC has the advantages of generality, parallelizability, and freedom from reliance on particular analytical relationships between prior and likelihood. For our second contribution, we use SMC's flexibility to demonstrate that the choice of prior drives the key empirical finding of Sims, Waggoner, and Zha (2008) as much as does the data.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Mark Bognanni and Edward P. Herbst

2015-116

Please cite this paper as: Bognanni, Mark, and Edward P. Herbst (2015). “Estimating (Markov-Switching) VAR Models without Gibbs Sampling: A Sequential Monte Carlo Approach,” Finance and Economics Discussion Series 2015-116. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2015.116.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Mark Bognanni∗, Federal Reserve Bank of Cleveland
Edward Herbst†, Federal Reserve Board of Governors

This Draft: December 10, 2015

JEL: C11, C18, C32, C52, E3, E4, E5

Keywords: Vector Autoregressions, Sequential Monte Carlo, Regime-Switching Models, Bayesian Analysis

We thank Todd Clark, Ron Gallant, Eric Leeper, James Hamilton, Giorgio Primiceri, Juan F. Rubio-Ramírez and Frank Schorfheide for helpful comments and conversations. We also thank Dan Waggoner for a particularly helpful conference discussion and the other participants of various conferences for their feedback. The views expressed in this paper do not necessarily reflect those of Federal Reserve Bank of Cleveland, the Federal Reserve Board of Governors, or the Federal Reserve System.

∗ Correspondence: Mark Bognanni, Federal Reserve Bank of Cleveland, PO Box 6387, Cleveland, OH 44101-1387. Email: email.markbognanni@gmail.com, Web: http://markbognanni.com.

† Correspondence: Board of Governors of the Federal Reserve System, 20th Street and Constitution Avenue N.W., Washington D.C. 20551. Email: edward.p.herbst@frb.gov.

1. Introduction

The use of vector autoregressions (VARs) has grown steadily since Sims (1980) and VARs now serve as a vital element of the macroeconomist’s toolkit. Bayesian methods have come to dominate the literature on VAR applications for two main reasons. Firstly, VARs’ large number of parameters relative to data in typical macroeconomic applications has led researchers to favor the additional parameter discipline that Bayesian priors can provide. Secondly, researchers have developed methods that make Bayesian estimation of VARs straightforward. Posterior sampling is often the most challenging aspect of Bayesian inference, but for VARs a known family of priors yields (conditional) posteriors amenable to an efficient posterior sampling algorithm called the Gibbs sampler.

In recent years the interests of economists have moved beyond the basic VAR to extensions with time-varying parameters. One such extension, and the focus of this paper, is the VAR with Markov-switching parameters (MS-VAR) pioneered by Sims and Zha (2006). As a byproduct of their inquiry into the cause of the “Great Moderation,” Sims and Zha (2006) document superior data fit of every MS-VAR they estimate relative to the constant-coefficient VAR (CC-VAR). Yet, despite the data’s demonstrated preference for Markov-switching models, few researchers have used MS-VARs in economic applications.

We suspect that the sparse use of MS-VARs owes to the complicatedness of the estimation process; MS-VARs do not admit MCMC samplers that possess the efficiency or simplicity of their constant-parameter predecessors. Sims, Waggoner, and Zha (2008) expound upon the methods used in Sims and Zha (2006) and describe the following four-step procedure for MS-VAR estimation and model comparison. First, search in the model’s high-dimensional parameter space for the posterior mode, from which one initializes the MCMC algorithm. Second, code and deploy a highly model-specific Gibbs sampler, which relies on so-called Metropolis-within-Gibbs steps. Third, impose both sign and state-labeling normalizations on the posterior draws at the post-processing stage, which is necessary for the stability of the estimator in step 4. Fourth and finally, code a nontrivial augmentation of the modified harmonic mean (MHM) algorithm for

estimating the marginal data density (MDD), which is necessary for Bayesian model comparison.

In a recent paper investigating the macroeconomic effects of financial crises, which is a notable exception to the hesitance of economists to use MS-VARs, Hubrich and Tetlow (2015) use the algorithm of Sims et al. (2008) and summarize the length of the process as follows, “Computation of a specification’s posterior mode and the marginal data density takes a minimum of 6 hours in clock time and can take as long as 8 days, depending on the specifics of the run. Adding lags, imposing restrictions on switching on variances and restricting switching in equation coefficients is costly in terms of computing times.”¹ Of course, even at the end of this process uncertainty remains about whether or not one found the true posterior mode in the first step.

Motivated by these difficulties, we estimate MS-VARs using an alternative class of algorithms called Sequential Monte Carlo (SMC). Our SMC algorithm begins by propagating a set of “particles” from the prior distribution, where each particle contains a vector of values for the model’s parameters. The algorithm then moves and reweights the particles to iteratively approximate a sequence of distributions, each of which combines the prior with partial information from the likelihood. Each distribution in the sequence uses more information from the likelihood than its predecessor and the algorithm concludes once the full likelihood has been incorporated. When the algorithm concludes, the researcher has a set of particles that serve as a discrete distribution approximating the model’s true posterior.

Using SMC to estimate MS-VARs allows us to sidestep many of the aforementioned challenges. In particular, SMC has four key features that make it attractive for our purposes. First, the algorithm’s initialization with many random draws from the prior negates both the need for a time-consuming mode search and any risk of residual dependence on a unique starting value. Second, the algorithm is generic and does not rely on any particular analytical convenience of the posterior. Rather, one needs only the ability to evaluate a posterior kernel pointwise, which

¹ The quotation comes from the Online Appendix of Hubrich and Tetlow (2015), in which the authors describe the details of the estimation process.

negates the need to code a multi-step, model-specific Gibbs Sampler or for the model to even allow one to derive a Gibbs Sampler.² Third, the algorithm generates an estimate of the model’s MDD as a byproduct, which negates the need for time-consuming post-processing and coding a unique MHM algorithm. Fourth, unlike MCMC algorithms, which must run serially, SMC’s computations admit almost arbitrary parallelization, which makes SMC an increasingly practical approach as modern computer architectures continue to expand their parallel potential.

While the aforementioned properties of SMC are desirable, researchers have yet to demonstrate that SMC can effectively estimate the posteriors of high-dimensional time-series models when using a computationally feasible quantity of particles. Our first contribution is to show that SMC algorithms can indeed perform this task. We show this by demonstrating the algorithm’s ability to estimate MDDs in two settings in which we know the true MDD in closed form, settings which thus provide a gold standard for assessing SMC’s performance.

The first test setting is the familiar reduced-form CC-VAR with conjugate prior, which we consider the simplest possible test relevant to our interests. For the CC-VAR we show solid performance by SMC under a variety of choices for the algorithm’s tuning parameters and highlight a few small changes to existing SMC implementations that yield particularly dramatic performance improvements for VARs. One can use our change to the algorithm generally, but its performance gains for VARs owe to its improved accounting for the correlation structure among parameters that is typically present in both VAR priors and posteriors. The second test setting is a mixture of reduced-form CC-VAR posteriors, which imitates the multi-modality of more complicated models. Remarkably, when confronted with the multi-modal posterior the SMC algorithm estimates MDDs as well or better than standard MDD estimators even when we provide the standard

² In principle one can use the same basic SMC algorithm to estimate reduced-form VARs, structural and exactly-identified VARs, structural and over-identified VARs, VARs with steady-state priors, and MS-VARs, each of which relies on a unique posterior sampler when using MCMC for estimation. We consider the algorithm’s genericness to be an argument in favor of its use. As Geweke (2004) emphasizes, reliance on model-specific Gibbs samplers for posterior simulation typically involves a lengthy process of tedious algebra and coding, both of which lend themselves well to making difficult-to-detect errors.

estimators with iid draws from the posterior.

Having established SMC’s viability for high-dimensional time-series models, we exploit its genericness to make our second contribution: a demonstration of the importance of the prior for Bayesian model selection among MS-VAR specifications. We use SMC to estimate a suite of MS-VARs similar to those of Sims et al. (2008) under a variety of priors. The main empirical result of both Sims and Zha (2006) and Sims et al. (2008) is that the data clearly favors MS-VARs with regime switching only in shock variances, and not other economic dynamics. In this sense, the authors come down on the side of “good luck” in the debate over the cause of the “Great Moderation.” We find that, when using the best fitting prior, the posterior probability that the best model includes both time-varying shock variances and time-varying economic dynamics shifts from 6% to 43%. This result suggests that prior choice deserves particularly careful attention when comparing competing MS-VAR models, a point that should be well taken generally by researchers using the Bayesian approach to comparing different specifications of densely parameterized models.

Lastly, our positive results on SMC’s usefulness in practical applications constitute a contribution of general interest to economists. SMC provides a way forward when MCMC algorithms are either inefficient or reliant on the use of undesirable priors for efficiency. As economists estimate increasingly complicated models, it seems less likely that they will find priors that both yield a posterior amenable to Gibbs sampling and perfectly represent economists’ a priori beliefs. Indeed, the existence of such a prior seems more likely to be the result of divine coincidence than the norm.

With regard to the estimation algorithm, our paper builds on the recent work by Durham and Geweke (2012) and Herbst and Schorfheide (2014), who also explore the use of SMC algorithms for estimating econometric models. Durham and Geweke (2012) emphasize the massive parallelization possibilities for SMC algorithms, particularly for use with GPUs. Herbst and Schorfheide (2014) apply SMC algorithms to the estimation of DSGE models and show that DSGE-model posteriors can possess multi-modality that random walk Metropolis-Hastings algorithms fail to uncover in reasonable amounts of time. We also make use of a

number of advances from the statistics literature, on which we elaborate further in the next section.

From here the rest of the paper proceeds as follows. In Section 2 we describe the estimation problem, our estimation algorithm, and its place within the larger SMC literature. In Section 3 we demonstrate the algorithm’s effectiveness in settings in which we have closed-form expressions for the objects we estimate. In Section 4 we describe the MS-VAR models, the three priors we consider, and our estimation results. In Section 5 we conclude.

2. Sequential Monte Carlo Methods

Let $\theta$ be the parameters of a model and $Y$ be the data relevant for the model’s likelihood function. The Bayesian researcher is interested in the posterior density $p(\theta \mid Y)$, which is given by

$$ p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{p(Y)}, \quad \text{where } p(Y) = \int p(Y \mid \theta)\, p(\theta)\, d\theta, \tag{1} $$

$p(\theta)$ denotes the prior density, and $p(Y \mid \theta)$ denotes the likelihood. The term $p(Y)$ is known as the “marginal data density” (MDD) or “marginal likelihood”, an important measure of model fit in Bayesian statistics. For ease of exposition, in this section we abbreviate these objects by $\pi(\theta) = p(\theta \mid Y)$, $f(\theta) = p(Y \mid \theta)\, p(\theta)$, and $Z = p(Y)$, which gives an equivalent expression to (1) as

$$ \pi(\theta) = \frac{f(\theta)}{Z}. \tag{2} $$

An unfortunate feature of Bayesian inference is that in most applications of practical interest one does not know the moments of $\pi(\theta)$ in closed form. Hence, posterior inference often relies on devising a method to sample from $\pi(\theta)$. To put these facts in the context of our applications, note that for VARs the literature has previously concentrated on families of priors that induce a posterior such that either $\pi(\theta)$ can be sampled directly or there exists a partitioning of the parameters $\theta = [\theta_1, \ldots, \theta_n]$ such that each conditional posterior can be sampled directly,

yielding draws from the posterior through a Gibbs sampler.³ For MS-VARs, by contrast, no known priors induce posteriors from which we can sample with a pure Gibbs Sampler or for which we know $Z$ in closed form.

2.1 Overview of the Sequential Monte Carlo Method

In our applications we use SMC algorithms to approximate $\pi(\theta)$ and $Z$.⁴ Since importance sampling (IS) serves as the keystone of SMC, we begin our description of SMC methods with a brief description of IS.⁵ IS approximates the target density $f(\cdot)$ by a different, easy-to-sample density $g(\cdot)$, which is sometimes known as the “source density.” IS is based on the identity

$$ E_\pi[h(\theta)] = \int h(\theta)\, \pi(\theta)\, d\theta = \frac{1}{Z} \int_\Theta h(\theta)\, w(\theta)\, g(\theta)\, d\theta, \quad \text{where } w(\theta) = \frac{f(\theta)}{g(\theta)}. \tag{3} $$

If $\theta^i \overset{iid}{\sim} g(\theta)$, $i = 1, \ldots, N$, then, under suitable regularity conditions (see Geweke (1989)), the Monte Carlo estimate

$$ \bar{h} = \frac{1}{N} \sum_{i=1}^{N} h(\theta^i)\, \tilde{W}^i, \quad \text{where } \tilde{W}^i = \frac{w(\theta^i)}{\frac{1}{N} \sum_{j=1}^{N} w(\theta^j)}, \tag{4} $$

converges almost surely (a.s.) to $E_\pi[h(\theta)]$ as $N \longrightarrow \infty$. The set of pairs $\{(\theta^i, \tilde{W}^i)\}_{i=1}^{N}$ provides a discrete distribution that approximates $\pi(\theta)$. The $\tilde{W}^i$’s are known as the (normalized) importance weights assigned to each particle value $\theta^i$. The distance between $g(\cdot)$ and $f(\cdot)$ determines the accuracy of the approximation (per particle) and the uniformity (or lack thereof) of the distribution of weights reflects the size of this distance. If the distribution of weights is very uneven,

³ Researchers usually estimate VARs under a conjugate prior of the Normal-Inverse Wishart form. One can efficiently estimate structural VARs that have linear over-identifying restrictions by using the algorithm described in Waggoner and Zha (2003a).

⁴ We describe only our particular algorithm here. Chopin (2002), Del Moral, Doucet, and Jasra (2006), Creal (2012), and Herbst and Schorfheide (2014) offer additional details on SMC implementation.

⁵ Indeed, the SMC method we use in this paper is sometimes known as Iterated Batch Importance Sampling.

the Monte Carlo approximation $\bar{h}$ is inaccurate, because only a few particles contribute meaningfully to the estimate. On the other hand, uniform weights arise if $g(\cdot) \propto f(\cdot)$, which means that we are sampling directly from $\pi(\theta)$.

Unfortunately, constructing “good” importance distributions, $g(\cdot)$, is difficult when the econometrician knows little about the shape of $f$.⁶ The SMC algorithm we use attacks this problem by recursively building particle approximations to a sequence of distributions, starting from a known distribution, the prior, and then slowly adding information from the likelihood until we have obtained a particle approximation to the posterior. Specifically, we use $n$ to index a sequence of distributions of the form

$$ \pi_n(\theta) = \frac{f_n(\theta)}{Z_n} = \frac{[p(Y \mid \theta)]^{\phi_n}\, p(\theta)}{\int [p(Y \mid \theta)]^{\phi_n}\, p(\theta)\, d\theta}, \quad n = 1, \ldots, N_\phi, \tag{5} $$

and choose an increasing sequence of values for the scaling parameter, $\phi_n$, such that $\phi_1 = 0$ and $\phi_{N_\phi} = 1$. The choice of $\phi_1 = 0$ means that the initial target distribution, $\pi_1(\theta)$, is simply the prior, $p(\theta)$. Hence, one initializes the algorithm by propagating the particles as random draws from the prior. The choice of $\phi_{N_\phi} = 1$ means that the final target distribution, $\pi_{N_\phi}(\theta)$, is the posterior. Thus the final particles approximate the distribution of interest to the researcher.⁷

2.2 The Sequential Monte Carlo Algorithm

Algorithm 1 describes the three steps to construct a particle approximation to $\pi_n$ from a particle approximation to $\pi_{n-1}$, in the terminology of Chopin (2002). The general form of Algorithm 1 is the same as the one used in Herbst and Schorfheide (2014), but we describe its key features here for completeness. The

⁶ There is a history of using importance sampling techniques for VAR estimation when direct samplers and Gibbs samplers are unavailable: Leeper, Sims, Zha, Hall, and Bernanke (1996), Uhlig (1997), and Kadiyala and Karlsson (1997) all used importance samplers for parts of VAR posteriors. However, since the late 1990s researchers estimating VARs have largely abandoned this approach because of the difficulty of finding a good “$g$,” which resulted in inefficient algorithms.

⁷ The “likelihood tempering” formulation is not the only avenue one could have pursued. For example, Durham and Geweke (2012) propose a GPU-based SMC algorithm as a black box for many time series economic models with $f_n(\theta) = p(Y_{1:n} \mid \theta)\, p(\theta)$. Durham and Geweke (2012)’s “data tempering” approach is attractive for obtaining on-line parameter estimates.
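The self-normalized importance sampling estimate in (4), together with the weight-uniformity diagnostic discussed in Section 2.1, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' code: the function name `importance_sample` and its arguments are hypothetical, the weights are normalized to average one as in (4), and the log of the average unnormalized weight is used as the usual importance sampling estimate of $\ln Z$.

```python
import numpy as np


def importance_sample(log_f, sample_g, log_g, h, n, rng):
    """Self-normalized importance sampling (hypothetical helper).

    log_f    : log of the unnormalized target f(theta) = p(Y|theta) p(theta)
    sample_g : callable (n, rng) -> n draws from the source density g
    log_g    : log of the normalized source density g
    h        : function whose posterior mean E_pi[h(theta)] is wanted
    """
    theta = sample_g(n, rng)
    log_w = log_f(theta) - log_g(theta)      # log w(theta) = log f - log g
    shift = log_w.max()                      # guard against overflow in exp
    w = np.exp(log_w - shift)
    W = w / w.mean()                         # normalized weights, average one
    h_bar = np.mean(h(theta) * W)            # estimate of E_pi[h(theta)]
    ess = n / np.mean(W ** 2)                # effective sample size
    log_Z = shift + np.log(w.mean())         # estimate of ln Z
    return h_bar, ess, log_Z
```

An effective sample size far below `n` signals the uneven weights described in the text: only a handful of draws carry the estimate, which is the situation that motivates the tempered sequence in (5).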

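Algorithm 1: Simulated Tempering SMC

Initialization. ($\phi_1 = 0$). Draw the initial particles from the prior:

$$ \theta_1^i \overset{iid}{\sim} p(\theta), \quad W_1^i = 1, \quad i = 1, \ldots, N. $$

for $n = 2, \ldots, N_\phi$ do

  1. Correction. Reweight the particles from stage $n-1$ by defining the incremental and normalized weights

  $$ \tilde{w}_n^i = [p(Y \mid \theta_{n-1}^i)]^{\phi_n - \phi_{n-1}}, \quad \tilde{W}_n^i = \frac{\tilde{w}_n^i W_{n-1}^i}{\frac{1}{N} \sum_{i=1}^{N} \tilde{w}_n^i W_{n-1}^i}, \quad i = 1, \ldots, N. $$

  2. Selection. Compute the effective sample size

  $$ ESS_n = N \Big/ \left( \frac{1}{N} \sum_{i=1}^{N} (\tilde{W}_n^i)^2 \right). $$

  if $ESS_n < N_{part}/2$ then
    Resample the particles via multinomial resampling and reinitialize the weights to uniform, i.e.
    $$ W_n^i = 1, \quad \hat{\theta}_n^i \sim \{\theta_{n-1}^j, \tilde{W}_n^j\}_{j=1,\ldots,N}, \quad i = 1, \ldots, N $$
  else
    $$ W_n^i = \tilde{W}_n^i, \quad \hat{\theta}_n^i = \theta_{n-1}^i $$
  end

  3. Mutation. Propagate each particle $\{\hat{\theta}_n^i, W_n^i\}$ via $M$ steps of an MCMC algorithm with transition density $\theta_n^i \sim K_n(\theta_n \mid \hat{\theta}_n^i; \zeta_n)$ and stationary distribution $\pi_n(\theta)$. (See Algorithm 2 for details and the definition of $\zeta_n$.)

end

Compute posterior moments. An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by

$$ \bar{h}_{n,N} = \frac{1}{N} \sum_{i=1}^{N} h(\theta_n^i)\, W_n^i. \tag{6} $$

This approximation is valid using the particle approximations $\{\theta_{n-1}^i, \tilde{W}_n^i\}_{i=1}^{N_{part}}$, $\{\hat{\theta}_n^i, W_n^i\}_{i=1}^{N_{part}}$ and $\{\theta_n^i, W_n^i\}_{i=1}^{N_{part}}$ after the correction, selection, and mutation step, respectively.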
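The three steps of Algorithm 1 can be sketched compactly in Python for a generic model that exposes a pointwise log likelihood and log prior. This is a simplified illustration rather than the authors' implementation, and several choices are assumptions made for brevity: the function name `tempered_smc` and its arguments are hypothetical; the mutation step is a single-block random-walk Metropolis-Hastings move scaled by the particle covariance, not the blocked kernel of Algorithm 2; the schedule is the power schedule $\phi_n = ((n-1)/(N_\phi-1))^\lambda$; and the MDD estimate is the standard product of average incremental weights, which Algorithm 1 as printed above does not spell out.

```python
import numpy as np


def tempered_smc(log_lik, log_prior, sample_prior, n_part=2000, n_phi=100,
                 lam=2.0, n_mh=1, scale=1.0, seed=0):
    """Likelihood-tempered SMC sketch (hypothetical helper).

    log_lik, log_prior : map an (n_part, d) array to n_part log values
    sample_prior       : callable (n_part, rng) -> (n_part, d) prior draws
    Returns final particles, weights (average one) and an estimate of ln p(Y).
    """
    rng = np.random.default_rng(seed)
    phi = (np.arange(n_phi) / (n_phi - 1.0)) ** lam   # phi_1 = 0, phi_N = 1
    theta = np.array(sample_prior(n_part, rng), dtype=float)
    ll = np.array(log_lik(theta), dtype=float)
    lp = np.array(log_prior(theta), dtype=float)
    W = np.ones(n_part)
    dim = theta.shape[1]
    log_mdd = 0.0
    for n in range(1, n_phi):
        # 1. Correction: incremental weights p(Y|theta)^(phi_n - phi_{n-1}).
        log_inc = (phi[n] - phi[n - 1]) * ll
        shift = log_inc.max()
        w = np.exp(log_inc - shift) * W
        log_mdd += shift + np.log(w.mean())   # assumed MDD recursion
        W = w / w.mean()
        # 2. Selection: multinomial resampling when ESS < N/2.
        if n_part / np.mean(W ** 2) < n_part / 2:
            idx = rng.choice(n_part, size=n_part, p=W / W.sum())
            theta, ll, lp = theta[idx], ll[idx], lp[idx]
            W = np.ones(n_part)
        # 3. Mutation: RW-MH steps with stationary distribution pi_n.
        mu = np.average(theta, axis=0, weights=W)
        dev = theta - mu
        cov = (dev * W[:, None]).T @ dev / W.sum()
        chol = np.linalg.cholesky(scale ** 2 * cov + 1e-10 * np.eye(dim))
        for _ in range(n_mh):
            prop = theta + rng.standard_normal((n_part, dim)) @ chol.T
            ll_p, lp_p = log_lik(prop), log_prior(prop)
            log_alpha = phi[n] * (ll_p - ll) + (lp_p - lp)
            acc = np.log(rng.uniform(size=n_part)) < log_alpha
            theta[acc], ll[acc], lp[acc] = prop[acc], ll_p[acc], lp_p[acc]
    return theta, W, log_mdd
```

Because every particle is updated with array operations that do not depend on the other particles, the mutation loop is the natural place to parallelize. A conjugate model with a known marginal likelihood, such as a Gaussian mean with a Gaussian prior, gives a quick check of the returned `log_mdd` in the same spirit as the controlled experiments of Section 3.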

algorithm initializes with particles sampled from $p(\theta)$ and assigned uniform weights. We then enter the recursions. We enter any stage $n$ of the recursion with a particle approximation $\{\theta_{n-1}^i, \tilde{W}_{n-1}^i\}_{i=1}^{N_{part}}$ of $\pi_{n-1}$. In the first step of stage $n$, the correction step, the particles are reweighted according to $\pi_n$. This is an importance sample of $\pi_n$ using $\pi_{n-1}$ as the proposal distribution. In the second step, selection, if the sample is unbalanced in the sense that only a few particles have meaningful weight, the particles are rejuvenated using multinomial resampling. This process ensures that the sampler avoids the well-known issue of particle impoverishment. On the other hand, the resampling itself induces noise into the simulation, and so we avoid doing it unless necessary. In the third and final step, mutation, particles are moved around the parameter space, using $M$ iterations of a Metropolis-Hastings algorithm on each individual particle.

The last step, mutation, is crucial. Mutation allows particles to move towards areas of higher density of $\pi_n$ and ensures diversity across replicated particles when resampling occurs during the selection step. Were the algorithm to run without mutation, repeated resampling of the corrected particles would leave only a few unique values surviving until the final stage, resulting in a poor approximation to the posterior.

From a computational perspective, a point to stress about the mutation step is that the particles operate independently of one another, in a sense forming $N_{part}$ independent Markov chains. This stands in contrast to MCMC algorithms, which rely on a single chain. The independence of particles during mutation allows us to exploit parallel computations during the mutation step, which provides the benefit of greatly speeding up the algorithm, as highlighted by both Durham and Geweke (2012) and Herbst and Schorfheide (2014).

We follow Herbst and Schorfheide (2014) in our specification for the tempering schedule, $\{\phi_n\}_{n=1}^{N_\phi}$, and choose a schedule which follows

$$ \phi_n = \left( \frac{n-1}{N_\phi - 1} \right)^{\lambda}. \tag{7} $$

The hyperparameter $\lambda$ ($> 0$) controls the rate at which “information” from the likelihood is added to the sampler. If $\lambda = 1$, then the schedule is linear, and, very

roughly speaking, each stage has the same contribution. We use $\lambda > 1$, which means that we add only small increments of the likelihood to the prior in the early stages of the sampler and add larger increments in the later stages. We discuss the role of $\lambda$ in more detail in Section 3, in which we test the algorithm under various choices for the tuning parameters.

2.3 MCMC Transition Kernel

Algorithm 1 presents the generic SMC algorithm for estimating Bayesian models, but does not specify the exact nature of the MCMC transition kernel used for particle mutation. As we show in Section 3, the form of the MCMC kernel can crucially affect the performance of the sampler. Our base mutation kernel is a block random walk Metropolis-Hastings (RWMH) sampler, detailed in Algorithm 2. Block MH algorithms have been useful in the estimation of DSGE models (see, for example, Chib and Ramamurthy (2010) and Herbst (2012)). Breaking the parameter vector into blocks reduces the dimensionality of the target density for each MCMC step, making the target easier to approximate well with the proposal density.

A key consideration affecting the efficiency of any RW-MH algorithm is the construction of the proposal variance. Our choice of proposal covariance departs from Herbst and Schorfheide (2014) in a simple but important way; we use the multivariate normal approximation to the conditional variance for block $b$, while Herbst and Schorfheide (2014) use the estimate of the marginal variance for the block $b$ parameters. To be more precise, we use

$$ \hat{\Sigma}_{b,n} = [\hat{\Sigma}_n]_{b,b} - [\hat{\Sigma}_n]_{b,-b}\, [\hat{\Sigma}_n]_{-b,-b}^{-1}\, [\hat{\Sigma}_n]_{-b,b}, \tag{8} $$

rather than

$$ \hat{\Sigma}_{b,n} = [\hat{\Sigma}_n]_{b,b}. \tag{9} $$

The marginal variance ignores the relationship between the parameters in block $b$ and the other “conditioning” parameters, which makes it a poor choice for (MS)VARs because of the nontrivial correlation structures in standard priors and

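Algorithm 2: Mutation Step

Let $\{B_n\}_{n=2}^{N_\phi}$ be a sequence of random partitions of the parameter vector. For a given partition $B_n$, let $b$ denote the block of the parameter vector so that $\theta_{b,n}^i$ refers to the $b$ elements of the $i$th particle. Further let $\theta_{<b,n}^i$ denote the subpartition of $B_n$ referring to elements of $\theta_n^i$ partitioned before the $b$th set and so on.

At each stage, $n$, obtain a particle estimate of the covariance of the parameters after selection but before mutation,

$$ \hat{\Sigma}_n = \sum_{i=1}^{N_{part}} W_n^i (\hat{\theta}_n^i - \hat{\mu}_n)(\hat{\theta}_n^i - \hat{\mu}_n)' \quad \text{with } \hat{\mu}_n = \sum_{i=1}^{N_{part}} W_n^i \hat{\theta}_n^i. $$

Denote a covariance matrix for the $b$-th block, at stage $n$, which is some function $\zeta(\cdot)$ of $\hat{\Sigma}_n$, as

$$ \hat{\Sigma}_{b,n} = \zeta(\hat{\Sigma}_n). $$

We consider two different functions $\zeta(\cdot)$, which we describe, and compare the performance of, in the text.

Let $M$ be an integer ($\geq 1$) defining the number of Metropolis-Hastings steps in the mutation stage. Introduce an additional subscript $m$ so that $\theta_{m,b,n}^i$ refers to the $b$th block of the $n$th stage, $i$th particle after $m$ Metropolis-Hastings steps. Set $\theta_{0,b,n}^i = \hat{\theta}_{b,n}^i$.

for $m = 1, \ldots, M$ do
  for $b \in B_n$ do

    1. Draw a proposal $\theta_b^* \sim N\left(\theta_{m-1,b,n}^i, \hat{\Sigma}_{b,n}\right)$. Denote $\theta^* = \left[\theta_{m,<b,n}^i, \theta_b^*, \theta_{m-1,>b,n}^i\right]$ and $\theta_{m,n}^i = \left[\theta_{m,<b,n}^i, \theta_{m-1,\geq b,n}^i\right]$.

    2. With probability

    $$ \alpha = \min\left\{ \frac{[p(Y \mid \theta^*)]^{\phi_n}\, p(\theta^*)}{[p(Y \mid \theta_{m,n}^i)]^{\phi_n}\, p(\theta_{m,n}^i)},\ 1 \right\} $$

    set $\theta_{m,b,n}^i = \theta_b^*$. Otherwise set $\theta_{m,b,n}^i = \theta_{m-1,b,n}^i$.

  end
end

Retain the last step of the Metropolis-Hastings sampler. Set $\theta_{b,n}^i = \theta_{M,b,n}^i$ for all $b \in B_n$.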
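The two candidate functions $\zeta(\cdot)$ in (8) and (9) differ only in whether the block covariance is conditioned on the remaining parameters. A minimal Python sketch of both follows; the function name `block_proposal_cov` and its arguments are hypothetical, and the code simply applies the Schur complement in (8) to whatever covariance estimate it is given.

```python
import numpy as np


def block_proposal_cov(sigma, block, conditional=True):
    """Proposal covariance for the parameters indexed by `block`.

    conditional=True  -> conditional covariance, equation (8)
    conditional=False -> marginal covariance, equation (9)
    """
    sigma = np.asarray(sigma, dtype=float)
    b = np.asarray(block)
    rest = np.setdiff1d(np.arange(sigma.shape[0]), b)   # indices in -b
    s_bb = sigma[np.ix_(b, b)]
    if not conditional or rest.size == 0:
        return s_bb
    s_br = sigma[np.ix_(b, rest)]
    s_rr = sigma[np.ix_(rest, rest)]
    # Schur complement: S_bb - S_{b,-b} S_{-b,-b}^{-1} S_{-b,b}
    return s_bb - s_br @ np.linalg.solve(s_rr, s_br.T)
```

For two parameters with unit variances and correlation 0.9, the marginal proposal variance is 1 while the conditional one is $1 - 0.9^2 = 0.19$. A random-walk step drawn from the marginal variance is therefore far too wide along the direction that the other block pins down, which is the intuition behind preferring (8) when priors and posteriors are strongly correlated.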

posteriors. In Section 3 we show that this simple change greatly improves the algorithm’s efficiency.

2.4 Theoretical Considerations

We make no theoretical contributions in this paper, so we will not go into detail about the formal arguments proving the strong law of large numbers (SLLN) and central limit theorem (CLT) for the particle approximation in (6) at the conclusion of Algorithm 1. Readers interested in the details of the SLLN and CLT should refer to Chopin (2002), which provides a recursive characterization of the SLLN and CLT that apply after each of the correction, selection, and mutation steps. Herbst and Schorfheide (2014) characterize the high-level assumptions sufficient for the SLLN and CLT to apply when the mutation stage is adaptive; that is, when features of the MCMC algorithm depend on previous particle approximations. While difficult to verify in practice, the extension of the SLLN and CLT provides at least a basis for the use of such a transition kernel. Finally, the variances associated with the CLTs have the formulation given in Chopin (2002), but the recursive form is, unfortunately, not useful in practice. Hence, to characterize the uncertainty around our estimates in subsequent sections we use numerical standard errors computed across multiple independent runs of the algorithm, as in Durham and Geweke (2012).

3. Sequential Monte Carlo in Two Controlled Experiments

Since we know of no other research evaluating SMC’s effectiveness in applications comparable to ours, we find it worthwhile to verify that SMC can reliably estimate the posteriors of VAR-sized models. We verify our SMC algorithm’s effectiveness by demonstrating the accuracy of its MDD estimates for two models for which we know the true MDD in closed form: 1) a VAR with conjugate prior and, as a more challenging test, 2) a mixture of VAR posteriors.

3.1 The Constant-Parameter VAR

The MS-VARs we estimate in subsequent sections build on a parameterization of the VAR model known as the “structural” form. The structural VAR

has the form

$$ y_t' A = \sum_{l=1}^{p} y_{t-l}' F_l + F_0 + \varepsilon_t', \quad \varepsilon_t \sim \mathcal{N}(0, I_n), \quad \text{for } 1 \leq t \leq T, \tag{10} $$

where $y_t$ is an $n \times 1$ vector of observables at time $t$, $y_{t-l}$ is the time $t-l$ realization of the same observables, $p$ is the number of lags of the observables, and $\varepsilon_t$ is an $n \times 1$ vector of structural shocks. Letting $x_t = [y_{t-1}', \ldots, y_{t-p}', 1]'$ and $F = [F_1', \ldots, F_p', F_0']'$, we can write the VAR more compactly as

$$ y_t' A = x_t' F + \varepsilon_t', \quad \varepsilon_t \sim \mathcal{N}(0, I), \tag{11} $$

and refer to its parameters as $\theta_S = (A, F)$.

Standard priors for $\theta_S$ described in Sims and Zha (1998) and Waggoner and Zha (2003a) do not admit a closed-form expression for the model’s MDD, which would make it difficult to assess our algorithm. However, in the absence of overidentifying restrictions on the matrices $A$ and $F$, one can derive a prior for $\theta_S$ as a change of variables from a prior over an alternative (“reduced-form”) parameterization, with parameters $\theta_{RF}$. The reduced-form parameters are defined as

$$ \theta_{RF} = (\Sigma, \Phi) = g(\theta_S) = \left( (A A')^{-1},\ F A^{-1} \right), \tag{12} $$

in which case the VAR is written as

$$ y_t' = x_t' \Phi + u_t', \quad u_t \sim \mathcal{N}(0, \Sigma). \tag{13} $$

Standard priors $p_{RF}(\cdot)$ admit a closed-form expression for the VAR’s MDD. Using this distribution and change of variables, the resulting prior density for $\theta_S$ is simply

$$ p_{RFB}(\theta_S) = p_{RF}(g(\theta_S))\, \left| J(g(\theta_S), \theta_S) \right|, \tag{14} $$

where $J(g(\theta_S), \theta_S)$ is the Jacobian of $g$. We use $RFB$ to indicate that the prior for

the structural parameters $\theta_S$ is based on this change of variables from a standard reduced-form prior. The details of $p_{RFB}$ are described in subsequent sections, but for our present purposes it suffices to know that 1) we can easily sample from it, 2) we have closed-form expressions for $p_{RF}$ and $J$ that admit pointwise evaluation, and 3) it gives us an exact expression for the model’s MDD. In other words, we can estimate the model with SMC and compare SMC’s MDD estimate to the true MDD. To sample from $p(\theta_S)$ we first sample from $p(\theta_{RF})$ and then transform $\theta_{RF}$ into $\theta_S$ via the function $g^{-1}$, which we define as

$$ g^{-1}(\theta_{RF}) = \left( (chol(\Sigma)')^{-1},\ \Phi\, (chol(\Sigma)')^{-1} \right) = (A, F) = \theta_S, \tag{15} $$

where $chol(\cdot)$ refers to the lower triangular Cholesky matrix.⁸

3.2 Experiment 1: SMC Accuracy for VAR Posteriors

We first test SMC’s performance on a VAR with $n = 3$ variables and $p = 3$ lags. The data for our test consists of observations on the output gap, inflation (GDP deflator), and the Federal Funds Rate from 1959:Q1 to 2005:Q4. We use the exact dataset from the empirical example of Sims et al. (2008), which we also use when estimating Markov-switching models in Section 4.

Recall that the SMC sampler described in Section 2 features a number of tuning parameters that must be set by the user. For our baseline experiment, we set $N_{part} = 2000$, $N_\phi = 500$, $M = 1$, $N_{blocks} = 3$ (random), and $\lambda = 4$.⁹ We run 20 Monte Carlo replications of the sampler and examine the distribution of ln(MDD) estimates. The first row of Table I shows the results under the baseline

⁸ It is well known in the literature that our choice of $g^{-1}$ is not the unique definition for which $g(g^{-1}(\theta_{RF})) = \theta_{RF}$: multiplying both $A$ and $F$ from the right by an orthogonal matrix would yield an alternative $\tilde{\theta}_S$ for which $g(\tilde{\theta}_S)$ would equal the same $\theta_{RF}$. In the present context our definition of $g^{-1}$ is just a normalization and its lack of uniqueness is irrelevant. Both the VAR likelihood and our choice of prior are invariant to orthogonal rotations, so the model’s MDD is invariant to alternative choices of $g^{-1}$. Rubio-Ramírez, Waggoner, and Zha (2010) document that the density of any prior for $\theta_S$ derived from the $\theta_{RF}$ parameterization will be invariant to orthogonal rotation.

⁹ Appendix A provides a thorough examination of SMC’s effectiveness under a variety of values for the tuning parameters. In particular, Table A-1 represents a matrix of approximate “partial derivatives” of MDD estimation accuracy with respect to various tuning parameters.
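The mappings in (12) and (15) are easy to check numerically. The sketch below is a small illustration with hypothetical function names `g` and `g_inv`; it assumes the lower triangular Cholesky convention $\Sigma = LL'$ used by `numpy.linalg.cholesky`, so that $A = (L')^{-1}$ satisfies $(AA')^{-1} = \Sigma$.

```python
import numpy as np


def g(A, F):
    """Structural to reduced form, equation (12): (Sigma, Phi)."""
    sigma = np.linalg.inv(A @ A.T)
    phi = F @ np.linalg.inv(A)
    return sigma, phi


def g_inv(sigma, phi):
    """Reduced form to structural, equation (15): (A, F)."""
    L = np.linalg.cholesky(sigma)    # lower triangular, sigma = L @ L.T
    A = np.linalg.inv(L.T)           # A = (chol(Sigma)')^{-1}
    F = phi @ A                      # F = Phi (chol(Sigma)')^{-1}
    return A, F
```

Applying `g` to the output of `g_inv` recovers the original $(\Sigma, \Phi)$, and post-multiplying both $A$ and $F$ by any orthogonal matrix leaves the result of `g` unchanged, which is the non-uniqueness that footnote 8 describes.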

TABLE I
ACCURACY OF SMC ESTIMATES OF ln(MDD): EFFECT OF RW-MH PROPOSAL COVARIANCE MATRIX

$\Sigma_{prop}$     RMSE
Conditional         0.21
Unconditional       1.90

Notes: The other SMC tuning parameters for this exercise are $N_{part} = 2000$, $N_{blocks} = 3$, Random Blocking, $N_\phi = 500$, $M = 1$, and $\lambda = 4$. The VAR model has $p = 3$ lags. RMSE is the root mean squared error of the estimates of ln(MDD). The true value of the ln(MDD) is 1791.9.

tuning parameters in terms of root mean squared error (RMSE) of ln(MDD) estimates. We can see that the sampler is quite accurate.

The second row of Table I demonstrates the ramifications of using (9) as the RWMH proposal variance rather than the conditional approximation, given by (8). The sampler using (9) most closely resembles the one used for DSGE models by Herbst and Schorfheide (2014). Using the unconditional variance estimate in the block RWMH leads to substantial deterioration in the sampler’s performance, as the RMSE of the ln(MDD) estimates increases by nearly an order of magnitude. To contextualize the efficiency gains from our modification of the Herbst and Schorfheide (2014) proposal variance, we find that the gains in accuracy from using the conditional approximation are significantly greater than the gains from doubling the number of particles (or even moving from 1000 to 5000 particles). One reason for this is that the VAR prior exhibits substantial correlation among key parameters. When this correlation structure is not accounted for, the sampler performs very poorly in the early stages when the prior dominates the likelihood contribution.

3.3 Experiment 2: SMC Accuracy for a Mixture of VAR Posteriors

Sims et al. (2008) stress that the posterior of MS-VARs “tends to be non-Gaussian” and may well contain “multiple peaks.” Indeed, when estimating MS-VARs in Section 4 we find evidence of fat-tailed and multipeaked posterior densities in our posterior draws, even after normalizing them. To determine

whether or not SMC can stand up to such irregularities, we conduct a Monte Carlo simulation on a bimodal target density for which: 1) we know the integrating constant in closed form, which provides an absolute measure of success, 2) we can sample the target distribution directly and then apply existing MDD estimation techniques, which provides a relative measure of success, and 3) the distribution is similar to the SMC-estimated posterior of the MS-VARs we consider in Section 4, which provides our controlled experiment with empirical relevance.

We construct the bimodal target distribution as the mixture of two posteriors for a parameter vector $\theta$ that share a common prior, but are informed by different observations. Letting $p(\theta)$ be a prior, $p(Y \mid \theta)$ the model’s likelihood function for observations $Y$, and

$$ p(Y) = \int_\Theta p(\theta)\, p(Y \mid \theta)\, d\theta, \tag{16} $$

our target density is given by

$$ \tilde{p}(\theta \mid Y_1, Y_2) = \alpha \left( \frac{p(\theta)\, p(Y_1 \mid \theta)}{p(Y_1)} \right) + (1 - \alpha) \left( \frac{p(\theta)\, p(Y_2 \mid \theta)}{p(Y_2)} \right), \quad \alpha \in [0, 1]. $$

We take $\alpha$ as given and known, so we implicitly condition on this value. Defining

$$ \tilde{L}(\theta \mid Y_1, Y_2) = \alpha\, p(Y_2)\, p(Y_1 \mid \theta) + (1 - \alpha)\, p(Y_1)\, p(Y_2 \mid \theta), \tag{17} $$

which we call a pseudo-likelihood, and

$$ \tilde{p}(Y_1, Y_2) = p(Y_1)\, p(Y_2), \tag{18} $$

and performing some simple algebra, we can write the target distribution as

$$ \tilde{p}(\theta \mid Y_1, Y_2) = \frac{p(\theta)\, \tilde{L}(\theta \mid Y_1, Y_2)}{\tilde{p}(Y_1, Y_2)}. \tag{19} $$

For the mixture components in our experiment we use posteriors of the VAR model described in Section 3.1 and hence we know $p(Y_1)$ and $p(Y_2)$ (and thus $\tilde{p}(Y_1, Y_2)$) in closed form. We execute 50 replications of SMC estimation of

$\tilde{p}(\theta \mid Y_1, Y_2)$, which include estimates of the MDD, $\tilde{p}(Y_1, Y_2)$.

To provide a benchmark for SMC, we also sample directly from $\tilde{p}(\theta \mid Y_1, Y_2)$ and estimate the MDD with standard techniques.¹⁰ In particular we estimate the MDD from the direct sample with two versions of the modified harmonic mean method: the version originally described in Geweke (1989), which we refer to as “MHM,” and the version adapted for better performance with non-Gaussian distributions in Sims et al. (2008), which we refer to as “MHM-SWZ.”¹¹

Although we call this exercise a “benchmark” for SMC, it is in fact a very high bar. The task of posterior sampling is often extremely challenging, to say nothing of MDD estimation from the resulting sample. Researchers typically simulate posterior draws using MCMC algorithms for which iid draws represent a practical upper bound on the quality of the posterior approximation.¹² Thus the benchmark exercise actually sidesteps one of the major challenges of the conventional approach to estimating MDDs. One might then say that our benchmark for SMC is, in fact, an upper bound on the performance of the conventional approach. Meanwhile, we charge SMC with the doubly difficult task of simultaneously sampling the posterior and estimating the MDD.

Table II shows the results of our simulation for a VAR($n = 3$, $p = 5$), from which we arrive at four main conclusions. Firstly, and most central to our interests, SMC estimates the MDD as well as, or better than, either MHM estimator when we give the MHM estimators 10,000 iid draws. From this we conclude that the SMC algorithm shows superior performance in the presence of substantive multimodality. Furthermore, these simulation results give us a compelling reason to trust our numerical estimates in Section 4.

TABLE II
ACCURACY OF ln(MDD) ESTIMATES FOR MULTIMODAL TARGET DENSITY WITH DIFFERENT METHODS OF POSTERIOR SAMPLING AND MDD ESTIMATION.

Posterior Sampler          MDD Estimator    RMSE
SMC: $N_{part} = 2000$     SMC              0.421
SMC: $N_{part} = 5000$     SMC              0.337
Direct: 10,000 draws       MHM-SWZ          0.311
Direct: 10,000 draws       MHM              1.068
Single Mode                MHM-SWZ          0.812
Single Mode                MHM              0.829

Notes: VAR($n = 3$, $p = 5$), true $\ln p(Y) = 1725.289$. Values are based on 50 replications. “MHM” refers to the original implementation of the modified harmonic mean estimator from Geweke (1989). “MHM-SWZ” refers to the adaptation of MHM proposed and implemented in Sims et al. (2008). VAR algorithm settings: SMC sampler uses $\lambda = 4$, $n_{\phi} = 500$, $N_{blocks} = 8$; and MHM estimate uses $p = 0.9$ for truncation.

Secondly, the MHM-SWZ estimator performs well (compared to traditional MHM) in the presence of bimodality. Even though the MHM-SWZ estimator constructs its approximating density around only one of the distribution's modes, the approximating density has fat enough tails to effectively incorporate information throughout the parameter space.

Thirdly, bimodality renders MDD estimation via the MHM method of Geweke (1989) hopeless, as it fails even under large numbers of draws from the target distribution. Since the average bias is in terms of units of $\ln p(Y)$, we can interpret these values as approximately percentage errors of $p(Y)$. Hence, for the VAR simulation, the MHM estimator tends to overstate $p(Y)$ by more than 50% of its true value.

Lastly, the extent to which multimodal target densities pose problems for MCMC methods remains a subject of debate, a debate whose waters we do not care to wade into any more deeply than necessary, but we do wish to document the stakes of proper posterior sampling. The basic concern when using MCMC is that the sampler may not mix properly in a reasonable amount of time. In the worst-case scenario, the MCMC sampler never leaves a neighborhood around the mode nearest to the point from which the algorithm initialized.¹³ The rows in Table II labeled “Single Mode” show MDD estimates computed from a caricature of a failed MCMC algorithm, i.e. the draws are simulated from only one of the two modes. In such a situation the results are disastrous.

¹⁰ With known $\alpha$, we can easily sample $\tilde{p}(\theta \mid Y_1, Y_2)$ directly. See Appendix D.1 for the direct sampling algorithm.
¹¹ Frühwirth-Schnatter (2004) documents the poor performance of Chib's estimator for even small mixture models, so we do not consider it here.
¹² In principle, it is possible to use other Monte Carlo methods to obtain estimates of model moments that are even more precise than those achieved with iid draws (e.g., antithetic variates). In practice, these methods are impractical in the environments we study and they are not widely used by econometricians. See Geweke (2005) for details on this class of methods.
¹³ Celeux, Hurn, and Robert (2000) document degeneracies of this nature when posterior sampling with simple MCMC for mixture models. However, Geweke (2007) shows that there exist MCMC methods, such as the method of Frühwirth-Schnatter (2001), able to handle the a priori-known-and-symmetric multimodality of mixture models that results from the arbitrariness of state labeling. Unlike the examples in Celeux et al. (2000) and Geweke (2007), our experience estimating MS-VARs in the subsequent section indicates the presence of asymmetric posterior multimodality, also known as “genuine multimodality,” in addition to the typical symmetric multimodality which can, in theory, be normalized away. The target density in the present section possesses only genuine multimodality by construction.

4. Sequential Monte Carlo in Practice: MS-VAR Estimation

Although inference for MS-VARs is conceptually straightforward, even a cursory glance at Sims et al. (2008) reveals that it is messy in practice. In this section we revisit the empirical application of Sims et al. (2008) using SMC estimation and two alternative prior specifications. We show that the use of an off-the-shelf prior commonly used in reduced-form VARs significantly improves data fit for MS-VARs and meaningfully alters the posterior probability assigned to models that allow changes to macroeconomic dynamics.

4.1 Structural MS-VAR Model

We estimate MS-VAR models of the form

(20)  $y_t' A(s_t) = x_t' F(s_t) + \varepsilon_t' \Xi(s_t)^{-1}, \quad \varepsilon_t \sim iid\ \mathcal{N}(0_n, I_n)$

(21)  $\Xi(s_t) = diag([\xi_1(s_t), \ldots, \xi_n(s_t)])$

(22)  $p(s_t \mid S_{t-1}, Y_{t-1}, \theta, q) = q_{s_t, s_{t-1}}$

(23)  $q_{s_t = i,\, s_{t-1} = j} = q_{i,j}, \quad \text{for } t > 0$

where $\Xi(s_t)$ is an $n \times n$ diagonal matrix and $s_t$ is the time $t$ realization of a discrete latent process that we call a “state.” $S_{t-1}$ is the history of states up to and including $t-1$, and $Y_{t-1}$ is the history of observations up to and including $t-1$.
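To make the data-generating process in (20)-(23) concrete, the following is a minimal simulation sketch rather than the authors' implementation. The function name `simulate_msvar`, the array layout, the zero initial observations, the uniform draw for the initial state, and the ordering of $x_t$ (lags first, constant last) are all assumptions made for illustration.

```python
import numpy as np

def simulate_msvar(T, A, F, xi, Q, p, rng):
    """Simulate T observations from a model of the form (20)-(23).

    A:  (H, n, n) regime-specific A(h)
    F:  (H, n*p + 1, n) regime-specific F(h)
    xi: (H, n) diagonals of Xi(h)
    Q:  (H, H) with Q[i, j] = P(s_t = i | s_{t-1} = j), as in (23)
    """
    H, n = A.shape[0], A.shape[1]
    Y = np.zeros((T + p, n))        # assumption: p zero initial observations
    S = np.empty(T, dtype=int)
    state = rng.integers(H)         # assumption: uniform draw for the initial state
    for t in range(T):
        state = rng.choice(H, p=Q[:, state])         # (22): draw s_t given s_{t-1}
        lags = Y[t:t + p][::-1].ravel()              # y_{t-1}, ..., y_{t-p}
        x = np.append(lags, 1.0)                     # assumed x_t = [lags, constant]
        eps = rng.standard_normal(n)                 # eps_t ~ N(0, I_n)
        rhs = x @ F[state] + eps / xi[state]         # x_t' F(s_t) + eps_t' Xi(s_t)^{-1}
        Y[t + p] = np.linalg.solve(A[state].T, rhs)  # solve y_t' A(s_t) = rhs for y_t
        S[t] = state
    return Y[p:], S

# Illustrative two-regime example with n = 2 variables and p = 1 lag.
rng = np.random.default_rng(0)
A = np.stack([np.eye(2), 2.0 * np.eye(2)])
F = np.zeros((2, 3, 2))
F[:, 0, 0] = 0.5                    # first variable loads on its own lag
xi = np.ones((2, 2))
Q = np.array([[0.9, 0.2],
              [0.1, 0.8]])          # columns sum to one
Y, S = simulate_msvar(200, A, F, xi, Q, p=1, rng=rng)
```

Because (23) indexes $q_{i,j}$ by the current state first, the sketch stores the transition probabilities column-wise, so each column of `Q` is a probability distribution over the next state.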

Let $H$ be the total number of states in the latent process and let

(24)  $A = \{A(h)\}_{h \in \{1, \ldots, H\}}, \quad F = \{F(h)\}_{h \in \{1, \ldots, H\}}, \quad \Xi = \{\Xi(h)\}_{h \in \{1, \ldots, H\}}.$

We then let $\theta = \{A, F, \Xi\}$. Note that we use the set notation in (24) to collect only the unique parameters in each set of matrices; nothing about our framework so far assumes that all parameters of $A(1)$ and $A(2)$, or any other two states, are distinct.¹⁴ The state of the latent process at time $t$ may be determined by the joint realization of $K$ independent latent processes, which each govern a different subset of $\theta$. We will refer to the set of parameters corresponding to only process $k$ as $\theta_k$. The notation $s_t$ refers to the joint state of all latent processes, while the notation $s_t^k$ refers to the state of only process $k$.

The MS-VAR has the likelihood

(25)  $p(Y_T \mid \theta, q) = \prod_{t=1}^{T} p(y_t \mid \theta, q, Y_{t-1})$

where

(26)  $p(y_t \mid \theta, q, Y_{t-1}) = \sum_{h=1}^{H} p(y_t \mid \theta, q, s_t = h, Y_{t-1})\, p(s_t = h \mid \theta, q, Y_{t-1}).$

To evaluate (26) note that, since (20) implies that $y_t$ has conditional covariance $A(s_t)^{-1\prime}\, \Xi(s_t)^{-2} A(s_t)^{-1}$,

(27)  $p(y_t \mid \theta, q, s_t, Y_{t-1}) = (2\pi)^{-n/2} \left| \det\left( A(s_t)^{-1\prime}\, \Xi(s_t)^{-2} A(s_t)^{-1} \right) \right|^{-1/2} \times \exp\left\{ -\frac{1}{2} \left( y_t' A(s_t) - x_t' F(s_t) \right) \Xi(s_t)^{2} \left( y_t' A(s_t) - x_t' F(s_t) \right)' \right\},$

and one can evaluate $p(s_t \mid \theta, q, Y_{t-1})$ using the filtering algorithms in Sims et al. (2008).

The probability model for the data described in (20)-(23) belongs to the class of models considered in Sims et al. (2008) and matches the general form of their

¹⁴ For example, one could restrict the regime-switching so that $A(1)$ and $A(2)$ differ by only their last column, which is one specification considered in both Sims and Zha (2006) and Sims et al. (2008).
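The likelihood recursion in (25)-(27) can be sketched as a standard forward filter over the regimes. This is an illustrative sketch, not the filtering algorithm of Sims et al. (2008): the function names, the array layout, and in particular the uniform initial regime distribution are assumptions. The regime-conditional density uses the covariance $A^{-1\prime}\Xi^{-2}A^{-1}$ implied by (20), for which $-\frac{1}{2}\ln\det(\cdot) = \ln|\det A| + \sum_j \ln \xi_j$.

```python
import numpy as np

def regime_logdens(y, x, A, F, xi):
    """Log of the regime-conditional Gaussian density of y_t implied by (20)."""
    n = y.shape[0]
    u = y @ A - x @ F                                 # structural residual y'A - x'F
    logdet = np.linalg.slogdet(A)[1] + np.sum(np.log(xi))
    return -0.5 * n * np.log(2.0 * np.pi) + logdet - 0.5 * np.sum((xi * u) ** 2)

def msvar_loglik(Y, X, A, F, xi, Q):
    """Log-likelihood (25) via the mixture in (26).

    Y: (T, n), X: (T, k), A: (H, n, n), F: (H, k, n), xi: (H, n),
    Q: (H, H) with Q[i, j] = P(s_t = i | s_{t-1} = j).
    """
    T, H = Y.shape[0], Q.shape[0]
    filt = np.full(H, 1.0 / H)       # assumption: uniform initial regime probabilities
    loglik = 0.0
    for t in range(T):
        pred = Q @ filt              # p(s_t | Y_{t-1})
        logd = np.array([regime_logdens(Y[t], X[t], A[h], F[h], xi[h])
                         for h in range(H)])
        m = logd.max()               # subtract the max for numerical stability
        w = pred * np.exp(logd - m)
        lik = w.sum()                # (26), up to the factor exp(m)
        loglik += m + np.log(lik)
        filt = w / lik               # p(s_t | Y_t)
    return loglik
```

With a single regime the recursion collapses to the Gaussian log-likelihood of a constant-parameter VAR, which gives a simple check on an implementation.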

empirical application. Each MS-VAR we estimate uses the same lag length (5 quarters) and exactly the same data set as Sims et al. (2008), which we also used in Section 3. We also follow Sims et al. (2008) in assuming that $\theta_1 = \{A, F\}$ and $\theta_2 = \{\Xi\}$ follow independent regime-switching processes. Our only point of departure from Sims et al. (2008) is that we do not restrict the parameters multiplying variable $i$ in equation $j$ at each lag $l$ to change only proportionally across regimes. We find that not imposing those restrictions allows the model to achieve superior data fit.

Since $\{A, F\}$ determine the conditional mean of $y_t$ and $\{\Xi\}$ determines the volatility of the structural shocks, we refer to the state of $\{A, F\}$ at time $t$ as $s_t^m$ and the state of $\Xi$ at time $t$ as $s_t^v$. Denote the number of regimes for $\{A, F\}$ as $H_m$ and the number of regimes for $\Xi$ as $H_v$. If a model has $H_m = 2$ and $H_v = 3$, then we refer to it using the shorthand 2m3v. Since we assume that the two processes evolve independently, a 2m3v model has 6 joint states.

We estimate MS-VAR models under a variety of choices for $H_m$ and $H_v$. For each choice of $H$, we estimate the model under three different priors for $(A, F)$, which we now describe.

4.2 Priors for MS-VAR Coefficients

For each MS-VAR the priors on $\{(A(h_m), F(h_m))\}_{h_m=1}^{H_m}$ and $\{\Xi(h_v)\}_{h_v=1}^{H_v}$ are independent and identical across $h_m$ and $h_v$ respectively.

1. SZ Prior. This is the prior originally described in Sims and Zha (1998) and used in Sims et al. (2008). For each state $h_m$, the prior takes the form

(28)  $a(h_m) \sim \mathcal{N}(0, I_n \otimes H_0)$

(29)  $f(h_m) \mid a(h_m) \sim \mathcal{N}(vec(\bar{S} A(h_m)), I_n \otimes H_+),$

where

(30)  $a(h_m) = vec(A(h_m)), \quad f(h_m) = vec(F(h_m)), \quad \bar{S} = \begin{bmatrix} I_n \\ 0_{(n(p-1)+1) \times n} \end{bmatrix}$

and $H_0$, $H_+$ are prior parameters.¹⁵ In practice, the prior is implemented with dummy observations as described in Sims and Zha (1998). The dummy observations depend on a few moments constructed from the data, $\bar{y}$ and $\bar{s}$, and a vector of hyperparameters, $\Lambda$, that controls the influence of different subsets of the dummy observations. The standard implementation sets $\bar{y}$ as the mean of the observations used to initialize the lags of the VAR and $\bar{s}$ as the standard deviations of the residuals from univariate autoregressions for each data series, both of which we follow here.¹⁶ For this prior we set $\Lambda$ identically to Sims et al. (2008) at $\lambda_0 = 1.0$, $\lambda_1 = 1.0$, $\lambda_2 = 1.0$, $\lambda_3 = 1.2$, $\lambda_4 = 0.1$, $\mu_5 = 1.0$, and $\mu_6 = 1.0$ and we refer to this set of values as $\Lambda_{SWZ}$.

2. Reduced-Form-Based (RFB) Prior. We take up the suggestion of Sims and Zha (1998) and derive a prior for $(A, F)$ by placing a prior distribution over the reduced-form dynamics, summarized by $\Phi$ and $\Sigma$, then mapping to $(A(h_m), F(h_m))$ via (15) in Section 3.¹⁷ Appendix B.2 gives the exact expressions for the density of the RFB prior.

Discussion. We pause here to describe the key features of, and relationship between, the SZ and RFB priors. A key aspect of the SZ prior is its centering of $A$ at 0. Sims and Zha (1998) note that the prior for $A$ in (28) is equivalent to what one would derive from inverse-Wishart beliefs about $\Sigma = (AA')^{-1}$ with $n+1$ degrees of freedom while also ignoring the Jacobian term for the transformation from $A \to \Sigma$. With appropriate choices of hyperparameters, the RFB prior for $A$ differs from the SWZ prior only in that it includes the Jacobian, which serves to recenter beliefs about $A$ away from 0. One can see these differences in the prior $A$ densities clearly in Figure 1.¹⁸ Since the VAR's forecast errors have covariances $(AA')^{-1}$, centering beliefs about $A$ at 0 amounts to centering beliefs about the VAR's

¹⁵ As pointed out in Rubio-Ramírez et al. (2010), the SZ prior has the desirable property that its density is invariant with respect to orthogonal rotations of $(A, F)$.
¹⁶ As has been common since Litterman (1986), we use six lags in the univariate autoregressions from which we estimate $\bar{s}$.
¹⁷ Sims and Zha (1998) state that, “A better procedure, which, however, would not have been very different in practice, would have been to derive our prior on $A_0$ from a natural prior on $\Sigma^{-1}$, the Wishart.”
¹⁸ The values in the figure have been “sign normalized” to positive values.

forecast errors at $\infty$. In Appendix B.2.2 we derive that $p_{RFB}(F \mid A) = p_{SZ}(F \mid A)$ and thus the entirety of the difference between the SZ and RFB priors derives from their differences for $p(A)$. Thus the RFB prior differs from the SZ prior with respect to $p(A)$, while maintaining the choices for $\Lambda$ in Sims et al. (2008) mentioned above.

Though an undesirable property, the SZ prior's mode of $A = 0$ has little effect on the posterior of constant-parameter VARs. Sims and Zha (1998) note that the sample size in typical macro applications is large enough that this will typically be the case: the Jacobian term consists of

(31)  $|J(g(A), A)| = 2^n \prod_{j=1}^{n} a_{jj}^{j},$

while the likelihood contains $\prod_{i=1}^{n} a_{ii}^{T}$, hence ignoring the Jacobian will have little effect on posterior estimates as long as $T$ is “considerably larger than” $n$. However, effective sample sizes informing regime-specific parameters of MS-VARs may well be small enough that the omitted Jacobian term has a substantial effect on inference. The basic logic is simple (and hardly new): prior beliefs have a larger effect on posteriors when sample sizes are smaller. Thus undesirable features of a prior typically employed for models informed by large sample sizes can distort inference when reemployed for models with “smaller” sample sizes.

3. Reduced-Form-Based Hierarchical (RFB-Hier.) Prior. It is known in the literature that VAR posteriors can be sensitive to the choice of hyperparameters, $\Lambda$. However, it is not obvious to us what type of, or how much, shrinkage one should impose in MS-VARs. Should we increase shrinkage to restrict the “size” of the large parameter space inherent in MS-VARs? Or should we decrease the standard types of shrinkage to let the parameters of different, possibly highly transitory, regimes take on values that one might consider unreasonable in constant-parameter VARs?

For these reasons we follow the approach of Giannone, Lenza, and Primiceri (2015) and form a hierarchical model in which we put priors over some elements of $\Lambda$, as well as $\bar{y}$ and $\bar{s}$, treating them as an additional vector of parameters

to estimate.¹⁹ Following Giannone et al. (2015), we estimate $\lambda_0$, $\mu_5$, $\mu_6$, and $\bar{s}$. We also estimate $\lambda_4$ and $\bar{y}$ since it seems reasonable to us that the MS-VAR might favor additional flexibility for the constant term (controlled by $\lambda_4$) and the average level of variables (controlled by $\bar{y}$).²⁰ When estimating $\lambda_0$, $\lambda_4$, $\mu_5$, and $\mu_6$, we give each parameter a prior from the Gamma distribution with a mode at the value used in the SZ and RFB priors and a standard deviation of one. For $\{\bar{y}_j\}_{j=1}^{n}$, we set identical and independent Normal distributions centered at 0 with a standard deviation of 0.1. For $\{\bar{s}_j\}_{j=1}^{n}$, we use relatively diffuse independent Inverse Gamma distributions as in Giannone et al. (2015).

The hierarchical approach gives us 10 additional parameters to estimate. However, when using the SMC algorithm, estimating $(\Lambda, \bar{y}, \bar{s})$ does not introduce any additional complications. To be sure, the efficacy of the sampler will diminish slightly because of the increased dimensionality of the parameter space, but estimation proceeds without any modification of the algorithm. Were we to use MCMC methods to estimate the model with the RFB-Hier. prior, we would have to include an additional Metropolis-Hastings step in our sampler, as there are no natural conditionally conjugate relationships for all of the hyperparameters. Moreover, estimating these parameters would deteriorate the performance of the algorithm, given the obvious relationships between $\Lambda$ and $(A, F)$ and the fact that the posteriors of some of the hyperparameters (in particular $\bar{y}$) are nonstandard.²¹

Priors on Other Parameters. The priors for all MS-VARs we consider share common specifications for the volatilities and transition probabilities. For the volatilities, $p(\xi_j(h_v))$ are independent and identically distributed across $j$ and $h_v$ such that

(32)  $\xi_j^2(h_v) \sim \mathcal{G}(\bar{\alpha}_j, \bar{\beta}_j)$

¹⁹ Sims and Zha (1998) also suggest the potential to form a hierarchical prior by putting prior beliefs over $\Lambda$, stating that, “In principle, these hyperparameters can be estimated or integrated out in a hierarchical framework.”
²⁰ We follow Giannone et al. (2015) and fix the other parameters.
²¹ See the discussion in Herbst and Schorfheide (2015) on parameter blocking in Metropolis-Hastings algorithms.

FIGURE 1. Prior Densities for $A_{22}$
[Figure: prior density curves for the SWZ and RFB priors; horizontal axis from 0 to 700, vertical axis "Density" from 0.000 to 0.007.]
Notes: The figure shows the prior distribution for the parameter $A_{22}$ under the SWZ and RFB priors.

and we set $\bar{\alpha}_j = 1$ and $\bar{\beta}_j = 1$ for all $j$ and $k$, as in Sims et al. (2008). Additionally, we normalize the first state of volatilities so that $\xi_j(1) = 1$ for all $j$.

Priors over the transition probabilities $q_{ij}$ for both the mean and shock regimes are of the unrestricted Dirichlet form from Sims et al. (2008). For an $n$ state process $i$, this distribution is parameterized by $n$ hyperparameters, $\{\alpha_{ij}\}_{j=1}^{n}$, which SWZ suggest eliciting by introspection about the persistence of each regime. For every specification (regardless of the number of regimes), we set

$\alpha_{i,j} = 5.667,\ i = j \quad \text{and} \quad \alpha_{i,j} = 1,\ i \neq j.$

For a two state process, this implies an average duration of a given regime of about 6.5 quarters. As the number of states increases, this expected length decreases.

4.3 Estimation Details

Under each prior, we estimate MS-VARs for $H_m = 1, 2$ and $H_v = 1, \ldots, 5$ using the SMC algorithm described in Section 2. We set $N_{part} = 4000$, $N_{blocks} = 12$ (random) and $M = 1$, using the conditional variance given by the normal

approximation for the mutation step. For the tempering schedule we set $\lambda = 4$ and $N_{\phi} = 2000$.

We have run our SMC sampler in both Fortran and MATLAB. In Fortran, estimation of a given model takes between one and ten minutes, with likelihood evaluations parallelized across the 12 cores of a desktop with an Intel Xeon x5670 CPU. The MATLAB version executing on the same machine takes roughly between twenty minutes and six hours, depending on the number of states.²²

For each specification we estimate the model with 50 independent runs of the algorithm and report both the point estimate and standard error of the model's log(MDD). Owing to the difficulty of MCMC estimation of MS-VARs, previous researchers have been able to report standard errors only from different subsets of draws along a single MCMC chain.²³ Since we initialize each run of SMC from an independent draw of initial particles, there is no risk of our standard error estimates being spuriously small because of influential initial conditions. Hence, we interpret the precision of our estimates as reflecting accuracy.

4.4 Estimation Results: MS-VAR Model Selection

Table III shows the point estimates and associated standard errors of log(MDD) values of MS-VARs, including the constant-parameter VAR (the special case of 1m1v), for each of the three priors. Figure 2 shows the results graphically. From our estimation results we deduce four main findings.

Firstly, and consistent with the key findings of Sims and Zha (2006) and Sims et al. (2008), the best fitting model for each prior is a 1m3v or 1m4v model and, furthermore, regime switching in shock variances is critical to fitting the data. Indeed, the worst fitting

²² In principle, we could also simulate from the posteriors using the sampler proposed by Sims et al. (2008) for the SWZ prior and modify the Metropolis-within-Gibbs steps of the sampler to accommodate the RFB prior. However, like the other researchers we quoted in Section 1, we have found the MCMC estimation process cumbersome and lengthy. Experimentation across models indicated difficulties with reliably finding the posterior mode, making the batch estimation exercise tedious. On a subset of models with the SWZ prior, which we successfully and repeatedly sampled using MCMC, the SMC and MCMC posteriors more-or-less coincided. The SMC posteriors were slightly wider than the MCMC ones, which generally indicates a more thorough posterior exploration.
²³ For MCMC estimation, Gelman and Rubin (1992) emphasize the importance of using multiple independent chains, with each chain initialized from a different starting value.

TABLE III
SMC ESTIMATES OF ln(MDD) FOR MS-VAR MODELS.

                 Prior: SWZ            Prior: RFB            Prior: RFB-Hier.
Model      ln(MDD)   (S.E.)      ln(MDD)   (S.E.)      ln(MDD)   (S.E.)
1m 1v      1759.10   (0.07)      1754.77   (0.08)      1778.15   (0.78)
1m 2v      1869.51   (0.09)      1873.24   (0.13)      1877.93   (0.71)
1m 3v      1872.64   (0.11)      1877.83   (0.18)      1880.92   (0.81)
1m 4v      1872.57   (0.12)      1879.17   (0.14)      1880.07   (1.07)
1m 5v      1871.27   (0.16)      1878.03   (0.15)      1878.82   (1.29)
2m 1v      1845.23   (1.45)      1836.78   (2.84)      1857.68   (2.73)
2m 2v      1867.48   (0.33)      1873.70   (0.55)      1879.94   (0.74)
2m 3v      1869.98   (0.45)      1877.34   (0.55)      1880.32   (0.93)
2m 4v      1869.55   (0.27)      1878.22   (0.47)      1879.58   (1.17)
2m 5v      1868.26   (0.45)      1876.83   (0.43)      1877.49   (1.55)

Notes: ln(MDD) estimates are means from 50 independent runs of the algorithm for each model. We give standard errors of the log(MDD) estimates, computed across the 50 runs, in parentheses. The SMC algorithm hyperparameters are $N_{part} = 4000$, $\lambda = 4$, $N_{blocks} = 12$, $N_{\phi} = 2000$, and $M = 1$.

regime-switching specification is always the 2m1v model by a large margin.

Secondly, changing the prior from SWZ to RFB to RFB-Hier increases the MDD of all regime-switching models (other than 2m1v). We take this to mean that the two variants of the RFB prior are first and foremost favored by the data rather than any particular specification. The MDD improvements are large for any particular specification, in many cases exceeding 10 log points.²⁴

Thirdly, in addition to improving data fit for all models, changing the prior from SWZ to RFB to RFB-Hier dramatically increases the posterior probability of a 2m specification being the correct model.²⁵ Table IV shows the posterior probability on 2m specifications conditional on each prior, along with the unconditional posterior probability on all specifications for each prior. Under the SWZ prior,

²⁴ Note that, with equal prior odds on two models, a 10 point difference in their log(MDD)s puts the posterior odds in favor of the better fitting model above 20,000 to 1.
²⁵ In calculations of posterior probabilities, we assume all models are a priori equiprobable.
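The posterior odds and model probabilities referred to in footnotes 24 and 25 follow mechanically from the ln(MDD) values in Table III once all models are treated as a priori equiprobable. The sketch below illustrates that arithmetic; the dictionary layout and function names are hypothetical, and the inputs are the point estimates transcribed from Table III (ignoring their standard errors). Under these assumptions the results should round to the entries reported in Table IV.

```python
import numpy as np

# ln(MDD) point estimates transcribed from Table III, ordered 1v, ..., 5v.
LOG_MDD = {
    "SWZ": {"1m": [1759.10, 1869.51, 1872.64, 1872.57, 1871.27],
            "2m": [1845.23, 1867.48, 1869.98, 1869.55, 1868.26]},
    "RFB": {"1m": [1754.77, 1873.24, 1877.83, 1879.17, 1878.03],
            "2m": [1836.78, 1873.70, 1877.34, 1878.22, 1876.83]},
    "RFB-Hier": {"1m": [1778.15, 1877.93, 1880.92, 1880.07, 1878.82],
                 "2m": [1857.68, 1879.94, 1880.32, 1879.58, 1877.49]},
}

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    v = np.asarray(values, dtype=float)
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def prob_2m(table):
    """P(2m | prior, Y) with equal prior probability on every specification."""
    log_1m, log_2m = logsumexp(table["1m"]), logsumexp(table["2m"])
    return float(np.exp(log_2m - logsumexp([log_1m, log_2m])))

def prob_prior(tables):
    """P(prior | Y), treating all (prior, specification) pairs as equiprobable."""
    totals = {name: logsumexp(t["1m"] + t["2m"]) for name, t in tables.items()}
    norm = logsumexp(list(totals.values()))
    return {name: float(np.exp(v - norm)) for name, v in totals.items()}

# Footnote 24: a 10 log point gap with equal prior odds.
posterior_odds = float(np.exp(10.0))   # roughly 22,000 to 1

p2m = {name: prob_2m(t) for name, t in LOG_MDD.items()}
pprior = prob_prior(LOG_MDD)
```

Working in logs and subtracting the maximum before exponentiating matters here, since exponentiating values near 1,880 directly would overflow double precision.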

FIGURE 2. log(MDD) estimates for each MS-VAR specification and prior.
[Figure: box plots; horizontal axis "model" with categories m1v2, m1v3, m1v4, m1v5, m2v2, m2v3, m2v4, m2v5; vertical axis "mdd" from 1865 to 1885.]
Notes: The figure shows box plots for log MDD of each specification and prior, computed from 50 independent runs of the SMC algorithm for each specification-prior combination. We omit the 1m1v and 2m1v because they are the worst fitting models by wide margins.

there is negligible probability (0.06) on changes in the mean parameters; that is, the 2m models. Under the RFB prior, in which we have only taken into account the Jacobian term and the IW degrees of freedom, the probability increases to 0.29. Under the RFB-Hier. prior the probability increases further to 0.43, nearly a coin flip with the only-variances-change explanation. This finding contrasts with the results in Sims and Zha (2006) and Sims et al. (2008), who find a landslide victory (10 log points in Sims et al. (2008)) for the only-variances-change specification. Recall that a key difference between our model and the models in Sims and Zha (2006) and Sims et al. (2008) is that we do not impose the additional restriction of only proportional switching across the coefficients multiplying variable $i$ in equation $j$. Sims et al. (2008) express the concern that allowing all parameters to

TABLE IV
POSTERIOR PROBABILITY OF 2M MODEL CONDITIONAL ON PRIOR.

Prior                $P(2m \mid Prior, Y)$    $P(Prior \mid Y)$
SWZ                  0.06                     0.00
RFB                  0.29                     0.12
RFB-Hierarchical     0.43                     0.88

Notes: The second column gives the posterior probability of the 2m models, conditional on the prior. The third column gives the posterior probability on all models estimated with a particular prior.

change would over-parameterize the model and such models would be heavily penalized for their complexity in the MDD calculation. Our results show that these fears are unwarranted.²⁶

Fourthly, across all priors and model specifications, Markov-switching parameters offer large gains in model fit compared to constant-parameter specifications, as was also found in Sims and Zha (2006). For all models with at least 2 volatility regimes, the MDD gains exceed a staggering 100 log points.

4.5 Estimation Results: Examining the 2m3v Model

4.5.1 Hyperparameter Posteriors

Figure 3 shows the priors and posteriors for the estimated elements of $\Lambda$ under both 1m3v and 2m3v specifications and Appendix E contains tables summarizing the posteriors of all estimated hyperparameters, for all specifications. The most important feature of the estimated hyperparameter posteriors is the difference of the $\mu_5$ posteriors (lower left panel of Figure 3) under the 1m3v and 2m3v specifications. The hyperparameter $\mu_5$ controls shrinkage towards the “sums-of-coefficients” dummy observations. In particular, the 2m3v model wants virtually no influence for these observations. This represents a dimension in which the model strongly favors weaker prior restrictions to make the best use of the

²⁶ Some researchers also find the proportionality restrictions undesirable on theoretical grounds. As is well-known, and was pointed out particularly starkly in Benati and Surico (2009), one would expect all coefficients of the VAR representation of a DSGE model to change if one changes the DSGE model's policy rule parameters.

FIGURE 3. Posterior of $\Lambda$
[Figure: four panels showing the prior density and the posterior densities under the 1m3v and 2m3v specifications for $\lambda_0$, $\lambda_4$, $\mu_5$, and $\mu_6$; vertical axes "Density".]

infrequently occurring additional conditional mean regime.

4.5.2 Conditional Mean Regimes

Rather than describing the time-series of regime probabilities conditional on an estimate of the model's posterior mode, we show in Figure 4 the time-series of regime probabilities for each of the 4000 particles from a single run of the SMC algorithm for the 2m3v model. In particular, the figure shows the time-series data used in estimation (scale on left axis) together with the posterior probabilities of estimated regimes (conditional means in top panel, shock variances in bottom panel). For example, if we look at the year 1990 in the top panel and see that the figure's background is uniformly white from top to bottom, then that means that virtually all 4000 particles are in agreement about the regime probability at that date.

From the top panel of Figure 4 one can see a substantial amount of disagreement across particles about the timing of regime occurrences. In particular, the posterior contains two modes which support alternative interpretations of the

$h_m = 1$ regime. To make the bimodality more visually apparent, we sorted the particles in Figure 4 in ascending order from the top according to the average probability of $h_m = 2$ over the 2 years of observations from 1965:Q1 to 1966:Q4. Near the top of the figure, one can see a set of particles favoring an $h_m = 1$ occurrence in 1965-1966, some of which favor a recurrence in the late 1990s. These same particles put less probability on $h_m = 1$ in the early 1980s than do the particles near the bottom of the panel. Documenting this relationship formally, there is a negative correlation of -0.34 between the average probability of $h_m = 1$ over 1965-1966 and the average probability of $h_m = 1$ over 1980-1981, thus revealing a substantial amount of multimodality.²⁷

The key macroeconomic feature of the first mode is periods of rapid economic growth with little inflation and thus little movement in the nominal interest rate. One might interpret the parameter value corresponding to this mode as representing periods of a particularly flat Phillips Curve. Other time-series investigations have uncovered nonlinearities and/or time-variation in the Phillips Curve that mesh well with this type of time-variation in economic dynamics. Stock and Watson (2010) and many references therein document a nonlinear relationship between the traditional gap measures and inflation, wherein the strongest Phillips Curve relationship occurs in recessions. Their finding is roughly consistent with the interpretation of the parameter values generating this mode: that the relationship between inflation and economic slack deteriorates during (some) periods of quickly diminishing slack.

One can also examine how a fixed coefficient structural general equilibrium model fits the economic dynamics during the $h_m = 1$ period. That these periods might represent a structural change in economic dynamics is, in a sense, visible

²⁷ When we describe the features of a particular regime's posterior, there is an issue about which of a given particle's parameter values represent which regime. In the statistics literature on mixture models, this is a well-known annoyance referred to as the “label switching problem.” We refer readers interested in the issue to Jasra, Holmes, and Stephens (2005)'s excellent description and survey of solutions. Since one can always relabel regimes arbitrarily, a model with 2 regimes necessarily has 2 symmetric modes. The statistics literature on mixture models uses the term “genuine multimodality” to refer to multimodality in the posterior that exists even after normalizing draws around one of the inherently-symmetric modes. Appendix C contains the details on our handling of normalization and relabeling for the MS-VAR.

from the historical decompositions implied by the NK-DSGE model of Smets and Wouters (2007). The Smets and Wouters (2007) model interprets the second half of the 1990s as a period in which the joint dynamics of output growth and inflation are caused by a sequence of similarly sized negative “mark-up” shocks occurring for more than 5 years in a row. The “mark-up” shocks in the Smets and Wouters (2007) model function largely as a time-varying slope to the Phillips Curve. The persistence of the necessary mark-up shocks suggests a dimension of model misspecification.

The key macroeconomic feature of the second mode (the particles whose time-series of probabilities are nearer the bottom of Figure 4's top panel) is an increased responsiveness of the nominal interest rate to changes in inflation. To document this formally we calculate the impulse response of the nominal interest rate to a one standard deviation sized inflation shock (the second shock in the structural system) on impact under each regime, conditional on the posterior draw belonging to the region around the second mode. Figure 5 shows density estimates of these two responses under each regime. While the IRFs under the second regime are not as sharply identified as those of the first, the greater probability of a larger response under regime 1 is still apparent in the figure.

4.5.3 Shock Variance Regimes

Turning to the shock volatility regimes, shown in the second panel of Figure 4, one can see that a single regime prevailed from the mid-1980s to the end of the sample and that same regime occurs in the late 1960s. For most posterior draws this regime has the lowest variance for all three structural shocks.

Not surprisingly, the regime with the largest shock standard deviations occurs during the mid-1970s and early 1980s, similar to the 1m models. Echoing the main result of Sims and Zha (2006) and Sims et al. (2008), our model interprets the Great Moderation as a once-and-for-all decrease in shock volatilities, in line with a “good luck” explanation. The same “good luck” regime also prevailed during the late 1960s.

5. Conclusion

Led by Sims and Zha (2006) and Sims et al. (2008), MS-VARs have played a prominent role in the debate over whether or not any structural change to US macroeconomic dynamics has occurred in the last 50 years. In this paper we have shown that some small tweaks to recently-developed SMC algorithms allow us to apply them to MS-VAR estimation. SMC delivers fast, reliable characterization of posteriors and dramatically broadens the space of tractable priors. We use the ease of SMC implementation under alternative priors to show that, relative to the conclusions of Sims et al. (2008), the use of an off-the-shelf prior typically applied to reduced-form VARs improves data fit and substantially alters posterior beliefs about changes to economic dynamics. When using the hierarchical reduced-form-based prior we find a 43% chance that the true model features changing macroeconomic dynamics either in the form of a periodically flattening Phillips Curve or increased responsiveness of the monetary authority to inflation shocks.

The results in our paper suggest that the choice of priors deserves careful attention when working with densely-parameterized models, such as MS-VARs. It may well be the case that appropriate priors for such models require us to depart from previous methods that were chosen for either analytical or computational tractability. Whether or not such departures are necessary is an empirical question, but this paper shows that it is a question whose answer will most likely be found by using SMC methods.

FIGURE 4. Observables and Regime Probabilities
[Figure: two panels. Top panel, "Data and $P(s_t^m \mid data)$": the data series $ygap_t$, $\pi_t$, and $R_t$ (left axis, percentage points, -10 to 20) plotted against date (1965 to 2005), with $P(s_t^m = 2)$ shaded on a 0.0 to 1.0 scale for draws 1 to 4000 from $p(\theta \mid Y)$. Bottom panel, "Data and $P(s_t^v \mid data)$": the same layout with $P(s_t^v = 1)$.]
Notes: The figure shows the time-series data used in estimation (left axis) together with the posterior probabilities for the conditional mean regimes (top panel) and volatility regimes (bottom panel), for each of 4000 particles from a single run of SMC for the 2m3v model. Top panel particles are sorted according to probability $h_m = 1$ averaged over 1965-1966.
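The sorting and correlation diagnostic used for the top panel of Figure 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `prob` is a hypothetical array holding each particle's $P(s_t^m = 1)$ path with shape (particles, dates), and quarterly dates are encoded as decimal years (1965:Q1 as 1965.0, 1966:Q4 as 1966.75).

```python
import numpy as np

def window_average(prob, dates, start, end):
    """Average each particle's regime probability over dates in [start, end]."""
    mask = (dates >= start) & (dates <= end)
    return prob[:, mask].mean(axis=1)

def sort_and_correlate(prob, dates):
    """Order particles by their 1965-1966 average and correlate with 1980-1981."""
    early = window_average(prob, dates, 1965.0, 1966.75)   # 1965:Q1 to 1966:Q4
    later = window_average(prob, dates, 1980.0, 1981.75)   # 1980:Q1 to 1981:Q4
    # With two mean regimes, ascending P(h_m = 2) equals descending P(h_m = 1).
    order = np.argsort(-early)
    corr = np.corrcoef(early, later)[0, 1]
    return order, corr
```

A clearly negative correlation between the two window averages, as the text reports, indicates that particles tend to assign the regime to one episode or the other rather than to both.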

FIGURE 5. Density Estimates for IRF of $R_t$ to $\varepsilon_t^{\pi}$.
[Figure: panel titled "mode 2: R response on impact to inflation shock"; density curves for regime 1 and regime 2; horizontal axis "percentage point response" from -1 to 2; vertical axis "density" from 0 to 8.]
Notes: Figure shows the density estimates for the impact response of the interest rate to a one standard deviation inflation shock for each regime, conditional on being on the second mode (described in text).

A. Additional Computational Results

A.1 Assessing the Importance of Tuning Parameters

The researcher applying SMC faces a few key questions about SMC's use in practice. How should one choose $n_{\phi}$ (or $\lambda$)? How many particles should one use? How many parameter blocks? While theoretical results on the optimal choice of the algorithm's tuning parameters are beyond the scope of this paper, in this section we exploit the relative transparency of VARs to move beyond the suggestions of Herbst and Schorfheide (2014) and find well-performing choices for tuning parameters.

To assess the importance of each of the algorithm parameters, we vary each component while holding the rest of the hyperparameters at the baseline case. This gives a rough “partial derivative” of each parameter's contribution to the effectiveness of the algorithm. In particular, we 1) consider the use of the proposal distribution for the mutation steps as described in Herbst and Schorfheide (2014), 2) vary the number of particles to 1000 and 5000, 3) vary the number of blocks and the mechanism for selecting them, 4) assess the trade-off between the number of bridge distributions and intermediate Metropolis-Hastings steps while keeping the number of likelihood evaluations fixed by setting $(N_{\phi}, M) = (50, 10)$, and 5) vary the $\phi$ schedule by testing $\lambda = 1, 7$. We run 20 Monte Carlo replications of the sampler for each configuration of hyperparameters and examine the distribution of the estimates of $\ln p(Y)$.

Table A-1 shows the results of our Monte Carlo exercise. Each row after the first describes a deviation from the baseline tuning parameters and shows the estimation performance of the algorithm under that parameterization. Under the baseline setting, the sampler using the structural parameterization is slightly more accurate. The primary reason for this is that, under the reduced-form parameterization, the RWMH is restricted to draws which satisfy a positive definiteness condition for $\Sigma$. When a draw does not have this property, it is rejected, reducing the efficiency of the MH algorithm and hence the size of movements in the parameter space. The structural parameterization operates on the Cholesky decomposition of $\Sigma$, thus negating the problem of drawing inadmissible parameterizations and allowing

for more effective moves.[28]

TABLE A-1
SMC ESTIMATES OF $\ln p(Y)$ FOR VAR: EFFECTS OF ALGORITHM TUNING PARAMETERS

SMC Tuning Parameters                                                   VAR Parameterization
$\Sigma_{prop}$  $N_{part}$  $N_{blocks}$  Blocking         $N_\phi$  $M$  $\lambda$   Reduced Form RMSE   Structural RMSE
Cond             2000        3             Random           500       1    4           0.29                0.21
Un               -           -             -                -         -    -           1.37                1.90
-                1000        -             -                -         -    -           0.39                0.47
-                5000        -             -                -         -    -           0.19                0.11
-                -           1             -                -         -    -           0.61                0.50
-                -           2             -                -         -    -           0.39                0.38
-                -           2             ($\Phi,\Sigma$)  -         -    -           0.44                3.95
-                -           3             Row              -         -    -           0.26                0.75
-                -           4             (Row,$\Sigma$)   -         -    -           0.18                1.51
-                -           -             -                50        10   -           0.43                1.33
-                -           -             -                -         -    1           1.87                4.02
-                -           -             -                -         -    7           0.41                0.37

Notes: The symbol "-" indicates inheritance of the parameter value from the baseline parameterization given in the first line of the table. RMSE is the root mean squared error of the estimates of $\ln p(Y)$. VAR has 3 variables, 3 lags, and a constant term. The true value of the ln(MDD) is 1791.9.

[28] Since our identification scheme is the Cholesky decomposition, negative elements along the diagonal should technically have zero density. However, our prior density does not actually rule out these values and thus treats the sign of a column of $A$ and $F$ as simply a normalization.

The first set of deviations we consider, line two in Table A-1, shows what happens when (9) is used as the RWMH proposal variance rather than the conditional approximation, given by (8). This variation of the sampler most closely resembles the one used for DSGE models by Herbst and Schorfheide (2014). Using the unconditional variance estimate in the block RWMH leads to substantial deterioration in performance of the sampler. While the average log marginal data density still reliably estimates the true value, the standard deviation of the log

MDD estimate across the twenty simulations has increased markedly: relative to the baseline algorithm the RMSE is about five times larger for the reduced-form parameterization and almost ten times larger for the structural. One reason for this is that the VAR prior exhibits substantial correlation among key parameters. When this is not accounted for, the sampler performs very poorly in the key early stages when the prior dominates the likelihood contribution. To contextualize the efficiency gains from our modification of the Herbst and Schorfheide (2014) proposal variance, we find that the gains in accuracy from using the conditional approximation are significantly greater than the gains from doubling the number of particles (or even moving from 1000 to 5000 particles).

The second set of deviations we consider, rows 3 and 4 of Table A-1, shows the effects of changing the quantity of particles. As one would expect, RMSEs fall as the number of particles increases, roughly in line with the central limit theorems in the previously mentioned literature.

The third set of deviations we consider, rows 5 through 9 of Table A-1, examines the role of the blocking configurations of the parameter vector during the mutation phase. First, we consider using a single block for all parameters and we can see that failing to break the parameters into smaller blocks yields RMSEs twice as large as our baseline configuration. Second, we allow for two blocks instead of the baseline number, three. These two blocks are chosen either randomly or by dividing the parameter vector in a "natural way," with one block for $\Phi$ and another for $\Sigma$. We also allow for a three block fixed scheme where the parameters are grouped by the row in which they enter this VAR. For the samplers using the reduced form parameterization, the effect of blocking is generally smaller. Reducing the number of blocks to 2, but maintaining the random assignment of parameters into blocks each stage, results in an increase in the RMSE to 0.39, relative to the baseline of 0.29, which has three blocks.
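The random assignment of parameters into blocks at each stage can be illustrated with a short sketch. This is only a sketch under stated assumptions, not the authors' implementation: the helper name `random_blocks`, the near-equal block sizes, and the use of Python's `random` module are all hypothetical choices made for illustration. The parameter count follows the VAR in Table A-1 (3 variables, 3 lags, and a constant), counting the free elements of $\Phi$ and $\Sigma$.

```python
import random


def random_blocks(n_params, n_blocks, rng):
    """Randomly partition parameter indices 0..n_params-1 into n_blocks blocks.

    Hypothetical helper: block sizes are kept as equal as possible, which is
    an assumption; the text only says that assignment is random at each stage.
    """
    indices = list(range(n_params))
    rng.shuffle(indices)  # a fresh random ordering on every call
    # Deal the shuffled indices round-robin so block sizes differ by at most one.
    return [sorted(indices[b::n_blocks]) for b in range(n_blocks)]


rng = random.Random(0)
# 3 equations x (3 lags x 3 variables + 1 constant) = 30 elements of Phi,
# plus 3 * 4 / 2 = 6 free elements of the symmetric Sigma.
n_params = 30 + 6
for stage in range(2):
    blocks = random_blocks(n_params, n_blocks=3, rng=rng)
    print(stage, [len(block) for block in blocks], blocks[0][:5])
```

A fixed scheme such as the "natural" partition would instead return the same index sets at every stage, which is the contrast the table rows on blocking explore.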
Removing the randomization every stage and partitioning the parameter in the "natural way": $[\Phi, \Sigma]$, results in a modest increase in the RMSE. For the sampler using the structural parameterization, the quality of the marginal data density estimate deteriorates much more when using a fixed block scheme. Under the natural partitioning of $\theta$ into $\Phi$ and $\Sigma$, the RMSE of the log marginal data density

is 3.95, more than ten times the size when randomizing the blocks.

The fourth type of deviation we consider concerns the number of $\phi$ stages and mutation steps. Row 10 of Table A-1 shows the results when the number of stages $N_\phi$ is reduced to 50 but the number of intermediate MH steps is increased to 10, thus keeping the total number of likelihood evaluations the same as under the baseline configuration. We see that performance, measured in terms of RMSE, deteriorates under this setting relative to the baseline. In the case of structural parameterization, the increase in RMSE is substantial. One reason for this is that the drop in the number of intermediate stages causes the "difference" between two subsequent distributions to increase substantially, in a way that the increased MH steps cannot compensate for. Another reason is that even though the blocks are randomized at each stage, the blocks are fixed within the sequence of mutation MH steps at a given stage, so that even a few "bad" configurations of blocks can deteriorate performance despite a large number of MH steps.

Finally, the fifth set of deviations we consider, the bottom two rows of Table A-1, sheds light on the role of the $\phi$ schedule. When $\lambda = 1$, the schedule is linear, resulting in information being added too quickly. Only a few particles have meaningful weight as we move from the prior to the early stages of the schedule. This means that many particles at the end of the algorithm share a common ancestor, and this dependence manifests itself in poor estimates. Indeed, this configuration is the only one exhibiting meaningful bias. Moreover, the RMSE of the log marginal data density estimate under the structural parameterization is 4.02, more than twice that of the reduced form estimate, suggesting that the discrepancy between the prior and posterior is worse under the structural parameterization. Adding information "too" slowly does not incur the same penalty, as the results when $\lambda = 7$ show.
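To make the role of $\lambda$ concrete, the sketch below tabulates a power-type tempering schedule. The functional form $\phi_n = (n / N_\phi)^\lambda$ is an assumption made for illustration (the exact indexing of the schedule is not restated in this appendix), and the function name `phi_schedule` is hypothetical. Under this form, $\lambda = 1$ gives the linear schedule described above, while larger $\lambda$ spends more of the stages at small values of $\phi$.

```python
def phi_schedule(n_phi, lam):
    """Return phi_0, ..., phi_{n_phi} with phi_n = (n / n_phi) ** lam.

    Assumed functional form; lam = 1 is linear, larger lam adds
    information more slowly in the early stages.
    """
    return [(n / n_phi) ** lam for n in range(n_phi + 1)]


for lam in (1, 4, 7):
    phi = phi_schedule(500, lam)
    # Count the stages at which the likelihood still has very little weight.
    early = sum(1 for value in phi if value < 0.01)
    print(f"lambda={lam}: first increment={phi[1]:.2e}, stages with phi<0.01: {early}")
```

Comparing the first increment across values of $\lambda$ shows how sharply the initial move away from the prior shrinks as $\lambda$ grows.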
While the RMSEs of 0.41 and 0.37 are slightly higher than under the baseline case, because of the relatively large differences in the distributions later in the sampler, the mean error is still quite small. One reason for this is that the shape of the posterior is largely determined when $\phi$ is quite small, so even large differences between $\phi$ later in the schedule don't result in radically different distributions.

Overall, the SMC algorithm works well across a wide range of values for the

hyperparameters under both the reduced form and structural parameterizations of the VAR.

B. VAR Priors

B.1 Conjugate Reduced-Form VAR Prior and MDD Expression

The standard conjugate prior for the parameters $(\Sigma, \Phi)$ of a reduced-form VAR specifies Inverse-Wishart beliefs about $\Sigma$ and Gaussian beliefs about $vec(\Phi) \mid \Sigma$:

(33)  $\Sigma \sim \mathcal{IW}(\Psi, \nu)$
(34)  $vec(\Phi) \mid \Sigma \sim \mathcal{N}(vec(\Phi^*), \Sigma \otimes \Omega^{-1})$

where $\Psi$, $\nu$, $\Phi^*$, and $\Omega$ are prior hyperparameters specified by the econometrician. In practice, researchers typically implement VAR priors by supplementing the data matrices $Y$ and $X$ with dummy observations $Y^*$ and $X^*$. The resulting posterior for $\Sigma$ and $\Phi$ is identical under either approach as long as

(35)  $\Omega = X^{*\prime} X^*$
(36)  $\Phi^* = (X^{*\prime} X^*)^{-1} (X^{*\prime} Y^*)$
(37)  $\Psi = Y^{*\prime} Y^* - \Phi^{*\prime} \Omega \Phi^*$
(38)  $d = T^* - m$,

with $T^*$ and $m$ the number of rows and columns of $X^*$ respectively.

Given the data and choices of prior hyperparameters and defining

(39)  $\tilde{S} = (Y'Y + \Phi^{*\prime} \Omega \Phi^*) - (X'Y + \Omega \Phi^*)' (X'X + \Omega)^{-1} (X'Y + \Omega \Phi^*)$,

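the MDD of the VAR is given in closed form by the expression

(40)  $p(Y) = (2\pi)^{-Tn/2} \left( \frac{|X'X + \Omega|^{-n/2}}{|\Omega|^{-n/2}} \right) \left( \frac{|\tilde{S} + \Psi|^{-(T+\nu)/2}}{|\Psi|^{-\nu/2}} \right) \times \left( \frac{2^{(T+\nu)n/2}}{2^{\nu n/2}} \right) \left( \frac{\Gamma_n((T+\nu)/2)}{\Gamma_n(\nu/2)} \right).$

B.2 Reduced-Form-Based Prior

B.2.1 Prior Density for $A$

Our reduced-form-based prior for $A$ is derived from the relationship $(AA')^{-1} = \Sigma$. As is standard in the analysis of reduced-form VARs, we give $\Sigma$ a density of the inverse-Wishart family $\mathcal{IW}(\Psi, \nu)$, i.e.

(41)  $p(\Sigma \mid \Psi, \nu) = \frac{|\Psi|^{\nu/2}}{2^{\nu n/2} \Gamma_n(\nu/2)} |\Sigma|^{-(\nu+n+1)/2} \exp\left\{ -\frac{1}{2} tr[\Psi \Sigma^{-1}] \right\}$

and then derive the implied density of $A$ from the mappings $g$ and $g^{-1}$ described in (12) and (15).

The density for $\Sigma$ given by (41) is equivalent to specifying a Wishart density for $\Sigma^{-1}$ with scale matrix $\Psi^{-1}$. Thus we might just as well write

(42)  $p(\Sigma^{-1} \mid \Psi^{-1}, \nu) = \left[ \frac{1}{2^{\nu n/2} |\Psi^{-1}|^{\nu/2} \Gamma_n(\nu/2)} \right] \times |\Sigma^{-1}|^{(\nu-n-1)/2} \exp\left\{ -\frac{1}{2} tr[\Psi \Sigma^{-1}] \right\}.$

Letting

(43)  $g_{\Sigma^{-1}}(A) = \Sigma^{-1} = AA'$

we then have

(44)  $p_{RFB}(A \mid \Psi, \nu) = \left[ \frac{1}{2^{\nu n/2} |\Psi^{-1}|^{\nu/2} \Gamma_n(\nu/2)} \right] \times |AA'|^{(\nu-n-1)/2} \exp\left\{ -\frac{1}{2} tr(\Psi (AA')) \right\} |J(\Sigma^{-1}, A)|$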

where $J(\Sigma^{-1}, A)$ denotes the Jacobian of the transformation from $A$ to $\Sigma^{-1}$. Magnus and Neudecker (1980) show that, assuming the upper triangularity of $A$, one can write $J$ as[29]

(45)  $J(\Sigma^{-1}, A) = 2^n \prod_{i=1}^{n} |A_{ii}|^{i}.$

[29] See Table 6.2 in Magnus and Neudecker (1980). Assuming that $A$ is upper triangular, the relevant row of the table is (vb).

B.2.2 Prior Density for $F$

The reduced-form parameters on lagged coefficients of the VAR have density

$p(\Phi \mid \Sigma) = (2\pi)^{-kn/2} |\Sigma \otimes \Omega^{-1}|^{-1/2} \exp\left\{ -\frac{1}{2} (\beta - \beta^*)' (\Sigma \otimes \Omega^{-1})^{-1} (\beta - \beta^*) \right\},$

where $\beta = vec(\Phi)$ and $\beta^* = vec(\Phi^*)$. Recall that the mapping in (12) defines

(46)  $g_\Phi(F \mid A) = \Phi = FA^{-1}.$

Hence the density of $F \mid A$ is given by

(47)  $p_{RFB}(F \mid A) = p(g_\Phi(F \mid A)) \, |J(\Phi, F \mid A)|.$

Defining

(48)  $V_{A,\Omega} = (AA')^{-1} \otimes \Omega^{-1},$

we can write

(49)  $p_{RFB}(F \mid A) = (2\pi)^{-kn/2} |V_{A,\Omega}|^{-1/2} \times \exp\left\{ -\frac{1}{2} (vec(FA^{-1}) - \beta^*)' V_{A,\Omega}^{-1} (vec(FA^{-1}) - \beta^*) \right\} \times |J(\Phi, F \mid A)|,$

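where

(50)  $J(\Phi, F \mid A) = \frac{d \, FA^{-1}}{dF} = \frac{d \, I_m F A^{-1}}{dF}$
(51)  $\phantom{J(\Phi, F \mid A)} = (A^{-1})' \otimes I_m.$

B.3 Relationship Between RFB Prior and SZ Prior

B.3.1 Densities for $A$

In this appendix we show that there exist choices of hyperparameters $(\Psi, \nu)$ for the inverse-Wishart prior in (41) that yield the SZ prior for $A$ as long as the Jacobian of the transformation is excluded.

Letting $k_{RFB}$ denote a kernel for the density $p_{RFB}$ we can write

(52)  $k_{RFB}(A \mid \Psi, \nu) = |AA'|^{(\nu-n-1)/2} \exp\left\{ -\frac{1}{2} tr(\Psi (AA')) \right\} |J(\Sigma^{-1}, A)|.$

From (28), the density for $A$ in the SZ prior is given by

(53)  $p_{SZ}(A \mid H_0) = (2\pi)^{-n^2/2} |I_n \otimes H_0|^{-1/2} \exp\left\{ -\frac{1}{2} a' (I_n \otimes H_0)^{-1} a \right\}$
(54)  $\phantom{p_{SZ}(A \mid H_0)} \propto \exp\left\{ -\frac{1}{2} a' (I_n \otimes H_0)^{-1} a \right\} = k_{SZ}(A \mid H_0)$

where $a = vec(A)$.

We rewrite the exponential term in (52), ignoring the $-1/2$, as

(55)  $tr[\Psi (AA')] = tr[A' \Psi A]$
(56)  $\phantom{tr[\Psi (AA')]} = vec(\Psi A)' vec(A)$
(57)  $\phantom{tr[\Psi (AA')]} = ((I_n \otimes \Psi) vec(A))' vec(A)$
(58)  $\phantom{tr[\Psi (AA')]} = vec(A)' (I_n \otimes \Psi) vec(A)$
(59)  $\phantom{tr[\Psi (AA')]} = a' (I_n \otimes \Psi) a$
(60)  $\phantom{tr[\Psi (AA')]} = a' (I_n \otimes \Psi^{-1})^{-1} a,$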

which matches the exponential term in (54) with $\Psi^{-1} = H_0$. Thus we can write

(61)  $k_{RFB}(A \mid \Psi^{-1} = H_0, \nu) = |AA'|^{(\nu-n-1)/2} \exp\left\{ -\frac{1}{2} a' (I_n \otimes H_0)^{-1} a \right\} |J(\Sigma^{-1}, A)|$
(62)  $\phantom{k_{RFB}(A \mid \Psi^{-1} = H_0, \nu)} = k_{SZ}(A \mid H_0) \, |AA'|^{(\nu-n-1)/2} \, |J(\Sigma^{-1}, A)|.$

Noting that

(63)  $|AA'| = |A| |A'| = |A|^2 = \prod_{i=1}^{n} A_{ii}^{2},$

where the last equality follows from $A$'s triangularity, we can write

(64)  $k_{RFB}(A \mid \Psi^{-1} = H_0, \nu) = k_{SZ}(A \mid H_0) \left( \prod_{i=1}^{n} A_{ii}^{\nu-n-1} \right) |J(g(A), A)|.$

Since the expression for the Jacobian in (45) raises each $A_{ii}$ to a unique power, one cannot find a value $\nu$ which cancels all of the terms besides $k_{SZ}$ on the right hand side of (64). Thus, as stated in Sims and Zha (1998), the only way to align the two kernels is to exclude the Jacobian. Denoting the resulting kernel $k_{RFB/J}$, one can see that setting $\nu = n + 1$ in

(65)  $k_{RFB/J}(A \mid \Psi^{-1} = H_0, \nu) = \left( \prod_{i=1}^{n} A_{ii}^{\nu-n-1} \right) k_{SZ}(A \mid H_0)$

aligns the kernels:

(66)  $k_{RFB/J}(A \mid \Psi^{-1} = H_0, \nu = n + 1) = k_{SZ}(A \mid H_0).$

B.3.2 Densities for $F \mid A$

Proposition: If $\Omega^{-1} = H_+$ and $\Phi^* = \bar{S}$ then $p_{SZ}(F \mid A) = p_{RFB}(F \mid A)$.

Proof:

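The two relevant densities are given by

(67)  $p_{SZ}(F \mid A) = \underbrace{(2\pi)^{-kn/2} |I_n \otimes H_+|^{-1/2}}_{D_{SZ}} \times \underbrace{\exp\left\{ -\frac{1}{2} vec(F - \bar{S}A)' (I_n \otimes H_+)^{-1} vec(F - \bar{S}A) \right\}}_{R_{SZ}},$

which is the prior density for $F \mid A$ from (29) and

(68)  $p_{RFB}(F \mid A) = \underbrace{(2\pi)^{-kn/2} |V_{A,\Omega}|^{-1/2} |J(F, A)|}_{D_{RFB}} \times \underbrace{\exp\left\{ -\frac{1}{2} (vec(FA^{-1}) - \beta^*)' V_{A,\Omega}^{-1} (vec(FA^{-1}) - \beta^*) \right\}}_{R_{RFB}},$

which is the density from (49). We prove the claim by showing that $R_{SZ} = R_{RFB}$ and $D_{SZ} = D_{RFB}$.

We first show that $R_{SZ} = R_{RFB}$. Note that

(69)  $V_{A,\Omega}^{-1} = ((AA')^{-1} \otimes \Omega^{-1})^{-1}$
(70)  $\phantom{V_{A,\Omega}^{-1}} = ((AA') \otimes \Omega)$
(71)  $\phantom{V_{A,\Omega}^{-1}} = (A \otimes I_m)(I_n \otimes \Omega)(A' \otimes I_m).$

Letting $\beta = vec(FA^{-1})$ and $\Phi = FA^{-1}$ we derive that

(72)  $(\beta - \beta^*)' V_{A,\Omega}^{-1} (\beta - \beta^*) = (\beta - \beta^*)' (A \otimes I_m)(I_n \otimes \Omega)(A' \otimes I_m)(\beta - \beta^*)$
(73)  $\phantom{(\beta - \beta^*)' V_{A,\Omega}^{-1} (\beta - \beta^*)} = [(A \otimes I_m)' (\beta - \beta^*)]' (I_n \otimes \Omega) [(A' \otimes I_m)(\beta - \beta^*)]$
(74)  $\phantom{(\beta - \beta^*)' V_{A,\Omega}^{-1} (\beta - \beta^*)} = vec(I_m (\Phi - \Phi^*) A)' (I_n \otimes \Omega) \, vec(I_m (\Phi - \Phi^*) A)$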

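Setting $\Phi^* = \bar{S}$ we can see that

(75)  $I_m (\Phi - \Phi^*) A = (FA^{-1} - \bar{S}) A$
(76)  $\phantom{I_m (\Phi - \Phi^*) A} = (F - \bar{S}A)$

and then substituting (76) into (74) we have

(77)  $vec(I_m (\Phi - \Phi^*) A)' (I_n \otimes \Omega) \, vec(I_m (\Phi - \Phi^*) A) = vec(F - \bar{S}A)' (I_n \otimes \Omega) \, vec(F - \bar{S}A)$
(78)  $\phantom{vec(I_m (\Phi - \Phi^*) A)' (I_n \otimes \Omega) \, vec(I_m (\Phi - \Phi^*) A)} = vec(F - \bar{S}A)' (I_n \otimes \Omega^{-1})^{-1} vec(F - \bar{S}A).$

Setting $\Omega^{-1} = H_+$ completes the proof that $R_{SZ} = R_{RFB}$.

We now show that $D_{SZ} = D_{RFB}$. We first note that

(79)  $|V_{A,\Omega}|^{-1/2} = |(AA')^{-1} \otimes \Omega^{-1}|^{-1/2}$
(80)  $\phantom{|V_{A,\Omega}|^{-1/2}} = |(AA') \otimes \Omega|^{1/2}$
(81)  $\phantom{|V_{A,\Omega}|^{-1/2}} = (|AA'|^{m} |\Omega|^{n})^{1/2}$
(82)  $\phantom{|V_{A,\Omega}|^{-1/2}} = (|A|^{2m} |\Omega|^{n})^{1/2}$
(83)  $\phantom{|V_{A,\Omega}|^{-1/2}} = |A|^{m} |\Omega|^{n/2}.$

Next note that

(84)  $|J(F, A)| = |(A^{-1})' \otimes I_m|$
(85)  $\phantom{|J(F, A)|} = |(A^{-1})'|^{m} |I_m|^{n}$
(86)  $\phantom{|J(F, A)|} = |A|^{-m}.$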
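The Kronecker-product steps used so far in the proof can be checked numerically. The sketch below is only an illustrative check on randomly generated matrices, not part of the paper's argument: the function name `check_proof_steps`, the dimensions, and the random inputs are all hypothetical. It uses the column-stacking convention for $vec$ and draws an upper-triangular $A$ with a positive diagonal so that $|A| > 0$.

```python
import numpy as np


def vec(mat):
    """Stack the columns of a matrix into one vector (column-major vec)."""
    return mat.flatten(order="F")


def check_proof_steps(n=3, m=4, seed=0):
    """Return (lhs, rhs) pairs for the quadratic form and the two determinants."""
    rng = np.random.default_rng(seed)
    # Upper-triangular A with a strictly positive diagonal, so that |A| > 0.
    a = np.triu(rng.standard_normal((n, n)))
    a[np.diag_indices(n)] = np.abs(np.diag(a)) + 0.5
    # Symmetric positive definite Omega (m x m); arbitrary F and S_bar (m x n).
    q = rng.standard_normal((m, m))
    omega = q @ q.T + m * np.eye(m)
    f = rng.standard_normal((m, n))
    s_bar = rng.standard_normal((m, n))

    # Quadratic forms: left side of (72) against the right side of (77).
    diff = vec(f @ np.linalg.inv(a)) - vec(s_bar)
    r_rfb = diff @ np.kron(a @ a.T, omega) @ diff
    d = vec(f - s_bar @ a)
    r_sz = d @ np.kron(np.eye(n), omega) @ d

    # Determinants: (79) against (83), and (84) against (86).
    v = np.kron(np.linalg.inv(a @ a.T), np.linalg.inv(omega))
    det_v_lhs = np.linalg.det(v) ** -0.5
    det_v_rhs = np.linalg.det(a) ** m * np.linalg.det(omega) ** (n / 2)
    jac_lhs = np.linalg.det(np.kron(np.linalg.inv(a).T, np.eye(m)))
    jac_rhs = np.linalg.det(a) ** (-m)
    return (r_rfb, r_sz), (det_v_lhs, det_v_rhs), (jac_lhs, jac_rhs)


for lhs, rhs in check_proof_steps():
    print(np.isclose(lhs, rhs), lhs, rhs)
```

Each pair should agree up to floating-point error; the check relies on the identity $|P \otimes Q| = |P|^{m} |Q|^{n}$ for $P$ of size $n \times n$ and $Q$ of size $m \times m$.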

Finally using (83) and (86) we have that

(87)  $|V_{A,\Omega}|^{-1/2} |J(F, A)| = |A|^{m} |\Omega|^{n/2} |A|^{-m}$
(88)  $\phantom{|V_{A,\Omega}|^{-1/2} |J(F, A)|} = |\Omega|^{n/2}$
(89)  $\phantom{|V_{A,\Omega}|^{-1/2} |J(F, A)|} = (|I_n|^{m} |\Omega|^{n})^{1/2}$
(90)  $\phantom{|V_{A,\Omega}|^{-1/2} |J(F, A)|} = |I_n \otimes \Omega^{-1}|^{-1/2}.$

Again setting $\Omega^{-1} = H_+$ completes the proof that $D_{SZ} = D_{RFB}$.

B.4 Details for the Minnesota Prior

The reduced-form based prior is a Minnesota-style prior centered at a random walk. The multivariate-normal-inverse-Wishart density parameterization is set via dummy-observations following closely the procedure in Sims and Zha (1998). Their approach requires three sets of hyperparameters $\bar{y}$, $\bar{\sigma}$, and

(91)  $\Lambda = [\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \mu_5, \mu_6].$

The first parameter $\lambda_0$ controls the overall tightness of the prior. The parameter $\lambda_1$ functions similarly to $\lambda_0$ but it does not affect beliefs about the constant term. The parameter $\lambda_2$ should always be set to 1 in this framework. The parameter $\lambda_3$ shrinks the prior for the own lags so that prior standard deviation on lag $l$ shrinks by $l^{-\lambda_3}$. The parameter $\lambda_4$ controls tightness of beliefs on the constant term in the VAR. The parameter $\mu_5$ controls what is known as the "sums-of-coefficients" dummy. Higher values give more weight to the view that, if an element of the observables has been near its mean $\bar{y}_i$ for some time, $\bar{y}_i$ will be a good forecast for that observable, regardless of the values of other observables. This induces correlations between "own" lags of $\Phi$. Finally, $\mu_6$ controls the so-called "co-persistence" dummy observations. The observations are similar to the "sums-of-coefficients", but operate jointly on the observables, inducing correlations among columns of $\Phi$.

C. Normalization and Regime Labeling in the MS-VAR

The MS-VAR posterior density is invariant to sign changes on VAR equations and state labeling. To interpret our results economically we thus have to perform normalization in both of these dimensions.

C.1 Sign Normalization

For each state of $\{A, F\}$, we first normalize each column of the $(A(h_m), F(h_m))$ system by sign, forcing nonnegativity of $A(h_m)$'s diagonal elements. When we change the sign of the $A_{ii}$ element to satisfy nonnegativity, we also change the sign of all elements in the $i$th column of $(A(h_m), F(h_m))$. With the Cholesky identification employed in this paper, this method of sign-normalization implements the "likelihood-preserving" normalization of Waggoner and Zha (2003b).

C.2 Regime Labeling

After normalizing signs we still need to assign regime labels in each draw. To do so, we implement a version of the algorithm described in Stephens (2000) for clustering inference. This algorithm seeks to minimize the expected loss from reporting a sequence of state probabilities $Q(\theta)$, when the loss function is the Kullback-Leibler divergence of $Q(\theta)$ from the true state probabilities, $P(\theta)$. Hence, the algorithm selects state labels using a rule that has a reasonable decision theoretic foundation. A similar approach used in the population genetics literature is that of Jakobsson and Rosenberg (2007), who minimize a different notion of average distance between $Q(\theta)$ across draws. Both approaches give very similar results to the posteriors reported in the text. We take this approach because it tends to leave less severe multimodality in the posterior after regime labeling.

D. Bimodal Example

D.1 Direct Sampler for Mixture of Posteriors

To generate $n_{sim}$ draws, execute:

Algorithm 3: Direct Sampler for Mixture of Posteriors

for $i = 1, \ldots, n_{sim}$ do
  1. Draw latent state $s_i$ according to
       $p(s_i = 1) = \alpha$
       $p(s_i = 2) = 1 - \alpha$
  2. Draw $\Sigma_i \mid s_i, \Phi_i, Y_{s_i}$, which is a draw from $p(\Sigma_i \mid \Phi_i, Y_{s_i})$. Under the conjugate prior this is simply $p(\Sigma_i \mid Y_{s_i})$.
  3. Draw $\Phi_i \mid s_i, \Sigma_i, Y_{s_i}$, which is a draw from $p(\Phi_i \mid \Sigma_i, Y_{s_i})$.
end

E. RFB-Hierarchical: Hyperparameter Posteriors

Tables A-2, A-3, and A-4 show the posterior mean and 90 percent credible set for the estimated hyperparameters under the RFB-Hierarchical prior from one run of the SMC sampler.
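The three steps of Algorithm 3 can be sketched in code as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the posterior hyperparameters of each mixture component are taken as given inputs, and the dictionary keys (`scale`, `df`, `phi_bar`, `omega_bar`) and all function names are hypothetical. It also assumes integer degrees of freedom, so that the inverse-Wishart draw can be built from Gaussian draws using the fact, noted in B.2.1, that $\Sigma^{-1}$ is Wishart with scale matrix $\Psi^{-1}$.

```python
import numpy as np


def draw_inverse_wishart(rng, scale, df):
    """Draw Sigma ~ IW(scale, df) for integer df >= dimension.

    If W ~ Wishart(df, scale^{-1}), then W^{-1} ~ IW(scale, df).
    """
    n = scale.shape[0]
    chol = np.linalg.cholesky(np.linalg.inv(scale))
    x = rng.standard_normal((df, n)) @ chol.T  # rows are N(0, scale^{-1})
    return np.linalg.inv(x.T @ x)


def draw_phi_given_sigma(rng, phi_bar, omega_bar, sigma):
    """Draw Phi with vec(Phi) | Sigma ~ N(vec(phi_bar), Sigma kron omega_bar^{-1})."""
    l_omega = np.linalg.cholesky(np.linalg.inv(omega_bar))
    l_sigma = np.linalg.cholesky(sigma)
    z = rng.standard_normal(phi_bar.shape)
    return phi_bar + l_omega @ z @ l_sigma.T


def direct_mixture_sampler(rng, n_sim, alpha, components):
    """Return a list of (s, Sigma, Phi) draws from the two-component mixture."""
    draws = []
    for _ in range(n_sim):
        s = 1 if rng.random() < alpha else 2           # step 1: latent state
        post = components[s - 1]                       # hyperparameters given Y_s
        sigma = draw_inverse_wishart(rng, post["scale"], post["df"])  # step 2
        phi = draw_phi_given_sigma(rng, post["phi_bar"], post["omega_bar"], sigma)  # step 3
        draws.append((s, sigma, phi))
    return draws


# Illustrative (made-up) hyperparameters for a 2-variable system with 3 regressors.
rng = np.random.default_rng(0)
components = [
    {"scale": 2.0 * np.eye(2), "df": 12, "phi_bar": np.zeros((3, 2)), "omega_bar": np.eye(3)},
    {"scale": 1.0 * np.eye(2), "df": 12, "phi_bar": np.ones((3, 2)), "omega_bar": np.eye(3)},
]
draws = direct_mixture_sampler(rng, n_sim=1000, alpha=0.3, components=components)
print(sum(s == 1 for s, _, _ in draws) / len(draws))
```

Because every draw is independent and exact, the share of draws with $s_i = 1$ should be close to $\alpha$, which gives a simple benchmark against which a posterior simulator can be compared.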

TABLE A-2
POSTERIOR OF $\Lambda$

                     Mean    [05, 95]                  Mean    [05, 95]
$\lambda_0$   1m1v   0.969   [0.631, 1.405]     2m1v   0.938   [0.708, 1.225]
              1m2v   1.226   [0.809, 1.763]     2m2v   1.319   [0.862, 1.820]
              1m3v   1.046   [0.725, 1.448]     2m3v   1.025   [0.704, 1.438]
              1m4v   1.119   [0.767, 1.566]     2m4v   1.147   [0.816, 1.553]
              1m5v   1.057   [0.721, 1.481]     2m5v   0.890   [0.654, 1.203]
$\lambda_4$   1m1v   0.325   [0.018, 1.083]     2m1v   0.125   [0.012, 0.350]
              1m2v   0.174   [0.012, 0.613]     2m2v   0.122   [0.012, 0.307]
              1m3v   0.220   [0.012, 0.771]     2m3v   0.162   [0.017, 0.438]
              1m4v   0.189   [0.010, 0.613]     2m4v   0.127   [0.014, 0.316]
              1m5v   0.183   [0.012, 0.595]     2m5v   0.145   [0.019, 0.348]
$\mu_5$       1m1v   1.794   [0.022, 6.278]     2m1v   0.597   [0.001, 2.821]
              1m2v   1.824   [0.020, 5.984]     2m2v   0.366   [0.000, 1.785]
              1m3v   1.838   [0.025, 6.094]     2m3v   0.405   [0.000, 2.044]
              1m4v   1.756   [0.023, 5.654]     2m4v   0.534   [0.000, 2.435]
              1m5v   1.726   [0.027, 5.588]     2m5v   0.467   [0.001, 2.103]
$\mu_6$       1m1v   3.015   [0.894, 7.131]     2m1v   2.646   [0.852, 5.305]
              1m2v   4.619   [1.665, 9.561]     2m2v   3.533   [1.660, 6.275]
              1m3v   4.175   [1.300, 9.270]     2m3v   3.256   [1.371, 6.117]
              1m4v   4.056   [1.307, 8.655]     2m4v   2.971   [1.232, 5.419]
              1m5v   3.709   [1.214, 8.052]     2m5v   2.654   [1.098, 5.198]

TABLE A-3
POSTERIOR OF $\bar{y}$

                     Mean     [05, 95]                   Mean     [05, 95]
$\bar{y}_1$   1m1v   -0.002   [-0.117, 0.110]     2m1v    0.000   [-0.081, 0.079]
              1m2v   -0.003   [-0.148, 0.136]     2m2v   -0.007   [-0.117, 0.113]
              1m3v    0.000   [-0.136, 0.143]     2m3v   -0.007   [-0.115, 0.113]
              1m4v   -0.002   [-0.132, 0.137]     2m4v    0.007   [-0.104, 0.113]
              1m5v   -0.004   [-0.131, 0.127]     2m5v   -0.019   [-0.115, 0.097]
$\bar{y}_2$   1m1v    0.010   [-0.161, 0.163]     2m1v    0.005   [-0.141, 0.143]
              1m2v    0.008   [-0.160, 0.169]     2m2v    0.002   [-0.122, 0.128]
              1m3v    0.003   [-0.167, 0.166]     2m3v    0.005   [-0.131, 0.128]
              1m4v    0.008   [-0.162, 0.168]     2m4v   -0.030   [-0.131, 0.104]
              1m5v    0.011   [-0.157, 0.173]     2m5v    0.009   [-0.120, 0.127]
$\bar{y}_3$   1m1v    0.015   [-0.185, 0.198]     2m1v   -0.002   [-0.169, 0.161]
              1m2v    0.007   [-0.206, 0.211]     2m2v    0.040   [-0.154, 0.182]
              1m3v    0.010   [-0.201, 0.207]     2m3v    0.041   [-0.142, 0.172]
              1m4v    0.014   [-0.194, 0.215]     2m4v   -0.045   [-0.177, 0.151]
              1m5v    0.009   [-0.202, 0.204]     2m5v   -0.024   [-0.164, 0.157]

TABLE A-4
POSTERIOR OF $\bar{s}$

                     Mean    [05, 95]                  Mean    [05, 95]
$\bar{s}_1$   1m1v   0.023   [0.018, 0.031]     2m1v   0.022   [0.018, 0.026]
              1m2v   0.035   [0.024, 0.048]     2m2v   0.032   [0.024, 0.042]
              1m3v   0.031   [0.021, 0.045]     2m3v   0.030   [0.022, 0.041]
              1m4v   0.031   [0.021, 0.044]     2m4v   0.031   [0.023, 0.040]
              1m5v   0.028   [0.019, 0.040]     2m5v   0.024   [0.019, 0.030]
$\bar{s}_2$   1m1v   0.028   [0.020, 0.037]     2m1v   0.025   [0.020, 0.031]
              1m2v   0.038   [0.025, 0.053]     2m2v   0.032   [0.023, 0.042]
              1m3v   0.032   [0.021, 0.046]     2m3v   0.029   [0.020, 0.041]
              1m4v   0.032   [0.021, 0.044]     2m4v   0.029   [0.020, 0.040]
              1m5v   0.033   [0.021, 0.049]     2m5v   0.026   [0.020, 0.034]
$\bar{s}_3$   1m1v   0.023   [0.017, 0.031]     2m1v   0.018   [0.015, 0.022]
              1m2v   0.035   [0.023, 0.049]     2m2v   0.028   [0.020, 0.039]
              1m3v   0.024   [0.017, 0.035]     2m3v   0.026   [0.019, 0.035]
              1m4v   0.026   [0.018, 0.037]     2m4v   0.024   [0.019, 0.032]
              1m5v   0.025   [0.018, 0.034]     2m5v   0.021   [0.017, 0.027]

References

BENATI, L. AND P. SURICO (2009): "VAR Analysis and the Great Moderation," American Economic Review, 99, 1636–52.
CELEUX, G., M. HURN, AND C. P. ROBERT (2000): "Computational and inferential difficulties with mixture posterior distributions," Journal of the American Statistical Association, 95, 957–970.
CHIB, S. AND S. RAMAMURTHY (2010): "Tailored Randomized Block MCMC Methods with Application to DSGE Models," Journal of Econometrics, 155, 19–38.
CHOPIN, N. (2002): "A Sequential Particle Filter for Static Models," Biometrika, 89, 539–551.
CREAL, D. (2012): "A Survey of Sequential Monte Carlo Methods for Economics and Finance," Econometric Reviews, 31, 245–296.
DEL MORAL, P., A. DOUCET, AND A. JASRA (2006): "Sequential Monte Carlo Samplers," Journal of the Royal Statistical Society, Series B, 68, 411–436.
DURHAM, G. AND J. GEWEKE (2012): "Adaptive Sequential Posterior Simulators for Massively Parallel Computing Environments," Unpublished Manuscript.
FRÜHWIRTH-SCHNATTER, S. (2001): "Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models," Journal of the American Statistical Association, 96, 194–209.
FRÜHWIRTH-SCHNATTER, S. (2004): "Estimating Marginal Likelihoods for Mixture and Markov Switching Models Using Bridge Sampling Techniques," The Econometrics Journal, 7, 143–167.
GELMAN, A. AND D. B. RUBIN (1992): "Inference from Iterative Simulation Using Multiple Sequences," Statistical Science, 7, 457–472.
GEWEKE, J. (1989): "Bayesian Inference in Econometric Models Using Monte Carlo Integration," Econometrica, 57, 1317–1399.
GEWEKE, J. (2004): "Getting it right: Joint Distribution Tests of Posterior Simulators," Journal of the American Statistical Association, 99, 799–804.
GEWEKE, J. (2005): Contemporary Bayesian Econometrics and Statistics, vol. 537, Wiley-Interscience.
GEWEKE, J. (2007): "Interpretation and inference in mixture models: Simple MCMC works," Computational Statistics & Data Analysis, 51, 3529–3550.
GIANNONE, D., M. LENZA, AND G. E. PRIMICERI (2015): "Prior selection for vector autoregressions," Review of Economics and Statistics, 97, 436–451.
HERBST, E. AND F. SCHORFHEIDE (2014): "Sequential Monte Carlo Sampling for DSGE Models," Journal of Applied Econometrics, 29, 1073–1098.
HERBST, E. AND F. SCHORFHEIDE (2015): "Bayesian Estimation of DSGE Models," Mimeo.
HERBST, E. P. (2012): "Gradient and Hessian-based MCMC for DSGE Models," Unpublished Manuscript, Federal Reserve Board.
HUBRICH, K. AND R. J. TETLOW (2015): "Financial stress and economic dynamics: The transmission of crises," Journal of Monetary Economics, 70, 100–115.
JAKOBSSON, M. AND N. A. ROSENBERG (2007): "CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure," Bioinformatics, 23, 1801–1806.
JASRA, A., C. HOLMES, AND D. STEPHENS (2005): "Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling," Statistical Science, 20, 50–67.
KADIYALA, K. R. AND S. KARLSSON (1997): "Numerical Methods for Estimation and Inference in Bayesian VAR-Models," Journal of Applied Econometrics, 12, 99–132.

LEEPER, E. M., C. A. SIMS, T. ZHA, R. E. HALL, AND B. S. BERNANKE (1996): "What Does Monetary Policy Do?" Brookings Papers on Economic Activity, 1996, pp. 1–78.
LITTERMAN, R. (1986): "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience," Journal of Business & Economic Statistics, 4, 25–38.
MAGNUS, J. R. AND H. NEUDECKER (1980): "The Elimination Matrix: Some Lemmas and Applications," SIAM Journal on Algebraic Discrete Methods, 1, 422–449.
RUBIO-RAMÍREZ, J. F., D. F. WAGGONER, AND T. ZHA (2010): "Structural Vector Autoregressions: Theory of Identification and Algorithms for Inference," The Review of Economic Studies, 77, 665–696.
SIMS, C. AND T. ZHA (1998): "Bayesian Methods for Dynamic Multivariate Models," International Economic Review, 39, 949–68.
SIMS, C. A. (1980): "Macroeconomics and Reality," Econometrica, 48, 1–48.
SIMS, C. A., D. F. WAGGONER, AND T. ZHA (2008): "Methods for inference in large multiple-equation Markov-switching models," Journal of Econometrics, 146, 255–274.
SIMS, C. A. AND T. ZHA (2006): "Were There Regime Switches in U.S. Monetary Policy?" The American Economic Review, 96, 54–81.
SMETS, F. AND R. WOUTERS (2007): "Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach," American Economic Review, 97, 586–606.
STEPHENS, M. (2000): "Dealing with label switching in mixture models," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 795–809.
STOCK, J. H. AND M. W. WATSON (2010): "Modeling inflation after the crisis," Tech. rep., National Bureau of Economic Research.
UHLIG, H. (1997): "Bayesian Vector Autoregressions with Stochastic Volatility," Econometrica, 65, pp. 59–73.
WAGGONER, D. AND T. ZHA (2003a): "A Gibbs sampler for structural vector autoregressions," Journal of Economic Dynamics and Control, 28, 349–366.
WAGGONER, D. F. AND T. ZHA (2003b): "Likelihood preserving normalization in multiple equation models," Journal of Econometrics, 114, 329–347.
