feds · March 31, 2014

Small Sample Properties of Bayesian Estimators of Labor Income Processes

Abstract

There exists an extensive literature estimating idiosyncratic labor income processes. While a wide variety of models are estimated, GMM estimators are almost always used. We examine the validity of using likelihood based estimation in this context by comparing the small sample properties of a Bayesian estimator to those of GMM. Our baseline studies estimators of a commonly used simple earnings process. We extend our analysis to more complex environments, allowing for real world phenomena such as time varying and heterogeneous parameters, missing data, unbalanced panels, and non-normal errors. The Bayesian estimators are demonstrated to have favorable bias and efficiency properties.

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Small Sample Properties of Bayesian Estimators of Labor Income Processes Taisuke Nakata and Christopher Tonetti 2014-25 NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Small Sample Properties of Bayesian Estimators of Labor Income Processes∗ Taisuke Nakata† Christopher Tonetti‡ Federal Reserve Board Stanford GSB March 31, 2014 Abstract There exists an extensive literature estimating idiosyncratic labor income processes. While a wide variety of models are estimated, GMM estimators are almost always used. We examine the validity of using likelihood based estimation in this context by comparing the small sample properties of a Bayesian estimator to those of GMM. Our baseline studies estimators of a commonly used simple earnings process. We extend our analysis to more complex environments, allowing for real world phenomena such as time varying and heterogeneous parameters, missing data, unbalanced panels, and non-normal errors. The Bayesian estimators are demonstrated to have favorable bias and efficiency properties. JEL: C32, C33, D12, D31, D91, E21 Keywords: Labor Income Process, Small Sample Properties, GMM, Bayesian Estimation, Error Component Models ∗WethankTimCogley,KonradMenzel,JohnRoberts,TomSargent,MattSmith,andGianlucaViolanteforuseful comments and suggestions. The views expressed in this paper, and all errors and omissions, should be regarded as those solely of the authors, and are not necessarily those of the Federal Reserve Board of Governors or the Federal Reserve System. †Correspondingauthor;DivisionofResearchandStatistics,FederalReserveBoard,20thStreetandConstitution Avenue N.W. Washington, D.C. 20551; Email: taisuke.nakata@frb.gov; Phone: 202-452-2972. ‡Stanford GSB, 655 Knight Way, Stanford, CA 94305; Email: tonetti@stanford.edu. 1

1 Introduction Measuring individual income risk is essential to answering a wide range of economic questions. For the vast majority of individuals, labor income is overwhelmingly the largest component of total income. Accordingly, accurate and precise estimation of labor earnings dynamics is important for analyzing people’s consumption and savings behavior, designing fiscal policy, and understanding the sources of and changes in inequality. DatingbacktoLillardandWillis(1978),LillardandWeiss(1979),MaCurdy(1982),andAbowd and Card (1989), there is a history of fitting ARMA models to panel data to understand the labor income risk facing individuals. Error component models in which labor income is the sum of a transitory and a persistent shock are extremely common.1 While no shortage of models have been estimated, a striking feature of the literature is that, almost always, the estimation routine is based on a minimum distance estimator, namely GMM. Recent trends of increasing model complexity have led economists to depart from the simple GMM estimators that have dominated the literature. Sometimes these complex models are estimated by techniques similar to traditional GMM, like the Simulated Method of Moments.2 However, sometimes a very different likelihood based approach, Bayesian estimation, is employed. An early example using Bayesian estimation is Geweke and Keane (2000), who jointly estimate earnings process parameters and marital status to analyze the transition probabilities between income quartiles over the life cycle. More recently, Jensen and Shore (2010, 2011) estimate complex labor income processes with a focus on heterogeneity in idiosyncratic risk and its evolution over time. Given ongoing advances in computational power and Markov chain Monte Carlo (MCMC) techniques, we are likely to see a growing use of these estimators. While there exist decades of research that documents the properties of minimum distance estimators of labor income processes, there are no papers, to our knowledge, that systematically examine the small sample properties of likelihood based estimators on these types of error component models in panel settings.3 Although the benefits of Bayesian estimation are well known to theoretical econometricians, they are not widely understood across fields in the context of estimating labor income processes. Thus, as the profession adopts widespread use of Bayesian estimators for increasingly elaborate models, it is useful to document the properties of Bayesian estimators on simple income processes. This will provide a clear explanation of the Bayesian estimation routine, a better understanding of the source of estimation results, and a justification for its use in more complicated settings. In this paper, we examine the validity of using likelihood based estimation by comparing the 1Often the persistent component is assumed to follow a random walk as in MaCurdy (1982), Abowd and Card (1989), Gottschalk and Moffitt (1994), Meghir and Pistaferri (2004), and Blundell, Pistaferri, and Preston (2008). 2For example, Browning, Ejrnaes, and Alvarez (2010) estimate a model with many dimensions of heterogeneity using the Simulated Method of Moments. 3For example, a 1996 issue of the Journal of Business & Economic Statistics is dedicated to the small sample properties of GMM estimators, featuring notable papers on estimating covariance structures like Altonji and Segal (1996) and Clark (1996). 2

small sample properties of a Bayesian estimator to those of GMM. First, as a stepping stone, we derive the maximum likelihood estimator by translating the labor income process into linear Gaussian state space (LGSS) form and applying standard filtering procedures. With this machinery established in an easy-to-understand framework, we provide a concise, but self-contained, derivation of the Bayesian estimator. We then conduct a Monte Carlo analysis of a Bayesian estimator and a GMM estimator on a commonly used simple labor income process. Although the asymptotic dominance of properly specified likelihood based estimators is textbook, we provide the first systematic analysis of the small sample properties of likelihood based estimators of labor income processes. While the difference between estimators is typically modest, the Bayesian estimator is more efficient: parameter estimates are unbiased with smaller standard errors. Although the initial findings demonstrate the better small sample properties of the Bayesian estimator, the exercise was performed on a simple model with ideal dataset conditions. In Section 4, we extend the analysis to more complex specifications that capture real world phenomena. In Section 4.1, we first look at an environment in which datasets are unbalanced panels and suffer frommissingdata. Wemodifytheestimatorstoaccommodatetheserealisticfeaturesofthedataset and study how the relative performance of the Bayesian estimator is affected. In Section 4.2, we modify the labor income process to allow for time-variation in the shock variances. Time variation inthevariancesofthelaborincomeprocesshasbeendocumentedinmanypapersandisessentialin understanding the changes in labor market risks.4 In Section 4.3, we allow parameter heterogeneity acrossindividuals. Manypapershavefoundsubstantialheterogeneityinvariousaspectsofthelabor income process and emphasized its importance for understanding the risks people face. Finally, in Section 4.4, we extend the labor income process to allow for non-normal shocks. There is empirical evidence that labor income shocks are non-normal and have fat tails.5 To both learn abouttheshapeoftheerrordistributionsandtoaddresstheconcernofdistributionmisspecification, we let shocks be distributed according to a mixture of normals, which allows for a very flexible error distribution structure. These extensions establish that the beneficial small sample properties of Bayesian estimators are maintained in more complicated scenarios—thus demonstrating the usefulness of this estimator for applied economists. We proceed in Section 2 to present the different estimators of the simple income process. Section 3 discusses their small sample properties and Section 4 discusses extensions to the simple income process, including time varying variances, missing data, unbalanced panels, heterogeneous parameters, and distributional assumptions on errors. Section 5 concludes. 4See for example Gottschalk and Moffitt (1994), Heathcote, Perri, and Violante (2010), and Blundell, Pistaferri, and Preston (2008) for GMM based estimations of the sequence of variances. 5SeeforexampleHirano(1998)andGewekeandKeane(2000)forBayesianestimationsofincomeprocessesthat allow for non-normal shocks. 3

2 Estimators of the Labor Income Process It is well known from the labor economics literature (since Abowd and Card (1989)) that labor income is well described by an error components model, where residual labor earnings are the sum of a transitory and persistent shock. Often the transitory shock is i.i.d and the persistent shock follows an AR(1) process. Because the model fits well and is relatively simple, it has become very commonly used in the Labor and Macro literatures. Thus, we adopt this pervasive simple income process and present the GMM and Bayesian estimators. These estimators are developed to be used on panel data sets such as the PSID. Accordingly, our baseline labor income process is given by: (cid:15) = ρ(cid:15) +Σ0.5η (1) i,t i,t−1 η i,t y = (cid:15) +Σ0.5ν (2) i,t i,t ν i,t where y is the residual from a log income regression for an individual i ∈ {i,...,N} at time i,t t ∈ {1,...,T}.6 (cid:15) isthepersistentcomponentofincomeandisassumedtofollowanAR(1)process. i,t While {y }t=1,T is observed, {(cid:15) }t=1,T is not. ν is the transitory component of income. η is i,t i=1,N i,t i=1,N i,t i,t the shock to the persistent component of income. ν and η are standard normal and Σ , Σ are i,t i,t ν η the variances of the transitory and persistent innovations, respectively. We also estimate properties of the initial state, (cid:15) , which we assume is normally distributed with zero mean and variance i,0 Σ . ν , η , and (cid:15) are independent of each other for all i and t. Thus the parameter vector Z0 i,t i,t i,0 to be estimated is θ := (ρ,Σ ,Σ ,Σ ). In the next subsections, we outline different techniques to ν η Z0 estimate these parameters. 2.1 GMM The standard strategy in the literature for estimating labor income processes is to use GMM. The goal of GMM estimation is to choose parameters that minimize the distance between empirical and theoretical moments. Identification and estimation rely on moments constructed from crosssectional income autocovariances.7 T(T−1) We wish to estimate the system given by equations (1) and (2). There are moments, 2 where T is the total number of time periods. The total number of individuals, N, does not affect 6There are many ways to obtain residual labor income. To remove predictable components that are associated with the individual or the aggregate state of the economy, it is common to regress idiosyncratic labor income on a vector of observables such as individual demographics and variables that control for aggregates. There are different methodstoremoveaggregatecomponents,suchasusingtimedummyvariablesoracommoncorrelatedeffectapproach (Pesaran (2006)). While important for the macro implications of the idiosyncratic labor earnings process, our paper is concerned only with estimating the residual labor income process. 7This section develops the already well-understood GMM estimation routine. For other sources see Blundell, Pistaferri, and Preston (2008), Heathcote, Perri, and Violante (2010), or Guvenen (2009). We present the general case in which ρ is estimated. 4

the number of moments because the expectations are taken cross-sectionally over individuals.  E[y ·y ] ... ... ...  i,1 i,1  E[y ·y ] E[y ·y ] ... ...   i,2 i,1 i,2 i,2    M(θ) = vec E[y i,3 ·y i,1 ] E[y i,3 ·y i,2 ] ... ...     ...    E[y ·y ] E[y ·y ] ... E[y ·y ] i,T i,1 i,T i,2 i,T i,T where the upper right triangle of the matrix is redundant by symmetry. Particular entries in the moment matrix M(θ) map into particular functions of the parameters θ = (ρ,Σ ,Σ ,Σ ). Define ν η Z0 the cross-sectional moment m (θ) between agents at time t and t+j: t,j m (θ) = E[y ·y ] t,j i,t i,t+j = E[((cid:15) +Σ0.5ν )((cid:15) +Σ0.5ν )] i,t ν i,t i,t+j ν i,t+j (cid:40) E[(cid:15)2 ]+Σ if j = 0 = i,t ν ρjE[(cid:15)2 ] if j > 0 i,t where t (cid:88) E[(cid:15)2 ] = ρ2tΣ + ρ2(t−k)Σ i,t Z0 η k=1 Simple algebraic manipulation of the above equations reveal (over) identification of the parameters.8 Define the estimator as θˆ = min (M(θ)−Mˆ)(cid:48)W(M(θ)−Mˆ) GMM θ where Mˆ stacks the sample covariance from the data. To implement the estimator, we need to choose the weighting matrix, W. Altonji and Segal (1996) show that the Optimal Minimum Distance (OMD) estimator, where W is the optimal weighting matrix, introduces significant small sample bias. They study the small sample properties oftheGMMestimatorwithseveralalternativeweightingmatricesandrecommendusingtheEqually 8The moments are only slightly modified when variances are time varying as in Section 4.2: (cid:26) E[(cid:15)2 ]+Σ if j =0 E[y ·y ] = i,t ν,t i,t i,t+j ρjE[(cid:15)2 ] if j >0 i,t t E[(cid:15)2 ] = ρ2tΣ + (cid:88) ρ2(t−k)Σ i,t Z0 η,k k=1 5

Weighted Minimum Distance (EWMD) estimator, where W is the identity matrix. In light of their result, many papers in the literature use the EWMD estimator.9 Thus, we use the GMM estimator with identity weighting matrix throughout the paper. Extensions to different weighting matrices are straightforward.10 With all objects of the optimization problem defined, the estimates can be calculated computationally using an optimizer to choose parameters θ that minimize the distance between model and sample moments. 2.2 Likelihood Based Estimation The maximum likelihood estimator is presented before deriving the Bayesian estimator. This establishes the linear Gaussian state space structure, which is used in both estimators. For the simple income process, it is feasible to compute the maximum likelihood estimator. However, the MLE estimator fails to scale with model complexity, as for more complicated specifications the likelihood is difficult or impossible to write in closed form. Bayesian estimation becomes useful in exactly these cases. 2.2.1 MLE As its name suggests, the maximum likelihood estimate is defined as the parameter values that maximize the likelihood function. θˆ = max L({y }t=1,T|θ) MLE i,t i=1,N θ For the simple labor income process, we can analytically compute the likelihood for any given θ, and thus it is feasible to solve this maximization problem. Analytical derivation of the likelihood function is straightforward once we notice that the labor income process forms a linear Gaussian state space (LGSS) system. We can draw from the time series econometrics literature to construct the likelihood using the Kalman filter. Define the canonical form of a LGSS as S = A S +B v (3) t t t−1 t t Y = C z +D S +w (4) t t t t t t where 9An alternative used by Blundell, Pistaferri, and Preston (2008) is the Diagonally Weighted Minimum Distance (DWMD) estimator, where W is the optimal weighting matrix with off-diagonal elements set to zero. 10We examined the performance of the OMD and DWMD estimators in unreported exercises and found that the EWMD estimator tends to outperform the OMD and DWMD estimators across various data generating processes. TheOMDandDWMDestimatorsaresometimesslightlymoreefficientforthepersistenceparameterthantheEWMD estimator,butsuchimprovedefficiencyintheestimateofthepersistenceparameteralwayscomeswithsubstantially deteriorated efficiency in the estimates of other parameters. 6

v ∼ N(0,Q ) t t w ∼ N(0,R ) t t S ∼ N(S ,P ) 0 0|0 0|0 Equation (3) is the state equation (or transition equation) and equation (4) is the observation equation (or measurement equation). S is a vector of latent state variables, and Y is the vector of t t observables. v isavectorofinnovationstolatentstatevariables, andw isavectorofmeasurement t t error. z is a vector of observed exogenous variables. To map this model into our labor income t process of an individual Y is residual log income and S is the persistent component of income.11 t t For notational convenience, let θ be a vector containing all parameters of the model (A , B , C , t t t t D , Q , R , S , P ). In many applications, parameters are time-invariant (θ = θ for all t), but t t t 0|0 0|0 t this is not a necessary requirement. All processes discussed in this paper will hold A , B , C , D t t t t constant over time; later, when we discuss the time varying volatility case, Q and R will vary. t t Thus, our simple labor income process can be mapped into canonical form as A = ρ, B = 1, C = 0, t t t D = 1, Q = Σ , R = Σ , t t η t ν z = 0, S = 0, P = Σ t 0|0 0|0 Z0 for all t, with parameter vector θ = (ρ,Σ ,Σ ,Σ ) as before. ν η Z0 We can now use the Kalman filter to derive the log likelihood of a LGSS, which is provided in Appendix A.1.12 As idiosyncratic shocks are defined to be uncorrelated across individuals, one can obtain the log likelihood of the income data for all individuals by summing the log likelihood of each individual (i.e., logL({y }t=1,T|θ) = (cid:80)N logL({y }t=1,T|θ)). One obvious advantage i,t i=1,N i=1 i,t of this methodology is the relative ease of estimating more complicated income processes, such as ARMA(p,q) processes, since they allow a LGSS representation. Given that the likelihood is defined in terms of canonical LGSS matrices, to estimate a different parameterization of the labor income process, all that remains is to map the process into state space form. With the analytical likelihood derived in Appendix A.1, we can now use an optimizer to numerically maximize the log likelihood to produce the MLE estimates. 11Forclarityofexposition,wesetz tobezeroandassumeY isresidualincome. However,z couldincludeallof t t t thetraditionalfirststageconditioningvariableslikeeducation,age,race,etc.,toremovethepredictablecomponents of Y . This would allow estimation to be run in one step, instead of running a first-stage regression to recover t idiosyncratic residual income. The general case is presented in the Appendix. 12For a detailed analysis, see chapter 13.4 in Hamilton (1994). 7

2.2.2 Bayesian Estimation: Gibbs Sampling In computing the maximum likelihood estimator, two conditions are crucial. One is that we can compute the likelihood function analytically and the other is that the likelihood function is well behaved.13 For the simple labor income process, these two conditions are satisfied and the MLE estimates can be easily computed. However, they are not satisfied for more complex labor income processes. Bayesian methods are useful in those cases either because they don’t require an analytical likelihood function or because the prior distribution smooths local maxima. For transparency of exposition, in this section we provide an overview of the Bayesian estimator for the standard income process, without the complications that it is well equipped to handle. The goal of the Bayesian estimation is to characterize the posterior distribution of parameters, (cid:16) (cid:17) which is defined as the distribution of parameters conditional on data, p θ|{y }t=1,T . The i,t i=1,N posterior distribution is related to two objects, the prior distribution and the likelihood function, by Bayes Theorem: (cid:16) (cid:17) p θ|{y }t=1,T ∝ L({y }t=1,T|θ)p(θ) i,t i=1,N i,t i=1,N where p(θ) is the prior distribution of parameters, which the researcher specifies. When the prior is uniform, the posterior is the same as the likelihood (as a function of parameters). The tighter the prior, the less the posterior reflects information from the data. Comparing the posterior to the prior provides a sense of how much the data informed the parameter estimates.14 To characterize the posterior distribution, the Bayesian estimator draws a large sample from (cid:16) (cid:17) p θ|{y }t=1,T , which can be used to compute statistics about the parameters of interest. Once i,t i=1,N we characterize the posterior distribution, we can compute a point estimate according to some loss function. Following other applied work in the Bayesian literature, we use the median of the posterior distribution as the point estimate.15 To obtain this sample from the posterior, we use the Gibbs sampling algorithm. The Gibbs sampling algorithm sequentially draws from the distribution of each parameter and latent variable conditional on the previous draws for all other parameters. Since each draw is conditional on the previousdraws, thesamplesarenotindependent. However, thestationarydistributionofthedraws thus generated can be shown to equal the joint posterior distribution of interest.16 This algorithm is useful whenever the joint density of the parameters and data is unknown, but the conditional distributions of each parameter is known. See Appendix A.2 for details about the Gibbs sampling procedure. To sequentially draw from the conditional distributions of interest, there are three techniques 13If there exist multiple local maxima, the optimization routine may struggle to find the global maximum, i.e., the MLE estimates. This is especially true if the likelihood function is high dimensional. 14See Geweke (2005) or Gelman, Carlin, Stern, and Rubin (2009) for a Bayesian statistics reference. 15Thereisnoapriori“correct”statistictouseasthepointestimate. Forexample,onecouldusethemode,mean, ormedianoftheposteriordistribution. Whilethemodecorrespondscloselytotheprincipleofmaximumlikelihood, the median is more robust computationally when sample sizes are limited. 16See Robert and Casella (2004) for a comprehensive treatment of MCMC theory and practice. 8

employed. The first is estimating the variance of a normal distribution with known mean. The second is estimating a linear regression model. The third is sampling the latent states of a LGSS model. To estimate Σ , realize y −(cid:15) = Σ0.5ν . Thus, conditioning on the epsilon sequence, ν i,t i,t ν i,t we have a sample distributed N(0,Σ ). Given a conjugate prior, the posterior can be calculated ν analytically,allowingthisiteration’sΣ tobesetequaltoadrawfromtheconditionalposterior. To ν estimate ρ, realize from equation 1 that ρ is the coefficient in a linear regression model with known independent and dependent regressors with error distributed N(0,Σ ). Again, the conditional η posterior can be calculated analytically, allowing this iteration’s ρ to be set equal to a draw from the posterior. The most complicated Gibbs block of this estimation is drawing the sequence of unobserved states in a LGSS. Given linearity and normality, the Kalman smoother provides the distribution of each state at each time conditional on all available data. However, because there is persistence in the states, knowing the particular draw of (cid:15) alters the conditional density of (cid:15) .17 Carter and i,t i,t−1 Kohn (1994) develop an algorithm that recursively updates, backwards through time, the Kalman smoothed conditional densities of each state at time t given the draw of the state at time t+1.18 With these three tools, we are able to sequentially draw from the conditional distributions of the parameters and latent variables of interest. 3 Small Sample Properties Now that the different estimators have been presented, we proceed to analyze the small sample properties of the estimators using Monte Carlo analysis. Our Monte Carlo simulation exercise is designed as follows. Simulate 100 panel datasets using the baseline labor income processes given by equations 1 and 2.19 In the benchmark simulation, each data set contains 500 people and 10 time periods (i.e., N=500 and T=10). However, to mimic the features of available panel datasets while also providing a more general depiction of the estimators’ performance, we consider alternative values of N ∈ {100,2000} and T ∈ {5,20}. Motivated by Blundell, Pistaferri, and Preston (2008), as a benchmark calibration we let the true parameter values be (ρ,Σ ,Σ ,Σ ) = η ν Z0 (1.00,0.02,0.05,0.15).20 For each of the 100 datasets generated, we compute the point estimates of themodelparametersbasedontheBayesianmethodandtheGMM.Whenrequiredbyoptimization or sampling, the same initial values are used across estimators. For the Bayesian estimation, we need to specify prior distributions of parameters. We choose 17Wecouldsampleeach(cid:15) asaseparateblock. ThisistheapproachtakenbyJacquier,Polson,andRossi(1994) i,t whenestimatingastochasticvolatilitymodel. However,thiswouldleadtoahighlyauto-correlated,slowlyconverging Markov chain. 18For some extensions, it is easy to replace the Carter and Kohn (1994) algorithm with the more computationally efficient algorithm developed by Durbin and Koopman (2002). See the online technical appendix for detailed description of the Durbin and Koopman (2002) algorithm. 19MonteCarloanalysisisoftenbasedonalargenumberofdatasets. However,duetothecomputationalintensity of the Bayesian estimator, we present results with a smaller than usual number of simulations. A robustness check onthebaselinespecificationwithalargernumberofdatasets(500)replicatedtheresultsofthe100sampleexercise. 20These parameter values are standard and similar to the estimates found throughout the traditional literature. 9

priorparameters suchthatthe varianceof thepriordistributionisverylargeandthustheposterior distribution is mainly determined by the likelihood function. Specifically, we set the prior for ρ to be Normal with mean 0 and an arbitrarily large variance, truncated at support [-1,1]. For Σ , Σ , Z0 η and Σ we use the inverse-gamma distribution with degree of freedom 2 and location parameter ν equal 0.01. For each estimator, we then compute the mean, standard deviation, and the root mean square error (RMSE) of the point estimates across the 100 datasets. The difference between the mean and true parameter values measures the bias of the estimator, while the standard deviation measures the efficiency. The root mean square error measures the average squared deviation of the estimator from the true parameter value, combining information on the bias and standard deviation of the estimator. Accordingly, we emphasize root mean square error as the overall measure of the estimators’ efficiency. Table 1 shows the small sample properties of the two estimators. For each parameter and estimator, the first number is the mean of the point estimates across 100 simulated datasets, the second number in the parentheses is the standard deviation of the point estimates, and the last number in the square brackets is the RMSE. The Bayesian estimator performs better than the GMM estimator in terms of RMSEs. For all parameter estimates, the RMSEs of the Bayesian estimator are smaller than those of the GMM estimator. WhilethedifferencestendtobesmallasbothestimatorshaveverylowRMSEs,theyare sometimesnon-trivial. Forexample,theRMSEsoftheBayesianestimatorareabout30percentand 50 percent smaller than the GMM estimator for Σ and Σ . The outperformance of the Bayesian η ν estimator comes from both smaller bias and smaller standard deviation. For all parameters, the mean of the Bayesian estimator is closer to the true parameter values than that of the GMM estimator and the standard deviation of the Bayesian estimator is smaller. Improved efficiency could result from using better moment conditions or from using informative priors. To demonstrate that the improved performance of the Bayesian estimator is not driven by the priors, we first re-estimate the model, setting the prior mean of ρ to 0.5 and scaling up the variances’ prior means by a factor of 10. This very different prior specification did not significantly affect the posterior distribution. This means that the posterior distribution is mainly determined by the likelihood function. Second, we know that the GMM estimator with optimal orthogonality conditions is asymptotically and numerically equivalent to MLE, since the asymptotically optimal GMMestimatorusesthescoreformomentconditions. Forthebaselinespecification,wepresentthe MLE estimates in Table 1, which have very similar RMSEs to the Bayesian estimates.21 Thus, efficiency improvements come from using a likelihood based estimator, not from imposing informative priors. Outperformance of the Bayesian estimator is robust to different specifications of the data generating process. Table 2 reports the RMSEs for alternative values of N (=100,2000), T (=5,20), 21In comparing the MLE and Bayesian estimates, it should be noted that we use the median of the posterior as the Bayesian point estimate, where the mode of the posterior more closely corresponds to the MLE estimate. This difference becomes especially important when comparing truncated parameters, like ρ. 10

and ρ (=0.8). In most cases, the RMSEs of the Bayesian estimators are smaller or roughly equal for all parameter values. For ρ, the estimators give roughly equal RMSEs with alternative T and N. However, with ρ = 0.8, the Bayesian estimator delivers smaller RMSEs. For Σ , one estimator Z0 does not consistently perform better than the other across different cases. For Σ and Σ , the η ν RMSEs of the Bayesian estimator are consistently smaller, sometimes by a large amount. 3.1 Robustness to Model Misspecification WhiletheBayesianestimatorperformswellinsmallsamplesonaproperlyspecifiedmodel, how do its small sample properties compare to the GMM estimator when the model is misspecified? We address this question with two exercises. First, we estimate the model given by equations (1) and (2) assuming that ρ = 1, when the true data generating process has ρ = 0.95. As displayed in Table 3, the Bayesian estimator has significantly lower RMSEs for both Σ (0.005 vs. 0.014) and η Σ (0.004 vs. 0.023) and a modestly lower RMSE for Σ (0.031 vs. 0.033). ν Z0 Second, we estimate the same model assuming errors are normally distributed, when they are in fact distributed according to a mixture of normals, using the data generating process detailed in Section 4.4. As shown in Table 4, the Bayesian and GMM estimators perform similarly, with the Bayesian estimator resulting in moderately smaller RMSEs on Σ (0.0021 vs. 0.0026), Σ (0.0026 η ν vs. 0.0042), and Σ (0.0127 vs. 0.0135), and an identical RMSE for ρ (0.0048). Z0 Thus, even though the Bayesian estimator heavily relies on parametric assumptions, it can still have as good or better small sample properties when the model is misspecified. 4 Robustness to Extended Labor Income Processes The previous section showed that the Bayesian estimator outperforms the GMM estimator in finite samples on a simple labor income process. However, when using real data, researchers often need to modify the simple process to address unique features of the dataset or to incorporate additional features into the model. Since some of these modifications are quite common or even necessary, it would be useful to know how robust are the findings based on the simple labor income process. Accordingly, this section considers four such modifications to the baseline earnings model and documents the performance of the Bayesian estimator in each setting. The modifications we consider are (1) unbalanced panel with missing data, (2) time-varying volatilities, (3) heterogeneous parameters, (4) non-normal shocks. Adjusting for an unbalanced panel with missing data is necessary whenever one estimates the model using real data. Time-varying volatilities are particularly common in the literature, regardless of the estimation methods used. Recent literature has documented evidence suggesting substantial heterogeneity in income process parameters across individuals. While non-normality is often ignored in GMM estimation, previous studies using Bayesian methods often estimated the income process allowing for non-normality and found significant improvements in model fit. Of course, there are many other interesting extensions to the labor income process. These 11

include a richer modeling of volatility evolution (such as GARCH or stochastic volatility), time or agevariationinotherparameters(suchastheautoregressivecoefficient),andaricherdecomposition of labor income (such as ARMA(p,q)).22 There has also been a growing interest in allowing for different dimensions of heterogeneity, in income profiles, initial levels, autoregressive coefficients, variances of idiosyncratic shocks, and more.23 Most of these extensions can be handled within the LGSS structure and can thus be estimated given the methods previously discussed. However, our purpose in this section is not to be comprehensive, but to investigate how some common complicationsoftheincomeprocessesmayaffecttherelativeperformanceoftheBayesianestimator. Thus, we focus on the aforementioned four extensions. 4.1 Data Structure Most panel datasets used for estimating labor income processes are unbalanced. That is, the dataset contains different cohorts and does not necessarily measure the initial labor market experiences of all cohorts. They are also known to contain many missing observations. This can be due to a variety of reasons, such as misreporting, attrition of the participants, and top-coding. To examine how the Bayesian estimator performs on an unbalanced panel dataset with missing observations, we modify the baseline data generating process in two ways. First, we assign a 5 percent probability of being missing to each of the T*N observations. Second, we let a fraction (30 percent)ofthepopulationenterthedatasetwhentheyareage4andleaveatage10tomakethedata unbalanced. As in the benchmark calibration of the previous simple labor income process, we set N=500 and T=10, and let the true parameter values be (ρ,Σ ,Σ ,Σ ) = (1.00,0.02,0.05,0.15). η ν Z0 The Gibbs sampling algorithm needs to be modified to take missing data into account, but the estimation of a LGSS model with missing observations has been well studied and the modification is straightforward. The details on how to modify the Gibbs sampling procedure are given in Appendix A.3. Prior distributions are specified in the same way as in the baseline labor income process. Table5comparestheBayesianestimatorandtheGMMestimatorwiththisdatastructure. The Bayesian estimator continues to perform better than the GMM estimator. While the RMSEs for Σ are roughly equal across the two estimators (0.0147 for the Bayesian estimator and 0.0157 for Z0 the GMM estimator), the RMSEs associated with the Bayesian estimator are substantially smaller than those from the GMM estimator for ρ, Σ , and Σ . As in the baseline exercise, the smaller η ν RMSEs of the Bayesian estimator comes from its smaller bias and smaller standard deviation. For all parameters except ρ, the mean of the Bayesian estimator is closer to the true parameter values than that of the GMM estimator. For all parameters, the standard deviation of the Bayesian estimator is smaller. The overall outperformance of the Bayesian estimator remains robust to different probabilities of missing observations and different fractions of the second cohort, as well as to alternative values 22It is important to properly model measurement error, as documented by French (2004). 23See Browning, Hansen, and Heckman (1999) and Jensen and Shore (2010) for a discussion. 12

of N, T, and ρ. These robustness results are available from the authors upon request. 4.2 Time Varying Variances There is ample evidence that the variances of labor income shocks are time-varying. Many authors have documented a significant low frequency rise in the variances of income shocks over time. For example, Blundell, Pistaferri, and Preston (2008) and Gottschalk and Moffitt (1994) analyze the PSID and find increasing variances of shocks during the 1980s. Heathcote, Perri, and Violante (2010) conduct extensive analysis on wage inequality and show the continuous rise in income volatility over the past 40 years. Others have found that there is a fluctuation in shock volatility at the business cycle frequency. For example, using the PSID, Storesletten, Telmer, and Yaron (2004) found that the variance of income shocks is larger during recessions than during booms. Understanding the evolution of volatility in the labor income process is important in itself, asitallowsustounderstandthechangeintheriskenvironmentfacingworkers. Itisalsoimportant in analyzing many macroeconomic topics, such as the welfare cost of business cycles or the degree of self-insurance and market incompleteness. While there are many different ways to model time variation in the variances, the majority of the literature models heteroskedasticity by treating the variance at each time as a different parameter.24 We follow the literature and modify the baseline labor income process so that each variance parameter is specific to the time period. That is, our new data generating process is described by (cid:15) = ρ(cid:15) +Σ0.5η (5) i,t i,t−1 η,t i,t y = (cid:15) +Σ0.5ν (6) i,t i,t ν,t i,t where Σ , Σ are the variances of the transitory and persistent innovations at time t, respec- ν,t η,t tively. We focus on the case with T=10 and calibrate the variances according to the estimates of Blundell, Pistaferri, and Preston (2008).25 As before, we set N=500 and let the true parameter values for the rest of the parameters be (ρ,Σ ) = (1.00,0.15). Z0 The linear Gaussian state space structure is designed to allow parameters to vary over time. The only critical modification in the estimation routine is to feed the correct parameter values into theKalmanfilterattherighttimes. TomapournewprocessintoaLGSS,weneedtodefineA , B , t t C , D , Q , R for all t. We set A = ρ, B = 1, C = 0, D = 1 ∀ t, keeping these matrices constant t t t t t t t t over time as in the benchmark simple labor income process. We set Q = Σ ,...,Q = Σ and 1 η,1 T η,T R = Σ ,...,R = Σ . Estimation then proceeds as in the stationary variances case. Prior 1 ν,1 T ν,T distributions for ρ and Σ are specified in the same way as in the baseline labor income process. Z0 24Asanotablealternative,MeghirandPistaferri(2004)modeltheevolutionofthevarianceasaGARCHprocess. 25Since the timing convention in our model is slightly different from that of Blundell, Pistaferri, and Preston (2008),Σ andΣ arenotseparatelyidentifiedfromΣ andΣ ,respectively. Thus,wesetΣ andΣ to η,1 η,10 Z0 ν,10 η,1 η,10 their true values in the estimation. 13

For the prior distributions of Σ and Σ , we use the inverse-gamma distribution with degree of η,t ν,t freedom 2 and location parameter equal to 0.01 for all t. Table 6 compares the performance of the two estimators. As in the previous cases, the Bayesian estimator tends to deliver smaller RMSEs than the GMM estimator. For the time specific shock variances, the Bayesian estimator delivers substantially—about 30-50 percent—smaller RMSEs than the GMM estimator, except for Σ and Σ where the differences are small between two η,2 ν,1 estimators. The outperformance of the Bayesian estimator comes from both smaller bias and standard deviation. For the Σ and Σ as well as ρ and Σ , the RMSEs of the two estimators η,2 ν,1 Z0 are roughly the same. As before, these results are robust to alternative parameterizations of the data generating process, and the robustness results are available from the authors upon request. For both estimators, the estimates of each variance parameter are not as precise as in the benchmark case. As an example, consider the RMSEs of Σ in the benchmark DGP and Σ in ν ν,10 the DGP with time-varying variance. The true parameter values (0.05 and 0.0506) are roughly the same. However, for both the Bayesian and GMM estimators, the RMSEs of Σ are about three ν,10 times larger than those of Σ . The reduction in the precision of the estimators is not surprising ν because we are estimating a larger number of parameters using the same number of observations. Nevertheless, both estimators remain sufficiently precise to closely track the rise and fall of the variances over time. 4.3 Heterogeneity Understanding the heterogeneity in income process parameters is essential for answering questions with distributional aspects—across individuals and over the life cycle. Thus, many authors have estimated labor income processes allowing for heterogeneity in parameters across individuals. Lillard and Weiss (1979), Baker (1997), and Guvenen (2009) studied labor income processes in which the deterministic growth rate of income is heterogeneous across people. Jensen and Shore (2010, 2011) estimated models with heterogeneity in the variances of shocks to income processes. Browning, Ejrnaes, and Alvarez (2010) went further and estimated labor income processes where all parameters are allowed to be heterogeneous across individuals. While many authors have stayed within the framework of the method-of-moment estimator to handle heterogeneity, some have employed Bayesian methods. To obtain some idea about the performance of the Bayesian estimator in the presence of heterogeneity, we modify our benchmark labor income process to introduce heterogeneity in one parameter—the deterministic growth rate of labor income. This modified process is often referred to as the labor income process with heterogeneous income profiles (HIP), and is specified as follows: (cid:15) = ρ(cid:15) +Σ0.5η (7) i,t i,t−1 η i,t y = β t+(cid:15) +Σ0.5ν (8) i,t i i,t ν i,t 14

whereβ isnormallydistributedwithmeanzeroandvarianceΣ . Ourbenchmarklaborincome i β process is a special case of this process with Σ = 0. To estimate the parameters of this model, β the Gibbs sampler needs to be modified to include two additional blocks—one for drawing Σ β and the other for drawing {β }N . Appendix A.4 shows the details. We calibrate Σ based on i i=1 β the estimate from Guvenen (2009). As before, we set N=500 and T=10, and calibrate the other parameters to be (ρ,Σ ,Σ ,Σ ) = (1.00,0.02,0.05,0.15). For the prior distribution of Σ , we use η ν Z0 β the inverse-gamma distribution with degree of freedom 2 and location parameter equal 0.01. Prior distributions of the other parameters are the same as in the baseline labor income process. Table 7 shows the results of the Monte Carlo experiment. The Bayesian estimator performs favorablycomparedtotheGMMestimatorinthemodelwithheterogeneousincomeprofiles. While both estimates are somewhat larger (0.0006) than the actual value (0.0004) for Σ , the Bayesian β estimator is more efficient and has a lower RMSE. Estimating this additional parameter does not altertherelativeperformanceofthetwoestimatorsforotherparameters. Inparticular,eventhough the Bayesian estimator’s RMSE for ρ becomes somewhat larger than the GMM counterpart, the Bayesian estimator continues to dominate the GMM estimator for all of the variance parameters. 4.4 Non-Normal Shocks There is some evidence that shocks to the labor income process are not normally distributed. Hirano(1998)andGewekeandKeane(2002)bothfindthatlaborincomeshocksarefat-tailed. Capturing such non-normality is important in itself, but is also important for understanding earnings mobility and the persistence of poverty. Even if one is not interested in the higher moments of the shock distribution, modeling nonnormality is important as misspecification of the error distribution can reduce the efficiency of the estimator. The main efficiency gains in MLE arise because the estimator takes advantage of the parametric form of the shock distributions. The choice between GMM and likelihood-based estimation can then be viewed as a trade-off between smaller standard errors vs. misspecified distributions. To take advantage of increased efficiency, but to limit the possible damage from assuming the wrong distribution, the aforementioned authors allow for shocks to be distributed according to a mixture of normals. Mixtures of normal distributions are very flexible, allowing for a more robust estimation procedure. Models with few mixture components have been shown to perform well in small samples and can be viewed as a flexible parametric model that is an alternative to non-parametric density estimation without the associated curse of dimensionality.26 In the following Monte Carlo analysis, we follow the previous papers and modify the baseline DGP so that the shocks are distributed according to a mixture of normal distributions. For simplicity of exposition, we use a mixture of two normal distributions. We also abstract from skewness by fixing the means of the normal distributions to be zero. Thus, our new data generating process 26Norets (2010); Norets and Pelenis (forthcoming, 2012) document the theory and practice of approximating continuous distributions using a finite number of mixtures of normal distributions. They show “that large classes of conditionaldensitiescanbeapproximatedintheKullback-Leiblerdistancebydifferentspecificationsoffinitesmooth mixtures of normal densities or regressions.” 15

is described as (cid:15) = ρ(cid:15) +η (9) i,t i,t−1 i,t y = (cid:15) +ν (10) i,t i,t i,t With probability π , η is drawn from a normal distribution with variance h . With prob- η,1 i,t η,1 ability π = (1 − π ), η is drawn from a normal distribution with variance h . Similarly, η,2 η,1 i,t η,2 ν is drawn from a normal distribution with variance h with probability π , and ν is drawn i,t η,1 ν,1 i,t from a normal distribution with variance h with probability π = (1−π ). Notice that we ν,2 ν,2 ν,1 have increased the number of parameters to describe the distribution of η and ν. In the benchmark income process, we only needed to estimate four parameters, (ρ,Σ ,Σ ,Σ ). Here, we need to Z0 η ν estimate eight parameters, (ρ,Σ ,π , h , h , π , h , h ). The added distributional flexi- Z0 η,1 η,1 η,2 ν,1 ν,1 ν,2 bility comes at the cost of additional parameters to estimate. With sufficient sample size, the gains should outweigh the losses. We calibrate the parameters of the mixture distribution so that both transitory and persistent shocks are fat-tailed (positive excess kurtosis). We set N=500 and T=10, and calibrate the rest of the parameters to be (ρ,Σ ) = (1.00,0.15). The details on how to modify the Gibbs sampling Z0 algorithm to handle non-normality is given in Appendix A.5. Prior distributions for ρ and Σ are Z0 specified in the same way as in the baseline labor income process. Prior distributions of parameters specific to the mixture normal model is discussed in Appendix A.5. Since it is not common to estimate kurtosis in the GMM framework, we simply apply the unmodified GMM estimator. Table8comparesthefinitesamplepropertiesoftheBayesianestimatorandtheGMMestimator. For the Bayesian estimator, in addition to the estimates for all parameters, we report the second and fourth moments of the shock distribution implied by the point estimates of the parameters governing the mixture distribution. To concisely convey the results, we report the average of the point estimates across 100 datasets for the parameters governing the mixture. Even though the Bayesian estimator is now estimating a larger number of parameters, its performance remains almost as good as in the benchmark case. The RMSEs of the Bayesian estimator are roughly the same as those in the benchmark labor income process, except for Σ . η The RMSEs of the Bayesian estimators remain smaller than those of the GMM estimator for the estimated second moments of η, ν, and the initial state (cid:15) . The Bayesian estimator also performs i,0 wellinestimatingtheparametersgoverningthemixturedistribution. Forparametersgoverningthe distributionofν,themeanoftheestimatorcomesveryclosetothetruevalues,andthus,theimplied kurtosis estimate is very close to the true kurtosis. For parameters governing the distribution of η, the means of the estimators are not as close to the true values as in the case for ν. For example, the estimator overstates the probability that ν is drawn from the distribution with larger variance by 4.4 percentage points. As a result, the implied kurtosis estimate is slightly downward biased, but its average estimate (8.29) is reasonably close to the true value of 8.97. For both η and ν, 16

the standard deviation and thus RMSEs for the kurtosis estimates are larger compared to other parameters. This is not surprising given that they are identified from the realizations of very large infrequent shocks. Estimating the model with mixtures of normal distributions is substantially more computationallyintensivethanestimatingthemodelsconsideredearlier. Thus, wearelimitedinexamininghow robust these results are to alternative specifications of the parameters and priors. Nevertheless, our Monte Carlo exercise suggests that the Bayesian estimator is useful in estimating the labor income processes even when the errors are not normally distributed. 5 Conclusion In this paper, we have examined the validity of performing Bayesian estimation on widely used error component models of labor income processes. Although the differences are not large, the Monte Carlo analysis reveals that Bayesian parameter estimates have favorable efficiency and bias properties relative to GMM in small samples. After accounting for documented features of the data—such as missing values, heteroskedasticity, parameter heterogeneity, and non-normal errors—the beneficial small sample properties of the Bayesian estimator remain. The favorable smallsamplepropertiesprovideajustificationforemployingBayesiantechniquesinfutureresearch. The detailed derivation of the estimators provides a clear exposition of the estimation routines and a better understanding of the source of the estimation results. It is also designed to help applied economists develop Bayesian estimation routines for related labor income processes. Our analysis of the extensive Monte Carlo experiments suggests that Bayesian methods are appropriate for the relevant types of processes and data typically used by applied labor economists. 17

References Abowd, J. M., and D. Card (1989): “On the Covariance Structure of Earnings and Hours Changes,” Econometrica, 57(2), 411–445. Altonji, J. G., and L. Segal(1996): “Small-SampleBiasinGMMEstimationofCovarianceStructures,” Journal of Business and Economic Statistics, 14(3), 353–366. Baker, M. (1997): “Growth-Rate Heterogeneity and the Covariance Structure of Life-Cycle Earnings,” Journal of Labor Economics, 15(2), 338–375. Blundell, R., L. Pistaferri, and I. Preston(2008): “ConsumptionInequalityandPartialInsurance,” American Economic Review, 98:5, 1887–1921. Browning, M., M. Ejrnaes, and J. Alvarez (2010): “Modelling Income Processes With Lots of Heterogeneity,” Review of Economic Studies, 77(4), 1353–1381. Browning, M., L. P. Hansen, and J. J. Heckman (1999): Micro data and general equilibrium modelsvol. 1 of Handbook of Macroeconomics, chap. 8, pp. 543–633. Elsevier, 1 edn. Carter, C. K., and R. Kohn (1994): “On Gibbs Sampling for State Space Models,” Biometrika, 81(3), 541–553. Clark, T. E. (1996): “Small-Sample Properties of Estimators of Nonlinear Models of Covariance Structure,” Journal of Business and Economic Statistics, 14(3), 367–373. Durbin, J., and S. J. Koopman (2002): “A Simple and Efficient Simulation Smoother for State Space Time Series Analysis,” Biometrika, 89(3), 603–616. French, E. (2004): “The Labor Supply Response to (Mismeasured but) Predictable Wage Changes,” The Review of Economics and Statistics, 86(2), 602–613. Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2009): Bayesian Data Analysis. Chapman & Hall. Geweke, J. (2005): Contemporary Bayesian Econometrics and Statistics. Wiley Series in Probability and Statistics. (2007): “Interpretation and Inference in Mixture Models: Simple MCMC Works,” Computational Statistics & Data Analysis, 51(7), 3529–3550. Geweke, J., and M. Keane (2000): “An Empirical Analysis of Earnings Dynamics Among Men in the PSID: 1968-1989,” Journal of Econometrics, 96, 293–356. Gottschalk, P., and R. A. Moffitt (1994): “The Growth of Earnings Instability in the U.S. Labor Market,” Brookings Papers on Economic Activity, 25(2), 217–272. Guvenen, F. (2009): “An empirical investigation of labor income processes,” Review of Economic Dynamics, 12(1), 58–79. Hamilton, J. D. (1994): Time Series Analysis. Princeton University Press, 1 edn. Heathcote, J., F. Perri, and G. L. Violante (2010): “Unequal We Stand: An Empirical Analysis of Economic Inequality in the United States 1967-2006,” Review of Economic Dynamics, 13(1), 15–51. Hirano, K. (1998): “A Semiparametric Model for Labor Earnings Dynamics,” in Practical Nonparametric and Semiparametric Bayesian Statistics,ed.byD.D.Dey,P.Muller,andD.Sinha,chap.20,pp.355–367. Springer. 18

Jacquier, E., N. G. Polson, and P. E. Rossi (1994): “Bayesian Analysis of Stochastic Volatility Models,” Journal of Business and Economic Statistics, 12(4), 371–389. Jensen, S. T., and S. H. Shore (2010): “Changes in the Distribution of Income Volatility,” mimeo. (2011): “Semi-Parametric Bayesian Modelling of Income Volatility Heterogeneity,” Journal of the American Statistical Association, 106, 1280–1290. Lillard, L. A., and Y. Weiss (1979): “Components of Variation in Panel Earnings Data: American Scientists 1960-1970,” Econometrica, 47(2), 437–454. Lillard, L. A., and R. J. Willis (1978): “Dynamic Aspects of Earning Mobility,” Econometrica, 46(5), 985–1012. MaCurdy, T. E. (1982): “The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis,” Journal of Econometrics, 18(1), 83–114. Meghir, C., and L. Pistaferri (2004): “Income Variance Dynamics and Heterogeneity,” Econometrica, 72(1), 1–32. Norets, A. (2010): “Approximation of Conditional Densities by Smooth Mixtures of Regressions,” Annals of Statistics, 38(3), 1733–1766. Norets, A., and J. Pelenis (2012): “Bayesian modeling of joint and conditional distributions,” Journal of Econometrics, 168(2), 332–346. (forthcoming): “Posterior Consistency in Conditional Density Estimation by Covariate Dependent Mixtures,” Economic Theory. Pesaran, H. (2006): “Estimation and Inference in Large Heterogeneous Panels with Multifactor Error Structure,” Econometrica, 74(4), 967–1012. Robert, C. P., and G. Casella (2004): Monte Carlo Statistical Methods, Springer Texts in Statistics. Springer, 2nd edn. Storesletten, K., C. I. Telmer, and A. Yaron (2004): “Cyclical Dynamics in Idiosyncratic Labor Market Risk,” Journal of Political Economy, 112(3), 695–717. 19

Tables Table 1: Finite-Sample Properties of the Bayesian and GMM Estimators: Benchmark Labor Income Process (N=500, T=10) Parameter True Value Bayesian GMM MLE ρ 1.0000 0.9953 0.9963 0.9995 (0.0032) (0.0047) (0.0037) [0.0057] [0.0060] [0.0047] Σ 0.1500 0.1532 0.1543 0.1535 Z0 (0.0112) (0.0117) (0.0110) [0.0116] [0.0125] [0.0115] Σ 0.0200 0.0204 0.0210 0.0201 η (0.0016) (0.0020) (0.0015) [0.0016] [0.0023] [0.0015] Σ 0.0500 0.0494 0.0481 0.0499 ν (0.0015) (0.0030) (0.0015) [0.0016] [0.0035] [0.0015] *Foreachparameterandeachestimator,thetopnumberisthemeanoftheestimatesacrosssamples,thenumberin parentheses is the standard deviation, and the number in square brackets is the root mean square error. Table 2: RMSEs of the Bayesian and GMM Estimators for Alternative Calibrations N=500, T=5 N=500, T=20 Parameter Bayesian GMM Bayesian GMM ρ 0.0121 0.0122 0.0028 0.0029 Σ 0.0138 0.0133 0.0141 0.0152 Z0 Σ 0.0032 0.0050 0.0010 0.0017 η Σ 0.0028 0.0048 0.0011 0.0042 ν N=100, T=10 N=2000, T=10 Parameter Bayesian GMM Bayesian GMM ρ 0.0116 0.0116 0.0025 0.0024 Σ 0.0305 0.0322 0.0063 0.0065 Z0 Σ 0.0035 0.0059 0.0007 0.0012 η Σ 0.0037 0.0092 0.0008 0.0019 ν ρ = 0.8 (N=500, T=10) Parameter Bayesian GMM ρ 0.0140 0.0171 Σ 0.0176 0.0165 Z0 Σ 0.0021 0.0025 η Σ 0.0019 0.0025 ν 20

Table 3: Finite-Sample Properties of the Bayesian and GMM Estimators: ρ-misspecified Model (N=500, T=10) Parameter True Value Bayesian GMM Σ 0.1500 0.1203 0.1183 Z0 (0.0088) (0.0092) [0.0310] [0.0326] Σ 0.0200 0.0155 0.0055 η (0.0012) (0.0014) [0.0047] [0.0144] Σ 0.0500 0.0537 0.0729 ν (0.0017) (0.0029) [0.0041] [0.0230] *The model defined by equations (1) and (2) is estimated assuming ρ = 1, but the true data generating process is givenbyequations(1)and(2)withρ=0.95. Foreachparameterandeachestimator,thetopnumberisthemeanof theestimatesacrosssamples,thenumberinparenthesesisthestandarddeviation,andthenumberinsquarebrackets is the root mean square error. Table 4: Finite-Sample Properties of the Bayesian and GMM Estimators: Mixture-misspecified Model (N=500, T=10) Parameter True Value Bayesian GMM ρ 1.0000 0.996 0.9998 (0.002) (0.0039) [0.005] [0.0048] Σ 0.1500 0.149 0.1506 Z0 (0.013) (0.0135) [0.013] [0.0135] Σ 0.0200 0.021 0.0213 η (0.002) (0.0023) [0.002] [0.0026] Σ 0.0500 0.050 0.0476 ν (0.003) (0.0035) [0.003] [0.0042] *The model defined by equations (1) and (2) is estimated assuming that the error terms are distributed according to Normal distributions, but the true data generating process is given by equations (1) and (2) with the error terms distributedasmixturesofNormaldistributions,asoutlineinSection4.4. Thus,the“True”variancesaretheimplied true variances of η and ν given the Mixture DGP. For each parameter and each estimator, the top number is the meanoftheestimatesacrosssamples,thenumberinparenthesesisthestandarddeviation,andthenumberinsquare brackets is the root mean square error. 21

Table 5: Finite-Sample Properties of the Bayesian and GMM Estimators: Unbalanced Panel Dataset with Missing Observations (N=500, T=10) Parameter True Value Bayesian GMM ρ 1.0000 0.9952 0.9955 (0.0025) (0.0056) [0.0055] [0.0072] Σ 0.1500 0.1551 0.1551 Z0 (0.0139) (0.0148) [0.0147] [0.0157] Σ 0.0200 0.0205 0.0213 η (0.0016) (0.0033) [0.0017] [0.0036] Σ 0.0500 0.0493 0.0478 ν (0.0017) (0.0036) [0.0018] [0.0043] *Foreachparameterandeachestimator,thetopnumberisthemeanoftheestimatesacrosssamples,thenumberin parentheses is the standard deviation, and the number in square brackets is the root mean square error. 22

Table 6: Finite-Sample Properties of the Bayesian and GMM Estimators: Time-Varying Variances (N=500, T=10) Mean RMSE Parameter True Value Bayesian GMM Bayesian GMM ρ 1.0000 0.9954 0.9970 0.0055 0.0050 Σ 0.1500 0.1501 0.1497 0.0120 0.0121 Z0 Σ 0.0207 0.0207 0.0213 0.0070 0.0068 η2 Σ 0.0301 0.0317 0.0325 0.0052 0.0081 η3 Σ 0.0274 0.0281 0.0282 0.0047 0.0063 η4 Σ 0.0293 0.0292 0.0284 0.0050 0.0077 η5 Σ 0.0222 0.0225 0.0240 0.0042 0.0074 η6 Σ 0.0289 0.0290 0.0288 0.0046 0.0067 η7 Σ 0.0157 0.0159 0.0181 0.0052 0.0080 η8 Σ 0.0185 0.0186 0.0202 0.0052 0.0081 η9 Σ 0.0415 0.0412 0.0399 0.0066 0.0070 ν1 Σ 0.0318 0.0312 0.0300 0.0038 0.0067 ν2 Σ 0.0372 0.0360 0.0351 0.0046 0.0070 ν3 Σ 0.0286 0.0281 0.0275 0.0036 0.0062 ν4 Σ 0.0286 0.0283 0.0265 0.0037 0.0067 ν5 Σ 0.0351 0.0348 0.0338 0.0040 0.0070 ν6 Σ 0.0380 0.0375 0.0354 0.0039 0.0072 ν7 Σ 0.0544 0.0535 0.0520 0.0054 0.0087 ν8 Σ 0.0369 0.0363 0.0354 0.0045 0.0087 ν9 Σ 0.0506 0.0501 0.0509 0.0051 0.0096 ν10 23

Table 7: Finite-Sample Properties of the Bayesian and GMM Estimators: Labor Income Process with HIP (N=500, T=10) Parameter True Value Bayesian GMM ρ 1.0000 0.9936 0.9960 (0.0037) (0.0040) [0.0074] [0.0062] Σ 0.1500 0.1530 0.1538 Z0 (0.0121) (0.0144) [0.0124] [0.0149] Σ 0.0200 0.0197 0.0195 η (0.0021) (0.0045) [0.0021] [0.0045] Σ 0.0500 0.0498 0.0493 ν (0.0019) (0.0041) [0.0019] [0.0041] Σ 0.0004 0.0006 0.0006 β (0.0003) (0.0005) [0.0003] [0.0005] *Foreachparameterandeachestimator,thetopnumberisthemeanoftheestimatesacrosssamples,thenumberin parentheses is the standard deviation, and the number in square brackets is the root mean square error. 24

Table 8: Finite-Sample Properties of the Bayesian and GMM Estimators: Non-Normal Error Distributions (N=500, T=10) Parameter True Value Bayesian GMM ρ 1.0000 0.9960 0.9987 (0.0023) (0.0036) [0.0047] [0.0045] Σ 0.1500 0.1490 0.1492 Z0 (0.0115) (0.0135) [0.0115] [0.0135] Σ 0.0200 0.0208 0.0214 η (0.0014) (0.0024) [0.0016] [0.0028] Kurtosis of η 8.9743 8.2942 na (1.1551) (na) [1.3354] [na] h 0.0059 0.0054 na η,1 h 0.0766 0.0721 na η,2 π 0.8000 0.7560 na η,1 π 0.2000 0.2440 na η,2 Σ 0.0500 0.0492 0.0479 ν (0.0023) (0.0035) [0.0025] [0.0042] Kurtosis of ν 9.0112 9.1119 na (0.5790) (na) [0.5848] [na] h 0.0146 0.0141 na ν,1 h 0.1914 0.1902 na ν,2 π 0.8000 0.7989 na ν,1 π 0.2000 0.2011 na ν,2 *Foreachparameterandeachestimator,thetopnumberisthemeanoftheestimatesacrosssamples,thenumberin parentheses is the standard deviation, and the number in square brackets is the root mean square error. 25

A Online Technical Appendix A.1 Log Likelihood of LGSS Define the canonical linear Gaussian state space (LGSS) form. S = A S +B v t t t−1 t t Y = C z +D S +w t t t t t t Then the log likelihood of parameter vector θ =(A ,B ,C ,D ,Q ,R ,S ,P ) is given by: t t t t t t 0|0 0|0 lnp(YT|θ) T (cid:89) =ln p(Y |Yt−1,θ) t t=1 T (cid:88) = lnp(Y |Yt−1,θ) t t=1 T (cid:88) = ln[(2π)−0.5kdet[P−0.5]exp[−0.5(Y −C z −D S )(cid:48)P−1 (Y −C z −D S )]] t|t−1 t t t t t|t−1 t|t−1 t t t t t|t−1 t=1 T T (cid:88) (cid:88) = ln[(2π)−0.5kdetP−0.5]+ ln[exp[−0.5(Y −C z −D S )(cid:48)P−1 (Y −C z −D S )]] t|t−1 t t t t t|t−1 t|t−1 t t t t t|t−1 t=1 t=1 T T (cid:88) (cid:88) =−0.5Tkln(2π)−0.5 ln[detP ]+ [−0.5(Y −C z −D S )(cid:48)P−1 (Y −C z −D S )] t|t−1 t t t t t|t−1 t|t−1 t t t t t|t−1 t=1 t=1 A.2 Bayesian Estimation Procedure The labor income process we want to estimate is: y = X β+(cid:15) +Σ0.5ν i,t i,t i,t ν i,t (cid:15) = ρ(cid:15) +Σ0.5η i,t i,t−1 η i,t for i = 1,...,N and t = 1,...,T. η and ν are independent across i and t, and are distributed as i,t i,t standard normal: η ∼N(0,1) i,t and ν ∼N(0,1) i,t (cid:15) is also distributed as normal: i,0 (cid:15) ∼N(0,Σ ) i,0 Z0 X consists of a polynomial of age, race, education, urban residence controls, etc. i,t (cid:16) (cid:17) Our goal is to draw from p β,ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T|{y }t=1,T . η ν Z0 i,t i=1,N i,t i=1,N 26

A.2.1 Prior Distribution We specify the prior distributions for β and ρ to be normal with arbitrarily large variances, truncating ρ∈[0,1] to ensure stationarity. β ∼ N(β ,Γ ) 0 0,β ρ ∼ N(ρ ,Γ )I[−1≤ρ≤1] 0 0,ρ We specify the prior distributions for Σ , Σ , and Σ to be inverse-gamma. η ν Z0 Σ ∼ IG (cid:0) T ,S−1(cid:1) η 0,η 0,η Σ ∼ IG (cid:0) T ,S−1(cid:1) ν 0,ν 0,ν (cid:16) (cid:17) Σ ∼ IG T ,S−1 Z0 0,Z0 0,Z0 A.2.2 Gibbs Sampling We use Gibbs sampling with the following 6 blocks. (cid:16) (cid:17) Block 1 : p β|ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T η ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 2 : p ρ|β,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T η ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 3 : p Σ |β,ρ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T η ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 4 : p Σ |β,ρ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν η Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 5 : p Σ |β,ρ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T Z0 η ν i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 6 : p {(cid:15) }t=0,T|β,ρ,Σ ,Σ ,Σ ,{y }t=1,T i,t i=1,N η ν Z0 i,t i=1,N Throughout Block 1 to Block 4, the formulae are the same as in the standard case with one person, except that we have T*N observations, instead of T observations. Block 1: β It is identical to the problem of drawing from the posterior of coefficients in a linear regression model. The posterior distribution is normal, and its mean and variance are analytically computed (see Chapter 2 of Geweke (2005)). (cid:16) (cid:17) p β|ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T ∼N(µ ,Γ ) η ν Z0 i,t i=1,N i,t i=1,N 1,β 1,β where27 27In the case of an informative prior, the formula is given by 27

(cid:34) N T (cid:35)−1 (cid:88)(cid:88) Γ = Σ X X(cid:48) 1,β ν i,t i,t i=1 t=1 (cid:34) N T (cid:35) µ = Γ 1 (cid:88)(cid:88) X (y −(cid:15) )(cid:48) 1,β 1,β Σ i,t i,t i,t ν i=1 t=1 Block 2: ρ The same as in Block 1, except for truncation (see p.169 of Geweke (2005)). (cid:16) (cid:17) p ρ|β,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T ∼N(µ ,Γ )[−1≤ρ≤1] η ν Z0 i,t i=1,N i,t i=1,N 1,ρ 1,ρ where (cid:34) N T (cid:35)−1 (cid:88)(cid:88) Γ = Σ (cid:15)2 1,ρ η i,t−1 i=1 t=1 (cid:34) N T (cid:35) 1 (cid:88)(cid:88) µ = Γ (cid:15) (cid:15) 1,ρ 1,ρ Σ i,t−1 i,t η i=1 t=1 Block 3: Σ ν p (cid:16) Σ |β,ρ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T (cid:17) ∼IG (cid:0) T ,S−1(cid:1) ν η Z0 i,t i=1,N i,t i=1,N 1,ν 1,ν where T = T +TN 1,ν 0,ν N T S = S + (cid:88)(cid:88) (y −X β−(cid:15) )2 1,ν 0,ν i,t i,t i,t i=1 t=1 (cid:20) 1 (cid:21)−1 Γ = Γ−1+ X(cid:48)X 1 0 σ2 (cid:20) (cid:21) 1 µ = Γ Γ−1µ + X(cid:48)y 1 1 0 0 σ2 As Γ goes to infinity, 0 Γ → σ2(cid:2) X(cid:48)X (cid:3)−1 1 µ → (cid:2) X(cid:48)X (cid:3)−1(cid:2) X(cid:48)y (cid:3) 1 which coincides with β and var(β ). ols ols 28

Block 4: Σ η p (cid:16) Σ |β,ρ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T (cid:17) ∼IG (cid:0) T ,S−1(cid:1) η ν Z0 i,t i=1,N i,t i=1,N 1,η 1,η where T = T +TN 1,η 0,η N T S = S + (cid:88)(cid:88) ((cid:15) −ρ(cid:15) )2 1,η 0,η i,t i,t−1 i=1 t=1 Block 5: Σ Z0 (cid:16) (cid:17) (cid:16) (cid:17) p Σ |β,ρ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T ∼IG T ,S−1 Z0 η ν i,t i=1,N i,t i=1,N 1,Z0 1,Z0 where T = T +N 1,Z0 1,Z0 N (cid:88) S = S + (cid:15)2 1,Z0 0,Z0 i,0 i=1 Block 6: {(cid:15) } t For each individual i, we apply the Carter-Kohn algorithm.28 Step 1: Specify S and P 0|0 0|0 Step 2: Kalman-Filter Starting from S and P , compute {P ,P }T and {S }T using the Kalman Filter. 0|0 0|0 t|t−1 t|t t=1 t|t t=1 S = A S t|t−1 t t−1|t−1 P = A P A(cid:48) +BQ B(cid:48) t|t−1 t t−1|t−1 t t G = D P D(cid:48) +R t t t|t−1 t t K = P D(cid:48)G−1 t t|t−1 t t S = S +K (Y −C z −D S ) t|t t|t−1 t t t t t t|t−1 P = (I −K D )P t|t ns t t t|t−1 where S ≡E[S |Yt] and P ≡Var[S |Yt]. s|t s s|t s Step 3: Carter-Kohn’s backward recursion: At each t=T through 1 going backward, compute S = S +P A(cid:48)P−1 (S −A S ) t−1|t t−1|t−1 t−1|t−1 t t|t−1 t t t−1|t−1 P = P −P A(cid:48)P−1 A P t−1|t t−1|t−1 t−1|t−1 t t|t−1 t t−1|t−1 28For details see Carter and Kohn (1994). 29

(cid:0) (cid:1) based on S , and draw S from N S ,P . t t−1 t−1|t t−1|t A.3 Bayesian Estimation Procedure with Missing Data This section presents how to modify Gibbs Block 6 to handle missing observations. In terms of estimation, unbalancedpaneldatasetscanbeseenascontainingsystematicmissingobservationsanddoesnotintroduce extra complication. The LGSS with missing observations can be written as S = A S +Bv t t t−1 t Y = C z +D S +w t t t t t t where v ∼ N(0,Q ) t t w ∼ N(0,R ) t t (cid:0) (cid:1) S ∼ N S ,P 0 0|0 0|0 When working with missing data, it is convenient to let Y denote a vector containing the incomes of t multiple individuals at time t.29 Note that some elements of Y may be missing, i.e., Y =‘NaN(cid:48). Let n t i,t m,t be the number of missing observations at time t. Let W be a (n −n )−by−n matrix that selects the t y m,t y elements of Y that are not missing. For example, suppose Y is 6-by-1, and the second and fourth elements t t are missing.   3.5  ‘NaN(cid:48)     2.1  Y t =  ‘NaN(cid:48)      3.2  5.9 Then, W is a 4-by-6 matrix with the following structure. t   1 0 0 0 0 0  0 0 1 0 0 0  W t =  0 0 0 0 1 0   0 0 0 0 0 1 so that   3.5  1 0 0 0 0 0   ‘NaN(cid:48)   3.5    Y¯ t ≡W t Y t =    0 0 0 0 1 0 0 0 0 1 0 0       ‘N 2 a .1 N(cid:48)    =    2 3 . . 1 2      0 0 0 0 0 1  3.2  5.9 5.9 When no value is missing, W is a n −by−n identity matrix, and Y¯ =Y . t y y t t Using this variable W , we transform the observation equation for each t to obtain t 29In all other sections, Y refers to the income of a single individual. t 30

S = A S +Bv t t t−1 t W Y = W C z +W D S +W w t t t t t t t t t t Finally, create an indicator variable φ which takes the value 0 when all values of Y are missing, and 1 t t otherwise. If φ =0, set W =I . t t ny A.3.1 Estimation Algorithm To modify Block 6: Gibbs Sampling, we must only modify Step 2. Other steps are unmodified. Step 2: Kalman-Filter Starting from S and P , compute {P ,P }T and {S }T using Kalman Filter. 0|0 0|0 t|t−1 t|t t=1 t|t t=1 S = A S t|t−1 t t−1|t−1 P = A P A(cid:48) +BQ B(cid:48) t|t−1 t t−1|t−1 t t G = W D P D(cid:48)W(cid:48)+W R W(cid:48) t t t t|t−1 t t t t t K = P D(cid:48)W(cid:48)G−1 t t|t−1 t t t S = S +φ K (W Y −W C z −W D S ) t|t t|t−1 t t t t t t t t t t|t−1 P = (I −K W D )P t|t ns t t t t|t−1 where S ≡E[S |Yt] and P ≡Var[S |Yt]. In case where all values of Y are missing, simply set s|t s s|t s t S = A S t|t−1 t t−1|t−1 P = A P A(cid:48) +BQ B(cid:48) t|t−1 t t−1|t−1 t t S = S t|t t|t−1 P = P t|t t|t−1 A.4 Bayesian Estimation Procedure with Heterogeneous Income Profiles The labor income process we want to estimate is: y = X β +(cid:15) +Σ0.5ν i,t i,t i i,t ν i,t (cid:15) = ρ(cid:15) +Σ0.5η i,t i,t−1 η i,t for i = 1,...,N and t = 1,...,T. η and ν are independent across i and t, and are distributed as i,t i,t standard normal. (cid:15) is distributed as normal with variance Σ . For each i, β is distributed as mean zero i,0 Z0 i and variance Σ : β β ∼N(0,Σ ) i β X consists of a polynomial of age, race, education, urban residence controls, etc. We use βN to denote i,t (cid:16) (cid:17) {β }N . Our goal is to draw from p βN,ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T|{y }t=1,T . i i=1 η ν Z0 i,t i=1,N i,t i=1,N 31

A.4.1 Prior Distribution Select the prior distribution for ρ to be normal and the distributions for Σ , Σ , and Σ to be inverse- η ν Z0 gamma. We specify the prior distributions for Σ to be inverse-gamma. β (cid:16) (cid:17) Σ ∼ IG T ,S−1 β 0,β 0,β A.4.2 Gibbs Sampling We use Gibbs sampling with the following 7 blocks. (cid:16) (cid:17) Block 1 : p βN|ρ,Σ ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T β η ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 2 : p Σ |ρ,βN,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T β η ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 3 : p ρ|βN,Σ ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T β η ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 4 : p Σ |βN,ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T η β ν Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 5 : p Σ |βN,ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν β η Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 6 : p Σ |βN,ρ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T Z0 β η ν i,t i=1,N i,t i=1,N (cid:16) (cid:17) Block 7 : p {(cid:15) }t=0,T|βN,Σ ,ρ,Σ ,Σ ,Σ ,{y }t=1,T i,t i=1,N β η ν Z0 i,t i=1,N Modification of Blocks 3–7 are straightforward. We focus on block 1 and 2. Block 1: βN Asinthepreviouscase,theposteriorisnormallydistributed. However,wenowneedtocomputetheposterior mean and variance for each individual. (cid:16) (cid:17) p β |ρ,Σ ,Σ ,Σ ,Σ ,{(cid:15) }t=0,T,{y }t=1,T ∼N(µ ,Γ ) i η ν Z0 β i,t i=1,N i,t i=1,N i,1,β i,1,β where (cid:20) 1 (cid:21)−1 Γ = Σ−1+ X(cid:48)X i,1,β β Σ i i ν (cid:20) (cid:21) 1 µ = Γ X(cid:48)(y −(cid:15) ) i,1,β i,1,β Σ i i i ν Block 2: Σ β (cid:16) (cid:17) (cid:16) (cid:17) p Σ |βN,ρ,Σ ,Σ ,Σ ,{(cid:15) }t=1,T,{y }t=1,T ∼IG T ,S−1 β η ν Z0 i,t i=0,N i,t i=1,N 1,β 1,β where 32

T = T +N 1,β 0,β N S = S + (cid:88) (β )2 1,β 0,β i i=1 A.5 Bayesian Estimation Procedure with a Mixture of Normal Distributions The labor income process we want to estimate is: y = X β+(cid:15) +ν i,t i,t i,t i,t (cid:15) = ρ(cid:15) +η i,t i,t−1 i,t for i=1,...,N and t=1,...,T. η and ν are independent across i and t. They are distributed according to a mixture of m and m i,t i,t η ν number of normals, respectively. η | ∼N (cid:0) 0,h hj(cid:1) i,t sη,i,t=j η η p(s =j)=πj ∀j ∈1,2,...,m η,i,t η η ν | ∼N (cid:0) 0,h hj(cid:1) i,t sν,i,t=j ν ν p(s =j)=πj ∀j ∈1,2,...,m ν,i,t ν ν For the shock k ∈ {η,ν}, h hj is the variance of distribution j, and πj is the probability weight for k k k distribution j.3031 s indicates from which distribution of the mixture the innovation for individual i in k,i,t time t came. As before, (cid:15) is distributed as normal: i,0 (cid:15) ∼N(0,Σ ) i,0 Z0 Our goal is to draw from (cid:16) (cid:17) p β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T|{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N A.5.1 Prior We specify the prior for β and ρ to be normal with arbitrarily large variance. 30Note: handh arenotidentifiedseparately. Sinceweareonlyinterestedinhtimeshj thelackofidentification j is inconsequential. h is added for numerical/computational reasons. See Gelman, Carlin, Stern, and Rubin (2009). 31The likelihood function of the mixture model is invariant to permutation of the components of the mixture. In our application, we simply label the component with a smaller variance as the first component and the component with a larger variance as the second component. For more detailed discussions, see Geweke (2007). 33

β ∼ N(β ,Γ ) 0 0,β ρ ∼ N(ρ ,Γ )I[−1≤ρ≤1] 0 0,ρ We specify the prior for Σ , h , {hj}mν , h , and {hj}mη to be inverse-gamma. Z0 ν ν j=1 η η j=1 (cid:16) (cid:17) Σ ∼ IG T ,S−1 Z0 0,Z0 0,Z0 (cid:16) (cid:17) h ∼ IG T ,S−1 ν hν,0 hν,0 (cid:16) (cid:17) hj ∼ IG T ,S−1 ν hj ν,0 hj ν,0 (cid:16) (cid:17) h ∼ IG T ,S−1 η hη,0 hη,0 (cid:16) (cid:17) hj ∼ IG T ,S−1 η hj η,0 hj η,0 We specify the prior for {πj}mν and {πj}mη to be Dirichlet. ν j=1 η j=1 p (cid:0) {πj}mη (cid:1) ∼D(r,r,...,r) η j=1 p (cid:0) {πj}mν (cid:1) ∼D(r,r,...,r) ν j=1 In the estimation of the model with two normal mixtures in Section 4.4, the true value for h and h η ν are set to one. For prior distributions, we set T = T = T = 2, S = 1, S = S = 0.01, hν,0 h1 ν ,0 h2 ν ,0 hν,0 h1 ν ,0 h2 ν ,0 T =T =T =2, S =1, S =S =0.01. For the parameter of the Dirichlet distribution, hη,0 h1 η ,0 h2 η ,0 hη,0 h1 η ,0 h2 η ,0 we set r =0.5 for both π and π . η ν A.5.2 Estimation Algorithm Gibbs sampling consists of the following 6 main blocks: Block 1: (cid:16) (cid:17) p β|ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 2: (cid:16) (cid:17) p ρ|β,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 3.1: (cid:16) (cid:17) p h |β,ρ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 3.2: (cid:16) (cid:17) p {hj}mν |β,ρ,h ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν j=1 ν ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 3.3: (cid:16) (cid:17) p {πj}mν |β,ρ,h ,{hj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν j=1 ν ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N 34

Block 3.4: (cid:16) (cid:17) p {s }t=1,T|β,ρ,h ,{hj}mν ,{πj}mν ,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν,i,t i=1,N ν ν j=1 ν j=1 η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 4.1: (cid:16) (cid:17) p h |β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T η ν ν j=1 ν j=1 ν,i,t i=1,N η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 4.2: (cid:16) (cid:17) p {hj}mη |β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T η j=1 ν ν j=1 ν j=1 ν,i,t i=1,N η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 4.3: (cid:16) (cid:17) p {πj}mη |β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T η j=1 ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N Block 4.4: (cid:16) (cid:17) p {s }t=1,T|β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,Σ ,{(cid:15) }t=0,T,{y }t=1,T η,i,t i=1,N ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 Z0 i,t i=1,N i,t i=1,N Block 5: (cid:16) (cid:17) p Σ |β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,{(cid:15) }t=0,T,{y }t=1,T Z0 ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N i,t i=1,N i,t i=1,N Block 6: (cid:16) (cid:17) p {(cid:15) }t=0,T|β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{y }t=1,T i,t i=1,N ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N Notation Let (NT)j be the number of i-t pairs that have a ν innovation drawn from the jth distribution in the ν ν mixture. That is, N T (NT)j = (cid:88)(cid:88) δ(s ,j) ν ν,i,t i=1 t=1 where, δ is the Kronecker delta function: δ(a,b)=1 if a=b and δ(a,b)=0 if a(cid:54)=b. Block 1 It is identical to the problem of drawing from the posterior of coefficients in a linear regression model. The posterior distribution is normal, and its mean and variance are analytically computed. (cid:16) (cid:17) p β|ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N ∼N(µ ,Γ ) 1,β 1,β where (cid:34) N T (cid:35)−1 (cid:88)(cid:88) 1 Γ = X X(cid:48) 1,β h hsν,i,t i,t i,t i=1 t=1 ν ν (cid:34) N T (cid:35) µ = Γ (cid:88)(cid:88) 1 X (y −(cid:15) )(cid:48) 1,β 1,β h hsν,i,t i,t i,t i,t i=1 t=1 ν ν 35

Block 2 (cid:16) (cid:17) p ρ|β,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N ∼N(µ ,Γ )[−1≤ρ≤1] 1,ρ 1,ρ where (cid:34) N T (cid:35)−1 (cid:88)(cid:88) 1 Γ = (cid:15)2 1,ρ h hsη,i,t i,t−1 i=1 t=1 η ν (cid:34) N T (cid:35) (cid:88)(cid:88) 1 µ = Γ (cid:15) (cid:15) 1,ρ 1,ρ h hsν,i,t i,t−1 i,t i=1 t=1 ν ν Block 3.1 (cid:16) (cid:17) p h |β,ρ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mν ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) ∼IG S−1 ,T hν,1 hν,1 where T = T +m +NT hν,1 hν,0 ν N T S = S + (cid:88)(cid:88) 1 (y −X β−(cid:15) )2 hν,1 hν,0 hsν,i,t i,t i,t i,t ν i=1 t=1 Block 3.2 (cid:16) (cid:17) p {hj}mν |β,ρ,h ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mν ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν j=1 ν ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N (cid:16) (cid:17) ∼IG S−1 ,T hj ν,1 hj ν,1 where T = Tj +(NT)j hj ν,1 ν,0 ν N T S = S + 1 (cid:88)(cid:88) δ(s ,j)(y −X β−(cid:15) )2 hj ν,1 hj ν,0 h ν ν,i,t i,t i,t i,t i=1 t=1 for all 1≤j ≤m . ν 36

Block 3.3 (cid:16) (cid:17) p {πj}mν |β,ρ,h ,{hj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mν ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν j=1 ν ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N ∼D(b ,b ,...,b ) ν,1 ν,2 ν,mν where b =r+(NT)j ν,j ν for all 1≤j ≤m . ν Block 3.4 (cid:16) (cid:17) p s =j|β,ρ,h ,{hj}mν ,{πj}mν ,h ,{hj}mη ,{πj}mν ,{s }t=1,T,Σ ,{(cid:15) }t=0,T,{y }t=1,T ν,i,t ν ν j=1 ν j=1 η η j=1 η j=1 η,i,t i=1,N Z0 i,t i=1,N i,t i=1,N (cid:34) (cid:35) ∼πj(cid:0) hj(cid:1)−0.5 exp − (y i,t −X i,t β−(cid:15) i,t )2 ν ν 2h hj ν ν for all 1≤t≤T, 1≤i≤N. Block 4 ThisblockisanalogoustoBlock3. Replaceeachν subscriptwithηandy −X β−(cid:15) with(cid:15) −ρ(cid:15) . i,t i,t i,t i,t i,t−1 Block 5 (cid:16) (cid:17) p Σ |β,ρ,h ,{hj}mν ,{πj}mν ,{s }t=1,T,h ,{hj}mη ,{πj}mη ,{s }t=1,T,{(cid:15) }t=0,T,{y }t=1,T Z0 ν ν j=1 ν j=1 ν,i,t i=1,N η η j=1 η j=1 η,i,t i=1,N i,t i=1,N i,t i=1,N (cid:16) (cid:17) ∼IG T ,S−1 1,Z0 1,Z0 where T = T +N 1,Z0 1,Z0 N (cid:88) S = S + (cid:15)2 1,Z0 0,Z0 i,0 i=1 Block 6 For each individual i, we apply the Carter-Kohn algorithm. To cast our labor income processes into our canonical form, A =ρ, B =1, C =β, t t t D =1, Q =h hsη,i,t, R =h hsν,i,t, t t η η t ν ν z =X , S =0, P =Σ t i,t 0|0 0|0 Z0 37

Cite this document

APA

Taisuke Nakata and Christopher Tonetti (2014). Small Sample Properties of Bayesian Estimators of Labor Income Processes (FEDS 2014-25). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2014-25

BibTeX

@techreport{wtfs_feds_2014_25,
  author = {Taisuke Nakata and Christopher Tonetti},
  title = {Small Sample Properties of Bayesian Estimators of Labor Income Processes},
  type = {Finance and Economics Discussion Series},
  number = {2014-25},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2014},
  url = {https://whenthefedspeaks.com/doc/feds_2014-25},
  abstract = {There exists an extensive literature estimating idiosyncratic labor income processes. While a wide variety of models are estimated, GMM estimators are almost always used. We examine the validity of using likelihood based estimation in this context by comparing the small sample properties of a Bayesian estimator to those of GMM. Our baseline studies estimators of a commonly used simple earnings process. We extend our analysis to more complex environments, allowing for real world phenomena such as time varying and heterogeneous parameters, missing data, unbalanced panels, and non-normal errors. The Bayesian estimators are demonstrated to have favorable bias and efficiency properties.},
}