feds · June 30, 2009

What is the Chance that the Equity Premium Varies Over Time? Evidence from Predictive Regressions

Abstract

We examine the evidence on stock return predictability in a Bayesian setting that includes uncertainty about both the existence and strength of predictability. We consider an investor who believes that excess stock returns exhibit predictability with prior probability $q < 1$. In addition, the investor downweights observed predictability by placing a prior distribution on the $R^2$ of the predictability regression. When we apply our analysis to the dividend-price ratio, we find that even investors who are quite skeptical about the existence and strength of predictability sharply modify their views in favor of predictability when confronted by the evidence. We depart from previous model-selection work by treating the regressor as stochastic rather than known; we find that this has a large impact on inference about time-varying expected returns.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Jessica A. Wachter and Missaka Warusawitharana
2009-26

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Jessica A. Wachter (University of Pennsylvania and NBER) and Missaka Warusawitharana (Board of Governors of the Federal Reserve System)∗

April 2, 2009

∗ Wachter: Department of Finance, The Wharton School, University of Pennsylvania, 2300 SH-DH, Philadelphia, PA 19104, jwachter@wharton.upenn.edu, (215) 898-7634. Warusawitharana: Division of Research and Statistics, Board of Governors of the Federal Reserve System, Mail Stop 97, 20th and Constitution Ave., Washington, D.C. 20551, missaka.n.warusawitharana@frb.gov, (202) 452-3461. We are grateful to Sean Campbell, Michael Johannes, Matthew Pritsker, Robert Stambaugh, Stijn van Nieuwerburgh, Jonathan Wright, Moto Yogo, Hao Zhou and seminar participants at the 2008 meetings of the American Finance Association, the 2007 CIRANO Financial Econometrics Conference, the 2007 Winter Meeting of the Econometric Society, the Federal Reserve Board, the University of California at Berkeley, and the Wharton School for helpful comments. We are grateful for financial support from the Aronson+Johnson+Ortiz fellowship through the Rodney L. White Center for Financial Research. This manuscript does not reflect the views of the Board of Governors of the Federal Reserve System.


1 Introduction

This paper investigates the evidence in favor of stock return predictability from a model-selection perspective. Much recent empirical work has focused on the predictive regression

$$r_{t+1} = \alpha + \beta x_t + u_{t+1}, \qquad (1)$$

where $r_{t+1}$ denotes the return on a broad stock index in excess of the risk-free rate, $x_t$ denotes a predictor variable, and $u_{t+1}$ is a noise term. Taking expectations implies that $\alpha + \beta x_t$ is the conditional equity premium. If $\beta$ is not equal to zero, then the equity premium varies over time.

One approach to investigating whether stock returns are predictable involves running an ordinary least squares (OLS) regression on (1) and asking whether the predictive coefficient $\beta$ is significantly different from zero. As emphasized in a simulation study by Kandel and Stambaugh (1996), however, this approach has the disadvantage that classical significance may not be indicative of whether the level of predictability is of economic significance. If $\beta$ is found to be insignificant, or only marginally significant, one cannot conclude that predictability "does not exist" as far as economic agents are concerned.

In this study we adopt a Bayesian approach to inference on (1) that takes model uncertainty as well as parameter uncertainty into account. An investor evaluates the evidence in favor of equation (1) as opposed to a null hypothesis

$$r_{t+1} = \alpha + u_{t+1}. \qquad (2)$$

The investor assigns a prior probability $q$ to a state of the world where (1) describes returns (i.e. the equity premium is time-varying) and thus a prior probability $1-q$ to the state of the world where (2) describes returns (i.e. the equity premium is constant). The investor's beliefs about returns after viewing the data involve assigning a posterior probability to (1), as well as a posterior distribution to the parameters of interest.

Our paper builds on several strands of the recent portfolio allocation literature. One such strand studies properties of Bayesian estimation of predictive regressions (e.g. Barberis (2000), Johannes, Polson, and Stroud (2002), Brandt, Goyal, Santa-Clara, and Stroud (2005), Pastor and Stambaugh (2008), Skoulakis (2007), Stambaugh (1999), Wachter and Warusawitharana (2009)), but assumes that the predictive model is known. A second strand focuses on model uncertainty, but assumes that the parameters within the model are known (e.g. Chen, Ju, and Miao (2009), Maenhout (2006), Hansen (2007)). A third strand allows for both model and parameter uncertainty, but assumes returns are independent and identically distributed (e.g. Chen and Epstein (2002), Garlappi, Uppal, and Wang (2007)).¹ Our paper builds on this work by assuming that the investor faces both parameter and model uncertainty, and considers the possibility that returns are predictable.

Our paper also builds on the literature on return predictability and model selection (Pesaran and Timmermann (1995), Avramov (2002), Cremers (2002)); these papers make the assumption that the future time path of the regressor is known, an assumption that is frequently satisfied in a standard ordinary least squares regression, but rarely satisfied in a predictive regression. By making use of methods developed in Wachter and Warusawitharana (2009), we are able to formulate and solve the investor's problem when the regressor is stochastic. Our paper therefore incorporates the insights of the frequentist literature on predictive return regressions (e.g. Cavanagh, Elliott, and Stock (1995), Nelson and Kim (1993), Stambaugh (1999), Lewellen (2004), Torous, Valkanov, and Yan (2004), Campbell and Yogo (2006)) into a Bayesian portfolio selection setting. When we apply our methods to predicting returns by the dividend-price ratio, we find that an investor who believes that there is a 20% probability of predictability prior to seeing the data updates to a 65% posterior probability after viewing quarterly postwar data.
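Because posterior odds equal the Bayes factor times the prior odds (equation (27) in Section 2.4), the 20% to 65% update quoted above can be inverted to recover the Bayes factor it implies. The short Python sketch below does this arithmetic. It is our own back-of-the-envelope check, not a computation reported in the paper; the function names are hypothetical, and it treats the rounded 65% as exact.

```python
def posterior_probability(bayes_factor: float, q: float) -> float:
    """Posterior probability of predictability: q_bar = B*q / (B*q + (1 - q))."""
    return bayes_factor * q / (bayes_factor * q + (1.0 - q))


def implied_bayes_factor(q: float, q_bar: float) -> float:
    """Invert the relation above: B = posterior odds / prior odds."""
    return (q_bar / (1.0 - q_bar)) / (q / (1.0 - q))


# Prior probability of 20% and posterior probability of 65%, as quoted in the text.
b = implied_bayes_factor(0.20, 0.65)
print(f"implied Bayes factor: {b:.2f}")
print(f"round trip back to q_bar: {posterior_probability(b, 0.20):.2f}")
```

The implied Bayes factor is 52/7, or roughly 7.4, for whatever value of the prior scale on predictability underlies the quoted figures; a different prior on the $R^2$ would imply a different Bayes factor.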
An advantage of modeling the stochastic process for the regressor is that we are able to compute certainty equivalent returns from exploiting predictability that do not depend on a particular value for the regressor. We find certainty equivalent returns of 1.16% per year when the dividend-price ratio is used as a predictor variable for an investor whose prior probability in favor of predictability is just 20%. For an investor who believes that there is a 50/50 chance of return predictability, certainty equivalent returns are 1.83%.

We also empirically evaluate the effect of using a full Bayes, exact likelihood approach as opposed to the conditional likelihood, and as opposed to empirical Bayes. A common approach to Bayesian inference in a time series setting is to treat the first observation of the predictor variable as a known parameter rather than a draw from the data generating process. However, we find that conditioning on the first observation results in Bayes factors (the ratio of the likelihood of model (1) to that of model (2)) that are substantially smaller than when the initial observation is treated as a draw from the data generating process. The posterior for the unconditional risk premium is highly unstable when we condition on the first observation. However, when this observation is treated as a draw from the data generating process, the expected return is estimated in a reliable way. In addition, using an empirical Bayes approach, which involves using data on the regressor to determine the prior, implies Bayes factors that are larger than those implied by the fully Bayesian approach. Conditioning on the first observation and using empirical Bayes are often regarded as approximations to the full Bayes exact likelihood approach that we emphasize (e.g. Box and Tiao (1973), Chipman, George, and McCulloch (2001)). Our results suggest that, at least for some purposes, this approximation may be less accurate than previously believed.

¹ Some of this work considers model uncertainty together with ambiguity aversion. In order to better focus on the effect of parameter and model uncertainty on the investor's decision-making, we do not consider ambiguity aversion here.

2 Model

2.1 Data generating processes

Let $r_{t+1}$ denote continuously compounded excess returns on a stock index from time $t$ to $t+1$ and $x_t$ the value of a (scalar) predictor variable. We assume that this predictor variable follows the process

$$x_{t+1} = \theta + \rho x_t + v_{t+1}. \qquad (3)$$
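To make the data generating processes concrete, here is a minimal Python sketch that simulates (3) together with (1), or (2) by setting $\beta = 0$. It uses the jointly normal, correlated innovations specified in (4) and (5) below and starts $x_0$ at a draw from the stationary distribution whose moments are given in (6) and (7). The function name `simulate_dgp` and all parameter values are hypothetical illustrations, not estimates from the paper.

```python
import math
import random
from typing import List, Tuple


def simulate_dgp(T: int, alpha: float, beta: float, theta: float, rho: float,
                 sigma_u: float, sigma_v: float, sigma_uv: float,
                 seed: int = 0) -> Tuple[List[float], List[float]]:
    """Simulate returns r_1..r_T from (1) and the predictor x_0..x_T from (3).

    Innovations (u_{t+1}, v_{t+1}) are bivariate normal with variances
    sigma_u^2 and sigma_v^2 and covariance sigma_uv. beta = 0 gives process (2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("the predictor process is stationary only for -1 < rho < 1")
    corr = sigma_uv / (sigma_u * sigma_v)
    if abs(corr) > 1.0:
        raise ValueError("sigma_uv implies a correlation outside [-1, 1]")
    rng = random.Random(seed)
    # x_0 is drawn from the stationary distribution N(mu_x, sigma_x^2).
    mu_x = theta / (1.0 - rho)
    sigma_x = sigma_v / math.sqrt(1.0 - rho ** 2)
    x = [rng.gauss(mu_x, sigma_x)]
    r: List[float] = []
    for _ in range(T):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = sigma_u * z1                                             # return shock
        v = sigma_v * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)  # predictor shock
        r.append(alpha + beta * x[-1] + u)  # r_{t+1} = alpha + beta * x_t + u_{t+1}
        x.append(theta + rho * x[-1] + v)   # x_{t+1} = theta + rho * x_t + v_{t+1}
    return r, x


# Hypothetical values: a persistent predictor with shocks negatively correlated
# with return shocks (correlation -0.9).
r, x = simulate_dgp(T=1000, alpha=0.05, beta=0.1, theta=-0.35, rho=0.9,
                    sigma_u=0.2, sigma_v=0.1, sigma_uv=-0.018, seed=1)
print(len(r), len(x))  # 1000 1001
```

The 2x2 Cholesky step builds the predictor shock from the same standard normal as the return shock, which is what produces the correlation $\sigma_{uv}/(\sigma_u\sigma_v)$ that later drives the link between inference on $\rho$ and on $\beta$.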

Stock returns can be predictable, in which case they follow the process (1), or unpredictable, in which case they follow the process (2). In either case, errors are serially uncorrelated, homoskedastic, and jointly normal:

$$\begin{bmatrix} u_{t+1} \\ v_{t+1} \end{bmatrix} \Big|\; r_1,\ldots,r_t,\, x_0,\ldots,x_t \sim N(0,\Sigma), \qquad (4)$$

and

$$\Sigma = \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}. \qquad (5)$$

As we show below, the correlation between innovations to returns and innovations to the state variable implies that (3) affects inference about returns, even when there is no predictability.

When the process (3) is stationary, i.e. $\rho$ is between $-1$ and $1$, the state variable has an unconditional mean of

$$\mu_x = \frac{\theta}{1-\rho} \qquad (6)$$

and a variance of

$$\sigma_x^2 = \frac{\sigma_v^2}{1-\rho^2}. \qquad (7)$$

These follow from taking unconditional means and variances on either side of (3). Note that these are population values conditional on knowing the parameters. Given these, the population $R^2$ is defined as

$$\text{Population } R^2 = \frac{\beta^2\sigma_x^2}{\beta^2\sigma_x^2 + \sigma_u^2}.$$

2.2 Prior Beliefs

An investor's prior views on predictability can be elicited by the answers to two straightforward questions.² Consider data generating processes of the form (1) and (2). Given these processes, the investor should answer:

• [Question 1] What is the probability that predictability exists, i.e. that equation (1) describes returns for some $\beta \neq 0$? (Call this answer $q$.)
• [Question 2] Given that predictability exists, what is the probability that the $R^2$ exceeds 1%? (Call this answer $P_{.01}$.)

² The basic structure of these prior beliefs is analogous to that used by Baks, Metrick, and Wachter (2001) in the setting of mutual fund performance evaluation.

The answer to Question 2 will be conditional on the frequency; for most of our results, quantities will be measured at an annual frequency. Note that Question 2 is not asking about the probability of achieving an $R^2$ in a given sample, which depends on sampling variability. It is asking about the $R^2$ that would result if the time period goes to infinity. The use of 1% is arbitrary; any other value that is greater than 0 could be substituted.

We now demonstrate how to specify priors given the answers to these questions. An appeal of this approach is that it is not necessary to specify aspects of the distribution of the predictor variable and of returns other than those given above. The prior beliefs are invariant to changes to these aspects of the distribution.

2.2.1 Full Bayes priors

Let $H_0$ denote the state of the world in which excess returns are unpredictable (the "null") and $H_1$ denote the state of the world in which there is some amount of excess return predictability. Then $q$ is the prior probability of $H_1$, i.e. $q = p(H_1)$. In what follows, we construct priors for the parameters conditional on $H_0$ and on $H_1$. It is convenient to group the regression parameters in equations (1), (2) and (3) into vectors

$$b_0 = [\alpha,\ \theta,\ \rho]^\top \quad\text{and}\quad b_1 = [\alpha,\ \beta,\ \theta,\ \rho]^\top.$$

We then specify the prior $p(b_0,\Sigma \mid H_0)$, which is the prior on $b_0$ and $\Sigma$ conditional on no predictability, and the prior $p(b_1,\Sigma \mid H_1)$, which is the prior on $b_1$ and $\Sigma$ conditional on the existence of predictability.³

³ Formally we could write down $p(b_1,\Sigma \mid H_0)$ by assuming $p(\beta \mid b_0,\Sigma,H_0)$ is a point mass at zero.
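The answer $P_{.01}$ to Question 2 can be translated into a prior scale. Assuming, as in (9) and (10) of this subsection, that normalized $\beta$ satisfies $\eta \mid H_1 \sim N(0,\sigma_\eta^2)$ and that the population $R^2 = \eta^2/(\eta^2+1)$, the event $R^2 > k$ is the event $|\eta| > \sqrt{k/(1-k)}$, a two-sided normal tail. The Python sketch below computes $P(R^2 > k \mid H_1)$ for a given $\sigma_\eta$ and inverts the map by bisection; the function names are our own.

```python
import math


def prob_r2_exceeds(k: float, sigma_eta: float) -> float:
    """P(population R^2 > k | H1) when eta ~ N(0, sigma_eta^2), R^2 = eta^2/(eta^2 + 1)."""
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie strictly between 0 and 1")
    # R^2 > k  <=>  |eta| > c with c = sqrt(k / (1 - k)).
    # Two-sided tail: 2 * (1 - Phi(c / sigma_eta)) = erfc(c / (sigma_eta * sqrt(2))).
    c = math.sqrt(k / (1.0 - k))
    return math.erfc(c / (sigma_eta * math.sqrt(2.0)))


def sigma_eta_for(p_target: float, k: float = 0.01,
                  lo: float = 1e-8, hi: float = 1e8, iters: int = 200) -> float:
    """Solve prob_r2_exceeds(k, sigma_eta) = p_target for sigma_eta by bisection.

    The tail probability increases with sigma_eta. Because the bracket spans many
    orders of magnitude, we bisect on a log scale (geometric midpoint).
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if prob_r2_exceeds(k, mid) < p_target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


for s in (0.05, 0.09, 0.15, 100.0):
    print(f"sigma_eta = {s:>6}: P(R^2 > 0.01 | H1) = {prob_r2_exceeds(0.01, s):.3f}")
print(f"sigma_eta giving P_.01 = 0.5: {sigma_eta_for(0.5):.4f}")
```

For example, $\sigma_\eta = 0.15$ gives a probability of about 0.50, in line with the correspondence between $\sigma_\eta$ and $P_{.01}$ quoted in Section 3.2.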

Note that $p(b_1,\Sigma \mid H_1)$ can also be written as $p(\beta,b_0,\Sigma \mid H_1)$. We set the prior on $b_0$ and $\Sigma$ so that

$$p(b_0,\Sigma \mid H_0) = p(b_0,\Sigma \mid H_1) = p(b_0,\Sigma).$$

We assume the investor has uninformative beliefs on these parameters. We follow the approach of Stambaugh (1999) and Zellner (1996), and derive a limiting Jeffreys prior as explained in Appendix A. As Appendix A shows, this limiting prior takes the form

$$p(b_0,\Sigma) \propto \sigma_x\,\sigma_u\,|\Sigma|^{-5/2}, \qquad (8)$$

for $\rho \in (-1,1)$, and zero otherwise.

The parameter that distinguishes $H_0$ from $H_1$ is $\beta$. One approach would be to write down a prior distribution for $\beta$ unconditional on the remaining parameters. However, it is difficult to think about priors on $\beta$ in isolation from beliefs about other parameters. For example, a high variance of $x_t$ might lower one's prior on $\beta$, while a large residual variance of $r_t$ might raise it. Rather than placing a prior on $\beta$ directly, we follow Wachter and Warusawitharana (2009) and place a prior on the population $R^2$. To implement this prior on the $R^2$, we place a prior on "normalized" $\beta$, that is, $\beta$ adjusted for the variance of $x$ and the variance of $u$. Let

$$\eta = \sigma_u^{-1}\sigma_x\beta$$

denote normalized $\beta$. We assume that prior beliefs on $\eta$ are given by

$$\eta \mid H_1 \sim N(0,\sigma_\eta^2). \qquad (9)$$

The population $R^2$ is closely related to $\eta$:

$$\text{Population } R^2 = \frac{\beta^2\sigma_x^2}{\beta^2\sigma_x^2 + \sigma_u^2} = \frac{\eta^2}{\eta^2+1}. \qquad (10)$$

Equation (10) provides a mapping between a prior distribution on $\eta$ and a prior distribution on the population $R^2$. Given an $\eta$ draw, an $R^2$ draw can be computed using (10). A prior on $\eta$ implies a hierarchical prior on $\beta$. Because

$$p(\beta,b_0,\Sigma \mid H_1) = p(\beta \mid b_0,\Sigma,H_1)\,p(b_0,\Sigma \mid H_1),$$
it suffices to choose a prior for $\beta$ conditional on the other parameters. The prior for $\eta$, (9), implies

$$\beta \mid \alpha,\theta,\rho,\Sigma \sim N(0,\sigma_\beta^2), \qquad (11)$$

where $\sigma_\beta = \sigma_\eta\sigma_x^{-1}\sigma_u$. Because $\sigma_x$ is a function of $\rho$ and $\sigma_v$, the prior on $\beta$ is also implicitly a function of these parameters. The parameter $\sigma_\eta$ indexes the degree to which the prior is informative. As $\sigma_\eta \to \infty$, the prior over $\beta$ becomes uninformative; all values of $\beta$ are viewed as equally likely. As $\sigma_\eta \to 0$, the prior converges to $p(b_0,\Sigma)$ multiplied by a point mass at 0, implying a dogmatic view in no predictability. Combining (11) with (8) implies the joint prior under $H_1$:

$$p(b_1,\Sigma \mid H_1) = p(\beta \mid b_0,\Sigma,H_1)\,p(b_0,\Sigma \mid H_1) \propto \frac{1}{\sqrt{2\pi\sigma_\eta^2}}\,\sigma_x^2\,|\Sigma|^{-5/2}\exp\left\{-\frac{1}{2}\beta^2\left(\sigma_\eta^2\sigma_x^{-2}\sigma_u^2\right)^{-1}\right\}. \qquad (12)$$

Jeffreys invariance theory provides an independent justification for modeling priors on $\beta$ as (11). Stambaugh (1999) shows that the limiting Jeffreys prior for $b_1$ and $\Sigma$ equals

$$p(b_1,\Sigma \mid H_1) \propto \sigma_x^2\,|\Sigma|^{-5/2}. \qquad (13)$$

This prior corresponds to the limit of (12) as $\sigma_\eta$ approaches infinity. Modeling the prior for $\beta$ as depending on $\sigma_x$ not only has a convenient interpretation in terms of the distribution of the $R^2$, but also implies that an infinite prior variance represents ignorance as defined by Jeffreys (1961). Note that a prior on $\beta$ that is independent of $\sigma_x$ would not have this property.

Figure 1 shows the resulting distribution for the population $R^2$ for various values of $\sigma_\eta$. Panel A shows the distribution conditional on $H_1$ while Panel B shows the unconditional distribution. More precisely, for any value $k$, Panel A shows the prior probability that the $R^2$ exceeds $k$, conditional on the existence of predictability. For large values of $\sigma_\eta$, e.g. 100, the prior probability that the $R^2$ exceeds $k$ across the relevant range of values for the $R^2$ is
close to one. The lower the value of $\sigma_\eta$, the less variability in $\beta$ around its mean of zero, and the lower the probability that the $R^2$ exceeds $k$ for any value of $k$. Panel B shows the unconditional probability that the $R^2$ exceeds $k$ for any value of $k$, assuming that the prior probability of predictability, $q$, is equal to 0.5. Because the $R^2$ is zero under $H_0$, the definition of conditional probability gives, for $k > 0$,

$$p(R^2 > k) = p(R^2 > k \mid H_1)\,q.$$

Therefore Panel B takes the values in Panel A and multiplies them by 0.5. To distinguish (8) and (12) from an alternative set of priors that we describe in the following section, we refer to these as full Bayes priors.

2.2.2 Empirical Bayes priors

A second approach to formulating priors involves conditioning on moments of the data. Let $T$ denote the length of the sample and $\hat\sigma_x$ the sample variance of $x$:

$$\hat\sigma_x = \frac{1}{T}\sum_{t=1}^{T}\left(x_t - \frac{1}{T}\sum_{s=1}^{T}x_s\right)^2.$$

One specification for the prior, introduced by Fernandez, Ley, and Steel (2001), is as follows:

$$p(\beta \mid \sigma_u^2, H_1) = N\!\left(0,\ \kappa\,\sigma_u^2\,\hat\sigma_x^{-1}\right), \qquad (14)$$

where $\kappa$ is a constant that determines the informativeness of the prior, and

$$p(\sigma_u) \propto \sigma_u^{-1}. \qquad (15)$$

The specification is completed by setting

$$p(\alpha) \propto 1. \qquad (16)$$

These assumptions on the prior are combined with the likelihood

$$p(D \mid \alpha,\beta,\sigma_u,H_1) = \left(2\pi\sigma_u^2\right)^{-T/2}\exp\left\{-\frac{1}{2}\sum_{t=0}^{T-1}\left(r_{t+1}-\alpha-\beta x_t\right)^2\sigma_u^{-2}\right\} \qquad (17)$$
and

$$p(D \mid \alpha,\beta,\sigma_u,H_0) = \left(2\pi\sigma_u^2\right)^{-T/2}\exp\left\{-\frac{1}{2}\sum_{t=0}^{T-1}\left(r_{t+1}-\alpha\right)^2\sigma_u^{-2}\right\}. \qquad (18)$$

Very similar specifications are employed by Chipman, George, and McCulloch (2001), Cremers (2002), Wright (2003) and Stock and Watson (2005). Note that these equations display the marginal likelihood over the return equations (1) and (2) rather than the full likelihood that includes the data generating process for $x_t$. An appeal of this formulation for the prior is that it leads to analytical expressions for the posterior distribution and for the Bayes factor (in fact, it is closely related to the "g-prior" of Zellner (1996)).

The above assumptions are most reasonable in the case where $x_1,\ldots,x_T$ are observed at time 0. While this holds in many applications of OLS regression, it holds rarely, if ever, in the case of predictive regressions in financial time series. Moreover, were $x_1,\ldots,x_T$ observed, the contemporaneous correlation between $x_t$ and $r_t$ would invalidate the likelihoods (17) and (18) because the value of $x_t$ would convey information about $r_t$ not reflected in these likelihoods. One way to interpret the above in the setting where $x_t$ is stochastic is to assume that, while the data on $x_t$ themselves are unobserved, certain functions of the data, namely sample moments of $x_t$ such as $\hat\sigma_x$, are observed. Allowing data to influence the prior is generally referred to as the "empirical Bayes" method.⁴ For this reason, the formulation of priors that use moments from the sample could be thought of as an example of empirical Bayes, at least if one accepts a broad definition of the term.⁵ Regardless of its theoretical attractiveness, it is of interest to ask whether the use of empirical Bayes in this setting makes a difference in practice.
There are a number of differences between the specification described in (14)–(18) and ours. Most importantly, by assuming the investor knows the sample moments of $x$, the above approach avoids the need to make explicit assumptions on the prior for the parameters of the $x$ process and for the likelihood of the $x$ process. However, as we show, these assumptions, whether hidden or explicit, have important consequences for the posterior distribution.

Leaving these issues aside for the moment, our immediate goal is to write down a version of the above specification that is close enough to our model so that differences in results stemming from the link (or lack thereof) between the distribution of $\sigma_x$ and that of $\beta$ can be interpreted. To this end, we consider the specification

$$p(\beta \mid b_0,\Sigma,H_1) \sim N(0,\hat\sigma_\beta^2),$$

where $\hat\sigma_\beta = \sigma_\eta\hat\sigma_x^{-1}\hat\sigma_u$. We compute $\hat\sigma_u$ as the standard deviation of the residual from OLS estimation of the predictive regression.⁶ Note that these priors do not imply a proper prior distribution for the $R^2$. Therefore they cannot be used to answer Question 2 posed above. In order to compare the empirical Bayes and the full Bayes priors, we use the same values of $\sigma_\eta$ to form $\hat\sigma_\beta$ as we use to form $\sigma_\beta$.

We assume a standard uninformative prior for the remaining parameters (see Zellner (1996) and Gelman, Carlin, Stern, and Rubin (2004)), combined with the normal distribution for $\beta$ above, whose prior variance reflects the agent's beliefs about predictability. We also ensure that $x_t$ is stationary. That is,

$$p(b_0,\Sigma \mid H_1) = p(b_0,\Sigma \mid H_0) \propto |\Sigma|^{-3/2}, \qquad (19)$$

for $\rho \in (-1,1)$, and zero otherwise. It follows that

$$p(b_1,\Sigma \mid H_1) \propto |\Sigma|^{-3/2}\,\frac{1}{\sqrt{2\pi\hat\sigma_\beta^2}}\exp\left\{-\frac{1}{2}\beta^2\hat\sigma_\beta^{-2}\right\}. \qquad (20)$$

⁴ However, in traditional applications of empirical Bayes, the term has generally implied either the use of data that is known prior to the decision problem at hand or data from the population from which the parameter of interest can be drawn (Robbins (1964), Berger (1985)). For example, if one is forming a prior on the expected return for a particular security, one might use the average expected return of firms in that industry (Pastor and Stambaugh (1999)).

⁵ Avramov (2002) uses marginal likelihoods analogous to (17) and (18), but formulates the prior by assuming that the agent observes a prior sample with moments similar to the existing sample, but without predictability. This is also an example of the empirical Bayes approach.

⁶ For simplicity, we do not incorporate a link between $\hat\sigma_u$ and $\beta$ as in (14). Because $\sigma_u$ is estimated very precisely (unlike $\sigma_x$), this is unlikely to make a large difference in the results.

These priors may be thought of as the simplest set of priors which contain information about the distribution of $\beta$, the coefficient on return predictability. In what follows, we refer to these as empirical Bayes priors. We combine these priors with the same likelihood as used for the full Bayes prior, described below.

2.3 Likelihood

2.3.1 Likelihood under $H_1$

Under $H_1$, returns and the state variable follow the joint process given in (1) and (3). It is convenient to group observations on returns and contemporaneous observations on the state variable into a matrix $Y$, and lagged observations on the state variable and the constant into a matrix $X$. Let

$$Y = \begin{bmatrix} r_1 & x_1 \\ \vdots & \vdots \\ r_T & x_T \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_0 \\ \vdots & \vdots \\ 1 & x_{T-1} \end{bmatrix},$$

and let

$$z = \mathrm{vec}(Y), \qquad Z_1 = I_2 \otimes X.$$

In the above, the vec operator stacks the elements of the matrix columnwise. It follows that the likelihood conditional on $H_1$ and on the first observation $x_0$ takes the form

$$p(D \mid b_1,\Sigma,x_0,H_1) = |2\pi\Sigma|^{-T/2}\exp\left\{-\frac{1}{2}\left(z - Z_1 b_1\right)^\top\left(\Sigma^{-1}\otimes I_T\right)\left(z - Z_1 b_1\right)\right\} \qquad (21)$$

(see Zellner (1996)).

The likelihood function (21) conditions on the first observation of the predictor variable, $x_0$. Stambaugh (1999) argues for treating $x_0$ and $x_1,\ldots,x_T$ symmetrically: as random draws from the data generating process. If the process for $x_t$ is stationary and has run for a substantial period of time, then results in Hamilton (1994, p. 265) imply that $x_0$ is a draw from a normal distribution with mean $\mu_x$ and standard deviation $\sigma_x$. Combining
the likelihood of the first observation with the likelihood of the remaining $T$ observations produces

$$p(D \mid b_1,\Sigma,H_1) = |2\pi\sigma_x^2|^{-1/2}\,|2\pi\Sigma|^{-T/2}\exp\left\{-\frac{1}{2}(x_0-\mu_x)^2\sigma_x^{-2} - \frac{1}{2}\left(z - Z_1 b_1\right)^\top\left(\Sigma^{-1}\otimes I_T\right)\left(z - Z_1 b_1\right)\right\}. \qquad (22)$$

Following Box and Tiao (1973), we refer to (21) as the conditional likelihood and (22) as the exact likelihood.

2.3.2 Likelihood under $H_0$

Under $H_0$, returns and the state variable follow the processes given in (2) and (3). Let

$$Z_0 = \begin{bmatrix} \iota_T & 0_{T\times 2} \\ 0_{T\times 1} & X \end{bmatrix},$$

where $\iota_T$ is the $T\times 1$ vector of ones. Then the conditional likelihood can be written as

$$p(D \mid b_0,\Sigma,x_0,H_0) = |2\pi\Sigma|^{-T/2}\exp\left\{-\frac{1}{2}\left(z - Z_0 b_0\right)^\top\left(\Sigma^{-1}\otimes I_T\right)\left(z - Z_0 b_0\right)\right\}. \qquad (23)$$

Using similar reasoning as in the $H_1$ case, the exact likelihood is given by

$$p(D \mid b_0,\Sigma,H_0) = |2\pi\sigma_x^2|^{-1/2}\,|2\pi\Sigma|^{-T/2}\exp\left\{-\frac{1}{2}(x_0-\mu_x)^2\sigma_x^{-2} - \frac{1}{2}\left(z - Z_0 b_0\right)^\top\left(\Sigma^{-1}\otimes I_T\right)\left(z - Z_0 b_0\right)\right\}. \qquad (24)$$

As above, we refer to (23) as the conditional likelihood and (24) as the exact likelihood.

2.4 Posterior distribution

The investor updates his prior beliefs to form the posterior distribution upon seeing the data. As we discuss below, this posterior requires the computation of two quantities: the posterior of the parameters conditional on the absence or existence of return predictability, and the posterior probability that returns are predictable. Given these two quantities, we can simulate from the posterior distribution.
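The two-step recipe can be sketched as a mixture sampler: each draw comes from the posterior under $H_1$ with probability $\bar q$, and otherwise from the posterior under $H_0$, where $\beta = 0$. In the paper the conditional posteriors are sampled by Metropolis-Hastings (Appendix B); in this Python sketch, `draw_h1` and `draw_h0` are hypothetical stand-ins passed in as callables, and $\bar q$ is taken as given.

```python
import random
from typing import Callable, List

Draw = Callable[[random.Random], float]


def sample_mixture_posterior(q_bar: float, draw_h1: Draw, draw_h0: Draw,
                             n: int, seed: int = 0) -> List[float]:
    """Draw n values of beta from the model-averaged posterior.

    Each draw comes from the H1 posterior with probability q_bar and from the
    H0 posterior with probability 1 - q_bar.
    """
    if not 0.0 <= q_bar <= 1.0:
        raise ValueError("q_bar must be a probability")
    rng = random.Random(seed)
    return [draw_h1(rng) if rng.random() < q_bar else draw_h0(rng) for _ in range(n)]


# Hypothetical stand-ins: a normal posterior for beta under H1, and beta = 0 under H0.
def draw_h1(rng: random.Random) -> float:
    return rng.gauss(0.10, 0.01)


def draw_h0(rng: random.Random) -> float:
    return 0.0


draws = sample_mixture_posterior(0.65, draw_h1, draw_h0, n=100_000, seed=42)
share_h1 = sum(1 for b in draws if b != 0.0) / len(draws)
print(f"share of draws from H1: {share_h1:.3f}")
```

Summaries of the pooled draws then average over both models automatically; for instance, their mean estimates $\bar q$ times the posterior mean of $\beta$ under $H_1$, since $\beta = 0$ under $H_0$.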

To compute the posteriors conditional on the absence or existence of return predictability, we apply Bayes' rule conditioning on $H_0$ and conditioning on $H_1$. It follows from Bayes' rule that

$$p(b_0,\Sigma \mid H_0,D) \propto p(D \mid b_0,\Sigma,H_0)\,p(b_0,\Sigma \mid H_0) \qquad (25)$$

is the posterior conditional on $H_0$ and that

$$p(b_1,\Sigma \mid H_1,D) \propto p(D \mid b_1,\Sigma,H_1)\,p(b_1,\Sigma \mid H_1) \qquad (26)$$

is the posterior conditional on $H_1$. Because $\sigma_x$ is a nonlinear function of the underlying parameters, the posterior distributions conditional on $H_0$ and $H_1$ are nonstandard and must be computed numerically. We can sample from these distributions quickly and accurately using the Metropolis-Hastings algorithm (see Chib and Greenberg (1995), Johannes and Polson (2006)). See Appendix B for details.

Let $\bar q$ denote the posterior probability that excess returns are predictable. By definition, $\bar q = p(H_1 \mid D)$. It follows from Bayes' rule that

$$\bar q = \frac{p(D \mid H_1)\,q}{p(D \mid H_1)\,q + p(D \mid H_0)(1-q)} = \frac{\mathcal{B}_{10}\,q}{\mathcal{B}_{10}\,q + (1-q)}, \qquad (27)$$

where

$$\mathcal{B}_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)} \qquad (28)$$

is the Bayes factor for the alternative hypothesis of predictability against the null of no predictability. The Bayes factor is a likelihood ratio in that it is the likelihood of return predictability divided by the likelihood of no predictability. However, it differs from the standard likelihood ratio in that the likelihoods $p(D \mid H_i)$ are not conditional on the values of the parameters. In fact, these likelihoods can be formally written as

$$p(D \mid H_0) = \int p(D \mid b_0,\Sigma,H_0)\,p(b_0,\Sigma \mid H_0)\,db_0\,d\Sigma \qquad (29)$$
and

$$p(D \mid H_1) = \int p(D \mid b_1,\Sigma,H_1)\,p(b_1,\Sigma \mid H_1)\,db_1\,d\Sigma. \qquad (30)$$

To form $p(D \mid H_0)$ and $p(D \mid H_1)$, the likelihood conditional on parameters (the likelihood function generally used in classical statistics) is integrated over the prior distribution of the parameters. Under our distributions, these integrals cannot be computed analytically. However, the Bayes factor (28) can be computed directly using the generalized Savage-Dickey ratio (Dickey (1971), Verdinelli and Wasserman (1995)). Details can be found in Appendix C.

Putting these two pieces together, we draw from the posterior parameter distribution by drawing from $p(b_1,\Sigma \mid D,H_1)$ with probability $\bar q$ and from $p(b_0,\Sigma \mid D,H_0)$ with probability $1-\bar q$.

3 Results

We now apply the above framework to understanding the predictive power of the dividend-price ratio and payout yield for the excess return on a broad equity index.

3.1 Data

We use data from the Center for Research in Security Prices (CRSP). We compute excess stock returns by subtracting the continuously compounded 3-month Treasury bill return from the return on the value-weighted CRSP index at annual and quarterly frequencies. Following a large portfolio selection literature (see, e.g., Brennan, Schwartz, and Lagnado (1997), Campbell and Viceira (1999)), we focus on the dividend-price ratio as the predictive factor. The dividend-price ratio is computed by dividing the dividend payout over the previous 12 months by the current price of the stock index. The use of 12 months of data accounts for seasonalities in dividend payments. We use the logarithm of the dividend-price ratio as the predictive factor. We also use the repurchase-adjusted payout yield of Boudoukh, Michaely, Richardson, and Roberts (2007) as a predictive factor. Data are annual from 1927 to
the beginning of 2005; we also report results with the dividend-price ratio at a quarterly frequency from 1952 onwards.

3.2 Bayes factors and posterior means

Table 1 reports Bayes factors and posterior means when the payout yield is used as a predictor variable. Tables 2 and 3 report analogous results for the dividend-price ratio in annual data and in quarterly postwar data, respectively. Each table reports results for full Bayes priors combined with the exact likelihood, for full Bayes priors combined with the conditional likelihood, and for empirical Bayes priors combined with the exact likelihood. For each prior and likelihood combination, four values of $\sigma_\eta$ are considered: 0.05, 0.09, 0.15 and 100. For the full Bayes priors, these translate into values of $P_{.01}$ (the prior probability that the $R^2$ exceeds 0.01) equal to 0.05, 0.25, 0.50 and 0.99, respectively. For the empirical Bayes priors, the prior distribution over the $R^2$ is not well defined. We construct these priors using the same values of $\sigma_\eta$ as their full Bayes counterparts. Because the results are qualitatively similar across the three data sets, we focus on results for the payout yield in Table 1.

Table 1 shows that the Bayes factor is hump-shaped in $P_{.01}$ for each prior-likelihood combination. For small values of $P_{.01}$, the Bayes factor is close to one. For large values, the Bayes factor is close to zero. Both results can be understood using the formula for the Bayes factor in (28) and for the likelihoods $p(D \mid H_1)$ and $p(D \mid H_0)$ in (29) and (30). For low values of $P_{.01}$, the investor imposes a very tight prior on the $R^2$. Therefore the hypotheses that returns are predictable and that returns are unpredictable are nearly the same. It follows from (29) and (30) that the likelihoods of the data under these two scenarios are nearly the same and that the Bayes factor is nearly one. This is intuitive: when two hypotheses are close, a great deal of data are required to distinguish one from the other.
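Both limits described above can be reproduced in a textbook case where the Bayes factor has a closed form. The following Python sketch is a stand-in, not the paper's model: a sample mean $\bar y \sim N(\mu, s^2)$ with $s^2$ known, $H_0: \mu = 0$ versus $H_1: \mu \sim N(0,\tau^2)$. The marginal density of $\bar y$ is $N(0, s^2)$ under $H_0$ and $N(0, s^2+\tau^2)$ under $H_1$, so $\mathcal{B}_{10}$ is their ratio, and the prior scale $\tau$ plays a role analogous to $\sigma_\eta$.

```python
import math


def bayes_factor_10(ybar: float, s2: float, tau2: float) -> float:
    """Closed-form B_10 for H1: mu ~ N(0, tau2) vs H0: mu = 0, given ybar ~ N(mu, s2).

    Ratio of marginal densities N(ybar; 0, s2 + tau2) / N(ybar; 0, s2)
    = sqrt(s2 / a) * exp(ybar^2 / 2 * (1/s2 - 1/a)), with a = s2 + tau2.
    """
    a = s2 + tau2
    return math.sqrt(s2 / a) * math.exp(0.5 * ybar ** 2 * (1.0 / s2 - 1.0 / a))


# A sample mean three standard errors away from zero (s2 = 1, ybar = 3).
for tau in (1e-3, 0.5, 3.0, 100.0, 1e6):
    print(f"tau = {tau:>9g}: B_10 = {bayes_factor_10(3.0, 1.0, tau ** 2):.4g}")
```

The printed values start near one when the prior is tight, rise above one for moderate $\tau$, and fall toward zero as $\tau$ grows, because the factor $\sqrt{s^2/(s^2+\tau^2)}$ eventually dominates while the exponential term stays bounded.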
The fact that the Bayes factor approaches zero as $P_{.01}$ increases is less intuitive. The reduction in Bayes factors implies that, as the investor allows a greater range of values for the $R^2$, the posterior probability that returns are predictable approaches zero. This effect
is known as Bartlett's paradox, and was first noted by Bartlett (1957) in the context of distinguishing between uniform distributions. As Kass and Raftery (1995) discuss, Bartlett's paradox makes it crucial to formulate an informative prior on the parameters that differ between $H_0$ and $H_1$. The mathematics leading to Bartlett's paradox are most easily seen in a case where Bayes factors can be computed in closed form. However, we can obtain an understanding of the paradox based on the form of the likelihoods $p(D \mid H_1)$ and $p(D \mid H_0)$. These likelihoods involve integrating out the parameters using the prior distribution. If the prior distribution on $\beta$ is highly uninformative, the prior places a large amount of mass in extreme regions of the parameter space. In these regions, the likelihood of the data conditional on the parameters will be quite small. At the same time, the prior places a relatively small amount of mass in the regions of the parameter space where the likelihood of the data is large. Therefore $p(D \mid H_1)$ (the integral of the likelihood under $H_1$) is small relative to $p(D \mid H_0)$ (the integral of the likelihood under $H_0$).

Table 1 also shows that there are substantial differences between the Bayes factors resulting from the exact versus the conditional likelihood and from empirical versus full Bayes. The Bayes factors resulting from the exact likelihood are larger than those resulting from the conditional likelihood, thus implying a greater posterior probability of return predictability. The Bayes factors resulting from full Bayes are smaller than those resulting from empirical Bayes, implying a lower posterior probability of return predictability.

In what follows, we seek to explain these patterns in the Bayes factors. Let $\bar\beta$ be the posterior mean of $\beta$ conditional on predictability and $\bar\rho$ the posterior mean of $\rho$ conditional on predictability. As Table 1 shows, differences in Bayes factors between specifications reflect differences in $\bar\beta$.
That is, for any given value of $P_{.01}$, $\bar\beta$ is higher for the exact likelihood than for the conditional likelihood, and lower for full Bayes than for empirical Bayes. Moreover, the opposite pattern is evident for $\bar\rho$. The negative correlation between ρ and β is also noted by Stambaugh (1999). The source of this negative relation is the negative correlation between shocks to returns and shocks to the predictor variable. Suppose that a draw of β is below its ordinary least squares (OLS) estimate. This implies that the OLS value for β is “too high”, i.e., in the sample, shocks to the predictor variable are followed by shocks to returns of the same sign. Because return shocks and predictor shocks are negatively correlated, shocks to the predictor variable therefore tend to be followed by shocks to the predictor variable of the opposite sign. Thus the OLS value for ρ is “too low”. This explains why values of $\bar\rho$ are higher for low values of $P_{.01}$ (and hence low values of $\bar\beta$) than for high values, and higher than the OLS estimate.

We can use the connection between $\bar\rho$, $\bar\beta$, and the Bayes factor to account for differences in the Bayes factors across the prior and likelihood specifications. As Table 1 shows, using the exact likelihood leads to lower posterior values of ρ. This is because the exact likelihood leads to more precise estimates of $\mu_x$. By the argument in the previous paragraph, this implies greater posterior values for β and higher Bayes factors. On the other hand, the use of full rather than empirical Bayes implies higher posterior values of ρ. This occurs because the full Bayes prior, on account of the $\sigma_x^2$ term, puts more weight on high values of $\sigma_x$ and therefore on high values of ρ. When β is not far from zero, the posterior distribution is higher for lower values of $\sigma_\beta$, and hence higher values of $\sigma_x$. This leads to lower posterior means of β and lower Bayes factors.

Tables 1–3 also report the posterior means of excess returns (the equity premium) and of the predictor variable conditional on predictability. In each case, the OLS row reports the sample mean of excess returns and the sample mean of the predictor variable.⁷ Posterior means conditional on no predictability are very close to their counterparts for $P_{.01} = .05$.

Surprisingly, the various choices for the predictor variable and for the prior and likelihood imply different values for the equity premium. For example, the sample average of excess returns over the 1927 to 2004 period is 5.85% per annum.
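The negative relation between the estimation errors in β and ρ can be illustrated with a small Monte Carlo experiment. The sketch assumes the predictive system $r_{t+1} = \alpha + \beta x_t + u_{t+1}$, $x_{t+1} = \theta + \rho x_t + v_{t+1}$ (the form implied by $B_1$ in Appendix B), with hypothetical parameter values and a negative correlation between u and v, and estimates both equations by OLS in each simulated sample. The function name and defaults are illustrative.

```python
import numpy as np


def simulate_ols_errors(T=100, reps=500, beta=0.0, rho=0.9, corr_uv=-0.9, seed=0):
    """Return OLS estimation errors (beta_hat - beta, rho_hat - rho) across simulated samples.

    Hypothetical setup: alpha = theta = 0, unit-variance shocks with correlation corr_uv,
    and x_0 drawn from the stationary distribution of the AR(1) predictor.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, corr_uv], [corr_uv, 1.0]])
    beta_err = np.empty(reps)
    rho_err = np.empty(reps)
    for i in range(reps):
        shocks = rng.multivariate_normal(np.zeros(2), cov, size=T)  # columns: u, v
        x = np.empty(T + 1)
        x[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - rho ** 2))
        for t in range(T):
            x[t + 1] = rho * x[t] + shocks[t, 1]
        r = beta * x[:-1] + shocks[:, 0]
        X = np.column_stack([np.ones(T), x[:-1]])  # same regressors in both equations
        beta_hat = np.linalg.lstsq(X, r, rcond=None)[0][1]
        rho_hat = np.linalg.lstsq(X, x[1:], rcond=None)[0][1]
        beta_err[i] = beta_hat - beta
        rho_err[i] = rho_hat - rho
    return beta_err, rho_err


if __name__ == "__main__":
    b_err, r_err = simulate_ols_errors()
    print("corr(beta error, rho error):", np.corrcoef(b_err, r_err)[0, 1])
    print("mean rho error (downward bias):", r_err.mean())
    print("mean beta error (upward bias):", b_err.mean())
```

Because both equations share the same regressors, the β error equals $\sigma_{uv}/\sigma_v^2$ times the ρ error plus noise that is uncorrelated with it, so samples in which ρ is underestimated are samples in which β is overestimated.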
In contrast, the full Bayes exact likelihood approach generates average returns that range from 5.05% to 5.24% per annum depending on the informativeness of the prior (the more informative the prior, the higher the excess return). The differences in the estimates of the equity premium arise from differences in estimates of the mean of the predictor variable. The conditional maximum likelihood estimate of the mean of x (not reported) is −3.54. The posterior mean implied by the exact likelihood is between −3.16 and −3.17 (depending on the prior). Thus, according to the model, shocks to the predictor variable over the sample period must have been negative for −3.54 to be the estimated value when the conditional likelihood is used. It follows that shocks to excess returns must have been positive (because of the negative correlation). Therefore the posterior mean is below the sample mean. This effect also operates in the case of the dividend-price ratio and is in fact more dramatic. In annual data from 1927 to 2004, the implied means for excess returns range from 4.02% to 4.71% per annum versus the sample mean of 5.85%. While empirical Bayes implies values for the posterior mean of r that are similar to those for full Bayes, the conditional likelihood implies estimates that are highly variable and can even be negative. This is because of the lack of precision in estimating $\mu_x$.

7 Posterior means for r and x integrate out uncertainty in the predictor variables. In the case of returns, for example, we compute
$$E[r \mid D, H_1] = E\left[\alpha + \beta\,\frac{\theta}{1-\rho} \;\middle|\; D, H_1\right],$$
where the expectation on the right-hand side is taken over the posterior distribution of the parameters.

Tables 1–3 demonstrate differences in the posterior distribution depending on whether one uses full Bayes or empirical Bayes, and whether one uses the exact likelihood or the conditional likelihood. In what follows, we examine the full Bayes, exact likelihood case more closely and show its implications for inference on return predictability. The following two sections examine statistical measures: the posterior probability of predictability and the posterior distribution of the $R^2$. The final section examines the economic significance of the predictability evidence through certainty equivalent returns.
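Footnote 7’s posterior mean averages the nonlinear function $\alpha + \beta\theta/(1-\rho)$ over posterior draws rather than plugging posterior means into it. A minimal sketch, assuming the draws of (α, β, θ, ρ) are already available as arrays (the function name and the toy draws are hypothetical):

```python
import numpy as np


def posterior_mean_excess_return(alpha, beta, theta, rho):
    """Monte Carlo estimate of E[alpha + beta * theta / (1 - rho) | D, H1].

    Each argument holds one value per posterior draw; theta / (1 - rho) is the
    implied unconditional mean of the predictor for that draw.
    """
    alpha, beta, theta, rho = (np.asarray(a, dtype=float) for a in (alpha, beta, theta, rho))
    mu_x = theta / (1.0 - rho)
    return float(np.mean(alpha + beta * mu_x))


if __name__ == "__main__":
    # Toy draws that differ only in rho: the draw closest to one dominates the average.
    print(posterior_mean_excess_return([0.05] * 3, [0.1] * 3, [-0.3] * 3, [0.90, 0.95, 0.999]))
```

The toy example shows one way imprecision in $\mu_x$ can surface: a few draws of ρ near one push θ/(1−ρ) far from the sample mean of x and can move the estimated equity premium a long way.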
3.3 Posterior likelihood of predictability

We now examine the posterior probability that excess returns are predictable. Given a Bayes factor and a prior belief q on the existence of predictability, the posterior probability of predictability $\bar q$ can be computed using equation (27). The greater the investor’s prior belief about predictability, the greater his posterior belief; likewise, the greater the Bayes factor, the greater the posterior belief. As described in the previous section, the Bayes factor itself depends on the other aspect of the investor’s prior: the prior probability that the $R^2$ exceeds 1% should predictability exist.

Table 4 presents the posterior probabilities of predictability as a function of the investor’s prior about the existence of predictability, q, and the prior belief on the strength of predictability, $P_{.01}$. We consider the posterior resulting from full Bayes priors and the exact likelihood. The posterior probability is increasing in q and hump-shaped in $P_{.01}$, reflecting the fact that the Bayes factors are hump-shaped in $P_{.01}$. The results demonstrate that investors with moderate beliefs on both the existence and strength of predictability revise their beliefs on the existence of predictability sharply upward. For example, an investor with q = 0.5 and $P_{.01} = 0.50$ concludes that the posterior probability of predictability equals 0.88 using the payout yield to predict annual returns. This result is robust to a wide range of choices for $P_{.01}$. As the table shows, $P_{.01} = 0.25$ implies a posterior probability of 0.74. The posterior probability falls off dramatically as $P_{.01}$ approaches one; for these very diffuse priors (which imply what might be considered an economically unreasonable amount of predictability), the Bayes factors are close to zero.

While the evidence is slightly weaker when the dividend-price ratio is used in annual data, the dividend-price ratio combined with quarterly post-war data implies stronger evidence in favor of predictability. In particular, q = .50 implies posterior probabilities of predictability above 0.80 for all but the most diffuse prior.
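The mapping from Bayes factors to the entries of Table 4 can be sketched directly. Equation (27) is not reproduced in this section; the sketch assumes it is the standard relation in which posterior odds equal the Bayes factor times prior odds, so that $\bar q = \mathcal{B}_{10} q / (\mathcal{B}_{10} q + 1 - q)$. Feeding in the full Bayes, exact likelihood Bayes factors for the payout yield from Table 1 reproduces the payout-yield rows of Table 4 to two decimals. The function name is illustrative.

```python
def posterior_prob_predictable(bayes_factor: float, q: float) -> float:
    """Posterior probability of predictability for a prior q in (0, 1).

    Assumed form of (27): posterior odds = B10 * prior odds.
    """
    posterior_odds = bayes_factor * q / (1.0 - q)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Full Bayes, exact likelihood Bayes factors for the payout yield (Table 1), keyed by P_.01.
    bayes_factors = {0.05: 1.68, 0.50: 11.99, 0.99: 18.20}
    for p01, b10 in bayes_factors.items():
        row = [posterior_prob_predictable(b10, q) for q in (0.01, 0.20, 0.50, 0.80)]
        print(p01, " ".join(f"{v:.2f}" for v in row))
```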
This section has examined an important aspect of the posterior distribution: the probability that returns are predictable. In what follows, we examine the full posterior for the $R^2$ of the predictability relation.

3.4 Posterior $R^2$ values

We measure the investor’s prior beliefs about the strength of predictability using the metric $P(R^2 > 1\% \mid H_1) = P_{.01}$. It is therefore of interest to examine the posterior beliefs over the $R^2$. We consider posteriors derived from the full Bayes prior and the exact likelihood.

Figure 2 shows two plots of the prior and posterior distributions of the $R^2$ with priors $P(R^2 > 1\% \mid H_1) = 0.50$ and q = 0.5, using the payout yield to predict annual returns. Panel A plots $P(R^2 > k)$ as a function of k for both the prior and the posterior; this corresponds to one minus the cumulative distribution function of the $R^2$.⁸ The plot of $P(R^2 > k)$ demonstrates a clear rightward shift of the posterior for values of k up to 0.15 (both the prior and the posterior place similarly low probabilities on the $R^2$ exceeding 0.15). The strength of the predictability can be seen in that while the prior implies $P(R^2 > 1\%) = 0.25$, the posterior implies $P(R^2 > 1\%)$ close to 0.85. Thus, after observing the data, an investor revises his beliefs on the strength of predictability substantially upward. Panel B plots the probability density function of the $R^2$. The full Bayes prior places the highest density on low values of the $R^2$. The posterior, however, places high density in the region around 5% and has lower density than the prior for $R^2$ values less than 2%. The evidence in favor of predictability, with a moderate $R^2$, is sufficient to overcome the investor’s initial skepticism.

Figure 3 shows the comparable plots using the dividend-price ratio to predict annual returns. Results are similar to those discussed for the payout yield. The posterior probability $P(R^2 > k)$ is again higher than the prior probability for k ranging from 0 to 15%. The probability that the $R^2$ exceeds 1% goes from 15% to about 75%. The probability density function also shows lower density than the prior for very low values of the $R^2$ and again places high density in the region of 5%.
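Because β = 0 under $H_0$, the $R^2$ of the predictive regression is zero under that hypothesis, so the unconditional probabilities plotted in the figures (see footnote 8) mix the two hypotheses. For the priors used in Figure 2, a short worked calculation recovers the prior value quoted above:
$$P(R^2 > 0.01) = q\,P(R^2 > 0.01 \mid H_1) + (1-q)\,P(R^2 > 0.01 \mid H_0) = 0.5 \times 0.50 + 0.5 \times 0 = 0.25.$$
The same decomposition applies after observing the data with q replaced by $\bar q$, so the posterior value of $P(R^2 > k)$ for any k > 0 can never exceed the posterior probability of predictability.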
Figure 4 repeats this analysis using the dividend-price ratio to predict quarterly returns. The results show that the posterior clearly favors the existence of a moderate amount of predictability (note that we would expect the $R^2$ measured at a quarterly horizon to be below that for an annual horizon). Panel A shows that the probability that the $R^2$ exceeds 1% is 25% for the prior but above 80% for the posterior. More generally, the probability that the $R^2$ exceeds k is greater for the posterior than for the prior for all k < 3%. Panel B shows that the posterior density exhibits a clear spike around 2%.

8 This figure shows the unconditional posterior probability that the $R^2$ exceeds k; that is, it does not condition on the existence of predictability.

The above analysis evaluates the statistical evidence on predictability. The Bayesian approach also enables us to study the economic gains from market timing. In particular, we can evaluate the certainty equivalent loss from failing to time the market under different priors on the existence and strength of predictability.

3.5 Certainty equivalent returns

We now measure the economic significance of the predictability evidence using certainty equivalent returns. We assume an investor who maximizes
$$E\left[\frac{W_{T+1}^{1-\gamma}}{1-\gamma} \;\middle|\; D\right]$$
for γ = 5, where $W_{T+1} = W_T\left(w\exp\{r_{T+1} + r_{f,T}\} + (1-w)\exp\{r_{f,T}\}\right)$ and w is the weight on the risky asset. The expectation is taken with respect to the predictive distribution
$$p(r_{T+1} \mid D) = \bar q\, p(r_{T+1} \mid D, H_1) + (1-\bar q)\, p(r_{T+1} \mid D, H_0),$$
where
$$p(r_{T+1} \mid D, H_i) = \int p(r_{T+1} \mid x_T, b_i, \Sigma, H_i)\, p(b_i, \Sigma \mid D, H_i)\, db_i\, d\Sigma$$
for i = 0, 1. Conditional on the parameters, a draw of $r_{T+1}$ is given by (1) with probability $\bar q$ and by (2) with probability $1 - \bar q$. The posterior distribution of the parameters is described in Section 2.4. For any portfolio weight w, we can compute the certainty equivalent return CER as the solution to
$$\frac{\exp\{(1-\gamma)\,\mathrm{CER}\}}{1-\gamma} = E\left[\frac{\left(w\exp\{r_{T+1} + r_{f,T}\} + (1-w)\exp\{r_{f,T}\}\right)^{1-\gamma}}{1-\gamma} \;\middle|\; D\right]. \qquad (31)$$
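Equation (31) gives $\mathrm{CER} = \log E[R^{1-\gamma} \mid D]/(1-\gamma)$, where R is the gross portfolio return, so the certainty equivalent can be estimated by Monte Carlo from draws of the predictive mixture. The sketch below is a simplification under stated assumptions: it ignores parameter uncertainty (the paper integrates over the posterior of $b_i$ and Σ), uses normal return shocks, restricts w to a grid on [0, 1], and uses hypothetical parameter values throughout. It also computes a utility loss in the spirit of Table 5: the CER gap between the optimal weight and the weight chosen by an investor who rules out predictability, with both evaluated under the same mixture distribution. Wealth $W_T$ drops out of the CER and is omitted.

```python
import numpy as np


def predictive_draws(q_bar, x_T, alpha1, beta, alpha0, sigma_u, n=100_000, seed=0):
    """Draw r_{T+1} from the two-model mixture with fixed (hypothetical) parameters.

    With probability q_bar the draw comes from the predictable model (mean alpha1 + beta * x_T),
    otherwise from the constant-mean model (mean alpha0); shocks are N(0, sigma_u^2).
    """
    rng = np.random.default_rng(seed)
    from_h1 = rng.random(n) < q_bar
    mean = np.where(from_h1, alpha1 + beta * x_T, alpha0)
    return mean + rng.normal(0.0, sigma_u, size=n)


def cer(w, r, r_f, gamma=5.0):
    """Solve (31) for CER: exp{(1 - gamma) CER} equals the mean of gross^(1 - gamma) over draws r."""
    gross = w * np.exp(r + r_f) + (1.0 - w) * np.exp(r_f)
    return np.log(np.mean(gross ** (1.0 - gamma))) / (1.0 - gamma)


def best_weight(r, r_f, grid):
    """Weight on the grid that maximizes the CER under the draws r."""
    values = [cer(w, r, r_f) for w in grid]
    return float(grid[int(np.argmax(values))])


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 101)   # assumption: no shorting or leverage
    r_f = 0.01                          # hypothetical log risk-free rate per period
    params = dict(x_T=-2.5, alpha1=0.30, beta=0.08, alpha0=0.05, sigma_u=0.18)
    r_mix = predictive_draws(q_bar=0.9, **params)
    r_h0 = predictive_draws(q_bar=0.0, **params)    # investor who rules out predictability
    w_opt = best_weight(r_mix, r_f, grid)
    w_sub = best_weight(r_h0, r_f, grid)
    # Both weights are evaluated under the same mixture draws.
    loss = cer(w_opt, r_mix, r_f) - cer(w_sub, r_mix, r_f)
    print(f"w_opt = {w_opt:.2f}, w_sub = {w_sub:.2f}, CE loss = {loss:.4%} per period")
```

Evaluating both weights on the same draws (common random numbers) means the estimated loss cannot be negative on the grid, since the suboptimal weight is one of the candidates the optimizer considered.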

Following Kandel and Stambaugh (1996), we measure utility loss as the difference between the certainty equivalent returns from following the optimal strategy and from following a suboptimal strategy. We define the suboptimal strategy as the strategy the investor would follow if he believed that there is no predictability. Note, however, that the expectation in (31) is computed with respect to the same distribution for both the optimal and suboptimal strategies. Table 5 presents the average certainty equivalent loss: we compute the difference in certainty equivalent returns as described above and then average over the posterior distribution for x.

The data indicate economically meaningful losses from failing to time the market. Panel A shows, for example, that an investor with a prior on β such that $P_{.01} = 0.50$ and a 50% prior belief in the existence of return predictability would suffer a certainty equivalent loss of 0.84% from failing to time the market using the payout yield.⁹ Higher values of q imply greater certainty equivalent losses. Panel B shows somewhat lower certainty equivalent losses for the dividend-price ratio using annual data. However, the certainty equivalent loss is much greater for distributions computed using quarterly post-war data: 1.83% per annum for the investor with $P_{.01} = 0.50$ and q = 0.50, and higher for higher levels of q.

4 Conclusion

This study has taken a Bayesian model selection approach to the question of whether the equity premium varies over time. We considered investors who face uncertainty both over whether predictability exists and over the strength of predictability if it does exist. We found substantial evidence in favor of predictability when the dividend-price ratio and payout yield were used to predict returns. Moreover, we found large certainty equivalent losses from failing to time the market, even for investors who have strong prior beliefs in a constant equity premium.
9 The low values of the certainty equivalent losses for $P_{.01} = 0.99$ are a reflection of Bartlett’s paradox, as described above.

Finally, we found that taking a fully Bayesian approach that incorporates the exact likelihood function leads to substantially different inference compared with empirical Bayes or the conditional likelihood function. Empirical Bayes tends to overstate the evidence in favor of predictability, while using the conditional likelihood understates it. These results point to the importance of taking into account the stochastic nature of the regressor when studying return predictability from a Bayesian perspective.

Appendix

A Jeffreys prior under $H_0$

Jeffreys argues that a reasonable property of a “no-information” prior is that inference be invariant to one-to-one transformations of the parameter space. Given a set of parameters µ, data D, and a log-likelihood l(µ; D), Jeffreys shows that invariance is equivalent to specifying a prior as
$$p(\mu) \propto \left| E\left[-\frac{\partial^2 l}{\partial\mu\,\partial\mu^\top}\right] \right|^{1/2}. \qquad (A.1)$$
Besides invariance, this formulation of the prior has other advantages, such as minimizing asymptotic bias and generating confidence sets that are similar to their classical counterparts (see Phillips (1991)).

Our derivation of the limiting Jeffreys prior on $b_0, \Sigma$ follows Stambaugh (1999). Zellner (1996, pp. 216-220) derives a limiting Jeffreys prior by applying (A.1) to the likelihood (24) and retaining terms of the highest order in T. Stambaugh shows that Zellner’s approach is equivalent to applying (A.1) to the conditional likelihood (23) and taking the expectation in (A.1) assuming that $x_0$ is multivariate normal with mean (6) and variance (7). We adopt this approach.

We derive the prior density $p(b_0, \Sigma^{-1})$ and then transform it into the density $p(b_0, \Sigma)$ using the Jacobian. Let
$$l_0(b_0, \Sigma; D) = \log p(D \mid b_0, \Sigma, H_0, x_0) \qquad (A.2)$$
denote the natural log of the conditional likelihood. Let $\zeta = [\sigma^{(11)}\ \sigma^{(12)}\ \sigma^{(22)}]^\top$, where $\sigma^{(ij)}$ denotes element (i, j) of $\Sigma^{-1}$. Applying (A.1) implies
$$p(b_0, \Sigma^{-1} \mid H_0) \propto \left| E\left[-\begin{pmatrix} \dfrac{\partial^2 l_0}{\partial b_0\,\partial b_0^\top} & \dfrac{\partial^2 l_0}{\partial b_0\,\partial \zeta^\top} \\[1ex] \dfrac{\partial^2 l_0}{\partial \zeta\,\partial b_0^\top} & \dfrac{\partial^2 l_0}{\partial \zeta\,\partial \zeta^\top} \end{pmatrix}\right] \right|^{1/2}. \qquad (A.3)$$
The form of the conditional likelihood implies that
$$l_0(b_0, \Sigma; D) = -\frac{T}{2}\log|2\pi\Sigma| - \frac{1}{2}(z - Z_0 b_0)^\top\left(\Sigma^{-1} \otimes I_T\right)(z - Z_0 b_0). \qquad (A.4)$$

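It follows from (A.4) that
$$\frac{\partial l_0}{\partial b_0} = Z_0^\top\left(\Sigma^{-1} \otimes I_T\right)(z - Z_0 b_0)$$
and
$$\frac{\partial^2 l_0}{\partial b_0\,\partial b_0^\top} = -Z_0^\top\left(\Sigma^{-1} \otimes I_T\right)Z_0 = -\begin{pmatrix} \iota_T & 0 \\ 0 & X \end{pmatrix}^\top\left(\Sigma^{-1} \otimes I_T\right)\begin{pmatrix} \iota_T & 0 \\ 0 & X \end{pmatrix} = -\begin{pmatrix} \sigma^{(11)}T & \sigma^{(12)}\iota_T^\top X \\ \sigma^{(12)}X^\top\iota_T & \sigma^{(22)}X^\top X \end{pmatrix}. \qquad (A.5)$$
Taking the expectation conditional on $b_0$ and Σ implies
$$E\left[\frac{\partial^2 l_0}{\partial b_0\,\partial b_0^\top}\right] = -T\begin{pmatrix} \sigma^{(11)} & \sigma^{(12)} & \sigma^{(12)}\mu_x \\ \sigma^{(12)} & \sigma^{(22)} & \sigma^{(22)}\mu_x \\ \sigma^{(12)}\mu_x & \sigma^{(22)}\mu_x & \sigma^{(22)}(\sigma_x^2 + \mu_x^2) \end{pmatrix}. \qquad (A.6)$$
Using arguments in Stambaugh (1999), it can be shown that
$$E\left[\frac{\partial^2 l_0}{\partial b_0\,\partial \zeta^\top}\right] = 0.$$
Moreover,
$$\left| E\left[-\frac{\partial^2 l_0}{\partial\zeta\,\partial\zeta^\top}\right] \right| \propto \left| \frac{\partial^2 \log|\Sigma|}{\partial\zeta\,\partial\zeta^\top} \right| = |\Sigma|^3$$
(see Box and Tiao (1973, pp. 474-475)). Therefore
$$p(b_0, \Sigma^{-1} \mid H_0) \propto |\Phi|^{1/2}\,|\Sigma|^{3/2}, \qquad (A.7)$$
where
$$\Phi = \begin{pmatrix} \Sigma^{-1} & \mu_x\begin{bmatrix}\sigma^{(12)}\\ \sigma^{(22)}\end{bmatrix} \\ \mu_x\,[\sigma^{(12)}\ \ \sigma^{(22)}] & (\sigma_x^2 + \mu_x^2)\,\sigma^{(22)} \end{pmatrix}.$$
The matrix Φ equals $E[-\partial^2 l_0/\partial b_0\,\partial b_0^\top]$ divided by T, so the two determinants differ only by the constant factor $T^3$, which is absorbed into the constant of proportionality in (A.7).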
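The Hessian in (A.5) can be checked numerically on a small hypothetical example by comparing finite-difference second derivatives of the quadratic part of (A.4) with $-Z_0^\top(\Sigma^{-1} \otimes I_T)Z_0$. The sketch assumes the stacking $z = (r^\top, x_{+}^\top)^\top$ and $Z_0 = \mathrm{diag}(\iota_T, X)$ with $X = [\iota_T\ \ x_{-}]$, where $x_{-}$ and $x_{+}$ (hypothetical notation) collect $x_0, \ldots, x_{T-1}$ and $x_1, \ldots, x_T$, matching the block form in (A.5). All numerical values are made up, and the log-determinant term of (A.4), which does not depend on $b_0$, is omitted.

```python
import numpy as np


def l0_quadratic(b, z, Z, K):
    """Part of (A.4) that depends on b0: -0.5 (z - Z b)' K (z - Z b), with K = Sigma^{-1} kron I_T."""
    resid = z - Z @ b
    return -0.5 * resid @ K @ resid


def numerical_hessian(f, b, h=1e-2):
    """Central-difference Hessian; exact up to rounding when f is quadratic."""
    k = b.size
    steps = np.eye(k) * h
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (f(b + steps[i] + steps[j]) - f(b + steps[i] - steps[j])
                       - f(b - steps[i] + steps[j]) + f(b - steps[i] - steps[j])) / (4.0 * h * h)
    return H


rng = np.random.default_rng(1)
T = 50
x_lag = rng.normal(-3.3, 0.3, size=T)                 # hypothetical x_0, ..., x_{T-1}
X = np.column_stack([np.ones(T), x_lag])
Z0 = np.block([[np.ones((T, 1)), np.zeros((T, 2))],  # return equation: alpha only
               [np.zeros((T, 1)), X]])               # predictor equation: theta, rho
Sigma = np.array([[0.04, -0.004], [-0.004, 0.0016]])  # hypothetical covariance of (u, v)
K = np.kron(np.linalg.inv(Sigma), np.eye(T))
z = rng.normal(size=2 * T)
b0 = np.array([0.05, -0.1, 0.97])                    # (alpha, theta, rho)

H_numeric = numerical_hessian(lambda b: l0_quadratic(b, z, Z0, K), b0)
H_formula = -Z0.T @ K @ Z0
print(np.allclose(H_numeric, H_formula, rtol=1e-6, atol=1e-6))
```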

From the formula for the determinant of a partitioned matrix, it follows that
$$|\Phi| = |\Sigma^{-1}|\left|(\sigma_x^2 + \mu_x^2)\,\sigma^{(22)} - \mu_x^2\,[\sigma^{(12)}\ \ \sigma^{(22)}]\,\Sigma\begin{bmatrix}\sigma^{(12)}\\ \sigma^{(22)}\end{bmatrix}\right|.$$
Because
$$\Sigma\begin{bmatrix}\sigma^{(12)}\\ \sigma^{(22)}\end{bmatrix} = \begin{bmatrix}0\\ 1\end{bmatrix},$$
it follows that
$$|\Phi| = |\Sigma^{-1}|\left((\sigma_x^2 + \mu_x^2)\,\sigma^{(22)} - \mu_x^2\,\sigma^{(22)}\right) = |\Sigma^{-1}|\,\sigma_x^2\,\sigma^{(22)}.$$
The determinant of Σ equals $|\Sigma| = \sigma_u^2\left(\sigma_v^2 - \sigma_{uv}^2\sigma_u^{-2}\right)$, while $\sigma^{(22)} = \left(\sigma_v^2 - \sigma_{uv}^2\sigma_u^{-2}\right)^{-1}$. Therefore,
$$|\Phi| = |\Sigma|^{-2}\,\sigma_u^2\,\sigma_x^2.$$
Substituting into (A.7),
$$p(b_0, \Sigma^{-1} \mid H_0) \propto |\Sigma|^{1/2}\,\sigma_u\,\sigma_x.$$
The Jacobian of the transformation from $\Sigma^{-1}$ to Σ is $|\Sigma|^{-3}$. Therefore,
$$p(b_0, \Sigma \mid H_0) \propto |\Sigma|^{-5/2}\,\sigma_u\,\sigma_x.$$

B Sampling from Posterior Distributions

This section describes how to sample from the posterior distributions. In all cases, the sampling procedures for the posteriors under $H_1$ and $H_0$ involve the Metropolis-Hastings algorithm. Below we describe the case of the full Bayes exact likelihood in detail. The procedures for the other cases are similar.
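The closed-form determinant derived in Appendix A, $|\Phi| = |\Sigma|^{-2}\sigma_u^2\sigma_x^2$, can also be verified numerically by building Φ from its definition under (A.7). The values of Σ, $\mu_x$, and $\sigma_x^2$ below are hypothetical; the identity should hold for any positive definite Σ.

```python
import numpy as np


def phi_matrix(Sigma, mu_x, sigma2_x):
    """Build Phi = [[Sigma^{-1}, mu_x * s], [mu_x * s', (sigma_x^2 + mu_x^2) * sigma^(22)]],
    where s = [sigma^(12), sigma^(22)]' is the second column of Sigma^{-1}."""
    S_inv = np.linalg.inv(Sigma)
    s = S_inv[:, [1]]                                            # 2x1 column
    corner = np.array([[(sigma2_x + mu_x ** 2) * S_inv[1, 1]]])
    return np.block([[S_inv, mu_x * s], [mu_x * s.T, corner]])


Sigma = np.array([[0.04, -0.004], [-0.004, 0.0016]])  # hypothetical [[s_u^2, s_uv], [s_uv, s_v^2]]
mu_x, sigma2_x = -3.3, 0.05                            # hypothetical predictor mean and variance

lhs = np.linalg.det(phi_matrix(Sigma, mu_x, sigma2_x))
rhs = np.linalg.det(Sigma) ** -2 * Sigma[0, 0] * sigma2_x   # |Sigma|^{-2} sigma_u^2 sigma_x^2
print(lhs, rhs, np.isclose(lhs, rhs, rtol=1e-8))
```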

B.1 Posterior distribution under $H_0$

Substituting (8) and (24) into (25) implies that
$$p(b_0, \Sigma \mid H_0, D) \propto \sigma_u\,|\Sigma|^{-\frac{T+5}{2}}\exp\left\{-\frac{1}{2}\sigma_x^{-2}(x_0 - \mu_x)^2 - \frac{1}{2}(z - Z_0 b_0)^\top\left(\Sigma^{-1} \otimes I_T\right)(z - Z_0 b_0)\right\}.$$
This posterior does not take the form of a standard density function because of the term in the likelihood involving $x_0$ (note that $\sigma_x^2$ is a nonlinear function of ρ and $\sigma_v$). However, we can sample from the posterior using the Metropolis-Hastings algorithm.

The Metropolis-Hastings algorithm is implemented “block-at-a-time”, by alternately sampling from $p(\Sigma \mid b_0, H_0, D)$ and from $p(b_0 \mid \Sigma, H_0, D)$. To calculate a proposal density for Σ, note that
$$(z - Z_0 b_0)^\top\left(\Sigma^{-1} \otimes I_T\right)(z - Z_0 b_0) = \mathrm{tr}\left[(Y - XB_0)^\top(Y - XB_0)\,\Sigma^{-1}\right],$$
where
$$B_0 = \begin{pmatrix} \alpha & \theta \\ 0 & \rho \end{pmatrix}.$$
The proposal density for the conditional distribution of Σ is the inverted Wishart with T + 2 degrees of freedom and scale matrix $(Y - XB_0)^\top(Y - XB_0)$. The target is therefore
$$p(\Sigma \mid b_0, H_0, D) \propto \sigma_u\exp\left\{-\frac{1}{2}\sigma_x^{-2}(x_0 - \mu_x)^2\right\} \times \text{proposal}.$$
Let
$$V_0 = \left(Z_0^\top\left(\Sigma^{-1} \otimes I_T\right)Z_0\right)^{-1}$$
and
$$\hat b_0 = V_0 Z_0^\top\left(\Sigma^{-1} \otimes I_T\right)z.$$
It follows from completing the square that
$$(z - Z_0 b_0)^\top\left(\Sigma^{-1} \otimes I_T\right)(z - Z_0 b_0) = (b_0 - \hat b_0)^\top V_0^{-1}(b_0 - \hat b_0) + \text{terms independent of } b_0.$$

The proposal density for $b_0$ is therefore multivariate normal with mean $\hat b_0$ and variance-covariance matrix $V_0$. The accept-reject algorithm of Chib and Greenberg (1995, Section 5) is used to sample from the target density, which satisfies
$$p(b_0 \mid \Sigma, H_0, D) \propto \exp\left\{-\frac{1}{2}(x_0 - \mu_x)^2\sigma_x^{-2}\right\} \times \text{proposal}.$$
Note that $\sigma_u$ and Σ are in the constant of proportionality. Drawing successively from the conditional posteriors for Σ and $b_0$ produces draws whose distribution converges to the full posterior conditional on $H_0$.

B.2 Posterior distribution under $H_1$

Substituting (12) and (22) into (26) implies that
$$p(b_1, \Sigma \mid H_1, D) \propto \sigma_x\,|\Sigma|^{-\frac{T+5}{2}}\exp\left\{-\frac{1}{2}\beta^2\left(\sigma_\eta^2\sigma_x^{-2}\sigma_u^2\right)^{-1} - \frac{1}{2}\sigma_x^{-2}(x_0 - \mu_x)^2\right\}\exp\left\{-\frac{1}{2}(z - Z_1 b_1)^\top\left(\Sigma^{-1} \otimes I_T\right)(z - Z_1 b_1)\right\}.$$
The sampling procedure is similar to that described in Appendix B.1; details can be found in Wachter and Warusawitharana (2009). To summarize, we first draw from the conditional posterior $p(\Sigma \mid b_1, H_1, D)$. The proposal density is an inverted Wishart with T + 2 degrees of freedom and scale matrix $(Y - XB_1)^\top(Y - XB_1)$, where
$$B_1 = \begin{pmatrix} \alpha & \theta \\ \beta & \rho \end{pmatrix}.$$
We then draw from $p(\theta, \rho \mid \alpha, \beta, \Sigma, H_1, D)$. The proposal density is multivariate normal with mean and variance determined by the conditional normal distribution, as described in Wachter and Warusawitharana (2009). Finally, we draw from $p(\alpha, \beta \mid \theta, \rho, \Sigma, H_1, D)$. In this case, the target and the proposal are the same, and are also multivariate normal.
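The block-at-a-time sampler for $H_0$ described in Appendix B.1 can be sketched in a few dozen lines. This is an illustrative sketch, not the authors’ code, and it rests on several assumptions: the data are stacked as $Y = [r\ \ x_{+}]$ and $X = [\iota_T\ \ x_{-}]$ (hypothetical notation for the returns, the predictor $x_1, \ldots, x_T$, and its lag $x_0, \ldots, x_{T-1}$); $\mu_x = \theta/(1-\rho)$ and $\sigma_x^2 = \sigma_v^2/(1-\rho^2)$ are the stationary moments of the AR(1) predictor; and draws with $|\rho| \ge 1$ are discarded. The Σ step is an independence Metropolis-Hastings step with the inverted Wishart proposal. Because the extra factor $\exp\{-\tfrac12\sigma_x^{-2}(x_0-\mu_x)^2\}$ in the $b_0$ step is bounded by one, the sketch uses plain rejection sampling there, a simpler substitute for the Chib and Greenberg (1995) accept-reject Metropolis-Hastings step used in the paper. All function names are hypothetical.

```python
import numpy as np


def log_x0_factor(b0, Sigma, x0):
    """log exp{-0.5 (x0 - mu_x)^2 / sigma_x^2}, using stationary AR(1) moments (assumption)."""
    _, theta, rho = b0
    mu_x = theta / (1.0 - rho)
    sigma2_x = Sigma[1, 1] / (1.0 - rho ** 2)
    return -0.5 * (x0 - mu_x) ** 2 / sigma2_x


def draw_inv_wishart(df, scale, rng):
    """Inverted Wishart(df, scale) for integer df: invert a Wishart(df, scale^{-1}) draw."""
    cov = np.linalg.inv(scale)
    cov = 0.5 * (cov + cov.T)
    g = rng.multivariate_normal(np.zeros(scale.shape[0]), cov, size=df)
    return np.linalg.inv(g.T @ g)


def sigma_step(Sigma, b0, r, x, rng):
    """Independence MH step for Sigma | b0 with an IW(T + 2, (Y - X B0)'(Y - X B0)) proposal."""
    alpha, theta, rho = b0
    resid = np.column_stack([r - alpha, x[1:] - theta - rho * x[:-1]])
    proposal = draw_inv_wishart(r.size + 2, resid.T @ resid, rng)

    def log_weight(S):  # log(target / proposal) = log sigma_u + log x0 factor
        return 0.5 * np.log(S[0, 0]) + log_x0_factor(b0, S, x[0])

    if np.log(rng.random()) < log_weight(proposal) - log_weight(Sigma):
        return proposal
    return Sigma


def b0_step(Sigma, r, x, rng, max_tries=10_000):
    """Rejection step for b0 = (alpha, theta, rho) | Sigma with a N(b_hat, V0) proposal."""
    T = r.size
    Z0 = np.zeros((2 * T, 3))
    Z0[:T, 0] = 1.0            # return equation: alpha
    Z0[T:, 1] = 1.0            # predictor equation: theta
    Z0[T:, 2] = x[:-1]         # predictor equation: rho
    z = np.concatenate([r, x[1:]])
    K = np.kron(np.linalg.inv(Sigma), np.eye(T))
    V0 = np.linalg.inv(Z0.T @ K @ Z0)
    V0 = 0.5 * (V0 + V0.T)
    b_hat = V0 @ Z0.T @ K @ z
    for _ in range(max_tries):
        b = rng.multivariate_normal(b_hat, V0)
        if abs(b[2]) >= 1.0:   # assumption: keep the predictor stationary
            continue
        if np.log(rng.random()) < log_x0_factor(b, Sigma, x[0]):
            return b
    raise RuntimeError("rejection step failed to accept a draw")


def sample_h0(r, x, n_draws, seed=0):
    """Alternate the b0 and Sigma steps; return arrays of b0 draws and Sigma draws."""
    rng = np.random.default_rng(seed)
    Sigma = np.diag([r.var(), np.diff(x).var()])   # crude starting value
    b_draws, s_draws = [], []
    for _ in range(n_draws):
        b0 = b0_step(Sigma, r, x, rng)
        Sigma = sigma_step(Sigma, b0, r, x, rng)
        b_draws.append(b0)
        s_draws.append(Sigma)
    return np.array(b_draws), np.array(s_draws)
```

Given arrays r (length T) and x (length T + 1), `sample_h0(r, x, n_draws)` returns the $b_0$ and Σ draws; in practice one would discard an initial burn-in before using them.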

C Computing the Bayes factor

Verdinelli and Wasserman (1995) provide an implementable formula for the inverse of the Bayes factor. In our notation, this formula can be written as
$$\mathcal{B}_{10}^{-1} = p(\beta = 0 \mid H_1, D)\,E\left[\frac{p(b_0, \Sigma \mid H_0)}{p(\beta = 0, b_0, \Sigma \mid H_1)} \;\middle|\; \beta = 0, H_1, D\right]. \qquad (C.1)$$
To compute $p(\beta = 0 \mid H_1, D)$, note that
$$p(\beta = 0 \mid H_1, D) = \int p(\beta = 0 \mid b_0, \Sigma, H_1, D)\,p(b_0, \Sigma \mid H_1, D)\,db_0\,d\Sigma. \qquad (C.2)$$
As discussed in Appendix B.2, the posterior distribution of α and β conditional on the remaining parameters is normal. We can therefore compute $p(\beta = 0 \mid b_0, \Sigma, H_1, D)$ (including integration constants) in closed form, using the properties of the conditional normal distribution. Consider N draws from the full posterior, $((b_1^{(1)}, \Sigma^{(1)}), \ldots, (b_1^{(N)}, \Sigma^{(N)}))$, where we can write $(b_1^{(i)}, \Sigma^{(i)})$ as $(\beta^{(i)}, b_0^{(i)}, \Sigma^{(i)})$. We use these draws to integrate out $b_0$ and Σ. It follows from (C.2) that
$$p(\beta = 0 \mid H_1, D) \approx \frac{1}{N}\sum_{i=1}^{N} p(\beta = 0 \mid b_0^{(i)}, \Sigma^{(i)}, H_1, D),$$
where the approximation is accurate for large N.

To compute the second term in (C.1), we observe that
$$\frac{p(b_0, \Sigma \mid H_0)}{p(\beta = 0, b_0, \Sigma \mid H_1)} = \frac{p(b_0, \Sigma \mid H_0)}{p(\beta = 0 \mid b_0, \Sigma, H_1)\,p(b_0, \Sigma \mid H_1)} = \sqrt{2\pi}\,\sigma_\beta,$$
because $p(b_0, \Sigma \mid H_0) = p(b_0, \Sigma \mid H_1)$. For the empirical Bayes approach, $\sigma_\beta$ is a constant and no further simulation is needed. For the full Bayes approach, $\sigma_\beta = \sigma_\eta\sigma_x^{-1}\sigma_u$. We require the expectation taken with respect to the posterior distribution conditional on the existence of predictability and the realization β = 0. To calculate this expectation, we draw $((b_0^{(1)}, \Sigma^{(1)}), \ldots, (b_0^{(N)}, \Sigma^{(N)}))$ from $p(b_0, \Sigma \mid \beta = 0, H_1, D)$. This involves modifying the procedure for drawing from the posterior for $b_1, \Sigma$ given $H_1$ (see Appendix B.2). We sample from $p(\Sigma \mid \alpha, \beta = 0, \theta, \rho, H_1, D)$, then from $p(\rho, \theta \mid \alpha, \beta = 0, \Sigma, H_1, D)$, and finally from $p(\alpha \mid \beta = 0, \Sigma, \theta, \rho, H_1, D)$, and repeat until the desired number of draws is obtained. All steps except the last are identical to those described in Appendix B.2 (the value of β is identically zero rather than the value from the previous draw). For the last step, we derive $p(\alpha \mid \beta = 0, \Sigma, \theta, \rho, H_1, D)$ from the joint distribution $p(\alpha, \beta \mid \Sigma, \theta, \rho, H_1, D)$, making use of the properties of the conditional normal distribution. Given these draws from the posterior distribution, the second term equals
$$E\left[\frac{p(b_0, \Sigma \mid H_0)}{p(\beta = 0, b_0, \Sigma \mid H_1)} \;\middle|\; \beta = 0, H_1, D\right] \approx \frac{1}{N}\sum_{i=1}^{N}\sqrt{2\pi}\,\sigma_\eta\,\big(\sigma_x^{(i)}\big)^{-1}\sigma_u^{(i)}, \qquad (C.3)$$
where this approximation is accurate for large N.
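The two Monte Carlo averages in Appendix C combine into the Bayes factor as sketched below. The sketch assumes the user has already computed, for each draw from $p(b_1, \Sigma \mid H_1, D)$, the mean and standard deviation of the normal conditional posterior of β (from the conditional-normal step in Appendix B.2), and, for each draw from $p(b_0, \Sigma \mid \beta = 0, H_1, D)$, the values of $\sigma_x$ and $\sigma_u$; these inputs and the function name are hypothetical. The sanity check in the usage line reflects the Savage-Dickey logic: if the posterior density of β at zero equals the prior density there, the Bayes factor is one.

```python
import math

import numpy as np


def bayes_factor_full_bayes(cond_mean_beta, cond_sd_beta, sigma_eta, sigma_x, sigma_u):
    """Estimate B10 from (C.1)-(C.3) under the full Bayes prior sigma_beta = sigma_eta * sigma_u / sigma_x.

    cond_mean_beta, cond_sd_beta: mean and sd of p(beta | b0^(i), Sigma^(i), H1, D) for draws under H1.
    sigma_x, sigma_u: values at draws from p(b0, Sigma | beta = 0, H1, D).
    """
    m = np.asarray(cond_mean_beta, dtype=float)
    s = np.asarray(cond_sd_beta, dtype=float)
    # First factor of (C.1): average the normal conditional density of beta at zero, as in (C.2).
    density_at_zero = np.mean(np.exp(-0.5 * (m / s) ** 2) / (math.sqrt(2.0 * math.pi) * s))
    # Second factor, (C.3): average sqrt(2 pi) * sigma_beta over the beta = 0 draws.
    sigma_beta = sigma_eta * np.asarray(sigma_u, dtype=float) / np.asarray(sigma_x, dtype=float)
    ratio = np.mean(math.sqrt(2.0 * math.pi) * sigma_beta)
    return 1.0 / (density_at_zero * ratio)


if __name__ == "__main__":
    # Posterior of beta centred at zero with the prior's spread (sigma_beta = 0.5): B10 should be 1.
    print(bayes_factor_full_bayes([0.0], [0.5], sigma_eta=1.0, sigma_x=[2.0], sigma_u=[1.0]))
```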

References

Avramov, Doron, 2002, Stock return predictability and model uncertainty, Journal of Financial Economics 64, 423–458.
Baks, Klaas P., Andrew Metrick, and Jessica Wachter, 2001, Should investors avoid all actively managed mutual funds? A study in Bayesian performance evaluation, Journal of Finance 56, 45–86.
Barberis, Nicholas, 2000, Investing for the long run when returns are predictable, Journal of Finance 55, 225–264.
Bartlett, M.S., 1957, Comment on ‘A Statistical Paradox’ by D. V. Lindley, Biometrika 44, 533–534.
Berger, James O., 1985, Statistical decision theory and Bayesian analysis (Springer, New York).
Boudoukh, Jacob, Roni Michaely, Matthew Richardson, and Michael R. Roberts, 2007, On the importance of measuring payout yield: Implications for empirical asset pricing, Journal of Finance 62, 877–915.
Box, George E.P., and George C. Tiao, 1973, Bayesian Inference in Statistical Analysis (Addison-Wesley, Reading, MA).
Brandt, Michael W., Amit Goyal, Pedro Santa-Clara, and Jonathan R. Stroud, 2005, A simulation approach to dynamic portfolio choice with an application to learning about return predictability, Review of Financial Studies 18, 831–873.
Brennan, Michael J., Eduardo S. Schwartz, and Ronald Lagnado, 1997, Strategic asset allocation, Journal of Economic Dynamics and Control 21, 1377–1403.
Campbell, John Y., and Luis M. Viceira, 1999, Consumption and portfolio decisions when expected returns are time-varying, Quarterly Journal of Economics 114, 433–495.
Campbell, John Y., and Motohiro Yogo, 2006, Efficient tests of stock return predictability, Journal of Financial Economics 81, 27–60.
Cavanagh, Christopher L., Graham Elliott, and James H. Stock, 1995, Inference in models with nearly integrated regressors, Econometric Theory 11, 1131–1147.
Chen, Hui, Nengjiu Ju, and Jianjun Miao, 2009, Dynamic asset allocation with ambiguous return predictability, Working paper, MIT.
Chen, Zengjing, and Larry Epstein, 2002, Ambiguity, risk and asset returns in continuous time, Econometrica 70, 1403–1443.
Chib, Siddhartha, and Edward Greenberg, 1995, Understanding the Metropolis-Hastings algorithm, American Statistician 49, 327–335.
Chipman, Hugh, Edward I. George, and Robert E. McCulloch, 2001, The practical implementation of Bayesian model selection, in P. Lahiri, ed.: Model Selection (IMS Lecture Notes, Bethesda, MA).
Cremers, K.J. Martijn, 2002, Stock return predictability: A Bayesian model selection perspective, Review of Financial Studies 15, 1223–1249.
Dickey, James M., 1971, The weighted likelihood ratio, linear hypotheses on normal location parameters, The Annals of Mathematical Statistics 42, 204–223.
Fernandez, Carmen, Eduardo Ley, and Mark F.J. Steel, 2001, Benchmark priors for Bayesian model averaging, Journal of Econometrics 100, 381–427.
Garlappi, Lorenzo, Raman Uppal, and Tan Wang, 2007, Portfolio selection with parameter and model uncertainty: A multi-prior approach, Review of Financial Studies 20, 41–81.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin, 2004, Bayesian Data Analysis (Chapman & Hall/CRC, Boca Raton, FL).
Hamilton, J. D., 1994, Time Series Analysis (Princeton University Press, Princeton, NJ).
Hansen, Lars Peter, 2007, Beliefs, doubts and learning: Valuing economic risk, NBER working paper #12948.
Jeffreys, Harold, 1961, Theory of Probability (Clarendon Press, Oxford).
Johannes, Michael, and Nicholas Polson, 2006, MCMC methods for financial econometrics, in Yacine Ait-Sahalia and Lars Hansen, eds.: Handbook of Financial Econometrics (Elsevier, North-Holland).
Johannes, Michael, Nicholas Polson, and Jonathan R. Stroud, 2002, Sequential optimal portfolio performance: Market and volatility timing, Working paper, Columbia University, University of Chicago, and University of Pennsylvania.
Kandel, Shmuel, and Robert F. Stambaugh, 1996, On the predictability of stock returns: An asset allocation perspective, Journal of Finance 51, 385–424.
Kass, R., and A. E. Raftery, 1995, Bayes factors, Journal of the American Statistical Association 90, 773–795.
Lewellen, Jonathan, 2004, Predicting returns with financial ratios, Journal of Financial Economics 74, 209–235.
Maenhout, Pascal, 2006, Robust portfolio rules and detection-error probabilities for a mean-reverting risk premium, Journal of Economic Theory 128, 136–163.
Nelson, C. R., and M. J. Kim, 1993, Predictable stock returns: The role of small sample bias, Journal of Finance 48, 641–661.
Pastor, Lubos, and Robert F. Stambaugh, 1999, Costs of equity capital and model mispricing, Journal of Finance 54, 67–121.
Pastor, Lubos, and Robert F. Stambaugh, 2008, Predictive systems: Living with imperfect predictors, forthcoming, Journal of Finance.
Pesaran, M. Hashem, and Allan Timmermann, 1995, Predictability of stock returns: Robustness and economic significance, Journal of Finance 50, 1201–1228.
Phillips, Peter C.B., 1991, To criticize the critics: An objective Bayesian analysis of stochastic trends, Journal of Applied Econometrics 6, 333–364.
Robbins, Herbert, 1964, The empirical Bayes approach to statistical decision problems, The Annals of Mathematical Statistics 35, 1–20.
Skoulakis, Georgios, 2007, Dynamic portfolio choice with Bayesian learning, Working paper, University of Maryland.
Stambaugh, Robert F., 1999, Predictive regressions, Journal of Financial Economics 54, 375–421.
Stock, James H., and Mark W. Watson, 2005, An empirical comparison of methods for forecasting using many predictors, Working paper, Harvard University and Princeton University.
Torous, Walter, Rossen Valkanov, and Shu Yan, 2004, On predicting stock returns with nearly integrated explanatory variables, Journal of Business 77, 937–966.
Verdinelli, Isabella, and Larry Wasserman, 1995, Computing Bayes factors using a generalization of the Savage-Dickey density ratio, Journal of the American Statistical Association 90, 614–618.
Wachter, Jessica A., and Missaka Warusawitharana, 2009, Predictable returns and asset allocation: Should a skeptical investor time the market?, forthcoming, Journal of Econometrics.
Wright, Jonathan H., 2003, Bayesian model averaging of exchange rate forecasts, International Finance Discussion Papers 779, Board of Governors of the Federal Reserve.
Zellner, Arnold, 1996, An introduction to Bayesian inference in econometrics (John Wiley and Sons, New York, NY).

Table 1: Bayes factors and posterior means: Payout yield and annual returns

Model                          P.01     B10      β̄       ρ̄       r̄       x̄
Full Bayes, Exact Lkl.         0.05     1.68    2.23   0.936    5.24   -3.17
                               0.50    11.99   12.94   0.889    5.14   -3.16
                               0.99    18.20   19.54   0.878    5.05   -3.16
Full Bayes, Conditional Lkl.   0.05     1.36    1.39   0.959    5.64   -5.32
                               0.50     5.51   10.71   0.910    4.87   -3.76
                               0.99     6.54   16.42   0.914  -22.66   -6.24
Empirical Bayes, Exact Lkl.    0.05     2.58    3.99   0.926    5.22   -3.17
                               0.50    19.43   14.17   0.887    5.13   -3.16
                               0.99    27.13   21.90   0.851    5.09   -3.16
OLS                                            20.89   0.863    5.85   -3.15

Notes: $P_{.01}$ denotes the prior probability that the $R^2$ from the predictive regression exceeds .01 conditional on the existence of predictability (this is applicable for full Bayes priors; empirical Bayes priors are constructed to be comparable to their full Bayes counterparts). $\mathcal{B}_{10} = p(D \mid H_1)/p(D \mid H_0)$ denotes the Bayes factor in favor of predictability ($H_1$) versus no predictability ($H_0$). The table also reports posterior means of the predictive coefficient β, the autoregressive coefficient ρ, the excess return r, and the predictor variable x conditional on $H_1$. The predictor variable is the payout yield (the dividend-price ratio adjusted for repurchases) constructed from the value-weighted CRSP index. Continuously compounded stock returns on the value-weighted CRSP index are in excess of the continuously compounded return on the three-month Treasury Bill. Data are annual from 1/1/1927 to 1/1/2004. OLS denotes results obtained from ordinary least squares regression.

Table 2: Bayes factors and posterior means: Dividend-price ratio and annual returns

Model                          P.01     B10      β̄       ρ̄       r̄       x̄
Full Bayes, Exact Lkl.         0.05     1.51    1.48   0.966    4.71   -3.37
                               0.50     5.73    7.64   0.946    4.37   -3.35
                               0.99     6.90   11.30   0.948    4.02   -3.35
Full Bayes, Conditional Lkl.   0.05     1.21    0.83   0.980    5.31  -10.24
                               0.50     2.78    5.56   0.963    3.15   -6.75
                               0.99     3.53    8.90   0.976  -83.53  -16.17
Empirical Bayes, Exact Lkl.    0.05     2.23    2.65   0.960    4.64   -3.36
                               0.50     9.17    8.85   0.942    4.31   -3.34
                               0.99     9.00   13.28   0.925    4.17   -3.33
OLS                                            11.64   0.944    5.85   -3.27

Notes: $P_{.01}$ denotes the prior probability that the $R^2$ from the predictive regression exceeds .01 conditional on the existence of predictability (this is applicable for full Bayes priors; empirical Bayes priors are constructed to be comparable to their full Bayes counterparts). $\mathcal{B}_{10} = p(D \mid H_1)/p(D \mid H_0)$ denotes the Bayes factor in favor of predictability ($H_1$) versus no predictability ($H_0$). The table also reports posterior means of the predictive coefficient β, the autoregressive coefficient ρ, the excess return r, and the predictor variable x conditional on $H_1$. The predictor variable is the dividend-price ratio constructed from the value-weighted CRSP index. Continuously compounded stock returns on the value-weighted CRSP index are in excess of the continuously compounded return on the three-month Treasury Bill. Data are annual from 1/1/1927 to 1/1/2004. OLS denotes results obtained from ordinary least squares regression.

Table 3: Bayes factors and posterior means: Dividend-price ratio and quarterly post-war returns

| Model | $P_{.01}$ | $\mathcal{B}_{10}$ | $\bar{\beta}$ | $\bar{\rho}$ | $\bar{r}$ | $\bar{x}$ |
|---|---|---|---|---|---|---|
| Full Bayes, exact likelihood | 0.05 | 4.68 | 1.05 | 0.990 | 3.20 | -3.49 |
| | 0.50 | 7.06 | 1.87 | 0.984 | 3.21 | -3.50 |
| | 0.99 | 6.48 | 2.01 | 0.983 | 3.21 | -3.50 |
| Full Bayes, conditional likelihood | 0.05 | 2.14 | 0.69 | 0.994 | 2.68 | -8.13 |
| | 0.50 | 2.90 | 1.51 | 0.988 | 0.53 | -6.87 |
| | 0.99 | 2.59 | 1.59 | 0.988 | -4.74 | -8.66 |
| Empirical Bayes, exact likelihood | 0.05 | 10.57 | 1.44 | 0.988 | 3.20 | -3.50 |
| | 0.50 | 11.72 | 2.43 | 0.979 | 3.20 | -3.50 |
| | 0.99 | 9.34 | 2.77 | 0.976 | 3.20 | -3.50 |
| OLS | | | 2.74 | 0.976 | 5.22 | -3.51 |

Notes: $P_{.01}$ denotes the prior probability that the $R^2$ from the predictive regression exceeds .01, conditional on the existence of predictability (this applies to the full Bayes priors; the empirical Bayes priors are constructed to be comparable to their full Bayes counterparts). $\mathcal{B}_{10} = p(D \mid H_1)/p(D \mid H_0)$ denotes the Bayes factor in favor of predictability ($H_1$) versus no predictability ($H_0$). The table also reports posterior means of the predictive coefficient $\beta$, the autoregressive coefficient $\rho$, the excess return $r$ and the predictor variable $x$, conditional on $H_1$. The posterior mean of $r$ is annualized by multiplying by 4. The predictor variable is the dividend-price ratio constructed from the value-weighted CRSP index. Continuously compounded stock returns on the value-weighted CRSP index are in excess of the continuously compounded return on the three-month Treasury Bill. Data are quarterly from 4/1/1952 to 1/1/2005. OLS denotes results obtained from ordinary least squares regression.

Table 4: Posterior probability of predictable excess stock returns for the full Bayes exact likelihood

Columns give the prior probability of return predictability $q$.

| Predictor | $P(R^2 > 0.01 \mid H_1)$ | $q = 0.01$ | $q = 0.20$ | $q = 0.50$ | $q = 0.80$ |
|---|---|---|---|---|---|
| Payout yield, annual | 0.05 | 0.02 | 0.30 | 0.63 | 0.87 |
| | 0.50 | 0.11 | 0.75 | 0.92 | 0.98 |
| | 0.99 | 0.16 | 0.82 | 0.95 | 0.99 |
| Dividend-price ratio, annual | 0.05 | 0.02 | 0.27 | 0.60 | 0.86 |
| | 0.50 | 0.05 | 0.59 | 0.85 | 0.96 |
| | 0.99 | 0.07 | 0.63 | 0.87 | 0.97 |
| Dividend-price ratio, quarterly | 0.05 | 0.05 | 0.54 | 0.82 | 0.95 |
| | 0.50 | 0.07 | 0.64 | 0.88 | 0.97 |
| | 0.99 | 0.06 | 0.62 | 0.87 | 0.96 |

Notes: The table reports $\bar{q}$, the probability the investor assigns to predictable excess stock returns after seeing the data. Rows vary $P(R^2 > .01 \mid H_1)$, the prior probability that the $R^2$ from the predictability regression exceeds 0.01, conditional on the existence of predictability. Columns vary $q$, the prior probability of predictable excess stock returns. The predictor variables include the payout yield and the dividend-price ratio, both constructed from the value-weighted CRSP index. Continuously compounded stock returns on the value-weighted CRSP index are in excess of the continuously compounded return on the three-month Treasury Bill. The first two panels report results using annual data from 1/1/1927 to 1/1/2004. The last panel reports results using quarterly data from 4/1/1952 to 1/1/2005.
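Table 4 follows from the full Bayes exact-likelihood Bayes factors in Tables 1 through 3 by Bayes' rule: posterior odds equal prior odds times $\mathcal{B}_{10}$, so $\bar{q} = q\mathcal{B}_{10}/(q\mathcal{B}_{10} + 1 - q)$. The Python sketch below applies this identity as an arithmetic check; the names `posterior_prob`, `BAYES_FACTORS` and `table4` are hypothetical and the code is not presented as the authors' procedure. Rounded to two decimals, its output matches every entry of Table 4.

```python
def posterior_prob(q: float, bf10: float) -> float:
    """Posterior probability of H1 from prior probability q and Bayes factor B10."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # Prior odds q/(1-q) times B10, rewritten as a probability so q = 1 is safe.
    return q * bf10 / (q * bf10 + (1.0 - q))


# Full Bayes, exact-likelihood Bayes factors B10 from Tables 1-3, keyed by P_.01.
BAYES_FACTORS = {
    "Payout yield, annual": {0.05: 1.68, 0.50: 11.99, 0.99: 18.20},
    "Dividend-price ratio, annual": {0.05: 1.51, 0.50: 5.73, 0.99: 6.90},
    "Dividend-price ratio, quarterly": {0.05: 4.68, 0.50: 7.06, 0.99: 6.48},
}
PRIOR_Q = (0.01, 0.20, 0.50, 0.80)  # column values of q in Table 4


def table4():
    """Return (predictor, P_.01, [q-bar for each q]) rows in Table 4 order."""
    rows = []
    for predictor, by_p01 in BAYES_FACTORS.items():
        for p01, bf in by_p01.items():
            rows.append((predictor, p01, [round(posterior_prob(q, bf), 2) for q in PRIOR_Q]))
    return rows


if __name__ == "__main__":
    for predictor, p01, probs in table4():
        print(f"{predictor:32s} {p01:4.2f}  " + "  ".join(f"{p:.2f}" for p in probs))
```

The same function shows why the quarterly Bayes factors, which are not monotone in $P_{.01}$, produce posterior probabilities that rise and then slightly fall across the rows of the last panel.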

Table 5: Average certainty equivalent returns from timing the market

Columns give the prior probability of return predictability $q$.

| Predictor | $P(R^2 > 0.01 \mid H_1)$ | $q = 0.20$ | $q = 0.50$ | $q = 0.80$ | $q = 0.99$ |
|---|---|---|---|---|---|
| Payout yield, annual | 0.05 | 0.01 | 0.03 | 0.05 | 0.07 |
| | 0.50 | 0.57 | 0.82 | 0.92 | 0.95 |
| | 0.99 | 1.15 | 1.50 | 1.61 | 1.65 |
| Dividend-price ratio, annual | 0.05 | 0.01 | 0.03 | 0.06 | 0.08 |
| | 0.50 | 0.37 | 0.69 | 0.84 | 0.90 |
| | 0.99 | 0.97 | 1.60 | 1.87 | 1.98 |
| Dividend-price ratio, quarterly | 0.05 | 0.42 | 0.86 | 1.07 | 1.16 |
| | 0.50 | 1.14 | 1.83 | 2.11 | 2.21 |
| | 0.99 | 1.19 | 1.97 | 2.30 | 2.42 |

Notes: The table reports the certainty equivalent return (CER) to timing the market. Rows vary $P(R^2 > .01 \mid H_1)$, the prior probability that the $R^2$ from the predictability regression exceeds 0.01, conditional on the existence of predictability. Columns vary $q$, the prior probability of predictable excess stock returns. The predictor variables include the payout yield and the dividend-price ratio, both constructed from the value-weighted CRSP index. The posterior is constructed using full Bayes priors with the exact likelihood. Continuously compounded stock returns on the value-weighted CRSP index are in excess of the continuously compounded return on the three-month Treasury Bill. The first two panels report results using annual data from 1/1/1927 to 1/1/2004. The last panel reports results using quarterly data from 4/1/1952 to 1/1/2005; in this panel, returns are annualized by multiplying by 4. The certainty equivalent returns are constructed by averaging the CER values over 1000 draws of the predictor variable from its unconditional posterior distribution.

Figure 1: Prior Distribution of the $R^2$

Panel A: Probability of predictability $q = 1$. Panel B: Probability of predictability $q = 0.5$. [Each panel plots $P(R^2 > k)$ against $k$, for $k$ from 0 to 0.1, with one curve each for $\sigma_\eta = 100.00$, $0.15$ and $0.05$.]

Notes: The figure plots the prior probability that the $R^2$ will be greater than some value $k$, for different values of $k$. This equals 1 minus the cumulative distribution function of the $R^2$. Panel A reports the values conditional on predictability ($q = 1$) and Panel B plots the values for a prior value of $q = 0.5$. $\sigma_\eta$ parameterizes the prior variance of $\beta$, with $\sigma_\beta = \sigma_\eta \sigma_x^{-1} \sigma_u$.
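The curves in Figure 1 can be sketched in closed form under stated assumptions. The relation $\sigma_\beta = \sigma_\eta \sigma_x^{-1} \sigma_u$ means that $\eta = \beta\sigma_x/\sigma_u$ has prior standard deviation $\sigma_\eta$; assuming further that $\eta$ is normal with mean zero under $H_1$, that $\sigma_x$ is the unconditional standard deviation of the predictor, and that the population $R^2$ is $\eta^2/(1+\eta^2)$, the event $R^2 > k$ is $|\eta| > \sqrt{k/(1-k)}$. Under $H_0$ the $R^2$ is zero, so Panel B simply scales Panel A by $q$. These assumptions and the function name `prob_r2_exceeds` are illustrative, not taken from this section. Under them, the three legend values $\sigma_\eta = 0.05$, $0.15$ and $100$ give $P(R^2 > 0.01 \mid H_1)$ of roughly 0.04, 0.50 and 0.999, close to the values of $P_{.01}$ used in the tables.

```python
import math


def prob_r2_exceeds(k: float, sigma_eta: float, q: float = 1.0) -> float:
    """Prior P(R^2 > k) under an ASSUMED parameterization (not from the source).

    Assumes eta = beta * sigma_x / sigma_u ~ N(0, sigma_eta^2) under H1 and
    R^2 = eta^2 / (1 + eta^2); under H0 (prior probability 1 - q) R^2 = 0.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie strictly between 0 and 1")
    if sigma_eta <= 0.0 or not 0.0 <= q <= 1.0:
        raise ValueError("need sigma_eta > 0 and q in [0, 1]")
    # R^2 > k  <=>  |eta| > c, where c = sqrt(k / (1 - k)).
    c = math.sqrt(k / (1.0 - k))
    # Two-sided normal tail: P(|eta| > c) = erfc(c / (sigma_eta * sqrt(2))).
    tail_given_h1 = math.erfc(c / (sigma_eta * math.sqrt(2.0)))
    # H0 puts no mass above k > 0, so the unconditional probability scales by q.
    return q * tail_given_h1


if __name__ == "__main__":
    for sigma_eta in (100.0, 0.15, 0.05):
        panel_a = prob_r2_exceeds(0.01, sigma_eta)         # q = 1
        panel_b = prob_r2_exceeds(0.01, sigma_eta, q=0.5)  # q = 0.5
        print(f"sigma_eta={sigma_eta:6.2f}  P(R2 > 0.01): q=1 {panel_a:.3f}, q=0.5 {panel_b:.3f}")
```

Evaluating the function on a grid of $k$ between 0 and 0.1 traces out curves of the kind shown in each panel.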

Figure 2: Posterior Distribution of the $R^2$: Payout Yield and Annual Returns

Panel A: $P(R^2 > k)$ against $k$, for $k$ from 0 to 0.2. Panel B: probability density of the $R^2$, for $R^2$ from 0 to 0.2. [Each panel shows the prior and the posterior.]

Notes: Panel A plots the probability that the $R^2$ from a predictive regression of excess stock returns on the payout yield will be greater than some value $k$, for different values of $k$. This equals 1 minus the cumulative distribution function of the $R^2$. Panel B plots the probability density function of the $R^2$ for the same regression. The dashed line signifies the prior and the solid line the posterior distribution of the $R^2$. The plots use the full Bayes exact likelihood with $P(R^2 > 0.01 \mid H_1) = 0.50$ and $q = 0.5$. Data are annual from 1/1/1927 to 1/1/2004.

Figure 3: Posterior Distribution of the $R^2$: Dividend-Price Ratio and Annual Returns

Panel A: $P(R^2 > k)$ against $k$, for $k$ from 0 to 0.2. Panel B: probability density of the $R^2$, for $R^2$ from 0 to 0.2. [Each panel shows the prior and the posterior.]

Notes: Panel A plots the probability that the $R^2$ from a predictive regression of excess stock returns on the dividend-price ratio will be greater than some value $k$, for different values of $k$. This equals 1 minus the cumulative distribution function of the $R^2$. Panel B plots the probability density function of the $R^2$ for the same regression. The dashed line signifies the prior and the solid line the posterior distribution of the $R^2$. The plots use the full Bayes exact likelihood with $P(R^2 > 0.01 \mid H_1) = 0.50$ and $q = 0.5$. Data are annual from 1/1/1927 to 1/1/2004.

Figure 4: Posterior Distribution of the $R^2$: Dividend-Price Ratio and Quarterly Returns

Panel A: $P(R^2 > k)$ against $k$, for $k$ from 0 to 0.1. Panel B: probability density of the $R^2$, for $R^2$ from 0 to 0.1. [Each panel shows the prior and the posterior.]

Notes: Panel A plots the probability that the $R^2$ from a predictive regression of excess stock returns on the dividend-price ratio will be greater than some value $k$, for different values of $k$. This equals 1 minus the cumulative distribution function of the $R^2$. Panel B plots the probability density function of the $R^2$ for the same regression. The dashed line signifies the prior and the solid line the posterior distribution of the $R^2$. The plots use the full Bayes exact likelihood with $P(R^2 > 0.01 \mid H_1) = 0.50$ and $q = 0.5$. Data are quarterly from 4/1/1952 to 1/1/2005.

Cite this document

APA

Jessica A. Wachter and Missaka Warusawitharana (2009). What is the Chance that the Equity Premium Varies Over Time? Evidence from Predictive Regressions (FEDS 2009-26). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2009-26

BibTeX

@techreport{wtfs_feds_2009_26,
  author = {Jessica A. Wachter and Missaka Warusawitharana},
  title = {What is the Chance that the Equity Premium Varies Over Time? Evidence from Predictive Regressions},
  type = {Finance and Economics Discussion Series},
  number = {2009-26},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2009},
  url = {https://whenthefedspeaks.com/doc/feds_2009-26},
  abstract = {We examine the evidence on stock return predictability in a Bayesian setting that includes uncertainty about both the existence and strength of predictability. We consider an investor who believes that excess stock returns exhibit predictability with prior probability q < 1. In addition, the investor downweights observed predictability by placing a prior distribution on the R2 of the predictability regression. When we apply our analysis to the dividend-price ratio, we find that even investors who are quite skeptical about the existence and strength of predictability sharply modify their views in favor of predictability when confronted by the evidence. We depart from previous model-selection work by treating the regressor as stochastic rather than known; we find that this has a large impact on inference about time-varying expected returns.},
}