feds · August 8, 2023

The Pricing Kernel in Options

Abstract

The empirical option valuation literature specifies the pricing kernel through the price of risk, or defines it implicitly as the ratio of risk-neutral and physical probabilities. Instead, we extend the economically appealing Rubinstein-Brennan kernels to a dynamic framework that allows path- and volatility-dependence. Because of low statistical power, kernels with different economic properties can produce similar overall option fit, even when they imply cross-sectional pricing anomalies and implausible risk premiums. Imposing parsimonious economic restrictions such as monotonicity and path-independence (recovery theory) achieves good option fit and reasonable estimates of equity and variance risk premiums, while resolving pricing kernel anomalies.

Finance and Economics Discussion Series
Federal Reserve Board, Washington, D.C.
ISSN 1936-2854 (Print)
ISSN 2767-3898 (Online)

Steven Heston, Kris Jacobs, Hyung Joo Kim
2023-053

Please cite this paper as:
Heston, Steven, Kris Jacobs, and Hyung Joo Kim (2023). “The Pricing Kernel in Options,” Finance and Economics Discussion Series 2023-053. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2023.053.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Steven Heston, University of Maryland
Kris Jacobs, University of Houston
Hyung Joo Kim, Federal Reserve Board

April 7, 2023

*Heston: sheston@umd.edu; Jacobs: kjacobs@bauer.uh.edu; Kim: hyungjoo.kim@frb.gov. We would like to thank Caio Almeida, David Bates, Hitesh Doshi, Bjørn Eraker, Xiaohui Gao, Stefano Giglio, Massimo Guidolin, Alex Kostakis, Paola Pederzoli, Jean-Paul Renne, seminar participants at the 2023 AFA Conference, the 2022 Finance Down Under Conference, the 2022 SoFiE Conference, the 2022 FMA Conference on Derivatives and Volatility, K.U. Leuven, Syracuse University, the Universities of Houston and Liverpool, and especially our discussants Gurdip Bakshi, Mikhail Chernov, and Jeroen Dalderop for helpful comments. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by the Federal Reserve Board or other members of its staff. Co-author Hyung Joo Kim worked on this project prior to employment at the Federal Reserve Board, while a Ph.D. candidate at the University of Houston. Data sources were obtained under purview of University of Houston licenses.

1 Introduction

The pricing kernel is a critical concept in asset pricing, because it determines risk premia on all securities. One approach to studying the properties of the pricing kernel specifies its relation to aggregate consumption and estimates the resulting model using consumption data and returns on various assets.[1] Alternatively, building on the insights of Breeden and Litzenberger (1978), an extensive literature estimates the pricing kernel using index returns and index option prices.[2] Index options are interesting from an empirical perspective because they identify the pricing kernel under the assumption that the equity index level is equal to aggregate wealth. However, this literature has given rise to puzzling non-monotonic (U-shaped) estimates of the pricing kernel.[3]

This paper complements these two approaches. We specify economically intuitive pricing kernels in parametric option pricing models and estimate them using index returns and index option prices. Our approach differs from the existing parametric literature, which typically specifies prices of risk, and thereby defines the pricing kernel implicitly as the ratio of risk-neutral and physical probabilities.[4] Instead, we propose a class of economically motivated pricing kernels that extend the power kernels in Rubinstein (1976) and Brennan (1979) to be functions of the paths of latent variance $v(t)$ and the index level $S(t)$. These kernels also nest path-independent kernels (Ross, 2015) as special cases, and are consistent with the conventional assumption of affine dynamics under the physical and risk-neutral measure in the square root stochastic volatility model (Heston, 1993).[5] By including separate components for volatility and stock risk, these kernels allow us to examine distinct origins of the equity and variance risk premiums.

We start our analysis with the often-used specification of an equity (market) risk premium $\mu v(t)$ and a variance risk premium $\lambda v(t)$. We refer to this specification as “completely affine”, adopting the terminology in Singleton (2006, p. 392) and the term structure literature. We characterize a pricing kernel that is consistent with these risk premia, and derive the parameter restrictions consistent with the martingale conditions and absence of arbitrage. We then explore a specification with an equity risk premium $\mu_0 + \mu_1 v(t)$ and a variance risk premium $\lambda_0 + \lambda_1 v(t)$. We refer to this specification as “affine”. Singleton (2006) points out that this specification is problematic because it may violate no-arbitrage conditions, and notes that it would be interesting to characterize the parameter restrictions that prevent arbitrage for this specification. Because our kernel is formulated as a function of the state variables, it is straightforward to specify such restrictions.

Our empirical analysis uses a joint likelihood based on index returns and a rich option data set, using data for the January 1996 to June 2019 period.[6] We estimate this joint likelihood for the pricing kernels corresponding to the completely affine and affine prices of risk, and we also estimate it subject to various restrictions on parameter values and risk premia. Because the kernels are formulated as a function of the state variables, it is relatively straightforward to derive the implications of each kernel for the “marginal” kernel which specifies state price as a function of $S(t)$. The marginal kernel plays a central role in empirical applications such as the pricing kernel puzzle.

A first empirical result addresses the fit and empirical content of the kernels that support completely affine and affine prices of risk. Unsurprisingly, we find that the affine price of risk specification provides significantly better fit than the nested completely affine specification. However, the improved fit due to the intercepts in the affine specification comes at the cost of implausible Sharpe ratios and/or signs of the risk premiums. Moreover, while the marginal pricing kernel for the completely affine specification is well behaved and economically plausible, the kernel in the affine case implies state prices that are S-shaped as a function of wealth. When we impose additional parameter restrictions that preclude arbitrage, the marginal pricing kernel is well-behaved, but empirical fit worsens. We conclude that the affine specification corresponds to implausible economic assumptions, and that small and seemingly innocuous modifications to the price of risk specifications used in the literature correspond to different pricing kernels with radically different economic implications. Consequently, we advocate the use of the completely affine price of risk specification.

A second set of results compares the unrestricted kernel that supports the completely affine price of risk with restricted versions. We reject restrictions that the equity or variance risk premiums are zero, but we cannot reject the independence of the pricing kernel from either variance or market return shocks.[7] We are unable to statistically pin down the origins of these risk premiums because innovations to market returns and market variance are highly (negatively) correlated. The equity and variance risk premium each have two components, one due to variance aversion and another due to index level risk aversion. If we restrict one of the risk aversion parameters to be zero, the other parameter absorbs most of that effect. In other words, these risks largely span each other. However, restricting variance or index level risk aversion to zero implies radically different estimates of the equity and variance risk premia, as well as large differences in the state prices embodied in the marginal pricing kernel as a function of wealth. The realized time series path of the kernel without volatility risk is also substantially less variable compared to the unrestricted kernel, especially in crisis periods. A final observation is that while we cannot statistically reject the path-independence restriction used in Ross (2015), it implies an implausible estimate of the variance aversion parameter.

Our third finding sheds light on the pricing kernel puzzle (the finding that the marginal pricing kernel is U-shaped) in the existing literature. We show that U-shaped marginal kernels can result from an underlying pricing kernel that is a monotonic function of volatility and the stock price. Therefore, U-shaped pricing kernels are not anomalous nor do they constitute an asset pricing puzzle.

Our fourth finding addresses estimation of risk premiums. Following Breeden and Litzenberger’s (1978) insight that the risk-neutral density can be inferred from option prices, financial economists have emphasized fitting options and returns jointly to identify risk premia. We find that these data have low power to distinguish different pricing kernels, because identifying pricing kernels is equivalent to the estimation of conditional risk premia, and it is difficult to estimate average returns over short periods.[8] Different parameter restrictions lead to widely different Sharpe ratios and equity and variance risk premia, but do not translate into large decreases in the likelihood. Merton (1980) convincingly argues that very long time series of returns are required to obtain reliable estimates of the equity premium. Our findings extend Merton’s observation to joint estimation of equity and variance risk premia. We also reinforce Merton’s conclusion that economic restrictions increase power to identify market risk premia. While Merton (1980) advocates imposing a positivity restriction on the path of the conditional equity risk premium, we find that imposing a negativity restriction on the market variance risk premium leads to more plausible and reliable estimates. Our findings also confirm the results in Bakshi, Crosby, and Gao (2022) that some option model parameters are hard to identify because of (dark matter) unspanned risks that affect risk premiums.

Our paper is related to several other strands of literature besides those on the estimation of parametric option pricing models and the pricing kernel puzzle. Several studies use consumption-based models to analyze how preferences and pricing kernels impact index option prices.[9] Some of these studies use the recursive preferences of Kreps and Porteus (1978), Epstein and Zin (1989) and Duffie and Epstein (1992), which result in stochastic volatility of index returns. Our proposed pricing kernels are extensions of the power utility of Rubinstein (1976). While consumption is not a state variable in our setup, our approach provides a direct relation with existing empirical implementations of (reduced-form) parametric dynamic option pricing models. It is therefore straightforward to implement using option data, which allows us to explore the impact of stock index volatility on the pricing kernel.

From an empirical perspective, a related paper is Chernov (2003), who reverse engineers the pricing kernel based on options on various securities. Chernov (2003) also studies the time path of the realized pricing kernel to learn about state variables and the relation between the pricing kernel and economic conditions. Ghosh, Julliard, and Taylor (2017) also explore the relation between the pricing kernel and business cycle fluctuations, but do not use options to estimate the kernel. Brennan, Liu, and Xia (2006) specify and estimate pricing kernels with multiple state variables. Beason and Schreindorfer (2022) analyze the implications of option data for macro-finance models. Dew-Becker and Giglio (2022) study the implications of synthetic puts for the properties of the marginal pricing kernel.

The paper proceeds as follows. Section 2 discusses the data. Section 3 reviews the Heston (1993) stochastic volatility model and discusses our estimation approach based on returns and options data. Section 4 specifies the class of pricing kernels that connect the risk-neutral and physical dynamics. Section 5 presents the estimation results and Section 6 discusses their economic implications. Section 7 concludes.

[1] See, among others, Hansen and Singleton (1982), Mehra and Prescott (1985), Hansen and Jagannathan (1991), Campbell and Cochrane (1999), Bansal and Yaron (2004), Gabaix (2012) and Wachter (2013) for important contributions to this literature.

[2] See for instance Aït-Sahalia and Lo (1998), Aït-Sahalia and Lo (2000), Jackwerth and Rubinstein (1996), Jackwerth (2000) and Rosenberg and Engle (2002).

[3] For evidence on U-shaped pricing kernels, see for instance Jackwerth (2000), Aït-Sahalia and Lo (2000), Rosenberg and Engle (2002), Bakshi, Madan, and Panayotov (2010), Chabi-Yo (2012), Christoffersen, Heston, and Jacobs (2013), Song and Xiu (2016), and Cuesdeanu and Jackwerth (2018). Linn, Shive, and Shumway (2018) and Barone-Adesi, Fusari, Mira, and Sala (2020) on the other hand argue that the pricing kernel is well-behaved.

[4] For examples of this approach, see the seminal papers in this literature by Chernov and Ghysels (2000), Pan (2002), and Eraker (2004).

[5] For simplicity, we use the simplest possible option pricing model with a stochastic volatility factor, but our approach can be easily generalized to more complex models.

[6] Much of the modern option pricing literature jointly considers the time-series of observable returns and option prices. See, for instance, Pan (2002), Eraker (2004), Bates (2006), Aït-Sahalia and Kimmel (2007), Hurn, Lindsay, and McClelland (2015), and Andersen, Fusari, and Todorov (2017).

[7] Note that the hypothesis that the variance-aversion parameter equals zero amounts to the absence of an independent variance risk premium, which amounts to logarithmic utility in the Merton (1973) ICAPM.

[8] This statement is specific to plain vanilla option prices, which are sensitive to the probabilities at expiration but not very informative about the path-dependent properties of the pricing kernel. Pricing kernels with widely different economic implications can therefore produce similar values for European options.

[9] See, for instance, Garcia, Luger, and Renault (2003), Eraker and Shaliastovich (2008), Drechsler (2013), Shaliastovich (2015), Eraker and Yang (2019), and Seo and Wachter (2019). Liu, Pan, and Wang (2005) and Eraker and Wu (2017) use related models with the dividend payout rate and cash flow respectively as the state variable.

2 Data

Our empirical analysis uses out-of-the-money (OTM) S&P 500 call and put options with maturities between 14 and 365 days for the January 1996 to June 2019 period. We obtain the option data from OptionMetrics. We apply the following filters:

1. Discard options with implied volatility smaller than 5% or greater than 150%.
2. Discard options with volume or open interest less than ten contracts.
3. Discard options with mid price less than $0.50 or bid price less than $0.375 to avoid low-valued options.

4. Discard options with data errors, where the bid price exceeds the offer price, or a negative price is implied through put-call parity.
5. Discard options with moneyness < 0.75 or > 1.25.

Then we keep the six most actively traded strike prices for each available maturity.

It is important to use as long a time period as possible to identify key aspects of the model, including volatility persistence.[10] On the other hand, estimation using large option panels and long time series is very time-intensive. Rather than using a short time series of daily option data, we use an extended time period, but we select option contracts for one day per week only. Following several existing studies (see, e.g., Heston and Nandi, 2000; Christoffersen, Heston, and Jacobs, 2013), we use Wednesday data because it is the day of the week least likely to be a holiday. It is also less likely than other days to be affected by day-of-the-week effects. These steps result in a dataset with 62,483 option contracts. Table 1 presents descriptive statistics.

We obtain S&P 500 index returns from CRSP. We use data for the January 1990 to June 2019 period. This sample starts before the option sample to help with the identification of the return parameters under the physical measure, as in Christoffersen, Heston, and Jacobs (2013). We also use data on the VIX from January 1990 to June 2019, which we obtain from the Federal Reserve Bank of St. Louis Economic Database. The time series for the risk-free rate is proxied by the one-month Treasury Bill rate obtained from CRSP. Following existing work, options are valued using a maturity-specific risk-free rate. We apply a cubic spline interpolation to the data obtained from OptionMetrics.

[10] See, for instance, Broadie, Chernov, and Johannes (2007) for a discussion.

3 Return-Based and Option-Based Parameter Estimates

We estimate the stylized affine Heston (1993) stochastic volatility model. We obtain parameter estimates for this model under the physical measure, exclusively based on returns, and under the risk-neutral measure, exclusively based on options. Then we compare the resulting estimates.

3.1 The Model

We focus on the simplest possible stochastic volatility model with a single diffusive volatility factor. We recognize that the existing literature has clearly established that additional volatility factors, jumps in returns and variance and/or tail factors are required to improve option fit and pricing performance. However, we deliberately focus on the simplest possible model because it suffices to illustrate our main argument and we want to avoid comparisons between models and factors. Our analysis can be repeated using more general models, but at the cost of much greater complexity. We believe that most of the issues we highlight here using a simple model are even more relevant in more complex models, but we leave this analysis for future work.

We employ the Heston (1993) continuous-time stochastic square root volatility model to specify stock price dynamics as well as option prices. For option valuation, the risk-neutral stock price dynamic is sufficient. The square root stochastic volatility model specifies the risk-neutral dynamics of the spot index $S(t)$ and its stochastic variance $v(t)$ as follows:

$$
\begin{aligned}
dS(t)/S(t) &= r\,dt + \sqrt{v(t)}\,dz_1^*(t), \\
dv(t) &= \kappa^*(\theta^* - v(t))\,dt + \sigma\sqrt{v(t)}\,dz_2^*(t),
\end{aligned} \tag{1}
$$

where $dz_1^*$ and $dz_2^*$ are Wiener processes with correlation coefficient $\rho$. The risk-free rate $r$ can be either constant or time-varying; this has negligible implications for our results. It is also straightforward to specify a stochastic model for the risk-free rate, but it is well-known from the existing literature that this does not have a major impact on option valuation (Bakshi, Cao, and Chen, 1997). We therefore deliberately focus on the simplest possible model.

Consistent with most of the existing literature, we focus on a physical dynamic that has the same functional form as the risk-neutral dynamic:

$$
\begin{aligned}
dS(t)/S(t) &= [r + \mu(v(t))]\,dt + \sqrt{v(t)}\,dz_1(t), \\
dv(t) &= \kappa(\theta - v(t))\,dt + \sigma\sqrt{v(t)}\,dz_2(t),
\end{aligned} \tag{2}
$$

where $\mu(v(t))$ denotes the equity premium as a function of $v(t)$, and $dz_1$ and $dz_2$ are Wiener processes under the physical measure. Note that $\sigma$, the variance of variance parameter, and $\rho$, the correlation between $z_1$ and $z_2$, are assumed to be identical to the corresponding parameters in the risk-neutral dynamics. However, the long-run physical variance $\theta$ and mean reversion $\kappa$ differ from the long-run risk-neutral variance $\theta^*$ and mean reversion $\kappa^*$. This specification is consistent with the existing literature. It represents the most general combination of physical and risk-neutral dynamics that are consistent with the affine specification and Girsanov’s theorem. We analyze this mapping in more detail below in our discussion of the pricing kernels.

3.2 The Instantaneous Stochastic Variance and the VIX

In the Heston (1993) model, as well as in its many generalizations studied in the literature, the stochastic variance is latent (unobserved). This latency is typically addressed in estimation by using filtering or simulation-based techniques (see, e.g., Eraker, Johannes, and Polson, 2003; Eraker, 2004; Bates, 2006; Christoffersen, Jacobs, and Mimouni, 2010). It is well-known that the implementation of such techniques is computationally very demanding, especially when using long time series and large cross-sections of option prices in estimation.

To alleviate this computational burden, we follow a different approach.[11] We use the fact that the stochastic variance $v(t)$ can be represented as a linear function of $\mathrm{VIX}^2(t)$. This directly follows from the model specification: when $v(t)$ follows a CIR process, $\mathrm{VIX}^2(t)$ is a linear function of $v(t)$. Specifically, the model-implied $\mathrm{VIX}^2(t)$ is given by:

$$
\mathrm{VIX}^2(t) = \frac{1}{\Delta_{1m}} E_t^*\left[\int_t^{t+\Delta_{1m}} v(u)\,du\right] = \theta^* + \frac{e^{-\kappa^*\Delta_{1m}} - 1}{-\kappa^*\Delta_{1m}}\,(v(t) - \theta^*), \tag{3}
$$

where $\Delta_{1m} \approx 30/365$. Rearranging equation (3) yields

$$
v(t) = \frac{\mathrm{VIX}^2(t) - \theta^*(1 - w)}{w}, \tag{4}
$$

where $w = (1 - \exp(-\kappa^*\Delta_{1m}))/(\kappa^*\Delta_{1m})$. In implementation, we can add a measurement error because equations (3) and (4) use the model-implied $\mathrm{VIX}^2(t)$. Equation (3) in conjunction with the measurement error yields a measurement equation which can be used to filter the latent state variable. Jones (2003), Cheung (2008), and Chernov, Graveline, and Zviadadze (2018) use this measurement equation and a Bayesian framework with Markov chain Monte Carlo methods to estimate option pricing models.

We further simplify the setup: we do not use the measurement equation, but relax the restrictions on the coefficients in equation (4) and omit the measurement error. Specifically, we assume:

$$
v(t) = \eta_0 + \eta_1\,\mathrm{VIX}^2(t). \tag{5}
$$

We then use equation (5) in the valuation formula for all options in the sample. As a result, options are a function not only of the stochastic $v(t)$, but also of the observable VIX. This implementation follows Aït-Sahalia and Kimmel (2007), who use it in a sample which contains a single short-maturity at-the-money option at each time $t$.

We next discuss the details of this estimation approach when using returns and when using options. Our use of the VIX as a proxy for the stochastic variance has implications for both estimation exercises.

[11] See Bates (2000) and Andersen, Fusari, and Todorov (2015) for alternative approaches.

3.3 Return-Based Estimation

The main purpose of the assumption that the stochastic variance is an affine function of $\mathrm{VIX}^2$ is to alleviate the computational burden when estimating the model using option data. However, this assumption also has implications for the return-based estimation. Since we observe the total return of the stock index and VIX at each time $t$, we can formulate the joint likelihood function of the return and $\mathrm{VIX}^2$ to estimate the physical parameters. In most existing estimations, the variance is instead filtered from the underlying returns, and the VIX is not used in estimation.

To characterize the likelihood function, we first apply Ito’s lemma and the Euler discretization to equation (2), which results in:

$$
\begin{aligned}
\log R(t+\Delta) &= \left[r + \mu(v(t)) - \tfrac{1}{2}v(t)\right]\Delta + \epsilon_R(t+\Delta), \\
v(t+\Delta) - v(t) &= \kappa(\theta - v(t))\Delta + \epsilon_v(t+\Delta),
\end{aligned} \tag{6}
$$

where $R(t+\Delta) = S(t+\Delta)/S(t)$ represents the gross return and $\Delta = 1/252$.[12] The errors $\epsilon(t+\Delta) = (\epsilon_R(t+\Delta), \epsilon_v(t+\Delta))'$ follow a joint normal distribution, and their mean and variance-covariance matrix are given by

$$
\mathbf{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
\Sigma(t) = \begin{pmatrix} v(t) & \sigma\rho\,v(t) \\ \sigma\rho\,v(t) & \sigma^2 v(t) \end{pmatrix}\Delta.
$$

The joint log-likelihood function is given by:

$$
\begin{aligned}
\log L^R &= \sum_{t=1}^{T-1} \log f\left(\log R(t+\Delta),\, \mathrm{VIX}^2(t+\Delta) \mid \mathrm{VIX}^2(t)\right) \\
&= \sum_{t=1}^{T-1} \log\left[ f\left(\log R(t+\Delta),\, v(t+\Delta) \mid v(t)\right) \times J(t+\Delta) \right] \\
&= \sum_{t=1}^{T-1} \left[ -\log(2\pi) - \tfrac{1}{2}\log|\Sigma(t)| - \tfrac{1}{2}\,\epsilon(t+\Delta)'\,\Sigma^{-1}(t)\,\epsilon(t+\Delta) + \log\eta_1 \right],
\end{aligned}
$$

where $f(\log R(t+\Delta), v(t+\Delta) \mid v(t))$ is the conditional density of the discretized $\log R(t+\Delta)$ and $v(t+\Delta)$, $J(t+\Delta)$ is the Jacobian between $\mathrm{VIX}^2(t+\Delta)$ and $v(t+\Delta)$, which is given by $\eta_1$ from equation (5), and $t$ represents time measured in days. Let $\Theta = \{\mu, \kappa, \theta, \sigma, \rho, \eta_0, \eta_1\}$ be the set of physical parameters. To estimate $\Theta$, we solve the following optimization problem:

$$
\max_{\Theta}\; \log L^R. \tag{7}
$$

[12] Note that $\log R(t+\Delta)$ is the daily log return between $t$ and $t+\Delta$ while $v(t)$ is the annualized variance at time $t$.
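The return-based likelihood can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimation code behind Table 2: the function and argument names are hypothetical, the equity premium is taken to be $\mu(v(t)) = \mu v(t)$, $\eta_1$ is assumed positive, and $\mathrm{VIX}^2$ is assumed to be supplied in annualized variance units. The first helper inverts equation (3) as in equation (4); the second evaluates the Gaussian log-likelihood implied by equations (5) and (6).

```python
import math


def variance_from_vix2(vix2, kappa_q, theta_q, dt_1m=30.0 / 365.0):
    """Equation (4): recover v(t) from the model-implied VIX^2(t)."""
    w = (1.0 - math.exp(-kappa_q * dt_1m)) / (kappa_q * dt_1m)
    return (vix2 - theta_q * (1.0 - w)) / w


def return_vix_loglik(log_returns, vix2, r, mu, kappa, theta, sigma, rho,
                      eta0, eta1, dt=1.0 / 252.0):
    """Joint log-likelihood of log returns and VIX^2 (Section 3.3).

    log_returns[t] is the log return over (t, t + dt]; vix2 holds one more
    observation than log_returns. Assumes mu(v) = mu * v and eta1 > 0.
    """
    v = [eta0 + eta1 * x for x in vix2]            # equation (5)
    total = 0.0
    for t, log_r in enumerate(log_returns):
        vt = v[t]
        # Residuals of the Euler-discretized dynamics, equation (6)
        e_r = log_r - (r + mu * vt - 0.5 * vt) * dt
        e_v = v[t + 1] - vt - kappa * (theta - vt) * dt
        # Entries of the 2x2 covariance matrix Sigma(t)
        s_rr = vt * dt
        s_rv = sigma * rho * vt * dt
        s_vv = sigma ** 2 * vt * dt
        det = s_rr * s_vv - s_rv ** 2
        # Quadratic form e' Sigma^{-1} e for a 2x2 matrix
        quad = (s_vv * e_r ** 2 - 2.0 * s_rv * e_r * e_v + s_rr * e_v ** 2) / det
        # Bivariate normal log-density plus the log Jacobian log(eta1)
        total += (-math.log(2.0 * math.pi) - 0.5 * math.log(det)
                  - 0.5 * quad + math.log(eta1))
    return total
```

In practice the returned value would be passed (with its sign flipped) to a numerical optimizer over $\Theta$; the choice of optimizer is left open here.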

3.4 Option-Based Estimation

The risk-neutral parameters for the dynamic in equation (1) can be estimated in various ways, but each implementation requires an option valuation technique. We follow the fast Fourier implementation of Carr and Madan (1999). The price of a call option with strike price $K$ and maturity $\tau$ is expressed by a quasi closed form up to a numerical integration, and it is given by

$$
C(S(t), v(t), t) = \frac{e^{-\alpha k}}{\pi} \int_0^{\infty} \mathrm{Re}\left[e^{-iuk}\,\psi(u)\right] du, \tag{8}
$$

where $k$ is the natural log of $K$. The function $\psi(u)$ is the Fourier transform of a modified call price, which is the call price multiplied by $e^{\alpha k}$ for $\alpha > 0$. We found that $\alpha = 4$ works well. The function $\psi(u)$ is calculated as follows:

$$
\psi(u) = \frac{e^{-r\tau}\, f_\tau^{CH}(u - i(\alpha+1) \mid S(t), v(t))}{(\alpha + iu)(\alpha + 1 + iu)},
$$

where $i$ is the imaginary unit, and $f_\tau^{CH}(\phi \mid S(t), v(t)) = E_t^*\left[e^{i\phi \log S(t+\tau)}\right]$ is the risk-neutral conditional characteristic function of $\log S(t+\tau)$. The closed-form expression of $f_\tau^{CH}(\phi \mid S(t), v(t))$ follows Heston (1993).[13] The price of a put option with the same strike price and maturity can be obtained through put-call parity.

Note that the option pricing formula in equation (8) does not account for dividends. We follow the existing literature and use a future-dividend-adjusted index price. Specifically, we use $S(t)e^{-q\tau}$, where $q$ is the dividend yield at time $t$.

[13] When $\log S(t)$ and $v(t)$ are characterized by

$$
\begin{aligned}
d\log S(t) &= [r + u\,v(t)]\,dt + \sqrt{v(t)}\,dz_1(t), \\
dv(t) &= (a - b\,v(t))\,dt + \sigma\sqrt{v(t)}\,dz_2(t),
\end{aligned}
$$

the characteristic function solution is given by

$$
f_\tau^{CH}(\phi \mid S(t), v(t)) = e^{C + D\,v(t) + i\phi \log S(t)}, \tag{9}
$$

where

$$
C = r\phi i\tau + \frac{a}{\sigma^2}\left\{(b - \rho\sigma\phi i + d)\tau - 2\log\left[\frac{1 - g e^{d\tau}}{1 - g}\right]\right\}, \qquad
D = \frac{b - \rho\sigma\phi i + d}{\sigma^2}\left[\frac{1 - e^{d\tau}}{1 - g e^{d\tau}}\right], \tag{10}
$$

$$
g = \frac{b - \rho\sigma\phi i + d}{b - \rho\sigma\phi i - d}, \qquad
d = \sqrt{(\rho\sigma\phi i - b)^2 - \sigma^2(2u\phi i - \phi^2)}.
$$
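To make equations (8) to (10) concrete, the following Python sketch prices a call under the risk-neutral dynamics (1). It is an illustration under several assumptions that are not taken from the text: the function names are hypothetical, the integral in equation (8) is approximated with a plain trapezoidal rule on a truncated grid rather than a fast Fourier transform, dividends are ignored, and the characteristic function is written in an algebraically equivalent form in terms of $e^{-d\tau}$ (obtained by dividing footnote 13's expressions through by $e^{d\tau}$), which tends to be easier to evaluate numerically. The risk-neutral case corresponds to $u = -1/2$, $a = \kappa^*\theta^*$ and $b = \kappa^*$.

```python
import cmath
import math


def heston_cf(phi, log_s, v, tau, r, kappa_q, theta_q, sigma, rho):
    """Risk-neutral characteristic function E*[exp(i*phi*log S(t+tau))].

    Footnote 13 with u = -1/2, a = kappa_q*theta_q, b = kappa_q, rewritten
    with exp(-d*tau) (an assumption of this sketch, not the printed form).
    """
    u = -0.5
    a = kappa_q * theta_q
    i_phi = 1j * phi
    beta = kappa_q - rho * sigma * i_phi
    d = cmath.sqrt(beta * beta - sigma ** 2 * (2.0 * u * i_phi - phi * phi))
    g = (beta - d) / (beta + d)        # reciprocal of g in footnote 13
    e = cmath.exp(-d * tau)
    c_term = r * i_phi * tau + a / sigma ** 2 * (
        (beta - d) * tau - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
    d_term = (beta - d) / sigma ** 2 * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(c_term + d_term * v + i_phi * log_s)


def heston_call(s, strike, v, tau, r, kappa_q, theta_q, sigma, rho,
                alpha=4.0, u_max=200.0, n_steps=4000):
    """Equation (8), integrated with the trapezoidal rule on [0, u_max]."""
    k = math.log(strike)
    log_s = math.log(s)
    h = u_max / n_steps
    total = 0.0
    for j in range(n_steps + 1):
        u = j * h
        cf = heston_cf(u - 1j * (alpha + 1.0), log_s, v, tau, r,
                       kappa_q, theta_q, sigma, rho)
        psi = math.exp(-r * tau) * cf / ((alpha + 1j * u) * (alpha + 1.0 + 1j * u))
        weight = 0.5 if j in (0, n_steps) else 1.0   # trapezoid end points
        total += weight * (cmath.exp(-1j * u * k) * psi).real
    return math.exp(-alpha * k) / math.pi * total * h
```

A useful sanity check on any implementation of the characteristic function is the martingale property: evaluating it at $\phi = -i$ should return the forward price $S(t)e^{r\tau}$. The truncation point and grid size above are illustrative and would need to be checked for the parameter values and maturities at hand.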

We use vega-weighted option pricing errors. Let $O_i^{Mkt}$ and $O_i^{Mod}$ denote the market and model prices of the $i$th option, respectively. Both $O_i^{Mkt}$ and $O_i^{Mod}$ represent call option prices if $F/K < 1$ and put option prices if $F/K > 1$. Define the vega-weighted option pricing errors as

$$
\epsilon_{o,i} = \frac{O_i^{Mkt} - O_i^{Mod}}{\nu_i^{Mkt}},
$$

where $\nu_i^{Mkt}$ is the Black-Scholes vega of option $i$.[14]

Maximum likelihood estimation requires a distributional assumption. Following most of the existing literature, we assume that $\epsilon_{o,i}$ follows a normal distribution, i.e. $\epsilon_{o,i} \sim N(0, s_o^2)$, where $s_o^2$ is the sample variance of the errors. Option valuation errors are assumed to be independent and identically distributed (i.i.d.).

The set of risk-neutral parameters to be estimated is denoted by $\Theta^* = \{\kappa^*, \theta^*, \sigma, \rho, \eta_0, \eta_1\}$. Let $N$ be the total number of option observations. $\Theta^*$ is then estimated by solving the following optimization problem:

$$
\max_{\Theta^*}\; \log L^O,
$$

where the option log-likelihood function, $\log L^O$, is given by:

$$
\log L^O = -\frac{N}{2}\log(2\pi) - \frac{N}{2}\log s_o^2 - \frac{1}{2 s_o^2}\sum_{i=1}^{N} \epsilon_{o,i}^2.
$$

[14] These errors are often used because option prices are very different but implied volatilities are in a more narrow range. See for example Carr and Wu (2007), Trolle and Schwartz (2009), and Christoffersen, Heston, and Jacobs (2013).

3.5 Parameter Estimates

Table 2 presents the estimation results. Panel A presents the (physical) parameters estimated from returns, and Panel B presents the (risk-neutral) parameters estimated from options. The physical parameter estimates are based on the stock index return and VIX data. The risk-neutral parameters are estimated exclusively based on options data.

Both physical and risk-neutral parameter estimates are economically plausible. Consistent with findings in the existing literature, $\kappa$ is much larger than $\kappa^*$, while $\theta$ is much smaller than $\theta^*$. The risk-neutral long-run variance exceeds the physical long-run variance, and risk-neutral persistence exceeds physical persistence. The option-based kurtosis parameter $\sigma$ is larger than the return-based estimate of $\sigma$, and the option-based skewness parameter $\rho$ is more negative than the return-based $\rho$. The distribution implied by the option data is thus more fat-tailed and skewed than the physical distribution. The finding on $\sigma$ is mostly consistent with the existing literature. Bakshi, Cao, and Chen (1997), Eraker (2004), and Christoffersen, Jacobs, and Mimouni (2010) also obtain higher estimates of $\sigma$ when estimating on options. Existing findings on $\rho$ are mixed. Typically the estimates from returns are not very different from the option-based estimates.

The estimate of $\mu$ in Table 2 implies an average yearly equity premium $\mu v(t)$ of 8.19% for the January 1990 to June 2019 sample period, close to the sample average of 8.24%. It is not possible to infer the path of the model-implied variance risk premium using the estimated parameters because the physical and risk-neutral estimations do not constrain the parameter estimates of $\sigma$, $\rho$, $\eta_0$, and $\eta_1$ to be the same. However, we can use the parameters $\theta$ and $\theta^*$ to compare the long-run means of the physical and risk-neutral stochastic variances. Taking the square root, we find that the model-implied long-run expectation of the stock index physical (risk-neutral) yearly volatility is 17.6% (31.4%). Figure 1 shows the time path of the option-based variance, as well as the time path of the difference between the return-based and option-based variance.

4 A Taxonomy of Pricing Kernels

We characterize various pricing kernels implied by the physical and risk-neutral dynamics in equations (1) and (2). Specifically, we analyze pricing kernels consistent with completely affine and affine equity and variance risk premiums. We discuss the differences between these kernels and the economic implications of these differences. We highlight several special cases, including the power utility kernel of Rubinstein (1976) and the path-independent kernel of Ross (2015).

4.1 The Pricing Kernel by Girsanov’s Theorem

The existing literature on the Heston (1993) square root model typically restricts the physical and risk-neutral dynamics to have the same functional form, as in equations (1) and (2), but is not always explicit about the relation between the two dynamics, and by extension about the implied risk premiums and pricing kernels. We discuss two (nested) assumptions on the risk premiums that allow the physical and risk-neutral dynamics to have the same functional form.

4.1.1 Completely Affine Prices of Risk

Heston (1993) restricts the variance risk premium to be proportional to the variance, i.e., equal to $\lambda v(t)$. Because Heston (1993) does not perform (joint) estimation of the model parameters, he does not explicitly address the structure of the equity premium. However, several seminal estimation exercises (see, e.g., Bates, 2000; Pan, 2002; Aït-Sahalia and Kimmel, 2007) impose a similar proportional structure on the equity premium, i.e., equal to $\mu v(t)$. We therefore start our analysis with this specification of the equity and variance risk premium. Henceforth, we refer to this specification, with equity and variance risk premiums that are proportional to the stochastic variance, as completely affine risk premiums. Note that in this case, the relation between the risk-neutral and physical parameters in equations (1) and (2) is given by $\kappa = \kappa^* - \lambda$ and $\theta = \kappa^*\theta^*/\kappa$.

Using Girsanov’s Theorem, Online Appendix A shows that the following pricing kernel is consistent with the risk-neutral dynamic (1), the physical dynamic (2) and completely affine equity and variance risk premiums:
$$M(t) = M(0)\exp\left(-rt - \frac{1}{2}\int_0^t \mathrm{Var}\left[\frac{dM(s)}{M(s)}\right] - \int_0^t \pi_1\sqrt{v(s)}\,dz_1(s) - \int_0^t \pi_2\sqrt{v(s)}\,dz_2(s)\right), \tag{11}$$

where $\mathrm{Var}\left[\frac{dM(s)}{M(s)}\right] = \left(\pi_1^2 + \pi_2^2 + 2\rho\pi_1\pi_2\right)v(s)\,ds$, and $\pi_1$ and $\pi_2$ are price of risk parameters, which are related to the risk premium parameters as follows:

$$\mu = \pi_1 + \rho\pi_2 \quad\text{and}\quad \lambda = \rho\sigma\pi_1 + \sigma\pi_2. \tag{12}$$
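The mapping in equation (12), together with the relations κ = κ∗ − λ and θ = κ∗θ∗/κ, can be sketched in a few lines of Python. This is only an illustration: the function names and all numerical inputs below are hypothetical and are not taken from the paper's tables.

```python
def premia_from_prices_of_risk(pi1, pi2, rho, sigma):
    """Equation (12): slopes of the completely affine premiums mu*v(t), lam*v(t)."""
    mu = pi1 + rho * pi2
    lam = rho * sigma * pi1 + sigma * pi2
    return mu, lam


def physical_from_risk_neutral(kappa_star, theta_star, lam):
    """kappa = kappa* - lam and theta = kappa* theta* / kappa (completely affine case)."""
    kappa = kappa_star - lam
    theta = kappa_star * theta_star / kappa
    return kappa, theta


# Hypothetical inputs, chosen only to exercise the formulas.
mu, lam = premia_from_prices_of_risk(pi1=1.0, pi2=-2.0, rho=-0.8, sigma=0.5)
kappa, theta = physical_from_risk_neutral(kappa_star=1.0, theta_star=0.09, lam=lam)
print(mu, lam, kappa, theta)
```

With a negative λ, the sketch returns κ > κ∗ and θ < θ∗, which is the ordering discussed in the text.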

4.1.2 Affine Prices of Risk

We now consider a generalization of the completely affine equity and variance risk premiums in Section 4.1.1. Specifically, we consider kernels that are consistent with equity and variance risk premiums that are linear (affine) in the stochastic variance but also contain some nonzero intercepts; that is, $\mu_0 + \mu_1 v(t)$ and/or $\lambda_0 + \lambda_1 v(t)$. In this case, the relation between the risk-neutral and physical parameters is given by $\kappa = \kappa^* - \lambda_1$ and $\theta = (\kappa^*\theta^* + \lambda_0)/\kappa$. See Chernov and Ghysels (2000) and Eraker (2004) for examples.

We can once again use Girsanov’s Theorem to characterize the pricing kernel that is consistent with these risk premiums, the risk-neutral dynamics (1) and the physical dynamics (2). This gives:

$$M(t) = M(0)\exp\left(-rt - \frac{1}{2}\int_0^t \mathrm{Var}\left[\frac{dM(s)}{M(s)}\right] - \int_0^t \left(\frac{\pi_{1,0}}{\sqrt{v(s)}} + \pi_{1,1}\sqrt{v(s)}\right)dz_1(s) - \int_0^t \left(\frac{\pi_{2,0}}{\sqrt{v(s)}} + \pi_{2,1}\sqrt{v(s)}\right)dz_2(s)\right), \tag{13}$$

where the parameters that characterize the affine risk premiums can once again be expressed in terms of these price of risk parameters, as follows:

$$\mu_0 = \pi_{1,0} + \rho\pi_{2,0}, \quad \mu_1 = \pi_{1,1} + \rho\pi_{2,1}, \quad \lambda_0 = \rho\sigma\pi_{1,0} + \sigma\pi_{2,0}, \quad \lambda_1 = \rho\sigma\pi_{1,1} + \sigma\pi_{2,1}, \tag{14}$$

and

$$\mathrm{Var}\left[\frac{dM(s)}{M(s)}\right] = \left[\left(\pi_{1,0}^2 + \pi_{2,0}^2 + 2\rho\pi_{1,0}\pi_{2,0}\right)\frac{1}{v(s)} + \left(\pi_{1,1}^2 + \pi_{2,1}^2 + 2\rho\pi_{1,1}\pi_{2,1}\right)v(s) + 2\left(\pi_{1,0}\pi_{1,1} + \pi_{2,0}\pi_{2,1} + \rho\pi_{1,0}\pi_{2,1} + \rho\pi_{2,0}\pi_{1,1}\right)\right]ds. \tag{15}$$

One critical difference between the pricing kernels in equations (11) and (13) is that (13) may allow for arbitrage when the variance reaches zero (Singleton, 2006). This can be addressed by imposing the Feller (1951) condition(s). We discuss this in more detail below.
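As a rough numerical check of equations (14) and (15), the sketch below computes the affine premium parameters and evaluates the kernel variance rate two ways: from the expanded expression in (15) and directly from the loadings on the two shocks in (13). The function names and the price of risk values are hypothetical. The 1/v(s) term grows without bound as the variance approaches zero, which is the feature behind the arbitrage concern mentioned above.

```python
import math


def affine_premia(pi10, pi11, pi20, pi21, rho, sigma):
    """Equation (14): intercepts and slopes of the affine risk premiums."""
    mu0 = pi10 + rho * pi20
    mu1 = pi11 + rho * pi21
    lam0 = rho * sigma * pi10 + sigma * pi20
    lam1 = rho * sigma * pi11 + sigma * pi21
    return mu0, mu1, lam0, lam1


def kernel_variance_rate(v, pi10, pi11, pi20, pi21, rho):
    """Equation (15) per unit of time, at variance level v."""
    inverse_term = (pi10**2 + pi20**2 + 2 * rho * pi10 * pi20) / v
    linear_term = (pi11**2 + pi21**2 + 2 * rho * pi11 * pi21) * v
    constant = 2 * (pi10 * pi11 + pi20 * pi21
                    + rho * pi10 * pi21 + rho * pi20 * pi11)
    return inverse_term + linear_term + constant


def kernel_variance_rate_direct(v, pi10, pi11, pi20, pi21, rho):
    """Same quantity from the shock loadings in (13): a1^2 + a2^2 + 2*rho*a1*a2."""
    a1 = pi10 / math.sqrt(v) + pi11 * math.sqrt(v)
    a2 = pi20 / math.sqrt(v) + pi21 * math.sqrt(v)
    return a1**2 + a2**2 + 2 * rho * a1 * a2


# Hypothetical prices of risk; not estimates from the paper.
params = dict(pi10=0.05, pi11=1.0, pi20=-0.02, pi21=-2.0, rho=-0.8)
levels = (0.04, 0.0004, 0.000004)
rates = {v: kernel_variance_rate(v, **params) for v in levels}
for v in levels:
    print(v, rates[v], kernel_variance_rate_direct(v, **params))
```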

4.2 A General Pricing Kernel

The pricing kernels in equations (11) and (13) contain complicated, opaque stochastic integrals that lack a clear economic interpretation. The purpose of this section is to simplify and interpret these kernels using (generalizations of) economically motivated pricing kernels from the finance literature. These include the seminal power utility kernel of Rubinstein (1976) and Brennan (1979) and the transition-independent kernel of Ross (2015).

We use the principles of monotonicity and parsimony to discuss a taxonomy of economic pricing kernels, where our exposition progresses from relatively complex kernels with more parameters to simpler monotonic kernels with fewer parameters. Monotonicity is based on the straightforward economic rationale that we expect marginal preferences to be decreasing in equity returns and increasing in variance. Parsimony appeals to Occam’s razor as a pragmatic tool. Complicated models suffer from a “curse of dimensionality”, and are often statistically indistinguishable from simpler models. Therefore, we prefer to condense complicated models to simpler ones with fewer parameters.

All kernels we consider are formulated in terms of the state variables S(t) and v(t). The most general pricing kernel we consider is given by:

$$M(t) = M(0)\left(\frac{S(t)}{S(0)}\right)^{-\gamma}\left(\frac{v(t)}{v(0)}\right)^{\alpha}\exp\left(\beta t + \int_0^t \eta(v(s))\,ds + \xi\,(v(t) - v(0))\right). \tag{16}$$

Note that this is not the most general possible kernel with state variables S(t) and v(t). We start our analysis with this kernel because it provides insight into the affine equity and variance risk premium specification, or equivalently into the pricing kernel in equation (13). Specifically, Online Appendix B shows that the mapping between the preference parameters in equation (16) and the reduced-form parameters characterizing the affine risk premiums $\mu_0 + \mu_1 v(t)$ and $\lambda_0 + \lambda_1 v(t)$ is as follows:

$$\mu_0 = -\alpha\sigma\rho, \quad \mu_1 = \gamma - \rho\sigma\xi, \quad \lambda_0 = -\alpha\sigma^2, \quad \lambda_1 = \gamma\rho\sigma - \xi\sigma^2. \tag{17}$$

A brief discussion of the parameter count in these kernels is in order. The affine risk premium specification contains four parameters ($\mu_0, \mu_1, \lambda_0, \lambda_1$). The kernel (16) characterizes these four parameters in terms of the parameters in the physical dynamic and the three preference parameters γ, α and ξ. We therefore refer to (16) as a three-parameter kernel.

One may argue that the kernel (16) contains two additional preference parameters, β and η. But using the martingale restriction on the kernel itself, Online Appendix B shows that the time-preference parameter β and the path-dependence function η(v(t)) are given by:

$$\beta = -(1-\gamma)r + \gamma\mu_0 - \xi\kappa\theta + \alpha\kappa + \gamma\alpha\sigma\rho - \xi\alpha\sigma^2, \tag{18}$$

$$\eta(v(t)) = \left[\left(\mu_1 - \frac{1}{2}\right)\gamma + \xi\kappa - \frac{1}{2}\left(\gamma^2 - 2\gamma\xi\sigma\rho + \xi^2\sigma^2\right)\right]v(t) - \left[\kappa\theta - \frac{1}{2}\sigma^2 + \frac{1}{2}\alpha\sigma^2\right]\frac{\alpha}{v(t)}. \tag{19}$$

Restrictions (18)-(19) are required to match the risk-free rate, which is an affine function of the variance. Restriction (18) constrains the intercept in this affine function by constraining the β parameter, and restriction (19) constrains the slope. Equation (19) stipulates that given the physical parameters and the three preference parameters γ, α and ξ, there is a (time-varying) path-dependence parameter η(v(t)) that imposes no-arbitrage. The implications of (not) imposing the restrictions (18) and (19) are outside the scope of our study.15 We merely want to point out that for the purpose of the mapping between the kernel (16) and the affine risk premiums, the kernel effectively uses three preference parameters to characterize the four parameters $\mu_0$, $\mu_1$, $\lambda_0$, and $\lambda_1$.

From an economic perspective, we note two important features of the kernel (16). First, while it contains an integral that depends on the history of the variance, it does not depend on the history of the stock price. This represents a substantial simplification from the general stochastic integrals in (11) and (13). Second, the kernel can be seen as extending the power utility of Rubinstein (1976) and Brennan (1979) to include the pricing of variance risk.
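The parameter count can be made concrete with a short sketch of the mapping in equation (17) and the martingale restrictions (18) and (19). The function names and the numerical inputs are hypothetical; the code simply transcribes the formulas as stated above and shows that three preference parameters (γ, α, ξ) generate the four reduced-form premium parameters, with β and η(v) then implied.

```python
def affine_premia_from_preferences(gamma, alpha, xi, rho, sigma):
    """Equation (17): (mu0, mu1, lam0, lam1) from (gamma, alpha, xi)."""
    mu0 = -alpha * sigma * rho
    mu1 = gamma - rho * sigma * xi
    lam0 = -alpha * sigma**2
    lam1 = gamma * rho * sigma - xi * sigma**2
    return mu0, mu1, lam0, lam1


def beta_restriction(gamma, alpha, xi, r, kappa, theta, rho, sigma):
    """Equation (18): time-preference parameter implied by the martingale condition."""
    mu0 = -alpha * sigma * rho
    return (-(1.0 - gamma) * r + gamma * mu0 - xi * kappa * theta
            + alpha * kappa + gamma * alpha * sigma * rho - xi * alpha * sigma**2)


def eta_restriction(v, gamma, alpha, xi, kappa, theta, rho, sigma):
    """Equation (19): path-dependence function evaluated at variance level v."""
    mu1 = gamma - rho * sigma * xi
    slope = ((mu1 - 0.5) * gamma + xi * kappa
             - 0.5 * (gamma**2 - 2.0 * gamma * xi * sigma * rho + xi**2 * sigma**2))
    inverse = (kappa * theta - 0.5 * sigma**2 + 0.5 * alpha * sigma**2) * alpha
    return slope * v - inverse / v


# Hypothetical parameter values, for illustration only.
phys = dict(kappa=3.0, theta=0.04, rho=-0.8, sigma=0.5)
general = affine_premia_from_preferences(1.5, 0.2, 2.0, phys["rho"], phys["sigma"])
nested = affine_premia_from_preferences(1.5, 0.0, 2.0, phys["rho"], phys["sigma"])
print(general)
print(nested)  # alpha = 0 removes both intercepts
print(beta_restriction(1.5, 0.2, 2.0, r=0.02, **phys))
print(eta_restriction(0.04, 1.5, 0.2, 2.0, **phys))
```

Setting α = 0 in this sketch switches off both intercepts, so the general kernel collapses to a specification with premiums proportional to the variance.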
However, the kernel contains two parameters (α and ξ) that govern the pricing of variance, and this is a potential source of problems. Any economically plausible pricing kernel should produce a monotonic variance premium. Because α affects the intercept of the variance premium and ξ affects the slope (see equation (17)), the pricing kernel can be a nonmonotonic function of variance if these parameters have opposite signs. In this case, investors sometimes like higher variance, while at other times they prefer lower variance. This will likely have strange implications for the cross-section of option returns.

15 The restrictions (18) and (19) should in principle be imposed in empirical work in addition to the affine risk premiums $\mu_0 + \mu_1 v(t)$ and $\lambda_0 + \lambda_1 v(t)$, but usually these restrictions are ignored. This amounts to stating that one has no prior on the magnitude of the path-dependence parameter, and that one is willing to accept the risk-free rate implied by the values of $\mu_0$, $\mu_1$, $\lambda_0$, and $\lambda_1$, even if the resulting risk-free rates are economically implausible.

4.3 A Monotonic Kernel

To prevent these implausible nonmonotonicities, we can restrict α in equation (16) to be zero, which gives the following simpler two-parameter specification:

$$M(t) = M(0)\left(\frac{S(t)}{S(0)}\right)^{-\gamma}\exp\left(\beta t + \eta\int_0^t v(s)\,ds + \xi\,(v(t) - v(0))\right). \tag{20}$$

This specification is easier to interpret economically, because one parameter γ determines equity risk-aversion, and another parameter ξ governs variance-aversion. Moreover, Online Appendix C shows that given the risk-neutral dynamics (1), the physical dynamics (2), no-arbitrage, and completely affine equity and variance risk premiums µv(t) and λv(t), the two pricing kernel parameters γ and ξ determine the equity premium µ and variance premium λ parameters, as follows:

$$\text{Equity risk premium:}\quad \mu v(t) = (\gamma - \xi\sigma\rho)\,v(t), \tag{21}$$

$$\text{Variance risk premium:}\quad \lambda v(t) = (\rho\sigma\gamma - \sigma^2\xi)\,v(t) = \left(\rho\sigma\mu - (1-\rho^2)\sigma^2\xi\right)v(t). \tag{22}$$

Equations (21) and (22) indicate that due to the correlation ρ between stock returns and variance, both the equity premium µ and variance premium λ depend on both of the pricing parameters γ and ξ. The variance premium λ may be negative because the pricing kernel parameter ξ is positive, or because variance is negatively correlated with equity returns (ρ < 0) and the equity premium is positive (i.e., variance has a negative beta). We can test these hypotheses separately, by restricting ξ or λ to be zero. Similarly, we can test the hypothesis that the equity risk-aversion γ is zero separately from the hypothesis that the equity premium µ itself is zero.
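Equations (21) and (22) can be sketched as follows. The preference parameters γ = 1.393 and ξ = 1.947 are the column (1) point estimates quoted later in the text; the values of ρ and σ are hypothetical placeholders, so the resulting premiums are illustrative only. The second function splits λ into the part earned through the correlation with equity returns (the "negative beta" channel) and the part due to variance aversion that is orthogonal to returns, following the second form of equation (22).

```python
def completely_affine_premia(gamma, xi, rho, sigma):
    """Equations (21) and (22): slopes mu and lam of the premiums mu*v(t), lam*v(t)."""
    mu = gamma - xi * sigma * rho
    lam = rho * sigma * gamma - sigma**2 * xi
    return mu, lam


def variance_premium_decomposition(mu, xi, rho, sigma):
    """Second form of (22): lam = rho*sigma*mu - (1 - rho^2)*sigma^2*xi."""
    beta_channel = rho * sigma * mu
    orthogonal_channel = -(1.0 - rho**2) * sigma**2 * xi
    return beta_channel, orthogonal_channel


RHO, SIGMA = -0.8, 0.5          # hypothetical, not the paper's estimates
GAMMA, XI = 1.393, 1.947        # column (1) point estimates quoted in the text

mu, lam = completely_affine_premia(GAMMA, XI, RHO, SIGMA)
beta_channel, orthogonal_channel = variance_premium_decomposition(mu, XI, RHO, SIGMA)
print(mu, lam, beta_channel, orthogonal_channel)

# Restricting xi = 0 leaves a variance premium that works only through rho.
mu_power, lam_power = completely_affine_premia(GAMMA, 0.0, RHO, SIGMA)
print(mu_power, lam_power)
```

In this sketch λ stays negative even with ξ = 0, because ρ < 0 and γ > 0, which is why the hypotheses ξ = 0 and λ = 0 are distinct.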
Finally, note that for this pricing kernel the no-arbitrage restrictions (18) and (19) reduce to:

$$\beta = -(1-\gamma)r - \xi\kappa\theta, \tag{23}$$

$$\eta = \left(\mu - \frac{1}{2}\right)\gamma + \xi\kappa - \frac{1}{2}\left(\gamma^2 - 2\gamma\xi\sigma\rho + \xi^2\sigma^2\right). \tag{24}$$

Henceforth we refer to the kernel in equation (20) as the exponential-affine pricing kernel. Next we discuss two important special cases of this kernel.

4.4 The Power Utility Pricing Kernel

A first economically important special case emanates from setting ξ equal to zero in equation (20). This imposes absence of variance preference, and gives:

$$M(t) = M(0)\left(\frac{S(t)}{S(0)}\right)^{-\gamma}\exp\left(\beta t + \int_0^t \eta\,v(s)\,ds\right). \tag{25}$$

The local fluctuations in the pricing kernel (25) now result exclusively from changes in the spot price S(t), and there is no additional premium due to variance aversion. The martingale condition once again implies that the expected growth of the pricing kernel must equal the negative of the interest rate. This restricts β and η, effectively leaving only a single index level preference parameter γ to price risk.

For convenience we will refer to the kernel in (25) as the power utility pricing kernel. However, a critical point is that the seminal work by Rubinstein (1976) and Brennan (1979) specifies economically motivated pricing kernels for economies with constant variance. While the kernel (25) is related to the power utility pricing kernels of Rubinstein (1976) and Brennan (1979), it highlights that we cannot simply adopt those static kernels in the presence of stochastic volatility. Specifically, the kernel contains a path dependent component $\eta\int_0^t v(s)\,ds$ even in the absence of variance preference.

4.5 A Path Independent Pricing Kernel

Finally, following Ross (2015), we consider eliminating the dependence of the pricing kernel on the historical path of v(t). This type of pricing kernel follows from time-separable expected utility or from more general Epstein-Zin preferences. Specifically, if we restrict ξ to solve the quadratic equation $\gamma^2 - (1 + 2\rho\sigma\xi)\gamma + \sigma^2\xi^2 + 2\kappa^*\xi = 0$, then the pricing kernel M(t) is a separable function of the contemporaneous stock price S(t) and contemporaneous variance v(t):

$$M(t) = M(0)\left(\frac{S(t)}{S(0)}\right)^{-\gamma}\exp\Big(-\big((1-\gamma)r + \xi\kappa\theta\big)t + \xi\,(v(t) - v(0))\Big).$$

This special case effectively sets η = 0 in equation (24), which is a joint restriction on the equity risk aversion parameter γ and variance risk aversion ξ, and therefore also reduces the pricing kernel to a single parameter. We test this restriction in our empirical application.

5 Pricing Kernels: Parameter Estimates

We estimate the stochastic volatility model subject to the restrictions imposed by the pricing kernels in the previous section. Our results are based on the joint likelihood composed of returns and options. We use the empirical fit for different sets of restrictions to test the more general pricing kernels against more parsimonious versions. We also compare and analyze the risk premia and other economic implications for different restricted and unrestricted pricing kernels.

5.1 Joint Maximum Likelihood Estimation with No-Arbitrage Restrictions

Rather than separately estimating the physical and risk-neutral dynamics as in Table 2, we now jointly estimate both dynamics subject to various specifications of the pricing kernels. In our implementation, we either use a no-arbitrage condition based on a specific structure of the pricing kernel, as in equation (20) for example, or we directly impose a structure on the parameters that characterize the risk premia such as µ and λ in the completely affine case.

For kernels nested in the exponential-affine specification in equation (20), given a set of parameters ΘPK1 = {γ, ξ, κ, θ, σ, ρ, $\eta_0$, $\eta_1$}, we can obtain the risk premium parameters µ and λ via equations (21) and (22). Likewise, for the more general kernel in equation (16), given ΘPK2 = {γ, ξ, α, κ, θ, σ, ρ, $\eta_0$, $\eta_1$}, the risk premium parameters $\mu_0$, $\mu_1$, $\lambda_0$, and $\lambda_1$ can be obtained from equation (17). We also investigate specifications that restrict the risk premium parameters instead. For the affine price of risk specification, the parameter vector is ΘPR = {$\mu_0$, $\mu_1$, $\lambda_0$, $\lambda_1$, κ, θ, σ, ρ, $\eta_0$, $\eta_1$}. Regardless of the parameterization, estimation is based on the sum of the return and option log-likelihoods. That is, we solve the following optimization problem:

$$\max_{\Theta}\; \log L^{R} + \log L^{O},$$

where Θ can refer to ΘPK1, ΘPK2, or ΘPR.

Table 3 presents estimation results for the exponential-affine kernel (20) and six restricted cases. Column (1) reports on the unrestricted exponential-affine pricing kernel in equation (20). In columns (2) and (3) we have two restricted estimation exercises based on risk premium restrictions: µ = 0 and λ = 0 respectively. Next, we have several cases based on preference parameter restrictions. In column (4) we impose γ = 0 and in column (5) we impose ξ = 0. The restriction ξ = 0 corresponds to the power utility pricing kernel in equation (25). We also test the joint restrictions γ = ξ = 0 in column (6).16 Finally, we impose η = 0 in column (7). This case provides insight into the literature on recovery theory (see, e.g., Ross, 2015; Borovička, Hansen, and Scheinkman, 2016; Qin and Linetsky, 2016).
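To illustrate how the η = 0 restriction ties ξ to γ, the sketch below solves for ξ in closed form. The closed form is not stated in the text: it comes from substituting µ = γ − ξσρ from equation (21) into equation (24), which gives η = (γ² − γ)/2 + κξ − σ²ξ²/2, so treat it as an assumption of this example. The code then checks each root against both equation (24) and the quadratic quoted in Section 4.5, with κ∗ = κ + λ. All numerical inputs are hypothetical.

```python
import math


def eta_exponential_affine(gamma, xi, kappa, sigma, rho):
    """Equation (24), with mu taken from equation (21)."""
    mu = gamma - xi * sigma * rho
    return ((mu - 0.5) * gamma + xi * kappa
            - 0.5 * (gamma**2 - 2.0 * gamma * xi * sigma * rho + xi**2 * sigma**2))


def section_45_quadratic(gamma, xi, kappa, sigma, rho):
    """Left-hand side of the quadratic in Section 4.5, using kappa* = kappa + lam."""
    lam = rho * sigma * gamma - sigma**2 * xi          # equation (22)
    kappa_star = kappa + lam
    return (gamma**2 - (1.0 + 2.0 * rho * sigma * xi) * gamma
            + sigma**2 * xi**2 + 2.0 * kappa_star * xi)


def path_independent_xi(gamma, kappa, sigma):
    """Both roots of sigma^2 xi^2 - 2 kappa xi - (gamma^2 - gamma) = 0 (derived above)."""
    discriminant = kappa**2 + sigma**2 * (gamma**2 - gamma)
    if discriminant < 0:
        raise ValueError("no real xi satisfies eta = 0 for these inputs")
    root = math.sqrt(discriminant)
    return (kappa - root) / sigma**2, (kappa + root) / sigma**2


# Hypothetical inputs, for illustration only.
GAMMA, KAPPA, SIGMA, RHO = 1.4, 3.0, 0.5, -0.8
roots = path_independent_xi(GAMMA, KAPPA, SIGMA)
for xi in roots:
    print(xi,
          eta_exponential_affine(GAMMA, xi, KAPPA, SIGMA, RHO),
          section_45_quadratic(GAMMA, xi, KAPPA, SIGMA, RHO))
```

Which root is economically relevant is not addressed by this sketch; in estimation the restriction would be imposed jointly with the other parameters.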
Table 4 reports on the affine specification of the price of risk in column (8), the more general kernel in equation (16) in column (9), and the special case of this kernel with ξ = 0 in column (10). Online Appendix D summarizes the implementation of the restrictions used in these tables.

16 From equations (21) and (22), the joint restriction µ = λ = 0 is equivalent to the joint restriction γ = ξ = 0.

5.2 Parameter Estimates: The Exponential-Affine Pricing Kernel

Table 3 reports the results of the joint maximum likelihood estimation. We report robust standard errors for all parameters, except for parameters that are determined by the other estimates; see Online Appendix D. Column (1) presents the estimates for the exponential-affine pricing kernel in equation (20).17 We can compare the risk-neutral estimates in column (1) with the option-based estimates in Table 2. The risk-neutral mean-reversion κ∗ = κ + λ from joint estimation in Table 3 is 1.114, somewhat higher than the 0.0986 estimate in Table 2. The risk-neutral long-run variance θ∗ = κθ/κ∗ in Table 3 is 0.0877, somewhat lower than the 0.0986 estimate in Table 2. This finding is not surprising, because the option-implied variance typically exceeds the average variance implied by returns.

We can also compare the physical parameters in column (1) of Table 3 with the return-based estimates in Table 2. The physical mean reversion κ in Table 3 is 2.926, substantially smaller than the 4.146 estimate in Table 2. The physical long-run variance θ in Table 3 is 0.033, similar to the 0.031 estimate in Table 2. The most important difference is that the return-based estimates of ρ and σ in Table 2 are smaller (in absolute value) compared to Table 3. Recall that in joint estimation the ρ and σ parameters are the same under both measures. The estimate of ρ from joint estimation is a bit larger in absolute value compared to the one in Panel B of Table 2, and the estimate of σ is a bit smaller, but the differences are relatively minor.

The estimate of the index level preference parameter (“risk aversion”) γ in column (1) is 1.393 and the estimate of the variance preference parameter ξ is 1.947. The signs of these preference parameters are consistent with economic intuition. The point estimate of µ in column (1) of Table 3 is similar but slightly lower than the estimate in Table 2.
The variance risk premium parameter λ is estimated to be negative in column (1) of Table 3, which is also intuitively plausible. The negative λ implies κ > κ∗ and θ < θ∗. The negative λ also means that the ordinal relations between the physical and risk-neutral parameters are the same as in Table 2, which reports estimates from returns only and options only.

17 Recall that this kernel is isomorphic to the completely affine price of risk specification with equity premium µv(t) and variance risk premium λv(t).

5.3 Restrictions on the Exponential-Affine Pricing Kernel

Columns (2)-(7) in Table 3 report on estimation subject to various restrictions on the exponential-affine pricing kernel. When we impose restrictions on the risk premium parameters µ and λ in columns (2) and (3), the resulting negative estimates of the risk preference parameters are inconsistent with economic intuition. The implied risk premium parameters µ and λ and the physical parameters also substantially differ from the estimates in column (1). Columns (4) and (5) directly restrict risk preferences instead. The remaining unrestricted risk preference parameter estimates have the same sign as in column (1), but are larger in absolute value. The other parameters are very similar to column (1). Column (6) imposes the joint restriction γ = ξ = 0, which is equivalent to setting the risk premium parameters equal to zero (µ = λ = 0). These restrictions seem to impact the estimates of the drift parameters κ and θ, but not the parameters ρ and σ, which determine skewness and kurtosis.

Column (7) tests the path-independence restriction, characterized by η = 0. In columns (1)-(6) and the other restricted estimations, the martingale property implies the value of η, and the resulting estimates are different from zero. The η = 0 restriction results in a negative estimate of the variance preference parameter ξ, which is not economically plausible.

The most important finding is that while the pricing kernel parameters differ across columns (1)-(7), the risk-neutral parameter estimates κ∗, θ∗, σ, and ρ remain remarkably similar across columns. As a result, the vega-weighted option RMSEs are nearly identical. However, the physical parameters κ and θ differ, which explains the large differences in the pricing kernel parameters γ, ξ, and η, or equivalently the risk premium parameters µ and λ.
Panel A of Table 5 reports p-values for tests of the restricted pricing kernels against the unrestricted exponential-affine kernel in column (1). While the restrictions that one or both of the risk premium parameters (µ and/or λ) are equal to zero in columns (2), (3) and (6) are rejected at the 5% level, this is not the case when we set one of the preference parameters (γ or ξ) equal to zero in columns (4) or (5). The reason is the robustly high (in absolute value) correlation between the innovations to aggregate variance and equity risk. When we set the representative agent’s equity risk aversion or variance aversion equal to zero, the other parameter simply takes over the task of the restricted parameter. This is confirmed by the parameter estimates in columns (4) and (5) of Table 3, where respectively ξ and γ increase when the other parameter is set to zero, and the overall likelihood is little affected. This finding is consistent with the recent findings of Dew-Becker and Giglio (2022) that the entire index variance risk premium is due to a negative stock beta.

5.4 More General Pricing Kernels and Affine Prices of Risk

Table 4 presents results based on the more general kernel in equation (16) and the affine price of risk specification. Columns (9)-(10) present results for the pricing kernel in equation (16). The risk-neutral mean reversion is similar to that for the exponential-affine kernel in columns (1)-(7) in Table 3, while the risk-neutral mean-reversion is higher. The other parameters are very similar. Option RMSE is also similar to Table 3.

Column (8) in Table 4 reports on the affine specification of the price of risk with an equity premium $\mu_0 + \mu_1 v(t)$ and a variance risk premium $\lambda_0 + \lambda_1 v(t)$. Note that the risk-neutral parameter estimates θ∗, σ, and ρ are again very similar to the estimates in Table 3, but the risk-neutral mean reversion κ∗ is slightly smaller. The physical mean reversion is also larger, indicating that this specification allows for a larger wedge between the physical and risk-neutral mean reversion. The joint log-likelihood in column (8) exceeds the log-likelihood in column (1) of Table 3, as well as the one in column (9). The likelihood ratio tests in Table 5 show that these differences are statistically significant at the 1% level.
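A likelihood ratio test of a restricted kernel against an unrestricted one can be sketched as below. This assumes the textbook asymptotic chi-square reference distribution with degrees of freedom equal to the number of restrictions; the exact procedure behind Table 5 is not described in this section, and the log-likelihood values used here are hypothetical. Only the one- and two-restriction cases are covered, because those have closed-form tail probabilities available in the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or df = 2 only."""
    if x < 0:
        raise ValueError("test statistic must be nonnegative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))   # P(Z^2 > x) for standard normal Z
    if df == 2:
        return math.exp(-x / 2.0)              # chi-square(2) is exponential
    raise NotImplementedError("only df = 1 and df = 2 are implemented")


def likelihood_ratio_test(loglik_unrestricted, loglik_restricted, n_restrictions):
    """Return (statistic, p-value) for 2 * (logL_u - logL_r)."""
    statistic = max(2.0 * (loglik_unrestricted - loglik_restricted), 0.0)
    return statistic, chi2_sf(statistic, n_restrictions)


# Hypothetical joint log-likelihoods; not the values in Tables 3 and 4.
LOGLIK_UNRESTRICTED = 100000.0
stat_one, p_one = likelihood_ratio_test(LOGLIK_UNRESTRICTED, 99999.2, n_restrictions=1)
stat_two, p_two = likelihood_ratio_test(LOGLIK_UNRESTRICTED, 99994.0, n_restrictions=2)
print(stat_one, p_one)
print(stat_two, p_two)
```

A single restriction such as ξ = 0 corresponds to one degree of freedom, while a joint restriction such as γ = ξ = 0 corresponds to two.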
It seems that including an (unrestricted) intercept in the risk premium specification provides a better fit to the data. This improvement in fit mainly derives from a better fit of the return data. Note that in column (9), the more general kernel in equation (16) also contains an intercept in the risk premiums, but this intercept is subject to the restrictions in equation (17).

We conclude that the affine risk premium specification has important implications for model fit. To the best of our knowledge, while this specification is frequently used in the option valuation literature, the implications of relaxing these restrictions are not documented or discussed in detail. Our results are consistent with findings in the bond pricing literature, where the completely affine specification is outperformed by more general specifications. Below we investigate the economic implications of these improvements in fit.

5.5 Imposing the Feller Condition(s)

Finally, while the estimates in column (9) impose the no-arbitrage restrictions emanating from the martingale condition in Section 4.2, they ignore that the more general kernel in equation (16) may also lead to zero state prices if the variance process reaches zero. Another necessary condition for no-arbitrage in this case is therefore that the Feller (1951) no-arbitrage condition holds under both measures, so 2κθ > σ² and 2κ∗θ∗ > σ². Singleton (2006, p. 326) and Heston, Loewenstein, and Willard (2006) discuss how this same issue arises due to the presence of the intercept in the price of risk, which we show is equivalent to the specification of the kernel in equation (16).

Table A.1 in the Online Appendix repeats the estimation in columns (8) and (9) of Table 4 but now imposes the Feller conditions. These additional no-arbitrage restrictions lower the likelihood and worsen the option fit. The estimate of equity risk aversion γ now has the wrong sign in column (2). Moreover, the intercept $\lambda_0$ is driven to zero in column (1) and both intercepts are zero in column (2). In other words, when we impose the Feller conditions in column (2), the estimation converges to the completely affine case, which reaffirms the economic sensibility of that kernel.

Next we investigate the economic implications of the specifications in Table 3 and Table 4. We show that pricing kernels that are difficult to distinguish statistically may have very different economic implications.
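The two Feller conditions can be checked with a few lines of code. The sketch below uses the affine relations from Section 4.1.2, κ∗ = κ + λ₁ and κ∗θ∗ = κθ − λ₀, to move from the physical to the risk-neutral parameters. The function names and the numerical values are hypothetical and are not estimates from the paper.

```python
def feller_holds(kappa, theta, sigma):
    """Feller condition for a square-root variance process: 2*kappa*theta > sigma^2."""
    return 2.0 * kappa * theta > sigma**2


def feller_under_both_measures(kappa, theta, sigma, lam0, lam1):
    """Check the condition under the physical and the risk-neutral measure."""
    kappa_star = kappa + lam1
    theta_star = (kappa * theta - lam0) / kappa_star
    return feller_holds(kappa, theta, sigma), feller_holds(kappa_star, theta_star, sigma)


# Hypothetical parameter values, for illustration only.
print(feller_under_both_measures(kappa=3.0, theta=0.04, sigma=0.4, lam0=-0.01, lam1=-1.5))
print(feller_under_both_measures(kappa=3.0, theta=0.04, sigma=0.4, lam0=0.05, lam1=-1.5))
```

Because 2κ∗θ∗ = 2(κθ − λ₀), the risk-neutral condition in this parameterization depends on the intercept λ₀ but not on the slope λ₁.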
6 Pricing Kernels: Economic Implications

We first show that while model fit for most columns in Tables 3 and 4 is similar, risk premia and Sharpe ratios greatly vary with the specification of the pricing kernel. We then show how to estimate the marginal pricing kernel that plots the pricing kernel as a function of log index returns. We show that this marginal pricing kernel is not necessarily downward sloping, which means that existing findings of non-monotonic kernels may not constitute puzzles. We also show how to construct the time series path of the pricing kernel and find that it also widely varies, dependent on the restrictions we impose on the pricing kernel. The path associated with the affine price of risk specification is not plausible. Lastly, we offer some observations on the relative importance of the pricing kernel components associated with the index return and variance state variables.

6.1 Risk Premia

A striking finding in Tables 3 and 4 is that the risk-neutral parameters are very similar across columns, and consequently the option RMSEs are nearly identical. It is well known that in joint estimation, the option sample dominates the joint log likelihood because there are many options each day and only one return. Tables 3 and 4 show that this is not a problem in our sample because the component of the likelihood due to returns is large enough to create meaningful differences in the joint log likelihood. Recall that our return likelihood differs from most existing approaches because we exploit the model implication that the latent variance is a function of the VIX.18

We now turn to the economic implications of the different pricing kernel restrictions rather than the statistical fit. We find diametrically opposite results, namely that the different pricing kernels result in widely different economic implications. Panel B of Table 5 presents descriptive statistics for the equity and variance risk premia. Recall from equations (21), (22), and (17) that these risk premia are linear functions of the variance. This implies that the higher moments are the same across columns. However, the first two moments are very different.
The annualized equity premium for the exponential-affine kernel in column (1) is 8.32%, which is very reasonable given the sample estimate of 8.24%. The average equity risk premium for the more general kernel in column (9) is very similar.

18 We also repeated the estimation in Tables 3 and 4 with an option likelihood that is scaled back by the number of options so that the sample on each day effectively consists of one option and one return. Results for this estimation exercise are very similar.

In columns (2) and (6), the equity premium is zero by construction. Some of the other economic restrictions such as λ = 0 in column (3), γ = 0 in column (4), and η = 0 in column (7) lead to substantially lower equity premia.

The variance risk premium for the exponential-affine pricing kernel in column (1) of Panel B is equal to −0.0605, which amounts to −24.60% in annual standard deviation terms. For the 1990 to 2019 period, the sample mean of the estimated volatility is 18.28%, and for the VIX it is 20.73% per year. The implied variance risk premium is therefore slightly larger than the index variance itself. This finding is similar to Bollerslev, Tauchen, and Zhou (2009). The results on the more general kernel in column (9) of Table 4 and the affine price of risk specification in column (8) yield similar estimates, but note that the signs of the (implied) risk premium parameters in Table 4 mean that the equity and variance risk premium do not always have the economically plausible signs. Imposing the restrictions in columns (2), (5) and (7) results in smaller (in absolute value) variance risk premia.

We conclude that while different pricing kernels lead to nearly identical option fit and risk-neutral parameters, as well as small differences in the joint log likelihood, they result in very different economic implications and risk premia. Bates (2003) argued that it is difficult to distinguish between option models based on option fit because misspecified models can fit options relatively well at the cost of overfitting and unreasonable out-of-sample implications. He therefore advocates joint estimation, which makes it easier to differentiate between models. While our estimation exercise in Tables 3 and 4 does not compare models with different dynamics and/or state variables, it does compare the specification of the pricing kernel, which is part of the model specification.
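The unit conversion behind the variance risk premium figures quoted above can be reproduced with a short sketch. It takes the signed square root of the premium in variance units to express it in annual standard deviation terms, and it averages an affine premium over a variance path. The −0.0605 figure is the one reported in the text; the variance path and the premium slope are hypothetical placeholders.

```python
import math


def signed_sqrt(x):
    """Square root of |x| carrying the sign of x (variance units to volatility units)."""
    return math.copysign(math.sqrt(abs(x)), x)


def average_affine_premium(intercept, slope, variances):
    """Time-series mean of intercept + slope * v(t) over a path of variances."""
    return sum(intercept + slope * v for v in variances) / len(variances)


VRP_VARIANCE_UNITS = -0.0605                 # column (1), as reported in the text
vrp_vol_terms = signed_sqrt(VRP_VARIANCE_UNITS)
print(round(100 * vrp_vol_terms, 2))         # premium in percent, volatility terms

# Hypothetical annualized variance path and premium slope, for illustration only.
variance_path = [0.02, 0.03, 0.05, 0.04]
mean_premium = average_affine_premium(0.0, 2.0, variance_path)
print(mean_premium)
```

Because the premiums are linear in the variance, the time-series mean of the premium equals the intercept plus the slope times the mean variance.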
Our results show that even when using a joint likelihood based on returns and options, it is difficult to statistically distinguish models. They are consistent with Bates’ observation in the sense that misspecification instead shows up in implausible economic implications.

Another important conclusion is that the unrestricted exponential-affine pricing kernel in column (1) of Table 3 is economically plausible. Restricted versions of this kernel often result in implausible economic implications, as measured by risk premia and preference parameters. While these restrictions do not lead to a deterioration in option fit and are sometimes not statistically rejected, these tests may lack power. An equivalent way to characterize our findings is that option pricing models can imply very different economic implications and risk premia, which cannot be distinguished statistically even when using returns and options in estimation. This finding is, to the best of our knowledge, novel and surprising, but it is consistent with the findings of Merton (1980) on the equity premium.

6.2 Sharpe Ratios

The parameter estimates allow us to retrieve the conditional mean and standard deviation of the daily market return. For exponential-affine pricing kernels, they are given by

$$E_t(R(t+\Delta)) = 1 + [r + \mu v(t)]\Delta, \tag{26}$$

$$\sigma_t(R(t+\Delta)) = \sqrt{v(t)\Delta}. \tag{27}$$

For the more general pricing kernels in equation (16) and the affine price of risk specification, the conditional mean of the daily market return is given by

$$E_t(R(t+\Delta)) = 1 + [r + \mu_0 + \mu_1 v(t)]\Delta. \tag{28}$$

We calculate the time series of the daily Sharpe ratios for the various kernel specifications based on these model-implied conditional means and standard deviations. The last row in Panel B of Table 5 reports the time-series averages of these daily Sharpe ratios. The unrestricted exponential-affine pricing kernel in column (1) implies an average daily Sharpe ratio of 0.0259, which is equivalent to an annualized Sharpe ratio of 0.411. This is close to the sample average of 0.439. The more general kernel in column (9) yields a Sharpe ratio close to the sample average, and the Sharpe ratio remains plausible when setting ξ = 0 in columns (5) and (10). However, the average Sharpe ratios are not plausible when imposing some of the other restrictions. Importantly, the affine price of risk specification in column (8), which is successful in matching the average equity premium (see the discussion in Section 6.1), yields a Sharpe ratio which is too high. Moreover, the time series of the Sharpe ratio in this case (not reported) is very different from the time series associated with columns (1) and (9). The time variation in the Sharpe ratio is much smaller in the case of the affine specification. Moreover, due to the intercept in the risk premium, the lowest observed Sharpe ratio in our sample period is 0.449, which is unrealistic and much higher than in columns (1) and (9).

6.3 A Multivariate Representation of the Estimated Pricing Kernel

Figure 2 provides a scatterplot of the estimated exponential-affine kernel in equation (20) as a function of the log stock return log R(t) and the daily change in variance v(t) − v(t−1), based on the MLE estimates in column (1) of Table 3. The two bottom pictures illustrate the univariate relations. Figure 2 illustrates that the log return and the change in variance are negatively and positively correlated respectively with the log pricing kernel. This is, of course, due to the positive estimates of the index return preference γ and the variance preference ξ. The implied correlation coefficient ρ between the log return and the change in variance is clearly highly negative. Stock price increases or log return increases, therefore, reduce the pricing kernel directly through the channel of the positive γ and indirectly through the channel of the positive ξ combined with the negative ρ. An increase of the (change in) variance increases the pricing kernel directly and indirectly through the same mechanism.

Note that the univariate scatterplot on the left does not illustrate the marginal or the conditional distribution of the pricing kernel as a function of (log) returns. At each point in time, the relation between the pricing kernel and returns is captured for a different level of volatility. We therefore proceed to a more formal analysis of the relation between the pricing kernel and the state variables.
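The direct channels described above can be illustrated with a discretized one-day change in the log of the exponential-affine kernel in equation (20). This is a sketch under stated assumptions: the integral of the variance over the day is approximated by its start-of-day value times Δ, β and η are taken from equations (23) and (24), γ and ξ are the column (1) point estimates quoted in the text, and every other input (r, κ, θ, σ, ρ, the returns and the variance levels) is hypothetical.

```python
def exponential_affine_drift_terms(gamma, xi, r, kappa, theta, sigma, rho):
    """beta and eta from equations (23) and (24), with mu from equation (21)."""
    mu = gamma - xi * sigma * rho
    beta = -(1.0 - gamma) * r - xi * kappa * theta
    eta = ((mu - 0.5) * gamma + xi * kappa
           - 0.5 * (gamma**2 - 2.0 * gamma * xi * sigma * rho + xi**2 * sigma**2))
    return beta, eta


def log_kernel_increment(log_return, v_prev, v_next, gamma, xi, beta, eta, dt):
    """One-step change in log M(t) implied by equation (20).

    The variance integral over the step is approximated by v_prev * dt.
    """
    return (-gamma * log_return
            + beta * dt
            + eta * v_prev * dt
            + xi * (v_next - v_prev))


GAMMA, XI = 1.393, 1.947          # column (1) point estimates quoted in the text
DT = 1.0 / 252.0
beta, eta = exponential_affine_drift_terms(
    GAMMA, XI, r=0.02, kappa=3.0, theta=0.04, sigma=0.5, rho=-0.8)  # hypothetical

calm = log_kernel_increment(0.00, 0.04, 0.04, GAMMA, XI, beta, eta, DT)
rally = log_kernel_increment(0.01, 0.04, 0.04, GAMMA, XI, beta, eta, DT)
vol_spike = log_kernel_increment(0.00, 0.04, 0.05, GAMMA, XI, beta, eta, DT)
print(calm, rally, vol_spike)
```

Holding the variance fixed, a positive return lowers the log kernel by γ times the return; holding the return fixed, a rise in variance raises it by ξ times the change. The indirect channel through ρ would only appear once returns and variance changes are drawn jointly, which this sketch does not attempt.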
6.4 The Marginal Distribution of the Pricing Kernel

Following Jackwerth and Rubinstein (1996) and Jackwerth (2000), an extensive literature has investigated the shape of the pricing kernel. This literature is mainly motivated by the power utility pricing kernel in equation (25). Based on the kernel in equation (25), we expect a downward sloping log pricing kernel as a function of the log index return. However, starting with Jackwerth

(2000), several studies document that both the conditional and the unconditional pricing kernels are not downward sloping as a function of index returns (aggregate wealth), but instead are U-shaped or characterized by an even more nonlinear function (see, for example, the literature review in Cuesdeanu and Jackwerth, 2018). This finding is often referred to as a puzzle or an anomaly.

The literature has proposed several potential explanations for this anomaly. For instance, Bakshi, Madan, and Panayotov (2010) argue that a U-shaped kernel naturally arises from heterogeneous expectations. We instead propose that the pricing kernel will not generally be downward sloping when viewed as a function of aggregate wealth. Instead, given our knowledge of option prices, a sensible pricing kernel should contain several state variables. To the extent that these state variables are correlated with the index return, graphing state prices as a function of the log return may not result in a downward sloping function even if the kernel in equation (20) conforms to economic theory with respect to aggregate wealth, that is, if γ > 0 in equation (20). A U-shaped pricing kernel, therefore, arises naturally and should not be thought of as an anomaly.

We investigate this conjecture empirically by characterizing the shape of the pricing kernel in the aggregate wealth dimension under different restrictions on the pricing kernel, i.e., for the different columns in Tables 3 and 4. We refer to the univariate pricing kernel in the aggregate wealth dimension as the marginal distribution of the pricing kernel.

It is well-known that it is not straightforward to reliably estimate the pricing kernel. We need to characterize both the physical and risk-neutral distribution. Characterizing the risk-neutral distribution is relatively straightforward because of the abundance of option data on a given day, but characterizing the physical distribution is more challenging.
The literature contains several non-parametric and parametric approaches. We proceed parametrically based on the parameter estimates in Tables 2, 3, and 4. Given the parametric stock return dynamics in equations (1) and (2), we can generate the probability density of returns under both the physical and risk-neutral measures. For simplicity, we let the initial stock price S(0) = 1 or equivalently log S(0) = 0. Following Heston (1993), the

cumulative distribution function of log returns can be calculated as follows:

$$\Pr(\log R(\tau) \le x) = \frac{1}{2} - \frac{1}{\pi}\int_0^\infty \mathrm{Re}\left[\frac{e^{-i\phi x} f^{CH}_\tau(\phi \mid S(0)=1, v(0)=v)}{i\phi}\right] d\phi,$$

where the characteristic function $f^{CH}_\tau$ follows equation (9). We can calculate this characteristic function under both the physical and risk-neutral dynamics. By taking the first derivative of the cumulative distribution function with respect to x, we get the probability density function:

$$\Pr(\log R(\tau) = x) = \frac{d\Pr(\log R(\tau) \le x)}{dx} = -\frac{1}{\pi}\int_0^\infty \mathrm{Re}\left[-e^{-i\phi x + C + Dv}\right] d\phi, \tag{29}$$

where C and D are given in equation (10).

We calculate the numerical integral in equation (29) with the physical and risk-neutral dynamics of log returns to find the probability densities $P(\log R(\tau) \mid v(0)=\theta)$ and $Q(\log R(\tau) \mid v(0)=\theta)$, respectively. We set v(0) at its unconditional mean level θ and fix the risk-free rate at $\bar{r} = 0.0261$ (2.61%), the sample mean for our 1990-2019 sample. The pricing kernel is, by definition, the ratio of risk-neutral density to physical density, multiplied by the risk-free discount factor. We thus express the τ-maturity log pricing kernel as follows:

$$\log M(\tau) - \log M(0) = -\bar{r}\tau + \log Q(\log R(\tau) \mid v(0)=\theta) - \log P(\log R(\tau) \mid v(0)=\theta). \tag{30}$$

By changing the value of log R(τ), we can therefore generate the log pricing kernel as a function of the log index return. We repeat this process for various physical and risk-neutral parameter vectors, corresponding to the estimates for various restrictions on the pricing kernel in Tables 3 and 4. The right panels in Figures 3 and 4 report the results. For convenience, we express the x-axis in standardized log returns (the log return in deviation from its mean and divided by its standard

deviation).19 Results are qualitatively similar across maturities, but not surprisingly the results are more pronounced for longer maturities. Figures 3 and 4 present the results for the implied six-month pricing kernels. For comparison, Figure A.1 in the Online Appendix presents additional results for one-month pricing kernels.

6.5 Recovering Pricing Kernel Dynamics

Most of the literature studies (differences in) pricing kernels using the equivalent of the right-side panels in Figures 3 and 4, that is, as a univariate function of the log return (aggregate wealth). However, our proposed pricing kernels (20) and (16) contain additional state variables. We now provide additional perspective on (differences between) pricing kernels by illustrating the impact of these state variables on the path of the pricing kernels. The left panels in Figures 3 and 4 are obtained by inserting the observed realized returns and variances at each time t into the expression for the pricing kernel. These figures can be thought of as the (daily) time series path of the realized pricing kernel conditional on the realized values of the state variables. It is straightforward to generate this time path for all unrestricted and restricted specifications of the pricing kernel.20 See Chernov (2003) for a related exercise.

19 Since we already have the physical probability density, we can calculate the expectation and variance of the log return as:

$$E(\log R(\tau) \mid v(0)=\theta) = \int_{-\infty}^{\infty} \log R \times P(\log R(\tau) \mid v(0)=\theta)\, d\log R,$$

$$\mathrm{Var}(\log R(\tau) \mid v(0)=\theta) = \int_{-\infty}^{\infty} (\log R)^2 \times P(\log R(\tau) \mid v(0)=\theta)\, d\log R - E^2(\log R(\tau) \mid v(0)=\theta).$$

20 However, the computation is more complex in the case where the physical and risk-neutral parameters are independently estimated and for the affine price of risk specification. Online Appendix E discusses these cases.

6.6 Marginal Pricing Kernels and Pricing Kernel Dynamics: Empirical Results

Figure 3 presents results for five pricing kernels. The time path of the pricing kernel is on the left, and the marginal kernel is on the right. Panel A is based on the parameter estimates in Table 2, which do not impose any economic restrictions. Panel B of Figure 3 is based on the estimates

in column (1) of Table 3, i.e., the exponential-affine pricing kernel. Recall that the completely affine price of risk specification is isomorphic to the unrestricted exponential-affine pricing kernel. Panels C and D report on two special cases of column (1). Panel C imposes γ = 0 and Panel D imposes ξ = 0, i.e., the power utility pricing kernel in equation (25). Finally, Panel E imposes path independence (η = 0).

We start with the results in Panels B, C and D. The differences between these three kernels are striking. Consider the marginal pricing kernels in the right column. By definition, the kernel in Panel D is linear in the log return space, and because the estimate of γ in column (5) of Table 3 is positive, it is downward sloping, consistent with the intuition of decreasing marginal utility of wealth. For the exponential-affine kernel in Panel B, state prices are no longer linearly downward sloping as a function of log aggregate wealth. Because the estimate of ξ is positive (and ρ is negative), we now have a convex function. However, the estimate of ξ is not high enough to generate a U-shaped pricing kernel. In Panel C on the other hand, column (4) of Table 3 indicates that we get a higher estimate of ξ, and we obtain a U-shaped pricing kernel.21

The dynamics of the time paths of the kernel (left column) for Panels C and D display some similarities with the exponential-affine kernel in Panel B. The realized kernels fluctuate a lot over time, especially in the 2008-2009 financial crisis period. There is also substantial variation around the 1998 LTCM crisis and the early 2000s recession. The most important difference between Panels B and D is that the time variation in the pricing kernel in Panel B is more pronounced in the second half of the sample. This is due to the fact that this time period contains more volatility outliers, as can be seen from Figure 1. Comparing Panels B and C, the outliers in Panel C are more pronounced due to the larger estimate of ξ.
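The construction of a marginal kernel from equations (29) and (30) can be sketched numerically. The example below is only an illustration under stated assumptions: it replaces the model characteristic function from equation (9), which is not reproduced here, with a Gaussian characteristic function, and the means and standard deviation are hypothetical inputs rather than estimates from the paper. With equal variances under both measures, the resulting log kernel is linear in the log return, which mirrors the linear benchmark case discussed above.

```python
import numpy as np

def gaussian_cf(u, mean, std):
    """Characteristic function of a normal log return (stand-in for f^CH)."""
    return np.exp(1j * u * mean - 0.5 * (std * u) ** 2)

def density_from_cf(x, cf, u_max=200.0, n=20001):
    """Density at x via f(x) = (1/pi) * int_0^inf Re[exp(-i*u*x) * cf(u)] du."""
    u = np.linspace(0.0, u_max, n)
    integrand = np.real(np.exp(-1j * u * x) * cf(u))
    # Trapezoidal rule written out explicitly to avoid NumPy version differences.
    integral = np.sum((integrand[1:] + integrand[:-1]) * np.diff(u)) / 2.0
    return integral / np.pi

def log_kernel(x, cf_p, cf_q, r_bar, tau):
    """Equation (30): -r*tau + log Q-density - log P-density at log return x."""
    log_q = np.log(density_from_cf(x, cf_q))
    log_p = np.log(density_from_cf(x, cf_p))
    return -r_bar * tau + log_q - log_p

# Six-month horizon and the 2.61% risk-free rate from the text;
# STD, MEAN_P and MEAN_Q are hypothetical illustration values.
TAU, R_BAR = 0.5, 0.0261
STD, MEAN_P, MEAN_Q = 0.12, 0.05, 0.01
cf_p = lambda u: gaussian_cf(u, MEAN_P, STD)
cf_q = lambda u: gaussian_cf(u, MEAN_Q, STD)

grid = np.linspace(-0.25, 0.35, 7)
log_m = np.array([log_kernel(x, cf_p, cf_q, R_BAR, TAU) for x in grid])
slopes = np.diff(log_m) / np.diff(grid)  # constant slope (MEAN_Q - MEAN_P) / STD**2
```

Swapping `gaussian_cf` for a stochastic volatility characteristic function with different physical and risk-neutral parameters is what would introduce curvature in `log_m`.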
21 Song and Xiu (2016) also emphasize how volatility risk can explain U-shaped pricing kernels. They use data on S&P 500 and VIX options and nonparametric estimation techniques.

Panel A of Figure 3 reports on the pricing kernel implied by the estimates in Table 2, i.e., using physical estimates obtained from returns and risk-neutral estimates obtained (separately) from options. These estimates do not impose restrictions on the pricing kernel, but they do of course impose a parametric structure on the probability distributions. The resulting pricing kernel

has a W-shape. This is consistent with some existing findings (Cuesdeanu and Jackwerth, 2018), although a U-shape is more common.22 The dramatic differences between the marginal pricing kernel in Panel A and the ones in Panels B, C and D are confirmed by the time path of the pricing kernels in the left panels.23 The only thing the path in Panel A has in common with the others is that it varies most during the financial crisis. However, the differences between the financial crisis and the rest of the sample are much less pronounced in Panel A. More importantly, the outliers in the pricing kernel are of an entirely different order of magnitude compared to the exponential-affine pricing kernel in Panel B. We conclude that the time-series patterns in Panel A are implausible. This is due to the fact that the underlying P and Q estimates are not linked by economic assumptions.

Finally, the kernels that incorporate path-dependence (i.e., nonzero η) generate very different paths compared to the path-independent case (η = 0) in Panel E of Figure 3, and they differ strongly during periods when the stock variance abruptly increases or decreases. Under path-independence, the pricing kernel is not highly sensitive to changes in variance.24

Panel A in Figure 4 reports on the affine price of risk specification in column (8) of Table 4. Panel B reports on the more general pricing kernel in equation (16), corresponding to the estimates in column (9) of Table 4. Panel C imposes ξ = 0, as in column (10) of Table 4. Recall that the estimates in Panel A are obtained under the affine restriction that the equity premium is equal to µ_0 + µ_1 v(t) and the variance risk premium is equal to λ_0 + λ_1 v(t). Existing results often impose these restrictions instead of the no-arbitrage restrictions, presumably because different no-arbitrage restrictions all result in the risk premia being linear in the variance. Our results suggest that this assumption is not innocuous.
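The affine equity premium µ_0 + µ_1 v(t) also helps explain the Sharpe ratio floor noted in Section 6.2. The sketch below combines equations (27) and (28) and assumes 252 trading days per year, an assumption that is consistent with the reported conversion of a daily Sharpe ratio of 0.0259 into an annualized 0.411. The function names and the variance grid are illustrative choices, not part of the paper.

```python
import math

TRADING_DAYS = 252          # assumed annualization convention
DT = 1.0 / TRADING_DAYS     # daily time step in years

def daily_sharpe(v, mu0, mu1, dt=DT):
    """Daily Sharpe ratio: excess mean (mu0 + mu1*v)*dt over std sqrt(v*dt)."""
    return (mu0 + mu1 * v) * dt / math.sqrt(v * dt)

def annualize(sharpe_daily, periods=TRADING_DAYS):
    """Scale a daily Sharpe ratio by the square root of the number of periods."""
    return sharpe_daily * math.sqrt(periods)

# Column (8) of Table 4: affine price of risk specification.
MU0, MU1 = 0.0485, 1.0393

# Search a grid of annualized variance levels for the lowest attainable Sharpe ratio.
v_grid = [0.0005 * k for k in range(1, 2001)]
floor_on_grid = min(annualize(daily_sharpe(v, MU0, MU1)) for v in v_grid)

# Minimizing (mu0 + mu1*v)/sqrt(v) over v gives v = mu0/mu1 and this floor.
floor_closed_form = 2.0 * math.sqrt(MU0 * MU1)
```

Under these assumptions the floor is about 0.449, in line with the lowest Sharpe ratio reported for column (8). With µ_0 = 0, as in the exponential-affine case of equation (26), the annualized ratio reduces to µ√v(t) and has no positive floor.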
22 It is important to note that the estimation of the pricing kernel is much more precise for low returns. For high returns, confidence intervals are typically much larger due to more limited option information. We do not report confidence intervals in Figure 3, and moreover this issue is also impacted by imposing a parametric structure.

23 Note that the y-axis in Panel B is scaled differently.

24 This implication results from the zero η in conjunction with the negative estimate of ξ.

The marginal pricing kernel in Panel A is very different from the ones in Panels B and C. Note also the difference with the exponential-affine kernel in Panel B of Figure 3; the shape is actually more similar to the unrestricted pricing kernel in Panel A of

Figure 3. This finding is surprising given that Panel A of Figure 3 does not impose any restrictions across measures, while Panel A of Figure 4 does. While the implied pricing kernels as a function of the index returns in Panel A of Figure 3 and Panel A of Figure 4 are similar, this is not the case for the corresponding time paths in the left column. Panel A of Figure 3 displays many more extreme positive and negative outliers throughout the sample. However, the time-series pattern in Panel A of Figure 4 also does not seem plausible. For instance, it is volatile during 1993-1996 and 2004-2007, whereas the fluctuations for the exponential-affine pricing kernel in Panel B of Figure 3 occur mainly in the financial crisis. Panel D of Figure 4 is based on the parameter estimates in Table A.1 in the Online Appendix. It imposes the Feller condition on the kernel in equation (16). A comparison with Panel B of Figure 4 indicates that this restriction strongly affects the economic implications. Most notably, we now obtain a strongly U-shaped pricing kernel. However, the upward sloping part of this kernel is due to the counterintuitive negative sign of γ. This provides additional evidence that the exponential-affine kernel is preferable to the more general kernel (16). 
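To see why imposing the Feller condition can matter so much, it helps to check it at the unrestricted estimates. The sketch below uses the standard form of the condition for a square-root variance process, 2κθ ≥ σ², and evaluates it at the column (9) estimates in Table 4. Whether the paper imposes the condition under the physical measure, the risk-neutral measure, or both is not stated in this section, so both checks are shown as an assumption-laden illustration.

```python
def feller_gap(kappa, theta, sigma):
    """Return 2*kappa*theta - sigma**2; the Feller condition holds when this is >= 0."""
    return 2.0 * kappa * theta - sigma ** 2

# Column (9) of Table 4: the more general kernel in equation (16).
SIGMA = 0.7273
gap_physical = feller_gap(kappa=3.4361, theta=0.0334, sigma=SIGMA)
gap_risk_neutral = feller_gap(kappa=1.1134, theta=0.0877, sigma=SIGMA)

feller_holds = gap_physical >= 0 and gap_risk_neutral >= 0
```

Both gaps are negative at these estimates, so a constrained estimation has to move κ, θ, or σ substantially, which is consistent with the large change in economic implications described above.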
Figures 3 and 4 yield six important conclusions: 1) We confirm that the unrestricted pricing kernel (based on the estimates in Table 2) is (highly) nonlinear as a function of aggregate wealth; 2) It is straightforward to write down economically meaningful pricing kernels (models) that can generate log pricing kernels that are U-shaped or non-monotonic in log aggregate wealth; 3) The exponential-affine pricing kernel has very plausible economic implications and is preferable to the more general kernel (16); 4) The affine price of risk specification provides a better joint fit to returns and options (the highest joint log likelihood in Tables 3 and 4), but its time path and other economic implications are implausible; 5) Imposing no-arbitrage restrictions is critically important, as evidenced for instance by the differences between Panels A and B of Figure 4; 6) Even when the log likelihoods in Tables 3 and 4 are similar, the time paths of the pricing kernels and the marginal kernels can be very different. While it seems to be difficult to statistically distinguish between models and/or pricing kernels, it may be easier to do so on economic grounds. This finding extends the observations of Merton (1980) on the estimation of the market equity premium to joint estimation of equity and variance risk premia using the cross-section

of options and the underlying returns.

6.7 Time-Variation in Pricing Kernel Components

In our final empirical exercise, we decompose the pricing kernel into its different components: the one related to the stock return (γ log R(t)) and to the change in variance (ξ[v(t) − v(t−1)]). Figure 5 presents results for the exponential-affine specification in column (1) of Table 3. We omit the path-dependent component for convenience. It is small and very persistent. Figure 5 shows how the different components of the pricing kernel account for the overall variation of the unrestricted exponential-affine pricing kernel. We plot the cumulative log pricing kernel and its components over the sample period to emphasize the difference in the means of the components. Both the return and variance components are large at some points in time, due to their large standard deviations. However, Figure 5 shows that the cumulative index return component is much larger than the cumulative variance component, due to the fact that the mean of the return component is much larger than the mean of the variance component.

7 Conclusion

The pricing kernel is a critical concept in asset pricing. It governs the relationship between physical and risk-neutral probabilities at all times and for all return horizons, and ensures absence of arbitrage. Options can be used to estimate risk-neutral probabilities that are required to identify the pricing kernel, highlighting the importance of derivatives for asset pricing and financial economics.

This paper proposes pricing kernels that generalize the economically appealing kernels in Rubinstein (1976) and Brennan (1979). Economic intuition specifies how pricing kernels relate to relevant state variables and suggests that kernels should be monotonic, well-behaved and smooth functions of these state variables. The proposed kernels can be path-dependent and nest path-independent kernels (Ross, 2015) as special cases.
They are volatility-dependent by construction and therefore characterize the impact of stock market volatility risk on state prices and risk premiums. We use these kernels to study the economic implications of the affine and completely affine price of risk

specifications. We show that affine risk premia can produce kernels with counter-intuitive economic properties, and that pricing kernels that are non-monotonic in index returns are not anomalous. A kernel consistent with the completely affine price of risk specification produces very plausible results.

We also estimate the resulting models subject to various restrictions on the pricing kernel. We reject the hypotheses of a zero equity or variance premium, but due to the negative correlation between index returns and variance we cannot statistically pin down the source of these risk premiums. More generally, we find that the data do not provide sufficient power: It is difficult to statistically distinguish between pricing kernels, even when they embody very different economic assumptions and generate widely different equity and variance risk premia and Sharpe ratios. These findings extend Merton's (1980) observations on the estimation of the market equity premium.

The analysis in this paper is based on a single-factor diffusion model. An obvious question is whether pricing kernels in more complex models are meaningfully different from pricing kernels in simpler models. Finally, our implementation uses plain-vanilla index options. It is possible that other option contracts may facilitate the estimation and identification of pricing kernels, equity and variance risk premia.

References

Aït-Sahalia, Yacine, and Robert Kimmel, 2007, Maximum Likelihood Estimation of Stochastic Volatility Models, Journal of Financial Economics 83, 413–452.
Aït-Sahalia, Yacine, and Andrew W. Lo, 1998, Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices, Journal of Finance 53, 499–547.
Aït-Sahalia, Yacine, and Andrew W. Lo, 2000, Nonparametric Risk Management and Implied Risk Aversion, Journal of Econometrics 94, 9–51.
Andersen, Torben G., Nicola Fusari, and Viktor Todorov, 2015, Parametric Inference and Dynamic State Recovery from Option Panels, Econometrica 83, 1081–1145.
Andersen, Torben G., Nicola Fusari, and Viktor Todorov, 2017, Short-Term Market Risks Implied by Weekly Options, Journal of Finance 72, 1335–1386.
Bakshi, Gurdip, Charles Cao, and Zhiwu Chen, 1997, Empirical Performance of Alternative Option Pricing Models, Journal of Finance 52, 2003–2049.
Bakshi, Gurdip, John Crosby, and Xiaohui Gao, 2022, Dark Matter in (Volatility and) Equity Option Risk Premiums, Operations Research 70, 3108–3124.
Bakshi, Gurdip, Dilip Madan, and George Panayotov, 2010, Returns of Claims on the Upside and the Viability of U-Shaped Pricing Kernels, Journal of Financial Economics 97, 130–154.
Bansal, Ravi, and Amir Yaron, 2004, Risks for the Long-Run: A Potential Resolution of Asset Pricing Puzzles, Journal of Finance 59, 1481–1509.
Barone-Adesi, Giovanni, Nicola Fusari, Antonietta Mira, and Carlo Sala, 2020, Option Market Trading Activity and the Estimation of the Pricing Kernel: A Bayesian Approach, Journal of Econometrics 216, 430–449.

Bates, David S., 2000, Post-'87 Crash Fears in the S&P 500 Futures Option Market, Journal of Econometrics 94, 181–238.
Bates, David S., 2003, Empirical Option Pricing: A Retrospection, Journal of Econometrics 116, 387–404.
Bates, David S., 2006, Maximum Likelihood Estimation of Latent Affine Processes, Review of Financial Studies 19, 909–965.
Beason, Tyler, and David Schreindorfer, 2022, Dissecting the Equity Premium, Journal of Political Economy 130, 2203–2222.
Bollerslev, Tim, George Tauchen, and Hao Zhou, 2009, Expected Stock Returns and Variance Risk Premia, Review of Financial Studies 22, 4463–4492.
Borovička, Jaroslav, Lars Peter Hansen, and José A. Scheinkman, 2016, Misspecified Recovery, Journal of Finance 71, 2493–2544.
Breeden, Douglas T., and Robert H. Litzenberger, 1978, Prices of State-Contingent Claims Implicit in Option Prices, Journal of Business 51, 621–651.
Brennan, Michael J., 1979, The Pricing of Contingent Claims in Discrete Time Models, Journal of Finance 34, 53–68.
Brennan, Michael J., Xiaoquan Liu, and Yihong Xia, 2006, Option Pricing Kernels and the ICAPM, Working Paper, University of California, Los Angeles.
Broadie, Mark, Mikhail Chernov, and Michael Johannes, 2007, Model Specification and Risk Premia: Evidence from Futures Options, Journal of Finance 62, 1453–1490.
Campbell, John Y., and John H. Cochrane, 1999, By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior, Journal of Political Economy 107, 205–251.

Carr, Peter, and Dilip Madan, 1999, Option Valuation Using the Fast Fourier Transform, Journal of Computational Finance 2, 61–73.
Carr, Peter, and Liuren Wu, 2007, Stochastic Skew in Currency Options, Journal of Financial Economics 86, 213–247.
Chabi-Yo, Fousseni, 2012, Pricing Kernels with Stochastic Skewness and Volatility Risk, Management Science 58, 624–640.
Chernov, Mikhail, 2003, Empirical Reverse Engineering of the Pricing Kernel, Journal of Econometrics 116, 329–364.
Chernov, Mikhail, and Eric Ghysels, 2000, A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation, Journal of Financial Economics 56, 407–458.
Chernov, Mikhail, Jeremy Graveline, and Irina Zviadadze, 2018, Crash Risk in Currency Returns, Journal of Financial and Quantitative Analysis 53, 137–170.
Cheung, Sam, 2008, An Empirical Analysis of Joint Time-Series of Returns and the Term-Structure of Option Prices, Working Paper, Columbia University.
Christoffersen, Peter, Steven Heston, and Kris Jacobs, 2013, Capturing Option Anomalies with a Variance-Dependent Pricing Kernel, Review of Financial Studies 26, 1963–2006.
Christoffersen, Peter, Kris Jacobs, and Karim Mimouni, 2010, Volatility Dynamics for the S&P500: Evidence from Realized Volatility, Daily Returns, and Option Prices, Review of Financial Studies 23, 3141–3189.
Cuesdeanu, Horatio, and Jens Carsten Jackwerth, 2018, The Pricing Kernel Puzzle: Survey and Outlook, Annals of Finance 14, 289–329.
Dew-Becker, Ian, and Stefano Giglio, 2022, Risk Preferences Implied by Synthetic Options, Working Paper, Northwestern University and Yale University.

Drechsler, Itamar, 2013, Uncertainty, Time-Varying Fear, and Asset Prices, Journal of Finance 68, 1843–1889.
Duffie, Darrell, and Larry G. Epstein, 1992, Asset Pricing with Stochastic Differential Utility, Review of Financial Studies 5, 411–436.
Epstein, Larry, and Stan Zin, 1989, Substitution, Risk Aversion and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework, Econometrica 57, 937–969.
Eraker, Bjørn, 2004, Do Stock Prices and Volatility Jump? Reconciling Evidence from Spot and Option Prices, Journal of Finance 59, 1367–1404.
Eraker, Bjørn, Michael Johannes, and Nicholas Polson, 2003, The Impact of Jumps in Volatility and Returns, Journal of Finance 58, 1269–1300.
Eraker, Bjørn, and Ivan Shaliastovich, 2008, An Equilibrium Guide to Designing Affine Pricing Models, Mathematical Finance 18, 519–543.
Eraker, Bjørn, and Yue Wu, 2017, Explaining the Negative Returns to VIX Futures and ETNs: An Equilibrium Approach, Journal of Financial Economics 125, 72–98.
Eraker, Bjørn, and Aoxiang Yang, 2019, The Price of Higher Order Catastrophe Insurance: The Case of VIX Options, Working Paper, University of Wisconsin-Madison.
Feller, William, 1951, Two Singular Diffusion Problems, The Annals of Mathematics 54, 173–182.
Gabaix, Xavier, 2012, An Exactly Solved Framework for Ten Puzzles in Macro-Finance, Quarterly Journal of Economics 127, 645–700.
Garcia, Rene, Richard Luger, and Eric Renault, 2003, Empirical Assessment of an Intertemporal Option Pricing Model with Latent Variables, Journal of Econometrics 116, 49–83.
Ghosh, Anisha, Christian Julliard, and Alex P. Taylor, 2017, What is the Consumption-CAPM Missing? An Information-Theoretic Framework for the Analysis of Asset Pricing Models, Review of Financial Studies 30, 442–504.

Hansen, Lars Peter, and Ravi Jagannathan, 1991, Implications of Security Market Data for Models of Dynamic Economies, Journal of Political Economy 99, 225–262.
Hansen, Lars Peter, and Ken Singleton, 1982, Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models, Econometrica 50, 1269–1286.
Heston, Steven L., 1993, A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options, Review of Financial Studies 6, 327–343.
Heston, Steven L., Mark Loewenstein, and Gregory A. Willard, 2006, Options and Bubbles, Review of Financial Studies 20, 359–390.
Heston, Steven L., and Saikat Nandi, 2000, A Closed-Form GARCH Option Valuation Model, Review of Financial Studies 13, 585–625.
Hurn, A. Stan, Kenneth A. Lindsay, and Andrew J. McClelland, 2015, Estimating the Parameters of Stochastic Volatility Models Using Option Price Data, Journal of Business & Economic Statistics 33, 579–594.
Jackwerth, Jens Carsten, 2000, Recovering Risk Aversion from Option Prices and Realized Returns, Review of Financial Studies 13, 433–451.
Jackwerth, Jens Carsten, and Mark Rubinstein, 1996, Recovering Probability Distributions from Option Prices, Journal of Finance 51, 1611–1631.
Jones, Christopher S., 2003, The Dynamics of Stochastic Volatility: Evidence from Underlying and Options Markets, Journal of Econometrics 116, 181–224.
Kreps, D., and E. Porteus, 1978, Temporal Resolution of Uncertainty and Dynamic Choice Theory, Econometrica 46, 185–200.
Linn, Matthew, Sophie Shive, and Tyler Shumway, 2018, Pricing Kernel Monotonicity and Conditional Information, Review of Financial Studies 31, 493–531.

Liu, Jun, Jun Pan, and Tan Wang, 2005, An Equilibrium Model of Rare-Event Premia and Its Implication for Option Smirks, Review of Financial Studies 18, 131–164.
Mehra, Rajnish, and Edward Prescott, 1985, The Equity Premium Puzzle, Journal of Monetary Economics 15, 145–161.
Merton, Robert C., 1973, An Intertemporal Capital Asset Pricing Model, Econometrica 41, 867–887.
Merton, Robert C., 1980, On Estimating the Expected Return on the Market: An Exploratory Investigation, Journal of Financial Economics 8, 323–361.
Ooura, Takuya, 2001, A Continuous Euler Transformation and Its Application to the Fourier Transform of a Slowly Decaying Function, Journal of Computational and Applied Mathematics 130, 259–270.
Pan, Jun, 2002, The Jump-Risk Premia Implicit in Options: Evidence from an Integrated Time-Series Study, Journal of Financial Economics 63, 3–50.
Qin, Likuan, and Vadim Linetsky, 2016, Positive Eigenfunctions of Markovian Pricing Operators: Hansen-Scheinkman Factorization, Ross Recovery, and Long-Term Pricing, Operations Research 64, 99–117.
Rosenberg, Joshua V., and Robert F. Engle, 2002, Empirical Pricing Kernels, Journal of Financial Economics 64, 341–372.
Ross, Steve, 2015, The Recovery Theorem, Journal of Finance 70, 615–648.
Rubinstein, Mark, 1976, The Valuation of Uncertain Income Streams and the Pricing of Options, Bell Journal of Economics 7, 407–425.
Seo, Sang Byung, and Jessica A. Wachter, 2019, Option Prices in a Model with Stochastic Disaster Risk, Management Science 65, 3449–3469.

Shaliastovich, Ivan, 2015, Learning, Confidence, and Option Prices, Journal of Econometrics 187, 18–42.
Singleton, Kenneth, 2006, Empirical Dynamic Asset Pricing: Model Specification and Econometric Assessment (Princeton University Press, Princeton, NJ).
Song, Zhaogang, and Dacheng Xiu, 2016, A Tale of Two Option Markets: Pricing Kernels and Volatility Risk, Journal of Econometrics 190, 176–196.
Trolle, Anders B., and Eduardo S. Schwartz, 2009, Unspanned Stochastic Volatility and the Pricing of Commodity Derivatives, Review of Financial Studies 22, 4423–4461.
Wachter, Jessica A., 2013, Can Time-Varying Risk of Rare Disasters Explain Aggregate Stock Market Volatility?, Journal of Finance 68, 987–1035.

Table 1: Descriptive Statistics

Panel A: Option Data by Moneyness

| | F/K≤0.94 | 0.94<F/K≤0.98 | 0.98<F/K≤1.02 | 1.02<F/K≤1.06 | 1.06<F/K≤1.10 | F/K>1.10 | All |
|---|---|---|---|---|---|---|---|
| Number of contracts | 6,718 | 9,227 | 15,133 | 11,975 | 7,928 | 11,502 | 62,483 |
| Average IV (%) | 16.21 | 13.65 | 15.10 | 17.97 | 20.98 | 25.09 | 18.14 |
| Average price | 14.64 | 20.28 | 37.67 | 26.70 | 21.73 | 17.36 | 24.76 |

Panel B: Option Data by Maturity

| | DTM≤30 | 30<DTM≤60 | 60<DTM≤90 | 90<DTM≤120 | 120<DTM≤180 | DTM>180 | All |
|---|---|---|---|---|---|---|---|
| Number of contracts | 13,969 | 14,483 | 8,482 | 5,374 | 7,143 | 13,032 | 62,483 |
| Average IV (%) | 16.16 | 18.08 | 19.05 | 19.10 | 18.77 | 18.99 | 18.14 |
| Average price | 9.01 | 14.14 | 21.31 | 27.16 | 32.69 | 50.36 | 24.76 |

Panel C: Stock Returns (annualized)

| | 1990-2019 | 1996-2019 |
|---|---|---|
| Mean (%) | 9.31 | 8.56 |
| Standard deviation (%) | 17.51 | 18.75 |
| Skewness | -0.26 | -0.26 |
| Kurtosis | 11.81 | 11.09 |

Notes: Panels A and B present descriptive statistics for Wednesday closing OTM option contracts for the January 10, 1996 to June 26, 2019 period. Moneyness is defined as the implied futures price F = S e^{(r−q)τ} divided by the strike price K. Panel C reports on the log of daily index returns for the January 1, 1990 to June 30, 2019 and January 1, 1996 to June 30, 2019 periods. Mean and standard deviation are annualized, skewness and kurtosis are computed from daily returns.
Source: Authors' calculations based on OptionMetrics and CRSP.

Table 2: Return-Based and Option-Based Parameter Estimates

Panel A: Return-Based Physical Parameters

| | µ | κ | θ | σ | ρ | η_0 | η_1 | log L^R |
|---|---|---|---|---|---|---|---|---|
| Estimate | 2.6367 | 4.1457 | 0.0310 | 0.5309 | -0.6877 | -0.0068 | 0.8815 | 55,409 |
| Std. error | (1.0624) | (1.2397) | (0.0070) | (0.0242) | (0.0114) | (0.0003) | (0.0219) | |

Panel B: Option-Based Risk-Neutral Parameters

| | µ∗ | κ∗ | θ∗ | σ | ρ | η_0 | η_1 | log L^O |
|---|---|---|---|---|---|---|---|---|
| Estimate | 0 | 0.9860 | 0.0986 | 0.7916 | -0.7452 | -0.0042 | 0.8740 | 156,831 |
| Std. error | | (0.0712) | (0.0050) | (0.0059) | (0.0021) | (0.0002) | (0.0072) | |

Notes: Panel A presents the physical parameters estimated using index returns and the VIX. Panel B presents the risk-neutral parameters estimated using option prices and the VIX. Robust standard errors are in parentheses.

Table 3: Joint MLE Estimation 1990-2019: Exponential-Affine Specifications

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| Restrictions | None | µ=0 | λ=0 | γ=0 | ξ=0 | γ=ξ=0 | η=0 |
| Risk Preference Parameters | | | | | | | |
| γ | 1.3929 | -1.0921 | 1.3940 | 0 | 2.4850 | 0 | 2.2311 |
| | (0.2266) | | | | (0.2182) | | |
| ξ | 1.9474 | 1.9472 | -1.4776 | 3.4244 | 0 | 0 | -0.6242 |
| | (0.4037) | (0.2034) | (0.8098) | (0.5430) | | | (0.0511) |
| Implied Risk Premia Parameters | | | | | | | |
| µ | 2.4852 | 0 | 0.5652 | 1.9206 | 2.4850 | 0 | 1.8810 |
| λ | -1.8115 | -0.4177 | 0 | -1.8116 | -1.3937 | 0 | -0.9211 |
| P-Parameters | | | | | | | |
| κ | 2.9252 | 1.5314 | 1.1141 | 2.9252 | 2.5076 | 1.1139 | 2.0352 |
| | (0.2082) | (0.0727) | (0.0558) | (0.2933) | (0.1255) | (0.0558) | (0.0855) |
| θ | 0.0334 | 0.0638 | 0.0877 | 0.0334 | 0.0390 | 0.0877 | 0.0480 |
| | (0.0024) | (0.0023) | (0.0030) | (0.0033) | (0.0019) | (0.0030) | (0.0012) |
| σ | 0.7274 | 0.7274 | 0.7274 | 0.7273 | 0.7274 | 0.7274 | 0.7274 |
| | (0.0082) | (0.0083) | (0.0083) | (0.0083) | (0.0083) | (0.0083) | (0.0083) |
| ρ | -0.7711 | -0.7711 | -0.7711 | -0.7711 | -0.7711 | -0.7711 | -0.7711 |
| | (0.0039) | (0.0039) | (0.0039) | (0.0039) | (0.0039) | (0.0039) | (0.0039) |
| η_0 | -0.0047 | -0.0047 | -0.0047 | -0.0047 | -0.0047 | -0.0047 | -0.0047 |
| | (0.0002) | (0.0002) | (0.0002) | (0.0002) | (0.0002) | (0.0002) | (0.0002) |
| η_1 | 0.8881 | 0.8881 | 0.8881 | 0.8880 | 0.8881 | 0.8881 | 0.8881 |
| | (0.0059) | (0.0060) | (0.0060) | (0.0059) | (0.0060) | (0.0060) | (0.0060) |
| Q-Parameters | | | | | | | |
| κ∗ | 1.1137 | 1.1137 | 1.1141 | 1.1135 | 1.1139 | 1.1139 | 1.1141 |
| θ∗ | 0.0877 | 0.0877 | 0.0877 | 0.0877 | 0.0877 | 0.0877 | 0.0877 |
| Model Fit | | | | | | | |
| log L^R | 54,443.42 | 54,440.38 | 54,440.30 | 54,443.11 | 54,443.02 | 54,439.98 | 54,442.60 |
| log L^O | 156,578.03 | 156,578.03 | 156,578.10 | 156,577.96 | 156,578.04 | 156,578.03 | 156,577.97 |
| log L^R + log L^O | 211,021.45 | 211,018.41 | 211,018.40 | 211,021.07 | 211,021.05 | 211,018.01 | 211,020.58 |
| Vega-Weighted Option Price RMSE | 0.0197 | 0.0197 | 0.0197 | 0.0197 | 0.0197 | 0.0197 | 0.0197 |

Notes: This table reports the results of joint maximum likelihood estimation. The risk preference parameters γ and ξ as well as the P-parameters are estimated; the other parameters are implied by the respective restrictions in columns (1) to (7). Robust standard errors are in parentheses. We estimate the model for seven different specifications. We report the log likelihood and the vega-weighted option price RMSE for each specification.

Table 4: Joint MLE Estimation 1990-2019: Other Specifications

                                 Affine POR   PK in Eq. (16)
Restrictions                     None         None         ξ=0
                                 (8)          (9)          (10)
Risk Preference Parameters
γ                                             1.4059       2.5693
                                             (0.2282)     (0.1980)
ξ                                             2.9002       0
                                             (0.2599)
α                                            -0.0324      -0.0050
                                             (0.0048)     (0.0004)
Implied Risk Premia Parameters
µ0                               0.0485      -0.0182      -0.0028
                                (0.0202)
µ1                               1.0393       3.0324       2.5693
                                (1.2862)
λ0                               0.0172       0.0172       0.0027
                                (0.0054)
λ1                              -2.3297      -2.3227      -1.4410
                                (0.5833)
P-Parameters
κ                                3.4423       3.4361       2.5549
                                (0.5790)     (0.1714)     (0.1165)
θ                                0.0334       0.0334       0.0393
                                (0.0072)     (0.0015)     (0.0018)
σ                                0.7272       0.7273       0.7274
                                (0.0083)     (0.0083)     (0.0083)
ρ                               -0.7711      -0.7711      -0.7711
                                (0.0039)     (0.0039)     (0.0039)
η0                              -0.0047      -0.0047      -0.0047
                                (0.0002)     (0.0002)     (0.0002)
η1                               0.8880       0.8880       0.8881
                                (0.0059)     (0.0060)     (0.0059)
Q-Parameters
κ∗                               1.1126       1.1134       1.1139
θ∗                               0.0878       0.0877       0.0877
Model Fit
logLR                            54,449.61    54,443.71    54,443.03
logLO                            156,577.08   156,578.00   156,578.03
logLR + logLO                    211,026.69   211,021.71   211,021.06
Vega-Weighted Option Price RMSE  0.0197       0.0197       0.0197

Notes: This table reports the results of joint maximum likelihood estimation. The risk preference parameters γ, ξ, and α as well as the P-parameters are estimated; the other parameters are implied by the respective restrictions. Robust standard errors are in parentheses. We report on the affine price of risk (POR) specification in column (8), the more general pricing kernel (PK) from equation (16) in column (9), and the more general PK subject to ξ = 0 in column (10).

Table 5: Model Comparison

                                 Exponential-Affine                                             Affine POR  PK in Eq. (16)
Restrictions                     None     µ=0      λ=0      γ=0      ξ=0      γ=ξ=0    η=0      None        None     ξ=0
                                 (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)         (9)      (10)
Panel A. Likelihood Ratio Test: P-Values
Comparison with (1)                       0.0137   0.0135   0.3833   0.3739   0.0321   0.1862
Comparison with (8)              0.0053   0.0009   0.0009   0.0105   0.0103   0.0016   0.0066               0.0016   0.0036
Comparison with (9)              0.4708   0.0369   0.0365   0.5192   0.5273   0.0602   0.3218                        0.2542
Panel B. Economic Implications
Instantaneous Equity Premium     0.0832   0        0.0189   0.0643   0.0832   0        0.0629   0.0833      0.0833   0.0832
Instantaneous Variance Premium  -0.0606  -0.0140   0       -0.0606  -0.0466   0       -0.0308  -0.0607     -0.0606  -0.0456
Annualized Sharpe Ratio          0.4111   0        0.0937   0.3175   0.4111   0        0.3111   0.5223      0.3746   0.4048

Notes: Panel A reports the P-values for the Likelihood Ratio tests against the specification in columns (1), (8), and (9). Minus two times the difference in the log-likelihood is distributed chi-square with degrees of freedom equal to the number of restrictions. Panel B presents the sample means for the implied instantaneous equity and variance risk premia and the annualized Sharpe ratio.
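The likelihood ratio p-values in Panel A can be approximately reproduced from the joint log likelihoods reported in Tables 3 and 4. The following is a minimal sketch, not the authors' code: the function names `chi2_sf` and `lr_pvalue` are hypothetical, and the chi-square tail is evaluated with the standard recurrence for integer degrees of freedom so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lr_pvalue(loglik_unrestricted, loglik_restricted, n_restrictions):
    """P-value of the LR statistic 2 * (logL_unrestricted - logL_restricted)."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return chi2_sf(stat, n_restrictions)

# Column (2), mu = 0, against column (1): one restriction.
p_col2_vs_1 = lr_pvalue(211021.45, 211018.41, 1)
# Column (6), gamma = xi = 0, against column (1): two restrictions.
p_col6_vs_1 = lr_pvalue(211021.45, 211018.01, 2)
```

Because the published log likelihoods are rounded to two decimals, the recomputed values should match the table only to about the reported precision.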

Figure 1: Instantaneous Variance, 1990 - 2019

Notes: We plot the time series of the stochastic variance from the option-based estimation and the difference between the return-based and the option-based variance. The return-based variance is computed as v(t) = η0 + η1 VIX²(t), the option-based variance is computed as v∗(t) = η∗0 + η∗1 VIX²(t), and the difference is computed as v(t) − v∗(t), where η0, η1, η∗0, and η∗1 are from Table 2.
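As a small worked example of the variance mapping in the notes to Figure 1, the sketch below evaluates v(t) and v∗(t) for a single VIX reading using the Table 2 estimates. It assumes, as an illustration only, that VIX²(t) is expressed as an annualized variance in decimal units, so a VIX quote of 20 corresponds to 0.20² = 0.04; the function name `variance_from_vix` is hypothetical.

```python
def variance_from_vix(vix_percent, eta0, eta1):
    """Spot variance v = eta0 + eta1 * VIX^2, with VIX quoted in percent (assumed)."""
    vix_sq = (vix_percent / 100.0) ** 2  # annualized variance in decimal units
    return eta0 + eta1 * vix_sq

# Table 2, Panel A (return-based) and Panel B (option-based) estimates.
v_p = variance_from_vix(20.0, eta0=-0.0068, eta1=0.8815)
v_q = variance_from_vix(20.0, eta0=-0.0042, eta1=0.8740)
difference = v_p - v_q  # the quantity plotted as v(t) - v*(t)
```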

Figure 2: The Log Pricing Kernel as a Function of Log Return and Variance

Notes: We plot the multivariate and univariate relation between the log pricing kernel, the log stock return (log R(t)), and the daily change in variance (v(t) − v(t−1)). The results are based on the exponential-affine pricing kernel, using the parameters in column (1) of Table 3.

Figure 3: Exponentially-Affine Log Pricing Kernels

Left column: Daily (Realized) Log Pricing Kernels 1990-2019. Right column: Implied Six-Month Log Marginal Pricing Kernels.
Panel A: Independent P & Q Estimates
Panel B: Unrestricted Exponential-Affine
Panel C: Exponential-Affine with γ = 0
Panel D: Exponential-Affine with ξ = 0 (no independent variance premium)
Panel E: Exponential-Affine with η = 0

Notes: We plot the time series of the daily log pricing kernels and the implied 6-month log marginal pricing kernels for the following five specifications: independent estimation of the P- and Q-parameters (Panel A), the unrestricted exponential-affine (Panel B), and the exponential-affine specifications with the restrictions γ = 0 (Panel C), ξ = 0 (Panel D), and η = 0 (Panel E). Parameter values for Panel A are from Table 2, and those for Panels B, C, D, and E are from Table 3. For the implied 6-month kernels, the x-axis represents the log return in deviation from its mean and divided by its standard deviation.

Figure 4: Other Log Pricing Kernels

Left column: Daily (Realized) Log Pricing Kernels 1990-2019. Right column: Implied Six-Month Log Marginal Pricing Kernels.
Panel A: Affine Price of Risk
Panel B: Unrestricted PK from Eq. (16)
Panel C: PK from Eq. (16) with ξ = 0
Panel D: PK from Eq. (16) with the Feller No-Arbitrage Conditions Imposed

Notes: We plot the time series of the daily log pricing kernels and the implied 6-month log marginal pricing kernels for the following four specifications: the affine price of risk specification (Panel A), the unrestricted pricing kernel from equation (16) (Panel B), the kernel from equation (16) with restriction ξ = 0 imposed (Panel C), and the pricing kernel from equation (16) with the Feller condition imposed (Panel D). Parameter values for Panels A, B, and C are from Table 4, and those for Panel D are from Table A.1. For the implied 6-month kernels, the x-axis represents the log return in deviation from its mean and divided by its standard deviation.

Figure 5: Cumulative Log Pricing Kernel Components 1990 - 2019

Notes: We plot the cumulative (over time) logarithm of the exponential-affine pricing kernel based on the estimates in column (1) of Table 3. This figure contains the entire pricing kernel and the components of the pricing kernel associated with the index price and its instantaneous volatility.

Online Appendix

A The Pricing Kernel and Risk Premia

Given the pricing kernel in equation (11) or (13), we use the fact that U(t)M(t) is a martingale for any asset U to calculate risk premia. Applying Ito’s lemma to the martingale condition E[d(U(t)M(t))] = 0 shows that expected returns are determined by their covariance with the pricing kernel:

$$\frac{E[d(U(t)M(t))]}{U(t)M(t)} = E\left[\frac{dU(t)}{U(t)}\right] + E\left[\frac{dM(t)}{M(t)}\right] + \mathrm{Cov}\left[\frac{dU(t)}{U(t)}, \frac{dM(t)}{M(t)}\right] = 0. \tag{A.1}$$

Inserting a bond price U(t) = exp(−r(T − t)) into equation (A.1) provides an expression for the interest rate

$$E\left[\frac{dM(t)}{M(t)}\right] = -r\,dt.$$

Ito’s lemma then identifies the equity and variance risk premia in terms of their covariances with the pricing kernel:

$$E\left[\frac{dS(t)}{S(t)}\right] - r\,dt = -\mathrm{Cov}\left[\frac{dS(t)}{S(t)}, \frac{dM(t)}{M(t)}\right], \tag{A.2}$$

$$E[dv(t)] - E^{*}[dv(t)] = -\mathrm{Cov}\left[dv(t), \frac{dM(t)}{M(t)}\right]. \tag{A.3}$$

Using equations (A.2) and (A.3), it can be shown that the pricing kernel with the completely affine prices of risk in equation (11) is consistent with a completely affine equity risk premium of µv(t) and variance risk premium of λv(t), where

$$\mu = \pi_1 + \rho\pi_2 \quad\text{and}\quad \lambda = \rho\sigma\pi_1 + \sigma\pi_2. \tag{A.4}$$

Likewise, the pricing kernel with the affine prices of risk in equation (13) is consistent with affine risk premiums $\mu_0 + \mu_1 v(t)$ and $\lambda_0 + \lambda_1 v(t)$, where

$$\mu_0 = \pi_{1,0} + \rho\pi_{2,0}, \quad \mu_1 = \pi_{1,1} + \rho\pi_{2,1}, \quad \lambda_0 = \rho\sigma\pi_{1,0} + \sigma\pi_{2,0}, \quad \lambda_1 = \rho\sigma\pi_{1,1} + \sigma\pi_{2,1}. \tag{A.5}$$

B No-Arbitrage Restrictions for the More General Pricing Kernel

We use the risk-neutral dynamics in equation (1) and the physical dynamics in equation (B.1):

$$\begin{aligned}
d\log S(t) &= \left[r - \tfrac{1}{2}v(t) + \mu_0 + \mu_1 v(t)\right]dt + \sqrt{v(t)}\,dz_1(t), \\
dv(t) &= \kappa(\theta - v(t))\,dt + \sigma\sqrt{v(t)}\,dz_2(t).
\end{aligned} \tag{B.1}$$

Applying Ito’s lemma to equation (16) gives

$$\begin{aligned}
d\log M &= -\gamma\,d\log S + \alpha\,d\log v + \beta\,dt + \eta(v)\,dt + \xi\,dv \\
&= \left[-\gamma\left(r + \mu_0 + \mu_1 v - \tfrac{1}{2}v\right) + \beta + \eta(v) + \xi\kappa(\theta - v) + \frac{\alpha}{v}\left(\kappa(\theta - v) - \tfrac{1}{2}\sigma^2\right)\right]dt \\
&\quad - \gamma\sqrt{v}\,dz_1 + \left(\xi\sigma\sqrt{v} + \frac{\alpha\sigma}{\sqrt{v}}\right)dz_2,
\end{aligned}$$

where we use the fact that $d\log v = \frac{1}{v}\left(\kappa(\theta - v) - \tfrac{1}{2}\sigma^2\right)dt + \frac{\sigma}{\sqrt{v}}\,dz_2$. Again by Ito’s lemma, we have

$$\begin{aligned}
\frac{dM}{M} &= \Big[-\gamma\left(r + \mu_0 + \mu_1 v - \tfrac{1}{2}v\right) + \beta + \eta(v) + \xi\kappa(\theta - v) + \frac{\alpha}{v}\left(\kappa(\theta - v) - \tfrac{1}{2}\sigma^2\right) \\
&\qquad + \tfrac{1}{2}\left(\gamma^2 v + \xi^2\sigma^2 v + \frac{\alpha^2\sigma^2}{v}\right) - \gamma\xi\sigma\rho v - \gamma\alpha\sigma\rho + \xi\alpha\sigma^2\Big]dt \\
&\quad - \gamma\sqrt{v}\,dz_1 + \left(\xi\sigma\sqrt{v} + \frac{\alpha\sigma}{\sqrt{v}}\right)dz_2.
\end{aligned} \tag{B.2}$$

The drift of M should be equal to −rM dt. By rearranging the drift term in equation (B.2),

$$\begin{aligned}
&\left[r - \gamma(r + \mu_0) + \xi\kappa\theta - \alpha\kappa - \gamma\alpha\sigma\rho + \xi\alpha\sigma^2\right] + \beta \\
&\quad + \left[-\gamma\left(\mu_1 - \tfrac{1}{2}\right) - \xi\kappa + \tfrac{1}{2}\left(\gamma^2 + \xi^2\sigma^2 - 2\gamma\xi\sigma\rho\right)\right]v + \left[\alpha\kappa\theta - \tfrac{1}{2}\alpha\sigma^2 + \tfrac{1}{2}\alpha^2\sigma^2\right]\frac{1}{v} + \eta(v) = 0.
\end{aligned}$$

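Thus, the time-preference and the stochastic path-dependence terms are expressed as

$$\begin{aligned}
\beta &= -(1 - \gamma)r + \gamma\mu_0 - \xi\kappa\theta + \alpha\kappa + \gamma\alpha\sigma\rho - \xi\alpha\sigma^2, \\
\eta(v) &= \left[\left(\mu_1 - \tfrac{1}{2}\right)\gamma + \xi\kappa - \tfrac{1}{2}\left(\gamma^2 - 2\gamma\xi\sigma\rho + \xi^2\sigma^2\right)\right]v - \left[\kappa\theta - \tfrac{1}{2}\sigma^2 + \tfrac{1}{2}\alpha\sigma^2\right]\frac{\alpha}{v}.
\end{aligned}$$

Moreover, by equations (A.5) and (B.2), we obtain the risk premia expressed as $\mu_0 + \mu_1 v(t)$ and $\lambda_0 + \lambda_1 v(t)$, where

$$\mu_0 = -\alpha\sigma\rho, \quad \mu_1 = \gamma - \rho\sigma\xi, \quad \lambda_0 = -\alpha\sigma^2, \quad \lambda_1 = \gamma\rho\sigma - \xi\sigma^2.$$

The physical dynamics in equation (B.1) are restricted according to

$$\kappa = \kappa^{*} - \lambda_1 \quad\text{and}\quad \theta = (\kappa^{*}\theta^{*} + \lambda_0)/\kappa.$$

C No-Arbitrage Restrictions for the Exponential-Affine Pricing Kernel

Applying Ito’s lemma to equation (20) gives

$$\begin{aligned}
d\log M &= -\gamma\,d\log S + \beta\,dt + \eta v\,dt + \xi\,dv \\
&= \left[-\gamma\left(r + \mu v - \tfrac{1}{2}v\right) + \beta + \eta v + \xi\kappa(\theta - v)\right]dt - \gamma\sqrt{v}\,dz_1 + \xi\sigma\sqrt{v}\,dz_2,
\end{aligned}$$

where we drop the time-t dependence (t) of M, S, v, $z_1$, and $z_2$ for notational convenience. Again by Ito’s lemma, we get

$$\begin{aligned}
\frac{dM}{M} &= \left[-\gamma\left(r + \mu v - \tfrac{1}{2}v\right) + \beta + \eta v + \xi\kappa(\theta - v) + \tfrac{1}{2}\gamma^2 v - \gamma\xi\sigma\rho v + \tfrac{1}{2}\xi^2\sigma^2 v\right]dt \\
&\quad - \gamma\sqrt{v}\,dz_1 + \xi\sigma\sqrt{v}\,dz_2.
\end{aligned} \tag{C.1}$$

In order to find the restrictions on β and η, we use the fact that $e^{rt}M(t)$ is a martingale, so the drift term of $d\left(e^{rt}M(t)\right)$ must be zero. That is, from equation (C.1),

$$r - \gamma\left(r + \mu v - \tfrac{1}{2}v\right) + \beta + \eta v + \xi\kappa(\theta - v) + \tfrac{1}{2}\gamma^2 v - \gamma\xi\sigma\rho v + \tfrac{1}{2}\xi^2\sigma^2 v = 0.$$

By rearranging this equation,

$$\underbrace{\left[\beta + (1 - \gamma)r + \xi\kappa\theta\right]}_{(A)} + \underbrace{\left[\eta - \gamma\mu + \tfrac{1}{2}\gamma - \xi\kappa + \tfrac{1}{2}\left(\gamma^2 - 2\gamma\xi\sigma\rho + \xi^2\sigma^2\right)\right]}_{(B)}\,v = 0. \tag{C.2}$$

Since equation (C.2) must hold for any value of v, both (A) and (B) must be zero. Therefore, we have the following restrictions on β and η:

$$\begin{aligned}
\beta &= -(1 - \gamma)r - \xi\kappa\theta, \\
\eta &= \gamma\mu - \tfrac{1}{2}\gamma + \xi\kappa - \tfrac{1}{2}\left(\gamma^2 - 2\gamma\xi\sigma\rho + \xi^2\sigma^2\right).
\end{aligned}$$

Furthermore, by equations (A.4) and (C.1), we find the expression for the equity and variance risk premium parameters µ and λ as functions of the preference parameters:

$$\mu = \gamma - \rho\sigma\xi \quad\text{and}\quad \lambda = \gamma\rho\sigma - \xi\sigma^2.$$

We can now deduce the relations between the physical and risk-neutral parameters of the variance dynamics. With the variance risk premium λv, the risk-neutral dynamics of variance should be

$$dv(t) = \left[\kappa(\theta - v(t)) - \lambda v(t)\right]dt + \sigma\sqrt{v(t)}\,dz_2^{*}(t). \tag{C.3}$$

Comparing equation (1) with (C.3) implies the following restrictions:

$$\kappa = \kappa^{*} - \lambda \quad\text{and}\quad \theta = \kappa^{*}\theta^{*}/\kappa.$$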
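The restrictions of this appendix can be checked numerically against Table 3. The sketch below is illustrative rather than the authors' implementation: the function names are hypothetical, and the interest rate defaults to zero because it only enters β. The second function solves (B) = 0 with η = 0 for γ after substituting µ = γ − ρσξ and κ = κ∗ − λ, which gives the quadratic γ² − (1 + 2ρσξ)γ + (σξ)² + 2κ∗ξ = 0 used for column (7).

```python
import math

def exp_affine_implied(gamma, xi, sigma, rho, kappa_q, theta_q, r=0.0):
    """Parameters implied by (gamma, xi) under the exponential-affine kernel."""
    mu = gamma - rho * sigma * xi                  # equity premium slope
    lam = gamma * rho * sigma - xi * sigma ** 2    # variance premium slope
    kappa = kappa_q - lam                          # physical mean reversion
    theta = kappa_q * theta_q / kappa              # physical long-run variance
    beta = -(1.0 - gamma) * r - xi * kappa * theta
    eta = (gamma * mu - 0.5 * gamma + xi * kappa
           - 0.5 * (gamma ** 2 - 2.0 * gamma * xi * sigma * rho + xi ** 2 * sigma ** 2))
    return {"mu": mu, "lam": lam, "kappa": kappa, "theta": theta,
            "beta": beta, "eta": eta}

def gamma_for_zero_eta(xi, sigma, rho, kappa_q):
    """Both roots of the quadratic in gamma implied by eta = 0."""
    b = 1.0 + 2.0 * rho * sigma * xi
    disc = b * b - 4.0 * ((sigma * xi) ** 2 + 2.0 * kappa_q * xi)
    if disc < 0:
        raise ValueError("no real gamma satisfies eta = 0 for these inputs")
    root = math.sqrt(disc)
    return (b - root) / 2.0, (b + root) / 2.0

# Column (1) of Table 3: unrestricted exponential-affine kernel.
col1 = exp_affine_implied(1.3929, 1.9474, 0.7274, -0.7711, 1.1137, 0.0877)
# Column (7) of Table 3: eta = 0 with xi = -0.6242; the larger root is reported.
gamma_lo, gamma_hi = gamma_for_zero_eta(-0.6242, 0.7274, -0.7711, 1.1141)
```

With the rounded inputs from Table 3, `col1` should agree with the reported µ, λ, κ, and θ to roughly three decimals, and `gamma_hi` should be close to the reported γ in column (7).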

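For the more general pricing kernel of Appendix B, the same mapping gains the α terms µ0 = −ασρ and λ0 = −ασ². The sketch below recomputes the implied quantities for column (9) of Table 4 from the rounded estimates; the function names are hypothetical, and the Feller check uses the standard condition 2κθ ≥ σ², which is an assumption about the exact form imposed in Table A.1.

```python
def general_kernel_implied(gamma, xi, alpha, sigma, rho, kappa_q, theta_q):
    """Risk premia and P-parameters implied by (gamma, xi, alpha) in Appendix B."""
    mu0 = -alpha * sigma * rho
    mu1 = gamma - rho * sigma * xi
    lam0 = -alpha * sigma ** 2
    lam1 = gamma * rho * sigma - xi * sigma ** 2
    kappa = kappa_q - lam1
    theta = (kappa_q * theta_q + lam0) / kappa
    return {"mu0": mu0, "mu1": mu1, "lam0": lam0, "lam1": lam1,
            "kappa": kappa, "theta": theta}

def feller_satisfied(kappa, theta, sigma):
    """Standard Feller condition 2*kappa*theta >= sigma^2 (assumed form)."""
    return 2.0 * kappa * theta >= sigma ** 2

# Column (9) of Table 4: unrestricted pricing kernel from equation (16).
col9 = general_kernel_implied(1.4059, 2.9002, -0.0324,
                              0.7273, -0.7711, 1.1134, 0.0877)
col9_feller = feller_satisfied(col9["kappa"], col9["theta"], 0.7273)
```

Under these inputs the physical variance process does not satisfy the standard Feller condition, which is consistent with the paper reporting a separate Feller-constrained estimation in Table A.1.
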
D Parameter Restrictions in the Joint Option and Returns Likelihood

The following table summarizes the restrictions used when estimating the joint likelihoods in Tables 3 and 4. As mentioned above, for the case in column (8) there are no restrictions. We implement these cases by letting either γ or ξ be a free parameter, while the other one is implied by the restriction(s). The resulting mapping between γ and ξ is reported in the last column.

Pricing Kernel   Restriction   Free parameter   Fixed parameter
(2)              µ = 0         ξ                γ = ξσρ
(3)              λ = 0         ξ                γ = σξ/ρ
(4)              γ = 0         ξ                γ = 0
(5)              ξ = 0         γ                ξ = 0
(6)              γ = ξ = 0     None             γ = ξ = 0
(7)              η = 0         ξ                γ = [(1 + 2ρσξ) ± √((1 + 2ρσξ)² − 4((σξ)² + 2κ∗ξ))] / 2
(10)             ξ = 0         γ, α             ξ = 0

E A Closed-Form Expression for the Joint Probability Distribution

It is not straightforward to compute the marginal pricing kernel when the physical and risk-neutral parameters are independently estimated or when the affine price of risk specification is employed in the joint estimation. In these two cases the pricing kernel cannot be computed directly. Instead we calculate the daily log pricing kernel every day as follows:

$$\log M(t+\Delta) - \log M(t) = -r\Delta + \log Q(\log R(t+\Delta), v(t+\Delta) \mid v(t)) - \log P(\log R(t+\Delta), v(t+\Delta) \mid v(t)), \tag{E.1}$$

where ∆ = 1/252. Note that Q(log R(t+∆), v(t+∆) | v(t)) and P(log R(t+∆), v(t+∆) | v(t)) are the risk-neutral and physical joint probability distributions of log R(t+∆) and v(t+∆) conditional on v(t). In this implementation, we have to use the joint probability rather than the marginal probability of the log return described in equation (29), because we need to use the realization of both log R(t+∆) and v(t+∆).²⁵ We can also find a quasi closed-form expression (up to numerically computing double integrals for the joint probability).

To find the expression for the joint probability distribution of the log stock return and variance, we first find the joint characteristic function and then apply the inverse Fourier transform. Let $g^{CH}_{\tau}(\phi_x, \phi_v \mid x(t), v(t))$ denote the joint characteristic function of the log stock price (here denoted by x) and the variance (v). When x and v evolve according to

$$\begin{aligned}
dx(t) &= [r + u\,v(t)]\,dt + \sqrt{v(t)}\,dz_1(t), \\
dv(t) &= (a - b\,v)\,dt + \sigma\sqrt{v(t)}\,dz_2(t),
\end{aligned}$$

with corr($z_1$, $z_2$) = ρ, the characteristic function $g^{CH}_{\tau}(\phi_x, \phi_v \mid x, v)$ must satisfy the following partial differential equation:

$$\frac{1}{2}v\frac{\partial^2 g}{\partial x^2} + \rho\sigma v\frac{\partial^2 g}{\partial x\,\partial v} + \frac{1}{2}\sigma^2 v\frac{\partial^2 g}{\partial v^2} + (r + uv)\frac{\partial g}{\partial x} + (a - bv)\frac{\partial g}{\partial v} + \frac{\partial g}{\partial t} = 0. \tag{E.2}$$

See Heston (1993) for more details. Since g is the joint characteristic function, the terminal condition of the PDE is

$$g^{CH}_{0}(\phi_x, \phi_v \mid x, v) = e^{i\phi_x x + i\phi_v v}. \tag{E.3}$$

Suppose g has the following functional form:

$$g^{CH}_{\tau}(\phi_x, \phi_v \mid x, v) = e^{G(\tau) + H(\tau)v + i\phi_x x}. \tag{E.4}$$

By substituting equation (E.4) into equation (E.2), we get the following ordinary differential equations (ODEs) for G(τ) and H(τ):

$$\begin{aligned}
G'(\tau) &= r\phi_x i + aH(\tau), \\
H'(\tau) &= -\tfrac{1}{2}\phi_x^2 + \rho\sigma\phi_x i\,H(\tau) + \tfrac{1}{2}\sigma^2 H(\tau)^2 + u\phi_x i - bH(\tau),
\end{aligned} \tag{E.5}$$

and the terminal conditions of the ODEs are G(0) = 0 and H(0) = iϕ_v, inferred from equation (E.3). This system of ODEs expressed in equation (E.5) has the following closed-form solution:

$$H(\tau) = \frac{A_2(D_m X - iA_2 Y)\phi_v - 2iA_1\left(-D_m - 2A_3\phi_v + (Y - 1)(D_m - 2A_3\phi_v)\right)}{iA_2 D_m X - A_2^2 Y + 4A_1A_3 Y - 2A_3 D_m X\phi_v},$$

$$\begin{aligned}
G(\tau) = r\phi_x i\tau + \frac{1}{4A_3}\Bigg[ &-2aA_2\tau - \frac{2ia\tau\left(A_2^2 - 4A_1A_3\right)}{D_m} + 2ia\arctan\left(\frac{A_2A_3X^2\phi_v}{A_2^2(Y - 1) - A_1A_3Y^2 + A_3^2X^2\phi_v^2}\right) \\
&+ \frac{4aD_p}{D_m}\arctan\left(\frac{iA_2^2 + 2A_2A_3X\phi_v - 2iA_3\left(A_1Y - A_3X\phi_v^2\right)}{D_p\left(A_2 + 2iA_3\phi_v\right)}\right) \\
&- \frac{4iaD_p}{D_m}\operatorname{arctanh}\left(\frac{D_p}{A_2 + 2iA_3\phi_v}\right) + 4a\log(D_p) \\
&- a\log\Big(A_2^4(Z - 1) + A_1^2A_3^2Y^4 + A_2^2A_3^2X^2Z\phi_v^2 + A_3^4X^4\phi_v^4 - 2A_1A_3Y^2\left(A_2^2(Y - 1) + A_3^2X^2\phi_v^2\right)\Big)\Bigg],
\end{aligned}$$

where

$$\begin{aligned}
A_1 &= -\tfrac{1}{2}\phi_x^2 + u\phi_x i, \quad A_2 = \rho\sigma\phi_x i - b, \quad A_3 = \tfrac{1}{2}\sigma^2, \\
D_m &= \sqrt{-A_2^2 + 4A_1A_3}, \quad D_p = \sqrt{A_2^2 - 4A_1A_3}, \\
X &= -1 + e^{iD_m\tau}, \quad Y = 1 + e^{iD_m\tau}, \quad Z = 1 + e^{2iD_m\tau}.
\end{aligned}$$

To find the joint probability distribution function of x(t+τ) and v(t+τ), we apply the inverse Fourier transform to the characteristic function. That is,

$$\Pr(x, v \mid x(t), v(t)) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\phi_x x - i\phi_v v}\, g^{CH}_{\tau}(\phi_x, \phi_v \mid x(t), v(t))\,d\phi_x\,d\phi_v. \tag{E.6}$$

To simplify the notation, we normalize the stock price. By letting the log stock price at time t be x(t) = 0, x(t+τ) represents the τ-horizon log return. Then, equation (E.6) can be written as

$$\Pr(x, v \mid v(t)) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\phi_x x - i\phi_v v + G(\tau \mid \phi_x, \phi_v) + H(\tau \mid \phi_x, \phi_v)v(t)}\,d\phi_x\,d\phi_v. \tag{E.7}$$

Numerical integration of equation (E.7) is not tractable because the exponential function to be integrated decays very slowly, especially over the ϕ_v dimension. To expedite its calculation, we use the following weight function for the j-th grid point of ϕ_v, w_j:

$$w_j = \begin{cases}
\dfrac{1}{2}\operatorname{erfc}\left(-\dfrac{j}{\sqrt{N_v/2}} - \sqrt{N_v/2}\right) & \text{if } j < 0, \\[2ex]
\dfrac{1}{2}\operatorname{erfc}\left(\dfrac{j}{\sqrt{N_v/2}} - \sqrt{N_v/2}\right) & \text{if } j \geq 0,
\end{cases}$$

where 2N_v is the number of grid points of ϕ_v. This weight function forces the exponential function to decay much faster (see, for example, Ooura, 2001).²⁶ Hence, we numerically compute equation (E.7) as

$$\Pr(x, v \mid v(t)) = \frac{1}{4\pi^2}\sum_{j=-N_v}^{N_v - 1}\sum_{k=-N_x}^{N_x - 1} e^{-ix\phi_{x,k} - iv\phi_{v,j} + G(\tau \mid \phi_{x,k}, \phi_{v,j}) + H(\tau \mid \phi_{x,k}, \phi_{v,j})v(t)}\, w_j\, \zeta(\phi_{x,k}, \phi_{v,j}),$$

where ζ(ϕ_x, ϕ_v) is an integration rule such as the trapezoidal or Gaussian quadrature method.

²⁵ The marginal probability of the log return is the expectation of the joint probability with respect to the variance v(t+∆).
²⁶ Note that this approximation may cause an error if the true probability is near zero. This can happen when the size of x(t+τ) or v(t+τ) − v(t) is very large. We find that the approximation works well for our application.
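One way to sanity-check the closed-form H(τ) is to integrate the ODE system (E.5) numerically and compare. The sketch below does this with a classical fourth-order Runge-Kutta scheme in pure Python. It is illustrative only: the function names are hypothetical, the parameter values are the rounded column (1) estimates from Table 3 with u = µ − 1/2 and an assumed interest rate of 0.02, and `g_compact` is a shorter equivalent expression for G obtained here by integrating H directly (it is not the paper's formula, and it uses the principal branch of the complex logarithm, so it is only checked at a short horizon).

```python
import cmath

def coeffs(phi_x, u, b, sigma, rho):
    """A1, A2, A3 in the Riccati equation H' = A1 + A2*H + A3*H^2 from (E.5)."""
    a1 = -0.5 * phi_x ** 2 + 1j * u * phi_x
    a2 = 1j * rho * sigma * phi_x - b
    a3 = 0.5 * sigma ** 2
    return a1, a2, a3

def h_closed(tau, phi_x, phi_v, u, b, sigma, rho):
    """Closed-form H(tau) as stated in the appendix."""
    a1, a2, a3 = coeffs(phi_x, u, b, sigma, rho)
    dm = cmath.sqrt(-a2 ** 2 + 4 * a1 * a3)
    e = cmath.exp(1j * dm * tau)
    x, y = e - 1, e + 1
    num = (a2 * (dm * x - 1j * a2 * y) * phi_v
           - 2j * a1 * (-dm - 2 * a3 * phi_v + (y - 1) * (dm - 2 * a3 * phi_v)))
    den = 1j * a2 * dm * x - a2 ** 2 * y + 4 * a1 * a3 * y - 2 * a3 * dm * x * phi_v
    return num / den

def g_compact(tau, phi_x, phi_v, r, u, a, b, sigma, rho):
    """Compact form of G from integrating H (derived for this sketch)."""
    a1, a2, a3 = coeffs(phi_x, u, b, sigma, rho)
    dm = cmath.sqrt(-a2 ** 2 + 4 * a1 * a3)
    e = cmath.exp(1j * dm * tau)
    x, y = e - 1, e + 1
    t = a2 + 2j * a3 * phi_v
    log_term = cmath.log((dm * y + 1j * t * x) / (2 * dm))
    return 1j * r * phi_x * tau + a * (-2 * a2 * tau + 2j * dm * tau - 4 * log_term) / (4 * a3)

def gh_rk4(tau, phi_x, phi_v, r, u, a, b, sigma, rho, steps=2000):
    """Integrate (E.5) from G(0) = 0, H(0) = i*phi_v with classical RK4."""
    a1, a2, a3 = coeffs(phi_x, u, b, sigma, rho)
    d_h = lambda h: a1 + a2 * h + a3 * h * h
    d_g = lambda h: 1j * r * phi_x + a * h
    g, h = 0j, 1j * phi_v
    dt = tau / steps
    for _ in range(steps):
        k1 = d_h(h)
        k2 = d_h(h + 0.5 * dt * k1)
        k3 = d_h(h + 0.5 * dt * k2)
        k4 = d_h(h + dt * k3)
        # G' depends only on H, so it is evaluated at the same RK4 stages.
        g += dt * (d_g(h) + 2 * d_g(h + 0.5 * dt * k1)
                   + 2 * d_g(h + 0.5 * dt * k2) + d_g(h + dt * k3)) / 6
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g, h

# Illustrative physical-measure inputs (rounded Table 3, column (1) values).
PARAMS = dict(u=2.4852 - 0.5, b=2.9252, sigma=0.7274, rho=-0.7711)
A_DRIFT = 2.9252 * 0.0334   # a = kappa * theta
RATE = 0.02                 # assumed interest rate, for illustration only

g_num, h_num = gh_rk4(1 / 252, 3.0, 5.0, RATE, a=A_DRIFT, **PARAMS)
h_err = abs(h_closed(1 / 252, 3.0, 5.0, **PARAMS) - h_num)
g_err = abs(g_compact(1 / 252, 3.0, 5.0, RATE, a=A_DRIFT, **PARAMS) - g_num)
```

Integrating the ODEs directly is slower than the closed form but avoids the branch choices of the complex logarithm and inverse tangent, which makes it a convenient cross-check before running the double Fourier inversion in (E.7).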

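The weight function for the ϕ_v grid can be written compactly because the two branches differ only in the sign of j. A minimal sketch follows, with the hypothetical name `erfc_weight`; it reads the weight as (1/2) erfc(|j|/√(N_v/2) − √(N_v/2)), so the weight is close to one at the center of the grid, one half at |j| = N_v/2, and close to zero at the edges.

```python
import math

def erfc_weight(j, n_v):
    """Weight w_j for grid index j in [-n_v, n_v - 1] of the phi_v dimension."""
    scale = math.sqrt(n_v / 2.0)
    return 0.5 * math.erfc(abs(j) / scale - scale)

# Weights for a grid with 2 * N_v = 128 points.
N_V = 64
weights = [erfc_weight(j, N_V) for j in range(-N_V, N_V)]
```

The smooth taper damps the slowly decaying tail of the integrand over ϕ_v, at the cost of the approximation error near zero-probability regions that the paper's footnote mentions.
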
Figure A.1: Implied One-Month Log Pricing Kernels

Notes: We plot the implied one-month log pricing kernels for the following four specifications: the unrestricted exponential-affine (Panel A), the exponential-affine specification with the restriction ξ = 0 (Panel B), the kernel implied by independent estimation of the P- and Q-parameters (Panel C), and the affine price-of-risk specification (Panel D). Parameter values for Panels A and B are from Table 3, parameter values for Panel C are from Table 2, and parameter values for Panel D are from Table 4. The x-axis represents the log return in deviation from its mean and divided by its standard deviation.

Table A.1: Joint MLE Estimation 1990-2019. Feller Conditions Imposed

Panel A. Parameter Estimates
                                 Feller Condition Imposed
                                 Affine POR   PK in Eq. (16)
                                 (1)          (2)
Risk Preference Parameters
γ                                            -1.1661
                                             (0.1140)
ξ                                             8.4911
                                             (0.2829)
α                                             0.0000
                                             (0.0000)
Implied Risk Premia Parameters
µ0                               0.0294       0.0000
                                (0.0265)
µ1                               1.6372       2.4652
                                (1.4750)
λ0                               0.0000       0.0000
                                (0.0000)
λ1                              -1.6270      -1.6555
                                (0.0406)
P-Parameters
κ                                3.9022       3.9316
                                (0.0294)     (0.0303)
θ                                0.0325       0.0323
                                (0.0003)     (0.0003)
σ                                0.5036       0.5037
                                (0.0073)     (0.0074)
ρ                               -0.8491      -0.8491
                                (0.0050)     (0.0053)
η0                              -0.0064      -0.0064
                                (0.0002)     (0.0002)
η1                               0.9060       0.9060
                                (0.0059)     (0.0060)
Q-Parameters
κ∗                               2.2752       2.2761
θ∗                               0.0557       0.0557
Model Fit
logLR                            53,970.97    53,970.58
logLO                            149,289.28   149,287.78
logLR + logLO                    203,260.25   203,258.36
Vega-Weighted Option Price RMSE  0.0222       0.0222

Panel B. Economic Implications
Instantaneous Equity Premium     0.0828       0.0803
Instantaneous Variance Premium  -0.0530      -0.0539
Annualized Sharpe Ratio          0.4897       0.3976

Notes: This table reports the results of joint maximum likelihood estimation. The risk preference parameters γ, ξ, and α as well as the P-parameters are estimated; the other parameters are implied by the respective restrictions. Robust standard errors are in parentheses. Column (1) reports on the affine price of risk (POR) specification and column (2) reports on the general pricing kernel (PK) from equation (16). The Feller condition is imposed under both the P- and Q-measures.
