feds · August 4, 2025

Discussion of “Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly”

Abstract

This comment discusses Kolesár and Plagborg-Møller's (2025) finding that the standard linear local projection (LP) estimator recovers the average marginal effect (AME) even in nonlinear settings. We apply and discuss a subset of their results using a simple nonlinear time series model, emphasizing the role of the weighting function and the impact of nonlinearities on small-sample properties.

Finance and Economics Discussion Series
Federal Reserve Board, Washington, D.C.
ISSN 1936-2854 (Print) ISSN 2767-3898 (Online)

Discussion of “Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly”
Edward P. Herbst, Benjamin K. Johannsen
2025-058

Please cite this paper as: Herbst, Edward P., and Benjamin K. Johannsen (2025). “Discussion of “Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly”,” Finance and Economics Discussion Series 2025-058. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2025.058.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Edward P. Herbst and Benjamin K. Johannsen∗

March 2025

∗ We thank Cristina Scofield for excellent research assistance. The views expressed here are those of the authors and do not indicate concurrence by the Board of Governors or anyone else associated with the Federal Reserve System. Herbst: Federal Reserve Board, edward.p.herbst@frb.gov. Johannsen: Federal Reserve Board.

1 Introduction

Kolesár and Plagborg-Møller (2025) (hereafter, KP) is an exciting and important advance in the literature on the estimation of dynamic causal effects with local projections (LPs) (see Jordà (2005)). The paper establishes that the “standard” linear LP of an outcome $y_{t+h}$ onto a shock $x_t$ (and possibly a vector of controls) estimates an average marginal effect (AME) of the shock on the outcome. This result holds under suitable assumptions even, and perhaps especially, when the data generating process for $y_t$ is nonlinear. Deriving the result requires connecting and extending a large literature in microeconometrics.

This comment aims to provide an accessible discussion of some of the results reported in KP that is tailored to macroeconomists. We begin by considering some of the theoretical results in KP under assumptions that are common in the macroeconomics literature. We devote particular attention to the weighting function, $\omega$, that is used to compute the average in the AME. We then analyze the AME and its LP estimation in the context of the quadratic autoregressive (QAR(1,1)) model of Aruoba et al. (2017). This is a stationary, nonlinear time series model designed to mimic the statistical structure of a second-order approximation to the solution of a dynamic stochastic general equilibrium (DSGE) model. In the context of the QAR(1,1) model, we relate the population AME to a population nonlinear impulse response function (NIRF) defined in Koop et al. (1996). We also discuss small-sample properties of the LP estimator of the AME, with a focus on how nonlinearities in the QAR(1,1) model affect those properties.

2 What does the standard LP estimate?

In this section, we discuss the LP estimator of the AME. To establish notation, let $y_{t+h}$ be the observed outcome of interest at time $t+h$, and let $x_t$ be the observed shock of interest at time $t$. Collect all other variables that determine $y_{t+h}$ into a vector $U_{h,t+h}$, which may include past values of $y_t$, past (and future) values of $x_t$, and other controls. We require that the vector $U_{h,t+h}$ is independent of $x_t$. A representation of $y_{t+h}$ based on $x_t$ and $U_{h,t+h}$ is called the structural function and is given by

$$y_{t+h} = \psi_h(x_t, U_{h,t+h}). \tag{1}$$

The representation that is used to define the notion of dynamic causal effect in KP is the average structural function, which is given by

$$\Psi_h(x_t) = E[\psi_h(x_t, U_{h,t+h})] = E[y_{t+h} \mid x_t] = g_h(x_t). \tag{2}$$

This function describes the expected outcome $y_{t+h}$ given a specific value of the shock $x_t$, integrating out all other sources of randomness. Note that because we have assumed that $x_t$ is independent of all other factors affecting $y_{t+h}$, the average structural function is equal to, and hence can be recovered from, the conditional expectation of $y_{t+h}$ given $x_t$. This quantity can in principle be estimated from the data.

In macroeconomics, it can be difficult to estimate the average structural function due to small sample sizes. One approach is to impose strong assumptions about the data-generating process for $y_t$. For example, a researcher could assume that $y_t$ follows an AR(1) process.
Another option in nonlinear time series analysis is to estimate the AME, defined as

$$\theta_h(\omega) = \int \omega(x_t)\, \Psi_h'(x_t)\, dx_t. \tag{3}$$

Here $\Psi_h'(x_t)$ represents the derivative of the average structural function. This derivative captures the effect of an infinitesimal change in $x_t$ on $y_{t+h}$. The weighting function, $\omega(x_t)$, determines how different values of $x_t$ contribute to the AME, defining the sense in which the AME is an average.

KP study local projections that are indexed by $h$ and given by

$$y_{t+h} = \beta_h x_t + \gamma_h' w_t + e_{h,t+h}. \tag{4}$$

Here, $\beta_h$ is a parameter, $w_t$ is a vector of controls, $\gamma_h$ is a vector of parameters, and $e_{h,t+h}$ is an error term. If $x_t$ and $w_t$ have zero covariance, then under standard assumptions $\beta_h$ is given by

$$\beta_h = \frac{\mathrm{Cov}[g_h(x_t), x_t]}{\mathrm{Var}[x_t]}, \tag{5}$$

where $g_h$ is as defined in equation (2). KP show that

$$\beta_h = \int \omega(x_t)\, g_h'(x_t)\, dx_t, \tag{6}$$

where the weighting function, $\omega(x_t)$, is given by

$$\omega(x_t) = \frac{\mathrm{Cov}[\mathbf{1}\{X > x_t\}, X]}{\mathrm{Var}[X]}. \tag{7}$$

Here, $x_t$ is a realization of the random variable $X$ and $\mathbf{1}\{\cdot\}$ is the indicator function.

2.1 The weighting function

As KP explain, the function $\omega(x_t)$ has several desirable properties that make it suitable for computing an average. In particular, $\omega(x_t) \geq 0$ for all $x_t$ and $\int_{-\infty}^{\infty} \omega(x_t)\, dx_t = 1$. From the properties of the indicator and covariance functions in equation (7), $\lim_{x_t \to -\infty} \omega(x_t) = 0$ and $\omega(x_t)$ is weakly increasing for $x_t < E[X]$. Additionally, $\lim_{x_t \to \infty} \omega(x_t) = 0$ and $\omega(x_t)$ is weakly decreasing for $x_t \geq E[X]$. Taken together, these facts imply that $\omega(x_t)$ is hump-shaped and that it gives most of the weight to values of $x_t$ near the mean of $X$. As discussed in the paper, if $X$ follows a Normal distribution, then $\omega(x_t) = \phi(x_t)$, where $\phi(\cdot)$ denotes the (appropriately parameterized) probability density function of the Normal distribution.

It is worth emphasizing that $\omega(x_t)$ depends only on the properties of the random variable $X$. It does not depend on the outcome $y_{t+h}$, on the distribution of other random variables that may go into the construction of $y_{t+h}$, or on nonlinear dependence of $y_{t+h}$ on past realizations of $y_t$.

We can also deduce some additional properties of $\omega$ if we assume that $X$ has a continuous density function with full support on the real line, even if $X$ is not Normally distributed. These are common assumptions about identified shocks in the macroeconomics literature. For simplicity, set $E[X] = 0$ and $\mathrm{Var}[X] = 1$ so that

$$\omega(x) = E[\mathbf{1}\{X > x\} X]. \tag{8}$$

We will focus on contrasting $\omega$ with $f_X$, the density function associated with $X$, and thus the AME with the expected marginal effect (EME). The EME is given by $\theta(f_X)$, which uses the true probability density function $f_X(x_t)$ as the weight function. Here are two additional properties of $\omega$ not shown in the paper.

Claim. If the distribution of $X$ has heavy tails in the sense that, for some $\alpha > 2$ and $C > 0$,

$$\lim_{x_t \to \infty} \frac{f_X(x_t)}{C / x_t^{1+\alpha}} = 1 \quad \text{and} \quad \lim_{x_t \to -\infty} \frac{f_X(x_t)}{C / |x_t|^{1+\alpha}} = 1,$$

then $\omega(x_t)$ has heavier tails than $f_X(x_t)$.

To see this, for large $x$, substitute the tail approximation into the definition of $\omega$ in equation (8), which can be written as $\omega(x) = \int_x^{\infty} u f_X(u)\, du$:

$$\omega(x) \approx \int_x^{\infty} u\, \frac{C}{u^{1+\alpha}}\, du = C \int_x^{\infty} u^{-\alpha}\, du = C\, \frac{x^{-(\alpha-1)}}{\alpha-1}.$$

We deduce that

$$\frac{\omega(x)}{f_X(x)} \approx \frac{x^2}{\alpha-1}.$$

For sufficiently large $x$ this object is greater than 1.

Claim. Assume that $X$ has finite moments of order $j+2$ and $\lim_{x_t \to \infty} \omega(x_t)\, x_t^{j+1} = 0$. Then

$$\int_{-\infty}^{\infty} x^j \omega(x)\, dx = \frac{1}{j+1} E[X^{j+2}].$$

The result of this claim follows from integration by parts, using $\omega'(x) = -x f_X(x)$.

In summary, when $X$ is normally distributed, the LP estimator aligns exactly with the EME, making $\omega(x) = f_X(x)$ a convenient theoretical benchmark. For distributions with skewness or heavy tails, the AME estimated by the local projection will differ from the EME, potentially biasing the interpretation of results. However, this difference also reflects the robustness of the local projection estimator in capturing effects relevant in the tails of the distribution.

To illustrate the weighting function, Figure 1 shows $f_X$ and $\omega$ for four distributions. All of the distributions are normalized to have mean zero, so $\omega$ peaks at zero in each panel.
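The weighting function in equation (7) can also be evaluated numerically. The sketch below is only an illustration, not the authors' code: it assumes SciPy is available, and the helper names `lp_weight` and `total_weight` are hypothetical. It computes $\omega(x)$ by quadrature, compares it with $f_X$ for a standard Normal and for a Student's t rescaled to unit variance, and checks on a finite grid that $\omega$ integrates to approximately one.

```python
import numpy as np
from scipy import integrate, stats


def lp_weight(x, dist):
    """Evaluate omega(x) = Cov[1{X > x}, X] / Var[X] by numerical quadrature.

    `dist` is assumed to be a frozen continuous scipy.stats distribution.
    """
    mean, var = dist.mean(), dist.var()
    # Cov[1{X > x}, X] = E[1{X > x} (X - E[X])] = integral over (x, inf) of (t - E[X]) f_X(t) dt
    cov, _ = integrate.quad(lambda t: (t - mean) * dist.pdf(t), x, np.inf)
    return cov / var


def total_weight(dist, lo=-10.0, hi=10.0, n=241):
    """Approximate the integral of omega over [lo, hi] with the trapezoid rule."""
    grid = np.linspace(lo, hi, n)
    weights = np.array([lp_weight(x, dist) for x in grid])
    return integrate.trapezoid(weights, grid)


normal = stats.norm()
# Student's t with 3.1 degrees of freedom, rescaled so that Var[X] = 1
student = stats.t(df=3.1, scale=np.sqrt(1.1 / 3.1))

# Ratio omega(x) / f_X(x): equal to one for the Normal, growing in the tails for the t
for x in (0.0, 1.0, 3.0):
    print(x, lp_weight(x, normal) / normal.pdf(x), lp_weight(x, student) / student.pdf(x))

normal_total = total_weight(normal)
print("integral of omega (Normal):", normal_total)
```

Working with frozen `scipy.stats` distributions keeps the same function usable for the Laplace and skew-Normal cases; the grid bounds in `total_weight` are an arbitrary choice and would need to be widened for very heavy-tailed distributions.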

Figure 1: $\omega(x)$ and $f_X(x)$ for four distributions. Panels: Standard Normal; Student's t, $\nu = 3.1$; Laplace; Skew-Normal, $\alpha = 5$. Horizontal axis: $x$; vertical axis: value. Source: Authors' calculations.

The upper-left panel shows that $f_X$ and $\omega$ coincide for the standard Normal distribution. The upper-right panel and the lower-left panel show that if $f_X$ has heavy tails, then $\omega$ has heavier tails. The lower-right panel shows that if $f_X$ is a skew distribution, then $f_X$ and $\omega$ do not peak at the same point.

3 AME in Action: Quadratic Autoregressive Model

In this section we analyze the AME and its estimation using a simple nonlinear time series model, the first-order quadratic autoregressive or QAR(1,1) model, that was developed in Aruoba et al. (2017). This class of time series models was developed to identify nonlinearities in macroeconomic data and to evaluate DSGE models. The model is particularly useful because it nests the familiar AR(1) model, which is a common benchmark.

The QAR(1,1) is given by

$$y_t = \phi_0 + \phi_1 (y_{t-1} - \phi_0) + \phi_2 s_{t-1}^2 + (1 + \gamma s_{t-1})\, \sigma x_t \tag{9}$$

$$s_t = \phi_1 s_{t-1} + \sqrt{1 - \phi_1^2}\, x_t. \tag{10}$$

Here, $y_t$ is the observed scalar variable of interest, $s_t$ is an unobserved state variable, and $x_t$ is an observed shock with mean zero and variance unity. We assume that the third moment of $x_t$ is finite. The scalar parameters $\phi_2$ and $\gamma$ control the degree of nonlinearity in the model. Very roughly speaking, $\gamma$ is associated with conditional heteroskedasticity in $y_t$, and $\phi_2$ is associated with asymmetry and more general state dependence. When $\phi_2 = \gamma = 0$, the model collapses to the AR(1) model.

3.1 The NIRF and the AME in the QAR(1,1)

As KP note, in a nonlinear time series setting there are a variety of notions of impulse response. Here, we briefly touch on a nonlinear impulse response function (NIRF), see Koop et al. (1996), as a definition familiar to most macroeconomists, and later we describe its connection to the objects studied in KP. In the context of the QAR(1,1) model, the NIRF is given by

$$\mathrm{NIRF}(h, x_t, y_{t-1}, s_{t-1}) = E[y_{t+h} \mid x_t, y_{t-1}, s_{t-1}] - E[y_{t+h} \mid y_{t-1}, s_{t-1}].$$
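As a concrete companion to equations (9) and (10), the following minimal sketch iterates the two recursions for a given shock sequence. It is an illustration under stated assumptions rather than the authors' implementation: the function name `simulate_qar` and the default zero starting values for $y$ and $s$ are choices made here.

```python
import numpy as np


def simulate_qar(x, phi0, phi1, phi2, gamma, sigma, y_init=0.0, s_init=0.0):
    """Iterate the QAR(1,1) recursions in equations (9)-(10) for a shock sequence x.

    Returns the paths of the observable y_t and the state s_t.
    Starting values y_init and s_init are assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    s = np.empty_like(x)
    y_prev, s_prev = y_init, s_init
    root = np.sqrt(1.0 - phi1**2)
    for t, shock in enumerate(x):
        # Equation (9): y_t depends on the lagged state s_{t-1}, not on s_t
        y[t] = phi0 + phi1 * (y_prev - phi0) + phi2 * s_prev**2 + (1.0 + gamma * s_prev) * sigma * shock
        # Equation (10): the state is an AR(1) with unit unconditional variance
        s[t] = phi1 * s_prev + root * shock
        y_prev, s_prev = y[t], s[t]
    return y, s


rng = np.random.default_rng(0)
shocks = rng.standard_normal(200)
y_qar, s_qar = simulate_qar(shocks, phi0=0.0, phi1=0.95, phi2=0.2, gamma=0.3, sigma=1.0)
y_ar1, _ = simulate_qar(shocks, phi0=0.0, phi1=0.95, phi2=0.0, gamma=0.0, sigma=1.0)
print(y_qar[:3], y_ar1[:3])
```

Setting `phi2 = gamma = 0` reproduces an AR(1) path driven by the same shocks, which is a convenient sanity check on the nesting described above.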

This object can be expressed as

$$\mathrm{NIRF}(h, x_t, y_{t-1}, s_{t-1}) = \phi_1^h (1 + \gamma s_{t-1})\, \sigma x_t + \phi_2 \phi_1^{h-1}\, \frac{1 - \phi_1^h}{1 - \phi_1}\, \delta(s_{t-1}, x_t), \tag{11}$$

where $\delta(s_{t-1}, x_t) = 2 \phi_1 s_{t-1} \sqrt{1 - \phi_1^2}\, x_t + (1 - \phi_1^2)(x_t^2 - 1)$.

Notice that the NIRF depends on $s_{t-1}$ but not on $y_{t-1}$. That is, the NIRF is state dependent and the relevant state is $s_{t-1}$. The analytical expression for the NIRF makes clear that the model is asymmetric in the sense that

$$\mathrm{NIRF}(h, x_t, y_{t-1}, s_{t-1}) + \mathrm{NIRF}(h, -x_t, y_{t-1}, s_{t-1}) \neq 0.$$

That is, adding the NIRF from a positive shock to the NIRF from a negative shock of the same size does not equal zero. Allowing for asymmetry is important for analyzing the transmission of macroeconomic shocks; see, for example, Kilian and Vigfusson (2011). Additionally, the analytical expression for the NIRF makes clear that the model displays heteroskedasticity in the sense that if $x_t \neq 0$ and $\kappa \neq 1$ then

$$\mathrm{NIRF}(h, \kappa x_t, y_{t-1}, s_{t-1}) - \kappa\, \mathrm{NIRF}(h, x_t, y_{t-1}, s_{t-1}) \neq 0.$$

That is, the NIRF is not homogeneous of degree one in $x_t$. Interestingly, these nonlinearities do not depend on the level of $s_{t-1}$ even though the NIRF is affected (linearly) by $s_{t-1}$.

Recall that KP focus on the representation of $y_{t+h}$ given by equation (1), where $y_{t+h} = \psi_h(x_t, U_{h,t+h})$, and that the variables in the vector $U_{h,t+h}$ are independent of $x_t$. Taking expectations over $U_{h,t+h}$, we are left with the average structural function

$$\Psi_h(x_t) = E[\psi_h(x_t, U_{h,t+h})] = E[y_{t+h} \mid x_t].$$

This concept is different from the NIRF, in that $\Psi_h$ averages over both future and past shocks, while the NIRF defined above explicitly conditions on past information. That is, unlike the NIRF, $\Psi_h$ does not feature any state dependence on $s_{t-1}$. A motivation for focusing on $\Psi_h$ instead of the NIRF is that in most applications, such as the setup for the QAR(1,1) model that we consider here, $s_{t-1}$ is unobserved.

Although distinct, $\Psi_h$ and the NIRF are related in that

$$\Psi_h(x_t) = E[\mathrm{NIRF}(h, x_t, y_{t-1}, s_{t-1}) \mid x_t] + E[y_t].$$
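The closed form in equation (11) is easy to evaluate directly, which gives a quick way to see the asymmetry and non-homogeneity properties at work. The sketch below is illustrative only; the function name `nirf` and the parameter values are assumptions made here, and the formula is used for horizons $h \geq 1$.

```python
import math


def nirf(h, x, s_prev, phi1, phi2, gamma, sigma):
    """Closed-form NIRF from equation (11) for horizon h >= 1.

    The lagged observable y_{t-1} is omitted because it does not enter the expression.
    """
    delta = 2.0 * phi1 * s_prev * math.sqrt(1.0 - phi1**2) * x + (1.0 - phi1**2) * (x**2 - 1.0)
    geometric_sum = (1.0 - phi1**h) / (1.0 - phi1)
    return phi1**h * (1.0 + gamma * s_prev) * sigma * x + phi2 * phi1 ** (h - 1) * geometric_sum * delta


params = dict(phi1=0.9, phi2=0.3, gamma=0.2, sigma=1.0)

# Asymmetry: responses to +2 and -2 shocks do not offset each other
asymmetry = nirf(3, 2.0, 0.5, **params) + nirf(3, -2.0, 0.5, **params)
# Non-homogeneity: doubling the shock does not double the response
scaling_gap = nirf(3, 2.0, 0.5, **params) - 2.0 * nirf(3, 1.0, 0.5, **params)
print(asymmetry, scaling_gap)
```

Working through the algebra in equation (11), the sum of the responses to $x_t$ and $-x_t$ equals $2 \phi_2 \phi_1^{h-1} (1 - \phi_1^h)(1 + \phi_1)(x_t^2 - 1)$, so in this model the asymmetry is driven by $\phi_2$, does not depend on $s_{t-1}$, and happens to vanish when $x_t^2 = 1$.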

From the parametric expression for the NIRF in the QAR(1,1) model, we then have that

$$\Psi_h(x_t) = \phi_0 + \frac{\phi_2}{1 - \phi_1} + \phi_1^h \sigma x_t + \phi_2 \phi_1^{h-1} (1 - \phi_1^h)(1 + \phi_1)(x_t^2 - 1). \tag{12}$$

Notice that the asymmetry and heteroskedasticity of the model are apparent from $\Psi_h$. However, $\gamma$ does not affect $\Psi_h$ even though it contributes importantly to the nonlinearity in the model and appears in the NIRF, which is an indication of the information lost in $\Psi_h$ relative to the NIRF because of the averaging.

KP focus on the estimation of the AME given in equation (3). From the parametric expression for $\Psi_h$ in the QAR(1,1) model,

$$\Psi_h'(x_t) = \phi_1^h \sigma + 2 \phi_2 \phi_1^{h-1} (1 - \phi_1^h)(1 + \phi_1)\, x_t. \tag{13}$$

Applying the results related to $\omega(x_t)$ when $x_t$ is continuously distributed gives

$$\theta_h(\omega) = \phi_1^h \sigma + \phi_2 \phi_1^{h-1} (1 - \phi_1^h)(1 + \phi_1)\, E[x_t^3]. \tag{14}$$

Here, we have used our second claim, discussed above, with $j = 1$, so that $\int x\, \omega(x)\, dx = E[x_t^3]/2$. Notice that if $x_t$ is symmetric then $\theta_h(\omega) = \phi_1^h \sigma$ and the AME in the QAR(1,1) model is the same as in the AR(1) model. From equation (14), it is also clear that when $\phi_2 = 0$ the AME is the same in these models even if $x_t$ is not symmetric. More generally, in a linear model (that is, a model in which the NIRF is linear in $x_t$), $\Psi_h$ is linear in $x_t$. In this case, if $x_t$ follows a standard Normal distribution, then the NIRF for a one standard deviation shock and the AME are equivalent.

3.2 Estimation with small samples

Here we analyze the estimation of the AME in the QAR(1,1) model. With an observed set of outcomes and shocks $\{y_t, x_t\}_{t=1}^{T}$, the estimation of the set of regressions given by equation (4) is straightforward. However, as emphasized in Herbst and Johannsen (2024), even in this idealized setting, finite sample issues can be important, particularly for sample sizes commonly seen in the macroeconomics literature. Here, we consider how nonlinearities interact with finite sample issues.
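Before turning to small samples, it can be useful to confirm the population statement: in a very long simulated sample, the LP slope should be close to the closed-form AME in equation (14). The sketch below is a rough check under assumptions made here, not part of the paper's exercise: it uses centered exponential shocks (mean zero, unit variance, third moment equal to 2) as a simple skewed stand-in, a regression without controls, and hypothetical helper names.

```python
import numpy as np


def ame_qar(h, phi1, phi2, sigma, third_moment):
    """Closed-form AME from equation (14), for horizon h >= 1."""
    return phi1**h * sigma + phi2 * phi1 ** (h - 1) * (1.0 - phi1**h) * (1.0 + phi1) * third_moment


def simulate_qar(x, phi0, phi1, phi2, gamma, sigma):
    """Iterate equations (9)-(10), starting from y = phi0 and s = 0 (an assumption)."""
    y = np.empty(len(x))
    y_prev, s_prev = phi0, 0.0
    root = np.sqrt(1.0 - phi1**2)
    for t, shock in enumerate(x):
        y[t] = phi0 + phi1 * (y_prev - phi0) + phi2 * s_prev**2 + (1.0 + gamma * s_prev) * sigma * shock
        s_prev = phi1 * s_prev + root * shock
        y_prev = y[t]
    return y


def lp_slope(y, x, h):
    """OLS slope from regressing y_{t+h} on a constant and x_t."""
    lead, shock = y[h:], x[: len(x) - h]
    return np.cov(lead, shock)[0, 1] / np.var(shock, ddof=1)


rng = np.random.default_rng(0)
# Centered exponential shocks: E[x] = 0, Var[x] = 1, E[x^3] = 2
x_long = rng.exponential(1.0, size=200_000) - 1.0
y_long = simulate_qar(x_long, phi0=0.0, phi1=0.5, phi2=0.2, gamma=0.3, sigma=1.0)

beta_hat = lp_slope(y_long, x_long, h=2)
beta_pop = ame_qar(2, phi1=0.5, phi2=0.2, sigma=1.0, third_moment=2.0)
print(beta_hat, beta_pop)
```

A lower persistence parameter than in the paper's simulations is used only to keep the sampling noise of this check small; the closed form itself does not depend on that choice.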
We simulate the QAR(1,1) model 1,000,000 times with $\phi_0 = 0$, $\phi_1 = 0.95$, and $\sigma = 1$.¹ We think of the model as a quarterly model and use a sample size of 100, which Herbst and Johannsen (2024) argue is typical in the related macroeconomics literature. We vary $\phi_2$ and $\gamma$ to see how different values, and their associated nonlinearities, affect the finite sample properties of the estimator of the AME. We specify the local projection as

$$y_{t+h} = \beta_h x_t + \gamma_h' [1, y_{t-1}]' + e_{h,t+h}.$$

We focus on the case where $h = 6$, which is a relatively short horizon for an impulse response in macroeconomics. We denote the estimator of the AME by $\hat{\beta}_h$.

¹ We initialize $y_{-1000}$ and $s_{-1000}$ to zero and simulate forward. We begin our sample at $y_0$ and $s_0$.

As a baseline case, we use a standard Normal distribution for $x_t$. To understand the effects of heavy tails or asymmetries in the distribution of $x_t$, we also consider the cases where $x_t$ has a (standardized) t distribution with $\nu = 3.1$ degrees of freedom and where $x_t$ has a (standardized) skew-Normal distribution with skew parameter $\alpha = 5$. Note that when $x_t$ follows a Normal distribution or a t distribution, the AME in the QAR(1,1) model coincides with that in the AR(1) model with $\phi_1$ as the autoregressive parameter. When $x_t$ has a skew-Normal distribution, the AME differs because of the non-zero third moment of $x_t$, unless $\phi_2 = 0$.

We compare the small-sample average value of $\hat{\beta}_h$ to $\beta_h$ (the bias). We know $\beta_h$ in closed form from the derivations above. The results are shown in Figure 2. When $\phi_2 = 0$ and $\gamma = 0$, the QAR(1,1) model reduces to the AR(1) model. At that point on the graphs, the bias is the same for each of the three distributions. For different values of $\phi_2$ and $\gamma$, the bias of the AME estimator depends on the distribution. Interestingly, introducing nonlinearities does not necessarily increase or decrease bias. That is, nonlinearities have unpredictable effects on the small-sample average of $\hat{\beta}_h$.

To further explore the small-sample properties of $\hat{\beta}_h$, Figure 3 shows the root-mean-squared-error (RMSE) of $\hat{\beta}_h$, and Figure 4 shows the coverage probability of nominal 95% confidence intervals constructed using $\hat{\beta}_h$ and associated Huber-White standard errors. Notably, as the nonlinearities of the QAR(1,1) model increase ($\phi_2$ and $\gamma$ increase in magnitude) the RMSE grows. This increased volatility of $\hat{\beta}_h$ is not fully captured by the standard errors. As a result, the coverage probabilities fall as nonlinearities of the QAR(1,1) model increase.

We conclude that although the AME is robust to an array of nonlinearities in population, those nonlinearities may have important implications for the small sample properties of estimators and associated test statistics.
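The design of the small-sample exercise above can be sketched in a few lines. The code below is a scaled-down illustration, not the authors' code: it uses far fewer replications than the 1,000,000 in the text, only Normal shocks, $\phi_0 = 0$, and reports bias and RMSE but not coverage. The function names are hypothetical.

```python
import numpy as np


def simulate_panel(n_rep, T, phi1, phi2, gamma, sigma, rng, burn=1000):
    """Simulate n_rep independent QAR(1,1) samples (phi0 = 0, Normal shocks).

    Each path starts from y = s = 0 and discards `burn` initial periods.
    """
    x = rng.standard_normal((n_rep, burn + T))
    y = np.empty_like(x)
    y_prev = np.zeros(n_rep)
    s_prev = np.zeros(n_rep)
    root = np.sqrt(1.0 - phi1**2)
    for t in range(burn + T):
        y[:, t] = phi1 * y_prev + phi2 * s_prev**2 + (1.0 + gamma * s_prev) * sigma * x[:, t]
        s_prev = phi1 * s_prev + root * x[:, t]
        y_prev = y[:, t]
    return y[:, burn:], x[:, burn:]


def lp_estimates(y, x, h):
    """Per-sample OLS coefficient on x_t when regressing y_{t+h} on [x_t, 1, y_{t-1}]."""
    T = y.shape[1]
    lead = y[:, 1 + h:]  # y_{t+h} for t = 1, ..., T-h-1
    regressors = np.stack([x[:, 1:T - h], np.ones_like(lead), y[:, :T - h - 1]], axis=-1)
    xtx = np.einsum("rni,rnj->rij", regressors, regressors)
    xty = np.einsum("rni,rn->ri", regressors, lead)
    coef = np.linalg.solve(xtx, xty[..., None])[..., 0]
    return coef[:, 0]


def bias_and_rmse(n_rep, T, h, phi1, phi2, gamma, sigma, seed=0):
    """Monte Carlo bias and RMSE of the LP estimator relative to the AME."""
    rng = np.random.default_rng(seed)
    y, x = simulate_panel(n_rep, T, phi1, phi2, gamma, sigma, rng)
    beta_hat = lp_estimates(y, x, h)
    ame = phi1**h * sigma  # equation (14) with E[x^3] = 0 for symmetric shocks
    return beta_hat.mean() - ame, np.sqrt(np.mean((beta_hat - ame) ** 2))


for phi2 in (-0.5, 0.0, 0.5):
    print(phi2, bias_and_rmse(n_rep=2000, T=100, h=6, phi1=0.95, phi2=phi2, gamma=0.0, sigma=1.0))
```

Vectorizing across replications keeps the time loop short; with only a few thousand replications the bias estimates are themselves noisy, so they should be read as indicative rather than as a replication of Figures 2 and 3.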

Figure 2: Bias Estimates for the AME in the QAR(1,1) model with $h = 6$ and $T = 100$. Panels: $\gamma = 0$ (horizontal axis: $\phi_2$) and $\phi_2 = 0$ (horizontal axis: $\gamma$). Vertical axis: bias. Series: Standard Normal, Student's t, Skew Normal. Source: Authors' calculations.

Figure 3: RMSE for the AME in the QAR(1,1) model with $h = 6$ and $T = 100$. Panels: $\gamma = 0$ (horizontal axis: $\phi_2$) and $\phi_2 = 0$ (horizontal axis: $\gamma$). Vertical axis: RMSE. Series: Normal, Student's t, Skew Normal. Source: Authors' calculations.

Figure 4: Coverage of 95% CIs for the AME in the QAR(1,1) model with $h = 6$ and $T = 100$. Panels: $\gamma = 0$ (horizontal axis: $\phi_2$) and $\phi_2 = 0$ (horizontal axis: $\gamma$). Vertical axis: coverage probability. Series: Normal, Student's t, Skew Normal. Source: Authors' calculations.

References

Aruoba, S. B., L. Bocola, and F. Schorfheide (2017): “Assessing DSGE model nonlinearities,” Journal of Economic Dynamics and Control, 83, 34–54.

Herbst, E. P. and B. K. Johannsen (2024): “Bias in local projections,” Journal of Econometrics, 240.

Jordà, O. (2005): “Estimation and Inference of Impulse Responses by Local Projections,” American Economic Review, 95, 161–182.

Kilian, L. and R. J. Vigfusson (2011): “Are the responses of the U.S. economy asymmetric in energy price increases and decreases?” Quantitative Economics, 2, 419–453.

Kolesár, M. and M. Plagborg-Møller (2025): “Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly,” Journal of Business & Economic Statistics.

Koop, G., M. H. Pesaran, and S. M. Potter (1996): “Impulse response analysis in nonlinear multivariate models,” Journal of Econometrics, 74, 119–147.

Cite this document
APA
Edward P. Herbst and Benjamin K. Johannsen (2025). Discussion of “Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly” (FEDS 2025-058). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2025-058
BibTeX
@techreport{wtfs_feds_2025_058,
  author = {Edward P. Herbst and Benjamin K. Johannsen},
  title = {Discussion of “Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly”},
  type = {Finance and Economics Discussion Series},
  number = {2025-058},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2025},
  url = {https://whenthefedspeaks.com/doc/feds_2025-058},
  abstract = {This comment discusses Kolesár and Plagborg-Møller's (2025) finding that the standard linear local projection (LP) estimator recovers the average marginal effect (AME) even in nonlinear settings. We apply and discuss a subset of their results using a simple nonlinear time series model, emphasizing the role of the weighting function and the impact of nonlinearities on small-sample properties.},
}