feds · April 3, 2022

ivcrc: An Instrumental Variables Estimator for the Correlated Random Coefficients Model

Abstract

We discuss the ivcrc module, which implements an instrumental variables (IV) estimator for the linear correlated random coefficients (CRC) model. The CRC model is a natural generalization of the standard linear IV model that allows for endogenous, multivalued treatments and unobserved heterogeneity in treatment effects. The estimator implemented by ivcrc uses recent semiparametric identification results that allow for flexible functional forms and permit instruments that may be binary, discrete, or continuous. The ivcrc module also allows for the estimation of varying coefficients regressions, which are closely related in structure to the proposed IV estimator. We illustrate use of ivcrc by estimating the returns to education in the National Longitudinal Survey of Young Men.

Finance and Economics Discussion Series
Federal Reserve Board, Washington, D.C.
ISSN 1936-2854 (Print)
ISSN 2767-3898 (Online)

David Benson, Matthew A. Masten, Alexander Torgovitsky
2020-046

Please cite this paper as: Benson, David, Matthew A. Masten, and Alexander Torgovitsky (2022). “ivcrc: An Instrumental Variables Estimator for the Correlated Random Coefficients Model,” Finance and Economics Discussion Series 2020-046r1. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2020.046r1.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

David Benson∗, Matthew A. Masten†, Alexander Torgovitsky‡

March 3, 2022

Keywords: ivregress, Instrumental Variables, Correlated Random Coefficients, Heterogeneous Treatment Effects, Varying Coefficient Models, Returns to Schooling

∗ Division of Research and Statistics, Federal Reserve Board of Governors. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff of the Federal Reserve System or of the Board of Governors.
† Department of Economics, Duke University.
‡ Kenneth C. Griffin Department of Economics, University of Chicago. Research supported in part by National Science Foundation grant SES-1846832.

1 Introduction

In this paper we describe the ivcrc module for Stata, which implements a linear instrumental variables (IV) estimator for the correlated random coefficients (CRC) model. The CRC model relaxes the constant effects assumption of the standard linear IV model by allowing for random coefficients, which capture unobserved heterogeneity in the causal effect of the endogenous variables, $X$, on the outcome variable, $Y$. Heckman and Vytlacil (1998) and Wooldridge (1997, 2003, 2008) showed that if there is no unobserved heterogeneity in the way the instrument, $Z$, affects the endogenous variables, $X$, then the usual linear IV estimator (e.g., ivregress) will estimate the mean of the random coefficients. However, this assumption is uncomfortably asymmetric: treatment effects can be unobservably heterogeneous, but instrument effects cannot.

Masten and Torgovitsky (2016) address this drawback by showing how to allow for heterogeneity in the relationship between $Z$ and $X$. The identification arguments developed there allow for the instrument to be binary, discrete, or continuous, and depend only on familiar, low-level assumptions about exclusion, exogeneity, and instrument relevance. The arguments also lead to a semiparametric estimator that does not suffer from the curse of dimensionality.

In Section 2, we briefly review the identification results and estimation approach developed in Masten and Torgovitsky (2014, 2016). The ivcrc module implements their proposed estimator. The structure of the estimator turns out to be quite similar to a common estimator for varying coefficient models (e.g., Fan and Zhang 2008; Park, Mammen, Lee, and Lee 2015). We have written ivcrc to include a standard estimator for these models as a special case. We briefly describe varying coefficient models in Section 3.

In Section 4 we discuss syntax and options for the ivcrc module. In Section 5 we illustrate the module by estimating the return to schooling with a widely used extract from the National Longitudinal Survey of Young Men. For further examples, see Gollin and Udry (2020), who used the ivcrc module to estimate agricultural production functions, Morales-Mosquera (2019), who used it to estimate the effects of police infrastructure in Colombia, and Masten and Torgovitsky (2014), who used the estimator to revisit Chay and Greenstone’s (2005) analysis of the effect of air pollution on housing prices.

2 The Correlated Random Coefficients Model

2.1 Model and Motivation

The simplest form of the model estimated by ivcrc has the outcome equation

$$Y = B_0 + B_1 X, \tag{1}$$

where $Y$ is an observed outcome, $X$ is an observed explanatory variable, and both $B_0$ and $B_1$ are unobserved random variables. The model is described as a random coefficients model due to the treatment of $B_1$ as an unobserved random variable. Economists have long been interested in such models (Wald 1947; Hurwicz 1950; Rubin 1950; Becker and Chiswick 1966). To allow for endogeneity, $X$ is permitted to be arbitrarily dependent with both $B_0$ and $B_1$. This feature makes the model one of correlated random coefficients.^1

It is helpful to compare (1) with the outcome equation for the textbook linear model:

$$Y = \alpha + \beta X + U, \tag{2}$$

where $\alpha$ and $\beta$ are fixed (deterministic) parameters, and $U$ is an unobservable random variable with mean zero. This model also allows for endogeneity by permitting $X$ to be dependent with $U$. The distinction between $U$ in (2) and $B_0$ in (1) is not important, since one can view $B_0$ as being equal to $\alpha + U$. Rather, the important difference between (2) and (1) is that the coefficient on $X$ in (2), i.e. $\beta$, is deterministic, whereas the coefficient on $X$ in (1), i.e. $B_1$, is a random variable. The interpretation is that in (2) the causal effect of $X$ on $Y$ is the same for all agents, whereas in (1) it is a random variable that can be dependent with $X$. This important difference allows for heterogeneous treatment effects and selection on the gain of the sort described in Heckman, Urzua, and Vytlacil (2006b). One can view (2) as a special case of (1) with a degenerate $B_1$.

Textbook discussions of (2) show that $\beta$ is identified if there exists an instrument $Z$ such that $\operatorname{Cov}(U,Z) = 0$ and $\operatorname{Cov}(X,Z) \neq 0$. The corresponding IV estimator can be implemented in Stata with the ivregress command. However, if the data is in fact generated by (1), then this estimator converges to

$$\frac{\operatorname{Cov}(Y,Z)}{\operatorname{Cov}(X,Z)} = E\left[ B_1 \times \frac{X(Z - E(Z))}{E[X(Z - E(Z))]} \right]. \tag{3}$$

This quantity is difficult to interpret in general (Garen 1984; Wooldridge 1997; Heckman and Vytlacil 1998). It is a weighted average of the causal effect of $X$ on $Y$; that is, a weighted average of $B_1$. The weights, however, can be both positive and negative. It generally does not equal the unweighted average of $B_1$ unless $B_1$ is independent of $(X,Z)$, which would rule out the type of selection on the gain scenario described in Heckman et al. (2006b).

^1 This terminology seems to have been first used by Heckman and Vytlacil (1998). In earlier work, some authors, for example, Conway and Kniesner (1991), had used the adjective “correlated” to describe an unrestricted correlation structure between the random coefficients on different explanatory variables. Our model also allows for this.

A natural question is whether there are additional assumptions under which the IV estimator provided by ivregress would consistently estimate a parameter that is easier to interpret. For example, are there additional conditions under which this estimator converges to the average partial effect, $E[B_1]$? Heckman and Vytlacil (1998) and Wooldridge (1997, 2003, 2008) show that there are

indeed such conditions, namely, the assumption that the causal effect of $Z$ on $X$ is homogeneous. While convenient, this type of homogeneity assumption is uncomfortably asymmetric. It enables the additional heterogeneity in equation (1) relative to equation (2) only by assuming away the same type of heterogeneity in the analogous relationship between $Z$ and $X$.^2

2.2 Identification and Estimation by Conditional Linear Regression

Given these negative results, it is worthwhile to consider estimators other than ivregress. The ivcrc module provides such an estimator. This estimator is based on the following control function argument, which is developed more formally in Masten and Torgovitsky (2016).^3

Suppose that there exists an observable variable $R$ such that $X \perp\!\!\!\perp (B_0, B_1) \mid R$. The variable $R$ is a “control function” (or sometimes, and more loosely, a “control variable”) because it controls for the endogeneity in $X$. That is, while $X$ is endogenous in the sense of being unconditionally dependent with $(B_0, B_1)$, it is exogenous after conditioning on the control function $R$. In practice, $R$ is constructed from the instrument; we explain the derivation and construction of $R$ in more detail in Section 2.3.

Given the availability of a variable $R$ with this property, it is straightforward to see that one could consistently estimate the vector $\beta(r) \equiv E[B \mid R = r]$, where $B \equiv [B_0, B_1]'$, by a linear regression of $Y$ on $X$ conditional on $R = r$. Letting $W \equiv [1, X]'$ so that $Y = W'B$, one has

$$E[WW' \mid R = r]^{-1} E[WY \mid R = r] = E[WW' \mid R = r]^{-1} E[WW'B \mid R = r] = \beta(r), \tag{4}$$

where the second equality uses the assumption that $B$ is independent of $X$ (and hence $W$), conditional on $R$. In order for this argument to work, it must be the case that $E[WW' \mid R = r]$ is invertible, which is the usual condition of no perfect multicollinearity, but now conditional on $R = r$. Intuitively, there must still be some variation left in $X$ after conditioning on $R = r$.
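The second equality in (4) can be spelled out in one extra step. The lines below are a sketch of the standard iterated-expectations argument, using only the conditional independence $W \perp\!\!\!\perp B \mid R$ and the definition of $\beta(r)$ given above:

```latex
\begin{aligned}
E[WW'B \mid R = r]
  &= E\big[\, WW' \, E[B \mid W, R = r] \,\big|\, R = r \big]
     && \text{(law of iterated expectations)} \\
  &= E\big[\, WW' \, E[B \mid R = r] \,\big|\, R = r \big]
     && \text{(} W \perp\!\!\!\perp B \mid R \text{)} \\
  &= E[WW' \mid R = r] \, \beta(r)
     && \text{(} \beta(r) \text{ is non-random given } R = r \text{).}
\end{aligned}
```

Premultiplying by $E[WW' \mid R = r]^{-1}$, which requires the invertibility condition, gives $\beta(r)$.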
Assuming that this invertibility condition holds for all $r$ in the support of $R$, one can average up the linear regression estimands on the right-hand side of (4) to obtain $E[B] \equiv E[\beta(R)]$, and hence the average partial effect of $X$ on $Y$, i.e. $E[B_1]$.

^2 An influential literature started by Imbens and Angrist (1994), Angrist and Imbens (1995), Angrist, Imbens, and Rubin (1996), and Angrist, Graddy, and Imbens (2000) has provided conditions under which the IV estimator provided by ivregress can be interpreted as a local average treatment effect (LATE) or a weighted average of various LATEs. While related, these arguments are nonparametric, and in particular do not use the linearity in $X$ of the CRC model.

^3 This paper builds on a large literature on control functions, including Heckman (1979), Heckman and Robb (1985), Smith and Blundell (1986), Blundell and Powell (2004), Florens, Heckman, Meghir, and Vytlacil (2008), and Imbens and Newey (2009), among many others.

This identification argument suggests an estimator given by an average of conditional ordinary least squares (OLS) estimators. The conditioning is incorporated by applying kernel weights to each observation, where the weights reflect the distance of $R$ from $r$. More concretely, given a

sample $\{Y_i, X_i, R_i\}_{i=1}^n$, a conditional regression estimator of $Y$ on $W$ near $R = r$ is given by

$$\hat\beta(r) \equiv \left( \sum_{i=1}^n k_i^h(r) W_i W_i' \right)^{-1} \left( \sum_{i=1}^n k_i^h(r) W_i Y_i \right), \tag{5}$$

where $k_i^h(r) \equiv h^{-1} K((R_i - r)/h)$ with $K$ a second-order kernel function and $h > 0$ a bandwidth parameter.

The conditional OLS estimator (5) displays the same type of bias-variance tradeoff that is familiar from nonparametric kernel regression. As $h \to \infty$, the weights $k_i^h(r)$ become equal across observations (each is proportional to $K(0)$), so that $\hat\beta(r)$ is just the estimator from a usual linear regression of $Y$ on $W$. We expect this estimator to be biased for $E[B]$ if $X$ is endogenous. Given the control function assumption, this bias disappears as $h \to 0$, but at the cost of higher variance in using fewer effective observations in computing $\hat\beta(r)$. Balancing these two concerns entails using fewer than $n$ effective observations, and as a consequence $\hat\beta(r)$ will have a slower-than-parametric rate of convergence for $\beta(r)$.

As a parameter of interest, $\beta(r)$ has a clear interpretation as the average partial effect of $X$ on $Y$, conditional on $R = r$. Variation in this parameter as a function of $r$ indicates treatment effect heterogeneity. We can average $\beta(R)$ for $R$ in some known set $\mathcal{R}$ to obtain the average partial effect for the subpopulation with $R \in \mathcal{R}$. A natural estimator of this average is given by

$$\hat\beta_{\mathcal{R}} = \frac{\sum_{i=1}^n \hat\beta(R_i) \, 1[R_i \in \mathcal{R}]}{\sum_{i=1}^n 1[R_i \in \mathcal{R}]}, \tag{6}$$

where $1[\cdot]$ is the indicator function that is 1 if $\cdot$ is true and 0 otherwise. At least in principle, $\hat\beta_{\mathcal{R}}$ can attain the parametric $\sqrt{n}$ rate (see Masten and Torgovitsky 2014, or, for a more general discussion, Newey 1994). If the local Gram matrix is invertible for (almost) every $r$ in the support of $R$, then $\mathcal{R}$ can be taken to be the entire support of $R$, so that (6) becomes an estimator of the unconditional average of $B$.
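The mechanics of (5) and (6) can be sketched in a few lines of plain Python for the scalar-$X$ case, where the local Gram matrix is 2×2 and can be inverted by hand. This is only an illustration, not the module's implementation: the function names are hypothetical, the control function is treated as observed, and the toy data are constructed so that $\beta_0(r) = r$ and $\beta_1(r) = 1 + r$ hold exactly. The Epanechnikov kernel is used because Table 1 lists it as the module's default.

```python
def epanechnikov(u):
    """Second-order Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_ols(y, x, r_obs, r, h):
    """Kernel-weighted OLS of y on W = [1, x] near R = r, as in equation (5)."""
    s_w = s_x = s_xx = s_y = s_xy = 0.0
    for yi, xi, ri in zip(y, x, r_obs):
        k = epanechnikov((ri - r) / h) / h  # kernel weight k_i^h(r)
        s_w += k
        s_x += k * xi
        s_xx += k * xi * xi
        s_y += k * yi
        s_xy += k * xi * yi
    det = s_w * s_xx - s_x * s_x  # determinant of the local Gram matrix
    if abs(det) < 1e-12:
        raise ValueError("local Gram matrix is (near) singular at r=%r" % r)
    b0 = (s_xx * s_y - s_x * s_xy) / det
    b1 = (s_w * s_xy - s_x * s_y) / det
    return b0, b1


def average_effect(y, x, r_obs, h, lo=0.0, hi=1.0):
    """Average of beta_hat(R_i) over observations with R_i in [lo, hi], as in (6)."""
    fits = [local_ols(y, x, r_obs, ri, h) for ri in r_obs if lo <= ri <= hi]
    n = len(fits)
    return sum(b[0] for b in fits) / n, sum(b[1] for b in fits) / n


# Toy data (hypothetical): R on a grid, three X values per R value,
# and Y = beta0(R) + beta1(R) * X with beta0(r) = r and beta1(r) = 1 + r.
r_obs, x, y = [], [], []
for r in [k / 10 for k in range(1, 10)]:
    for xv in (0.0, 1.0, 2.0):
        r_obs.append(r)
        x.append(xv)
        y.append(r + (1.0 + r) * xv)

overall = average_effect(y, x, r_obs, h=0.05)            # average over all R
subset = average_effect(y, x, r_obs, 0.05, 0.05, 0.35)   # average over R in [.05, .35]
```

With the bandwidth set below the grid spacing, each local regression uses only observations sharing the same value of $R$, so the local slopes differ across $r$ and the two averages differ accordingly. This is the sense in which variation in $\hat\beta(r)$ indicates treatment effect heterogeneity.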
A more general version of (1) is

$$Y = B_0 + \sum_{j=1}^{d_x} B_j X_j + \sum_{j=1}^{d_1} B_{d_x + j} Z_{1j} \equiv W'B, \tag{7}$$

where $X$ is now a $d_x$-dimensional vector of potentially endogenous explanatory variables and $Z_1 \in \mathbb{R}^{d_1}$ is a vector of exogenous explanatory variables. For notation, we combine these variables and their coefficients together with the constant term as $W \equiv [1, X', Z_1']'$ and $B$. We rename the excluded exogenous variable as $Z_2$, and combine the exogenous variables (included and excluded instruments) together into a vector $Z = [Z_1', Z_2']'$. The required condition on the control function is now that $W \perp\!\!\!\perp B \mid R$, so that both $X$ and $Z_1$ are exogenous after conditioning on $R$. Given this condition, the identification argument (4) and the estimators (5) and (6) follow exactly as before.

2.3 Estimation of the Control Function

We have shown how a control function, $R$, can be used to estimate interesting parameters in a CRC model, but we have not yet explained how one can find or construct such a control function. The most common approach is to assume that for each $j = 1, \ldots, d_x$, there exists a function $h_j$ and unobservables $V \equiv [V_1, \ldots, V_{d_x}]' \in \mathbb{R}^{d_x}$ such that

$$X_j = h_j(Z, V_j) \quad \text{for each } j, \tag{8}$$

where $h_j(z, \cdot)$ is strictly increasing for each $z$. As shown by Imbens and Newey (2009) and Masten and Torgovitsky (2016), if $(B, V) \perp\!\!\!\perp Z$, then $R \equiv [R_1, \ldots, R_{d_x}]'$ is a valid control function, where $R_j \equiv F_{X_j \mid Z}(X_j \mid Z)$ and $F_{X_j \mid Z}(x_j \mid z) \equiv P[X_j \le x_j \mid Z = z]$ is the population conditional distribution function of $X_j$, given $Z$. The components $R_j$ of this control function can be interpreted as providing the conditional rank (relative position) of $X_j$ given $Z$. The ivcrc module is written primarily with this choice of control function in mind, although the user can provide a different choice if desired. In that case, the estimator can be viewed as estimating the varying coefficient model discussed in the next section.

We refer to Masten and Torgovitsky (2016) for more theoretical details on the interpretation and restrictiveness of maintaining (8); see also Chernozhukov and Hansen (2005) and Torgovitsky (2015). Here we focus on the implications for implementing (4) and (5) with $R$ as the resulting conditional ranks.

The first implication is that it may be useful to make a distinction between different components of the endogenous variables, $X$. For example, if $X_2$ is a deterministic transformation of $X_1$, say $X_2 = X_1^2$, then $X_2$ is also fully determined by $R_1$. As a result, there is no need to separately estimate and condition on $R_2$. In the terminology of Masten and Torgovitsky (2016), $X_1$ is a basic endogenous variable, and $X_2 = X_1^2$ is a derived endogenous variable.
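The conditional-rank interpretation can be checked numerically. The sketch below assumes an illustrative first stage $X = 0.3Z + 0.4ZV + V$ with a binary instrument, which is strictly increasing in $V$ for each value of $Z$ as (8) requires; the helper name and the sample size are hypothetical. Within each instrument cell, the empirical rank of $X$ coincides with the empirical rank of the unobservable $V$, which is why conditioning on the rank controls for the endogeneity.

```python
import random


def within_group_ranks(values, groups):
    """Empirical conditional rank of each value within its group, in (0, 1]."""
    ranks = [0.0] * len(values)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        idx.sort(key=lambda i: values[i])  # order observations inside the cell
        for pos, i in enumerate(idx, start=1):
            ranks[i] = pos / len(idx)
    return ranks


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500
z = [rng.randint(0, 1) for _ in range(n)]    # binary instrument
v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # first-stage unobservable
# Illustrative first stage: strictly increasing in v for z = 0 and z = 1.
x = [0.3 * zi + 0.4 * zi * vi + vi for zi, vi in zip(z, v)]

rank_x = within_group_ranks(x, z)  # sample analogue of F_{X|Z}(X | Z)
rank_v = within_group_ranks(v, z)  # rank of the unobservable within each cell
ranks_agree = rank_x == rank_v
```

A derived variable such as $X^2$ adds no new rank information in this setup, since it is a known function of $X$.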
Derived endogenous variables require special treatment, since they appear as part of the vector of explanatory variables $W$, but are not included as part of the conditioning variables $Z$ in the definition of $R_j \equiv F_{X_j \mid Z}(X_j \mid Z)$. More formally, a component $X_j$ of $X$ is a derived endogenous variable if it can be written as $X_j = g_j(X_{-j}, Z)$ for some known function $g_j$. Interaction terms and other nonlinear functions form the primary examples of derived endogenous variables. The ivcrc module handles derived endogenous variables using the dendog option discussed in Section 4. The empirical illustration in Section 5 provides an example of its use.

A second issue raised by this choice of $R$ is that it is not directly observed in the data. Instead, we need to estimate $R_{ji} = F_{X_j \mid Z}(X_{ji} \mid Z_i)$ in a first step for each basic endogenous variable $X_j$ and each observation $i$. The ivcrc module approaches this problem by estimating conditional quantile functions and then inverting them using the pre-rearrangement operator studied by Chernozhukov, Fernández-Val, and Galichon (2010). This operator translates an estimator of a conditional quantile function, say $\hat Q_{X_j \mid Z}(\cdot \mid z)$, into an estimator of a conditional distribution function through the

relationship

$$\hat F_{X_j \mid Z}(x_j \mid z) = \int_0^1 1\left[ \hat Q_{X_j \mid Z}(s \mid z) \le x_j \right] ds. \tag{9}$$

For estimating $\hat Q_{X_j \mid Z}(s \mid z)$, the ivcrc module uses linear quantile regression (see e.g. Koenker, 2005) as implemented by Stata’s built-in qreg command. The generated regressors $\{\hat R_{ji}\}_{i=1}^n$ are then constructed by substituting $(X_{ji}, Z_i)$ into (9) for every $i$.

A third point that arises when using this choice of $R$ is that (6) can be simplified when there is only one basic endogenous variable. This is because $R \equiv F_{X \mid Z}(X \mid Z)$ is uniformly distributed when $X$ is continuous. As a result, the probability that $R$ lands in any region $\mathcal{R}$ is known a priori and does not need to be estimated. The population average of $\beta(R)$, conditional on $R \in \mathcal{R}$, in this case reduces to

$$\beta_{\mathcal{R}} = \lambda(\mathcal{R})^{-1} \int_{\mathcal{R}} \beta(r) \, dr, \tag{10}$$

where $\lambda(\mathcal{R})$ is the Lebesgue measure of the set $\mathcal{R}$. When equation (10) holds, ivcrc estimates it by substituting the (known) value of $\lambda(\mathcal{R})$ and numerically approximating the integral $\int_{\mathcal{R}} \hat\beta(r) \, dr$ that replaces $\beta(r)$ with $\hat\beta(r)$.

A fourth point that is worth reemphasizing is that in order for (4) to exist, the Gram matrix $E[WW' \mid R = r]$ must be invertible. That is, there must not be perfect multicollinearity among the regressors after conditioning on $R = r$. When using the conditional rank for $R$, conditioning on $R = r$ still leaves variation in the basic endogenous variables as long as the excluded instrument, $Z_2$, is appropriately dependent with $X$ near its $r$th quantile. See Masten and Torgovitsky (2014, 2016) for a more detailed discussion of this point. A consequence for implementation is that it is necessary to exclude from $\mathcal{R}$ regions over which this instrument relevance condition fails.

2.4 Bandwidth Selection

The ivcrc module implements automated bandwidth selection for $h$ based on a rule-of-thumb method proposed by Fan and Gijbels (1996). We first estimate $E[Y \mid R = r, X = x]$ as a fourth-order polynomial in $r$, linearly interacted with $x$. We use this regression to produce estimates of the second derivative of $E[Y \mid R = r, X = x]$ with respect to $r$, which we denote by $\hat\delta(r)x$. The rule-of-thumb bandwidth is then defined as

$$h_{ROT} \equiv 0.58 \left( \frac{\hat\sigma^2}{\sum_{i=1}^n \frac{1}{2} \hat\delta(\hat R_i) X_i} \right)^{1/5},$$

where $\hat\sigma^2$ is the homoskedastic error variance from the polynomial regression and the constant 0.58 is derived from the roughness for single-peaked kernels. See the discussion in Hansen (2021, Section

19.9) for further details.

2.5 Statistical Inference

The asymptotic variance of $\hat\beta_{\mathcal{R}}$ needs to account for the statistical error involved in estimation of the control function, $R$. Masten and Torgovitsky (2014) report this calculation, but the form of the asymptotic variance is complicated and does not facilitate direct estimation. Fortunately, $\hat\beta_{\mathcal{R}}$ is a relatively well-behaved estimator, so the bootstrap should be valid for approximating standard errors and confidence intervals (see, e.g., Chen, Linton, and van Keilegom 2003). The ivcrc module uses Stata’s built-in bootstrap routine for these purposes.

We evaluate the validity of the bootstrap using a small Monte Carlo simulation. The model is (1) with a single endogenous variable, $X$. The relationship between $X$, $Z$, and $V$ in (8) is specified as

$$X = 0.3Z + 0.4ZV + V,$$

where $Z$ is binary $\{0,1\}$ with equal probability, and $V$ is normally distributed with mean 0.1 and variance 0.2. The unobservables are related by

$$B_0 = 0.3V + \epsilon_0 \quad \text{and} \quad B_1 = 0.7V + \epsilon_1,$$

where $\epsilon_0$ and $\epsilon_1$ are normally distributed, independent of $V$, with means 0.2 and 0.45, and variances 0.2 and 1. The parameter values imply $E[B_0] = 0.23$ and $E[B_1] = 0.52$.

We implement the bootstrap on each of 1000 Monte Carlo replications with 250 bootstraps per replication while using the rule-of-thumb bandwidth. We first consider a nominal level 5% test that uses bootstrapped standard errors together with a normal approximation. We reject the true null hypotheses $E[B_0] = 0.23$ and $E[B_1] = 0.52$ in 4.9% and 6.2% of replications, respectively. We then consider the coverage rate for a nominal 95% bootstrapped percentile confidence interval, without exploiting normality. The confidence intervals covered the true values of $E[B_0]$ and $E[B_1]$ in 94.5% and 94% of replications, respectively. We interpret these findings as providing evidence that the bootstrap provides inference that is close to size-correct.
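The Monte Carlo design is easy to reproduce. The sketch below is plain Python and does not run the ivcrc estimator or the bootstrap. It draws from the stated data-generating process and recomputes the implied means $E[B_0] = 0.3 \times 0.1 + 0.2 = 0.23$ and $E[B_1] = 0.7 \times 0.1 + 0.45 = 0.52$. It also evaluates the population limit of the usual IV (Wald) estimator in this design, which is not reported in the text. Two assumptions are made here that the text does not state explicitly: $\epsilon_0$ and $\epsilon_1$ are drawn independently of each other, and $(V, \epsilon_0, \epsilon_1)$ is independent of $Z$. All function names are hypothetical.

```python
import random
from math import sqrt

MEAN_V, VAR_V = 0.1, 0.2     # V ~ N(0.1, 0.2)
MEAN_E0, VAR_E0 = 0.2, 0.2   # eps0 ~ N(0.2, 0.2)
MEAN_E1, VAR_E1 = 0.45, 1.0  # eps1 ~ N(0.45, 1)


def population_means():
    """E[B0] and E[B1] implied by B0 = 0.3 V + eps0 and B1 = 0.7 V + eps1."""
    return 0.3 * MEAN_V + MEAN_E0, 0.7 * MEAN_V + MEAN_E1


def wald_estimand():
    """Population Cov(Y, Z) / Cov(X, Z) for binary Z in this design."""
    ev2 = VAR_V + MEAN_V ** 2    # E[V^2]
    ex1 = 0.3 + 1.4 * MEAN_V     # E[X | Z = 1], since X = 0.3 + 1.4 V
    ex0 = MEAN_V                 # E[X | Z = 0], since X = V
    # E[B1 X | Z = z]; the B0 term cancels because B0 is independent of Z.
    eb1x1 = 0.7 * (0.3 * MEAN_V + 1.4 * ev2) + MEAN_E1 * ex1
    eb1x0 = 0.7 * ev2 + MEAN_E1 * ex0
    return (eb1x1 - eb1x0) / (ex1 - ex0)


def simulate(n, seed=0):
    """Draw n observations (y, x, z, b0, b1) from the design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        v = rng.gauss(MEAN_V, sqrt(VAR_V))  # gauss takes a standard deviation
        b0 = 0.3 * v + rng.gauss(MEAN_E0, sqrt(VAR_E0))
        b1 = 0.7 * v + rng.gauss(MEAN_E1, sqrt(VAR_E1))
        x = 0.3 * z + 0.4 * z * v + v
        rows.append((b0 + b1 * x, x, z, b0, b1))
    return rows


rows = simulate(50_000)
mean_b0 = sum(r[3] for r in rows) / len(rows)
mean_b1 = sum(r[4] for r in rows) / len(rows)
```

Under these assumptions the Wald estimand works out to $0.2328 / 0.34 \approx 0.685$, noticeably above $E[B_1] = 0.52$. This illustrates the point of Section 2.1 that the usual IV estimand need not equal the average partial effect when the first stage is heterogeneous.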
3 Varying Coefficient Models

The CRC model can be viewed as a special case of a larger class of models called varying coefficient models. A simple example of this model is

$$Y = \beta_0(S) + \beta_1(S) X + U, \tag{11}$$

where $Y$ is an observed outcome, $S$ are observed covariates (sometimes called “effect modifiers”), $X$ is our primary observed covariate of interest, and $U$ is an unobserved variable. Both $\beta_0(\cdot)$ and $\beta_1(\cdot)$ are unknown, nonparametrically specified functions. Conditional on $S$, this is a parametric model in $X$. But conditional on $X$, it is a nonparametric model in $S$. While it is unclear who first proposed such models (e.g., see O’Hagan and Kingman 1978 for an early citation), their in-depth study began with Cleveland, Grosse, and Shyu (1991) and Hastie and Tibshirani (1993). Fan and Zhang (2008) and Park et al. (2015) provide recent reviews of this literature.

Given a sample $\{Y_i, X_i, S_i\}_{i=1}^n$, the local regression estimator (5) with $R_i = S_i$ is precisely the Nadaraya-Watson (local constant) varying coefficient estimator; e.g., equation (2.1) of Park et al. (2015). Cleveland et al. (1991) proposed a local linear estimator. Fan and Zhang (1999) study these and other alternative estimators in detail. The asymptotic theory in Masten and Torgovitsky (2014) extends that of the varying coefficient literature in two directions: (a) by allowing for $S$ to be a generated regressor and (b) by considering the asymptotic distribution of average coefficients, such as $E[\beta_1(S)]$. While the literature on varying coefficient models focuses on the functions $\beta_0(\cdot)$ and $\beta_1(\cdot)$ themselves, the econometric models we consider motivate interest in these average coefficients as well.

The ivcrc module can estimate varying coefficient models like (11) via the varcoef option. See Section 4 for details. This estimator allows all components of $S$ to enter all coefficients. Park et al. (2015) discuss estimators which allow one to impose the assumption that some components of $S$ enter some coefficients, but not others.

We conclude this section by briefly showing how the linear CRC model can be seen as a varying coefficient model. For simplicity, we only consider the simple model (1). Write

$$\begin{aligned} Y &= B_0 + B_1 X \\ &= E(B_0 \mid R) + E(B_1 \mid R) X + \left[ (B_0 - E(B_0 \mid R)) + (B_1 - E(B_1 \mid R)) X \right] \\ &\equiv \beta_0(R) + \beta_1(R) X + U. \end{aligned}$$

By $X \perp\!\!\!\perp (B_0, B_1) \mid R$ and the definition of $U$, $E(U \mid R, X) = 0$. Thus the linear CRC model is a varying coefficient model with effect modifier $R$.

4 The ivcrc Module

The ivcrc module is available on the Statistical Software Components (SSC) archive and can be installed directly in Stata with the command ssc install ivcrc. Alternatively, the latest version of the module can be downloaded from the GitHub repository https://github.com/a-torgovitsky/ivcrc. The code (ivcrc.ado) and the help file (ivcrc.sthlp) can be downloaded from the repository and placed in the personal ado directory, as described in the Stata FAQ: https://www.stata.com/support/faqs/programming/personal-ado-directory/.

The syntax for the ivcrc module is

    ivcrc depvar [varlist_1] (varlist_edg = varlist_2) [if] [in] [, options]

In terms of the IV model discussed in Section 2, depvar is $Y$, varlist_1 consists of the components in $Z_1$, varlist_edg are the basic endogenous variable components of $X$, and varlist_2 are the components in $Z_2$. The required components of the syntax are depvar, varlist_edg, and varlist_2, while the remaining terms in brackets are optional.

The module allows for the options shown in Table 1. The dendog option allows the user to specify a list of endogenous variables that should be treated as derived (rather than basic), with the implications for implementation discussed in Section 2. The bootstrap option controls the calculation of standard errors and confidence intervals. Note that ivcrc does not compute these by default, because the bootstrap procedure can be computationally intensive.

The kernel and bandwidth options allow the user to change the kernel function $K$ and bandwidth $h$ used to compute the weights in (5). If the input for bandwidth is a list of numbers (separated by commas), then ivcrc will compute different estimates for each bandwidth. The computational efficiency of specifying several bandwidths at once is especially useful when calling bootstrap for standard errors and confidence intervals. The ranks option controls the degree of accuracy for approximating the integral in (9).

The average option determines the set $\mathcal{R}$ over which the local estimates $\hat\beta(r)$ are averaged and controls how this averaging is implemented. For example, average(.1(0).3) sets $\mathcal{R} = [.1,.3]$ and uses the empirical mean to evaluate the integral in (10). The module interprets a grid step of 0 as a request for computing $\hat\beta_{\mathcal{R}}$ using the sample averaging formula (6) that does not use knowledge of the distribution of $R$. Alternatively, specifying average(.1(.01).3) sets $\mathcal{R} = [.1,.3]$ and uses grid steps of .01 to numerically evaluate the integral.
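To make the roles of ranks and the grid form of average concrete, the following Python sketch mimics the two approximations outside of Stata. It is an illustration under stated assumptions rather than the module's code: the function names are hypothetical, the quantile function is passed in as an arbitrary callable instead of a fitted qreg model, and both integrals are approximated by simple means over the grid points, which may differ from the quadrature rule ivcrc actually uses.

```python
def conditional_rank(x, quantile_fn, ranks=50):
    """Approximate (9): share of grid quantiles s with Q_hat(s | z) <= x.

    The grid (1/ranks, ..., 1 - 1/ranks) follows the description of the
    ranks option in Table 1; averaging over it is an assumption of this sketch.
    """
    grid = [k / ranks for k in range(1, ranks)]
    return sum(1 for s in grid if quantile_fn(s) <= x) / len(grid)


def expand_numlist(lb, g, ub):
    """Expand a grid specification lb(g)ub with g > 0 into its points."""
    steps = int(round((ub - lb) / g))
    return [lb + k * g for k in range(steps + 1)]


def grid_average(beta_fn, lb, g, ub):
    """Approximate (10) by averaging beta_fn over the grid lb(g)ub."""
    points = expand_numlist(lb, g, ub)
    return sum(beta_fn(r) for r in points) / len(points)


# Example 1: if X given Z = z is uniform on [0, 1], then Q(s | z) = s and
# the conditional rank of x = 0.31 should be close to 0.31.
rank_at_031 = conditional_rank(0.31, lambda s: s, ranks=50)

# Example 2: with a hypothetical local effect beta(r) = 2 r, the average
# over R = [.1, .3] is 0.4; the grid .1(.01).3 has 21 points.
grid_points = expand_numlist(0.1, 0.01, 0.3)
avg_effect = grid_average(lambda r: 2.0 * r, 0.1, 0.01, 0.3)
```

A larger value of ranks gives a finer grid and hence a closer approximation to the integral in (9), at the cost of fitting more quantile regressions in the first step.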
Multiple non-overlapping sets can be specified by adding commas. If the report suboption is given, then estimates on each set will be reported separately together with the overall estimate. For example, average(.1(0).3, .5(0).8, report) would report the estimate of $\beta_{\mathcal{R}}$ just discussed, along with another empirical average estimate for $\mathcal{R} = [.5,.8]$. The grid method supports the report suboption as well.

There are two situations in which the module will always use (6) instead of attempting to numerically integrate (10). The first is when there is more than one basic endogenous variable, in which case $R$ is a vector with a joint distribution that is not known a priori and (10) is not valid. If a user specifies a list of subsets average(lb1(g1)ub1,..., lbN(gN)ubN) when there are multiple basic endogenous variables, the module interprets each subset lbn(gn)ubn as belonging to the nth endogenous variable in order of appearance in varlist_edg. Due to the difficulty of specifying sets in higher dimensions, more general multidimensional subset estimates may be obtained either by permuting this syntax, or by storing the local estimates $\hat\beta(r)$ using the savecoef option and subsequently computing any desired subset average. This is not essential to the method, but allowing for more general specifications would complicate the syntax significantly without providing much in the way of useful flexibility.

The second case in which ivcrc only uses the empirical average (6) is when the varcoef option is called. Passing varcoef(varlist) skips the estimation of $\hat R_i$ and uses the variables in varlist in its place. Since the density of these variables is generally not known a priori, (10) may not be true, so (6) is used. The average in (6) can still be taken over some specified subset $\mathcal{R}$, and such a set is still specified using the average(lb(0)ub) syntax. Note that using both the (varlist_edg = varlist_2) syntax and passing varcoef as an option will generate an error.

Table 1: Options for ivcrc

Option
    Description

dendog(varlist)
    Specify derived endogenous variables.

bootstrap()
    Bootstrap confidence intervals and standard errors; default setting is no standard errors. Specify typical bootstrap options in (), e.g. reps(#) or cluster(varlist). Access additional bootstrap statistics via estat bootstrap.

kernel(string)
    Choose alternative kernel functions; default is the Epanechnikov kernel. Other options: uniform, triangle, biweight, triweight, cosine, or gaussian.

bandwidth(numlist)
    Bandwidth of kernel; default is the rule-of-thumb. If multiple (comma separated) values are specified, estimates for each bandwidth are reported. Sub-option: together with varcoef, specify the bandwidth for a varying coefficients model.

ranks(integer)
    Use (1/integer, ..., 1 − 1/integer) evenly spaced quantiles for computing the conditional rank statistic; default is 50.

average(numlist [, report])
    Options for numerical integration, with number list syntax: lb(g)ub. Specify average(lb(0)ub) to use the sample average method; default is average(0(0)1). Specify non-zero values of g to use the grid method, e.g. average(.01(.01).99) to numerically integrate over the grid (.01,.02,...,.99). The space of integration may be comprised of non-overlapping ascending subsets by specifying comma separated lists. Sub-option: specifying average(lb1(g1)ub1,..., lbN(gN)ubN, report) returns estimates for each subset as well as estimates over their union. Sub-option: together with varcoef, specify the support for kernel weights in a varying coefficients model.

generate(varname [, replace])
    Save the conditional rank estimates to varname in the working dataset; this option is ignored when bootstrapping.

userank(varname)
    Use varname as the conditional rank statistic, bypassing rank estimation.

savecoef(filename)
    Creates a comma delimited (csv) dataset of the local rank-specific coefficient estimates, saved to filename.

varcoef(varlist)
    Estimate a varying coefficients model, in which coefficients are conditioned on covariates specified in varlist as an alternative to conditioning on the ranks of the basic endogenous variables varlist_edg. Options average and bandwidth are required with varcoef.

noconstant
    Suppress the constant term of the model.

5 Using ivcrc to Estimate the Returns to Schooling

In this section, we use the ivcrc module to estimate the returns to schooling. Our discussion builds off of Card (1994, 2001) and Heckman and Vytlacil (1998), who note that a simple model of optimal schooling decisions (such as Becker 1975) would generate a CRC model like (1) or (7). In the notation of (7), $Y$ would be a labor market outcome (e.g., wages) and $X$ would be a measure of educational attainment (e.g., years of completed schooling). There are several reasons to expect confounding factors that make the direct relationship between $X$ and $Y$ a poor indicator of the causal effect. Some of these determinants, like family background characteristics, can be observed in data and controlled for. However, the fact that individuals have some choice over their education also suggests that some important confounding factors are inherently unobservable. Instrumental variable strategies have been widely used to tackle this self-selection problem.
Some instruments that have been used are compulsory schooling laws (Angrist and Krueger 1991; Oreopoulos 2006), the distance a teenager lives from a college (Card 1993; Mountjoy 2019), and local labor market conditions (Cameron and Heckman 1998).[4] The argument underlying these strategies is that the proposed instrument affects an individual’s educational attainment by shifting the costs and/or benefits involved, but does not itself directly affect labor market outcomes and is uncorrelated with any other factors that do. The CRC model discussed in Section 2 allows for the effect that an instrument has on an individual’s educational choice to be correlated with their returns to schooling. So, for example, those whose education choice is affected by distance might have systematically different returns to schooling than those whose choice is not.

Our analysis uses the same data and regression specification as Card (1993) and Kling (2001). The data are available as part of Cameron and Trivedi’s (2009) textbook on Stata for microeconometrics. They are an extract from the National Longitudinal Survey of Young Men (NLSYM) that consists of 3,010 men who were aged 14–24 in 1966. The extract contains variables from both 1966 and a follow-up survey in 1976. The data, as well as the code for the following analysis, are available at https://github.com/a-torgovitsky/ivcrc.

[4] See also Carneiro, Heckman, and Vytlacil (2011) for an IV strategy that uses multiple types of instruments.

The outcome variable is wage76, the individual’s log hourly wage in 1976. The primary endogenous variable of interest is grade76, the individual’s highest level of schooling measured in years. The regression includes two additional endogenous variables: years of potential work experience in 1976, exp76, and squared experience, expsq76. Potential work experience is defined as exp76 = age76 − grade76 − 6, following the standard convention for Mincer equations (Mincer 1958, 1974). To avoid perfect collinearity, we follow the literature by including experience in the outcome equation and age in the first stage equation. Since potential experience is a derived endogenous variable, this convention also allows us to illustrate the module’s dendog option.

We include a set of sociodemographic controls for race (black), parents’ education (daded, momed, famed1-8), family structure at age 14 (momdad14, sinmom14), and geographic region (smsa66, smsa76, reg1-reg8).[5] While not essential to demonstrating the usage of the ivcrc module, the inclusion of these controls shows that the semiparametric estimator does not suffer from the curse of dimensionality. Including these controls also replicates the regression specification in Card (1993) and Kling (2001), which allows us to compare the ivcrc module to popular alternative estimators like 2SLS (e.g., ivregress).

[5] For readability, we collect these into a local macro ControlVars in the do file for this exercise.

We begin by estimating a linear regression of log wages on schooling, potential work experience, and demographic control variables. This type of regression is often referred to as a Mincer equation; see Heckman, Lochner, and Todd (2006a) for an in-depth discussion. The estimates indicate that an additional year of schooling is associated with approximately a 7.25 percent increase in 1976 wages:

    . reg wage76 grade76 exp76 expsq76 `ControlVars', robust

    Linear regression                               Number of obs     =      3,010
                                                    F(27, 2982)       =      52.45
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.3040
                                                    Root MSE          =     .37191
    ------------------------------------------------------------------------------
                 |               Robust
          wage76 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0725423   .0038685    18.75   0.000     .0649572    .0801275
    ...

As discussed above, education is a choice variable that is likely correlated with latent factors that affect wages, even after controlling for sociodemographic characteristics. Card (1993) used an indicator for living (at age 14) in a county with a four-year college as an instrument for education. Proximity to a four-year college is associated with about a third of a grade higher educational attainment:

    . reg grade76 col4 `ControlVars', robust

    Linear regression                               Number of obs     =      3,010
                                                    F(25, 2984)       =      53.50
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2937
                                                    Root MSE          =     2.2591
    ------------------------------------------------------------------------------
                 |               Robust
         grade76 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            col4 |   .3669905   .1023706     3.58   0.000     .1662663    .5677147
    ...

In order for college proximity to be a valid instrument, it should have no direct effect on wages in 1976 and also be uncorrelated with other factors that are correlated with wages or schooling decisions after conditioning on observables. There are several reasons to be suspicious of this requirement; see for example Kling (2001), and see Mountjoy (2019) for a modern discussion with richer geographic data. Here we simply compare estimators and take the validity of the college proximity instrument for granted. The 2SLS estimator suggests that an additional year of schooling causes about a 13.33 percent increase in 1976 wages:

    . ivregress 2sls wage76 (grade76 exp76 expsq76 = col4 age76 agesq76) ///
    > `ControlVars', perfect

    Instrumental variables (2SLS) regression        Number of obs     =      3,010
                                                    Wald chi2(27)     =    1007.25
                                                    Prob > chi2       =     0.0000
                                                    R-squared         =     0.2030
                                                    Root MSE          =     .39614
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .1333034   .0493359     2.70   0.007     .0366068    .2300001
    ...

This interpretation presumes that the causal effect of schooling on wages is constant. It yields the potentially puzzling conclusion that the raw association between education and wages substantially understates the causal effect of education on wages. As Card (2001) documents, this conclusion about the returns to schooling is fairly common across diverse studies that use a variety of IV strategies and data sources. One explanation proposed by Card (2001) is that this arises from a failure to account for heterogeneity in the causal effect of schooling on wages. We can use the ivcrc module to assess this explanation. The syntax is similar to that for the IV estimator:

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76)
    (default settings do not compute standard errors, see bootstrap() option)
    (estimating the conditional rank of grade76)
    (estimating rule-of-thumb bandwidth)
    (estimating beta(r) at each r[i] rank in the sample)

    IVCRC                                           Number of obs     =      3,010
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0852767          .        .       .            .           .
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .0370327

We treat potential experience, exp76, as a derived endogenous variable here because it is defined as a deterministic function of grade76 and age76. Whereas the coefficient on grade76 reported by the standard linear regression estimator implemented by ivregress will estimate a difficult-to-interpret quantity like (3), the coefficient on grade76 reported by ivcrc is an estimator of the average causal effect of a one year increase in grade76.
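The steps named in the ivcrc log above (estimate a conditional rank, then estimate a local regression at each rank, then average) can be illustrated outside Stata. What follows is only a rough Python sketch on simulated data, not the module's implementation: the binary instrument, the within-cell empirical rank, the fixed bandwidth of 0.05, and every function and variable name are assumptions made for illustration.

```python
import numpy as np

def conditional_rank(x, z):
    """Empirical rank of x within each cell of a discrete instrument z, in (0, 1].
    (Assumed simplification; the module's own rank step is not reproduced here.)"""
    r = np.empty(len(x))
    for cell in np.unique(z):
        idx = np.flatnonzero(z == cell)
        order = np.argsort(np.argsort(x[idx]))  # 0-based rank within the cell
        r[idx] = (order + 1) / len(idx)
    return r

def epanechnikov(u):
    """Epanechnikov kernel weights, zero outside |u| < 1."""
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def local_slope(y, x, r, r0, h):
    """Kernel-weighted least-squares slope of y on x near rank r0."""
    w = epanechnikov((r - r0) / h)
    xbar = np.sum(w * x) / np.sum(w)
    ybar = np.sum(w * y) / np.sum(w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)

def crc_average_slope(y, x, z, h):
    """Average the local slopes over the estimated ranks in the sample."""
    r = conditional_rank(x, z)
    return np.mean([local_slope(y, x, r, r0, h) for r0 in r])

# Hypothetical simulated design: the random slope b1 is correlated with x via u.
rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 2, n)          # binary instrument
u = rng.standard_normal(n)         # unobserved heterogeneity
x = 2.0 * z + u                    # endogenous regressor
b0 = 0.5 * u                       # random intercept
b1 = 1.0 + 0.5 * u                 # random slope with mean 1
y = b0 + b1 * x + 0.1 * rng.standard_normal(n)

ols_slope = np.polyfit(x, y, 1)[0]
crc_slope = crc_average_slope(y, x, z, h=0.05)
print(f"OLS slope: {ols_slope:.3f}, averaged local slope: {crc_slope:.3f}")
```

In this simulated design the average of the random slope is 1 by construction, so the averaged local slope should land near 1 while the ordinary least-squares slope is pulled upward by the correlation between the slope and the regressor.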
The causal effect estimated here of 8.53 percent is substantially lower than the linear IV estimate of 13.33 percent. This supports Card’s (2001) reasoning if, as he argues, the usual linear IV estimator places more weight on individuals with higher returns to schooling. The ivcrc estimate is also somewhat larger than the linear regression coefficient 0.0725.

We now demonstrate some of the options for ivcrc by evaluating the statistical significance and robustness of the above ATE estimate. First, we compute standard errors, which tends to be time-consuming due to the necessity of using the bootstrap. The syntax and results are:

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020))
    (estimating rule-of-thumb bandwidth)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0852767   .0267166     3.19   0.001     .0329131    .1376403
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .0370327

The confidence interval here is wider than for the linear regression estimator, although substantially narrower than for the usual linear IV estimator. The textbook IV estimator and the ivcrc estimates are constructed under non-nested assumptions, so this by itself is not unexpected. However, since the bandwidth controls a bias-variance trade-off in the ivcrc estimator, it does suggest that we may want to explore bandwidths other than the default rule-of-thumb ($\hat{h}_{ROT} = 0.037$) in order to guard against potential bias due to oversmoothing. So next we evaluate the point estimates at several specified bandwidths:

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bandwidth(.025, .05, .075)
    (default settings do not compute standard errors, see bootstrap() option)
    (estimating the conditional rank of grade76)
    (estimating beta(r) at each r[i] rank in the sample)

    IVCRC                                           Number of obs     =      3,010
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0869784          .        .       .            .           .
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .025
    ------------------------------------------------------------------------------
         grade76 |   .0807563          .        .       .            .           .
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .05
    ------------------------------------------------------------------------------
         grade76 |   .0779116          .        .       .            .           .
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .075

The estimate is relatively stable over different bandwidths, but does decline somewhat as the local estimates $\hat{\beta}(r)$ are computed using larger neighborhoods of r. Obtaining standard errors and confidence intervals using the smallest bandwidth in this list,

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0869784    .029612     2.94   0.003     .0289401    .1450168
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .025

we find a slightly larger standard error and a wider confidence interval, as anticipated. Though more comparable to the standard error and confidence interval from the linear IV model, the ivcrc standard error remains roughly 1.7 times smaller (0.0296 versus 0.0493) even at this smaller bandwidth.
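The z statistics, p-values, and intervals in the bootstrap output above appear consistent with the usual normal approximation around the point estimate, using the bootstrap standard error. That reading is inferred from the reported numbers rather than stated in this section, so the short Python check below should be taken as a sketch under that assumption; the function name is hypothetical.

```python
from statistics import NormalDist

def normal_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-approximation interval."""
    z = coef / se
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, coef - crit * se, coef + crit * se

# Coefficient and bootstrap standard error reported for the rule-of-thumb bandwidth.
z, p, lo, hi = normal_summary(0.0852767, 0.0267166)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lo:.7f}, {hi:.7f}]")
# → z = 3.19, p = 0.001, CI = [0.0329131, 0.1376403]
```

The same function applied to the bandwidth(.025) row (coefficient .0869784, standard error .029612) gives a z statistic of about 2.94, in line with the reported output.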

The number of quantiles used to approximate the integral in (9) and the functional form of the kernel weights K could in principle also impact the ivcrc estimates. Quadrupling the number of quantiles from its default of 50 while carrying forward the smaller bandwidth from above,

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025) ///
    > ranks(200)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |    .078291   .0320328     2.44   0.015     .0155079    .1410741
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .025

we find that the results are not very sensitive to how finely the integral in (8) is approximated. Swapping a uniform kernel for the (default) Epanechnikov kernel, while carrying forward a smaller bandwidth and more accurate rank estimation from above,

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025) ///
    > ranks(200) kernel(uniform)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0780691   .0302556     2.58   0.010     .0187693     .137369
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .025

we find that the results are also not sensitive to the functional form of the kernel, in concordance with the usual folklore for nonparametric kernel regression.

One interesting way to explore both the robustness and potential explanations for our finding is to change the set R over which the average is being taken. By default, ivcrc averages over all estimated conditional ranks ($\hat{R}_i$) directly as in (6). Alternatively, if we are concerned about results being driven by outliers in the conditional distribution of education, we can specify R to be [.05,.95]. Trimming the distribution in this way, while maintaining the smaller bandwidth and more accurate rank estimation from above,

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025) ///
    > ranks(200) average(.05(0).95)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0735619   .0345893     2.13   0.033     .0057681    .1413558
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [.05,.95] rank subset; Bandwidth = .025

we obtain slightly lower estimated returns to education and a slightly larger standard error, but overall similar results to the estimates which used the full observed conditional distribution of education.

When there is a single basic endogenous variable, as in the present application, another check on the estimates is to use numerical integration based on (10). Specifying an equally spaced grid with steps of .01 over the outlier-trimmed region [.05,.95] from above,

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025) ///
    > ranks(200) average(.05(.01).95)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0730978   .0341265     2.14   0.032     .0062111    .1399845
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [.05,.95] rank subset; Bandwidth = .025

we obtain estimates that are nearly identical to those obtained using the default sample average method, (6).

We can also consider smaller sets of R to explore heterogeneity in the return to schooling. For example, an estimate for individuals in the lower half of the education distribution is:

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025) ///
    > ranks(200) average(0(0).5)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .1038839   .0531316     1.96   0.051    -.0002521    .2080199
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,.5] rank subset; Bandwidth = .025

This suggests that individuals with lower schooling have higher returns to schooling. Specifying a set for each quartile of the education distribution reveals a pattern that supports this explanation, while indicating potentially more nuance,

    . ivcrc wage76 (grade76 = col4 age76 agesq76) `ControlVars', ///
    > dendog(exp76 expsq76) bootstrap(reps(100) seed(5282020)) bandwidth(.025) ///
    > ranks(200) average(0(0).25, .2501(0).5, .5001(0).75, .7501(0)1, report)
    (running _ivcrc_estimator on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    IVCRC                                           Number of obs     =      3,010
                                                    Replications      =        100
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |    .078291   .0320328     2.44   0.015     .0155079    .1410741
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,1] rank subset; Bandwidth = .025
    ------------------------------------------------------------------------------
         grade76 |   .0651947   .0616646     1.06   0.290    -.0556657    .1860551
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [0,.25] rank subset; Bandwidth = .025
    ------------------------------------------------------------------------------
         grade76 |   .1418537   .1006071     1.41   0.159    -.0553327      .33904
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [.2501,.5] rank subset; Bandwidth = .025
    ------------------------------------------------------------------------------
         grade76 |   .0264401    .066345     0.40   0.690    -.1035938    .1564739
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [.5001,.75] rank subset; Bandwidth = .025
    ------------------------------------------------------------------------------
         grade76 |   .0787306   .0512649     1.54   0.125    -.0217468    .1792081
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over R = [.7501,1] rank subset; Bandwidth = .025

The module first displays the estimate taken over the union of the given sets, in this case the overall sample average. Then ivcrc reports the estimates over each subset that we specified in average. We find that the estimates of the returns to schooling vary across the education distribution, the second quartile exhibiting large returns that are comparable to the linear IV estimate. However, the estimates are less precisely estimated than the average return using the entire sample, which reflects the fact that each subset only uses approximately one fourth of the number of effective observations.

We conclude with an illustration of varying coefficient estimation with ivcrc. Cai (2010) suggests that the relationship between education and wages might be increasing in work experience. We can explore this hypothesis using the varcoef option, which allows the linear coefficients to be nonparametric functions of experience. To use varcoef we must also specify the average option. To obtain the support points of experience for average, we first quietly calculate summary statistics. Then we call varcoef using the savecoef option to store the experience-conditioned coefficient estimates.
Loading the dataset stored with savecoef, we finally plot our estimates of the coefficient on education as a function of experience.

    . qui sum exp76
    . ivcrc wage76 grade76 `ControlVars' , ///
    > varcoef(exp76) bandwidth(4) ///
    > average(`r(min)'(0)`r(max)') savecoef(varcoef_exp)
    (default settings do not compute standard errors, see bootstrap() option)
    (estimating beta(exp76) at each exp76[i] in the sample)

    IVCRC                                           Number of obs     =      3,010
    ------------------------------------------------------------------------------
          wage76 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         grade76 |   .0602117          .        .       .            .           .
    ...
    ------------------------------------------------------------------------------
    Note: Average coefficients over = [0,23]; Bandwidth = 4
    .
    . import delimited "varcoef_exp.csv", colrange(2:3) clear
    (2 vars, 3,010 obs)
    ...
    . local plotopt title("") scheme(s2mono) graphregion(color(gs16) ilc(gs16)) ///
    > plotregion(lc(gs0)) ylabel(, angle(0) labsize(4)) legend(off) ///
    > xtitle("Years of Experience", size(4)) ytitle("Coefficient on Education", size(4))
    . twoway (scatter grade76 exp76, msize(small) mcolor(black)) ///
    > (lpoly grade76 exp76 , degree(0) bwidth(1) lcolor(black) `plotopt')

Figure 1: Varying Coefficient Estimation – Coefficients as a Function of Experience
[Figure: horizontal axis, Years of Experience (0 to 25); vertical axis, Coefficient on Education (0 to .08).]

The coefficient on education is positive for all levels of experience, and averages about 6 percent per year of education across experience levels. The coefficient decreases as experience increases for both small and large values of experience. However, as suggested by Cai (2010), the coefficient on education is increasing over the 10th to 90th percentiles (4 to 15 years) of experience.
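The idea behind a varying coefficients regression of this kind can be sketched as a kernel-weighted least-squares fit repeated at each value of the conditioning variable. The Python sketch below runs on simulated data and is not the module's code: the data-generating process, the variable names, and the evaluation grid are all hypothetical, and only the Epanechnikov kernel and the bandwidth of 4 are borrowed from the text.

```python
import numpy as np

def varying_coef(y, X, w, w0, h):
    """Local-constant varying-coefficient fit at w0: weighted least squares
    of y on the columns of X, with Epanechnikov weights in (w - w0) / h."""
    u = (w - w0) / h
    k = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)
    sw = np.sqrt(k)  # scaling rows by sqrt(weight) turns OLS into weighted LS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical simulated data: the slope on education rises with experience.
rng = np.random.default_rng(1)
n = 5000
exper = rng.uniform(0, 23, n)
educ = rng.normal(13, 2.5, n)
true_slope = 0.04 + 0.002 * exper
logwage = 5.0 + true_slope * educ + 0.2 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), educ])   # constant and education
grid = np.arange(2, 22, 4.0)              # evaluation points: 2, 6, 10, 14, 18
slopes = np.array([varying_coef(logwage, X, exper, w0, h=4.0)[1] for w0 in grid])

for w0, b in zip(grid, slopes):
    print(f"experience = {w0:4.1f}: slope on education = {b:.4f}")
```

Averaging such local slopes over the sample values of the conditioning variable gives a single summary coefficient, which is the kind of quantity reported in the output table above.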

6 Conclusion

In this paper we have discussed the CRC model, which is a parsimonious IV model that explicitly incorporates heterogeneous treatment effects. The ivcrc module for Stata implements an estimator for the CRC model. The estimator can be used to carefully analyze heterogeneous treatment effects in a way that the usual linear estimator implemented by ivregress cannot. Because the estimator is based on averaging conditional linear regressions, it scales well and incorporates covariates easily, making it attractive from a practical perspective.

There are a few important limitations to the estimator that are worth mentioning both as caveats and as directions for future research. First, the conditions described in Section 2.3 to construct the control function, R, from the instrument, Z, require the endogenous variable, X, to be continuously distributed. The extent to which these methods fail for discrete endogenous variables is an interesting theoretical question that may have important implications for practice. Second, the ivcrc module implements a rule-of-thumb bandwidth. There may exist other, superior methods for automated bandwidth selection. Third, defining, detecting, and correcting for weak instruments is an unexplored topic for the CRC model.

References

Angrist, J. D., K. Graddy, and G. W. Imbens (2000): “The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish,” The Review of Economic Studies, 67, 499–527.
Angrist, J. D. and G. W. Imbens (1995): “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity,” Journal of the American Statistical Association, 90, 431–442.
Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996): “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444–455.
Angrist, J. D. and A. B. Krueger (1991): “Does Compulsory School Attendance Affect Schooling and Earnings?” The Quarterly Journal of Economics, 106, 979–1014.
Becker, G. (1975): “Human Capital and The Personal Distribution of Income: An Analytical Approach,” in Human Capital, New York: Columbia University Press, second ed.
Becker, G. S. and B. R. Chiswick (1966): “Education and the Distribution of Earnings,” The American Economic Review, 56, 358–369.
Blundell, R. W. and J. L. Powell (2004): “Endogeneity in Semiparametric Binary Response Models,” The Review of Economic Studies, 71, 655–679.
Cai, Z. (2010): “Functional Coefficient Models for Economic and Financial Data,” The Oxford Handbook of Functional Data Analysis.
Cameron, A. C. and P. K. Trivedi (2009): Microeconometrics Using Stata, Stata Press.
Cameron, S. V. and J. J. Heckman (1998): “Life Cycle Schooling and Dynamic Selection Bias: Models and Evidence for Five Cohorts of American Males,” Journal of Political Economy, 106, 262–333.
Card, D. (1993): “Using Geographic Variation in College Proximity to Estimate the Return to Schooling,” NBER Working Paper No. 4483.
Card, D. (1994): “Earnings, Schooling, and Ability Revisited,” NBER Working Paper No. 4832.
Card, D. (2001): “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems,” Econometrica, 69, 1127–1160.
Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011): “Estimating Marginal Returns to Education,” American Economic Review, 101, 2754–81.
Chay, K. Y. and M. Greenstone (2005): “Does Air Quality Matter? Evidence from the Housing Market,” Journal of Political Economy, 113, 376–424.
Chen, X., O. Linton, and I. van Keilegom (2003): “Estimation of Semiparametric Models When the Criterion Function Is Not Smooth,” Econometrica, 71, 1591–1608.

Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2010): “Quantile and Probability Curves Without Crossing,” Econometrica, 78, 1093–1125.
Chernozhukov, V. and C. Hansen (2005): “An IV Model of Quantile Treatment Effects,” Econometrica, 73, 245–261.
Cleveland, W., E. Grosse, and W. Shyu (1991): “Local Regression Models,” in Statistical Models in S, ed. by J. Chambers and T. Hastie, Chapman & Hall, London, chap. 8, 309–376.
Conway, K. S. and T. J. Kniesner (1991): “The Important Econometric Features of a Linear Regression Model with Cross-Correlated Random Coefficients,” Economics Letters, 35, 143–147.
Fan, J. and I. Gijbels (1996): Local Polynomial Modelling and Its Applications, Chapman-Hall.
Fan, J. and W. Zhang (1999): “Statistical Estimation in Varying Coefficient Models,” Annals of Statistics, 1491–1518.
Fan, J. and W. Zhang (2008): “Statistical Methods with Varying Coefficient Models,” Statistics and its Interface, 1, 179.
Florens, J. P., J. J. Heckman, C. Meghir, and E. Vytlacil (2008): “Identification of Treatment Effects Using Control Functions in Models With Continuous, Endogenous Treatment and Heterogeneous Effects,” Econometrica, 76, 1191–1206.
Garen, J. (1984): “The Returns to Schooling: A Selectivity Bias Approach with a Continuous Choice Variable,” Econometrica, 52, 1199.
Gollin, D. and C. Udry (2020): “Heterogeneity, Measurement Error, and Misallocation: Evidence from African Agriculture,” Journal of Political Economy (forthcoming).
Hansen, B. E. (2021): Econometrics, Princeton University Press (forthcoming).
Hastie, T. and R. Tibshirani (1993): “Varying-Coefficient Models,” Journal of the Royal Statistical Society. Series B (Methodological), 757–796.
Heckman, J. and E. Vytlacil (1998): “Instrumental Variables Methods for the Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling When the Return is Correlated with Schooling,” The Journal of Human Resources, 33, 974–987.
Heckman, J. J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47, 153–161.
Heckman, J. J., L. J. Lochner, and P. E. Todd (2006a): “Earnings Functions, Rates of Return and Treatment Effects: The Mincer Equation and Beyond,” in Handbook of the Economics of Education, ed. by E. Hanushek and F. Welch, Elsevier, vol. 1, chap. 7, 307–458.
Heckman, J. J. and R. Robb (1985): “Alternative Methods for Evaluating the Impact of Interventions: An Overview,” Journal of Econometrics, 30, 239–267.

Heckman, J. J., S. Urzua, and E. Vytlacil (2006b): “Understanding Instrumental Variables in Models with Essential Heterogeneity,” The Review of Economics and Statistics, 88, 389–432.
Hurwicz, L. (1950): “Systems with Nonadditive Disturbances,” in Statistical Inference in Dynamic Economic Models, ed. by T. Koopmans, no. 10 in Cowles Commission Monographs, 410–418.
Imbens, G. W. and J. D. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–475.
Imbens, G. W. and W. K. Newey (2009): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,” Econometrica, 77, 1481–1512.
Kling, J. R. (2001): “Interpreting Instrumental Variables Estimates of the Returns to Schooling,” Journal of Business & Economic Statistics, 19, 358–364.
Koenker, R. (2005): Quantile Regression, Cambridge University Press.
Masten, M. A. and A. Torgovitsky (2014): “Instrumental Variables Estimation of a Generalized Correlated Random Coefficients Model,” cemmap working paper 02/14.
Masten, M. A. and A. Torgovitsky (2016): “Identification of Instrumental Variable Correlated Random Coefficients Models,” The Review of Economics and Statistics, 98, 1001–1005.
Mincer, J. (1958): “Investment in Human Capital and Personal Income Distribution,” Journal of Political Economy, 66, 281–302.
Mincer, J. (1974): Schooling, Experience, and Earnings, NBER Press.
Morales-Mosquera, M. (2019): “The Economic Value of Crime Control: Evidence From a Large Investment on Police Infrastructure in Colombia,” Unpublished draft, Harris School of Public Policy at the University of Chicago.
Mountjoy, J. (2019): “Community Colleges and Upward Mobility,” Working paper.
Newey, W. K. (1994): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica, 62, 1349–1382.
O’Hagan, A. and J. Kingman (1978): “Curve Fitting and Optimal Design for Prediction,” Journal of the Royal Statistical Society. Series B (Methodological), 1–42.
Oreopoulos, P. (2006): “Estimating Average and Local Average Treatment Effects of Education When Compulsory Schooling Laws Really Matter,” The American Economic Review, 96, 152–175.
Park, B. U., E. Mammen, Y. K. Lee, and E. R. Lee (2015): “Varying Coefficient Regression Models: A Review and New Developments,” International Statistical Review, 83, 36–64.
Rubin, H. (1950): “Note on Random Coefficients,” in Statistical Inference in Dynamic Economic Models, ed. by T. Koopmans, no. 10 in Cowles Commission Monographs, 419–421.

Smith, R. J. and R. W. Blundell (1986): “An Exogeneity Test for a Simultaneous Equation Tobit Model with an Application to Labor Supply,” Econometrica, 54, 679–685.
Torgovitsky, A. (2015): “Identification of Nonseparable Models Using Instruments With Small Support,” Econometrica, 83, 1185–1197.
Wald, A. (1947): “A Note on Regression Analysis,” The Annals of Mathematical Statistics, 18, 586–589.
Wooldridge, J. M. (1997): “On Two Stage Least Squares Estimation of the Average Treatment Effect in a Random Coefficient Model,” Economics Letters, 56, 129–133.
——— (2003): “Further Results on Instrumental Variables Estimation of Average Treatment Effects in the Correlated Random Coefficient Model,” Economics Letters, 79, 185–191.
——— (2008): “Instrumental Variables Estimation of the Average Treatment Effect in Correlated Random Coefficient Models,” in Modeling and Evaluating Treatment Effects in Econometrics, ed. by D. Millimet, J. Smith, and E. Vytlacil, Elsevier.

Cite this document
APA
David Benson, Matthew A. Masten, & Alexander Torgovitsky (2022). ivcrc: An Instrumental Variables Estimator for the Correlated Random Coefficients Model (FEDS 2020-046). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2020-046
BibTeX
@techreport{wtfs_feds_2020_046,
  author = {David Benson and Matthew A. Masten and Alexander Torgovitsky},
  title = {ivcrc: An Instrumental Variables Estimator for the Correlated Random Coefficients Model},
  type = {Finance and Economics Discussion Series},
  number = {2020-046},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2022},
  url = {https://whenthefedspeaks.com/doc/feds_2020-046},
  abstract = {We discuss the ivcrc module, which implements an instrumental variables (IV) estimator for the linear correlated random coefficients (CRC) model. The CRC model is a natural generalization of the standard linear IV model that allows for endogenous, multivalued treatments and unobserved heterogeneity in treatment effects. The estimator implemented by ivcrc uses recent semiparametric identification results that allow for flexible functional forms and permit instruments that may be binary, discrete, or continuous. The ivcrc module also allows for the estimation of varying coefficients regressions, which are closely related in structure to the proposed IV estimator. We illustrate use of ivcrc by estimating the returns to education in the National Longitudinal Survey of Young Men.},
}