Parallel Trends Forest: Data-Driven Control Sample Selection in Difference-in-Differences
Abstract
This paper introduces the parallel trends forest, a novel approach to constructing optimal control samples when using difference-in-differences (DiD) in relatively long panel data with little randomization in treatment assignment. Our method uses machine learning techniques to construct an optimal control sample that best meets the parallel trends assumption. We demonstrate that our approach outperforms existing methods, particularly with noisy, granular data. Applying the parallel trends forest to analyze the impact of post-trade transparency in corporate bond markets, we find that it produces more robust estimates than traditional two-way fixed effects models. Our results suggest that the effect of transparency on bond turnover is small and not statistically significant when allowing for constrained deviations from parallel trends. This method offers researchers a powerful tool for conducting more reliable DiD analyses in complex, real-world settings.
Finance and Economics Discussion Series Federal Reserve Board, Washington, D.C. ISSN 1936-2854 (Print) ISSN 2767-3898 (Online) Parallel Trends Forest: Data-Driven Control Sample Selection in Difference-in-Differences Yesol Huh and Matthew Vanderpool Kling 2025-091 Please cite this paper as: Huh, Yesol, and Matthew Vanderpool Kling (2025). “Parallel Trends Forest: Data-Driven Control Sample Selection in Difference-in-Differences,” Finance and Economics Discussion Series 2025-091. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2025.091. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.
Yesol Huh and Matthew Vanderpool Kling
Federal Reserve Board
September 23, 2025

∗ The views of this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any other person associated with the Federal Reserve System. Federal Reserve Board, 20th St. and Constitution Avenue, NW, Washington, DC, 20551. Please send comments to yesol.huh@frb.gov
1 Introduction

Difference-in-differences (DiD) is one of the most frequently used methodologies in economics and social sciences for establishing causality and estimating the impact of policy changes. Its effectiveness relies critically on the parallel trends assumption: the notion that, in the absence of treatment, the difference between the treatment and control groups would have remained constant over time. Because this critical assumption involves unobserved values, that is, what the outcome values would be for the treated sample if there were no treatment, it is nearly impossible to prove that the parallel trends assumption holds within empirical work.

Researchers have used different approaches to convince their audience that the parallel trends assumption holds. The most robust is using randomness in treatment assignment. At the extreme, if treated units were decided completely randomly, the treated group and control group would be similar otherwise, and with enough units parallel trends would hold. As long as there is some randomness in assignment (for instance, treatment assignment is random conditional on certain covariates), researchers can select treatment and control samples for which the assignment would be random within those samples.

Unfortunately, because we rarely get to oversee an experiment in economics, many cases in which researchers would want to use DiD, such as a policy change for a subset of the population, have zero or very little randomness in treatment assignment. Because natural experiments are fairly rare to begin with, we may still want to exploit the setting as much as possible to gain empirical knowledge. In this case, researchers often pick treatment and control samples that look relatively similar to each other in terms of important characteristics, plot the time series in the pre-treatment period, and argue that the series roughly follow parallel trends with one another.
Recent literature has used the pre-treatment data to formally test whether the treatment and control data have different trends and proposed allowing explicitly for deviations from parallel trends (Rambachan and Roth, 2023).

In this paper, we address the aforementioned challenge by introducing a novel approach to estimating treatment effects called the parallel trends forest. The goal of the parallel trends forest is to find the optimal control sample for which the outcome variable moves in parallel with the treated sample in the absence of the treatment. We estimate the optimal control sample using the pre-treatment data and assume that the parallel movements would continue in the absence of treatment. Specifically, for each treatment unit, we find a set of optimal weights on the control units such that the weighted average outcome of the control units moves in parallel with the outcome of the treatment unit. Our approach builds on the recent literature on synthetic control (Abadie and Gardeazabal, 2003; Abadie et al., 2010) and synthetic DiD (Arkhangelsky
et al., 2021) in that the optimal control is assumed to be a linear combination of the control sample and the weights are estimated using the pre-treatment period. Methodologically, we use random forests, allowing us to work with a large number of covariates and a large number of treatment and control units. We also introduce a different measure of deviation from “perfect” parallel trends that works with noisy granular data. Using random forests also has the added advantage of automatically selecting covariates that turn out to be important in how the outcome variable moves, rather than selecting on covariates that the researchers assume are important.

We then demonstrate the performance of the parallel trends forest using the introduction of post-trade transparency in corporate bond markets. Between 2002 and 2005, the Financial Industry Regulatory Authority (FINRA) phased in post-trade transparency in which trade information such as price and quantity became available to market participants shortly after the trade. Because bonds were divided into several groups and each group was phased in at different times, this setup allows for a natural experiment through which we can study the impact of post-trade transparency. However, phase lists were determined by issue size, ratings, and turnover, which are variables that are generally considered to impact liquidity, the outcome variable of interest, and had very little randomness built in. Nevertheless, given the importance of the question and the lack of other natural experiments that can study the impact of transparency on liquidity, many academic papers have used this event (Edwards et al., 2007; Bessembinder et al., 2006; Asquith et al., 2019). The data are granular, noisy, and tend to have non-normal distributions, especially for outcome variables such as weekly trading volume or turnover, which have a high share of zeros.
We first show, using a placebo test, that the parallel trends forest works better than existing methods such as synthetic control, synthetic DiD, and matrix completion (Athey et al., 2021). The existing methods either fail to produce an optimal control that fits the treated sample closely in the pre-treatment period or fit perfectly in-sample but perform poorly out-of-sample. We further show via Monte Carlo simulations that the parallel trends forest does almost as well as a two-way fixed effects estimator that already knows the correct control sample.

We then use one of the phases in the post-trade transparency introduction to further illustrate the parallel trends forest and compare it with the two-way fixed effects (TWFE) estimator. The TWFE estimator, which is the estimate from a pooled regression with unit fixed effects and time fixed effects, is what most applied researchers use when employing a DiD method. We study the impact of post-trade transparency on bond turnover for Phase 2, the phase in which medium-sized bonds with high ratings became transparent. The TWFE estimator, when using various “reasonable” control samples, gives different estimates and sometimes
even flips sign depending on the control sample used. Thus, an algorithm to select the optimal control in a data-driven way is important. Using the parallel trends forest, we show that the magnitude of the average treatment effect is smaller than what is estimated through TWFE. Furthermore, if we explicitly allow for constrained deviations from parallel trends (Rambachan and Roth, 2023), we show that the treatment effect is not statistically significant. We also illustrate a parallel trends forest with honesty, which builds a tree on one sample and estimates the weights using the constructed tree and data from another sample. This prevents the algorithm from overfitting in the pre-treatment period.

Overall, the parallel trends forest proposed in this paper is a data-driven method for selecting the optimal control that most closely moves in parallel with the treated sample. While a large-scale randomized trial would be the ideal setting, in cases with zero or little randomization in treatment assignment our method can help researchers pick the best control sample for DiD studies.

In the literature, synthetic control (Abadie and Gardeazabal, 2003; Abadie et al., 2010), synthetic DiD (Arkhangelsky et al., 2021), and matrix completion (Athey et al., 2021) are most closely related to this paper in that they use the pre-treatment period to find the set of weights on the control sample that would most closely fit the treatment sample in the absence of a treatment. Synthetic control and synthetic DiD are generally meant for a small number of highly aggregated units and optimize over the weights directly, which is difficult when the universe of control units is large. Also, with granular data, allowing the algorithm to use covariates to find which set of units follow the parallel trends more closely leads to less overfitting than optimizing the weights directly.
Lastly, as we will outline in Section 2, we use a different objective function that is more effective than the usual sum of squared errors for granular, non-normally distributed data.

Methodologically, our paper is most closely related to the causal forest developed in Wager and Athey (2018) and Athey et al. (2019). However, the objective of the causal forest is completely different from that of our parallel trends forest. The goal in causal forests is to estimate heterogeneous treatment effects, so trees are split such that units with similar treatment effects would be in the same leaf. In our parallel trends forest, by contrast, the goal is to group units by those that have parallel trends with one another. We adopt many of the techniques from causal forests, namely “honesty” and the calculation of weights from the forest.

Our paper is also related to an emerging literature in economics that studies the DiD methodology more carefully.1 A growing number of papers carefully analyze whether parallel trends hold in pre-treatment data before running DiD; for instance, see He and Wang (2017). However, Roth (2022) shows that if the parallel trends assumption is violated, then using only the samples that pass the pretests may bias the treatment effect estimates even more. Because we select the optimal control sample such that it moves roughly in parallel with the treated sample, this critique on overfitting could apply to our method. There are two aspects of our methodology that help address this concern. First, overfitting is relatively mild in random forests due to averaging across many trees and using a subset of possible covariates at every split, especially when compared to simply choosing units that pass the pretests. Second, in Section 4.5, we develop a parallel trends forest with honesty, which builds the tree on one sample and estimates the weights on the other. This further addresses overfitting.

1 For a more comprehensive review on DiD, see Roth et al. (2023) and Baker et al. (2025).

DiD methodology is intimately related to the potential outcomes framework. Both use a treatment and a control sample to estimate the causal effect from some treatment. The potential outcomes framework usually employs a much stricter approach of aiming to approximate a randomized experiment (Rubin, 2008). A critical assumption in this “design-based” approach is that the probability of being treated is bounded away from zero and one; that is, there is randomness in treatment assignment. If the probability of treatment differs between the treatment and control groups, propensity score matching can be used to achieve balance. On the other hand, DiD hinges on a much weaker parallel trends assumption. Our setting of allowing treatment assignment to be completely based on observed covariates clearly does not fit into the design-based approach. However, while our setting may not be the optimal scenario, it is still a useful and important one to study. Given how rare large-scale natural experiments are in observational studies, researchers would take every opportunity to exploit them, and employing our methodology would be preferable to blindly applying a DiD. Also, in settings where we observe the outcome variables for a large number of pre-treatment time periods, it is possible to find a reasonable optimal control sample that moves in parallel as long as it exists.
2 Parallel Trends Forest

Following the convention in the potential outcomes framework, $Y_{u,t}$ denotes the observed outcome variable for unit $u$ at time $t$, and $Y^{(d)}_{u,t}$ is the (potential) outcome variable if the treatment status is $d$ at time $t$, with $d = 0$ being untreated and $d = 1$ being treated. We observe $t \in \{1, 2, \ldots, T\}$ time periods and a total of $M + N$ units, with $j \in \{1, \ldots, M\}$ not changing treatment status during this time period (“control sample”) and $i \in \{M+1, \ldots, M+N\}$ receiving treatment at $t = T_1 + 1$ where $T_1 < T$ (“treatment sample”). We allow for the possibility that treatment assignment can be deterministic, that is, treatment assignment can fully depend on covariates $X$. Observation units are granular (each time period is fairly short, and each unit is small), and the observed outcome variables can be noisy and have non-normal distributions.
The goal of the parallel trends forest is, for each treatment unit $i \in [M+1, M+N]$, to find a set of weights $w_j(i)$, $j \in [1, M]$, with $\sum_{j=1}^{M} w_j(i) = 1$ that would predict the unobserved counterfactual of the treated unit $i$ as a weighted average of observed outcome values of the control sample:

$$Y^{(0)}_{i,t} = u_i + \sum_{j=1}^{M} w_j(i)\, Y_{j,t}, \qquad t \in [T_1 + 1, T] \tag{1}$$

In other words, the goal is to find for treatment unit $i$ a synthetic weighted control unit that satisfies the parallel trends assumption, with $u_i$ allowing for a constant difference between the two.

In its strictest form, a treated unit $i$ and a control unit $j$ satisfy the parallel trends assumption if (2) holds for all $t_1, t_2 \in [1, T]$.

$$E[Y^{(0)}_{i,t_1}] - E[Y^{(0)}_{i,t_2}] = E[Y_{j,t_1}] - E[Y_{j,t_2}] \tag{2}$$

Equation (2) implies that without treatment, the outcome variables for $i$ and $j$ move in parallel with one another. DiD uses this assumption to infer what the outcome variable would have been for the treated units without treatment at $t \ge T_1 + 1$.

While there is consensus on the definition of parallel trends, there is no agreed-upon measure for the degree of deviation from parallel trends that two time series exhibit from one another. Suppose we want to measure how far from parallel two time series $Y_i = \{Y_{i,t}\}_{t=1}^{T_1}$ and $Y_j = \{Y_{j,t}\}_{t=1}^{T_1}$ are.

Consider the often-used deviation measure of

$$\frac{1}{T_1} \sum_{t=1}^{T_1} \left( Y_{i,t} - \bar{Y}_i - Y_{j,t} + \bar{Y}_j \right)^2 \tag{3}$$

where $\bar{Y}_u = \frac{1}{T_1} \sum_{t=1}^{T_1} Y_{u,t}$. To see why this would measure the deviation from parallel trends, note that if parallel trends hold exactly then $E[Y_{i,t}] - \bar{Y}_i = E[Y_{j,t}] - \bar{Y}_j$ for all $t$. Synthetic control, synthetic DiD, and matrix completion methods all minimize some variation of this deviation measure to find the “optimal” synthetic control sample. This measure works well when data are highly aggregated and have very little noise. However, in cases where units are small, observations are noisy, and observed outcomes are drawn from a non-normal or unusual distribution, this measure does not work well, as we will demonstrate in Section 3.
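As a concrete reading of the conventional measure (3), the following minimal sketch computes it for two pre-treatment series with NumPy. The function name `sse_deviation` and the toy arrays are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def sse_deviation(y_i, y_j):
    """Measure (3): mean squared gap between two series after each is demeaned."""
    y_i = np.asarray(y_i, dtype=float)
    y_j = np.asarray(y_j, dtype=float)
    if y_i.ndim != 1 or y_i.shape != y_j.shape:
        raise ValueError("expected two 1-D series of equal length")
    gap = (y_i - y_i.mean()) - (y_j - y_j.mean())
    return float(np.mean(gap ** 2))

# Hypothetical series: y_b is y_a shifted by a constant, so the pair is exactly parallel.
y_a = np.array([1.0, 2.0, 4.0, 3.0])
y_b = y_a + 5.0
y_c = np.array([1.0, 1.0, 1.0, 1.0])   # flat series, not parallel to y_a

parallel_dev = sse_deviation(y_a, y_b)
nonparallel_dev = sse_deviation(y_a, y_c)
```

Because each series is demeaned first, a constant level difference contributes nothing to the measure; only period-by-period departures from a common shape do, which is also why idiosyncratic noise in every period enters the measure directly.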
We instead define our own deviation measure $||Y_i, Y_j||$ as:

$$c(Y_i, Y_j, \tau) = \left( \frac{1}{T_1 - \tau} \sum_{t=\tau+1}^{T_1} Y_{i,t} - \frac{1}{\tau} \sum_{t=1}^{\tau} Y_{i,t} \right) - \left( \frac{1}{T_1 - \tau} \sum_{t=\tau+1}^{T_1} Y_{j,t} - \frac{1}{\tau} \sum_{t=1}^{\tau} Y_{j,t} \right)$$

$$= \frac{1}{T_1 - \tau} \sum_{t=\tau+1}^{T_1} (Y_{i,t} - Y_{j,t}) - \frac{1}{\tau} \sum_{t=1}^{\tau} (Y_{i,t} - Y_{j,t}) \tag{4}$$

$$||Y_i, Y_j|| = \sqrt{ \frac{1}{T_1 - 1} \sum_{\tau=1}^{T_1 - 1} c(Y_i, Y_j, \tau)^2 } \tag{5}$$

To illustrate the intuition, consider a placebo test with observed data $Y_i = \{Y_{i,t}\}_{t=1}^{T_1}$ and $Y_j = \{Y_{j,t}\}_{t=1}^{T_1}$. In this test, we hypothetically assume that unit $i$ receives the treatment at time $\tau + 1$, while unit $j$ does not change treatment status throughout. The estimated treatment effect will be $c(Y_i, Y_j, \tau)$. If the two series satisfy the parallel trends assumption and neither is treated during the period, $E[c(Y_i, Y_j, \tau)] = 0$ for all $\tau \in [1, T_1 - 1]$. Thus, $||Y_i, Y_j||$ would measure how far the placebo treatment effect is from zero when the placebo treatment date is assigned at random. Measuring deviation from parallel trends in this way ties much more closely to the treatment effect, and we find our method behaves better with noisy data and non-normal distributions.

We now move on to outlining our method, which we call “parallel trends forest.” The aim of the parallel trends forest is to find the optimal control sample that moves in parallel for each treatment unit.

We first build a tree that uses the deviation measure $||Y_i, Y_j||$ to group units that have parallel trends together based on only the pre-treatment period data. This is similar to the classification and regression tree (CART) but with two major differences. First, trees are trained on panel data instead of the usual cross-sectional data, so we only allow splits along units and not along the time dimension. Second, we use a different splitting function from what is normally used in a CART. Consider a set of units $J$ in the parent node that we aim to partition into two subsets, $S_1$ and $S_2$. We define the deviation from parallel trends across units within the leaf $J$ as:

$$L(J) = \sum_{i \in J} \left|\left| Y_i, \underset{j \in J}{\mathrm{mean}}(Y_j) \right|\right|^2. \tag{6}$$

We split the units into two groups, $S_1$ and $S_2$, in a way that minimizes $L(S_1) + L(S_2)$.

Using this splitting rule, we construct $B$ trees to get a parallel trends forest. For the most part, we follow the literature on random forests (Breiman, 2001). For each tree, we take a random sample (without
replacement) of $K$ units from the $M + N$ total units, and at every split, we allow the algorithm to split on $\hat{k}$ randomly selected covariates out of $k$ covariates.2 We use $K = 0.5(M + N)$ and $\hat{k} = \frac{1}{3}k$. We split until all leaves have at most 100 units.

Figure 1 shows an example tree generated by our algorithm. The outcome variable of interest is weekly turnover (i.e., weekly trading volume divided by amount outstanding) for corporate bonds. Trees sometimes split on covariates, like amount outstanding, that are intuitively highly correlated with turnover, but at other times split on covariates, such as coupon type or bond type, whose correlation with turnover is less intuitive. When trees split on seemingly less important covariates, it may be that they are uncovering potentially complex relationships that would otherwise be missed, or they may be overfitting. Regardless, we will show in Section 3 that our parallel trends forest, which is an ensemble of trees, performs quite well.

Figure 1: Example tree
The following picture shows a truncated tree with the first few levels.
[Figure omitted: tree diagram with splits on mean_outstanding, offering_amt, coupon_type, bond_type, offering_date, avg_rating, and fungible; each leaf reports an objective value ($obj_val) and an observation count ($obs_count).]
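The deviation measure in (4) and (5), the leaf objective (6), and the splitting rule can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the split search is an exhaustive scan over thresholds of a single numeric covariate, and categorical splits (such as coupon type in Figure 1), subsampling, and the random choice of covariates are left out.

```python
import numpy as np

def placebo_effects(y_i, y_j):
    """c(Y_i, Y_j, tau) from equation (4) for every tau = 1, ..., T1 - 1."""
    d = np.asarray(y_i, dtype=float) - np.asarray(y_j, dtype=float)
    t1 = d.size
    if t1 < 2:
        raise ValueError("need at least two pre-treatment periods")
    csum = np.cumsum(d)
    tau = np.arange(1, t1)                            # placebo treatment starts at tau + 1
    pre_mean = csum[:-1] / tau                        # mean of d over t = 1..tau
    post_mean = (csum[-1] - csum[:-1]) / (t1 - tau)   # mean of d over t = tau+1..T1
    return post_mean - pre_mean

def pt_deviation(y_i, y_j):
    """||Y_i, Y_j|| from equation (5): root mean square of the placebo effects."""
    c = placebo_effects(y_i, y_j)
    return float(np.sqrt(np.mean(c ** 2)))

def leaf_loss(Y):
    """L(J) from equation (6); Y has one row per unit in J and one column per period."""
    Y = np.asarray(Y, dtype=float)
    if Y.shape[0] == 0:
        return 0.0
    leaf_mean = Y.mean(axis=0)
    return float(sum(pt_deviation(row, leaf_mean) ** 2 for row in Y))

def best_split(Y, x):
    """Threshold on one numeric covariate x that minimizes L(S1) + L(S2)."""
    Y, x = np.asarray(Y, dtype=float), np.asarray(x, dtype=float)
    best_threshold, best_loss = None, np.inf
    for threshold in np.unique(x)[:-1]:               # keeps both children non-empty
        left = x <= threshold
        loss = leaf_loss(Y[left]) + leaf_loss(Y[~left])
        if loss < best_loss:
            best_threshold, best_loss = threshold, loss
    return best_threshold, best_loss

# Hypothetical toy panel: units 0-1 trend upward, units 2-3 are flat.
t = np.arange(6.0)
Y = np.vstack([t, t + 2.0, np.zeros(6), np.ones(6)])
x = np.array([10.0, 12.0, 1.0, 2.0])                  # covariate that separates the groups
threshold, loss = best_split(Y, x)
```

In the toy panel the chosen threshold separates the trending units from the flat ones, and the resulting loss is zero because units within each child differ only by a constant. Splits act on units only, never on time, which matches the first of the two differences from CART described above.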
2 Random forests, including the implementation in Breiman (2001), typically sample with replacement to create training data for building trees, but several papers have shown that subsampling without replacement behaves better in certain situations (Bühlmann and Yu, 2002; Strobl, Boulesteix, Zeileis, and Hothorn, 2007). We follow Athey et al. (2019) and use subsampling without replacement, but results are similar when we sample with replacement.
Lastly, we find the optimal control sample weights for each treatment unit in a similar manner to the construction done in Athey et al. (2019). For each treated unit $i$, denote $G(i)$ as the set of trees, out of $B$ trees, in which $i$ appears and has at least one control unit in the same leaf. Then, for every pair of treated unit $i$ and control unit $j$, the weight $w_j(i)$ is calculated as:

$$w_{b,j}(i) = \frac{\mathbf{1}(j \text{ is in the same leaf as } i \text{ in tree } b)}{\text{Number of control bonds in same leaf as } i \text{ in tree } b}, \qquad b \in G(i) \tag{7}$$

$$w_j(i) = \frac{1}{N(G(i))} \sum_{b \in G(i)} w_{b,j}(i) \tag{8}$$

where $N(G(i))$ is the number of trees in $G(i)$. The weight $w_j(i)$ captures the frequency with which control unit $j$ falls into the same leaf as $i$. $\sum_j w_j(i) Y_j$ is the optimal control sample for treated unit $i$. We can approximate $Y^{(0)}_{i,t}$ for treated bond $i$ at time $t$ as:

$$\hat{Y}^{(0)}_{i,t} = u_i + \sum_{j=1}^{M} w_j(i)\, Y_{j,t} \tag{9}$$

$$u_i = \frac{1}{T_1} \sum_{t=1}^{T_1} Y_{i,t} - \frac{1}{T_1} \sum_{t=1}^{T_1} \sum_{j=1}^{M} w_j(i)\, Y_{j,t} \tag{10}$$

Because we do not force individual leaves to contain both treated and control units, there could exist leaves that have only treated units or only control units. Thus, in principle, if a treated unit behaves very differently from all control units, it may not have any control units in the same leaf for all trees in the forest, and thus it may not be possible to calculate the weights for that particular treated unit. While this may seem problematic at first, excluding those treatment units that do not have good optimal controls is preferable since the ATT for them would be biased. One may also choose to omit treatment units with a small $N(G(i))$ for similar reasons. In our particular use case, all of our treated units have optimal controls.

As is standard in the literature, we are interested in the average treatment effect for the treated (ATT). ATT is estimated as:

$$ATT = \frac{1}{N(T - T_1)} \sum_{i=M+1}^{M+N} \sum_{t=T_1+1}^{T} \left( Y_{i,t} - \hat{Y}^{(0)}_{i,t} \right). \tag{11}$$

It is worth noting that we include units that are treated before the sample period starts (“earlier-treated” units) in the control sample as long as they do not change treatment status during the sample period. Goodman-Bacon (2021) argues that if the treatment effect is time-varying rather than a shift in levels, using the earlier-treated units as controls can be problematic. In our case, the control sample is just the set of candidates that can be included in the optimal control. If the effect of the treatment for the earlier-treated
units has not been fully phased in by the start of the sample period, those units would likely have a different trend from the treated units and would end up having zero or very small weights in the optimal control.

3 Comparison with Existing Methods

Three methodologies from the literature (synthetic control, synthetic DiD, and matrix completion) share our goal of finding the weights to construct an optimal control. In this section, we demonstrate that in a setting with noisy, granular, and non-normally distributed data, these existing methods may fall short while our parallel trends forest approach performs well.

3.1 Description of existing methods

We first quickly describe synthetic control, synthetic DiD, and matrix completion at a high level. For a more complete explanation, readers are referred to the citations within. Synthetic control was proposed by Abadie and Gardeazabal (2003) and Abadie et al. (2010) to estimate the treatment effect in situations where the number of treated units, $N$, is small, and each unit is at an aggregated level such as a state or a country. Using the setting from Doudchenko and Imbens (2016) but with $N$ possibly larger than 1, synthetic control minimizes:3

$$L_{sc} = \frac{1}{N T_1} \sum_{i=M+1}^{M+N} \sum_{t=1}^{T_1} \left( Y_{i,t} - \sum_{j=1}^{M} w_j(i)\, Y_{j,t} \right)^2 + \text{regularization term}. \tag{12}$$

Synthetic DiD also minimizes a very similar function:

$$L_{sdid} = \frac{1}{T_1} \sum_{t=1}^{T_1} \left( \frac{1}{N} \sum_{i=M+1}^{M+N} Y_{i,t} - w_0 - \sum_{j=1}^{M} w_j\, Y_{j,t} \right)^2 + \text{regularization term}. \tag{13}$$

The regularization is done slightly differently for each approach and synthetic DiD includes an intercept term, but otherwise the objective function is very similar and uses the deviation measure (3). Estimation of the treatment effect is somewhat different because synthetic DiD allows for unit fixed effects and time fixed effects as well as including time weights.

Matrix completion (Athey et al., 2021) takes a somewhat different approach.
This approach sets matrix $\hat{Y}$, an $(M + N) \times T$ matrix consisting of $Y$ values if there is no change in treatment status for all $(M + N)$ units, as:

$$\hat{Y} = \begin{pmatrix}
Y_{1,1} & Y_{1,2} & \cdots & Y_{1,T_1} & Y_{1,T_1+1} & \cdots & Y_{1,T} \\
Y_{2,1} & Y_{2,2} & \cdots & Y_{2,T_1} & Y_{2,T_1+1} & \cdots & Y_{2,T} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
Y_{M+1,1} & Y_{M+1,2} & \cdots & Y_{M+1,T_1} & ? & \cdots & ? \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
Y_{M+N,1} & Y_{M+N,2} & \cdots & Y_{M+N,T_1} & ? & \cdots & ?
\end{pmatrix}, \tag{14}$$

where the lower right $N \times (T - T_1)$ quadrant is missing because we do not observe the outcome of the treated units in the absence of treatment after time $T_1$. The missing values of $\hat{Y}$ are then imputed by using a low-rank approximation of what is observed in $\hat{Y}$.4

3 The original setting from Abadie et al. (2010) is slightly different in that they include covariates in the objective function.

3.2 Comparison with synthetic control, synthetic DiD, and matrix completion

We compare the performance of the three existing methods and our parallel trends forest via a placebo test using the first 36 weeks of trading data from 2006.5 This 36-week period is well removed from the last TRACE phase-in date of February 7, 2005, so most likely any phase-in effects would have been fully incorporated by the start of 2006. In this placebo test we designate a “treatment” group and a “control” group such that half of the bonds in the control group have similar characteristics to the bonds in the treatment group and the other half do not. If a method is effective, it should assign higher weights to bonds that are more similar to the treatment group, and lower weights to those that are dissimilar. The treatment group contains 663 randomly selected investment-grade bonds of $500M or larger (“large IG bonds”). The control group comprises two sets of bonds: 663 large IG bonds (the bonds similar to the treatment sample) and 663 randomly selected high-yield bonds that are $100M or smaller (“small HY bonds”).6 We first confirm in Figure 2 that turnover for large IG bonds in the control sample indeed behaves similarly to the treatment sample by moving in parallel, whereas the small HY part of the control sample moves differently. Figure 3 plots the outcome of the four methods.
Wepretendthattreatmentbeginsatweek19,soonlythe datafromthefirst18weeksareusedtofittheweightsinsyntheticcontrol,syntheticDiD,andparalleltrends 4Matrixcompletionmodeldoesnotexplicitlyapproximatethecounterfactualoutcomesforthetreatedsampleasaweighted averageofthecontrolsample;ratheritapproximatesthevaluesdirectly. However,giventhatYˆ isapproximatedwithalow-rank matrix,wecanusesingularvaluedecompositiontogetanapproximatesetofweights. 5WewilldescribethesettinganddatainmoredetailinSections4.1and4.2. 6Thereare1,326totallargeIGbondsthatareoutstandingthroughoutthesampleperiod. Werandomlyselecthalfofthem tobeinthetreatmentgroup,andtheotherhalftobeinthecontrolgroup. 11
Figure 2: Average weekly turnover for the placebo test data Thisfigureplotstheaverageweeklyturnoverforthetreatmentsample,largeIGbondsinthecontrolsample, and small HY bonds in the control sample. 2.5 2.0 1.5 1.0 0.5 0 10 20 30 Week )%( revonruT ylkeeW Treatment Large IG control Small HY control forest.7 Inthecaseofmatrixcompletion,thefirst18weeksoftreatmentdataandall36weeksofcontroldata are usedfor the estimation. All methods exceptfor synthetic control allowfor a constant differencebetween the treatment sample and the optimal control sample. Panel (a) plots the average turnover for treated and control units as well as the fitted average values using synthetic control and synthetic DiD. Panels (c) and (d) plot the values using matrix completion and parallel trends forest, respectively. Panel (a) indicates that synthetic control fits the data very closely in-sample (in the pre-treatment period) but very poorly out-of-sample. Synthetic DiD produces a counterfactual that is very similar to the control sample. Panel (b) illustrates the distribution of weights produced by the synthetic DiD. Specifically, it displays a Lorenz curve of w¯ = 1 (cid:80)M+N w (i), where w¯ represents the average weight assigned to j N i=M+1 j j controlbondj acrossalltreatedbonds. Toconstructthecurve, wefirstsortw¯ inascendingorderandthen j plot the cumulative sum against the number of bonds. The resulting straight line implies that the synthetic DiDassignsalmostequalweightstoallcontrolbonds,whichisclearlyfarfromoptimal. Panel(c)showsthe results for matrix completion both with and without time fixed effects. The counterfactual with the fixed effects is clearly similar to the average control turnover up to a constant shift, indicating that weights are similar across all control bonds, which is far from optimal as one half (large IG) of the control sample is 7Strictlyspeaking,syntheticDiDusespre-treatmentperioddatatofittheunitweightsandallcontroldatatofitthetime weights. 
Forourcomparison,unitweightsaremoreimportant. 12
Figure 3: Performance of the four methods Figuresbelowplottheoutcomeoftheplacebotestusingsyntheticcontrol,syntheticDiD,matrixcompletion, andparalleltrendsforest. Panels(a),(c),and(d)plottheaverageweeklyturnoverforthetreatmentsample, the control sample, and the fitted optimal control sample using synthetic control and synthetic DiD (panel a), matrix completion (panel c), and parallel trends forest (panel d). Panel (b) presents the Lorenz curve for the optimal weights derived by the synthetic DiD method. (a) Synthetic control and synthetic DiD 2.5 2.0 1.5 1.0 0.5 0 10 20 30 Week )%( revonruT ylkeeW (b) Lorenz curve for synthetic DiD weights 1.00 0.75 0.50 Treatment 0.25 Control Synthetic Control Synthetic DiD 0.00 0 400 800 1200 Number of Bonds thgieW evitalumuC (c) Matrix completion 2.5 2.0 1.5 1.0 0 10 20 30 Week )%( revonrut egarevA (d) Parallel trends forest 2.5 Treatment Control MC MC w/o FE 2.0 1.5 1.0 0 10 20 30 Week )%( revonruT ylkeeW Treatment Control Optimal control 13
clearly more similar to the treatment sample than the other half (small HY). The counterfactual without fixed effects is much flatter and far from what the average treatment sample turnover looks like. Results for parallel trends forest in Figure 3(d) indicate that parallel trends forest performs significantly better. The optimal control tracks the treatment sample quite well, and 93.1% of the weight is in large IG bonds.

3.3 Monte Carlo simulations

Having established that parallel trends forest works significantly better than synthetic control, synthetic DiD, and matrix completion in our setting, we now further study the performance of parallel trends forest and compare it to the usual DiD approach (TWFE estimator) using Monte Carlo simulations. For each Monte Carlo simulation, we generate a placebo test sample as outlined in Section 3.2 and extract the ATT estimate from the parallel trends forest as well as the TWFE estimate from the usual DiD setup for comparison. The TWFE estimate is the estimated $\beta$ from the following pooled regression:

$$Y_{u,t} = A_u + B_t + \beta\,\mathbf{1}(u \in \text{trmt})\,\mathbf{1}(t > T_1) + \epsilon_{u,t} \qquad (15)$$

where $A_u$ is the bond fixed effect, $B_t$ is the time fixed effect, $\mathbf{1}(u \in \text{trmt})$ indicates whether bond $u$ is in the treatment sample, and $\mathbf{1}(t > T_1)$ indicates whether time $t$ is in the post-treatment period.

Figure 4 presents the results of 100 Monte Carlo simulations. Since this is a placebo test, a good estimator would give estimates close to zero. The TWFE estimates that use all control bonds (Panel (b)) clearly perform poorly, as the mean estimate is -0.146 with a standard deviation of 0.055. In comparison, parallel trends forest, presented in Panel (a), does quite well with a mean estimate of -0.034 and standard deviation of 0.063. It does slightly worse than the case where we know the optimal control sample ex ante (large IG bonds), as presented in Panel (c). Overall, the parallel trends forest performs quite well in selecting the correct weights.
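As an illustration of the pooled regression in (15), the following is a minimal sketch of how the TWFE coefficient could be computed on a balanced panel. It assumes the outcomes are stored as a units-by-periods array and absorbs the fixed effects by double demeaning; the function name `twfe_estimate` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def twfe_estimate(y, treated, post):
    """TWFE coefficient beta from equation (15) on a balanced panel.

    y:       (units, periods) array of outcomes Y[u, t]
    treated: (units,) boolean array, True if unit u is in the treatment sample
    post:    (periods,) boolean array, True if t > T_1
    """
    y = np.asarray(y, dtype=float)
    d = np.outer(treated, post).astype(float)  # 1(u in trmt) * 1(t > T_1)

    def demean(x):
        # Subtract unit means and time means, add back the grand mean.
        # On a balanced panel this absorbs the fixed effects A_u and B_t.
        return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

    y_dm, d_dm = demean(y), demean(d)
    # OLS slope of demeaned outcome on demeaned treatment indicator.
    return float((d_dm * y_dm).sum() / (d_dm ** 2).sum())

# Toy check (hypothetical data): noiseless panel with a known effect of 0.5.
rng = np.random.default_rng(0)
unit_fe = rng.normal(size=6)
time_fe = rng.normal(size=8)
treated = np.array([True, True, False, False, False, False])
post = np.arange(8) >= 4
y = unit_fe[:, None] + time_fe[None, :] + 0.5 * np.outer(treated, post)
beta_hat = twfe_estimate(y, treated, post)
```

In a placebo setting the true effect is zero, so any systematic distance of `beta_hat` from zero across simulated samples reflects a violation of parallel trends in the chosen control sample rather than a treatment effect.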
Figure 4: Monte Carlo simulation
The figures below plot the distribution of the average treatment effect estimates from 100 Monte Carlo simulations. Panel (a) uses ATT estimates from parallel trends forest. Panels (b) and (c) use TWFE estimates, where Panel (b) uses all control bonds as controls and Panel (c) uses only large IG bonds as controls. Mean and standard deviation of average treatment effect estimates are also presented.
[Histograms of Count against the ATE estimate. Panel (a) Parallel trends forest: mean −0.034, std dev 0.063. Panel (b) DiD using all control bonds: mean −0.146, std dev 0.055. Panel (c) DiD using large IG bonds only: mean −0.003, std dev 0.071.]
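The contrast between panels (b) and (c) can be mimicked with a small placebo Monte Carlo. The sketch below uses a hypothetical data-generating process, not the paper's placebo sample: treated units and "good" controls share a trend up to a constant shift, while "bad" controls trend in the opposite direction, and there is no treatment effect. It relies on the fact that, on a balanced panel with a single treatment date, the TWFE coefficient equals the difference-in-differences of group means. All names and parameter values are assumptions for illustration.

```python
import numpy as np

def did_of_means(y, treated, post):
    """DiD of group means; on a balanced panel with one treatment date
    this coincides with the TWFE coefficient."""
    t, c = y[treated], y[~treated]
    return (t[:, post].mean() - t[:, ~post].mean()) - (c[:, post].mean() - c[:, ~post].mean())

def simulate_placebo(rng, n_per_group=200, n_weeks=36, t1=18):
    """Hypothetical placebo sample with zero treatment effect."""
    weeks = np.arange(n_weeks)
    shared_trend = 0.02 * weeks   # trend of treated units and 'good' controls
    other_trend = -0.02 * weeks   # 'bad' controls move the other way

    def noise():
        return rng.normal(scale=0.5, size=(n_per_group, n_weeks))

    y = np.vstack([
        shared_trend + noise(),        # group 0: treated
        shared_trend + 0.3 + noise(),  # group 1: good controls (constant shift)
        other_trend + noise(),         # group 2: bad controls
    ])
    group = np.repeat([0, 1, 2], n_per_group)
    post = weeks >= t1
    return y, group, post

rng = np.random.default_rng(0)
est_all, est_good = [], []
for _ in range(100):
    y, group, post = simulate_placebo(rng)
    est_all.append(did_of_means(y, group == 0, post))              # all controls
    keep = group != 2                                              # drop bad controls
    est_good.append(did_of_means(y[keep], group[keep] == 0, post))

mean_all, std_all = np.mean(est_all), np.std(est_all)
mean_good, std_good = np.mean(est_good), np.std(est_good)
```

Under this design the estimator that pools all controls is biased by construction, while the estimator restricted to the controls that share the treated trend centers near zero. The sign and size of the bias here are artifacts of the assumed trends and should not be compared with the values reported in Figure 4.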
4 Effect of Post-Trade Transparency: Analysis using Parallel Trends Forest

4.1 TRACE Introduction

Post-trade transparency, that is, the release of transaction data after the execution of a trade, was introduced in the corporate bond market over multiple phases between 2002 and 2005 through a system called Trade Reporting and Compliance Engine (TRACE). Multiple academic papers have exploited the phase-in design of TRACE introduction to study the effect of transparency on trading costs (Bessembinder et al., 2006; Edwards et al., 2007; Goldstein et al., 2007). The majority of academic papers agree that transparency decreased trading costs for customers in all phases of the introduction.

Although the effect of transparency on trading costs is well studied and the findings are consistent, the opposite is true for the effect on trading activity. Dealers opposed increasing transparency because they believed it would make it more difficult to offload positions acquired from market making, and according to Bessembinder and Maxwell (2008), many market participants have indicated greater difficulty trading post-TRACE. However, additional studies reach different conclusions. Goldstein et al. (2007) find that trading volume did not change around the FINRA 120 event. Asquith et al. (2019) study the effect of transparency on various measures of trading activity separately for all phases and find that trading activity remains unchanged for all phases, except the last, in which the less-frequently traded high-yield bonds are phased in. For those illiquid bonds, they find that the number of trades decreases with TRACE introduction.

Because TRACE was phased in over multiple disjoint events, it is a prime candidate for a natural experiment to establish causality. However, the choice of which bonds were phased in when was not random except for one small phase. Bonds with similar ratings, size, and trade frequencies were phased in together, and these characteristics are generally considered important for bond liquidity.
Therefore, one cannot argue that the control bonds are similar to treated bonds, potentially violating the parallel trends assumption that is crucial for a DiD study. Furthermore, any effects of TRACE introduction seem to be gradual, making it necessary to look over a relatively long sample period and rely heavily on the parallel trends assumption. Researchers have tried to circumvent this problem by carving out a subset of control bonds that look relatively similar to the treated bonds in terms of ratings and bond size, but because there is little common support, the authors still implicitly make a strong parallel trends assumption.

FINRA began collecting trade information such as traded volume and price from dealers through TRACE starting July 1, 2002. Dealers were required to submit trade information to TRACE for all TRACE-eligible corporate bonds, but FINRA did not disseminate all of this information. It initially disseminated information to the market for a subset of bonds (Phase 1). Over the next few years, FINRA expanded the dissemination to cover all TRACE-eligible bonds that are not 144a bonds by February 2005. The phases were as follows.

• Phase 1, July 1, 2002: Investment grade bonds with issue size $1 billion or greater
• FIPS 50, July 1, 2002: 50 high-yield bonds disseminated under Fixed Income Pricing System (FIPS)
• Phase 2, Mar 3, 2003: Investment grade bonds of issue size $100 million or greater, or bonds rated A or higher
• FINRA 120, Apr 14, 2003: 120 chosen BBB-rated bonds
• Phase 3A, Oct 1, 2003: Bonds rated BBB, and more-frequently traded bonds within those rated BB+ or lower
• Phase 3B, Feb 7, 2005: Less-frequently traded bonds within those rated BB+ or lower

In this paper, we illustrate our parallel trends forest by using the methodology to estimate the impact of Phase 2 implementation on bond turnover. We also compare our results with TWFE estimates.

4.2 Data

We use the enhanced TRACE data from WRDS to gather information on dissemination status and trading activity. TRACE data includes a list of bonds that are eligible to be reported to TRACE (“TRACE-eligible bonds,” which is a superset of bonds eligible for dissemination) and an indicator for whether each bond is eligible for post-trade dissemination for every trading day.8 We create the treatment and control samples for Phase 2 by using this list. The control sample is all bonds that exist throughout the 36-week period surrounding the phase-in date and do not change dissemination status, and thus includes bonds that are disseminated throughout (Phase 1 and FIPS 50 bonds) as well as bonds that are not disseminated throughout (bonds with lower ratings and smaller bonds).
The treatment sample consists of the bonds that exist throughout the 36-week period, change their dissemination status to start dissemination exactly on the phase-in date, and do not change dissemination status on any other date during the 36-week period. There are 2,204 bonds in the treatment sample and 10,314 bonds in the control sample.

8 Some bonds may be outstanding and eligible to report to TRACE but may not trade at all over the sample period, and for our goal of studying the impact of transparency on turnover, it is important to include these bonds in the sample. Thus, cleaning the list of TRACE-eligible bonds correctly is crucial. The TRACE master file in WRDS often contains bonds that have already matured or have zero outstanding; we delete those bond-days. We also delete convertible bonds, exchangeable bonds, and 144a bonds.

The outcome variable of interest is weekly turnover, calculated as the week’s trading volume divided by the bond’s outstanding amount. We allow the algorithm to split on more than 25 covariates, including variables that are usually considered to impact turnover such as average rating, outstanding amount, age, and past turnover, as well as potentially less-related covariates such as seniority and industry. Most of these bond characteristics variables are from Mergent FISD. Figure 5 plots the cumulative probability distribution of the weekly turnover data. About half of the bond-week observations are zero, and the rest roughly follow an exponential distribution.

Figure 5: Distribution of weekly turnover
This figure plots the cumulative probability distribution for weekly turnover. Each observation is at the bond-week level.
[Plot: Cumulative probability (0.5 to 1.0) against Turnover (%) (0.0 to 2.0)]

4.3 Two-way fixed effects estimator

We first present the results from the TWFE estimator, the most commonly used estimator in the DiD literature. This specification assumes that the control sample that is used in the regression satisfies the parallel trends assumption. Because the treatment assignment is not random and treatment criteria of ratings and issue size are highly correlated with the variable of interest (turnover), we follow the existing literature and use a sample of control bonds that have somewhat similar characteristics to the treated bonds.
In particular, we follow Edwards et al. (2007) and construct three different control samples: bonds that are transparent throughout the sample and are rated A or higher (“Transparent & ≥ A”), bonds that are not transparent throughout and are rated A or higher (“Not Transparent & ≥ A”), and bonds that are not transparent throughout, are BBB-rated, and have size between 100M and 1B (“BBB & 100M–1B”).9 The first two control samples are similar in ratings but differ in issue size from the treated sample, and the third control sample is similar in issue size but differs in ratings. We also test using all control bonds as an additional control sample.

9 Edwards et al. (2007) study the effect of post-trade transparency on bid-ask spreads, not turnover, so our results are not a critique of the paper.

Figure 6 shows the average weekly turnover for the treatment sample, full control sample, and the three control samples. It is clear that even in the pre-treatment period the different control samples behave quite differently. For instance, the Transparent & ≥ A sample has a much stronger downward trend during the pre-treatment period compared to the treatment sample.

Figure 6: Average weekly turnover for Phase 2 data
This figure plots the average weekly turnover for treated bonds, all control bonds, and various control samples.
[Plot: Weekly Turnover (%) against Week (−10 to 10); series: Treatment, All controls, Transparent & ≥A, Non-Transparent & ≥A, BBB & 100M–1B]

Table 1 presents the TWFE estimator using the various control samples. The estimates for the treatment effect change with the control sample used, which underscores the need for a method that chooses which control sample works best. Moreover, the optimal control sample may not be one of the four used here.

Table 1: Two-way fixed effects estimator
The following table presents the treatment effects estimated by regressing weekly turnover on bond fixed effects, time fixed effects, and the interaction of post-treatment indicator and treated indicator. For their respective control samples, column (1) uses all control bonds, column (2) uses Transparent & ≥ A bonds, column (3) uses Not Transparent & ≥ A bonds, and column (4) uses BBB & 100M–1B bonds.

                  All          Transparent & ≥A   Not Transparent & ≥A   BBB, 100M–1B
                  (1)          (2)                (3)                    (4)
Treated × Post    −0.185***    0.719***           −0.184**               0.060
                  (0.052)      (0.179)            (0.068)                (0.077)
date f.e.         Yes          Yes                Yes                    Yes
cusip f.e.        Yes          Yes                Yes                    Yes
Observations      450,648      87,768             200,520                134,928
R2                0.232        0.175              0.138                  0.165
Adjusted R2       0.210        0.151              0.113                  0.141
Note: *p<0.1; **p<0.05; ***p<0.01. Double-clustered standard errors.

4.4 Parallel trends forest results

We construct a parallel trends forest with 1,000 trees based on pre-treatment data to estimate the weights. We then use those weights to construct the optimal control ($\hat{Y}^{(0)}_{i,t}$) for each treated bond $i$. When building the trees, we split until each leaf contains fewer than 100 bonds, but we do not restrict the leaves to have both treated and control bonds. We also allow the algorithm to split on ratings and bond issue size, which are the variables used to determine treatment assignment, so it is theoretically possible to have treated bonds that do not have any corresponding optimal control sample if none of the control bonds behave similarly; in this sample, however, that does not seem to be the case.

Figure 7 presents the average weekly turnover for the treated bonds, control bonds, and the optimal control sample. The optimal control sample follows the treated sample quite closely up to a constant shift, albeit not perfectly, especially compared to the average control sample, which behaves very differently from the treatment sample.
The fact that the optimal control and treatment samples track each other quite well during the pre-treatment period indicates that the algorithm does quite well at selecting the weights in-sample. The two series also track each other quite closely post-treatment, and there are no discontinuities around the treatment date, which implies that the treatment effect is likely small.

Table 2 shows the composition of the optimal control by presenting the sum of $\bar{w}_j$ for the various control
samples used in the TWFE estimates. The optimal control sample, relative to each subsample's share of control bonds by count, overweights the Transparent & ≥ A bonds and the BBB & 100M–1B bonds. Panel (b) of Table 2 shows that the optimal control sample matches the characteristics of the treatment sample more closely than the average control sample does.

Figure 7: Parallel trends forest
This figure plots the average weekly turnover for treated bonds, control bonds, and the optimal control sample derived using parallel trends forest. Parallel trends forest uses 1,000 trees.
[Plot: Weekly Turnover (%) against Week (−10 to 10); series: Treatment, Control, Optimal control]

Table 2: Composition of Optimal Control Sample

(a) Weights allocated to each subsample
                  Transparent & ≥A   Not Transparent & ≥A   BBB, 100M–1B   Other control
Sum(weight)       8.1%               26%                    42.5%          23.5%
Share by count    2.3%               32.6%                  15%            50.1%

(b) Average characteristics of treated, optimal control, and control samples
Sample            Rating   Outstanding amt   Pre-sample turnover   Age      Time-to-maturity
Treatment         5.413    259               0.345                 65.144   91.273
Optimal control   8.333    334               0.344                 61.373   95.706
Control           9.222    150               0.242                 57.584   80.050

We then look at how many trees we would need for our ATT estimates to be reasonably accurate. Using
4,000 trees, we plot the distribution of ATT estimates against the number of trees in each forest in Figure 8. Figure 8 indicates that the variance of ATT estimates is reasonably small with more than 50 trees. For instance, a forest with 100 trees would give an ATT estimate between -0.047 and -0.019 95% of the time. While deriving an analytical standard error for the parallel trends forest estimate is outside the scope of this paper, our estimate constructed from 1,000 trees would have a small standard error.

Figure 8: Convergence of parallel trends forest
The following figure plots the average and 95% confidence interval of the ATT estimate against the number of trees used in parallel trends forest construction. The dots indicate the average ATT estimate, and the lines indicate the 95% confidence interval calculated using the average and standard deviation of ATT estimates for the given number of trees.
[Plot: ATT against Number of trees in forest (0 to 100)]

The source of uncertainty captured in the standard error of the parallel trends forest estimate mentioned above is how accurately we can estimate the weights of the “most” optimal control that has the closest parallel trends to the treated sample. The results in Figure 8 indicate that this uncertainty is fairly low with a reasonable number of trees. The low number of trees needed for a reasonable estimate means that large computational resources are not necessary to employ parallel trends forest.

However, there are two other potential sources of uncertainty: how closely parallel trends hold and the uncertainties due to random sampling from a larger population.10

10 Most empirical work in the economics literature assumes that researchers observe some random sample of a larger population. See Abadie et al. (2020) for discussion on sampling-based versus design-based uncertainties.

In Figure 7, the treated and optimal control samples do not track each other exactly in the pre-treatment period. Some of the deviation could be
from inaccurate weights, but at least some of the deviation is because the parallel trends do not hold exactly. We deal with this issue by allowing for violation of parallel trends using the approach from Rambachan and Roth (2023).

We assume that the average deviation from parallel trends between the post- and pre-treatment periods is at most $L$ times the average deviation from parallel trends between the first and the second half of the pre-treatment period. More specifically, denote $\beta_t$ as the difference in $Y$ at time $t$ between the average treated bond and the average counterfactual derived from the optimal control sample, which can be estimated from the regression

$$Y_{i,t} - \hat{Y}^{(0)}_{i,t} = \beta_t + \epsilon_{i,t}. \qquad (16)$$

Then we assume that

$$\left| \frac{1}{T - T_1} \sum_{t=T_1+1}^{T} \beta_t - \frac{1}{T_1} \sum_{t=1}^{T_1} \beta_t \right| \le L \left| \frac{2}{T_1} \sum_{t=T_1/2+1}^{T_1} \beta_t - \frac{2}{T_1} \sum_{t=1}^{T_1/2} \beta_t \right|. \qquad (17)$$

For brevity, we write (17) assuming $T_1$ is an even number. We do not use our definition of the deviation from parallel trends (equation (5)) here because it unfortunately does not satisfy the assumptions we need to use Rambachan and Roth (2023), but it has some similarities in that we use $|c(Y_i, Y_j, T_1/2)|$ from (4) as a measure of deviation.

Table 3 presents the results for various values of $L$. The confidence intervals calculated here incorporate uncertainties arising from both deviations from parallel trends and random sampling, but assume that the optimal control sample estimation is exact. The estimate for ATT is -0.032, smaller in magnitude compared to all the TWFE estimates in Table 1. This indicates that much of the magnitude of the TWFE estimates reflects deviations from parallel trends rather than a true effect of treatment.
In the case of L=0, that is, if parallel trends hold perfectly, the ATT estimate is almost statistically significant at the 10% level; for all other values of L, it is not significant. Overall, we conclude that the treatment effect is fairly small and not statistically significantly different from zero for reasonable values of L. One may ask: if we still have to allow for violations of parallel trends, why do we bother with trying to find the optimal control? The answer is that we can still get much more accurate estimates from an optimal control for which the deviation from parallel trends is much smaller.
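To make restriction (17) concrete, the sketch below evaluates its two sides from a vector of per-period gaps $\beta_t$ and reports the range of ATT values compatible with the restriction. It reads (17) as a bound on the no-treatment gap between the post- and pre-treatment averages, and it ignores sampling uncertainty entirely, so it is not the confidence interval procedure of Rambachan and Roth (2023) used for Table 3. The function name `att_bounds` and the toy values are hypothetical.

```python
import numpy as np

def att_bounds(beta, t1, L):
    """Range of ATT values compatible with restriction (17), ignoring sampling error.

    beta: per-period gaps beta_t for t = 1..T (treated minus optimal control)
    t1:   number of pre-treatment periods T_1 (must be even, as in (17))
    L:    allowed multiple of the pre-period deviation
    """
    beta = np.asarray(beta, dtype=float)
    if t1 % 2 != 0:
        raise ValueError("t1 must be even, as assumed when writing (17)")
    half = t1 // 2
    pre, post = beta[:t1], beta[t1:]

    # Post-period average gap minus pre-period average gap (point estimate).
    did = post.mean() - pre.mean()
    # Deviation between the second and first halves of the pre-period.
    pre_dev = abs(pre[half:].mean() - pre[:half].mean())

    # If the no-treatment gap can drift by at most L * pre_dev,
    # the ATT lies within that distance of the point estimate.
    return did - L * pre_dev, did + L * pre_dev

# Toy example with four pre-treatment and two post-treatment periods.
beta = [0.00, 0.02, 0.04, 0.06, -0.02, -0.02]
lower, upper = att_bounds(beta, t1=4, L=1.0)
lower0, upper0 = att_bounds(beta, t1=4, L=0.0)
```

With `L = 0` the range collapses to the point estimate, which mirrors the exact parallel trends case; larger `L` widens the range in proportion to how much the gap already drifts within the pre-treatment period.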
Table 3: Allowing for violation of parallel trends
The following table presents the 90% confidence interval for various values of L. The ATT estimate remains constant.

L      Estimate   Lower Bound   Upper Bound
0      -0.032     -0.0643       0.0002
0.5    -0.032     -0.0794       0.0147
1      -0.032     -0.1069       0.0422
1.5    -0.032     -0.1363       0.0716
2      -0.032     -0.1677       0.103

4.5 Parallel trends forest with honesty

So far in our parallel trends forest, we select the optimal control sample such that parallel trends hold as closely as possible in the pre-treatment period. However, this does not necessarily guarantee that parallel trends will hold in the post-treatment period when there is no treatment; in general, it is impossible to construct an optimal control sample that we know for sure moves in parallel with the treated sample in the counterfactual no-treatment world. Here we adopt a technique from Wager and Athey (2018) and Athey et al. (2019) and construct a parallel trends forest that is “honest” in order to reduce bias from overfitting in the pre-treatment period.

Construction of the forest is similar to that outlined in Section 2 but with some important differences. When constructing tree $b$, we first divide the randomly sampled $K$ units further into two groups arbitrarily. We grow the tree on the first subsample, then put the second subsample into the tree to generate the weights $w_{b,j}(i)$. Because weights are calculated using out-of-sample data, if parallel trends do not hold well out-of-sample, the optimal control constructed from parallel trends forest with honesty would behave very differently from the treatment sample even in the pre-treatment period.

Figure 9 presents the results for the parallel trends forest with honesty. Panel (a) presents the average turnover values for the treatment sample, control sample, optimal control sample calculated from parallel trends forest without honesty (from Section 4.4), and optimal control sample calculated from parallel trends forest with honesty.
The average turnover for the optimal control with honesty tracks the version without honesty quite closely, and both move with the treatment sample fairly closely. The optimal control with honesty fits the treatment sample slightly less closely compared to the version without honesty, especially in the pre-treatment period, which is to be expected since the version with honesty uses out-of-sample data to calculate the weights.

Panel (b) of Figure 9 compares $\bar{w}_j = \frac{1}{N}\sum_{i=M+1}^{M+N} w_j(i)$, where $\bar{w}_j$ represents the average weight assigned
to control bond $j$ across all treated bonds, obtained from parallel trends forest with and without honesty. The two weights are highly correlated with a correlation of 0.93. Weights obtained from parallel trends forest without honesty tend to have a fatter right tail, which comes from fitting the data closely in-sample. Overall, given the high correlation in the optimal control between the two methods, we can conclude that in this use case, overfitting by the parallel trends forest is fairly mild.

Figure 9: Parallel trends forest with honesty
The following figures show the results using parallel trends forest with honesty. Panel (a) shows the average turnover for the treatment sample, control sample, optimal control sample derived from parallel trends forest without honesty, and optimal control sample derived from parallel trends forest with honesty. Panel (b) plots $\bar{w}_j$, which represents the average weight assigned to control bond $j$ across all treated bonds, obtained from parallel trends forest without honesty against that obtained from parallel trends forest with honesty.
[Panel (a) Time series: Weekly Turnover (%) against Week (−10 to 10); series: Treatment, Control, Optimal, w/o honesty, Optimal, w/ honesty. Panel (b) Weights for parallel forest with and without honesty: Weight with honesty against Weight without honesty; cor: 0.93]

We present ATT estimates that use the optimal control from parallel trends with honesty in Table 4. Similar to Table 3, we follow Rambachan and Roth (2023) and allow for deviations from parallel trends in the form of (17). The ATT estimate is slightly larger in magnitude but statistically significant only when L=0.
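The weight comparison in panel (b) of Figure 9, and the Lorenz curve used earlier for the synthetic DiD weights, both start from the average weight $\bar{w}_j$ per control bond. The sketch below shows one way these summaries could be computed, assuming the weights are available as a matrix with one row per treated bond and one column per control bond. The function names and the small weight matrices are hypothetical and serve only as an illustration.

```python
import numpy as np

def average_weights(w):
    """Average weight per control bond: w[i, j] is the weight that
    treated bond i places on control bond j; returns w_bar[j]."""
    return np.asarray(w, dtype=float).mean(axis=0)

def lorenz_curve(w_bar):
    """Cumulative share of total weight after sorting the control
    bonds by average weight in ascending order."""
    s = np.sort(np.asarray(w_bar, dtype=float))
    return np.cumsum(s) / s.sum()

# Hypothetical weights for 3 treated and 4 control bonds (rows sum to 1).
w_without = np.array([[0.70, 0.20, 0.10, 0.00],
                      [0.60, 0.30, 0.10, 0.00],
                      [0.80, 0.10, 0.05, 0.05]])
w_with = np.array([[0.55, 0.25, 0.15, 0.05],
                   [0.50, 0.30, 0.15, 0.05],
                   [0.60, 0.20, 0.10, 0.10]])

wbar_without = average_weights(w_without)
wbar_with = average_weights(w_with)

# Correlation between the two sets of average weights, as in panel (b).
corr = np.corrcoef(wbar_without, wbar_with)[0, 1]

# Equal weights give a straight-line Lorenz curve; concentrated weights bow below it.
curve_equal = lorenz_curve(np.full(4, 0.25))
curve_without = lorenz_curve(wbar_without)
```

A Lorenz curve that hugs the straight line signals nearly uniform weights across control bonds, whereas a curve that stays low and rises sharply at the end signals that a small set of control bonds carries most of the weight.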
Table 4: ATT estimates using parallel trends with honesty
The following table presents the 90% confidence interval for the ATT estimated using parallel trends with honesty for various values of L, where L denotes the maximum degree of deviation from parallel trends.

L      Estimate   Lower Bound   Upper Bound
0      -0.042     -0.075        -0.01
0.5    -0.042     -0.127        0.04
1      -0.042     -0.194        0.107
1.5    -0.042     -0.263        0.174
2      -0.042     -0.332        0.245

5 Conclusion

In this paper, we propose a novel method to select the optimal control sample that satisfies the parallel trends assumption as closely as possible. This parallel trends forest method is useful for natural experiments in which treatment assignment contains very little randomness but which offer a very large pool of candidate control units and a long observable pre-treatment period that can be used for selection. We show using the introduction of post-trade transparency in the corporate bond market that in some settings our method works better than other existing data-driven methods as well as the usual TWFE estimator. It seems to be the case that the granular, non-normal distribution of the outcome variables plays a key role in why parallel trends forest outperforms other data-driven methods by a large margin in our use case. This would be interesting to study further in future work.

References

Abadie, A., S. Athey, G. W. Imbens, and J. M. Wooldridge (2020): “Sampling-based versus design-based uncertainty in regression analysis,” Econometrica, 88, 265–296.
Abadie, A., A. Diamond, and J. Hainmueller (2010): “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program,” The Journal of the American Statistical Association, 105, 493–505.
Abadie, A. and J. Gardeazabal (2003): “The Economic Costs of Conflict: A Case Study of the Basque Country,” The American Economic Review, 93, 113–132.
Arkhangelsky, D., S. Athey, D. A. Hirshberg, G. W. Imbens, and S.
Wager (2021): “Synthetic Difference-in-Differences,” The American Economic Review, 111, 4088–4118.
Asquith, P., T. Covert, and P. Pathak (2019): “The Effects of Mandatory Transparency in Financial Market Design: Evidence from the Corporate Bond Market,” Working Paper.
Athey, S., M. Bayati, N. Doudchenko, G. Imbens, and K. Khosravi (2021): “Matrix Completion Methods for Causal Panel Data Models,” Journal of the American Statistical Association, 116, 1716–1730.
Athey, S., J. Tibshirani, and S. Wager (2019): “Generalized Random Forests,” The Annals of Statistics, 47, 1148–1178.
Baker, A., B. Callaway, S. Cunningham, A. Goodman-Bacon, and P. H. Sant’Anna (2025): “Difference-in-Differences Designs: A Practitioner’s Guide,” forthcoming, Journal of Economic Literature.
Bessembinder, H. and W. Maxwell (2008): “Markets: Transparency and the Corporate Bond Market,” Journal of Economic Perspectives, 22, 217–234.
Bessembinder, H., W. Maxwell, and K. Venkataraman (2006): “Market Transparency, Liquidity Externalities, and Institutional Trading Costs in Corporate Bonds,” Journal of Financial Economics, 82, 251–288.
Breiman, L. (2001): “Random forests,” Machine learning, 45, 5–32.
Bühlmann, P. and B. Yu (2002): “Analyzing Bagging,” The Annals of Statistics, 30, 927–961.
Doudchenko, N. and G. W. Imbens (2016): “Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis,” National Bureau of Economic Research Working Paper.
Edwards, A. K., L. E. Harris, and M. S. Piwowar (2007): “Corporate Bond Market Transaction Costs and Transparency,” The Journal of Finance, 62, 1421–1451.
Goldstein, M. A., E. S. Hotchkiss, and E. R. Sirri (2007): “Transparency and Liquidity: A Controlled Experiment on Corporate Bonds,” Review of Financial Studies, 20, 235–273.
Goodman-Bacon, A. (2021): “Difference-in-Differences with Variation in Treatment Timing,” Journal of Econometrics, 225, 254–277.
He, G. and S. Wang (2017): “Do College Graduates Serving as Village Officials Help Rural China?” American Economic Journal: Applied Economics, 9, 186–215.
Rambachan, A. and J.
Roth (2023): “A More Credible Approach to Parallel Trends,” The Review of Economic Studies, 90, 2555–2591.
Roth, J. (2022): “Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends,” American Economic Review: Insights, 4, 305–322.
Roth, J., P. H. Sant’Anna, A. Bilinski, and J. Poe (2023): “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature,” Journal of Econometrics, 235, 2218–2244.
Rubin, D. B. (2008): “For Objective Causal Inference, Design Trumps Analysis,” The Annals of Applied Statistics, 808–840.
Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007): “Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution,” BMC Bioinformatics, 8, 25.
Wager, S. and S. Athey (2018): “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests,” Journal of the American Statistical Association, 113, 1228–1242.
Cite this document
Yesol Huh and Matthew Vanderpool Kling (2025). Parallel Trends Forest: Data-Driven Control Sample Selection in Difference-in-Differences (FEDS 2025-091). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2025-091
@techreport{wtfs_feds_2025_091,
author = {Yesol Huh and Matthew Vanderpool Kling},
title = {Parallel Trends Forest: Data-Driven Control Sample Selection in Difference-in-Differences},
type = {Finance and Economics Discussion Series},
number = {2025-091},
institution = {Board of Governors of the Federal Reserve System},
year = {2025},
url = {https://whenthefedspeaks.com/doc/feds_2025-091},
abstract = {This paper introduces parallel trends forest, a novel approach to constructing optimal control samples when using difference-in-differences (DiD) in a relatively long panel data with little randomization in treatment assignment. Our method uses machine learning techniques to construct an optimal control sample that best meet the parallel trends assumption. We demonstrate that our approach outperforms existing methods, particularly with noisy, granular data. Applying the parallel trends forest to analyze the impact of post-trade transparency in corporate bond markets, we find that it produces more robust estimates compared to traditional two-way fixed effects models. Our results suggest that the effect of transparency on bond turnover is small and not statistically significant when allowing for constrained deviations from parallel trends. This method offers researchers a powerful tool for conducting more reliable DiD analyses in complex, real-world settings.},
}