feds · March 21, 2019

The Limits of p-Hacking: a Thought Experiment

Abstract

Suppose that asset pricing factors are just p-hacked noise. How much p-hacking is required to produce the 300 factors documented by academics? I show that, if 10,000 academics generate 1 factor every minute, it takes 15 million years of p-hacking. This absurd conclusion comes from applying the p-hacking theory to published data. To fit the fat right tail of published t-stats, the p-hacking theory requires that the probability of publishing t-stats < 6.0 is infinitesimal. Thus it takes a ridiculous amount of p-hacking to publish a single t-stat. These results show that p-hacking alone cannot explain the factor zoo.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

The Limits of p-Hacking: a Thought Experiment
Andrew Y Chen
2019-016

Please cite this paper as: Chen, Andrew Y. (2019). “The Limits of p-Hacking: a Thought Experiment,” Finance and Economics Discussion Series 2019-016. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2019.016.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Andrew Y. Chen
Federal Reserve Board
andrew.y.chen@frb.gov ∗
January 2019

∗ I thank Preston Harry for excellent research assistance and Steve Sharpe for helpful comments. The views expressed herein are those of the authors and do not necessarily reflect the position of the Board of Governors of the Federal Reserve or the Federal Reserve System.

1. Introduction

There is a well-known solution to every human problem - neat, plausible, and wrong.
H.L. Mencken (1920), Prejudices: Second Series.

Academics have documented more than 300 factors that explain expected stock returns.[1] This enormous set of factors begs for an economic explanation, yet there is little consensus on their origin.[2]

p-hacking (a.k.a. data-snooping, data-mining) offers a neat and plausible solution (Harvey, Liu, and Zhu 2016, Chordia, Goyal, and Saretto 2017, Hou, Xue, and Zhang 2017, Linnainmaa and Roberts 2018, among others). This cynical explanation begins by noting that the cross-sectional literature uses statistical tests that are only valid under the assumptions of classical single hypothesis testing. These assumptions are clearly violated in practice, as each published factor is drawn from multiple unpublished tests. In this well-known explanation, the factor zoo consists of factors that performed well by pure chance.

In this short paper, I follow the p-hacking explanation to its logical conclusion. To rigorously pursue the p-hacking theory, I write down a statistical model in which factors have no explanatory power, but published t-stats are large because the probability of publishing a t-stat $t_i$ follows an increasing function $p(t_i)$. I estimate $p(t_i)$ by fitting the model to the distribution of published t-stats in Harvey, Liu, and Zhu (2016) and Chen and Zimmermann (2018). The p-hacking story is powerful: the model fits either dataset very well.

Though p-hacking fits the data, following its logic further leads to absurd conclusions. In particular, the pure p-hacking model predicts that the ratio of unpublished factors to published factors is ridiculously large, at about 100 trillion to 1. To put this number in perspective, suppose that 10,000 economists mine the data for 8 hours per day, 365 days per year. And suppose that each economist finds 1 predictor every minute. Even with this intense p-hacking, it would take 15 million years to find the 316 factors in the Harvey, Liu, and Zhu (2016) dataset.

This absurd conclusion comes from the fact that the right tail in published t-stats is extremely fat compared to a t-distribution with many degrees of freedom. 10% of t-stats in Harvey, Liu, and Zhu (2016) are larger than 6.34, while the corresponding p-value of the t-distribution with 200 degrees of freedom is 0.00000007%. Thus, to account for the fat tail in the data, authors and journals must have an extremely strong preference for very large t-stats: t-stats less than 4.0 have at most a $10^{-10}$ probability of being published, while t-stats larger than 8.0 are published with a probability of 0.9997. While it is hard to place reasonable limits on the preference for large t-stats, logistical and physical constraints imply that the power of p-hacking is limited, far too limited to account for the literature on asset pricing factors.

This thought experiment demonstrates that assigning the entire factor zoo to p-hacking is wrong. Though the p-hacking story appears logical, following its logic rigorously leads to implausible conclusions, disproving the theory by contradiction. Thus, my thought experiment supports the idea that publication bias in the cross-section of stock returns is relatively minor (Green, Hand, and Zhang 2014, McLean and Pontiff 2016, Jacobs and Müller 2017, Chen and Zimmermann 2018, Chen 2018). Papers that argue that publication bias is dominant include Harvey, Liu, and Zhu (2016), Chordia, Goyal, and Saretto (2017), Hou, Xue, and Zhang (2017), and Linnainmaa and Roberts (2018). In this literature, my paper is unique in its rigorous analysis of the p-hacking story.

[1] I use the term “factor” to refer to any variable that helps explain expected returns, following Harvey, Liu, and Zhu (2016).

[2] Cochrane (2017) provides a macro-finance perspective on predictability. Barberis (2018) provides a psychological perspective. Recent explicit factor models based on q-theory, the present value relation, and mispricing are given by Hou, Xue, and Zhang (2015), Fama and French (2015), and Stambaugh and Yuan (2016), respectively. Rigorous statistical explanations for cross-sectional predictability are proposed by Kozak, Nagel, and Santosh (2017), Kelly, Pruitt, and Su (2017), and Lettau and Pelger (2018).

2. Model, Estimation Method, Data

This section presents a rigorous version of the p-hacking story and describes how I fit it to data. Estimation results and absurd implications are found in Section 3.

2.1. Model

The distribution of all t-stats (published and not) is standard normal:

\[ t_i \sim N(0,1), \quad \text{i.i.d.} \tag{1} \]
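As a rough numerical illustration of how thin the right tail of the standard normal in Equation (1) is, the sketch below evaluates the upper-tail probability at a few cutoffs. It is only a sketch under stated assumptions: it uses the standard normal rather than the t-distribution with 200 degrees of freedom mentioned in the Introduction, so it is not meant to reproduce the p-value quoted there, and the helper name `normal_sf` is hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    # erfc keeps precision far out in the tail, where computing
    # 1 - CDF in floating point would lose most significant digits.
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# Cutoffs chosen for illustration; 6.34 is the 90th percentile of
# published t-stats cited in the Introduction.
for cutoff in (2.0, 4.0, 6.0, 6.34, 8.0):
    print(f"P(t > {cutoff}) = {normal_sf(cutoff):.3e}")
```

Using the complementary error function rather than one minus the CDF matters here, because the probabilities of interest are smaller than double-precision rounding error around 1.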

The assumption in Equation (1) formalizes the notion that all factors are false: t-stats are just noise around the unobserved population return of 0.

Some readers may object to the independence assumption, noting that several well-known anomalies are related to value or momentum. Value- and momentum-related anomalies, however, comprise only a small portion of the total universe of published anomalies (Harvey, Liu, and Zhu 2016, McLean and Pontiff 2016). For example, in the Chen and Zimmermann (2018) dataset, predictors related to valuations represent only 8% of their 156 predictors. Momentum-related predictors represent only 6%.

Ultimately, the proper correlation should be measured from the data, and the data indicate close-to-zero correlation is appropriate. The average pairwise correlation between predictor returns is tiny, at 0.03 (McLean and Pontiff 2016, Chen and Zimmermann 2018). This tiny average correlation does not result from averaging across large positive and large negative correlations. Indeed, Chen and Zimmermann (2018) find that 80% of correlations are between -0.36 and 0.43. Moreover, Chen and Zimmermann find that principal component analysis indicates that a large number of principal components are required to span the data.

Equation (1) also assumes normality. This assumption is justified by the fact that the numerator of the t-stat is the average of hundreds of monthly returns. Thus, by the central limit theorem, the sample mean return is approximately normal and the t-stat is approximately standard normal. Chen and Zimmermann (2018) show that this approximation holds very well for a 312 month sample of equal-weighted long-short quintile portfolios sorted on B/M. Equation (1) also assumes that performance is uncorrelated across predictors, consistent with the near-zero average pairwise correlation between monthly long-short returns of different published predictors (McLean and Pontiff 2016, Chen and Zimmermann 2018).

Though t-stats are on average zero, published t-stats are large due to authors' and journals' preferences for large t-stats. This preference is embodied in the function $p(t_i)$, which determines the probability that a t-stat $t_i$ is published. I assume a staircase (or step) function for $p(t_i)$:

\[
p(t_i) =
\begin{cases}
p_1, & e_1 < t_i \le e_2 \\
p_2, & e_2 < t_i \le e_3 \\
\;\vdots \\
p_K, & e_K < t_i \le e_{K+1} \\
0, & \text{otherwise}
\end{cases}
\tag{2}
\]

where the edges $\{e_1, e_2, \ldots, e_{K+1}\}$ and probabilities $\{p_1, p_2, \ldots, p_K\}$ are model parameters. In words, $p_i$ is the probability of publishing a t-stat between $e_i$ and $e_{i+1}$.

Equation (2) is a rigorous version of the p-hacking story. t-stats $< e_1$ are never published or observed by the public. The staircase functional form allows for the idea that larger t-stats are more likely to be published. The flexibility of the K-step staircase allows the model to fit the data very closely and provides a tractable, closed-form estimation.

2.2. Estimation

The model predicts that the fraction of published t-stats between $e_i$ and $e_{i+1}$ is

\[
f_i^{\text{model}} = \frac{p_i \left[\Phi(e_{i+1}) - \Phi(e_i)\right]}{\sum_{j=1}^{K} p_j \left[\Phi(e_{j+1}) - \Phi(e_j)\right]} \quad \text{for } i = 1, \ldots, K
\tag{3}
\]

where $\Phi(\cdot)$ is the standard normal CDF. Equation (3) embodies the power of the p-hacking theory. It says that any number of t-stats can be observed. Even if it is unlikely to observe such a large t-stat by chance ($\Phi(e_{i+1}) - \Phi(e_i)$ is small), a large publication probability $p_i$ can make it possible.

Equation (3) suggests an intuitive method-of-moments estimation. First choose a set of edges $\{e_1, e_2, \ldots, e_{K+1}\}$ that produces a histogram that describes the data well. Then, measure in the data the fraction of published t-stats between $e_i$ and $e_{i+1}$ and call this $f_i^{\text{data}}$. Finally, setting $f_i^{\text{model}} = f_i^{\text{data}}$ gives a set of K equations to solve for the K probabilities $\hat{p}_1, \ldots, \hat{p}_K$. Specifically,[3]

\[
\hat{p}_i \equiv \frac{1}{\kappa} \, \frac{f_i^{\text{data}}}{\Phi(e_{i+1}) - \Phi(e_i)}
\tag{4}
\]

where

\[
\kappa \equiv \sum_{j=1}^{K} \frac{f_j^{\text{data}}}{\Phi(e_{j+1}) - \Phi(e_j)}.
\tag{5}
\]

The model is exactly identified, and thus Equation (4) does not provide any formal evaluation of the model. Instead, I discipline the model by examining a simple thought experiment in Section 3.2.

For histogram edges $e_i$ I use $\{1, 2, 3, \ldots, 8, \infty\}$. Other edges lead to similar results.

2.3. Data on Published t-stats

I estimate the model on 2 datasets. The first is Chen and Zimmermann's (2018) replications of 156 equal-weighted long-short quintile portfolios. These portfolios are constructed from variables that have been shown to predict stock returns cross-sectionally and are published in finance, accounting, and general interest economics journals. The majority are constructed using either accounting data or market prices, but about 1/3 use diverse data that include analyst forecasts, trading-related measures, and corporate events. The Chen and Zimmermann dataset allows for easy replication, as this data is publicly available at http://sites.google.com/site/chenandrewy/code-and-data/.

I also consider the hand-collected t-stats for 316 factors in Harvey, Liu, and Zhu (2016). These factors include variables that predict cross-sectional returns, as well as other variables that broadly explain return patterns. Harvey et al. do not make their data publicly available, but in Table 5 (page 30) they provide parameter estimates for a model of the t-stats in their data. Using their model estimates, I can simulate their dataset. By design, this simulated data should match the moments in the original data. I use the parameter values from the first row of Table 5, but the other parameters lead to similar results.

[3] To see this, note that $f_i^{\text{data}} = \kappa \, \hat{p}_i \left[\Phi(e_{i+1}) - \Phi(e_i)\right]$. Then, noting that $\sum_i \hat{p}_i = 1$, we have $\kappa = \sum_j f_j^{\text{data}} / \left[\Phi(e_{j+1}) - \Phi(e_j)\right]$.
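The method-of-moments step in Equations (4) and (5) can be sketched in a few lines. This is a minimal illustration, not the author's code: the function names are hypothetical, the target moments are the rounded "CZ replications" percentages from Table 1 (renormalized here to sum to one, which is an assumption of this sketch), and the output should therefore only be expected to agree with the paper's estimates in order of magnitude.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bin_masses(edges):
    """Standard normal mass Phi(e_{i+1}) - Phi(e_i) of each bin."""
    # Differencing upper-tail probabilities avoids the cancellation
    # that 1 - Phi(x) suffers for large x.
    return [normal_sf(lo) - normal_sf(hi) for lo, hi in zip(edges[:-1], edges[1:])]


def estimate_publication_probs(edges, f_data):
    """Return (p_hat, kappa) following Equations (4) and (5)."""
    ratios = [f / m for f, m in zip(f_data, bin_masses(edges))]
    kappa = sum(ratios)                    # Equation (5)
    p_hat = [r / kappa for r in ratios]    # Equation (4)
    return p_hat, kappa


# Edges from Section 2.2; percentages from the CZ replications row of Table 1.
edges = [1, 2, 3, 4, 5, 6, 7, 8, math.inf]
cz_percent = [15.4, 24.4, 21.8, 10.9, 10.3, 2.6, 4.5, 10.3]
f_data = [x / sum(cz_percent) for x in cz_percent]  # renormalize rounded values

p_hat, kappa = estimate_publication_probs(edges, f_data)
for lo, hi, p in zip(edges[:-1], edges[1:], p_hat):
    print(f"t-stat in ({lo}, {hi}]: p_hat = {p:.3e}")
```

Because the model is exactly identified, plugging `p_hat` back into Equation (3) returns `f_data` itself; the interesting output is how many orders of magnitude separate the first and last entries of `p_hat`.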

Table 1 summarizes the datasets. It shows the histogram counts for t-stats in percent. I use these counts as target moments in Equation (4). For comparison, the table also shows the histogram counts of the hand-collected data from Chen and Zimmermann (2018).

[Table 1 about here.]

All three datasets show a fat right tail in t-stats. About 50% of t-stats are between 2.0 and 4.0, and the remaining 50% are spread far out and to the right. Roughly 15% or more of t-stats are greater than 6.0 in each of the three datasets.

3. Results

3.1. Estimated Preference for Large t-stats

Figure 1 illustrates the model fit and estimation results. The figure plots the histogram of t-stats data (bars) and model (circle markers), along with the estimated preference for t-stats (triangle markers). The top panel uses the Chen and Zimmermann (2018) (CZ) replicated data, and the bottom panel uses moments from Harvey, Liu, and Zhu (2016) (HLZ).

[Figure 1 about here.]

As the model is exactly identified using the t-stat histogram, the model fit in Figure 1 is very good by construction. This fit illustrates the powerful logic of p-hacking: one can generate any pattern if the data is selectively published.

The implied preference for large t-stats, however, is very extreme. This preference is characterized by 8 parameters $p_1, p_2, \ldots, p_8$ corresponding to the probability of publishing a t-stat in each bin. The probabilities are so extreme that they need to be plotted on a log scale (triangles, right axis), and range from $10^{-14}$ for t-stats between 1 and 2 to 0.99977 for t-stats in excess of 8.0 for the CZ data. The HLZ data leads to similar results.

These highly skewed probabilities come from the very thin tail of a standard normal distribution. This can be seen in the bottom row of Table 1, which shows the histogram counts implied by a standard normal distribution that is truncated at 2.0. Roughly 0.00001% of t-stats exceed 6.0 in this distribution, compared to the roughly 15% of t-stats that exceed 6.0 in the data. Thus, in order for the data to be generated by p-hacking, the publication probability for these large t-stats must be very high compared to those for smaller t-stats, leading to the extremely skewed probabilities seen in Figure 1.

3.2. A Thought Experiment

It's difficult to say if the estimated preference for large t-stats in Figure 1 is reasonable. The probability that a given t-stat is published depends on the choices of both authors and journals. These choices interact, making interpretation difficult.

However, one can interpret the t-stat preferences easily in a thought experiment. This thought experiment tests the plausibility of the p-hacking story in the same way Mehra and Prescott's (1985) calibration exercise tests the plausibility of the power utility model of equity prices.

Suppose that N economists mine the data 8 hours per day, 365 days per year. Suppose further that the economists produce factors at a rate of x per economist-hour. How long would it take to produce 100 factors?

To answer this question, I need to calculate the probability that a random t-stat is published. This probability is found by integrating the probability of publication (2) over the distribution of t-stats. The staircase form of (2) implies a closed-form expression:

\[
\text{Probability of Publishing a Random t-stat} = \sum_{i=1}^{K} \hat{p}_i \left[\Phi(e_{i+1}) - \Phi(e_i)\right]
\tag{6}
\]

where, as a reminder, $\Phi(\cdot)$ is the standard normal CDF. Plugging in $\hat{p}_i$ from Figure 1 and the standard normal probabilities from Table 1 we have

\[
\text{Probability of Publishing a Random t-stat} = 6.49 \times 10^{-15}
\tag{7}
\]

using the Chen and Zimmermann (2018) data and

\[
\text{Probability of Publishing a Random t-stat} = 1.23 \times 10^{-14}
\tag{8}
\]

using Harvey, Liu, and Zhu (2016).

These infinitesimal probabilities come from the fact that the estimated probabilities of publication in Figure 1 and the probabilities implied by the standard normal distribution (see Table 1) are largely disjoint. For t-stats below 6.0, $\hat{p}_i$ is extremely small, but for t-stats above 6.0, the standard normal density implies a tiny probability. Summing over the product of these probabilities, Equation (6) implies an extremely tiny probability of publication.

Using this probability of publication, I calculate the number of years it takes to publish 100 factors, assuming various numbers of economists and rates of factor production. As both probabilities are extremely small, I focus on the larger of the two, the probability implied by the Harvey, Liu, and Zhu (2016) dataset.

[Table 2 about here.]

Table 2 shows the result. The table begins by assuming that 10,000 economists mine the data. If these 10,000 economists produce factors at a rate of 1 per economist-hour, it takes about 278 million years to publish 100 factors. To put this number in perspective, the number of economics professors in the United States was 12,770 in 2017, and the number of economists was 21,300 in 2016 according to the Bureau of Labor Statistics.

One might argue that factors can be mined at a much faster rate than 1 per economist-hour given modern computing power. However, factors need to come with supplementary results that satisfy journal review in order to be published. For example, portfolio sorts are often required to produce monotonic patterns in expected returns, alternative methods for factor construction are sometimes required, and the factors themselves are typically asked to be consistent with some kind of theory for journals to publish them. These additional restrictions are difficult to satisfy using computing power alone.

Regardless, I can pursue the idea of highly productive factor mining in this thought experiment. Table 2 shows that, even at a factor production rate of 10 per economist-second (36,000 per economist-hour), it would take roughly 7,700 years for 10,000 economists to publish 100 factors.

Table 2 also explores the possibility that more than 10,000 economists engage in p-hacking. Even if 1 million economists mine the data at 10 factors per economist-second, it would still take 77 years to publish 100 factors. To put these numbers in perspective, the Bureau of Labor Statistics estimates that there were 296,100 financial analysts in the United States in 2016.

Finally, the bottom row of Table 2 shows that if 1 million economists produce 40 factors per economist-second (144,000 per economist-hour), then 100 factors will be published in just 19 years. However, the idea that 1 million economists can, in every second, produce 40 factors that have the supplementary results required for publication, and do so consistently for 19 years, is ridiculous.

4. Conclusion

The idea that all asset pricing factors are due to p-hacking is very tempting. In one fell swoop, p-hacking can explain decades of puzzling financial research. A rigorous exploration of this explanation, however, shows that it is implausible. Though it may be difficult to understand, the stock return data does display cross-sectional variation in expected returns.

References

Barberis, Nicholas C. Psychology-based Models of Asset Prices and Trading Volume. Tech. rep. National Bureau of Economic Research, 2018.

Chen, Andrew Y. “Do t-stat Hurdles Need to be Raised? Direct Estimates of False Discoveries in the Cross-Section of Stock Returns”. Available at SSRN: https://papers.ssrn.com/abstract=3254995 (2018).

Chen, Andrew Y and Tom Zimmermann. “Publication Bias and the Cross-Section of Stock Returns”. Available at SSRN: https://ssrn.com/abstract=2802357 (2018).

Chordia, Tarun, Amit Goyal, and Alessio Saretto. “p-hacking: Evidence from two million trading strategies” (2017).

Cochrane, John H. “Macro-finance”. Review of Finance 21.3 (2017), pp. 945–985.

Fama, Eugene F. and Kenneth R. French. “A five-factor asset pricing model”. Journal of Financial Economics 116.1 (2015), pp. 1–22. ISSN: 0304-405X. URL: http://www.sciencedirect.com/science/article/pii/S0304405X14002323.

Green, Jeremiah, John RM Hand, and Frank Zhang. “The remarkable multidimensionality in the cross-section of expected US stock returns”. Available at SSRN 2262374 (2014).

Harvey, Campbell R, Yan Liu, and Heqing Zhu. “... and the cross-section of expected returns”. The Review of Financial Studies 29.1 (2016), pp. 5–68.

Harvey, Campbell and Yan Liu. “Multiple testing in economics” (2013).

Hou, Kewei, Chen Xue, and Lu Zhang. “Digesting anomalies: An investment approach”. The Review of Financial Studies 28.3 (2015), pp. 650–705.

Hou, Kewei, Chen Xue, and Lu Zhang. Replicating Anomalies. Tech. rep. National Bureau of Economic Research, 2017.

Jacobs, Heiko and Sebastian Müller. “Anomalies across the globe: Once public, no longer existent?” (2017).

Kelly, Bryan T, Seth Pruitt, and Yinan Su. “Some characteristics are risk exposures, and the rest are irrelevant” (2017).

Kozak, Serhiy, Stefan Nagel, and Shrihari Santosh. “Shrinking the Cross Section” (2017).

Lettau, Martin and Markus Pelger. “Factors that Fit the Time Series and Cross-Section of Stock Returns” (2018).

Linnainmaa, Juhani T and Michael R Roberts. “The history of the cross-section of stock returns”. The Review of Financial Studies 31.7 (2018), pp. 2606–2649.

McLean, R David and Jeffrey Pontiff. “Does academic research destroy stock return predictability?” The Journal of Finance 71.1 (2016), pp. 5–32.

Mehra, Rajnish and Edward C Prescott. “The equity premium: A puzzle”. Journal of Monetary Economics 15.2 (1985), pp. 145–161.

Stambaugh, Robert F and Yu Yuan. “Mispricing factors”. The Review of Financial Studies 30.4 (2016), pp. 1270–1315.

Exhibits

Figure 1: Model Fit and t-stat Preference. I estimate a model of pure p-hacking (Equations (1)-(2), circles) on large datasets of published t-stats (bars) by method of moments (Section 2.2). The top panel uses 156 replicated long-short portfolios from Chen and Zimmermann (2018). The bottom panel uses moments from Harvey, Liu, and Zhu (2016). The t-stat preference is modeled as publication probabilities (triangles). The estimated preference for large t-stats is extremely strong. t-stats < 6.0 have an absurdly low probability of publication.

[Figure 1: two panels, “Data: Chen-Zimmermann Replications” (top) and “Data: Harvey, Liu, Zhu” (bottom). Each panel plots Frequency (left axis) and Publication Probability (right axis, log scale from 10^0 down to 10^-16) against t-stat (0 to 14), with series Data, Model, and t-stat Preference (right).]

Table 1: Distribution of Published t-stats

This table summarizes the data and provides moments used in the estimation. CZ replications is the 156 replications of equal-weighted long-short quintile portfolios in Chen and Zimmermann (2018). HLZ estimated model simulates the model from Harvey, Liu, and Zhu (2016) Table 5, first row. For comparison, I show the 77 hand-collected statistics from Chen and Zimmermann (2018) (CZ hand collection) and a standard normal truncated at 2.0.

Percent of t-stats between:        | 1,2  | 2,3  | 3,4  | 4,5  | 5,6   | 6,7   | 7,8   | >8
Used in Estimation
  CZ replications                  | 15.4 | 24.4 | 21.8 | 10.9 | 10.3  | 2.6   | 4.5   | 10.3
  HLZ estimated model              | 1.2  | 30.9 | 27.1 | 16.2 | 9.7   | 5.9   | 3.5   | 5.4
For Comparison
  CZ hand collection               | 6.4  | 29.5 | 20.5 | 9.0  | 14.1  | 5.1   | 1.3   | 14.1
  standard normal truncated at 2.0 | -    | 94.1 | 5.8  | 0.1  | 1E-03 | 4E-06 | 6E-09 | 3E-12
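The bottom row of Table 1 can be approximated directly from the standard normal distribution. The sketch below is an illustration under the assumption that "truncated at 2.0" means conditioning on t > 2.0; the function names are hypothetical, and the printed percentages should land close to the table's rounded entries rather than match them digit for digit.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def truncated_normal_percent(edges, truncation: float):
    """Percent of mass in each bin for a standard normal conditioned on Z > truncation."""
    tail = normal_sf(truncation)
    return [
        100.0 * (normal_sf(lo) - normal_sf(hi)) / tail
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


# Bins 2-3, 3-4, ..., 7-8, and >8, as in the columns of Table 1.
edges = [2, 3, 4, 5, 6, 7, 8, math.inf]
percents = truncated_normal_percent(edges, truncation=2.0)
for lo, hi, pct in zip(edges[:-1], edges[1:], percents):
    print(f"({lo}, {hi}]: {pct:.1e} percent")
```

Comparing these values with the "Used in Estimation" rows shows the gap the publication probabilities have to bridge: bins above 6.0 hold a vanishing share of the truncated normal but a double-digit share of published t-stats.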

Table 2: A Thought Experiment

I calculate the probability that a random t-stat is published (Equation (6)). Using the Harvey, Liu, and Zhu (2016) data, this probability is 1.23e-14. Applying this probability to the assumed number of economists and factors per economist-hour in the table leads to the number of publications per year and years to publish 100 factors. For comparison, there were 12,770 economics professors in the United States in 2017 and 21,300 professional economists in 2016 according to the Bureau of Labor Statistics.

Number of Economists | Factors per Economist-Hour | Factors per Year (Millions) | Publications per Year | Years to Publish 100 Factors
10,000               | 1                          | 29                          | 3.60E-07              | 277,524,922
10,000               | 60                         | 1,752                       | 2.16E-05              | 4,625,415
10,000               | 3,600                      | 105,120                     | 1.30E-03              | 77,090
10,000               | 36,000                     | 1,051,200                   | 1.30E-02              | 7,709
100,000              | 36,000                     | 10,512,000                  | 0.13                  | 771
500,000              | 36,000                     | 52,560,000                  | 0.65                  | 154
1,000,000            | 36,000                     | 105,120,000                 | 1.30                  | 77
1,000,000            | 144,000                    | 420,480,000                 | 5.19                  | 19
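The arithmetic behind Table 2 is simple enough to sketch. The snippet below assumes the 8-hour day and 365-day year stated in Section 3.2 and uses the rounded probability 1.23e-14 from the table caption, so its results differ slightly from the table, which appears to be based on an unrounded probability. The function and constant names are hypothetical.

```python
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 365
PUBLICATION_PROB = 1.23e-14  # rounded value quoted in the Table 2 caption


def years_to_publish(n_economists, factors_per_hour, n_factors=100,
                     prob=PUBLICATION_PROB):
    """Expected years needed to publish n_factors under pure p-hacking."""
    factors_per_year = n_economists * factors_per_hour * HOURS_PER_DAY * DAYS_PER_YEAR
    publications_per_year = factors_per_year * prob
    return n_factors / publications_per_year


# (number of economists, factors per economist-hour), as in the rows of Table 2.
scenarios = [
    (10_000, 1), (10_000, 60), (10_000, 3_600), (10_000, 36_000),
    (100_000, 36_000), (500_000, 36_000), (1_000_000, 36_000),
    (1_000_000, 144_000),
]
for n, rate in scenarios:
    print(f"{n:>9,} economists, {rate:>7,} per hour: "
          f"{years_to_publish(n, rate):,.0f} years")

# The headline figure: 316 factors at 1 factor per economist-minute.
print(f"{years_to_publish(10_000, 60, n_factors=316):,.0f} years for 316 factors")
```

Setting `n_factors=316` with 60 factors per economist-hour gives a figure on the order of the 15 million years quoted in the abstract.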

Cite this document

APA

Andrew Y. Chen (2019). The Limits of p-Hacking: a Thought Experiment (FEDS 2019-016). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2019-016

BibTeX

@techreport{wtfs_feds_2019_016,
  author = {Andrew Y. Chen},
  title = {The Limits of p-Hacking: a Thought Experiment},
  type = {Finance and Economics Discussion Series},
  number = {2019-016},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2019},
  url = {https://whenthefedspeaks.com/doc/feds_2019-016},
  abstract = {Suppose that asset pricing factors are just p-hacked noise. How much p-hacking is required to produce the 300 factors documented by academics? I show that, if 10,000 academics generate 1 factor every minute, it takes 15 million years of p-hacking. This absurd conclusion comes from applying the p-hacking theory to published data. To fit the fat right tail of published t-stats, the p-hacking theory requires that the probability of publishing t-stats < 6.0 is infinitesimal. Thus it takes a ridiculous amount of p-hacking to publish a single t-stat. These results show that p-hacking alone cannot explain the factor zoo.},
}