feds · November 13, 2018

On the U.S. Firm and Establishment Size Distributions

Abstract

This paper revisits the empirical evidence on the nature of firm and establishment size distributions in the United States using the Longitudinal Business Database (LBD), a confidential Census Bureau panel of all non-farm private firms and establishments with at least one employee. We establish five stylized facts that are relevant for the extent of granularity and the nature of growth in the U.S. economy: (1) with an estimated shape parameter significantly below 1, the best-fitting Pareto distribution substantially differs from Zipf's law for both firms and establishments; (2) a lognormal distribution fits both establishment and firm size distributions better than the commonly used Pareto distribution, even far in the upper tail; (3) a convolution of lognormal and Pareto distributions fits both size distributions better than lognormal alone while also providing a better fit for the employment share distribution; (4) the estimated parameters differ across the manufacturing and services sectors, but the distribution fit ranking remains unchanged in the sectoral subsamples. Finally, using the Census of Manufactures (CM), we find that (5) the distribution of establishment-level total factor productivity, a common theoretical primitive for size, is also better described by lognormal than Pareto. We show that correctly characterizing the firm size distribution has first-order implications for the effect of firm-level idiosyncratic shocks on aggregate activity.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Illenin O. Kondo, Logan T. Lewis, and Andrea Stella
2018-075

Please cite this paper as: Kondo, Illenin O., Logan T. Lewis, and Andrea Stella (2018). “On the U.S. Firm and Establishment Size Distributions,” Finance and Economics Discussion Series 2018-075. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2018.075.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Illenin O. Kondo∗ (University of Notre Dame), Logan T. Lewis (Federal Reserve Board), Andrea Stella (Federal Reserve Board)
November 8, 2018

JEL classifications: L11, E24
Keywords: Firm size distribution, TFP distribution, Lognormal, Pareto, Zipf’s law, Granularity.

∗ The views expressed here should not be interpreted as reflecting the views of the Federal Reserve Board of Governors or any other person associated with the Federal Reserve System. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed. We thank Robert L. Axtell, Andrew Figura, and Colin Hottman for helpful comments and suggestions. We also thank seminar participants at the Federal Reserve Bank of Philadelphia, CEF 2018, IAAE 2018, and NASMES 2018 for their comments. Michael Kister provided excellent research assistance. Finally, we thank Foster, Grim and Haltiwanger for kindly sharing the TFP data from Foster et al. (2016).

1 Introduction

Modern macroeconomic models often feature cross-sectional firm heterogeneity as the firm size distribution affects important outcomes such as economic growth, international trade elasticities, or the sources of aggregate fluctuations. For instance, if an economy is dominated by large firms, idiosyncratic shocks to these firms may be an important source of aggregate fluctuations depending on how heavy the right tail of the firm size distribution is (Gabaix 2011, di Giovanni and Levchenko 2012, Stella 2015). Specifically, Gabaix (2011) shows the idiosyncratic origins of aggregate volatility when the size distribution follows Zipf’s law (that is, Pareto distributed with a shape parameter of 1). More generally, multiple foundational papers in macroeconomics and international economics rely on the Pareto distribution because of its analytical convenience and its seeming empirical regularity (Chaney 2008, Arkolakis et al. 2012, Rossi-Hansberg and Wright 2007, Luttmer 2011, Carvalho and Grassi 2018). In this paper, we revisit the empirical evidence on the nature of the distribution of firm size and establishment size in the United States and provide novel stylized facts along with maximum likelihood estimates of the best-fitting distributions.

Using the Longitudinal Business Database (LBD), a confidential U.S. Census Bureau panel dataset of all non-farm private firms and establishments with at least one employee, we document several important facts about the U.S. establishment and firm employment size distributions. In practice, two classes of distributions are commonly used in the modern macroeconomic literature: the lognormal distribution and the Pareto distribution. We use maximum likelihood estimation (MLE) to estimate the parameters of the lognormal and Pareto distributions, as well as two combinations of lognormal and Pareto distributions.
Though both lognormal and Pareto distributions are “heavy-tailed” (that is, their upper tails are “heavier” than an exponential distribution), they are very different in their economic origins and implications.[1]

[1] Head et al. (2014) examine the consequences of a lognormal distribution in international trade, and Rossi-Hansberg and Wright (2007) explore the origins of establishment growth and how scale dependence can vary across industries and generate distributions with thinner tails than Zipf’s law.

We find the best-fitting Pareto shape parameter to be robustly below 1 for both firms and establishments: a Pareto shape parameter less than 1 implies an upper tail heavier than Zipf’s law, and therefore leads to problematic theoretical implications as the firm size distribution mean would not be well-defined. We also extend the analysis of Gabaix (2011) to illustrate another striking pitfall and likely unpalatable consequence of the estimated Pareto distributions. We show that, unlike the lognormal case, aggregate volatility does not decrease in the number of firms in the case of Pareto distributions with shape parameter below 1: idiosyncratic shocks generate ‘too much’ aggregate volatility.

Statistically, we clearly find that a lognormal fits the employment size distribution better than a Pareto. This finding holds even when we consider most cuts of the upper tail of the firm size distribution: the Pareto distribution provides a better fit of the right tail for only a narrow range, and the far right tail is still better described by lognormal. These results overturn the best available evidence for the United States from Axtell (2001), the main reference in the literature.

Moving beyond the simple lognormal and Pareto distributions, we estimate the parameters of a statistical mixture of lognormal and Pareto as well as a convolution of a Pareto random variable multiplied by a lognormal random variable. We find that the mixture provides the best overall fit, but the convolution also beats the fit of the Pareto or lognormal distributions alone. However, we find that the mixture often also has a Pareto shape parameter below 1, while the convolution has a Pareto shape parameter well above 1. If a distribution mixes any Pareto with a shape parameter below 1, some incredibly large firms would be generated in reasonably sized samples, leaving too many employees belonging to the very largest bin of firms. Therefore, when we consider a second criterion of the distribution fit (the fraction of employment accounted for by various bins of establishment or firm size), the convolution provides a markedly better fit. Moreover, because the convolution can arise in a heterogeneous firm model with two sources of firm-level shock, say a demand shock and a productivity shock or two productivity shocks, the convolution may be the more appropriate distribution to use.

Next, we characterize the nature of the firm size distribution across sectors and over time. In the last several decades, the U.S. economy made a substantial transition away from traditional manufacturing to a more service-based economy. This ongoing structural change in the U.S. economy may have implications for economic growth, both at the firm and aggregate level. We explore how such structural transformation affected the employment size distribution across sectors and across time.
We find that manufacturing and services are both well-described by a mixture of lognormal and Pareto, but they have notable economic differences in parameter estimates. Manufacturing firms and establishments are on average larger than their services counterparts, and the employment distribution in manufacturing has a higher variance and a heavier right tail than in the services sector.

Finally, we use estimates of establishment-level total factor productivity (TFP) from Foster et al. (2016) to estimate the best distribution fit for this more-primitive source of establishment size. We find strong evidence that TFP is better described by lognormal than Pareto, and a lognormal distribution often even fits better than a convolution. The mixture still provides the best statistical fit, but the evidence suggests that TFP can be reasonably modeled by a lognormal distribution. This is consistent with an overall size distribution being a convolution, where TFP is lognormal and another firm size determinant, e.g. firm demand, is distributed Pareto.

Overall, our contribution is twofold: first, we adopt the most rigorous statistical techniques to estimate and test the shape of the firm and establishment size distributions, while the previous literature largely relied on demonstrably weaker econometric approaches, such as regression analysis. Second, we use the most comprehensive database for the United States: we have the entire population of firms and establishments in the United States over a time period that spans 30 years, which allows us to not only study the entire population, but also subsamples in the upper tail or sector-year subsamples. This evidence can help develop better and more relevant macroeconomic models of heterogeneous firms and establishments.

The existing literature is mixed about the nature of the best-fitting size distribution. In early work, Simon and Bonini (1958) lay out a simple statistical model of firm growth which can generate both lognormal and Pareto distributions and provide anecdotal evidence that firm sizes might be near Zipf’s law. Using Small Business Administration data and a bin-based regression, Luttmer (2010) finds, like Axtell (2001), that U.S. firm sizes are Pareto with a shape parameter of 1.06. Combes et al. (2012) find evidence that firm-level TFP in France is better described by a lognormal distribution than a Pareto. They use a mixture of Pareto and lognormal, conclude that the mixture parameter is quite close to lognormal, and proceed with a straight lognormal distribution for their model.

A few recent papers identify and explore the implications of a firm size distribution that is not Pareto. Fernandes et al. (2015) propose to model the firm productivity distribution with a lognormal distribution to match the empirical evidence on the importance of the intensive margin of trade. Sager and Timoshenko (2017) show that a convolution of lognormal and Pareto fits best using Brazilian export sales data. Nigai (2017) shows that a mixture of lognormal and Pareto fits the firm productivity distribution of French firms the best and that its adoption affects the estimation of the gains from trade. Armenter and Koren (2015) find that the Pareto shape parameter required to match the exporter size premium (how much larger the average exporter is) would be about 1.65, and thus it is difficult to reconcile models of selection into exporting by firm size with a size distribution generated by realistic Pareto parameterizations.

The rest of the paper is organized as follows. Section 2 introduces the data we use in the paper. Section 3 explains the parametric distributions that we fit to the data.
Section 4 presents our main results on the employment size distributions and analyzes the TFP establishment distribution in manufacturing. Finally, Section 5 concludes.

2 Data Description

The Center for Economic Studies at the U.S. Census Bureau created and maintains a longitudinally linked establishment-level database: the Longitudinal Business Database (LBD). The LBD covers the non-farm private economy of establishments with at least one employee. It was created using information from a wide array of surveys conducted by the U.S. Census Bureau, such as the Standard Statistical Establishment List, the quinquennial Economic Census, and the annual surveys.[2] Table 1 shows the number of establishments and firms we use in each year.

[2] The wide coverage of the LBD comes at a cost, as it only provides number of employees, payroll, location, firm ID, and sectoral affiliation of each establishment; we do not observe revenues, intermediate inputs, capital investment, prices or other important information. See Jarmin and Miranda (2002) for additional details on the LBD.

Table 1: LBD observations

Year    Est.         Firm
1982    4,490,000    3,620,000
1992    5,580,000    4,390,000
1997    6,060,000    4,770,000
2002    6,290,000    4,900,000
2012    6,590,000    4,980,000

Note: Numbers are rounded.

The LBD establishment is defined as a single physical location where business is conducted; this definition is not equivalent to the IRS Employer Identification Number (EIN), which might comprise more than one LBD establishment. The LBD establishment is also not equivalent to a legal entity; the LBD includes a firm ID variable that groups together establishments owned by the same firm. The LBD firm ID was created using information from the quinquennial Economic Census and the annual Company Organization Survey. The latter is sent only to large firms and a subset of small firms, so the firm ID is not entirely reliable outside of Census years, that is, the years when the quinquennial Census is conducted; for this reason, we use only Census years in our analysis.

In most of the paper, we measure the size of an establishment or firm with its number of employees. Most of the analysis will be conducted on the whole universe, but we also consider two subsamples: manufacturing and services, where the latter excludes retail, wholesale and FIRE.[3]

[3] We define the manufacturing sector as all establishments with two-digit SIC codes ∈ [20,40) for years 1977, 1982, 1987, 1992, and 1997. For 2002, 2007, and 2012, we define manufacturing as establishments with two-digit NAICS codes ∈ [31,33]. The services sector is defined as all two-digit SIC codes ∈ [70,90) and two-digit NAICS codes ∈ [54,81] and 51 for the same years. We assign firms to the sector where most of their employees work.

The LBD covers nearly the entire U.S. business population, but only provides us with limited information. Census surveys on manufacturing establishments include much richer detail on their operation. Foster et al. (2016) estimate Total Factor Productivity (TFP) with data from the Annual Survey of Manufactures (ASM) and the quinquennial Census of Manufactures (CM). Since the distribution of productivity shocks represents an important primitive assumption in many theories, we extended our analysis to the distribution of TFP of manufacturing establishments, discussed in Section 4.5.

Table 2 shows the number of establishments and firms in the services and manufacturing sectors as well as the number of establishments in the TFP dataset, which does not include 2012. Note that services make up the vast majority of the establishments and firms in the LBD, and thus contribute a greater share to the distribution calculations for the universe of establishments and firms. Also note that the TFP calculation is available for a bit over half of manufacturing establishments.

Table 2: Sample by sector

        Services                  Manufacturing
Year    Est.         Firm         Est.       Firm       TFP est.
1982    1,430,000    1,280,000    330,000    270,000    190,000
1992    2,000,000    1,730,000    350,000    290,000    190,000
1997    2,240,000    1,920,000    360,000    300,000    210,000
2002    3,070,000    2,500,000    330,000    280,000    180,000
2012    3,440,000    2,760,000    280,000    230,000

Note: Numbers are rounded.

3 Parametric Distributions and Estimation Methods

Motivated by the existing literature, we fit four parametric distributions to the data.[4] The first and most popular distribution is Pareto. Axtell (2001) provides the benchmark evidence that the employment and sales firm size distributions in the U.S. are well approximated by a Pareto close to Zipf’s law. As a consequence, along with analytical tractability, much of the endogenous growth literature focuses on generating a Pareto distribution and Pareto is widely used in heterogeneous firm models that assume an exogenous distribution. The CDF of a Pareto with scale parameter $x_m$ and shape parameter $\alpha$ is

$$F_P(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}. \tag{1}$$

For this type of Pareto distribution, the mean is $\frac{\alpha}{\alpha-1} x_m$ for $\alpha > 1$ and the variance is $\frac{\alpha}{(\alpha-1)^2(\alpha-2)} x_m^2$ for $\alpha > 2$. When $\alpha \le 2$, the variance is undefined, and when $\alpha \le 1$, the mean and variance are undefined. These properties are especially important given the range of shape parameter estimates we find in the data.

The lognormal distribution has frequently been considered as a possible alternative to the Pareto distribution. The log of a lognormal random variable follows a normal distribution. The CDF of a lognormal with parameters $\mu$ and $\sigma$ is

$$F_L(x) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{\ln x - \mu}{\sqrt{2}\,\sigma}\right). \tag{2}$$

[4] See Simon and Bonini (1958) for a simple and seminal statistical model of firm growth which can generate both lognormal and Pareto distributions as special cases.
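As a minimal sketch of equations (1) and (2), the two CDFs and the Pareto moment conditions can be written in a few lines of Python. The function names are illustrative and do not come from the paper; returning infinity where the text calls a moment undefined is a simplifying assumption of this sketch.

```python
import math


def pareto_cdf(x, x_m, alpha):
    """Equation (1): F_P(x) = 1 - (x_m / x)^alpha for x >= x_m, and 0 below x_m."""
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x) ** alpha


def lognormal_cdf(x, mu, sigma):
    """Equation (2): F_L(x) = 1/2 + 1/2 * erf((ln x - mu) / (sqrt(2) * sigma))."""
    if x <= 0.0:
        return 0.0
    return 0.5 + 0.5 * math.erf((math.log(x) - mu) / (math.sqrt(2.0) * sigma))


def pareto_mean(x_m, alpha):
    """Mean alpha * x_m / (alpha - 1); reported as infinity when alpha <= 1."""
    return alpha * x_m / (alpha - 1.0) if alpha > 1.0 else math.inf


def pareto_variance(x_m, alpha):
    """Variance alpha * x_m^2 / ((alpha - 1)^2 (alpha - 2)); infinity when alpha <= 2."""
    if alpha <= 2.0:
        return math.inf
    return alpha * x_m ** 2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))


# Under Zipf's law (alpha = 1, x_m = 1), half of all firms have size below 2.
print(pareto_cdf(2.0, x_m=1.0, alpha=1.0))
# A shape parameter below 1 has no finite mean.
print(pareto_mean(x_m=1.0, alpha=0.6))
```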

For the lognormal distribution, the mean is given by $e^{\mu+\sigma^2/2}$ and the variance by $(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$.

Besides the two most popular parametric distributions in the literature, we consider two distributions that combine Pareto and lognormal in the hope of providing a better fit. One is a pure statistical mixture of the two distributions and the other a convolution of the two distributions. Specifically, the CDF $F_M$ of the mixture of a Pareto and a lognormal using a mixing parameter $p$ is

$$F_M(x) = p\,F_L(x) + (1-p)\,F_P(x), \tag{3}$$

where $F_L$ is the CDF of a lognormal with parameters $\mu$ and $\sigma$ and $F_P$ is the CDF of a Pareto with scale parameter $x_m$ and shape parameter $\alpha$.

Finally, we define the convolution as the product of a Pareto random variable with CDF $F_P$ and a lognormal random variable with CDF $F_L$. Equivalently, the log of such convolution is the sum of a normal distribution and an exponential distribution.[5] Thus, the CDF of the convolution is

$$F_C(\ln x) = \Phi\big(\alpha(x-\mu);\,0,\,\alpha\sigma\big) - e^{-\alpha(x-\mu)+\frac{(\alpha\sigma)^2}{2}}\,\Phi\big(\alpha(x-\mu);\,(\alpha\sigma)^2,\,\alpha\sigma\big), \tag{4}$$

where $\Phi(x;\mu,\sigma)$ is the CDF at $x$ of a normal distribution with parameters $\mu$ and $\sigma$.

Most of the previous literature on the firm size distribution evaluated the fit of the Pareto distribution using regression analysis; see Axtell (2001) and Gabaix (2009). It has been widely documented in both the statistics and econometrics literature that regression analysis is not well suited to test the goodness of fit of a Pareto distribution.[6] Since we are estimating parametric models with a known and simple likelihood, we use Maximum Likelihood Estimation (MLE) for its excellent statistical properties.[7] To determine which distribution best fits the data, we rely on formal statistical testing.

For nested models, we use the popular likelihood ratio test. If $L_1$ is the maximum likelihood of a model, $L_0$ is the maximum likelihood of a reduced version of the model, and $k$ is the difference between the numbers of parameters, then $\Lambda = -2\ln\left(\frac{L_0}{L_1}\right)$ is asymptotically distributed according to $\chi^2_k$. For non-nested models, we use a test developed by Vuong (1989).
This test is a function of the likelihood ratio test: $\tilde{\Lambda} = n^{\frac{1}{2}}\frac{\Lambda}{\omega_n}$, where $f$ is the model in the numerator of the ratio and $g$ the model in the denominator. Under the null hypothesis $H_0$ that the two models are equivalent, $\tilde{\Lambda} \xrightarrow{D} N(0,1)$; under the first alternative $H_f$ that model $f$ is better, $\tilde{\Lambda} \xrightarrow{a.s.} \infty$; finally, under the second alternative $H_g$ that model $g$ is better, $\tilde{\Lambda} \xrightarrow{a.s.} -\infty$. We also use the Akaike information criterion (AIC), which has the attractive feature of penalizing models with a higher number of parameters.

[5] See Reed (2001) for the stochastic growth processes that can yield such distribution. Sager and Timoshenko (2017) also rationalize this distribution using a model with Pareto productivity shocks and lognormal demand shocks.

[6] Clauset et al. (2009) discuss this issue in detail and also provide a Monte Carlo simulation to show that the lognormal distribution can approximate a Pareto very closely when evaluated with the regression analysis approach used in Axtell (2001) and Gabaix (2009). See also Eeckhout (2009).

[7] We ran Monte Carlo simulations to investigate the accuracy of the MLE in estimating our models and found it to be reliable. Results are in the appendix.

Our preferred measure of establishment and firm size, the number of employees, is discrete, whereas all the distributions we described so far have a continuous support. We follow Buddana and Kozubowski (2015) and discretize the distributions. In particular, if $F(\cdot)$ is the CDF of a continuous distribution, the PMF of the discretized distribution is defined as $\Pr(X = n) \equiv F(n+1) - F(n)$. In other words, the continuous distribution is discretized by creating a bin for each integer value.

Finally, our data include only establishments and firms with at least one employee, but the lognormal and convolution have support starting at zero; to make the estimation possible we shift right by one unit the lognormal, the lognormal component of the mixture, and the convolution.

4 Employment Distribution Results in the United States

In this section, we highlight three stylized facts that emerge from the U.S. firm and establishment employment distributions in the years 1982, 1992, 1997, 2002, and 2012.[8]

[8] To capture long-run trends we analyze every 10 years starting with the 1982 census year, and we add 1997 for comparison to Axtell (2001).

4.1 Lognormal versus Pareto

Stylized Fact 1. For both firms and establishments, the best-fitting Pareto distribution has an estimated shape parameter significantly below 1, and differs substantially from the common Zipf’s law benchmark.

Stylized Fact 2. A lognormal fits both establishment and firm size employment distributions better than the commonly used Pareto, even far in the upper tail.

Table 3: Pareto and lognormal estimates using the entire sample

        Pareto α         lognormal µ      lognormal σ
Year    Est.    Firm     Est.    Firm     Est.    Firm
1982    0.57    0.61     1.38    1.21     1.52    1.81
1992    0.56    0.61     1.40    1.21     1.53    1.71
1997    0.56    0.61     1.41    1.17     1.56    1.74
2002    0.55    0.60     1.44    1.15     1.57    1.80
2012    0.56    0.62     1.37    1.14     1.61    1.80
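The estimation and model-comparison steps described in Section 3 can be sketched on synthetic data. The sketch below is not the authors' code: it assumes a Pareto scale fixed at x_m = 1, uses SciPy's general-purpose optimizers, and computes the Vuong statistic as the square root of n times the mean pointwise log-likelihood difference divided by its sample standard deviation, which is one common reading of the normalization. The simulated sample and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.special import ndtr  # standard normal CDF


def pareto_logpmf(n, alpha):
    """Discretized Pareto with x_m = 1 (assumed): Pr(X=n) = F_P(n+1) - F_P(n)."""
    n = np.asarray(n, dtype=float)
    return np.log(n ** (-alpha) - (n + 1.0) ** (-alpha))


def lognormal_logpmf(n, mu, sigma):
    """Lognormal shifted right by one unit, then discretized:
    Pr(X=n) = F_L(n) - F_L(n-1), computed from survival functions for tail accuracy."""
    n = np.asarray(n, dtype=float)
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended when n = 1
        sf_lower = ndtr(-(np.log(n - 1.0) - mu) / sigma)
        sf_upper = ndtr(-(np.log(n) - mu) / sigma)
    return np.log(np.maximum(sf_lower - sf_upper, 1e-300))


def fit_pareto(n):
    """Maximum likelihood estimate of the Pareto shape parameter."""
    res = minimize_scalar(lambda a: -pareto_logpmf(n, a).sum(),
                          bounds=(0.01, 5.0), method="bounded")
    return float(res.x)


def fit_lognormal(n):
    """Maximum likelihood estimates of (mu, sigma), searching over (mu, log sigma)."""
    logs = np.log(n)
    start = [logs.mean(), np.log(logs.std())]
    res = minimize(lambda t: -lognormal_logpmf(n, t[0], np.exp(t[1])).sum(),
                   start, method="Nelder-Mead")
    return float(res.x[0]), float(np.exp(res.x[1]))


def aic(loglik, k):
    """Akaike information criterion: lower is better, k parameters are penalized."""
    return 2.0 * k - 2.0 * loglik


def vuong_statistic(ll_f, ll_g):
    """Positive values favor model f, negative values favor model g."""
    d = ll_f - ll_g
    return float(np.sqrt(d.size) * d.mean() / d.std())


# Synthetic sample: shifted, discretized lognormal draws (illustrative parameters).
rng = np.random.default_rng(0)
sizes = np.floor(rng.lognormal(mean=1.2, sigma=1.8, size=20_000)) + 1.0

alpha_hat = fit_pareto(sizes)
mu_hat, sigma_hat = fit_lognormal(sizes)
ll_pareto = pareto_logpmf(sizes, alpha_hat)
ll_lognormal = lognormal_logpmf(sizes, mu_hat, sigma_hat)
aic_pareto = aic(ll_pareto.sum(), k=1)
aic_lognormal = aic(ll_lognormal.sum(), k=2)
vuong = vuong_statistic(ll_lognormal, ll_pareto)
print(alpha_hat, mu_hat, sigma_hat, aic_pareto, aic_lognormal, vuong)
```

Because the synthetic data are lognormal by construction, the comparison favoring the lognormal here only checks the mechanics; it says nothing about the confidential LBD sample.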

Table 3 shows the maximum likelihood estimates of the lognormal and Pareto distributions, using the entire sample of establishments and firms. Both the Vuong test and AIC find that the lognormal distribution is preferred over Pareto for establishments and firms in all years, while the best-fitting Pareto consistently has a shape parameter significantly below 1, around 0.6. The parameters are all tightly estimated; standard errors and test statistics are available upon request.

Right Tail Estimates. While the lognormal distribution might be a better fit overall, for some economic questions, only the upper tail is the relevant portion of the distribution. Therefore, we present in Table 4 the parameter estimates for a Pareto distribution and a truncated lognormal distribution at various employment thresholds for the 1997 firm size distribution.

Table 4: Pareto and lognormal in the firm size right tail

             Pareto    lognormal
Threshold    α         µ         σ       N (rounded)
2            0.76      -0.02     1.93    3,750,000
5            0.97      -5.88     3.11    2,150,000
10           1.05      -14.97    4.26    1,160,000
25           1.11      -14.79    4.22    450,000
50           1.12      -16.44    4.46    210,000
100          1.10      -19.10    4.85    95,000
200          1.05      -22.01    5.28    43,000
300          1.02      -23.80    5.56    28,000
400          1.01      -24.73    5.72    20,000
500          1.01      -21.29    5.40    16,000
1000         1.01      -5.32     3.73    8,100
2500         1.05      1.99      2.68    3,300
5000         1.11      5.70      2.00    1,600
10000        1.23      7.21      1.66    800

Notes: The estimates are reported using the 1997 firm size sample in order to ensure consistency and comparability with Axtell (2001), the main benchmark in the literature.

In Table 4, the Pareto shape parameter first increases monotonically with the upper tail threshold. At various thresholds the shape parameter is in fact near one, corresponding to Zipf’s law. Note, however, that Zipf’s law draws its power from the thickness of the right tail, and at the highest thresholds of 5,000 and 10,000 the shape parameter is non-trivially above one.[9] Indeed, the lack of economic stability of the shape parameter estimates across cutoffs suggests that the underlying distribution is not Pareto, because a true Pareto distribution would have shape parameter estimates invariant to the cutoff or more stable further in the upper tail.[10]

[9] See Table 4 for truncated data sample size.

[10] The well-known Hill plot for visually identifying power-law distributions is based on this stability argument.

Does the Pareto Provide a Better Fit? In Table 5, we also use the Vuong statistic to formally test which distribution has a better fit for each truncation threshold. The truncated lognormal provides the best fit both when the threshold is at or below 10 employees and when the threshold is at or above 400 employees. With 300 employees, neither distribution is preferred. Thus, the lognormal fit typically dominates the Pareto fit even far in the upper tail, except in a relatively narrow truncation window.

Table 5: Pareto vs lognormal (Vuong statistic)

Threshold    Vuong statistic    P-value    Winner
2            -223.6             0.00       lognormal
5            -58.4              0.00       lognormal
10           -11.3              0.00       lognormal
25           18.3               0.00       Pareto
50           22.4               0.00       Pareto
100          18.4               0.00       Pareto
200          6.4                0.00       Pareto
300          1.0                0.16       None
400          -2.0               0.02       lognormal
500          -3.2               0.00       lognormal
1000         -3.3               0.00       lognormal
2500         -3.3               0.00       lognormal
5000         -3.4               0.00       lognormal
10000        -2.7               0.00       lognormal

Revisiting the Zipf’s Law Evidence. How can these results be reconciled with those of Axtell (2001)? Using a very popular methodology, Axtell explores the fit of a Pareto distribution by running a regression of the logarithm of the frequency distribution of the data on the logarithm of the firm size. Axtell concludes: “the Zipf distribution is an unambiguous target that any empirically accurate theory of the firm must hit.”

How well a line fits the log frequency plot is often interpreted as evidence of the fit of the Pareto distribution. As extensively explained by Clauset et al. (2009) and Eeckhout (2009), this method is ill-suited to determine how well the Pareto fits, as it generates significant systematic errors. Moreover, Axtell (2001) computes the frequency distribution using successive bins of increasing size in powers of three. This approach yields thirteen (13) data points and the regression estimation on such binned sample effectively gives more weight to the observations in the right tail.

We first replicated Axtell (2001)’s procedure to ensure that differences in our results were not due to the underlying sample used.[11] Axtell (2001) also uses the LBD, but includes non-employee firms. In his regressions, the slope was 2.059 with a standard deviation of 0.054, which implies a Pareto shape parameter at 1.059. Our replication of his linear regression with our data produced a slope of 2.057 with a standard deviation of 0.039, which implies that the differences in results are not due to the underlying data, but to the methodologies used.

[11] In Appendix A.1, we analyze the ability of the Axtell (2001) regression approach to recover the true parameters of a Pareto distribution and other distributions.

Visual inspection of Figure 1 (p. 1819) from Axtell (2001) also provides some intuition for the difference in results. Axtell’s first and last bins are both below the regression line. To fit the first bin, which contains a large portion of our sample, would require a shallower slope, or a Pareto shape parameter below 1. To fit the last bin, containing the largest firms, would require a steeper slope, or a Pareto shape parameter above the 1.06 that Axtell found. This is consistent with the larger shape parameter for the far right tail in Table 4.

Discussion. Overall, the evidence in this section indicates that the lognormal typically provides a better fit than the Pareto, even in the right tail of the firm size distribution. However, the best-fitting lognormal may look quite similar to the best-fitting Pareto. Does this mean that, despite the better statistical fit of a lognormal, these different distributions are interchangeable in economic models? Simon (1955) and Simon and Bonini (1958) provide an early and insightful discussion of different ingredients generating various heavy-tailed steady-state size distributions. Moreover, if for a given model the two distributions yield different results, then it must be that the nature of the left tail of the distribution or its right tail is important. In the next section we sharply contrast the implications of these seemingly interchangeable distributions in one such context: Gabaix (2011)’s study of aggregate fluctuations arising from idiosyncratic shocks to very large firms in economies with a finite set of firms, or “granular” economies.

4.2 Lognormal versus Pareto: Some Theoretical Implications for Granularity

Having established that a lognormal distribution is a better fit than a Pareto for the U.S. firm and establishment employment distributions, we now highlight some theoretical implications for aggregate volatility that sharply distinguish the two distributions.

Gabaix (2011) provides a useful framework, emphasizing the potential of large firms to generate sizable aggregate shocks in a “granular” economy. Gabaix (2011) focuses his discussion on a Pareto distribution as it approaches Zipf’s law. Gabaix (2011) shows that idiosyncratic shocks can be a more important source of aggregate volatility under Zipf’s law, compared to a lognormal.

Here, we show the implications of going past Zipf’s law, with an empirically plausible shape parameter below one, and contrast them with the implications of a lognormal distribution. In particular, we show that Pareto shape parameter estimates below one have unappealing economic implications in granular economies, further suggesting that such a Pareto may not be an appropriate modeling assumption in heterogeneous-firm models. We state below the main results and their intuition. See Appendix C for detailed derivations.

Proposition 1. Consider $N$ firms facing i.i.d. multiplicative growth shocks with finite variance $\varsigma^2$ to firm employment. If the size distribution of firms has finite mean and variance, then the variance of aggregate employment, $\sigma^2_{N,t}$, is decreasing at a rate proportional to $\frac{1}{N}$.

For a formal proof, see Appendix C. The proof is an application of the law of large numbers. Intuitively, with finite mean and variance, the tail of the distribution is not too fat and uncorrelated shocks to firms will cancel each other out, and aggregate volatility decreases as the number of firms in an economy increases. This holds true of the lognormal distribution with any parameters, but for Pareto, it depends on the shape parameter. Building on Gabaix (2011), we fully characterize the Pareto case with Proposition 2.

Proposition 2. Consider $N$ firms facing i.i.d. multiplicative growth shocks with finite variance $\varsigma^2$ to firm employment. If the size distribution of firms is Pareto with shape parameter $\alpha$, then the variance of aggregate employment, $\sigma^2_{N,t}$, is proportional to:

$$
\begin{cases}
\varsigma^2 \times 1/N & \text{when } \alpha > 2 \\
\varsigma^2 \times 1/\left(\frac{N}{\ln N}\right) & \text{when } \alpha = 2 \\
\varsigma^2 \times 1/\left(N^{1-\frac{1}{\alpha}}\right)^2 & \text{when } \alpha \in (1,2) \\
\varsigma^2 \times 1/(\ln N)^2 & \text{when } \alpha = 1 \\
\varsigma^2 & \text{when } \alpha \in (0,1)
\end{cases}
\tag{5}
$$

See Appendix C for a formal proof. With $\alpha > 2$, we return to the world of Proposition 1. With a lower shape parameter, aggregate volatility declines more slowly with $N$.
When $\alpha < 1$, the upper tail is heavier than Zipf’s law ($\alpha = 1$) and aggregate volatility does not decline at all with $N$: the largest firms so disproportionately dominate that aggregate volatility is the idiosyncratic volatility of these firms. This is a starkly different context than the better-fitting lognormal, where volatility declines rapidly with $N$.[12]

[12] We also show that, quantitatively, the granular origins of aggregate fluctuations would be unrealistically large in the case of the best-fitting Pareto. See Appendix C for the calibrated simulation results in the spirit of Gabaix (2011). See also Stella (2015), Rossbach (2017) and ? for other quantitative investigations of the contribution to aggregate volatility coming from idiosyncratic shocks in granular economies.
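The contrast between the two propositions can be illustrated with a small simulation. With firm sizes held fixed and i.i.d. growth shocks of variance ς², independence implies that the variance of aggregate growth equals ς² times the sum of squared employment shares, the Herfindahl expression used in Gabaix (2011). The sketch below is not the paper's Appendix C calibration: it borrows the 1997 firm estimates from Table 3 as illustrative parameters, ignores the discretization and the one-unit shift, and uses hypothetical function names.

```python
import numpy as np


def herfindahl_sq(sizes):
    """Sum of squared employment shares, i.e. sigma^2_N / varsigma^2 given the sizes."""
    shares = sizes / sizes.sum()
    return float(np.sum(shares ** 2))


def draw_pareto(alpha, x_m=1.0):
    """Sampler for a Pareto(x_m, alpha) via the inverse CDF x_m * U^(-1/alpha)."""
    return lambda rng, n: x_m * (1.0 - rng.random(n)) ** (-1.0 / alpha)


def draw_lognormal(mu, sigma):
    """Sampler for a lognormal with parameters mu and sigma."""
    return lambda rng, n: rng.lognormal(mu, sigma, n)


def median_variance_ratio(draw, n_firms, reps, rng):
    """Median across simulated economies of the sum of squared shares."""
    return float(np.median([herfindahl_sq(draw(rng, n_firms)) for _ in range(reps)]))


rng = np.random.default_rng(0)
grid = [100, 1_000, 10_000]  # number of firms N
samplers = {
    "pareto": draw_pareto(alpha=0.61),               # shape below 1, as in Table 3
    "lognormal": draw_lognormal(mu=1.17, sigma=1.74),
}
results = {
    name: [median_variance_ratio(draw, n, reps=200, rng=rng) for n in grid]
    for name, draw in samplers.items()
}
for name, values in results.items():
    print(name, [round(v, 5) for v in values])
```

In this sketch the lognormal ratio falls steadily as N grows, while the Pareto ratio with shape 0.61 stays roughly flat, in line with the last case of Proposition 2.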

Discussion. Zipf's law is considered a good approximation of the size distribution of U.S. firms. Having replicated the main analysis underlying that benchmark, we use detailed Census data to show that the lognormal provides a better fit while the best-fitting Pareto is not stable in the upper tail and yields a shape parameter below one for the entire sample. Since Simon and Bonini (1958), the growth literature has also explored economic and stochastic properties that rationalize a Pareto distribution versus a lognormal. Beyond the origins of the size distribution, we show that seemingly innocuous statistical differences can also lead to strikingly different economic implications. The granularity properties characterized above illustrate this point and the importance of accurately characterizing the size distribution.

The existing literature motivated using the lognormal and the Pareto as natural starting points in our analysis of the size distributions of U.S. firms and establishments. However, neither distribution seems sufficiently flexible to fit equally well the right and left tails of the distribution. We therefore extend our analysis to consider two more-flexible alternatives: a statistical mixture of lognormal and Pareto, and a convolution, or product of a lognormal random variable and a Pareto random variable.

4.3 Mixture and Convolution Distributions

Stylized Fact 3. Both the mixture and the convolution of lognormal and Pareto distributions fit the size distributions better than lognormal alone. Statistically, a mixture provides the best fit, but economically, the convolution's fit may be more suitable.

Mixture Estimates. Table 6 provides the parameter estimates for the statistical mixture distribution of a lognormal and a Pareto. The parameters µ, σ, and α have the same meanings as before (see Section 3). Now we also estimate $x_m$, the minimum of the Pareto distribution. This approach effectively means that the Pareto distribution is allowed to work on an endogenously chosen cutoff of the right tail of the distribution.
In practice, this is approximately 3 employees and very stable across both firms and establishments.

The mixture also has the unique p parameter, specifying the degree to which the estimated distribution is lognormal. For both establishments and firms, this lognormal mixing parameter p is about 0.8 to 0.9, without much meaningful difference between establishments and firms across years. If anything, the distribution appears to be getting more lognormal over time, especially for establishments.

Comparing the estimates between Tables 3 and 6, the Pareto shape parameter is systematically higher in the mixture. This is consistent with the estimates of the right tail in Table 4. Since the scale parameter $x_m \approx 3$, the mixture is mixing in a Pareto only above this threshold, analogous to the truncated estimations. This result also shows that the estimation did not favor a higher threshold $x_m$ yielding a Pareto component closer to Zipf's law (see Table 4). Over time, the estimated Pareto shape parameters also appear to be slightly declining for firms, but not for establishments.
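The difference between the two flexible distributions is easiest to see in how one would draw from them. The following is a minimal sketch, not the estimation code used in the paper: the function names are hypothetical, draws are continuous (no discretization into employee counts), and the Pareto scale of the convolution is normalized to 1 as an assumption, since only µ, σ, and α are reported for it. The parameter values in the usage example are the 1997 firm estimates from Tables 6 and 7.

```python
import numpy as np


def sample_pareto(rng, n, x_m, alpha):
    """Pareto(x_m, alpha) draws by inverting the CDF."""
    u = 1.0 - rng.random(n)  # uniform on (0, 1]
    return x_m * u ** (-1.0 / alpha)


def sample_mixture(rng, n, mu, sigma, p, x_m, alpha):
    """Mixture: each draw is lognormal with probability p, else Pareto."""
    is_lognormal = rng.random(n) < p
    lognormal_draws = rng.lognormal(mean=mu, sigma=sigma, size=n)
    pareto_draws = sample_pareto(rng, n, x_m, alpha)
    return np.where(is_lognormal, lognormal_draws, pareto_draws)


def sample_convolution(rng, n, mu, sigma, alpha, x_m=1.0):
    """Convolution: product of independent lognormal and Pareto draws.

    In logs this is the sum of a normal and an exponential random variable.
    The default x_m = 1.0 is an assumed normalization.
    """
    lognormal_draws = rng.lognormal(mean=mu, sigma=sigma, size=n)
    return lognormal_draws * sample_pareto(rng, n, x_m, alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mix = sample_mixture(rng, 100_000, mu=1.00, sigma=1.49, p=0.86,
                         x_m=3.47, alpha=0.74)
    conv = sample_convolution(rng, 100_000, mu=0.43, sigma=1.29, alpha=1.25)
    print("mixture median:", np.median(mix))
    print("convolution median:", np.median(conv))
```

In the mixture, every observation comes from exactly one component, so a Pareto shape below one still produces occasional enormous draws. In the convolution, every observation carries both components, which is why its Pareto shape parameter can sit above one while the overall tail remains heavy.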

Table 6: Parameter Estimates - Mixture

             µ             σ             p            x_m            α
Year     Est.  Firm    Est.  Firm    Est.  Firm    Est.  Firm    Est.  Firm
1982     1.18  1.02    1.50  1.43    0.83  0.86    3.39  3.37    0.80  0.75
1992     1.22  1.00    1.53  1.45    0.84  0.85    3.55  3.48    0.85  0.76
1997     1.25  1.00    1.58  1.49    0.86  0.86    3.57  3.47    0.85  0.74
2002     1.32  1.07    1.57  1.48    0.89  0.89    3.55  3.35    0.80  0.69
2012     1.24  0.97    1.60  1.52    0.91  0.89    4.47  3.62    0.83  0.68

Convolution Estimates. The parameter estimates for the convolution of a lognormal and Pareto distribution are shown in Table 7. The lognormal µ and σ parameters are systematically lower than their mixture counterparts, while most notably the Pareto shape parameter α is higher, and always well above 1, a point to which we will return shortly.

Table 7: Parameter Estimates - Convolution

             µ             σ             α
Year     Est.  Firm    Est.  Firm    Est.  Firm
1982     0.62  0.44    1.27  1.23    1.29  1.26
1992     0.70  0.45    1.32  1.26    1.39  1.26
1997     0.72  0.43    1.37  1.29    1.43  1.25
2002     0.76  0.44    1.39  1.28    1.46  1.23
2012     0.75  0.34    1.46  1.33    1.58  1.22

A lower shape parameter α for firms, relative to establishments, means that the right tail of the firm size distribution will be thicker than the right tail of the establishment size distribution.[13]

The parameter estimates remain reasonably stable over time. If anything, the distribution appears to embed a more dispersed lognormal component over time, especially for establishments. The Pareto component for the establishment size distribution also appears to have a slightly less heavy tail, and it is becoming less heavy over time, while the tail of the Pareto component for the firm size distribution is more stable over time.

Testing the Four Models. With estimates for four distributions in hand, we formally test which distribution fits the data best. Pareto and lognormal are both nested in a mixture and can therefore be tested with the likelihood ratio test. For non-nested models, we use the test developed by Vuong (1989) described earlier. As an alternative, we also computed the AIC for all the distributions and find identical rankings.

[13] This pattern is also apparent for the mixture. We chose not to emphasize the relative thickness of the firm size right tail because the Pareto shape parameters estimated in those cases are below 1. Interestingly, the Pareto estimated on the entire sample does not have the same property, suggesting the importance of using more flexible distributions.

For the 6 paired tests (testing each of 4 distributions against each other), the rankings are consistent across years and between establishments and firms: a mixture is always the most preferred distribution and a convolution the second most preferred.[14] Again, the AIC produces identical rankings.

While the statistical ranking is clear, it does not provide any feel for the nature of the fit. For this, we turn to a tabulation of the 1997 Business Dynamics Statistics (BDS) and data simulated using the parameter estimates in Tables 3 to 7. The BDS are public and are drawn from the same underlying data that we use for our analysis.[15]

The first column of Table 8 shows the employment size categories provided by the BDS; the second column shows the tabulation of the U.S. firms by size in 1997: for instance, 54.8% of firms have between 1 and 4 employees. Starting from the third column, we show the tabulation of simulated data. Table 8 clearly indicates that the Pareto with the shape parameter from Axtell (2001) provides a very poor fit of the U.S. firm size distribution. Even the Pareto with our estimate of the shape parameter, the fourth column, does not fit the data well, putting too much weight on the left and right tails. Lognormal, mixture, and convolution all provide a better fit.

Table 8: Tabulation of 1997 BDS data and simulated data

Employees         BDS   Axtell   Pareto   lognormal   Mixture   Convolution
1 to 4           54.8    81.9     62.6      54.9        55.0       55.8
5 to 9           21.2     9.4     13.0      17.3        20.4       19.4
10 to 19         12.2     4.5      8.6      12.3        12.5       12.3
20 to 49          7.5     2.6      6.9       9.5         7.9        8.1
50 to 99          2.3     0.8      3.2       3.4         2.3        2.6
100 to 249        1.2     0.5      2.6       1.8         1.2        1.3
250 to 499        0.4     0.1      1.2       0.4         0.3        0.4
500 to 999        0.2     0.1      0.8       0.1         0.2        0.2
1,000 to 2,499    0.1     0.0      0.6       0.0         0.1        0.1
2,500 to 4,999    0.0     0.0      0.3       0.0         0.0        0.0
5,000 to 9,999    0.0     0.0      0.2       0.0         0.0        0.0
≥ 10,000          0.0     0.0      0.0       0.0         0.0        0.0

Figure 1 provides a graphical representation of the fit of the parametric distributions with 1997 BDS data.
It depicts the Complementary Cumulative Distribution Function (CCDF) in log space: for each value s of the logarithm of the number of employees, it shows the logarithm of the number of firms larger than s. The black line represents the data available in the BDS described in Table 8. Distributions with fits above the black solid line represent too many large firms compared to the data, while distributions with fits below the black line have too few large firms relative to the data.

[14] Test statistics available upon request.

[15] The U.S. Census Bureau does not allow researchers to disclose histograms or other tabulations of the data, so we use data made publicly available by the U.S. Census Bureau.

Figure 1 confirms that the mixture and convolution appear to provide the best fit with this binning procedure. By contrast, our estimate of the Pareto shape parameter (the dot-dashed line) fits the left tail but implies far too many large firms. Notably, the Axtell Pareto fit (the dashed line) with shape parameter 1.06 remains below the data, but, overall, it matches the slope of the black line excluding the left tail, which is essentially by design of the regression-based estimation procedure. The lognormal fit seems decent through about $e^5 \approx 150$ employees but produces too few very large firms. The convolution also has a slightly too-thin tail, while the mixture has a slightly too-thick tail. In Appendix B, we evaluate the stability of the employment size distribution in the BDS and the corresponding mixture distribution's fit across time.

Figure 1: CCDF. [Plot of ln(num. of firms right of s) against s = ln(employees); series: Axtell, Pareto, Lognormal, Mixture, Convolution, BDS Data.]

Implications for Employment Shares. In Table 9, we take a completely different cut of the data: the fraction of overall employment accounted for by firms in each bin.[16] Since these data moments are not explicitly targeted in the estimation, we can use them as an "out-of-sample" test of the fit of the parametric distributions under study. The second column shows, for example, that firms with 20 to 49 employees account for 10.6 percent of overall employment in the economy. Here, Axtell's estimate of Pareto with a shape parameter of 1.06 fits the data somewhat better in absolute terms for each bin. Notably, however, too much employment is accounted for by very small firms (11.6 percent versus 5.7 percent in the data for firms with 1 to 4 employees) and very large firms (36.6 percent versus 24.7 percent for firms with 10,000 or more employees).

[16] See Appendix B for confidence bands and a description of how this table was constructed.

But this table also makes clear the odd results of a Pareto distribution or Mixture containing a Pareto with a shape parameter below one. Simulating this distribution generates some extremely large firms, whose employment completely dominates the economy. The lognormal, as expected from Figure 1, generates too few large firms of too small a size. However, for most bins, the relative employment accounted for in each bin is comparable to the data (that is, the ratio of any two rows excluding the large firm sizes). Notably, by this metric, the convolution fits the data remarkably well. The left tail has a bit too much of total employment and the right tail has too little, but we show in Appendix B that in Monte Carlo simulations, the convolution is quite capable of reproducing the values we observe in the BDS.

It is striking that even the convolution distribution does not fit the tabulation of the fraction of employment in Table 9 as well as the frequency histogram in Table 8. Since the width of the brackets in the two tables is increasing and quite sizable for large firms, discrepancies between the convolution and the empirical distribution within brackets, especially in the right tail, must be responsible for the differences in how well the parametric distributions appear to fit the empirical distribution in the two tables. In other words, the distribution of the fraction of total employment is much more sensitive to discrepancies between the parametric distributions and the empirical distribution, especially in the right tail.
Table 9: Fraction of 1997 firm employment

Employees          BDS    Axtell   Pareto   lognormal   Mixture   Convolution
1 to 4             5.65   11.56     0.00      7.04        0.46       6.57
5 to 9             6.53    5.44     0.00      7.52        0.53       7.15
10 to 19           7.73    5.41     0.00     11.06        0.67       9.27
20 to 49          10.62    6.95     0.00     19.07        0.94      13.57
50 to 99           7.52    5.06     0.00     15.56        0.64       9.81
100 to 249         8.72    6.40     0.01     17.99        0.68      11.23
250 to 499         5.54    4.63     0.01      9.76        0.44       7.06
500 to 999         5.09    4.45     0.01      6.18        0.44       5.95
1,000 to 2,499     7.07    5.61     0.02      4.00        0.66       6.44
2,500 to 4,999     5.40    4.04     0.02      1.18        0.61       3.98
5,000 to 9,999     5.46    3.88     0.03      0.44        0.72       3.35
≥ 10,000          24.68   36.58    99.90      0.19       93.19      15.64
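Tables 8 and 9 are two tabulations of the same firm-level sizes: the share of firms in each size class, and the share of total employment held by the firms in each size class. The sketch below shows one way to compute both from a vector of employee counts. It is an illustration only: the function name is hypothetical, and it assumes sizes are already whole numbers of employees of at least 1, which sidesteps whatever discretization the paper applies to simulated draws.

```python
import numpy as np

# Lower edges of the BDS firm size classes used in Tables 8 and 9.
BIN_EDGES = np.array([1, 5, 10, 20, 50, 100, 250, 500,
                      1000, 2500, 5000, 10000])
BIN_LABELS = ["1 to 4", "5 to 9", "10 to 19", "20 to 49", "50 to 99",
              "100 to 249", "250 to 499", "500 to 999", "1,000 to 2,499",
              "2,500 to 4,999", "5,000 to 9,999", ">= 10,000"]


def tabulate_by_size(sizes):
    """Return (percent of firms, percent of employment) per size class."""
    sizes = np.asarray(sizes)
    if sizes.min() < 1:
        raise ValueError("sizes must be at least 1 employee")
    # digitize returns i with BIN_EDGES[i-1] <= size < BIN_EDGES[i];
    # sizes of 10,000 or more fall in the last (open-ended) class.
    idx = np.digitize(sizes, BIN_EDGES) - 1
    firm_counts = np.bincount(idx, minlength=len(BIN_EDGES))
    employment = np.bincount(idx, weights=sizes, minlength=len(BIN_EDGES))
    firm_pct = 100.0 * firm_counts / firm_counts.sum()
    employment_pct = 100.0 * employment / employment.sum()
    return firm_pct, employment_pct


if __name__ == "__main__":
    toy_sizes = [1, 2, 3, 4, 7, 15, 12000]  # made-up example, not data
    firm_pct, employment_pct = tabulate_by_size(toy_sizes)
    for label, f, e in zip(BIN_LABELS, firm_pct, employment_pct):
        print(f"{label:>15}  firms {f:6.2f}%  employment {e:6.2f}%")
```

The toy example makes the point of the preceding discussion: a single very large firm barely moves the firm-count shares but dominates the employment shares, so the employment tabulation is far more sensitive to the right tail.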

4.4 Manufacturing versus Services

We have so far discussed results obtained using the entire U.S. population of firms and establishments. In this section, we estimate the parametric distributions on two subsamples: manufacturing and services, where the latter excludes retail, wholesale, and FIRE (finance, insurance, and real estate).

Stylized Fact 4. Manufacturing and services sectors have notably different distribution estimates, but the distribution fit ranking is unchanged in these sector-year subsamples relative to the aggregate data.

Pareto and Lognormal Estimates. Table 10 presents the estimates for the lognormal distribution by sector. Focusing on the average line, manufacturing establishments have both a larger µ and σ than services establishments, implying both a greater fitted mean and variance. A similar pattern holds for manufacturing firms relative to services firms. Table 11 presents the Pareto shape parameter α by sector. Notably, manufacturing establishments and firms have an α even lower than for all firms, averaging about 0.4 compared to services' 0.6. These results suggest that in models focusing on manufacturing firms, such as standard models in international trade, the fit of a Pareto distribution on the overall empirical distribution is not more sensible than a lognormal.

Over time, both the mean and the variance of the estimated lognormal appear to be increasing for firms and establishments in the services sector, but not so much in the manufacturing sector.

Table 10: Lognormal by sector

                   Manufacturing       Services
Year                 µ      σ          µ      σ
Establishments
1982               2.42   1.75       1.09   1.50
1992               2.32   1.76       1.21   1.53
1997               2.34   1.77       1.23   1.58
2002               2.21   1.75       1.43   1.60
2012               2.13   1.77       1.40   1.64
Average            2.28   1.76       1.27   1.57
Firms
1982               2.13   1.65       1.01   1.48
1992               2.05   1.69       1.11   1.54
1997               2.08   1.71       1.09   1.60
2002               1.98   1.68       1.25   1.60
2012               1.90   1.72       1.20   1.63
Average            2.03   1.69       1.13   1.57

Table 11: Pareto α by sector

               Establishments               Firms
Year       Manufacturing  Services   Manufacturing  Services
1982           0.38         0.65         0.42         0.67
1992           0.39         0.61         0.43         0.64
1997           0.39         0.60         0.43         0.64
2002           0.41         0.55         0.44         0.60
2012           0.42         0.55         0.45         0.60
Average        0.40         0.59         0.44         0.63

Mixture Estimates. Table 12 reports the estimates for the mixture distribution by sector. The lognormal parameters remain remarkably similar to those from estimating lognormal alone, but there are notable differences in the Pareto shape parameter.

Table 12: Mixture by sector

                       Manufacturing                       Services
Year              µ     σ     p    x_m    α          µ     σ     p    x_m    α
Establishments
1992            2.32  1.79  0.94  3.66  0.88       0.89  1.43  0.79  3.50  0.78
1997            2.34  1.80  0.96  4.49  1.12       0.96  1.51  0.81  3.51  0.77
2002            2.22  1.78  0.95  3.50  1.01       1.34  1.59  0.92  3.56  0.76
2012            2.13  1.79  0.98  4.48  1.96       1.29  1.63  0.92  4.48  0.81
Average         2.25  1.79  0.96  4.03  1.24       1.12  1.54  0.86  3.76  0.78
Firms
1992            1.90  1.63  0.88  4.48  0.59       0.74  1.35  0.78  3.49  0.78
1997            1.94  1.66  0.89  4.52  0.58       0.77  1.41  0.80  3.50  0.76
2002            1.85  1.62  0.90  4.53  0.57       1.06  1.48  0.87  3.48  0.71
2012            1.76  1.66  0.90  4.41  0.58       1.00  1.52  0.89  4.36  0.71
Average         1.86  1.64  0.89  4.49  0.58       0.89  1.44  0.83  3.71  0.74

Note: 1982 not reported as estimates have unusually large standard errors.

Starting with manufacturing establishments, the Pareto scale parameter, $x_m$, is estimated to take effect around 4 employees and the Pareto shape parameter, α, is estimated to be a reasonable 1.24, but the mixing parameter p = 0.96 suggests that the manufacturing establishment size distribution is almost entirely lognormal. By contrast, the establishment size distribution in the services sector is somewhat less lognormal with p = 0.86 on average, but the corresponding Pareto shape parameter remains robustly below 1. Manufacturing firms have a lower mix of lognormal (p = 0.89) but, unlike manufacturing establishments, feature a low Pareto shape parameter below one. The contrast seems less pronounced in the services sector: estimates for the size distribution of firms and establishments are quite similar.

The estimated parameters are relatively stable over time. The estimated distributions appear to be getting more lognormal over time, especially in the services sector. Over time, the estimated Pareto shape parameters also appear to be relatively stable, but not for manufacturing establishments.

Convolution Estimates. Finally, Table 13 shows the sectoral parameter estimates for the convolution distribution. As with the previous distributions, manufacturing establishments and firms have much higher estimates of mean µ and standard deviation σ than their services counterparts. The Pareto shape parameter is well above 1 for all subsamples, which further suggests that the convolution yields reasonable parameter estimates and is a solid candidate for use in calibrated models.

Over time, for establishments in both sectors, there is a steady rise in the shape parameter α of the Pareto component. Consequently, there is a growing deviation from Zipf's law for establishments, especially in manufacturing, where the shape parameter has been estimated to be above 2 since 1997. In contrast, the shape parameter of the convolution's Pareto component is much more stable and closer to 1 for firms in the manufacturing sector. The firm size distribution in services, however, features a slight upward trend in the Pareto shape parameter estimate. These results are overall consistent with the findings in the aggregate sample. These findings also suggest that the manufacturing sector, unlike services, has experienced different dynamics at the firm level, relative to the establishments that comprise these firms.
Table 13: Convolution by sector

                   Manufacturing            Services
Year              µ      σ      α         µ      σ      α
Establishments
1982            1.75   1.61   1.49      0.27   1.16   1.17
1992            1.75   1.66   1.75      0.40   1.23   1.20
1997            1.91   1.72   2.31      0.43   1.30   1.22
2002            1.73   1.68   2.08      0.80   1.44   1.55
2012            1.78   1.73   2.89      0.84   1.52   1.76
Average         1.78   1.68   2.10      0.55   1.33   1.38
Firms
1982            1.26   1.37   1.15      0.19   1.12   1.16
1992            1.18   1.41   1.14      0.26   1.18   1.15
1997            1.22   1.45   1.16      0.26   1.23   1.16
2002            1.15   1.42   1.19      0.46   1.29   1.23
2012            1.04   1.45   1.15      0.43   1.37   1.27
Average         1.17   1.42   1.16      0.32   1.24   1.19

Figure 2: CCDF for Manufacturing. [Plot of ln(num. of firms right of s) against s = ln(employees); series: Axtell, Pareto, Lognormal, Mixture, Convolution, BDS Data.]

Figure 2 shows a graphical representation of the fit of the parametric distributions for manufacturing with 1997 BDS data. Like Figure 1, it depicts the Complementary Cumulative Distribution Function in log space. Again, distributions above the black solid line represent too many large firms compared to the data. Here we note that the convolution and mixture continue to fit the overall shape of the CCDF well, while our Pareto estimate generates far too many large firms. The lognormal distribution fits the left tail and middle of the distribution well, but generates too few large firms. We also plot Axtell's Pareto estimate of 1.06 for comparison, and it has a similar fit to the manufacturing distribution as it does for the overall distribution.

In Figure 3, we show the same CCDF plot for services with 1997 BDS data. Relative to manufacturing, the mixture provides an even better fit across all bins, while lognormal struggles more with the right tail. Our Pareto estimate remains systematically above the data, while Axtell's is below.

Finally, we formally test which distribution fits the data best by sector. For establishments and firms in both manufacturing and services across 1982, 1992, 1997, 2002, and 2012, the formal statistical tests and the AIC provide a consistent ranking of distribution fit: the mixture fits the best, then convolution, then lognormal, and last Pareto.[17] The rankings are statistically significant at least at a 5 percent level and typically at a much tighter level.[18]

[17] Detailed results available upon request.

[18] Our results are not consistent with the findings by Quandt (1966), who estimated the firm size distribution in several sectors separately. His ranking of distribution fit varies from sector to sector. However, his sample only included very large firms, and he did not consider the lognormal distribution or mixtures of lognormal and Pareto.

Figure 3: CCDF for Services. [Plot of ln(num. of firms right of s) against s = ln(employees); series: Axtell, Pareto, Lognormal, Mixture, Convolution, BDS Data.]

4.5 Manufacturing TFP

Stylized Fact 5. The distribution of establishment-level total factor productivity is also better described by lognormal than Pareto.

Modern macroeconomic models prominently feature firm heterogeneity and therefore require assumptions on the distribution of productivity shocks. In particular, the analytical tractability of the Pareto distribution and its apparent good fit to the data have made it a common assumption. For instance, in their influential paper on gains from trade in new trade models, Arkolakis et al. (2012) assume Pareto. In standard monopolistic competition models, the distribution of productivity shocks carries over directly to the firm size distribution, but given selection, demand functions, and potentially many sources of shocks, this is not always the case.[19]

Therefore, in this section, we focus directly on estimates of the TFP distribution available for the U.S. manufacturing sector. We use the TFPR measures of TFP as estimated for establishments in the Census of Manufactures. Our distribution estimates are only for 1982, 1992, 1997, and 2002 because these TFPR measures, estimated by Foster et al. (2016), were not available for 2012 at the time of our study.

[19] Mrazova et al. (2017) characterize how the analytical properties of the demand function critically shape the implied distribution of sales and output using standard distributions for productivity such as lognormal or Pareto.

Table 14: Manufacturing establishment TFP

Year           x_m      α      p      µ      σ
Pareto
1982                  0.11
1992                  0.14
1997                  0.12
2002                  0.15
Average               0.13
Lognormal
1982                                1.86   0.57
1992                                1.90   0.54
1997                                1.93   0.54
2002                                1.97   0.57
Average                             1.92   0.56
Mixture
1982           3.68   1.59   0.87   1.85   0.56
1992           4.75   1.52   0.88   1.86   0.51
1997          19.65   1.83   0.98   1.90   0.50
2002          19.49   1.71   0.97   1.92   0.49
Average       11.89   1.67   0.93   1.88   0.52
Convolution
1982                  4.00          1.61   0.51
1992                  3.44          1.61   0.45
1997                  3.57          1.65   0.46
2002                  2.75          1.61   0.43
Average               3.44          1.62   0.46

Table 14 shows the results of the maximum likelihood estimation of the four parametric distributions considered in this paper. Starting with the Pareto shape parameter fit in the top sub-panel, the estimate averages a paltry 0.13, inheriting all of the economic issues inherent with an α below 1. The second sub-panel provides the lognormal fit, which remains remarkably consistent across time. The mixture estimates again show a very high proportion of lognormal, 0.93 on average. The lognormal parameters µ and σ are very similar to the estimates from the lognormal alone, while the shape parameter is well above 1 in all years. This is in contrast to the Pareto shape parameter of the mixture for the employment size distributions, which was below 1. Those lognormal parameters remain fairly similar in the convolution, which includes a very high Pareto shape α averaging 3.4.

Using the likelihood-based statistical tests, we find that generally, the mixture distribution outperforms lognormal, but, in 2002, the convolution outperforms both mixture and lognormal, bumping the mixture fit to second place. The rankings are statistically significant, typically at well beyond a 1 percent level. Our results are consistent with the evidence provided by Combes et al. (2012) and Nigai (2017) using French data. The data strongly suggest that Pareto provides a poor fit, and that lognormal is a reasonable distribution for TFP.

5 Conclusion

In this paper, we use confidential microdata from the U.S. Census and maximum likelihood estimation to precisely characterize the U.S. firm and establishment size distributions, as measured by the number of employees and, for manufacturing, TFP. We establish five stylized facts about these distributions and provide guidance for researchers in parameterizing models that include firm heterogeneity.

The commonly used Pareto distribution is a particularly poor fit for U.S. establishments and firms. The lognormal distribution, a commonly used alternative, provides a better fit in most circumstances and for most truncations of the right tail. Economically, the lognormal fit has markedly different properties, as our Pareto shape parameter is robustly below one, implying neither a well-defined mean nor a well-defined variance. Our estimate is past the often discussed Zipf's law result studied by Gabaix (2011) and others. So for both statistical and economic reasons, the use of Pareto is generally unwise.

Both a mixture and a convolution of a lognormal and Pareto distribution provide a better fit. While the mixture has the best statistical fit, its Pareto shape parameter below 1 implies that it inherits the unappealing characteristics of the Pareto distribution in that range. In addition, evaluating distribution fit by the fraction of employment accounted for by firms of each size, the convolution provides a clearly superior fit to the mixture.
As such, the convolution might provide a more suitable choice economically for use in other applications. It can also be generated in a very reasonable way as the product of a Pareto distributed random variable and a lognormally distributed one; thus, models incorporating both productivity shocks and taste shocks could easily generate such a distribution.

We estimate our four distributions on manufacturing and services sub-samples. Here too we show that Pareto is strictly dominated by the other three distributions, with the same overall ranking of mixture first, then convolution, then lognormal, and Pareto fitting least well. While the services sector tends to have estimates close to the overall distribution (unsurprising given that much of the overall distribution consists of our services sub-sample), the manufacturing estimates are characterized by even lower Pareto shape parameters. In a mixture, the Pareto shape parameter remains well below one for manufacturing firms, but is not as stable for manufacturing establishments. Both manufacturing µ and σ tend to be higher than their services counterparts. And as with the results for all establishments and firms, the convolution yields sensible parameter estimates that can be included tractably in models.

Finally, we make use of manufacturing TFP estimates to consider the distribution of this deeper source of firm heterogeneity. We find that the better fit of lognormal relative to Pareto is, if anything, even greater for TFP than for employment size. This lends further support to our suggestion that given only one source of heterogeneity, lognormal TFP draws are reasonable. Adding a second source of heterogeneity with Pareto distributed draws would further help fit the overall employment size distribution. Aside from providing guidance about calibrating models with exogenously defined firm heterogeneity, our results also highlight that future models of endogenous growth should not seek to generate straight Pareto size distributions.

References

Arkolakis, Costas, Arnaud Costinot, and Andrés Rodríguez-Clare, “New Trade Models, Same Old Gains?,” American Economic Review, February 2012, 102 (1), 94–130.

Armenter, Roc and Miklós Koren, “Economies of Scale and the Size of Exporters,” Journal of the European Economic Association, June 2015, 13 (3), 482–511.

Axtell, Robert L., “Zipf Distribution of U.S. Firm Sizes,” Science, September 2001, 293 (5536), 1818–1820.

Buddana, Amrutha and Tomasz J. Kozubowski, “Discrete Pareto Distributions,” Economic Quality Control, 2015, 29 (2), 143–156.

Carvalho, Vasco M and Basile Grassi, “Large Firm Dynamics and the Business Cycle,” 2018.

Chaney, Thomas, “Distorted Gravity: The Intensive and Extensive Margins of International Trade,” American Economic Review, August 2008, 98 (4), 1707–1721.

Clauset, Aaron, Cosma Rohilla Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” SIAM Review, November 2009, 51 (4), 661–703. arXiv: 0706.1062.

Combes, Pierre-Philippe, Gilles Duranton, Laurent Gobillon, Diego Puga, and Sébastien Roux, “The Productivity Advantages of Large Cities: Distinguishing Agglomeration From Firm Selection,” Econometrica, November 2012, 80 (6), 2543–2594.

di Giovanni, Julian and Andrei A. Levchenko, “Country Size, International Trade, and Aggregate Fluctuations in Granular Economies,” Journal of Political Economy, December 2012, 120 (6), 1083–1132.

Durrett, Rick, “Probability: Theory and Examples 5th Edition,” 2017.

Eeckhout, Jan, “Gibrat's Law for (All) Cities: Reply,” American Economic Review, August 2009, 99 (4), 1676–1683.

Fernandes, Ana M., Peter J. Klenow, Sergii Meleshchuk, Martha Denisse Pierola, and Andrés Rodríguez-Clare, “The Intensive Margin in Trade: Moving Beyond Pareto,” 2015.

Foster, Lucia, Cheryl Grim, and John Haltiwanger, “Reallocation in the Great Recession: Cleansing or Not?,” Journal of Labor Economics, 2016, 34 (S1), S293–S331.

Gabaix, Xavier, “Power Laws in Economics and Finance,” Annual Review of Economics, September 2009, 1 (1), 255–294.

Gabaix, Xavier, “The Granular Origins of Aggregate Fluctuations,” Econometrica, 2011, 79 (3), 733–772.

Head, Keith, Thierry Mayer, and Mathias Thoenig, “Welfare and Trade without Pareto,” American Economic Review, May 2014, 104 (5), 310–316.

Luttmer, Erzo G. J., “On the Mechanics of Firm Growth,” The Review of Economic Studies, July 2011, 78 (3), 1042–1068.

Luttmer, Erzo G. J., “Models of Growth and Firm Heterogeneity,” Annual Review of Economics, 2010, 2 (1), 547–576.

Mrazova, Monika, J Peter Neary, and Mathieu Parenti, “Sales and Markup Dispersion: Theory and Empirics,” September 2017.

Nigai, Sergey, “A tale of two tails: Productivity distribution and the gains from trade,” Journal of International Economics, January 2017, 104, 44–62.

Quandt, Richard E., “On the Size Distribution of Firms,” The American Economic Review, 1966, 56 (3), 416–432.

Reed, William J, “The Pareto, Zipf and other power laws,” Economics Letters, December 2001, 74 (1), 15–19.

Rossbach, Jack, “International Competition and Granular Fluctuations,” December 2017, p. 51.

Rossi-Hansberg, Esteban and Mark LJ Wright, “Establishment size dynamics in the aggregate economy,” American Economic Review, 2007, 97 (5), 1639.

Sager, Erick and Olga A. Timoshenko, “The EMG Distribution and Aggregate Trade Elasticities,” 2017.

Simon, Herbert A., “On A Class Of Skew Distribution Functions,” Biometrika, December 1955, 42 (3-4), 425–440.

Simon, Herbert A. and Charles P. Bonini, “The Size Distribution of Business Firms,” The American Economic Review, 1958, 48 (4), 607–617.

Stella, Andrea, “Firm dynamics and the origins of aggregate fluctuations,” Journal of Economic Dynamics and Control, June 2015, 55, 71–88.

Vuong, Quang H., “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses,” Econometrica, March 1989, 57 (2), 307.

A Monte Carlo experiments

A.1 Axtell's regression analysis

To investigate the performance of Axtell's methodology to determine the fit of a Pareto distribution, we generated 250,000 synthetic datasets for each of the following parametric distributions: Pareto with Axtell's estimated parameter (1.06), and Pareto, lognormal, Mixture and Convolution with our parameter estimates for 1997 Census data. We then implemented Axtell's methodology to estimate the Pareto shape parameter on each of the five sets of 250,000 synthetic datasets.[20]

Table 15: Axtell's regression analysis on synthetic data

             Axtell's Pareto      Pareto        lognormal       Mixture       Convolution
True α            1.06             0.61                           0.74           1.25
α̂                 1.05             0.61           1.37            0.81           1.12
c.i.          (1.00, 1.10)     (0.58, 0.63)   (1.24, 1.45)    (0.77, 0.85)   (1.08, 1.16)
N              4,770,000        4,770,000      4,770,000       4,770,000      4,770,000
Num. sim.       250,000          250,000        250,000         250,000        250,000

Note: α̂ is the mean of the distribution of OLS coefficients for each synthetic dataset. c.i. is the 95% confidence interval. N is the number of observations. Each simulation is run with 4,770,000 observations, which is the number of firms in 1997 according to the BDS.

Table 15 shows that Axtell's methodology is able to correctly uncover the Pareto shape parameter when the data are drawn from a Pareto. However, Axtell's methodology produces a shape parameter close to but above one even when data are drawn from a Convolution with our estimated parameters. In other words, if the true firm size distribution were to be drawn from a convolution, Axtell's methodology would incorrectly find empirical support for Zipf's law.

A.2 MLE simulations

We now investigate the performance of Maximum Likelihood Estimation in estimating the parameters of the distributions of interest. We drew 1 million observations from the Pareto, lognormal, Mixture and Convolution using our estimated 1997 coefficients as true parameters.[21] We discretized the data and then estimated the parameters using the same MLE procedure used in the paper.
Finally, we computed the distance between the true parameters and the estimated parameters, and the pair-wise likelihood ratio tests among all the distributions. We repeated this exercise 250 times for each distribution.

[20] Axtell (2001) explains in note 30 of "References and Notes" how he implements the regression in Figure 1.

[21] We used a million observations because using the number of observations in the 1997 LBD was not computationally feasible for this exercise.
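The structure of this exercise can be sketched for the two simplest distributions, where the maximum likelihood estimators have closed forms. This is a simplified illustration, not the procedure used in the paper: it works with continuous draws and skips the discretization step, it uses a much smaller sample and fewer repetitions, it fixes the Pareto scale at the sample minimum, and it compares fits with the AIC rather than with the likelihood ratio and Vuong tests. All function names are hypothetical. The lognormal parameters in the usage example are the 1997 values reported in Table 16.

```python
import numpy as np


def fit_lognormal(x):
    """Closed-form lognormal MLE; returns (mu, sigma, log-likelihood)."""
    logs = np.log(x)
    mu = logs.mean()
    sigma = logs.std()  # ddof=0 is the maximum likelihood estimate
    loglik = np.sum(-logs - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
                    - (logs - mu) ** 2 / (2.0 * sigma ** 2))
    return mu, sigma, loglik


def fit_pareto(x):
    """Closed-form Pareto MLE with the scale set to the sample minimum."""
    x_m = x.min()
    alpha = len(x) / np.sum(np.log(x / x_m))
    loglik = np.sum(np.log(alpha) + alpha * np.log(x_m)
                    - (alpha + 1.0) * np.log(x))
    return x_m, alpha, loglik


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a better fit."""
    return 2.0 * n_params - 2.0 * loglik


def lognormal_rmse(rng, mu, sigma, n_obs, n_sim):
    """RMSE of the lognormal MLE of (mu, sigma) across simulated samples."""
    errors = np.empty((n_sim, 2))
    for i in range(n_sim):
        x = rng.lognormal(mean=mu, sigma=sigma, size=n_obs)
        mu_hat, sigma_hat, _ = fit_lognormal(x)
        errors[i] = (mu_hat - mu, sigma_hat - sigma)
    return np.sqrt(np.mean(errors ** 2, axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("RMSE (mu, sigma):", lognormal_rmse(rng, 1.17, 1.74, 10_000, 50))
    x = rng.lognormal(mean=1.17, sigma=1.74, size=10_000)
    print("AIC lognormal:", aic(fit_lognormal(x)[2], 2))
    print("AIC Pareto:   ", aic(fit_pareto(x)[2], 2))
```

The design mirrors the appendix at a small scale: simulate from known parameters, re-estimate, summarize the estimation error, and check whether a fit criterion picks the distribution that generated the data.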

Table 16: RMSEs

| | Pareto | lognormal | Mixture | Convolution |
|---|---|---|---|---|
| µ | | 1.17 | 1.00 | 0.49 |
| RMSEs | | 0.00 | 0.34 | 0.00 |
| σ | | 1.74 | 1.49 | 1.29 |
| RMSEs | | 0.00 | 0.14 | 0.00 |
| p | | | 0.86 | |
| RMSEs | | | 0.08 | |
| α | 0.61 | | 3.47 | 1.25 |
| RMSEs | 0.06 | | 0.13 | 0.01 |
| x_m | | | 0.74 | |
| RMSEs | | | 0.15 | |
| N | 1,000,000 | 1,000,000 | 1,000,000 | 1,000,000 |
| Num. sim. | 250 | 250 | 250 | 250 |

Note: For each distribution, we show the 1997 estimated coefficients and the RMSEs from the simulation. N is the number of observations in each simulation.

Table 17: Likelihood-based ratio tests

| Alternative \ True | Pareto | lognormal | Mixture | Convolution |
|---|---|---|---|---|
| Pareto | | 100.0% | 97.6% | 100.0% |
| lognormal | 96.8% | | 97.6% | 100.0% |
| Mixture | 33.2% | 0.0% | | 100.0% |
| Convolution | 70.0% | 1.6% | 97.2% | |
| N | 1,000,000 | 1,000,000 | 1,000,000 | 1,000,000 |
| Num. sim. | 250 | 250 | 250 | 250 |

Note: For each distribution, we show the percentage of times that the likelihood-ratio test correctly picks the true distribution. The LRT test is used between Mixture and Pareto and between Mixture and lognormal. The Vuong test is used for all other pairs. N is the number of observations in each simulation.

Table 16 shows the Root Mean Squared Errors (RMSEs) computed using the distances between the true parameters and their MLE estimates in the 250 simulations; the errors made by MLE are for the most part tiny. Table 17 presents the percentage of times that the likelihood-based ratio tests with 95% confidence were able to pick the correct distribution. The tests almost always pick the correct distribution when the true distribution is Mixture or Convolution, and when testing between lognormal and Pareto, but they struggle to decide between a true Pareto or lognormal distribution and the more flexible Mixture and Convolution alternatives. For instance, as shown in Table 17, when the sample is drawn from a Pareto distribution, the likelihood-based ratio tests correctly pick Pareto over lognormal 96.8% of the time, but only 33.2% of the time when a Mixture distribution is the alternative, and 70% when a Convolution distribution is the alternative.

B Additional Results

In Table 18, we show the 95% confidence intervals for Table 9, obtained by drawing 4.77 million firms (as in the 1997 LBD) 100,000 times from each distribution with the parameter values we estimate. For the Axtell calibration, the fraction of employment accounted for by the largest and smallest bins varies widely across draws, though the confidence intervals include the true data values. The lognormal distribution, with its thinner right tail, is simulated much more consistently across draws. Our Pareto estimate with a shape parameter below 1, and the mixture, which also incorporates a Pareto estimate with a shape parameter below 1, consistently draw massive firms that account for nearly all of employment. Finally, the confidence bands for the convolution are fairly narrow in economic terms yet often include the true value, including for both the smallest firm bin and the largest firm bin.
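The resampling behind such confidence intervals can be sketched as follows. This is a rough illustration under stated assumptions rather than the procedure actually used: the sampler (a Pareto with scale 1 and an assumed shape of 0.6), the reduced number of firms and replications, the nearest-rank percentile rule, and the treatment of sizes as continuous (so that the "1 to 4" bin is the interval [1, 5)) are all choices made here for brevity.

```python
import bisect
import random

# Lower edges of the employment size bins; the last bin (>= 10,000) is open-ended.
BIN_EDGES = [1, 5, 10, 20, 50, 100, 250, 500, 1000, 2500, 5000, 10000]


def employment_shares(sizes):
    """Percent of total employment held by the firms in each size bin."""
    totals = [0.0] * len(BIN_EDGES)
    for s in sizes:
        # Index of the last bin whose lower edge is <= s (sizes below 1 go to bin 0).
        k = max(0, bisect.bisect_right(BIN_EDGES, s) - 1)
        totals[k] += s
    grand_total = sum(totals)
    return [100.0 * t / grand_total for t in totals]


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 100]."""
    idx = round(q / 100.0 * (len(sorted_vals) - 1))
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def share_intervals(draw_firm, n_firms, n_sim, seed=0):
    """Simulate n_sim economies and return a 95% interval for each bin's share."""
    rng = random.Random(seed)
    sims = [employment_shares([draw_firm(rng) for _ in range(n_firms)])
            for _ in range(n_sim)]
    intervals = []
    for k in range(len(BIN_EDGES)):
        column = sorted(sim[k] for sim in sims)
        intervals.append((percentile(column, 2.5), percentile(column, 97.5)))
    return intervals


if __name__ == "__main__":
    # Hypothetical sampler: Pareto with scale 1 and shape 0.6 (assumed values).
    pareto = lambda rng: (1.0 - rng.random()) ** (-1.0 / 0.6)
    for edge, (lo, hi) in zip(BIN_EDGES, share_intervals(pareto, 50_000, 20)):
        print(f">= {edge:>6}: ({lo:.2f}, {hi:.2f})")
```

With a shape parameter below 1, a sketch like this tends to place most employment in the open-ended top bin, in line with the pattern described above for the Pareto draws.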
Table 18: Fraction of 1997 firm employment (%): 95% confidence intervals

| Firm size | BDS | Axtell | Pareto | lognormal | Mixture | Convolution |
|---|---|---|---|---|---|---|
| 1 to 4 | 5.65 | (5.22, 13.95) | (0.00, 0.01) | (7.01, 7.07) | (0.01, 1.10) | (5.43, 6.97) |
| 5 to 9 | 6.53 | (2.46, 6.56) | (0.00, 0.01) | (7.49, 7.56) | (0.02, 1.27) | (5.91, 7.58) |
| 10 to 19 | 7.73 | (2.44, 6.52) | (0.00, 0.01) | (11.01, 11.11) | (0.02, 1.59) | (7.66, 9.83) |
| 20 to 49 | 10.62 | (3.14, 8.39) | (0.00, 0.01) | (18.98, 19.15) | (0.03, 2.24) | (11.22, 14.39) |
| 50 to 99 | 7.52 | (2.28, 6.11) | (0.00, 0.02) | (15.48, 15.65) | (0.02, 1.52) | (8.11, 10.40) |
| 100 to 249 | 8.72 | (2.89, 7.73) | (0.00, 0.03) | (17.87, 18.12) | (0.02, 1.63) | (9.28, 11.91) |
| 250 to 499 | 5.54 | (2.09, 5.60) | (0.00, 0.03) | (9.63, 9.89) | (0.01, 1.05) | (5.83, 7.50) |
| 500 to 999 | 5.09 | (2.00, 5.39) | (0.00, 0.04) | (6.04, 6.32) | (0.01, 1.05) | (4.91, 6.34) |
| 1,000 to 2,499 | 7.07 | (2.53, 6.82) | (0.00, 0.07) | (3.83, 4.17) | (0.02, 1.58) | (5.31, 6.90) |
| 2,500 to 4,999 | 5.40 | (1.82, 4.99) | (0.00, 0.07) | (1.03, 1.32) | (0.02, 1.44) | (3.27, 4.36) |
| 5,000 to 9,999 | 5.46 | (1.74, 4.88) | (0.00, 0.09) | (0.32, 0.57) | (0.02, 1.73) | (2.73, 3.76) |
| ≥ 10,000 | 24.68 | (23.48, 71.39) | (99.62, 100.00) | (0.07, 0.35) | (83.80, 99.79) | (10.61, 30.27) |

C Theoretical Appendix

Let $\ell_{i,t}$ denote the employment at firm $i$ at time $t$. Aggregate employment is then simply $L_{N,t} = \sum_{i=1}^{N} \ell_{i,t}$, where $N$ denotes the number of firms.²²

²² For the rest of the theoretical exposition, we use firm to denote the individual economic entity.

Consider a set of multiplicative shocks $\varepsilon_{i,t}$ to the size of each firm such that $\varepsilon_{i,t}$ has mean 0 and variance $\varsigma_i^2$:

$$ \Delta \ell_{i,t+1} \equiv \ell_{i,t}\,\varepsilon_{i,t+1}. \tag{6} $$

Then the aggregate employment growth rate is simply:

$$ \frac{\Delta L_{N,t+1}}{L_{N,t}} = \sum_{i=1}^{N} \frac{\Delta \ell_{i,t+1}}{L_{N,t}} \tag{7} $$

and aggregate volatility, the variance of aggregate growth, is:

$$ \sigma_{N,t}^2 \equiv \mathrm{var}_t\!\left[ \sum_{i=1}^{N} \frac{\Delta \ell_{i,t+1}}{L_{N,t}} \right] = \sum_{i=1}^{N} \left( \frac{\ell_{i,t}}{L_{N,t}} \right)^{2} \varsigma_i^2. \tag{8} $$

In the symmetric case where $\varsigma_i = \sigma \;\forall i$, the Herfindahl index $h_{N,t}^2$ summarizes the aggregation of idiosyncratic shocks:

$$ \sigma_{N,t}^2 = \sigma^2 \sum_{i=1}^{N} \left( \frac{\ell_{i,t}}{L_{N,t}} \right)^{2} \equiv \sigma^2 h_{N,t}^2. \tag{9} $$

The Herfindahl, in turn, can be rewritten as:

$$ h_{N,t}^2 = \frac{\sum_{i=1}^{N} \ell_{i,t}^2}{\left( \sum_{i=1}^{N} \ell_{i,t} \right)^2} = \frac{N^{-2} \sum_{i=1}^{N} \ell_{i,t}^2}{\left( N^{-1} \sum_{i=1}^{N} \ell_{i,t} \right)^2} = N^{-1}\, \frac{N^{-1} \sum_{i=1}^{N} \ell_{i,t}^2}{\left( N^{-1} \sum_{i=1}^{N} \ell_{i,t} \right)^2}. \tag{10} $$

Therefore, when $E[\ell]$ and $E[\ell^2]$ are finite,

$$ h_{N,t}^2 \times N \xrightarrow{a.s.} \frac{E[\ell^2]}{(E[\ell])^2}. \tag{11} $$

Definition 1. Consider a random variable $Y$, a sequence of random variables $\zeta_N$, and a sequence of positive numbers $a_N$. Following Gabaix (2011), a convergence in distribution such that $\zeta_N / a_N \xrightarrow{d} Y$ as $N \to \infty$ is also denoted $\zeta_N \sim a_N Y$, and $\zeta_N$ is said to scale like $a_N$.

Using equation (10) and the scaling definition above, we can characterize the scaling properties of granular shocks when the moments of the size distribution are finite.

Proposition 3. If the size distribution has finite mean and variance, then the size Herfindahl index is such that:

$$ h_N^2 \sim \frac{1}{N} \tag{12} $$

and thus the macro variance $\sigma_N^2$ decays at the rate $1/N$.
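A quick numerical check of the finite-moment case in Proposition 3 is sketched below. It assumes a lognormal size distribution with illustrative parameters (mu = 0, sigma = 0.5, chosen here for fast convergence and not taken from the estimates), for which the limit in equation (11) is $E[\ell^2]/(E[\ell])^2 = e^{\sigma^2}$. The function names are hypothetical.

```python
import math
import random


def herfindahl_squared(sizes):
    """h^2: the sum of squared employment shares, as in equation (9)."""
    total = sum(sizes)
    return sum((s / total) ** 2 for s in sizes)


def scaled_herfindahl(n_firms, mu, sigma, rng):
    """Return N * h^2 for one lognormal sample of n_firms firm sizes."""
    sizes = [rng.lognormvariate(mu, sigma) for _ in range(n_firms)]
    return n_firms * herfindahl_squared(sizes)


def lognormal_limit(sigma):
    """E[l^2] / E[l]^2 for a lognormal, which equals exp(sigma^2) for any mu."""
    return math.exp(sigma ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 0.5  # illustrative value (assumed), not an estimate from the paper
    for n in (1_000, 10_000, 100_000):
        value = scaled_herfindahl(n, 0.0, sigma, rng)
        print(f"N = {n:>7}: N * h^2 = {value:.4f}  (limit {lognormal_limit(sigma):.4f})")
```

Because $N h_N^2$ settles near a constant, $h_N^2$ itself shrinks like $1/N$, which is the scaling stated in equation (12). Larger values of sigma give the same limit formula but need far more draws before the sample moments stabilize.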

The proof is a straightforward application of the law of large numbers and is the same as the proof of Proposition 1 in Gabaix (2011). Proposition 3 implies that, in terms of the micro origins of macroeconomic volatility, there is no material difference between a lognormal size distribution and a Pareto size distribution with shape parameter $\alpha > 2$.

However, when the Pareto shape parameter $\alpha$ is equal to or lower than 2, the distribution has infinite variance, and both its mean and variance are infinite when $\alpha$ is equal to or lower than 1. As we have shown, these are relevant regions of the Pareto parameter space, and so we describe the behavior of the Herfindahl index in the following proposition, which extends Proposition 2 in Gabaix (2011).

Proposition 4. If firm size is distributed Pareto with shape parameter $\alpha$, then the size Herfindahl index is characterized by:

$$
h_N^2(\alpha) \propto
\begin{cases}
1/N & \text{when } \alpha > 2 \\
1/\left( \dfrac{N}{\ln N} \right) & \text{when } \alpha = 2 \\
1/\left( N^{1-\frac{1}{\alpha}} \right)^{2} & \text{when } \alpha \in (1,2) \\
1/(\ln N)^{2} & \text{when } \alpha = 1 \\
1 & \text{when } \alpha \in (0,1)
\end{cases}
\tag{13}
$$

and thus, in the case of $\alpha < 1$, the macro variance $\sigma_N^2$ no longer decays with $N$ for $N$ large enough.

Proof: We will prove Proposition 4 using Theorem 3.8.2 in Durrett (2017, p. 167-168), which is as follows:

Theorem 1. Suppose that $X_1, X_2, \ldots$ are i.i.d. with a distribution that satisfies (i) $\lim_{x \to \infty} P(X_1 > x)/P(|X_1| > x) = \theta \in [0,1]$ and (ii) $P(|X_1| > x) = x^{-\alpha} L(x)$ with $\alpha \in (0,2)$ and $L(x)$ slowly varying. Let $s_N = \sum_{i=1}^{N} X_i$, $a_N = \inf\{x : P(|X_1| > x) \le 1/N\}$, and $b_N = N E\left[ X_1 \mathbf{1}_{|X_1| \le a_N} \right]$. As $N \to \infty$, $(s_N - b_N)/a_N$ converges in distribution to a nondegenerate random variable $Y$. When $\alpha < 1$, $b_N = 0$.

Using equations (9) and (10), we can write

$$ \sigma_N = \sigma\, \frac{\left( \sum_{i=1}^{N} \ell_i^2 \right)^{1/2}}{\sum_{i=1}^{N} \ell_i}. \tag{14} $$

When $\alpha > 2$, a Pareto random variable has finite mean and variance, and so we simply apply Proposition 3.

When $\alpha = 2$, we must apply Theorem 1 to the numerator, $\sum_{i=1}^{N} \ell_i^2$. Here, $a_N = N$ and $b_N = N \int_1^N y \cdot y^{-(1+1)}\, dy = N \ln(N)$, and thus:

$$ N^{-1} \left( \sum_{i=1}^{N} \ell_i^2 - N \ln(N) \right) \xrightarrow{d} u, $$

where $u$ is a random variable following a nondegenerate distribution that does not depend on $N$. Thus:

$$ \sum_{i=1}^{N} \ell_i^2 \sim N \ln(N). $$

It follows that:

$$ h_N = \frac{\left( \sum_{i=1}^{N} \ell_i^2 \right)^{1/2}}{\sum_{i=1}^{N} \ell_i} \xrightarrow{d} \frac{(N \ln(N))^{1/2}\, u^{1/2}}{N E[\ell_i]} \propto \left( \frac{\ln(N)}{N} \right)^{1/2}. $$

When $1 < \alpha < 2$, we again apply Theorem 1 to determine the numerator in equation (14). Since $\ell_i$ is Pareto distributed with scale parameter equal to 1 and shape parameter equal to $\alpha$, $\ell_i^2$ is Pareto distributed with the same scale parameter and shape parameter equal to $\alpha/2$. Since $\alpha < 2$, $\alpha/2 < 1$ and $b_N = 0$. Moreover,

$$ P(\ell^2 > x) = P(\ell > x^{1/2}) = (x^{1/2})^{-\alpha} = x^{-\alpha/2}, $$

which implies that $a_N = N^{2/\alpha}$. With $a_N$ and $b_N$ we apply Theorem 1:

$$ N^{-2/\alpha} \sum_{i=1}^{N} \ell_i^2 \xrightarrow{d} u. $$

It follows that

$$ h_N = \frac{\left( \sum_{i=1}^{N} \ell_i^2 \right)^{1/2}}{\sum_{i=1}^{N} \ell_i} \xrightarrow{d} \frac{N^{1/\alpha}\, u^{1/2}}{N E[\ell_i]} = \frac{u^{1/2}}{N^{1-\frac{1}{\alpha}} E[\ell_i]} \propto 1/N^{1-\frac{1}{\alpha}}. $$

When $\alpha = 1$, we have to apply Theorem 1 to both the numerator and denominator. For $\sum_{i=1}^{N} \ell_i^2$, $a_N = N^2$ and $b_N = 0$. For $\sum_{i=1}^{N} \ell_i$, $P(\ell > x) = x^{-1} \le 1/N$ implies $a_N = N$, and $b_N = N \int_1^N x \cdot x^{-(1+1)}\, dx = N \ln(N)$. We then have:

$$ \frac{1}{N} \left( \sum_{i=1}^{N} \ell_i - N \ln(N) \right) \xrightarrow{d} g, $$

where $g$ is a random variable following a nondegenerate distribution that does not depend on $N$. This implies that

$$ \sum_{i=1}^{N} \ell_i \sim N \ln(N), $$

and therefore

$$ h_N = \frac{\left( \sum_{i=1}^{N} \ell_i^2 \right)^{1/2}}{\sum_{i=1}^{N} \ell_i} \sim \frac{N}{N \ln(N)} \propto 1/\ln(N). $$

Finally, when $0 < \alpha < 1$, $a_N = N^{1/\alpha}$ and $b_N = 0$ for $\sum_{i=1}^{N} \ell_i$, and $a_N = N^{2/\alpha}$ and $b_N = 0$ for $\sum_{i=1}^{N} \ell_i^2$. This implies

$$ \sum_{i=1}^{N} \ell_i \sim N^{1/\alpha}, $$

and

$$ \sum_{i=1}^{N} \ell_i^2 \sim N^{2/\alpha}, $$

and thus,

$$ h_N = \frac{\left( \sum_{i=1}^{N} \ell_i^2 \right)^{1/2}}{\sum_{i=1}^{N} \ell_i} \sim \frac{N^{1/\alpha}}{N^{1/\alpha}} \propto 1. $$

Calibration of Granularity and Aggregate Volatility Simulations

Gabaix (2011) provides a calibration of aggregate fluctuations based on the simple model used in this section. He shows that, using equation (9), with a firm volatility of $\sigma = 12\%$ and a Zipf distribution of firm sizes, GDP volatility, $\sigma h$, is 1.4%.

We replicated this simulation using our 1997 estimates; we simulated 100,000 samples of $10^6$ draws each from several distributions, including Zipf, Axtell's Pareto, and our parametric distributions. We computed the GDP volatility for each simulation and we show the means in Table 19.

Gabaix compares his calibration with a U.S. aggregate volatility of 1.7% and makes the point that, with firms distributed according to Zipf's law or Axtell's Pareto, idiosyncratic shocks can explain a significant portion of aggregate volatility. Our estimates point to a different picture; as we proved earlier, with a Pareto shape parameter below one, idiosyncratic shocks do not cancel out in the aggregate, producing too much aggregate volatility. If firms are lognormally distributed, the law of large numbers essentially kicks in and idiosyncratic shocks cancel out in the aggregate. Finally, the convolution, our preferred distribution, allows for idiosyncratic shocks to matter in the aggregate, but with a much more diminished role compared to Zipf's law.

The values in Table 19 should be taken as upper bounds, as the calibration assumes that all firms have the same volatility, whereas larger firms have lower volatilities, which potentially diminishes the aggregate impact of idiosyncratic shocks significantly.

Table 19: Calibration of Aggregate Volatility under Granularity

| Zipf's law | Axtell's Pareto | Pareto | Lognormal | Mixture | Convolution |
|---|---|---|---|---|---|
| 1.43% | 0.99% | 6.63% | 0.05% | 4.59% | 0.38% |

Note: We took 100,000 times $10^6$ draws from each distribution, computed the aggregate volatility using equation (9), and took the mean. Zipf's law is a Pareto with shape parameter equal to 1. Axtell's Pareto is a Pareto with shape parameter equal to 1.057. Pareto, lognormal, mixture and convolution are parametrized using our 1997 estimates for the firm size distribution from Tables 3, 6, and 7.
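The calibration amounts to averaging $\sigma h$ over simulated samples of firm sizes. The sketch below illustrates that computation on a much smaller scale; it is not the authors' code. The 12% firm volatility and the Zipf shape of 1 come from the text, while the lognormal parameters (mu = 0, sigma = 1), the number of firms, the number of replications, and the function names are assumptions made for the example, so the printed values are not expected to match Table 19.

```python
import math
import random

FIRM_VOLATILITY = 0.12  # sigma = 12%, the firm-level volatility quoted from Gabaix (2011)


def herfindahl(sizes):
    """h: square root of the sum of squared employment shares."""
    total = sum(sizes)
    return math.sqrt(sum((s / total) ** 2 for s in sizes))


def draw_pareto(alpha, rng):
    """One Pareto draw with scale 1 and shape alpha, by inverting the CDF."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def mean_aggregate_volatility(draw_size, n_firms, n_sim, seed=0):
    """Average sigma * h over n_sim simulated economies of n_firms firms each."""
    rng = random.Random(seed)
    vols = []
    for _ in range(n_sim):
        sizes = [draw_size(rng) for _ in range(n_firms)]
        vols.append(FIRM_VOLATILITY * herfindahl(sizes))
    return sum(vols) / n_sim


if __name__ == "__main__":
    # Reduced scale (assumed): 10 economies of 50,000 firms, not 100,000 of 10^6.
    zipf = mean_aggregate_volatility(lambda r: draw_pareto(1.0, r), 50_000, 10)
    # Illustrative lognormal parameters (assumed), not the 1997 estimates.
    logn = mean_aggregate_volatility(lambda r: r.lognormvariate(0.0, 1.0), 50_000, 10)
    print(f"Zipf: {100 * zipf:.2f}%   lognormal: {100 * logn:.2f}%")
```

Since $h \le 1$ for positive sizes, the simulated aggregate volatility can never exceed the firm-level volatility; how far below it falls depends on how heavy the right tail of the size distribution is.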

Cite this document
APA
Illenin O. Kondo, Logan T. Lewis, & Andrea Stella (2018). On the U.S. Firm and Establishment Size Distributions (FEDS 2018-075). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2018-075
BibTeX
@techreport{wtfs_feds_2018_075,
  author = {Illenin O. Kondo and Logan T. Lewis and Andrea Stella},
  title = {On the U.S. Firm and Establishment Size Distributions},
  type = {Finance and Economics Discussion Series},
  number = {2018-075},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2018},
  url = {https://whenthefedspeaks.com/doc/feds_2018-075},
  abstract = {This paper revisits the empirical evidence on the nature of firm and establishment size distributions in the United States using the Longitudinal Business Database (LBD), a confidential Census Bureau panel of all non-farm private firms and establishments with at least one employee. We establish five stylized facts that are relevant for the extent of granularity and the nature of growth in the U.S. economy: (1) with an estimated shape parameter significantly below 1, the best-fitting Pareto distribution substantially differs from Zipf's law for both firms and establishments; (2) a lognormal distribution fits both establishment and firm size distributions better than the commonly-used Pareto distribution, even far in the upper tail; (3) a convolution of lognormal and Pareto distributions fits both size distributions better than lognormal alone while also providing a better fit for the employment share distribution; (4) the estimated parameters are different across manufacturing and services sectors, but the distribution fit ranking remains unchanged in the sectoral subsamples. Finally, using the Census of Manufactures (CM), we find that (5) the distribution of establishment-level total factor productivity---a common theoretical primitive for size---is also better described by lognormal than Pareto. We show that correctly characterizing the firm size distribution has first order implications for the effect of firm-level idiosyncratic shocks on aggregate activity.},
}