ifdp · August 24, 2025

Why are Manufacturing Plants Smaller in Developing Countries? Theory and Evidence from India

Abstract

Poorer countries (and poorer states within India) have a larger share of manufacturing employment in small plants. This paper presents empirical evidence and a theoretical model to show that this relationship is driven by greater demand for lower quality goods in poorer regions, which can be produced efficiently in small plants. First, using data for India, we show that richer households buy higher price goods and larger plants produce higher price products. Second, we develop a model that matches these facts. Finally, we find that our model explains about forty percent of the cross-state variation in the size distribution of manufacturing plants in India.

Board of Governors of the Federal Reserve System International Finance Discussion Papers ISSN 1073-2500 (Print) ISSN 2767-4509 (Online) Number 1417 August 2025 Why are Manufacturing Plants Smaller in Developing Countries? Theory and Evidence from India Anil K. Jain; Siddharth Kothari Please cite this paper as: Jain, Anil K., and Siddharth Kothari (2025). “Why are Manufacturing Plants Smaller in Developing Countries? Theory and Evidence from India,” International Finance Discussion Papers 1417. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/IFDP.2025.1417. NOTE: International Finance Discussion Papers (IFDPs) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the International Finance Discussion Papers Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers. Recent IFDPs are available on the Web at www.federalreserve.gov/pubs/ifdp/. This paper can be downloaded without charge from the Social Science Research Network electronic library at www.ssrn.com.

Why are Manufacturing Plants Smaller in Developing Countries? Theory and Evidence from India*

Anil K. Jain† (Board of Governors of the Federal Reserve System)
Siddharth Kothari (International Monetary Fund)

August 25, 2025

JEL codes: O11, O16, O17
Keywords: Firm size distribution; Product quality; India

* A previous version of this paper was circulated as “The Size Distribution of Manufacturing Plants and Development.” This paper benefited significantly from in-depth discussions with Manuel Amador, Nicholas Bloom, Pascaline Dupas, Robert Hall, Chad Jones, Pablo Kurlat, Kalina Manova, Monika Piazzesi, Martin Schneider, and Christopher Tonetti. Siddharth gratefully acknowledges support from the Leonard W. Ely and Shirley R. Ely Graduate Student Fund Fellowship (SIEPR), B.F. Haley and E.S. Shaw Fellowship for Economics (SIEPR), and the SEED Fellowship from the Stanford Institute for Innovation in Developing Economies. The findings and conclusions in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System, the views of any other person associated with the Federal Reserve System, or the International Monetary Fund.

† Corresponding author; email: anil.k.jain@frb.gov.

1. Introduction

The typical size distribution of manufacturing establishments in developing countries has a thick left tail compared to developed countries. For instance, as shown in Figure 1, six out of ten manufacturing workers in India are employed in establishments with fewer than five people, a share thirty times larger than in the United States (2005-2006).1

This size-income relationship also holds across Indian states: poorer states have a larger fraction of employment in the smallest firms. Figure 2 plots the share of employment in establishments of size five or less in 2005-06 for different Indian states against the per-capita net domestic product (NDP) of the state relative to the poorest state, Bihar.2 The richest Indian states have about four times the per-capita NDP of the poorest states. While the poorest states have almost 90 percent of their manufacturing workforce employed in establishments of size five or less, the richer states have only about 40 percent of their workforce working in small establishments.3

What explains this negative correlation between income levels and the share of employment in small establishments? A leading view, starting from De Soto [1989], blames size-dependent regulation (such as licensing rules or small-scale reservations) for the prevalence of small plants. Such policies can distort resource allocation, depress incomes, and, ultimately, keep establishments small.

This paper explores an alternative (though potentially complementary) explanation for this size-income relationship that is driven by consumer preferences and technology rather than distortions.

1 The US data are retrieved from the County Business Patterns Database maintained by the US Census Bureau. The Indian data combine two surveys, the Annual Survey of Industries (ASI) and the Survey of Unorganized Manufacturing (SUM). Appendix Sections A.1, A.2, and A.5 give more details regarding these datasets.

2 For ease of exposition, Figure 2 plots only the 15 largest states, which cover over 96 percent of the manufacturing workforce. The negative relationship between share of employment in small plants and per-capita state NDP is robust to including all the states. In a regression of share of employment in plants of size five or less on log of per-capita state NDP, the coefficient (standard error) on log state NDP is -0.320 (0.0553) when restricting to 15 states and -0.319 (0.0568) when including all the states. A possible concern with the relationship seen in Figure 2 is that it might be driven by differences in industry composition across states. However, a large part of the differences in share of employment in small plants across states is actually driven by within-industry differences in size. Controlling for industry composition (weighting the size distribution in every state by the all-India industry composition instead of the state-specific industry composition) at the 2-digit level causes the slope coefficient on log of state per-capita NDP to fall from -0.320 (0.0553) to -0.274 (0.0417).

3 The differences in share of employment in small plants are also reflected in differences in average plant size across states. The average plant size in the richest states is about double the average in the poorest states.
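The industry-composition adjustment described in footnote 2 can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `small_plant_share`, the two industries, and all employment numbers are hypothetical, and the handling of industries absent from a state (dropped, with the remaining weights renormalized) is an assumption rather than the paper's documented procedure.

```python
def small_plant_share(small_emp, total_emp, industry_weights=None):
    """Share of a state's employment in small plants.

    small_emp / total_emp: dicts mapping industry -> employment in the state.
    industry_weights: dict mapping industry -> weight. If None, the state's
    own industry composition is used, which gives the unadjusted share.
    Passing all-India weights gives a composition-adjusted share.
    Assumption: industries with no employment in the state are dropped and
    the remaining weights are renormalized.
    """
    industries = [i for i in total_emp if total_emp[i] > 0]
    if industry_weights is None:
        state_total = sum(total_emp[i] for i in industries)
        industry_weights = {i: total_emp[i] / state_total for i in industries}
    weight_sum = sum(industry_weights.get(i, 0.0) for i in industries)
    weighted = sum(
        industry_weights.get(i, 0.0) * small_emp.get(i, 0.0) / total_emp[i]
        for i in industries
    )
    return weighted / weight_sum


# Hypothetical state: employment concentrated in an industry with many small plants.
total = {"textiles": 800.0, "machinery": 200.0}
small = {"textiles": 720.0, "machinery": 40.0}
all_india_weights = {"textiles": 0.5, "machinery": 0.5}  # hypothetical weights

unadjusted = small_plant_share(small, total)
adjusted = small_plant_share(small, total, all_india_weights)
print(unadjusted, adjusted)
```

In this made-up example the adjusted share is lower than the unadjusted one because part of the state's small-plant share comes from its industry mix. Whatever gap remains across states after reweighting reflects within-industry differences in plant size.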

Figure 1: Share of Employment by Size Category: India vs. US

[Bar chart: share of total employment by establishment size category (<5, 5 to 9, 10 to 19, 20 to 49, 50 to 99, 100 to 249, 250 to 499, 500 to 999, >=1000) for India and the US.]

Notes: The graph plots the share of total employment in establishments of different size categories for India and the US. The data for India combines two sources, the Annual Survey of Industries (ASI) and the Survey of Unorganized Manufacturing (SUM) for 2005-06. The data for the US is taken from the County Business Patterns Database for 2006.

Our hypothesis is that poor households have high demand for low quality products, which can be produced efficiently in small establishments as they require small fixed investments (no research and development expenditure, or no need for large investments in fixed capital). On the other hand, richer households tend to demand higher quality goods, whose production requires a larger scale due to the need for larger fixed investments. In effect, we argue that higher quality goods require higher sunk and fixed costs, and consequently larger firms, consistent with the seminal industrial organization papers by John Sutton (such as Shaked and Sutton [1987] and Sutton [1991]).

This relationship between income levels and demand for quality implies that poor countries or states have demand skewed towards goods which require a small scale of production, which in turn causes the size distribution to be dominated by small plants. As a region develops and income levels increase, demand shifts towards high quality products, which in turn leads to a shift on the production side towards higher quality goods. This shift in production causes the share of employment in small plants to decrease, and thus can generate the negative relationship between the share of employment in small plants and income levels seen in the data.

Figure 2: Size Distribution of Manufacturing Establishments: Across Indian States

[Scatter plot: share of employment in plants of size five or less (vertical axis) against state per-capita NDP relative to the poorest state (horizontal axis), one labeled point per state.]

Notes: The graph plots the share of employment in plants of size five or less in a state against per-capita NDP of the state relative to the poorest state. The data for the states combines two sources, the Annual Survey of Industries (ASI) and the Survey of Unorganized Manufacturing (SUM). Only the 15 largest states are included to keep the graph readable.

We provide empirical evidence in support of this hypothesis using Indian data from consumer and producer surveys. First, consistent with the hypothesis that richer households buy higher quality products, we show that in the cross-section richer households pay a higher unit price for similar goods (using data from consumer expenditure surveys). Second, we show substantial evidence that larger plants produce higher quality goods than smaller firms. We start by showing that larger plants charge a higher unit price for a similar good than smaller plants, a relationship that holds across formal and informal plants. Moreover, consistent with higher quality, we show that larger plants use more expensive material inputs, use more capital per unit of output, invest more in capital per unit of output, and hire more skilled workers than smaller firms making similar goods.

We develop a general equilibrium model that matches these cross-sectional facts. Households choose from a finite number of quality levels. The choice over quality levels is modeled as a discrete-choice problem with households choosing to consume one quality level out of those available in the economy. Their preferences exhibit non-homotheticity with respect to quality: richer households are more likely to choose higher quality levels. The non-homotheticity arises because the utility function features complementarity between quality and quantity consumed (the marginal increase in utility from a given increase in quantity consumed is larger for higher quality goods) and richer households can consume more quantity of whichever quality level they choose.

On the producer side, production of high quality goods uses skilled labor more intensively. Also, starting a higher quality plant requires higher fixed costs, which combined with a free entry condition implies that producers of high quality goods will be larger on average (in order to recover their larger fixed costs).

The model parameters are chosen to match the micro-facts documented on the consumer and producer side. The quality-size relationship on the producer side is matched to the relationship between prices and plant size from the producer surveys, while the degree of non-homotheticity is chosen to match the price-income relationship seen in the consumer surveys.

The empirical findings and our model raise the question: How much of the cross-state variation in the size distribution seen in Figure 2 can be explained by the model? We answer this question through a counterfactual exercise in which we simulate changes in per-capita income levels in the model (by varying productivity and the skill level of the population) and examine the effect on the size distribution.

In our model, as income levels rise, demand shifts to high quality goods due to the non-homotheticity of preferences. This shift in demand towards higher quality leads to a shift on the production side, with a fall in the number of low quality producers and an increase in the number of high quality producers. As high quality producers are larger on average compared to low quality producers, there is also a shift in the size distribution towards larger plants.
We find that the share of employment in plants of size five or less goes down by nearly 20 percentage points (which is about 43 percent of the difference seen across Indian states) when income in the model varies by the same extent as it does across Indian states. Further, we document that the share of employment in plants of size five or less has gone down by about 20 percentage points in India between 1989 and 2009, and show that the model can explain about 65 percent of this change. That said, the model abstracts from several factors that may impact the quantitative result. For example, we assume perfectly competitive final goods markets with zero economic profits, and monopolistically competitive intermediate producers with constant markups, thus not allowing for possible markup variation across quality levels. If consumer valuation of quality allows higher quality firms to charge larger markups, this could weaken the size-quality relation for a given fixed cost. We also do not allow for standard supply side frictions (for example, state-level entry barriers, size-dependent labor laws, and credit-market imperfections), which could potentially interact with firms’ decisions on quality choice. As such, our quantitative exercises should be interpreted as upper-bound estimates of the impact of quality-driven demand.

As a robustness check, Section D shows evidence that inter-state trade is not driving our model results. Our model and the counterfactual exercises assume that each state can be treated as a closed economy in which local demand is met by local production. A potential confounding effect of inter-state trade could come through the location choice of large plants. For example, if the richer states are more suited for operating large plants (for example, due to availability of skilled labor or less stringent labor laws), then larger plants might choose to locate in these states (and ship their goods to the poor states) and this might be driving the negative relationship between income and firm size. If inter-state trade were an important force, then we would expect the more tradable industries within manufacturing to have a stronger negative relationship between size and income levels across states. To test this, we construct two measures of tradability at the 3-digit level of industrial classification. We find that the size-income relationship across states is not stronger for tradables as compared to non-tradables (for one of the measures, the non-tradables actually have a stronger negative relationship as compared to tradables), indicating that inter-state trade is unlikely to be an important force driving the larger share of employment in small plants in poorer states.

Our paper contributes to several literatures. A large literature has studied the question of why the size distribution differs markedly across countries.
The role of distortionary policies and the regulatory environment in determining the size distribution of plants (and the extent of informality) has been extensively studied in Little et al. [1987], De Soto [1989], Loayza [1996], Djankov et al. [2002], Loayza et al. [2005], Loayza et al. [2009], Garicano et al. [2016], and Ulyssea [2018]. While size-dependent policies are potentially an important determinant of the size distribution, these policies are unlikely to explain all the differences in size distribution seen between developing and developed countries. Tybout [2000] notes that developing countries tend to have a large share of their population in small plants, irrespective of whether they have policies which discriminate against large plants or not. This finding suggests that these policies cannot be the only factor driving plant size.

Consistent with regulatory policies not fully explaining the puzzle of why there are so many small firms in developing countries, Gollin [1995] and Hsieh and Klenow [2014] conduct quantitative exercises in which they find that size-dependent policies leave a large part of the differences in size across countries unexplained. Moreover, Hsieh and Olken [2014] document that the “missing middle” in the size distribution in developing countries actually does not exist and that regulatory obstacles which become binding at particular threshold levels do not seem to lead to discontinuities in the size distribution in developing countries.4

Complementary to this argument, recent quantitative and theoretical work has emphasized that deeper structural factors, such as technological design, entrepreneurial skill distribution, and human capital, may also account for cross-country differences in firm size. Bento and Restuccia [2017, forthcoming] show that models featuring technology that favors small-scale production, or frictions that prevent firm growth, can rationalize observed variation in both the firm size distribution and aggregate productivity. Poschke [2018] builds on this insight by highlighting that countries with lower skill availability and more rigid entrepreneurial technologies naturally exhibit a thinner right tail of the firm size distribution, even in the absence of distortive policies.

Our paper suggests that a large part of the differences in size distribution that we see across countries and states is a natural consequence of the low levels of income in developing countries and is not necessarily caused by policies which discriminate against large productive plants in favor of small unproductive plants.
The hypothesis considered in the paper is closer to the dual-sector view of the informal sector in La Porta and Shleifer [2008], according to which the informal sector does not compete directly with the formal sector. For instance, Bloom and Van Reenen [2007] and Scur et al. [2024] document that smaller firms in developing countries tend to have significantly weaker management practices, and that this management gap is a key driver of lower productivity and scale.

4 There is also a quantitative literature which looks at the role of distortionary policies in explaining cross-country differences in Total Factor Productivity. See Guner et al. [2008], Alfaro et al. [2009], García-Santana and Pijoan-Mas [2010], Barseghyan and DiCecio [2011], Hsieh and Klenow [2014], and Restuccia and Rogerson [2013].

A growing body of research describes obstacles to firm growth in low-income countries such as managerial, technological, credit, and informational frictions. Akcigit et al. [2021] show that weak selection pressures and limited managerial delegation in developing economies hinder the growth and upgrading of high-potential firms. Bassi et al. [2023] highlight that many small firms operate informally and depend on self-employed workers with minimal access to capital or training, restricting their capacity to improve product quality. Human capital limitations further constrain innovation: Cox [2025] documents that deficits in tertiary education reduce firms’ ability to adopt new technologies and upgrade product standards. Similarly, Cirera et al. [2022] attribute lagging adoption of quality-enhancing technologies to poor managerial practices, limited digital infrastructure, and information constraints. Evidence from a field experiment by Atkin et al. [2017] shows that targeted support can enable quality upgrading among small Egyptian firms seeking to enter export markets. Taken together, these studies suggest that small firms in low-income contexts face a web of supply-side constraints and limited market incentives that give rise to a comparative advantage in the production of lower-quality goods.

Our paper’s results are similar to those of Lagakos [2016], who provides cross-country evidence for retail trade showing that poor countries exhibit lower TFP because they rationally choose technologies with low measured labor productivity due to high costs of transportation and low household wealth. We focus on the heterogeneity of quality levels being produced by plants of different sizes and how the demand for low quality falls with development.5

Some of the empirical results documented here have been studied in different contexts. Deaton and Dupriez [2011] and Dikhanov [2010] document that richer Indian households buy higher price goods. However, these papers focus on spatial differences in prices within India and not the price-income relationship itself and its implication for the size distribution.
Bils and Klenow [2001] show that richer households in the US also buy higher priced durable products.

Our paper is related to the finding in Kugler and Verhoogen [2012] that larger plants produce higher price goods and use higher price inputs in Colombia. Similar to our paper, they interpret these price differences as representing quality differences and develop a model in which more productive firms choose to produce higher quality goods at a higher unit cost. Our paper extends this result for India and, by combining data from the formal and informal sector, shows that the price-size relationship also holds when we include very small plants in the sample (the Colombian data only has plants of size ten or more).6

5 The idea of quality dualism between the formal and the informal sector has been looked at by Banerji and Jain [2007], who develop a partial equilibrium model in which formal sector establishments have a comparative advantage in producing higher quality goods due to differences in factor prices across the two sectors. However, their partial equilibrium model does not have implications for the size distribution of firms and its relationship to income levels.

A number of papers, especially related to international trade, have developed models of non-homothetic preferences with respect to quality. These include Flam and Helpman [1987], Mitra and Trindade [2005], Dalgin et al. [2008], and Choi et al. [2009]. The model we develop is most closely related to the model in Fajgelbaum et al. [2011]. Their model features non-homothetic preferences with respect to quality where the non-homotheticity arises due to complementarity between the homogeneous good and quality. The non-homotheticity with respect to quality in our model arises due to complementarity between the quantity of the good consumed and quality.

The rest of the paper is structured as follows: Section 2 documents that richer households buy higher price goods and that larger plants produce higher price goods and use higher price inputs. Section 3 presents the model and Section 4 discusses the calibration. Section 5 presents the results for the counterfactual exercises and explores the sensitivity of the results to some key parameters. Section D considers the role of inter-state trade in explaining the cross-state relationship seen in Figure 2 and Section 6 concludes.

2. Empirical Results

In this section, we provide empirical evidence which is consistent with our hypothesis of richer households consuming higher quality products which are produced by larger plants. In particular, we show the following facts:

1. Richer households buy higher price goods
2. On average, larger plants produce higher price goods
3. Larger plants use higher price material inputs and hire more skilled labor

6 There is a large international trade literature which documents heterogeneity in prices either at the product or the firm level for exports and imports and interprets these price differences as quality differences. Some papers in this literature include Schott [2004], Hummels and Klenow [2005], Hallak [2006], Mandel [2010], Manova and Zhang [2012], Iacovone and Javorcik [2012], and Hallak and Sivadasan [2013].

The facts are documented using four Indian surveys. We give a brief description of each survey along with the main results in the sections that follow.

2.1. Households: Richer Households Buy Higher Price Goods

This section shows that richer households buy higher price goods, which is consistent with them consuming higher quality products. We use data from the Consumer Expenditure Survey of 2004-05 conducted by the National Sample Survey Office (NSS) of India. About 125,000 households from all Indian states and union territories were interviewed for the survey. The survey asks households to report the value of consumption for 339 different goods. Households report quantities and rupee values separately for 209 goods, which can be used to compute prices for these goods. More details about the survey can be found in Appendix A.3.

We run regressions of the form

$$\ln\left(P_{h,g}\right) = \alpha_{g,\mathrm{state},\mathrm{rural}} + \beta \ln\left(c_h\right) + \varepsilon_{h,g},$$

where $P_{h,g}$ is the price paid by household $h$ for good $g$, $c_h$ is per-capita expenditure of the household excluding durables, and $\alpha_{g,\mathrm{state},\mathrm{rural}}$ represents fixed effects for each product, state, and urban-rural cell. $c_h$ is a proxy for the income level of the household, adjusting for household size.7 $\alpha_{g,\mathrm{state},\mathrm{rural}}$ controls for the fact that different goods have different average price levels and that these price levels can vary across rural and urban areas and across states. For example, real estate prices might differ across rural and urban areas or across states with different levels of per-capita income and this can drive differences in cost of living and all prices. The fixed effects ensure that the price-income relationship is not identified out of differences in average price levels across states of different income levels or across rural-urban areas. Intuitively, the coefficient $\beta$ is the elasticity of price with respect to per-capita consumption level and is identified out of variation in prices paid for the same good by households of different income levels within each state’s urban or rural sector.

7 Purchase of durables is excluded as these are lumpy, infrequent purchases. Two households with the same level of permanent income might have very different levels of durable expenditure in any particular year simply because of differences in timing of durable purchases.
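The log-log specification above can be sketched on synthetic data. The snippet below is a minimal illustration, not the authors' estimation code: it assumes the fixed effects are absorbed by demeaning both variables within each product-state-rural cell (the standard within estimator), and the function names, sample size, and data-generating values are all hypothetical. Because the specification is log-log, a household with $k$ times the per-capita expenditure of another is predicted to pay $k^{\beta}$ times the price, which the helper `implied_price_ratio` computes.

```python
import numpy as np


def within_elasticity(log_price, log_exp, cell_ids):
    """OLS slope of log price on log expenditure after demeaning both
    variables within each fixed-effect cell (within estimator).
    cell_ids is assumed to be a 1-D array of integer cell labels."""
    log_price = np.asarray(log_price, dtype=float)
    log_exp = np.asarray(log_exp, dtype=float)
    _, inverse = np.unique(np.asarray(cell_ids), return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)

    def demean(values):
        cell_means = np.bincount(inverse, weights=values) / counts
        return values - cell_means[inverse]

    y = demean(log_price)
    x = demean(log_exp)
    return float(x @ y / (x @ x))


def implied_price_ratio(beta, expenditure_ratio):
    """Price ratio implied by elasticity beta for a given expenditure ratio."""
    return expenditure_ratio ** beta


# Synthetic data (hypothetical): 50 cells with their own price levels.
rng = np.random.default_rng(0)
n_obs, n_cells, true_beta = 20_000, 50, 0.11
cells = rng.integers(0, n_cells, size=n_obs)
cell_effect = rng.normal(0.0, 1.0, size=n_cells)
# Expenditure is correlated with the cell effect, so pooled OLS would be biased.
log_c = rng.normal(6.0, 0.6, size=n_obs) + 0.3 * cell_effect[cells]
log_p = cell_effect[cells] + true_beta * log_c + rng.normal(0.0, 0.2, size=n_obs)

beta_hat = within_elasticity(log_p, log_c, cells)
print(beta_hat, implied_price_ratio(beta_hat, 7.0))
```

The sketch ignores survey weights and clustered standard errors, which a real application would need to handle.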

Table 1: Household Regressions: Richer Households Buy Higher Price Goods

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Dependent variable | log(price) | log(price) | log(price) | log(price) | log(price) |
| log(per-capita expenditure) | 0.17*** (0.00071) | 0.12*** (0.00059) | 0.11*** (0.00056) | | |
| log(per-capita expenditure): winsorized | | | | 0.11*** (0.00056) | |
| log(per-capita expenditure): exclude own-product | | | | | 0.11*** (0.00056) |
| Adjusted R2 | 0.967 | 0.969 | 0.976 | 0.976 | 0.976 |
| Price Ratio (75th to 25th percentile) | 1.14 | 1.1 | 1.09 | 1.09 | 1.09 |
| Price Ratio (95th to 5th percentile) | 1.39 | 1.25 | 1.23 | 1.24 | 1.23 |
| Winsor | Yes | Yes | Yes | Yes | Yes |
| Observations | 5348463 | 5348463 | 5348463 | 5348463 | 5348463 |
| Block FE | N/A | N/A | N/A | N/A | N/A |
| Product FE | Yes | Yes | N/A | N/A | N/A |
| State x Rural FE | No | Yes | N/A | N/A | N/A |
| State x Rural x Product FE | No | No | Yes | Yes | Yes |
| Number of Products | 188 | 188 | 188 | 188 | 188 |
| SE clusters | Household | Household | Household | Household | Household |
| Number of Clusters | 124635 | 124635 | 124635 | 124635 | 124635 |

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01

Notes: The data is from the Consumer Expenditure Survey of 2004-05. This table examines whether richer households purchase more expensive goods controlling for different definitions of products and inclusion of fixed effects. Column 1 only includes product fixed effects. Column 2 includes product fixed effects and state interacted with rural fixed effects. Column 3 includes state interacted with rural and product fixed effects. Column 4 winsorizes 1 percent tails of per-capita expenditure and goods prices. Column 5 excludes the expenditure on the good itself from the independent variable. The price ratios implied by the coefficient estimates for different percentiles of per-capita expenditure are reported in the rows called "Price Ratio". Standard errors are clustered at the household level.

The results of Table 1 suggest that richer households pay more for the same product than other households.
Columns 1-5 of Table 1 report the estimate of $\beta$, the elasticity of price with respect to per-capita consumption, with slightly different specifications based on 188 goods.8 The point estimate for $\beta$ varies between 11 and 17 percent, which implies that the average price paid by the 95th percentile household in terms of per-capita expenditure is 23 to 39 percent more than the price paid by the 5th percentile household (in contrast, the 95th percentile household’s per-capita expenditure is about seven times that of the 5th percentile household). Columns 1 through 3 use progressively tighter fixed effects and Column 4 shows that winsorizing 1 percent tails for per-capita expenditure and prices (for a good within a state and urban-rural cell) does not materially change the results. Column 5 uses the household’s expenditure after removing the household’s expenditure on that product.

8 Although prices can be computed for 209 goods, only 188 were included in the regression. The goods excluded were heavy durables and all goods with the word “other” mentioned in the description. The results do not change substantially if these goods are included.

A possible concern with the results in Table 1 is that the independent variable is itself a function of the dependent variable, as per-capita expenditure sums the expenditure of the household across all goods, i.e.,

$$c_h = \frac{\sum_g P_{h,g} Q_{h,g}}{\mathrm{household\ size}_h},$$

where $Q_{h,g}$ is the quantity consumed by household $h$ of good $g$. This can give rise to a mechanical correlation and also cause a bias if the variables are measured with error. Therefore, to overcome this possible bias, we regress the price paid on a set of education dummies and other controls. In Figure 3, from this regression, we plot the estimated coefficients (the circle) and the 95 percent confidence intervals (the line) for each of the education dummies, with “illiterate” being the omitted education category. The key inference is that households with more education, which is likely strongly correlated with income, pay more for the same good than other households. Moreover, this result strengthens at higher levels of education.

To provide further evidence that richer households pay more for the same product relative to other households, we compare households that differ in the number of dependents in the household. Specifically, we examine whether households with similar levels of expenditure but more dependents in the household spend less than other households. Our logic is that households with similar income but with more dependents will have less disposable income and therefore, in effect, will be poorer.
Therefore,wetestwhetherhouseholdswithmoredependentswillpayalowerpricerelative to other households. Our results in table (2) show that households with more dependents spend less per unit of product than other households after controlling for household income and othercontrols. One final robustness concern is that richer households pay higher prices for the same good because they have higher opportunity cost of time and consequently search less for lower prices. To provide evidence that rules out this concern, we exploit data on whether there are adults that not working in the household—the underlying assumption is that these households should have a 12

Figure 3: The relationship between education and the price paid for each product

This figure plots the estimated coefficients from the regression of log(price) on a set of education dummies and additional controls. These controls are the same as those used in table 1 column 4 and include the triple interaction of state, rural, and product fixed effects. The omitted education category is “illiterate.” The 95% confidence intervals are denoted by the lines. Standard errors are clustered at the household level.

low opportunity cost of time. We find that even after controlling for households with non-working adults, we still observe a strong relationship of similar magnitude between household income and prices paid. More details on these regressions and results are in Appendix (C).

Table 2: Households with more dependents pay a lower price than other households with similar income for the same product

                                      (1)           (2)
                                      log(price)    log(price)
log(household expenditure)            0.091***      0.079***
                                      (0.00053)     (0.00052)
Number of dependents                  -0.016***
                                      (0.00018)
Share of dependents in household                    -0.067***
                                                    (0.0012)
Adjusted R2                           0.976         0.976
Winsor                                Yes           Yes
Observations                          5348463       5348463
State x Rural x Product FE            Yes           Yes
Number of Products                    188           188
SE clusters                           Household     Household
Number of Clusters                    124635        124635

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
This table examines whether households with more dependents pay less than other households with similar income for the same product. We define a dependent as a person in the household who is younger than 16 or older than 70. In column 1 (2), we include the number (share) of dependents in the household as a regressor.

The results in this section have shown strong evidence that, on average, richer households pay more for the same product than other households, which is consistent with the hypothesis that they are consuming higher quality products. For robustness, we also investigate variation in the price elasticity relative to household income by product. Figure (4) plots the frequency histogram of the product-specific price elasticity to household expenditure, while including state interacted with rural fixed effects ($\alpha_{state,rural}$). Specifically, we run the following regression

$$\log(\text{price})_{h,p} = \alpha_{state,rural} + \beta_p \log(\text{per capita expenditure})_{h} + \varepsilon_{h,p} \qquad (1)$$

for each product $p$ and collect each $\beta_p$ (so 188 regressions, since there are 188 individual consumer products in our dataset). We then plot the frequency histogram for these $\beta_p$. These coefficients represent the estimated increase in the price paid for each product as household expenditure (which proxies household income) rises.

The key interpretation of figure (4) is that for the vast majority of products (more than 80 percent), the estimated elasticity is between 0 and 20 percent. This suggests that the result that households with higher income pay more for the same product is a general result and not driven by outlier products. Out of the 188 products in our regression, we find only 2 products where higher income households pay less on average for that product than other households.

Figure 4: Frequency Histogram of Product-Specific Price Elasticity to Expenditure

This figure plots the frequency distribution of the coefficient on “log(per capita expenditure)” when regressing “log(price)” on “log(per capita expenditure)” for each individual product while including rural interacted with state fixed effects. Specifically, we run the regression $\log(\text{price})_{h,p} = \beta_p \log(\text{per capita expenditure})_{h} + \alpha_{r,s} + \varepsilon_{h,p}$ for each product and collect each $\beta_p$ (so 188 regressions, since there are 188 individual consumer products in our dataset). We then plot the frequency histogram for these $\beta_p$.
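The product-by-product procedure behind equation (1) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the data arrive as tuples of (product, state-by-rural cell, log per capita expenditure, log price), and it absorbs the state-by-rural fixed effects by demeaning both variables within each cell before computing the slope. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def within_group_slope(x, y, groups):
    """OLS slope of y on x with group fixed effects, computed by
    demeaning x and y within each group. Raises ZeroDivisionError
    if x has no within-group variation."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # sum of x, sum of y, count
    for xi, yi, g in zip(x, y, groups):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    sxy = sxx = 0.0
    for xi, yi, g in zip(x, y, groups):
        sx, sy, n = sums[g]
        dx = xi - sx / n
        dy = yi - sy / n
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx

def product_elasticities(rows):
    """rows: iterable of (product, cell, log_expenditure, log_price).
    Returns {product: beta_p}, one regression per product."""
    by_product = defaultdict(lambda: ([], [], []))
    for product, cell, log_exp, log_price in rows:
        xs, ys, cells = by_product[product]
        xs.append(log_exp)
        ys.append(log_price)
        cells.append(cell)
    return {p: within_group_slope(xs, ys, cells)
            for p, (xs, ys, cells) in by_product.items()}

# Toy data (made up): two products, two state-by-rural cells each,
# with exact elasticities of 0.10 and -0.05 and cell-specific intercepts.
rows = []
for cell, intercept in (("A-rural", 1.0), ("A-urban", 1.5)):
    for log_exp in (5.0, 6.0, 7.0, 8.0):
        rows.append(("rice", cell, log_exp, intercept + 0.10 * log_exp))
        rows.append(("tea", cell, log_exp, intercept - 0.05 * log_exp))

betas = product_elasticities(rows)
share_positive = sum(b > 0 for b in betas.values()) / len(betas)
```

The collected `betas` are what a histogram such as figure (4) would summarize; `share_positive` is the analogue of counting how many products have higher income households paying more.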

2.2. Firms

Starting from Shaked and Sutton [1987], there is a long history of industrial organization papers arguing that markets with vertical product differentiation can exhibit firms of different scales due to the presence of fixed costs. Specifically, firms that produce higher quality goods will be larger and invest more in fixed costs. This section starts by showing strong evidence that larger firms produce higher quality goods, both by showing that larger firms charge a higher price for the same product and by using other, more direct measures of quality (such as international product certifications). Second, consistent with higher quality goods requiring more expensive inputs and investment, we show three pieces of evidence: larger firms have both a higher capital stock (per unit of output) and a higher capital investment flow (per unit of output); firms with a higher capital to labor ratio charge higher prices; and larger firms hire more educated workers.

To build intuition for why larger firms produce higher quality products, it is helpful to consider a single product. Specifically, consider firms that produce the product “finished cotton cloth.” As described in substantial detail in Appendix (B), this industry exhibits large dispersion in firm size and final prices, with a strong positive correlation between firm size and prices. Consistent with our theory, we find that the largest firm with public data in this industry uses expensive, high-end imported machinery and skilled labor, holds multiple product certifications, and sells at significantly higher prices than its competitors.

To show these results, we predominantly rely on combining data from the Annual Survey of Industries (ASI) of 2005-06 and the Survey of Unorganized Manufacturing (SUM) of 2005-06. The ASI covers all manufacturing plants registered under the Factories Act, 1948. This includes manufacturing plants employing twenty or more workers and not using electricity or employing ten or more workers and using electricity. The SUM, on the other hand, covers the smaller manufacturing plants not covered by the ASI. The two surveys together should provide a representative sample of the manufacturing sector as a whole.9

9 A number of recent papers have combined these two surveys to construct a dataset which is representative of the manufacturing sector as a whole. These include Hasan and Jandoc [2010], Nataraj [2011], Hsieh and Klenow [2014], and Ghani et al. [2012].

Both surveys ask manufacturing establishments detailed questions about the products they produce and the inputs they use. Each establishment reports the quantity of the product it produces (for a 5-digit product classification, which has about 5,500 possible products) and its value (before taxes and distribution expenses), which can be used to compute prices. For the ASI, each product's quantity is supposed to be reported in a standardized unit (kilograms, numbers, etc.). In the SUM, different plants can report the same product's price in different units. We concord units across the two surveys so that the price of the same product is not compared across different units.10

10 In the ASI, all plants reporting a certain product are supposed to report quantities in the same units. However, there are clear cases in which plants misreport quantity units. For example, all plants that produce milk are supposed to report quantities in kiloliters, which means that dividing the rupee value by the quantity should yield prices per kiloliter. However, there is a group of plants whose prices are approximately 1000 times lower than the others. This is clearly a case of some plants reporting quantities in liters instead of kiloliters. We have manually gone through all product categories, identified products with this problem, and split these into two separate categories based on a sensible price cutoff. In addition to this manual check, we have also implemented an algorithm to identify these problem products and used the algorithm-generated cutoffs to split problematic products. The results are similar to the ones reported here. Appendix F gives more details regarding this problem and how it is tackled.
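To make the unit-misreporting problem in footnote 10 concrete, here is a minimal sketch of one way a price cutoff could be found automatically. It is not the algorithm described in Appendix F, which is not reproduced here; the rule used below (look for the largest multiplicative gap between neighbouring sorted prices and split only if it exceeds a threshold) and the threshold `min_ratio` are assumptions chosen purely for illustration, as are the example prices.

```python
import math

def split_by_unit_gap(prices, min_ratio=100.0):
    """Return a price cutoff if the sorted positive prices contain a
    multiplicative gap of at least min_ratio between two neighbours,
    otherwise return None (no evidence of mixed units)."""
    ordered = sorted(p for p in prices if p > 0)
    best_ratio, cutoff = 1.0, None
    for lo, hi in zip(ordered, ordered[1:]):
        ratio = hi / lo
        if ratio > best_ratio:
            best_ratio = ratio
            cutoff = math.sqrt(lo * hi)  # geometric midpoint of the gap
    return cutoff if best_ratio >= min_ratio else None

# Hypothetical milk prices: three plants report per kiloliter,
# three appear to report per liter (roughly 1000 times lower).
milk_prices = [19000.0, 20500.0, 21000.0, 19.5, 20.0, 21.5]
cutoff = split_by_unit_gap(milk_prices)
per_liter = [p for p in milk_prices if p < cutoff]
per_kiloliter = [p for p in milk_prices if p >= cutoff]

# A product with ordinary price dispersion is left unsplit.
no_split = split_by_unit_gap([10.0, 11.0, 12.0])
```

Once a cutoff exists, the two groups can be treated as separate product categories, which is the correction the footnote describes.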

2.2.1. On average, larger plants produce higher quality goods

To show evidence that, on average, larger plants produce higher quality goods than other firms, we start by showing that larger plants charge higher prices relative to other firms. Second, consistent with Verhoogen [2008], we show that larger firms produce higher quality goods by using more direct measures of product quality (specifically, showing that larger firms are more likely to have international certifications and to be exporters than other firms producing the same good).

To show that, on average, larger plants produce higher quality goods than other firms, while controlling for the triple interaction of product, state, and urban fixed effects, we run regressions of the form:

$$\ln\left(P_{f,g}\right) = \alpha_g + \alpha_{state,rural} + \gamma \ln\left(L_f\right) + \varepsilon_{f,g},$$

where $P_{f,g}$ is the price charged by plant $f$ for product $g$, $L_f$ is the number of workers employed by plant $f$, $\alpha_g$ is a product fixed effect, and $\alpha_{state,rural}$ is a state times urban-rural fixed effect. Intuitively, the coefficient $\gamma$ is the elasticity of the price of output with respect to plant size. It is identified from variation in prices charged by plants of different sizes producing the same product (reported in the same units), allowing for differences in average price levels across states and urban and rural areas.11

11 Note that the definition of a product differs between the consumer and firm datasets; therefore, even though we include product fixed effects in both sets of regressions, they are fixed effects for a different set of products.

Column 1 of Table 3 reports results when the sample is restricted to the ASI only. The estimate for the elasticity of price with respect to size, $\gamma$, is 0.096 and is statistically significant at the 1 percent level. The point estimate implies that a plant which employs 500 people on average charges a price which is 55.6 percent more than a plant employing 5 workers.12

12 Note that the formal plants surveyed in the ASI report the value of output before taxes and distribution costs. Therefore, the price-size relation documented here is not driven mechanically by the fact that larger plants might be paying taxes while the smaller plants are not.

Column 2 reports results when the sample is restricted to the SUM only. The point estimate for the coefficient $\gamma$ (elasticity of price with respect to size) is still positive but smaller. This is not surprising, as the variation in employment levels within the SUM is small, with 95 percent of the plants employing 16 workers or fewer.

Column 3 reports results when the two surveys are combined. The estimate for the elasticity of price with respect to size implies that a plant which employs 500 people on average charges a price which is 62.9 percent more than a plant employing 5 workers.

Table 3: Plant Regressions: Larger Plants Produce Higher Price Goods

                               (1)                (2)                (3)
                               log(output price)  log(output price)  log(output price)
log(labor)                     0.10***            0.055***           0.11***
                               (0.010)            (0.018)            (0.014)
Adjusted R2                    0.883              0.919              0.883
Price Ratio (Size 50 to 5)     1.26               1.14               1.28
Price Ratio (Size 500 to 5)    1.60               1.29               1.64
Sample                         ASI                SUM                Both
Winsor                         Yes                Yes                Yes
Observations                   46704              28457              75161
State x Rural x Product FE     Yes                Yes                Yes
Number of Products             1218               2740               3182
SE clusters                    Product            Product            Product

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
The data is from the ASI and SUM for 2005-06. All columns report results for regressions of the log price charged by plants for their products on the log of the number of employees hired by the plant. Column 1 restricts the sample to the ASI, column 2 restricts the sample to the SUM, and column 3 combines the two. One percent tails of prices (within a product) and plant size are winsorized. Regressions include the triple interaction of product, state, and rural fixed effects. Standard errors are clustered at the product level. The price ratios for different sized plants implied by the coefficient estimates are reported in the rows called "Price Ratio".

Figure 5 plots the non-parametric equivalent of the regression in column 3 of Table 3. In particular, it estimates a kernel-smoothed local linear regression of residualized log prices (after removing product fixed effects and state times urban-rural fixed effects) on residualized log of plant size.13 Again, the non-parametric estimates suggest that the price-size relation across plants is close to log-linear.

13 Log price and log of employment of each plant are regressed on product and state times urban-rural fixed effects. The residuals from this procedure are used to run a kernel-smoothed local linear regression with an Epanechnikov kernel and a bandwidth of 0.502. The top and bottom 1 percent of residualized log of employment are excluded.

Figure 5: Non-parametric Estimate: Larger Plants Produce Higher Price Goods

[Plot: residualized log(Price) on the vertical axis against residualized log(L) on the horizontal axis.]

Notes: The data is from the ASI and the SUM of 2005-06. The graph plots the kernel-smoothed local linear regression of residualized log prices charged by a plant for its products on residualized log employment of that plant (removing product fixed effects and the interaction of state and urban-rural fixed effects). Products which have the units problem discussed in footnote 10 and in Appendix F are split into two product categories. One percent tails of residualized log employment are excluded. An Epanechnikov kernel with a bandwidth of 0.502 is used. The grey region is the 95 percent confidence interval for the non-parametric estimate.

The fact that larger plants produce goods which they sell at a higher price is consistent with the hypothesis that larger plants produce higher quality products.

To provide more direct evidence that larger firms produce higher quality products, we supplement our analysis by examining two more direct measures of quality: whether a firm has an international certification and whether it is an exporter. Both have been used as measures of quality in the literature (Verhoogen [2008, 2023]).14 To show this evidence, we use a more recent Indian manufacturing dataset (ASI 2009-2010) because this dataset contains information on International Organization for Standardization certification (commonly referred to as ISO certification), a data point that is not available in earlier surveys. Specifically, this survey asks whether the firm has ISO 14000 series certification, which focuses on environmental practices. Consistent with higher prices reflecting higher quality, table (4) column 1 (column 2) shows that firms with ISO certification (firms that are exporters) charge, on average, 12 percent (15 percent) more than other firms even after controlling for the triple interaction of state, rural, and product fixed effects.

14 Many other papers have argued theoretically and empirically that firms that export produce higher quality products, such as Kugler and Verhoogen [2012], Manova and Zhang [2012], Iacovone and Javorcik [2012], and Hallak and Sivadasan [2013].

Table 4: Firms with ISO certification or firms that export charge higher prices for their goods

                               (1)           (2)
                               log(price)    log(price)
ISO certified                  0.12***
                               (0.038)
Exporter                                     0.15***
                                             (0.045)
Observations                   42507         42549
Adjusted R2                    0.764         0.764
State x Rural x Product FE     Yes           Yes
SE clusters                    Product       Product

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
This table examines whether firms with ISO certification or firms that export produce higher price goods. Column 1 (2) regresses the natural logarithm of the price charged on an ISO certification dummy (export status dummy) while including the triple interaction of state, rural, and product fixed effects. Since ISO certification is not available in the 2005-2006 ASI survey, we create this table using the 2009-2010 ASI survey. This survey includes a question on whether the firm has the ISO 14000 series certification, which focuses on environmental practices.

To supplement this evidence, we also show that larger firms are more likely to be ISO certified and to be exporters (table 5). Specifically, this table shows that firms with more employees are more likely to be ISO certified and to be exporters than other firms that produce the same product, even after controlling for the triple interaction of state, rural, and product fixed effects.

Table 5: Larger plants are more likely to be ISO certified and be an exporter

                               (1)              (2)
                               ISO certified    Exporter
log(Labor)                     0.065***         0.045***
                               (0.0023)         (0.0017)
Observations                   93895            93984
Adjusted R2                    0.214            0.229
State x Rural x Product FE     Yes              Yes
SE clusters                    Product          Product

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
This table examines whether larger plants are more likely to be ISO certified and to be exporters. Column 1 (2) regresses an ISO certification dummy (export status dummy) on firm size, proxied by the natural logarithm of the number of employees, while including the triple interaction of state, rural, and product fixed effects. Since ISO certification is not available in the 2005-2006 ASI survey, we create this table using the 2009-2010 ASI survey. This survey includes a question on whether the firm has the ISO 14000 series certification, which focuses on environmental practices.
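The percentage price differences quoted alongside the size elasticities in this subsection follow directly from the log-log specification: if $\ln P = \gamma \ln L + \dots$, then two plants of sizes $L_1$ and $L_0$ differ in price by the factor $(L_1/L_0)^{\gamma}$. A minimal sketch of that arithmetic is below; the function name is hypothetical, and the only number taken from the paper is the ASI-only elasticity of 0.096 quoted in the text.

```python
def implied_price_ratio(gamma, size_large, size_small):
    """Price ratio between two plants implied by a log-log size
    elasticity gamma: (size_large / size_small) ** gamma."""
    return (size_large / size_small) ** gamma

# ASI-only elasticity quoted in the text for output prices.
ratio_500_vs_5 = implied_price_ratio(0.096, 500, 5)   # about 1.556
percent_more = 100 * (ratio_500_vs_5 - 1)             # about 55.6 percent
```

The same function can be applied to any other elasticity and pair of plant sizes, which is how rows such as "Price Ratio (Size 50 to 5)" can be read.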

2.2.2. Larger firms invest more and use higher price inputs

To provide more evidence that larger firms produce higher quality outputs, we examine their investment and input use. First, we show that larger plants pay a higher price for the same material input as compared to smaller plants. This is consistent with the idea that larger plants produce higher quality products which require higher quality inputs. We then show that larger plants hire more educated workers as compared to small plants.

Table 6: Plant Regressions: Larger Plants Use Higher Price Inputs

                               (1)                (2)                (3)
                               Log(input price)   Log(input price)   Log(input price)
Log(labor)                     0.065***           0.042**            0.053***
                               (0.0065)           (0.017)            (0.010)
Adjusted R2                    0.893              0.929              0.902
Price Ratio (Size 50 to 5)     1.16               1.1                1.13
Price Ratio (Size 500 to 5)    1.35               1.21               1.28
Sample                         ASI                SUM                Both
Winsor                         Yes                Yes                Yes
Observations                   107325             105422             212747
State x Rural x Product FE     Yes                Yes                Yes
Number of Products             1218               2740               3182
SE clusters                    Product            Product            Product

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
The data is from the ASI and SUM for 2005-06. All columns report results for regressions of the log of the price paid by establishments for material inputs on the log of the number of employees hired by the establishment. Column 1 restricts the sample to the ASI only. Column 2 restricts the sample to the SUM only, while column 3 combines the ASI and the SUM. One percent tails of prices (within a product) and plant size are winsorized. Regressions include the triple interaction of product, state, and rural fixed effects. Standard errors are clustered at the product level. The price ratios for different sized plants implied by the coefficient estimates are reported in the rows called "Price Ratio".

We use the ASI and SUM to show that larger plants use higher price material inputs. Each establishment reports the material inputs it uses (for a 5-digit product classification, which has about 5,500 possible products) and the price it pays for each input. The units between the surveys are again concorded.15 We run a regression of the form

$$\ln\left(P_{f,i}\right) = \alpha_i + \alpha_{state,rural} + \gamma \ln\left(L_f\right) + \varepsilon_{f,i},$$

where $P_{f,i}$ is the price paid by plant $f$ for input $i$, $L_f$ is the number of workers employed by plant $f$, $\alpha_i$ is a product fixed effect, and $\alpha_{state,rural}$ is a state times urban-rural fixed effect. Intuitively, the coefficient $\gamma$ is the elasticity of the price paid for inputs with respect to plant size. It is identified from variation in prices paid by plants of different sizes for the same inputs (reported in the same units), controlling for differences in average prices across states and urban-rural sectors.

15 The same problem of unit misreporting in the ASI discussed in footnote 10 is also present for inputs. We perform the same correction for this problem as we did in the previous section. The data appendix provides more details.

Column 1 of Table 6 reports results when the sample is restricted to the ASI only. The estimate for the elasticity of input prices with respect to plant size, $\gamma$, is 0.077 and is statistically significant at the 1 percent level. The point estimate implies that a plant which employs 500 people on average pays prices for inputs which are 42.6 percent more than a plant employing 5 workers. Column 2 reports results when the sample is restricted to the SUM only. The coefficient $\gamma$ is positive but smaller.

Column 3 reports results when the two surveys are combined. When combining the two surveys, the estimate for the elasticity of input prices with respect to size implies that a plant which employs 500 people on average pays a price for inputs which is 25.9 percent more than a plant employing 5 workers.

Not only do larger plants use higher price inputs, they also use more capital and invest more. Specifically, they have both a higher capital stock to output ratio and a higher capital investment flow to output ratio.

Figure (6) shows the strong relationship between firm size and firm capital using a binned scatterplot. A binned scatterplot is a convenient way of visualizing relationships when working with large datasets. Specifically, we plot the residuals for the natural logarithm of employees in a firm (x-axis) and for the natural logarithm of the amount of capital in the firm relative to the firm's output (y-axis) after controlling for the triple interaction of product, state, and rural fixed effects. The key inference from this figure is that larger firms use relatively more capital, even after controlling for the fact that larger firms produce more output. To reinforce this result, in table (7), we show that this result also holds for the level of firm investment (column 3) and different functional forms (columns 1 and 2). In all these regressions we include the triple interaction of state, rural, and product fixed effects. The results in columns 2 and 4 in table (7) suggest that a firm that employs 10 percent more people would have a capital stock to output ratio that is roughly 3 percent higher and a capital investment to output ratio that is roughly 4 percent higher.

Figure 6: Relationship between firm size and capital

This figure shows the binned scatterplot and line of best fit for the relationship between the size of the firm and the amount of capital used in the firm. Specifically, we plot the residuals for the natural logarithm of employees in a firm (x-axis) and for the natural logarithm of the amount of capital in the firm relative to the firm's output (y-axis) after controlling for the triple interaction of product, state, and rural fixed effects. To be precise, a binned scatterplot is a non-parametric method of plotting the conditional expectation function (which describes the average y-value for each x-value).

Table 7: Relationship between firm size, capital, and investment

                               (1)              (2)                   (3)                 (4)
                               Capital/Output   log(Capital/Output)   Investment/Output   log(Investment/Output)
log(labor)                     322.1***         0.28***               83.6***             0.38***
                               (65.5)           (0.028)               (15.6)              (0.032)
Adjusted R2                    0.576            0.701                 0.547               0.655
Winsor                         Yes              No                    Yes                 No
Observations                   23155            23155                 23155               21604
State x Rural x Product FE     Yes              Yes                   Yes                 Yes
SE clusters                    Product          Product               Product             Product
Number of Clusters             945              945                   945                 915
Sample                         ASI              ASI                   ASI                 ASI

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
This table examines whether larger plants (measured by number of employees) have a larger stock of capital and a higher level of investment relative to final output. Column 1 (2) regresses the stock of total capital (natural logarithm of total capital) relative to final output on the natural logarithm of employees. Column 3 (4) regresses total investment (natural logarithm of total investment) relative to final output on the natural logarithm of employees. To prevent our results being distorted by outliers, the ratios in columns 1 and 3 have been winsorized at the 5 percent level. Each regression includes the triple interaction of state, rural, and input product fixed effects. We only include firms that report capital levels. We cluster our standard errors for each product.

Consistent with large firms making higher quality goods, we find that firms with a higher capital to labor ratio charge higher prices for their goods and use more expensive inputs. Specifically, in figure (7) we plot the residuals for the natural logarithm of the capital to labor ratio (x-axis) and for the natural logarithm of the product price (y-axis) after controlling for the triple interaction of product, state, and rural fixed effects. The key inference from this figure is that firms that use relatively more capital per worker charge higher prices for their product than other firms.

Figure 7: Relationship between capital and prices

This figure shows the binned scatterplot and line of best fit for the relationship between firm capital and the price of a product. Specifically, we plot the residuals for the natural logarithm of the total capital to employee ratio (x-axis) and for the natural logarithm of the price charged (y-axis) after controlling for the triple interaction of product, state, and rural fixed effects. To be precise, a binned scatterplot is a non-parametric method of plotting the conditional expectation function (which describes the average y-value for each x-value).

To further explore why firms with more capital charge higher prices, we examine different forms of capital. In table (8), we regress product prices on the stock of machinery capital and total capital relative to labor (columns 1 and 3) and on the level of machinery investment and total investment relative to labor (columns 2 and 4). Across these regressions we find that firms with a higher capital stock and higher capital investment relative to labor charge higher prices. The only regression coefficient that is not statistically significant is machinery investment to labor, but even that coefficient is close to significant at the 10 percent level and shows results consistent with the other regressions.16

16 We also find that firms that have a higher capital to labor ratio also use more expensive inputs. Specifically, in table (16) in Appendix (H), we run regressions analogous to those in table (8) but use a firm's input prices rather than output prices. The results in table (16) are consistent with firms that use more capital requiring higher quality inputs. Therefore, this result contradicts the potential concern that firms with more capital are using this capital to reduce their input costs rather than producing higher quality outputs.

Table 8: Plants with more capital charge higher prices

                                        (1)          (2)          (3)          (4)
                                        log(price)   log(price)   log(price)   log(price)
Machinery Capital/Labor ratio           0.024***
                                        (0.0063)
Machinery Investment/Labor ratio                     0.031
                                                     (0.024)
Total Capital/Labor ratio                                         0.018***
                                                                  (0.0057)
Total Capital Investment/Labor ratio                                           0.027*
                                                                               (0.014)
Adjusted R2                             0.878        0.878        0.881        0.881
Winsor                                  Yes          Yes          Yes          Yes
Observations                            45303        45303        46143        46143
State x Rural x Product FE              Yes          Yes          Yes          Yes
Number of Products                      1218         1218         1218         1218
SE clusters                             Product      Product      Product      Product
Number of Clusters                      1178         1178         1178         1178
Sample                                  ASI          ASI          ASI          ASI

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
This table examines whether plants with a higher capital to labor ratio charge higher prices. Column 1 (3) regresses a product's price on the stock of machinery capital (total capital) relative to labor. Column 2 (4) regresses a product's price on machinery investment (total investment) relative to labor. To prevent our results being distorted by outliers, the prices are winsorized at the 1 percent level. Each regression includes the triple interaction of state, rural, and input product fixed effects. We cluster our standard errors for each input product.

Larger firms also employ more skilled labor. To show this, we use the Employment-Unemployment Survey of 2004-05 conducted by the National Sample Survey Office (NSS) of India.17 The Employment-Unemployment Survey records demographic information (including education levels) for about 600,000 individuals. It also asks individuals to report the size of the establishment in which they work, with five permissible values: less than six workers, between six and nine workers, between ten and nineteen workers, twenty or more workers, and unknown size. Table 9 reports the skill composition of workers for the different size categories. Out of the workers in establishments with fewer than six workers, 43 percent have never attended school while only 3 percent have graduated from high school. On the other hand, out of workers in establishments with more than 20 workers, only 23 percent have never attended school while 22 percent have graduated from high school. As can be seen, a larger share of workers in big establishments have high levels of education.

17 We use the NSS because plants in the ASI and SUM do not report the education level of their workers.

Table 9: Larger Plants Hire More Educated Workers

                 No School    Grade 1 to 9    Grade 10 to 12    > Grade 12
L <= 5           0.43         0.41            0.13              0.03
5 < L <= 10      0.34         0.41            0.17              0.08
10 < L <= 20     0.33         0.41            0.16              0.10
L > 20           0.23         0.32            0.22              0.22

Notes: The data is from the Employment-Unemployment Survey of 2004-05. The rows of the table represent the size category of the establishment in which an individual works while the columns represent the education level. Each number represents the share of individuals in the given size category who have attained the level of education given by the column.
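The binned scatterplots used in figures (6) and (7) can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' procedure: it takes already-residualized x and y values as given, sorts observations by x, cuts them into equal-count bins, and reports the mean of x and y in each bin, which approximates the conditional expectation function the figure notes describe. The function name, the number of bins, and the toy data are all hypothetical.

```python
def binned_means(x, y, n_bins=20):
    """Sort observations by x, split them into n_bins equal-count bins,
    and return a list of (mean x, mean y) pairs, one per non-empty bin."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than observations
        mean_x = sum(p[0] for p in chunk) / len(chunk)
        mean_y = sum(p[1] for p in chunk) / len(chunk)
        points.append((mean_x, mean_y))
    return points

# Toy residualized data (made up) with an exact slope of 0.28.
resid_log_labor = [i / 10 - 5.0 for i in range(100)]
resid_log_capital_output = [0.28 * v for v in resid_log_labor]
points = binned_means(resid_log_labor, resid_log_capital_output, n_bins=5)
```

Plotting `points` together with a fitted line gives the kind of picture shown in the figures; with real data the bin means would scatter around the line rather than sit on it.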

2.2.3. Substantial variation in the elasticity of product price to firm size

This section shows there is substantial variation in the elasticity of product price to firm size. Specifically, even though larger firms charge higher prices on average, there are many products with the opposite relationship.

To examine variation in the price elasticity relative to firm size, we start by plotting the frequency histogram of the product-specific price elasticity to firm size (proxied by number of employees), while including state interacted with rural fixed effects ($\alpha_{state,rural}$). Specifically, we run the following regression

$$\ln\left(P_{f,g}\right) = \alpha_{state,rural} + \gamma_g \ln\left(L_f\right) + \varepsilon_{f,g},$$

for each product and collect each $\gamma_g$ (so 1217 regressions, since there are 1217 individual products in the ASI dataset), where $f$ and $g$ refer to firm and good, respectively. We then plot the frequency histogram for these $\gamma_g$. These coefficients represent the estimated increase in the price charged for each good as firm size increases. Figure (8) shows two key results. First, there is substantial variation across products. Second, even though most products exhibit a positive elasticity (nearly two-thirds of products) and the median good has a positive elasticity of nearly 10 percent, there remains a sizable minority of products with a negative elasticity. In effect, this figure corroborates the intuition that some industries may be characterized by smaller firms producing higher quality outputs (for example, tailored suits compared to mass-market men's formal wear), but, on average, we find that larger firms produce higher quality goods.

Figure 8: Frequency Histogram of Product-Specific Price Elasticity to Firm Size

Using the ASI dataset, this figure plots the frequency distribution of the coefficient on “log(number of employees)” when regressing “log(price)” on “log(number of employees)” for each individual product while including rural interacted with state fixed effects. Specifically, we run the regression $\log(\text{price})_{f,g} = \gamma_g \log(\text{number of employees})_{f} + \alpha_{r,s} + \varepsilon_{f,g}$ for each product and collect each $\gamma_g$ (so 1217 regressions, since there are 1217 individual products in our ASI dataset), where the subscripts $f$, $g$, $r$, and $s$ refer to firm, good, rural, and state, respectively. We then plot the frequency histogram for these $\gamma_g$. To ensure the figure is not distorted by outliers, we omit the extreme 2 percent of the distribution.

To complement this analysis, we can examine variation in prices charged within product and across similar products. Specifically, we can estimate the differences in prices using a less narrow definition of a "product." Table (10) examines the effect of using less narrow fixed effects for estimating the price elasticity, and therefore examines whether larger firms produce slightly different and more expensive goods. Using the ASI dataset, for convenience, column 1 repeats the regression from Table (3), which uses the triple interaction of product, state, and rural fixed effects. In columns 2 and 3, we use increasingly broader product categories. Specifically, column 2 (3) interacts the state x rural fixed effect with a fixed effect for the 4-digit (2-digit) NIC code. Moreover, since the units for different products within the same 4-digit NIC code may differ (for example, tons versus kilograms), we also interact the NIC code with a fixed effect for the unit of measurement. The key result from Table (10) is that using a broader product definition yields a slightly larger estimated effect. That is, on average, larger firms are more likely to produce similar, but slightly different, more expensive products.

Table 10: Larger plants produce higher price goods: Robustness to broader product categories

                                           (1)                (2)                (3)
                                           log(output price)  log(output price)  log(output price)
log(labor)                                 0.10***            0.16***            0.13***
                                           (0.010)            (0.015)            (0.031)
Adjusted R2                                0.883              0.698              0.634
Observations                               46704              46704              46704
Sample                                     ASI                ASI                ASI
State x Rural x Product FE                 Yes                No                 No
State x Rural x 4-digit NIC x Unit FE      N/A                Yes                No
State x Rural x 2-digit NIC x Unit FE      N/A                N/A                Yes
SE clusters                                Product            Product            Product

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01
This table examines whether larger plants produce higher price goods when controlling for product categories of different breadth. Column 1 repeats the main regression from the paper. Columns 2 and 3 use increasingly broader categories than column 1. Specifically, column 2 (3) interacts the State x Rural fixed effect with a fixed effect for the 4-digit (2-digit) NIC code. Moreover, since the units for different products within the same 4-digit NIC code may differ (for example, tonnes versus kilograms), we also interact the NIC code with a fixed effect for the unit of measurement.

3. Model

This section develops a general equilibrium model which matches the facts described in Section 2. In particular, we model consumers' choice between different quality levels, with richer households more likely to buy high quality goods. On the production side, we assume that production of better quality requires larger fixed costs, which along with free entry implies that high quality producers are larger on average.

3.1. Households

There is a mass $L$ of households in the economy indexed by the subscript $j$. Share $h$ of the households are skilled and earn wage $w_S$ (determined endogenously in equilibrium) while share $1-h$ are unskilled and earn wage $w_U$. The unskilled wage $w_U$ is assumed to be the numeraire and is normalized to 1.18

18 Having two skill levels with different wages is crucial for our exercise as it generates cross-sectional differences in income levels in the model. This cross-sectional variation in income levels allows us to calibrate the extent of non-homotheticity in the model to match the price-income slope documented in Section 2.1.

There are $N$ quality levels. $Q = \{q_1, q_2, \dots, q_N\}$ denotes the set of qualities available in the economy. The quality indexes, $q_n$, are arranged in ascending order of quality. Therefore $q_1$ is the lowest quality level and $q_N$ is the highest quality level.

The utility derived by household $j$ from consuming quality level $q_n$ is given by

$$u_{j,q_n}\left(c_{j,q_n}, \varepsilon_{j,q_n}\right) = a_{q_n} + q_n \log\left(c_{j,q_n}\right) + \varepsilon_{j,q_n} \quad \forall q_n \in Q, \qquad (2)$$

where $a_{q_n}$ is a constant in the utility function which can vary by quality level, $c_{j,q_n}$ is the quantity consumed of quality level $q_n$ by household $j$, and $\varepsilon_{j,q_n}$ is a random utility component which represents the idiosyncratic valuation of quality level $q_n$ by household $j$. The fact that higher quality levels have higher indexes, $q_n$, ensures that for a given level of quantity consumed, households get more utility from consuming higher quality goods.

The random utility component $\varepsilon_{j,q_n}$ is assumed to be independently and identically distributed with a Gumbel Type 1 Extreme Value distribution with density

$$f\left(\varepsilon_{j,q_n}\right) = e^{-\varepsilon_{j,q_n}} e^{-e^{-\varepsilon_{j,q_n}}}.$$

As shown by McFadden [1974] (see also Chapter 3 of Train [2009]), assuming a Gumbel distribution for the random utility component implies simple closed form expressions for demands.

We assume that a household can choose to consume only one quality level and hence will spend its entire income on that quality level. Therefore, if household $j$ chooses to consume quality level $q_n$, the indirect utility function can be written as:

$$v_{j,q_n}\left(w_j, P_{q_n}, \varepsilon_{j,q_n}\right) = a_{q_n} + q_n \log\left(\frac{w_j}{P_{q_n}}\right) + \varepsilon_{j,q_n} \quad \forall q_n \in Q, \qquad (3)$$

where we have substituted the household's wage ($w_j$) divided by the price of quality level $q_n$ ($P_{q_n}$) for the household's consumption $c_{j,q_n}$ in equation (2).
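The discrete choice described by equations (2) and (3) can be simulated directly, which may help build intuition before turning to closed forms. The sketch below is illustrative only: the parameter values (two quality levels with indexes 1 and 2, constants equal to zero, prices equal to one) are assumptions, not the paper's calibration, and the function names are hypothetical. Each simulated household draws one Gumbel shock per quality level via the inverse CDF, $\varepsilon = -\log(-\log U)$, and picks the quality with the highest indirect utility.

```python
import math
import random

def gumbel_draw(rng):
    """Standard Gumbel (Type 1 extreme value) draw by inverse CDF."""
    u = rng.random()
    while u <= 0.0:  # random() can return 0.0; log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_top_quality_share(wage, qualities, constants, prices,
                                n_households=20000, seed=0):
    """Share of simulated households whose utility-maximizing choice
    is the highest quality level, using equation (3) for utility."""
    rng = random.Random(seed)
    top = len(qualities) - 1
    chose_top = 0
    for _ in range(n_households):
        best_n, best_v = -1, -math.inf
        for n, (q, a, p) in enumerate(zip(qualities, constants, prices)):
            v = a + q * math.log(wage / p) + gumbel_draw(rng)
            if v > best_v:
                best_n, best_v = n, v
        chose_top += best_n == top
    return chose_top / n_households

# Assumed illustrative parameters: q = (1, 2), a = (0, 0), P = (1, 1).
share_low_wage = simulated_top_quality_share(1.0, [1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
share_high_wage = simulated_top_quality_share(4.0, [1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
```

With these assumed parameters, McFadden's logit result implies shares of 0.5 at a wage of 1 and 0.8 at a wage of 4, so the simulated shares should land close to those values: richer households choose the higher quality more often even though preferences are identical up to the random shock.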

Each household $j$ receives draws of the random utility component $\varepsilon_{j,q_n}$ for each quality level $q_n$ and, given these draws, chooses to consume the quality level which gives it the highest utility. Therefore, household $j$ chooses to consume quality level $q_n$ if and only if

$$v_{j,q_n}\left(w_j,P_{q_n},\varepsilon_{j,q_n}\right)>v_{j,q_m}\left(w_j,P_{q_m},\varepsilon_{j,q_m}\right)\quad\forall m\neq n.$$

Let $\rho(q_n|w)$ be the share of households with wage $w$ who choose to consume quality level $q_n$. Given the assumption that $\varepsilon_{j,q_n}$ is independently and identically distributed with a Gumbel distribution, this share takes the simple logit form

$$\rho(q_n|w)=\frac{e^{a_{q_n}+q_n\log\left(\frac{w}{P_{q_n}}\right)}}{\sum_{i=1}^{N}e^{a_{q_i}+q_i\log\left(\frac{w}{P_{q_i}}\right)}}=\frac{e^{a_{q_n}}\left(\frac{w}{P_{q_n}}\right)^{q_n}}{\sum_{i=1}^{N}e^{a_{q_i}}\left(\frac{w}{P_{q_i}}\right)^{q_i}}\quad\forall q_n\in Q. \tag{4}$$

Analyzing how $\rho(q_n|w)$ changes as the wage changes helps to understand how this preference structure leads to non-homotheticity with respect to quality choice. Define $\gamma_{\rho(q_n),w}$ to be the elasticity of $\rho(q_n|w)$ with respect to wages $w$. Taking logs and differentiating equation (4) with respect to $\log(w)$ yields

$$\gamma_{\rho(q_n),w}=\frac{\partial\log\left[\rho(q_n|w)\right]}{\partial\log(w)}=q_n-\sum_{i=1}^{N}q_i\,\rho(q_i|w).$$

The elasticity of $\rho(q_n|w)$ with respect to wages $w$ is simply the quality index $q_n$ minus a weighted average of all the quality indexes, where the weights are the shares of households with wage $w$ who buy each quality level. A positive elasticity $\left(q_n>\sum_{i=1}^{N}q_i\,\rho(q_i|w)\right)$ implies that as wages increase, a larger share of the households buy quality $q_n$. As lower quality goods have a lower quality index ($q_n>q_m\ \forall n>m$), the lowest quality level will always have a negative elasticity, that is, the share of households who buy the lowest quality level will always go down as wages increase.

In our model, the non-homotheticity with respect to quality operates on the extensive margin. As a household becomes richer, it is more likely to choose the higher quality goods. There is a

positively sloped “quality Engel curve” where households with higher wages will, on average, spend a larger share of their expenditure on higher quality goods. This arises because the utility function in equation (2) features complementarity between quantity consumed and quality. As wages increase, the household can consume a larger quantity of whichever quality level it chooses. Complementarity between quantity and quality implies that the marginal increase in utility from a given increase in wage is larger for higher quality goods, which leads to more households choosing higher quality levels as wages increase (given the draw of $\varepsilon_{j,q_n}$).

Figure 9: Quality Engel Curve
[Figure: share of households choosing the high quality, $\rho(q_2|w)$ (vertical axis), plotted against the wage (horizontal axis, from 1 to 5) for $\Delta=0$, $\Delta=0.5$ and $\Delta=1$.]
Notes: The figure plots the share of households who purchase the high quality product for different wage levels. There are only two quality levels ($N=2$), with price $P_{q_1}=1$. The quality index for the low quality is set to one, that is, $q_1=1$. The three lines correspond to three different values of $\Delta$, where $q_2=1+\Delta$. $a_{q_2}$, the constant for the high quality, is chosen such that 30 percent of households with wage equal to one choose the high quality.

The steepness of the quality Engel curve is determined by the differences in the quality indexes across quality levels. One way of parameterizing the quality indexes is to set the index for the lowest quality level to one and assume that each higher quality level has an index which is a constant $\Delta$ larger than the previous quality index.^19 In this case, the size of the constant $\Delta$ determines the extent of non-homotheticity, with a larger $\Delta$ implying that demand shifts to higher quality faster as wages increase.

19 For example, $q_1=1$ and $q_n=q_{n-1}+\Delta$.
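The logit shares in equation (4) and the wage elasticity derived from them can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the three-quality parameter values are hypothetical and chosen only for illustration. It compares the closed-form elasticity $q_n-\sum_i q_i\rho(q_i|w)$ with a finite-difference derivative of $\log\rho$ with respect to $\log w$.

```python
import math


def quality_shares(wage, a, q, prices):
    """Logit shares rho(q_n | w) as in equation (4)."""
    # Systematic utility a_n + q_n * log(w / P_n) for each quality level.
    v = [a_n + q_n * math.log(wage / p_n) for a_n, q_n, p_n in zip(a, q, prices)]
    v_max = max(v)  # subtract the maximum for numerical stability
    weights = [math.exp(v_n - v_max) for v_n in v]
    total = sum(weights)
    return [w_n / total for w_n in weights]


def wage_elasticities(wage, a, q, prices):
    """Closed form: q_n minus the share-weighted average quality index."""
    shares = quality_shares(wage, a, q, prices)
    mean_q = sum(q_n * s_n for q_n, s_n in zip(q, shares))
    return [q_n - mean_q for q_n in q]


# Hypothetical three-quality example; every number below is illustrative.
a = [0.0, -0.2, -0.5]
q = [1.0, 1.5, 2.0]
prices = [1.0, 1.2, 1.5]
wage = 1.6

shares = quality_shares(wage, a, q, prices)
elasticities = wage_elasticities(wage, a, q, prices)

# Finite-difference approximation of d log(rho) / d log(w).
step = 1e-6
bumped = quality_shares(wage * math.exp(step), a, q, prices)
numeric = [(math.log(b) - math.log(s)) / step for b, s in zip(bumped, shares)]
```

In this sketch the lowest quality has a negative elasticity and the highest a positive one, in line with the discussion above.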

Consider the following simple example, which illustrates this relation between the size of $\Delta$ and the extent of the non-homotheticity. Assume that there are only two quality levels ($N=2$), which have prices $P_{q_1}=1$ and $P_{q_2}=1.5$ and quality indexes $q_1=1$ and $q_2=1+\Delta$.^20 Figure 9 plots the share of households who choose the high quality level $q_2$ as a function of wages for different values of $\Delta$.^21 For each value of $\Delta$, the constant in the utility function $a_{q_2}$ is chosen such that 30 percent of the households with wage equal to one choose the high quality $q_2$.^22

For the case with $\Delta=0$ (blue line in Figure 9), there is no change in the share of households who buy the high quality as the wage increases. This is expected, as $\Delta=0$ ensures there is no quality distinction between the goods. For positive values of $\Delta$, there is an increase in the share of households who buy the high quality good as wages increase, and this increase is larger for higher values of $\Delta$ (compare the green line with the red line).

Given prices and the wages of skilled and unskilled workers, the total demand for quality level $q_n$ is given by

$$C_{q_n}=\underbrace{L\,h\,\rho(q_n|w_S)\,\frac{w_S}{P_{q_n}}}_{\text{demand from skilled households}}+\underbrace{L\,(1-h)\,\rho(q_n|w_U)\,\frac{w_U}{P_{q_n}}}_{\text{demand from unskilled households}}\quad\forall q_n\in Q. \tag{5}$$

The first term is the demand for quality $q_n$ from skilled households, which is the product of the number of skilled households ($Lh$, with $L$ the mass of households defined above), the share of skilled households who choose quality $q_n$ ($\rho(q_n|w_S)$), and the quantity consumed by each skilled household who consumes quality $q_n$ $\left(\frac{w_S}{P_{q_n}}\right)$. Similarly, the second term is the demand for quality $q_n$ from unskilled households.

In summary, consumers choose between different quality levels, and complementarity between quality and quantity implies that richer households are more likely to consume higher quality. This non-homotheticity with respect to quality will help match the patterns seen in Table 1 (that richer households buy higher price goods).
20 In the full calibration done in Section 4, there is a richer quality space with $N=12$. Here, to illustrate the non-homotheticity, the simplifying assumption of $N=2$ is made.
21 The results in Figure 9 can be viewed as the choice made by an individual if they faced a continuous wage profile. However, only two wages will exist in equilibrium (the unskilled and the skilled wages).
22 Only $N-1$ constants in the utility function are identified, as what matters for consumer choice is the difference in utility across quality levels. Therefore, for the case with $N=2$, only one constant needs to be calibrated.
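The two-quality example behind Figure 9 can be reproduced in a few lines. This is a minimal sketch under the assumptions stated in the text ($P_{q_1}=1$, $P_{q_2}=1.5$, $q_1=1$, $q_2=1+\Delta$, and a 30 percent high-quality share at a wage of one); the function name is hypothetical, and setting $a_{q_1}=0$ is an arbitrary choice that is harmless because only the difference between the two constants is identified.

```python
import math


def high_quality_share(wage, delta, p1=1.0, p2=1.5, share_at_one=0.30):
    """Share of households choosing q_2 in the two-quality example."""
    q1, q2 = 1.0, 1.0 + delta
    a1 = 0.0  # assumption: only a2 - a1 is identified, so fix a1 at zero
    # Choose a2 so that the high-quality share equals share_at_one when wage = 1.
    a2 = (a1 + math.log(share_at_one / (1.0 - share_at_one))
          + q2 * math.log(p2) - q1 * math.log(p1))
    v1 = a1 + q1 * math.log(wage / p1)
    v2 = a2 + q2 * math.log(wage / p2)
    # Two-alternative logit share from equation (4).
    return 1.0 / (1.0 + math.exp(v1 - v2))


wages = [1.0, 2.0, 3.0, 4.0, 5.0]
engel_curves = {
    delta: [high_quality_share(w, delta) for w in wages]
    for delta in (0.0, 0.5, 1.0)
}
```

With $\Delta=0$ the share stays at 30 percent at every wage, while for $\Delta>0$ it rises with the wage and does so faster for larger $\Delta$, which is the pattern described for Figure 9.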

3.2. Final Goods Producers

There are $N$ competitive final goods producers, one for each quality level. In addition to the vertical differentiation across quality levels, there is horizontal differentiation in products within a quality level. The final goods producer of quality $q_n$ combines intermediate varieties (horizontal differentiation) of quality $q_n$ to produce the composite final good of that quality. Each final goods producer has a constant elasticity of substitution (CES) production function given by

$$Y^s_{q_n}=\frac{1}{M_{q_n}^{\frac{1}{\sigma-1}}}\left(\sum_{i=1}^{M_{q_n}}x_{i,q_n}^{\frac{\sigma-1}{\sigma}}\right)^{\frac{\sigma}{\sigma-1}},\quad\forall q_n\in Q$$

where $i$ indexes varieties, $M_{q_n}$ is the number of varieties (or plants) of quality $q_n$ present in the economy, which will be determined by free entry, $x_{i,q_n}$ is the quantity of variety $i$ of quality $q_n$ used by the final quality producer of quality $q_n$,^23 and $\sigma$ is the elasticity of substitution between different varieties of the same quality.

The multiplicative factor $1/M_{q_n}^{\frac{1}{\sigma-1}}$ in the production function scales out the love of variety from the CES production function. This ensures that the price difference between quality levels does not reflect differences in the number of varieties available. We maintain this assumption of no love of variety in the baseline specification for two reasons. First, assuming no love of variety is the conservative choice, as changes in the size distribution in the counterfactual exercises are smaller in this case than in the case with love of variety.
Second, allowing for love of variety makes the changes in size distribution in the counterfactual sensitive to the average level of the quality indexes $q_n$, which is a difficult parameter to calibrate as it represents the own price elasticity of each quality level with respect to the unobserved CES price index of that quality.^24 Therefore, while the baseline results presented in Section 5.1 maintain the assumption of no love of variety, Section 5.3 provides results when allowing for love of variety and further discusses the sensitivity of the results to the average level of the quality indexes $q_n$.

The final quality producers take the prices of intermediate varieties, $p_{i,q_n}$, as given and solve their cost minimization problem

$$\min_{x_{i,q_n}}\sum_{i}p_{i,q_n}x_{i,q_n}\quad\text{s.t.}\quad Y^s_{q_n}=\frac{1}{M_{q_n}^{\frac{1}{\sigma-1}}}\left(\sum_{i=1}^{M_{q_n}}x_{i,q_n}^{\frac{\sigma-1}{\sigma}}\right)^{\frac{\sigma}{\sigma-1}},\quad\forall q_n\in Q.$$

This yields their demand curves

$$x_{i,q_n}=p_{i,q_n}^{-\sigma}\,M_{q_n}^{\frac{1}{\sigma-1}}\,Y^s_{q_n}\left(\sum_{i=1}^{M_{q_n}}p_{i,q_n}^{1-\sigma}\right)^{\frac{\sigma}{1-\sigma}}\quad\forall q_n\in Q, \tag{6}$$

which are taken as given by the intermediate producers. The final quality producers make zero profits. The price that they charge consumers is given by

$$P_{q_n}=\frac{\sum_{i=1}^{M_{q_n}}p_{i,q_n}x_{i,q_n}}{Y^s_{q_n}},\quad\forall q_n\in Q.$$

Given the assumption of no love of variety, $P_{q_n}$ will be independent of the number of varieties $M_{q_n}$ available in the economy.

23 Note that the pair $(i,q_n)$ together identifies a variety uniquely in the economy. $i$ represents the horizontal differentiation dimension while $q_n$ represents the vertical differentiation dimension. For example, $(i=1,q_1)$ represents the first variety of the lowest quality $q_1$ while $(i=1,q_N)$ represents the first variety of the highest quality.
24 As mentioned in Section 3.1, we parametrize the quality indexes using the recursion $q_n=q_{n-1}+\Delta$, where the size of $\Delta$ determines the steepness of the quality Engel curves. With no love of variety, the choice of the level of $q_1$ (which given a $\Delta$ determines the average level of the quality indexes) does not impact the changes in size distribution in the counterfactual. However, when allowing for love of variety, the results become sensitive to the choice of $q_1$.

3.3. Intermediate Goods Producers

Each variety of each quality is produced by a monopolistically competitive intermediate producer. The intermediate producers combine skilled and unskilled labor and their production function is given by

$$x(A_i,q_n)=A_{i,q_n}\left(\theta_{q_n}\left(l^U_{i,q_n}\right)^{\frac{\sigma_{su}-1}{\sigma_{su}}}+\left(1-\theta_{q_n}\right)\left(l^S_{i,q_n}\right)^{\frac{\sigma_{su}-1}{\sigma_{su}}}\right)^{\frac{\sigma_{su}}{\sigma_{su}-1}}, \tag{7}$$

where $l^U_{i,q_n}$ is the quantity of unskilled labor hired by the variety $i$ producer of quality $q_n$, $l^S_{i,q_n}$ is the quantity of skilled labor hired by the variety $i$ producer of quality $q_n$, $\sigma_{su}$ is the elasticity of substitution

between the two types of labor, $A_{i,q_n}$ is the idiosyncratic productivity level of the variety $i$ producer of quality $q_n$, and $\theta_{q_n}$ is the share parameter of unskilled labor for quality $q_n$ producers.

Solving the cost minimization problem of the intermediate goods producer subject to the production function given in equation (7) yields the marginal cost of production for variety $i$ of quality $q_n$, which is given by

$$\kappa(A_i,q_n)=\frac{1}{A_{i,q_n}\left(\theta_{q_n}^{\sigma_{su}}\left(\frac{1}{w_U}\right)^{\sigma_{su}-1}+\left(1-\theta_{q_n}\right)^{\sigma_{su}}\left(\frac{1}{w_S}\right)^{\sigma_{su}-1}\right)^{\frac{1}{\sigma_{su}-1}}}.$$

The marginal cost is a function of the skilled and unskilled wages, and is inversely proportional to the productivity level $A_{i,q_n}$.

Intermediate quality producers take the demand curve of final quality producers (equation 6) as given and maximize profits. As the demand curve of final quality producers is of the constant elasticity form, the optimal price charged by intermediate producers is a constant markup over marginal cost and is given by

$$p(A_i,q_n)=\frac{\sigma}{\sigma-1}\,\kappa(A_i,q_n). \tag{8}$$

Starting an intermediate goods plant of quality $q_n$ requires $f_{q_n}$ units of labor. Share $\alpha_{q_n}$ of the entry labor needs to be skilled, and this share is different for different quality levels. On paying the fixed cost $f_{q_n}$, entrants receive a productivity draw from a lognormal distribution given by

$$\log\left(A_{i,q_n}\right)\sim g_{q_n}\sim N\left(\mu_{q_n},\nu^2\right).$$

Note that the mean of the log of the productivity draw can differ across quality levels but the variance is the same.

Free entry requires that the fixed cost paid must equal the ex-ante expected profit, i.e.

$$\alpha_{q_n}f_{q_n}w_S+\left(1-\alpha_{q_n}\right)f_{q_n}w_U=\int\pi(A_i,q_n)\,g_{q_n}(A_i)\,dA_i\quad\forall q_n\in Q \tag{9}$$

where $\pi(A_i,q_n)$ is the flow profit earned by an intermediate quality producer of quality $q_n$ with productivity draw $A_i$ and is given by

$$\pi(A_i,q_n)=\left[p(A_i,q_n)-\kappa(A_i,q_n)\right]x(A_i,q_n).$$

The number of varieties $M_{q_n}$ will adjust to ensure that the free entry condition holds for all quality levels.

If fixed costs for higher quality levels are larger than for lower quality levels, then for the free entry condition to hold, the scale of production $x(A_i,q_n)$ will have to be larger for higher quality producers. Furthermore, if $\theta_{q_n}<\theta_{q_m}\ \forall n>m$ (recall that $\theta_{q_n}$ is the share parameter of unskilled labor), then higher quality producers will use skilled labor more intensively and will have a higher cost of production. Finally, differences in $\mu_{q_n}$ will also translate into differences in prices between different quality levels, as marginal costs and prices are inversely proportional to productivity.

One simplifying assumption in our model is that output prices are fully determined by production costs and do not serve, in equilibrium, as signals of quality. This outcome arises because we assume perfectly competitive final goods markets with zero economic profits, and monopolistically competitive intermediate producers with constant markups. As a result, prices and average firm size reflect the cost of intermediate inputs and fixed costs. Therefore, higher quality firms charge higher prices and are larger on average only because the technology for higher quality production requires higher fixed and variable costs. While this abstraction allows us to isolate the supply-side forces driving the quality-firm size relationship, it omits a potential demand-side mechanism. We acknowledge that in many real-world settings, firms producing higher quality goods may also enjoy pricing power. Nevertheless, the model delivers a positive relationship between price and quality via firm input choices, which aligns with the empirical patterns we observe: higher-priced firms invest more, use more capital, and employ more skilled labor.
We interpret this as indirect evidence of quality, while recognizing the model's limitations in capturing price as a signal of consumer valuation.
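The producer-side formulas in Sections 3.2 and 3.3 can be sanity-checked numerically. The sketch below is illustrative rather than the authors' implementation: the function names and the share parameter `THETA = 0.6` are hypothetical, while $\sigma=5$, $\sigma_{su}=1.75$ and $w_S=1.6$ are the values used later in the calibration. It evaluates the marginal cost, the markup price in equation (8) and the CES demands in equation (6), and then checks that with symmetric variety prices the final good price does not depend on the number of varieties.

```python
def marginal_cost(A, theta, w_u, w_s, sigma_su):
    """Marginal cost of an intermediate producer (Section 3.3)."""
    inner = (theta ** sigma_su * (1.0 / w_u) ** (sigma_su - 1.0)
             + (1.0 - theta) ** sigma_su * (1.0 / w_s) ** (sigma_su - 1.0))
    return 1.0 / (A * inner ** (1.0 / (sigma_su - 1.0)))


def markup_price(A, theta, w_u, w_s, sigma_su, sigma):
    """Equation (8): constant markup sigma / (sigma - 1) over marginal cost."""
    return sigma / (sigma - 1.0) * marginal_cost(A, theta, w_u, w_s, sigma_su)


def ces_demands(prices, output, sigma):
    """Equation (6): demand for each variety given variety prices and final output."""
    m = len(prices)
    price_sum = sum(p ** (1.0 - sigma) for p in prices)
    scale = m ** (1.0 / (sigma - 1.0)) * output * price_sum ** (sigma / (1.0 - sigma))
    return [p ** (-sigma) * scale for p in prices]


def ces_output(quantities, sigma):
    """CES aggregator with the love-of-variety term scaled out (Section 3.2)."""
    m = len(quantities)
    agg = sum(x ** ((sigma - 1.0) / sigma) for x in quantities)
    return agg ** (sigma / (sigma - 1.0)) / m ** (1.0 / (sigma - 1.0))


def final_good_price(prices, output, sigma):
    """Zero-profit price of the final good: total input cost per unit of output."""
    demands = ces_demands(prices, output, sigma)
    return sum(p * x for p, x in zip(prices, demands)) / output


SIGMA, SIGMA_SU = 5.0, 1.75
W_U, W_S = 1.0, 1.6
THETA = 0.6  # hypothetical share parameter, for illustration only

kappa = marginal_cost(1.0, THETA, W_U, W_S, SIGMA_SU)
price = markup_price(1.0, THETA, W_U, W_S, SIGMA_SU, SIGMA)

# With identical variety prices, the final good price equals the variety
# price whether there are 10 or 1000 varieties (no love of variety).
price_few = final_good_price([price] * 10, 1.0, SIGMA)
price_many = final_good_price([price] * 1000, 1.0, SIGMA)
```

The same functions can be used to confirm that the demands in equation (6) deliver the required output when plugged back into the production function, including when variety prices differ.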

3.4. Equilibrium

The equilibrium in this economy is a set of prices $\left(w_S,\left\{\left\{p_{i,q_n}\right\}_{i\in M_{q_n}},P_{q_n}\right\}_{q_n\in Q}\right)$, allocations $\left\{\left\{c_{j,q_n}\right\}_{j\in L},C_{q_n},\left\{x_{i,q_n}\right\}_{i\in M_{q_n}},Y_{q_n}\right\}_{q_n\in Q}$, and mass of entrants $M_{q_n}$ such that

• Given prices $\left(P_{q_n}\right)$, wages, and draws of the random utility component $\varepsilon_{j,q_n}$, consumers choose their optimal quality level (equations 4 and 5 hold)

• Given prices, final quality producers demand optimal amounts of intermediate goods (demand follows equation 6)

• Intermediate good producers maximize profits (charge the constant markup price given by equation 8)

• Free entry conditions hold for all quality levels (equation 9)

• Markets clear

$$Y_{q_n}=C_{q_n}\quad\forall q_n\in Q$$

$$L(1-h)=\sum_{q_n}M_{q_n}\int l^U(A_i,q_n)\,g_{q_n}(A_i)\,dA_i+\sum_{q_n}M_{q_n}\left(1-\alpha_{q_n}\right)f_{q_n} \tag{10}$$

$$Lh=\sum_{q_n}M_{q_n}\int l^S(A_i,q_n)\,g_{q_n}(A_i)\,dA_i+\sum_{q_n}M_{q_n}\alpha_{q_n}f_{q_n} \tag{11}$$

Equations (10) and (11) are the labor market clearing conditions. Equation (10) requires that the demand for unskilled labor for production by the intermediate producers (summing over all quality levels) and for entry requirements must equal the supply of unskilled labor. Similarly, equation (11) requires that the demand for skilled labor from intermediate producers and entry requirements must equal the supply of skilled workers.

4. Calibration

This section calibrates the model to match the cross-sectional facts documented in Section 2 and some additional moments taken from the Indian data. The key parameters in the counterfactual exercises that determine the change in the size distribution are the degree of non-homotheticity ($\Delta$) for consumers and the price-size relationship for producers. These parameters are calibrated independently of the aggregate relationship between the share of employment in small plants and income levels seen across Indian states (which is what we want to explain in the counterfactual). In particular, we use the micro-facts documented in Section 2 (richer households buy higher priced goods and larger plants produce higher priced goods) to discipline these parameters of the model.

4.1. Production Parameters

For the calibration, we define an individual with less than ten years of education as unskilled. $h$, the share of the labor force which is skilled, is set to 0.24, which is the share of manufacturing workers with at least ten years of education in India in 2004-05. $\sigma_{su}$, the elasticity of substitution between skilled and unskilled workers in the intermediate goods production function (equation 7), is assumed to be 1.75, which is in the range of estimates for developing countries in Behar [2009].

The elasticity of substitution between varieties for the final goods producer, $\sigma$, is set to 5, which implies a markup over cost of 25 percent for the intermediate producers and is in the range of estimates in Broda and Weinstein [2006].

This leaves five sets of parameters to be calibrated on the production side: (1) $f_{q_n}$, the fixed cost for each quality level; (2) $\theta_{q_n}$, the share of unskilled workers in the production function for each quality level; (3) $\mu_{q_n}$, the mean of the log of the productivity draw for each quality level; (4) $\alpha_{q_n}$, the share of skilled labor needed for entry for each quality level; and (5) $\nu^2$, the variance of the productivity draw, which is common across all quality levels.
These parameters (along with the utility parameters) are jointly calibrated, as there is no one-to-one mapping between the parameters and the target moments. However, for expositional purposes, we explain the calibration of each parameter in terms of the moments which are most informative about the parameter.
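As a quick check of the markup quoted above, the constant markup in equation (8) can be evaluated at the calibrated value $\sigma=5$. This is only worked arithmetic on numbers already stated in the text:

```latex
\frac{p(A_i,q_n)}{\kappa(A_i,q_n)}
  = \frac{\sigma}{\sigma-1}
  = \frac{5}{5-1}
  = 1.25 ,
```

that is, price exceeds marginal cost by 25 percent.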

Table 11: Unskilled to Skilled Ratio for Different Size Categories

| Size category | U/S Ratio | Ratio Relative to Smallest |
|---|---|---|
| L<=5 | 5.05 | 1.00 |
| 5<L<=20 | 2.92 | 0.58 |
| L>20 | 1.25 | 0.25 |

Notes: The data is from the Employment-Unemployment Survey of 2004-05. The rows of the table represent the size category of the establishment in which an individual works. The first column gives the ratio of unskilled to skilled workers in each size category, where skilled is defined as an individual with at least ten years of education. The second column gives the ratio of unskilled to skilled workers relative to the smallest size category.

The number of quality levels $N$ is set to 12.^25

The fixed costs, $f_{q_n}$, determine the average scale of operation of the intermediate producers of each quality level. A larger fixed cost means that the average size (in terms of output and employment) of intermediate producers needs to be larger in order for the free entry condition to hold. As shown in Section 2.2.1, larger plants tend to produce higher price products, which is indicative of higher quality goods being produced in larger plants. Therefore, the fixed costs are chosen such that the average employment (skilled plus unskilled workers) in intermediate producers of the lowest quality level is 1.25 workers and each higher quality level has double the average size of the previous quality level, that is, the average employment of the intermediate producers of the different quality levels is $size_{q_n}=\{1.25,\ 2.5,\ 5,\ldots,2560\}$.^26

The levels of the $\theta_{q_n}$'s determine the demand for unskilled labor relative to skilled labor and are informative about the wage premium, $w_S$, in the economy. The ratio of unskilled to skilled workers in any quality level relative to the lowest quality is also a function of the $\theta_{q_n}$'s and is given by

$$ratio^{U,S}_{q_n}=\left(\frac{L^U_{q_n}}{L^S_{q_n}}\right)\Big/\left(\frac{L^U_{q_1}}{L^S_{q_1}}\right)=\left(\frac{\theta_{q_n}}{1-\theta_{q_n}}\right)^{\sigma_{su}}\Big/\left(\frac{\theta_{q_1}}{1-\theta_{q_1}}\right)^{\sigma_{su}}\quad\forall q_n\in Q. \tag{12}$$

25 The results discussed in Section 5 are not very sensitive to the choice of $N$.
For example, if we instead choose $N$ to be 6, and choose all the other parameters in the same way as described below, then the model explains 45 percent instead of 43 percent of the differences in share of employment in small plants in rich versus poor states (the baseline results discussed in Section 5.1).
26 Different intermediate producers of the same quality will have different levels of employment due to heterogeneity in the productivity draw. Within the same quality level, intermediate producers with higher productivity draws will be larger compared to those with lower productivity draws. The fixed costs are chosen such that the average employment level of the producers within a quality level matches the target $size_{q_n}=\{1.25,\ 2.5,\ 5,\ldots,2560\}$.
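The size targets, the relative ratios in Table 11 and the mapping in equation (12) involve only simple arithmetic, sketched below. The helper names and the value `theta_1 = 0.7` are hypothetical and used only for illustration; the paper's extrapolation of the ratios to eleven size levels is not reproduced here because the text does not spell it out.

```python
SIGMA_SU = 1.75  # elasticity of substitution between skilled and unskilled labor
N_LEVELS = 12    # number of quality levels

# Average employment targets: 1.25 workers at the lowest quality,
# doubling at each higher quality level.
size_targets = [1.25 * 2 ** n for n in range(N_LEVELS)]

# Table 11: unskilled-to-skilled ratios by establishment size category.
us_ratio = {"L<=5": 5.05, "5<L<=20": 2.92, "L>20": 1.25}
relative = {k: v / us_ratio["L<=5"] for k, v in us_ratio.items()}


def relative_ratio(theta_n, theta_1, sigma_su=SIGMA_SU):
    """Equation (12): U/S ratio at quality n relative to the lowest quality."""
    odds_n = theta_n / (1.0 - theta_n)
    odds_1 = theta_1 / (1.0 - theta_1)
    return (odds_n / odds_1) ** sigma_su


def theta_from_relative_ratio(theta_1, target, sigma_su=SIGMA_SU):
    """Invert equation (12): theta_n that delivers a target relative ratio."""
    odds_n = theta_1 / (1.0 - theta_1) * target ** (1.0 / sigma_su)
    return odds_n / (1.0 + odds_n)


theta_1 = 0.7  # hypothetical value for the lowest quality level
theta_mid = theta_from_relative_ratio(theta_1, relative["5<L<=20"])
```

A target ratio below one maps into a lower $\theta_{q_n}$ than $\theta_{q_1}$, so qualities produced in larger plants are assigned a more skill intensive technology.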

Therefore, the twelve $\theta_{q_n}$'s are chosen to match a target for the wage premium and eleven targets for the unskilled to skilled ratio in different quality levels relative to the lowest quality level.

The targets for these moments are obtained from the Employment-Unemployment Survey conducted by the NSS in 2004-05 (see Section 2.2.2 and Appendix A.4 for details about the dataset). The target for the wage premium is set at 1.6, and is obtained from running Mincerian regressions on data from the Employment-Unemployment Survey.^27 Table 11 gives the ratio of unskilled to skilled workers for three different size categories, along with the ratio relative to the smallest size category, as computed from the Employment-Unemployment Survey. Smaller plants have a much higher ratio of unskilled to skilled workers, indicating that low quality producers have higher $\theta_{q_n}$'s. Unfortunately, the size categories reported in the Employment-Unemployment Survey are very coarse, and therefore cannot be used to compute eleven ratios for equation (12) for eleven different quality (size) levels. We use the first two data points reported in Table 11 for the unskilled to skilled ratio (column 1) and extrapolate the relationship to larger sizes (with a minimum of 0.5) to compute eleven ratios, one for each quality (size) level.

$\mu_{q_n}$, the mean of the log of the productivity draw for each quality level, is informative about the average price of each quality level, as $p(A_i,q_n)\propto\frac{1}{A_i}$. If the mean of the productivity draw for a particular quality is high, then the average price of that quality level will be lower. Therefore, the $\mu_{q_n}$ for each quality level is chosen to match the price-size relationship seen in Table 3.^28

The share of skilled labor needed for entry for each quality level, $\alpha_{q_n}$, is chosen to match the share of skilled labor used in the production of that quality.
Therefore, high quality producers use a more skill intensive production process (lower $\theta_{q_n}$) and also have a more skill intensive entry requirement.^29

Table 12: Calibration

| Param. | Description | Targets |
|---|---|---|
| $f_{q_n}$ | Fixed costs | $size_{q_n}=\{1.25,\ 2.5,\ 5,\ldots,2560\}$ |
| $\mu_{q_n}$ | Mean of productivity draws | Price-size slope of 0.1 |
| $\theta_{q_n}$ | Sh of U in production | $w_S=1.6$; and $L^U_{q_n}/L^S_{q_n}$ across qualities |
| $\alpha_{q_n}$ | Sh of skilled in entry | $L^S_{q_n}/\left(L^S_{q_n}+L^U_{q_n}\right)$ in production |
| $\nu^2$ | Variance of productivity draw | Std dev of employment = 0.64 |
| $q_n$ ($\Delta$) | Utility from quality | Price-income slope of 0.1 |
| $a_{q_n}$ | Constant in utility function | Size distribution |

Finally, $\nu^2$, the variance of the log of the productivity draw (common across qualities), is chosen to match the standard deviation of the log of employment in the combined ASI and SUM dataset, which was 0.64.

27 We run a regression of log wages on a dummy for whether the individual is skilled (at least ten years of education) for all manufacturing workers, controlling for potential experience, sex, state, industry, occupation, and whether the individual is residing in a rural or urban area. Individuals with ten or more years of education on average make 56.8 percent more than workers with less than ten years of education, which is rounded up to a wage premium of 1.6. Appendix G reports more details and the regression results.
28 In particular, the $\mu_{q_n}$'s are chosen to match a price-size slope of 0.1. Note that plants of each higher quality level are calibrated to be two times the size of the previous quality level. Therefore, the $\mu_{q_n}$'s are chosen such that each higher quality level charges a log price which is $0.1\times\log(2)$ higher than the previous quality level's log price.
29 The ratio of unskilled to skilled labor used by plants of quality $q_n$ is given by $\frac{l^U_{i,q_n}}{l^S_{i,q_n}}=\left(\frac{w_S}{w_U}\frac{\theta_{q_n}}{1-\theta_{q_n}}\right)^{\sigma_{su}}$ and is independent of the productivity draw of the plant. Therefore, the share of skilled workers used in production of quality $q_n$ is simply $\frac{1}{1+\left(\frac{w_S}{w_U}\frac{\theta_{q_n}}{1-\theta_{q_n}}\right)^{\sigma_{su}}}$.

4.2. Utility Parameters

The utility function in the model takes the form

$$u_{j,q_n}\left(c_{j,q_n},\varepsilon_{j,q_n}\right)=a_{q_n}+q_n\log\left(c_{j,q_n}\right)+\varepsilon_{j,q_n}\quad\forall q_n\in Q. \tag{13}$$

Two sets of parameters need to be calibrated: (1) $q_n$, the quality indexes; and (2) $a_{q_n}$, the quality specific constant in the utility function.

As mentioned in Section 3.1, the quality indexes are parametrized as follows: $q_1=1$ and $q_n=q_{n-1}+\Delta$.^30 The value of $\Delta$ determines the steepness of the quality Engel curve, that is, how quickly demand moves to higher quality as income levels increase. In the model, skilled workers earn wage $w_S$ (which is calibrated to be 1.6) and unskilled workers earn wage $w_U$ (which is normalized to one as the numeraire). $\Delta$ is chosen to match the price-income relationship documented in Table 1 of Section 2.1. In particular, $\Delta$ is chosen such that the price-income elasticity in the model is 0.1, that is, the average log price paid by skilled households is $0.1\times\log\left(\frac{w_S}{w_U}\right)$ more than for unskilled households. As higher quality producers in the model have higher prices, this in effect determines the extent to which demand shifts towards high quality as we move from unskilled wages to skilled wages.

The quality specific constant in the utility function, $a_{q_n}$, determines the absolute levels of demand for different quality levels, i.e. it determines $\rho(q_n|w)$ given in equation (4). A higher $a_{q_n}$ for a specific quality means that a larger share of households are likely to buy that quality (irrespective of income level). Therefore, we choose $a_{q_n}$ such that the size distribution in the model matches the size distribution for India as a whole in 2005-06.

In summary, the $a_{q_n}$'s pin down the absolute level of demand for the different qualities and are calibrated to match the size distribution in the model to the Indian data. $\Delta$ determines the differences in demand for high versus low quality levels between skilled and unskilled workers and is calibrated to match the price-income elasticity seen in the data.

Table 12 summarizes the calibration. Figure 10 plots the share of workers in plants of different size categories for the calibrated model and the data (combining the ASI and the SUM for 2005-06). As the model parameters were chosen to match the size distribution, it is not surprising to see that the size distribution in the model matches the data very closely. However, the model was not calibrated to match the change in size distribution as income levels change. The extent to which the size distribution changes in the model as income levels change depends crucially on the degree of non-homotheticity ($\Delta$) on the consumer side and the price-size relation on the producer side, and these parameters were calibrated using micro-data from consumer and producer surveys.

30 Setting $q_1$ to be one is not a normalization in the model. However, for the baseline specification with no love of variety, the results are not sensitive to the choice of $q_1$. This issue is discussed further in Section 5.3.

5. Results

This section first conducts counterfactual exercises that simulate differences in per-capita income levels and examines how this affects the size distribution. Second, we explore the sensitivity of the results to some important parameters.
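Before turning to the counterfactuals, the calibration target for $\Delta$ described in Section 4.2 can be illustrated with a small partial-equilibrium sketch. It is not the paper's joint calibration: prices are held fixed with log prices rising by $0.1\times\log(2)$ per quality level, the utility constants $a_{q_n}$ are all set to zero purely for illustration, and the function names are hypothetical. Under those assumptions, a bisection search finds the $\Delta$ at which skilled households pay an average log price that is $0.1\times\log(w_S/w_U)$ higher than unskilled households.

```python
import math

N_LEVELS = 12
W_U, W_S = 1.0, 1.6
PRICE_STEP = 0.1 * math.log(2.0)      # log price gap between adjacent qualities
TARGET = 0.1 * math.log(W_S / W_U)    # target gap in average log price paid

LOG_PRICES = [PRICE_STEP * n for n in range(N_LEVELS)]
A = [0.0] * N_LEVELS  # assumption: equal utility constants, illustration only


def shares(wage, delta):
    """Logit shares from equation (4) with q_n = 1 + delta * (n - 1)."""
    q = [1.0 + delta * n for n in range(N_LEVELS)]
    v = [a_n + q_n * (math.log(wage) - lp)
         for a_n, q_n, lp in zip(A, q, LOG_PRICES)]
    v_max = max(v)
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [x / total for x in weights]


def mean_log_price(wage, delta):
    """Average log price paid by households earning the given wage."""
    return sum(s * lp for s, lp in zip(shares(wage, delta), LOG_PRICES))


def price_gap(delta):
    """Skilled minus unskilled average log price."""
    return mean_log_price(W_S, delta) - mean_log_price(W_U, delta)


def calibrate_delta(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on delta; requires the target to be bracketed by [lo, hi]."""
    if not price_gap(lo) < TARGET < price_gap(hi):
        raise ValueError("target gap is not bracketed by the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if price_gap(mid) < TARGET:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


delta_star = calibrate_delta()
```

The resulting `delta_star` depends on the illustrative assumptions above and should not be read as the paper's calibrated value; the sketch only shows how the price-income target pins down $\Delta$ once prices and the utility constants are given.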

Figure 10: Size Distribution - Data vs Model
[Figure: share of employment (vertical axis) by plant size category (<=5, 5 to 9, 10 to 19, 20 to 49, 50 to 99, 100 to 249, 250 to 499, 500 to 999, >1000), shown for the data and the model.]
Notes: The figure plots the share of employment in different size categories in the data and in the calibrated baseline of the model. The data is for the manufacturing sector in India for 2005-06. It combines the ASI and the SUM (same as Figure 1).

5.1. Cross-section of Indian States How much of the cross-state differences in the size distribution seen in the data can be explained bythemodelifper-capitaincomeinthemodelvariesbythesameamountasitvariesacrossIndian states? To answer this question, we conduct counterfactual exercises in which we vary three sets ofparametersinthemodelwhilekeepingalltheotherparametersunchanged: 1. The share of the households in the model who are skilled, h, is varied in the counterfactual exercises to match the share of workers with ten or more years of education across rich and poor states. About 13 percent of the manufacturing workers in the poorest states are skilled ascomparedto43percentinthericheststates. 2. Theshareparameterofunskilledlaborforintermediateproducers,θ ,ischangedacrossthe qn counterfactuals to keep the wage premia unchanged.31 This can be viewed as skill biased technical change with richer states having a higher supply of skilled labor and also using skilledlabormoreintensivelyintheproductionofallqualitylevels.32 3. The mean of the productivity draw of intermediate producers, µ , is changed to match qn the differences in per-capita income across states and to maintain the price-size slope of 0.1 across the counterfactuals.33 Per-capita income of the poorest Indian state (Bihar) is 31If we do not change θ , then the wage premia falls in the counterfactual for the richer states due to the higher qn supplyofskilledworkers. However,inthedata,wagepremiadoesnotvarysystematicallyacrossstates. Inparticular, if we run a Mincerian regression of log of wages on a dummy which takes value 1 if the person is skilled and also includetheinteractionofthedummywithper-capitastateNDP(controllingforindustry,occupation,sex,experience etc),thenthecoefficientontheinteractionisnotsignificantlydifferentfromzero. 32In effect, θ for each counterfactual is chosen to maintain the wage premia (w =1.6). 
All the other θ′ s are q1 S qn pickedasdescribedinequation(12)tomatchtheratioofskilledtounskilledindifferentqualitylevelsrelativetothe worst quality level. Furthermore, in the counterfactual, the share of entry labor which needs to be skilled workers (α )isalsochangedtomatchtheshareofskillinproductionforeachqualitylevel. Thatis,thericherstatesdonot qn justusemoreskillintensiveproductiontechniquesbutalsousemoreskillintheentryprocess. 33AsmentionedinSection4.1,µ foreachqualitylevelwaschosentomatchtheprice-sizeelasticityof0.1. Inthe qn counterfactualexercises,astheθ′ sarechanged,thiscanleadtochangesinpricesofthehighqualityrelativetolow qn qualityeventhoughthereisnochangeinwagepremia. Thesechangesinrelativepricescancauseashiftindemand andthuschangesinthesizedistributionforreasonsotherthanchangesinrealincomewhichiswhatwewanttofocus on. Therefore,inthecounterfactual,inadditiontoscalingalltheµ′ sbyaconstant(tomatchthedifferencesinperqn capitaincomeseenacrossIndianstates), wealsochangetherelative µ′ sofdifferentqualitiestomaintainthesame qs relative prices of different quality levels. This eliminates any substitution effects due to relative price changes and onlyfocusesonchangesindemand(causedbythenon-homotheticityinthepreferences)duetochangesinper-capita incomelevels. 47

0.39 times India's per-capita income while that of the richest state (Maharashtra) is 1.57 times India's per-capita income. To generate similar differences in per-capita income in the model, the poorer states in the counterfactual exercise have lower average productivity levels compared to the richer states.34

34 In order to define per-capita GDP in the model, we need to define a set of base prices. We use the prices of intermediates in the calibrated baseline as the base prices and value output in the counterfactuals using these prices.

To summarize, three sets of parameters are changed in the counterfactual exercises: the share of skilled in the population, the skill intensity of the production process, and the means of the productivity draws of intermediates. These parameters are changed to match the differences in skill composition and per-capita income levels across Indian states while keeping the wage premia and the relative prices of different quality levels unchanged.35

35 Much of the variation in income levels across states in the model is captured by differences in productivity, as higher productivity directly raises the amount of output that can be produced by labor. On the other hand, human capital plays a relatively minor role in explaining income differences across counterfactuals because skilled workers in the model are not intrinsically more efficient than unskilled workers; that is, they do not possess more effective units of labor. Rather, skilled and unskilled workers are just two distinct inputs in the production function, and the relatively low supply of skilled workers compared to demand results in the skilled workers getting a wage premium. As such, varying the share of skilled workers does not necessarily increase aggregate per-capita income levels in the model. This result is consistent with development accounting exercises which also find that residual TFP explains the majority of the differences in per-capita income across Indian states (see Chanda [2011]).

An increase in the productivity of intermediate producers and in the supply of skill translates into an increase in real income levels in the model. The increase in real income levels shifts demand towards higher quality goods because preferences are non-homothetic. This change in demand leads to a shift on the production side: the number of plants producing low quality goods declines while the number producing high quality goods increases. This in turn shifts the size distribution, with the share of employment in small plants falling.

The red dashed line in Figure 11 plots the share of employment in plants of size five or less that is predicted by the model when conducting the counterfactual exercises. In the calibrated baseline, the share of employment in small plants in the model is 63.9 percent. When productivity and supply of skill are lowered such that per-capita income falls to 0.39 times the calibrated baseline (0.94 log points lower), the share of employment in plants of size five or less increases to 75.6 percent. On the other hand, when productivity and supply of skill are increased such that per-capita income levels increase by a factor of 1.57 (0.43 log points higher) compared to the calibrated baseline, the share of employment in small plants falls to 56.3 percent.

Figure 11: Counterfactual Across Indian States - Data vs Model
[Figure: share of employment in plants of size <=5 (vertical axis) against ln(pcGDP) (horizontal axis); series: Data, Model.]
Notes: The figure plots the share of employment in plants of size five or less across Indian states in the data and for the counterfactual exercise in the model. The blue line is the linear regression line of the share of employment in plants of size five or less in different Indian states on log of per-capita GDP of the state. The red line is the model predicted share of employment in plants of size five or less when conducting the counterfactual exercise.

The solid blue line in Figure 11 plots the projection from a linear regression of the share of employment in plants of size five or less on log of per-capita State NDP across Indian states. The share of employment in small plants is computed by combining the ASI and the SUM (the same data as in Figure 2). In the data, the poorest Indian states have about 91.9 percent of employment in small plants while the richest have 47.2 percent of employment in small plants.

While the share of employment in small plants varies by 44.7 percentage points across Indian states in the data, the model predicts a 19.3 percentage point difference. Therefore, the model explains about 43 percent of the difference in the share of employment in small plants seen across Indian states.

Figure 12 compares how the entire size distribution (as opposed to just the share of employment in plants of size five or less) changes in the model as compared to the data as we change income levels. In the data, we pool together the three poorest states and the three richest states and compute the share of employment in different size categories for these groups of states.36

36 We pool the three richest and poorest states in order to avoid having the results driven by an outlier state. The results are similar if we just compare the richest state to the poorest state.

The light blue bars in Figure 12 show the difference (in percentage points) in the share of employment in the three poorest states compared to the three richest states for each size category. The poorest states have about 36 percentage points more employment in plants of size five or less as compared to the richest states. The richer states have a larger share of their employment in all the larger size categories as compared to the poor states, which is why the blue bars lie below zero for all these size categories. The red bars represent the same difference in share of employment for different size categories that the model predicts when productivity and skill levels in the model are varied to match the income differences across these groups of states. The model predicts that the share of employment in plants of size five or less is about 15 percentage points higher in the poorer states as compared to richer states, which again accounts for about 42 percent of the difference seen in the data. As in the data, the red bars lie below zero for all the other size categories, indicating that the model predicts a larger share of employment in richer states for these size categories.
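The "share explained" figures quoted in this subsection are ratios of the model's predicted gap to the gap in the data. A minimal sketch of that arithmetic follows, using only the rounded numbers reported in the text. The function name `share_explained` is hypothetical and not part of the paper, and because the inputs are rounded the result can differ from the reported percentages in the last digit.

```python
def share_explained(model_poor, model_rich, data_poor, data_rich):
    """Return the model's poor-minus-rich gap as a fraction of the data's gap.

    All four arguments are shares of employment in small plants, in percent.
    """
    model_gap = model_poor - model_rich
    data_gap = data_poor - data_rich
    return model_gap / data_gap


# Cross-state counterfactual, using the rounded figures quoted in the text.
cross_state = share_explained(75.6, 56.3, 91.9, 47.2)

# Three poorest vs three richest states: the text quotes the gaps directly
# (about 15 points in the model, about 36 points in the data).
pooled = share_explained(15.0, 0.0, 36.0, 0.0)

print(f"cross-state: {cross_state:.1%}, pooled groups: {pooled:.1%}")
```

Both ratios come out slightly above 40 percent, in line with the "about 43 percent" and "about 42 percent" stated in the text.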

Figure 12: Counterfactual: Changes in Distribution for 3 Richest vs 3 Poorest States
[Figure: bar chart of the difference in share of employment (vertical axis) by plant size category (horizontal axis: <=5, 5 to 9, 10 to 19, 20 to 49, 50 to 99, 100 to 249, 250 to 499, 500 to 999, >1000); series: Data, Model.]
Notes: The figure plots the share of employment in the three poorest states minus the share in the three richest states for different size categories in the data and in the model (when productivity and skill levels are varied to match the differences in per-capita income across these groups of states). The data is from the ASI and SUM for 2005-06.

5.2. India Over Time

This subsection examines how well the model can explain the changes in the size distribution of manufacturing plants in India over time. Five waves of the Survey of Unorganized Manufacturing (SUM) were conducted in India between 1989-90 and 2010-11. These can be combined with the corresponding years of the Annual Survey of Industries (ASI) to get five data points for how the size distribution has evolved over time in India.

The bars in Figure 13 show the share of employment in plants of size five or smaller for 1989, 1994, 2000, 2005, and 2009.37 As can be seen, the share of employment in small plants decreased from 77 percent of total employment in 1989 to 58 percent in 2009.

37 More details of the surveys are given in Appendix A.1 and A.2.

Per-capita income in 1989 was 0.54 times the 2005 level of per-capita income while the share of manufacturing workers with ten or more years of schooling was just 14 percent. In 2009 per-capita income levels were 1.30 times the 2005 level while the share of manufacturing workers with ten or more years of schooling had increased to 31 percent. The blue line in Figure 13 plots the share of employment in plants of size five or less as predicted by the model when productivity and skill supply in the model are varied to the extent required to match the differences in per-capita income levels and share of skilled in the data. The model was calibrated to match the share of employment in small plants in 2005, so the fit in 2005 is very good by construction. The model predicts that 72 percent of employment would be in plants of size five or less in 1989, which is a little less than the 77 percent seen in the data. Similarly, the model under-predicts the change in the size distribution going from 2005 to 2009 by a small amount. Overall, the model predicts 65 percent of the change in the share of employment in small plants seen in the data between 1989 and 2009.38

38 A related question can also be asked: did states which grew more over time see a larger drop in the share of employment in small plants? If we look at the change in the size distribution between 1989 and 2009, then this does seem to be the case.

Figure 13: Counterfactual India Over Time - Data vs Model
[Figure: share of employment in plants of size <=5 (vertical axis) by year (horizontal axis: 1989, 1994, 2000, 2005, 2009); series: Data, Model.]
Notes: The red bars in the figure plot the share of employment in plants of size five or less for five years for India. The data for each year pools the SUM and the ASI for that year. The blue line plots the model predicted share of employment for each year when productivity and skill levels are varied to match the differences in per-capita income in India over time.

Table 13: Love of Variety: Percent of Cross-State Difference Explained

                              $q_1 = 1$    $q_1 = 0.1$
$\eta = \frac{1}{\sigma-1}$   43.1%        43.1%
$\eta = 0$                    71.2%        53.1%

Notes: The table shows the percent of cross-state variation in the share of employment in plants of size five or less that is explained by the model counterfactual for different parameter values of $\eta$ and $q_1$. $\eta = \frac{1}{\sigma-1}$ is the baseline specification of no love of variety while $\eta = 0$ is the case of full love of variety.

5.3. Parameter Sensitivity: Love of Variety

As mentioned in Section 3.2, the baseline specification of the model assumed that the final goods producer's production function had no love of variety. A generalization of the production function of the final goods producer of quality $q_n$ is given by

$$Y^s_{q_n} = \frac{1}{M_{q_n}^{\eta}} \left( \sum_{i=1}^{M_{q_n}} x_{i,q_n}^{\frac{\sigma-1}{\sigma}} \right)^{\frac{\sigma}{\sigma-1}} \quad \forall q_n \in Q.$$

In the baseline specification, $\eta$ was set equal to $\frac{1}{\sigma-1}$, which corresponded to the case of no love of variety. In this section, we provide results for the case when $\eta = 0$ (the case with full love of variety) and compare this to the baseline. As mentioned in Section 3.2, the no love of variety assumption is the conservative case, with changes in the size distribution in the counterfactual being larger when we allow for love of variety. Furthermore, when allowing for love of variety, the results become more sensitive to the choice of $q_1$, the quality index of the lowest quality level (note that given $q_1$, all subsequent quality indexes are given by the recursion $q_n = q_{n-1} + \Delta$).

Table 13 shows how much of the cross-state difference in the share of employment in small plants is explained by the model for different values of $\eta$ and $q_1$. The first row and first column correspond to the baseline specification, with $\eta = \frac{1}{\sigma-1}$ (no love of variety) and $q_1 = 1$. As mentioned in Section 5.1, when varying productivity and supply of skill to match the differences in per-capita incomes across states, the model explains 43.1 percent of the difference in the share of employment in small plants as compared to the data.

Now consider the model with love of variety ($\eta = 0$). When allowing for love of variety, all other parameters are recalibrated to match the same moments as in the baseline. We then run the same counterfactual exercises as in Section 5.1. As reported in Table 13, in the case with love of variety, the model can explain 71.2 percent of the differences in size distribution between the rich and poor states.

Why does the model with love of variety generate bigger changes in the size distribution in the counterfactual? The reason is that in the case with love of variety, relative prices of different quality levels change in the counterfactual, due to changes in the relative number of varieties of the different qualities. In particular, the CES price index (the price charged by the final producer to the consumer) for quality $q_n$ is given by

$$P_{q_n} = M_{q_n}^{\eta - \frac{1}{\sigma-1}} \left( \int \left( p(A_i, q_n) \right)^{1-\sigma} g_{q_n}(A_i) \, dA_i \right)^{\frac{1}{1-\sigma}} \quad \forall q_n \in Q.$$

In the baseline specification, because $\eta = \frac{1}{\sigma-1}$, the price index for $q_n$ was independent of the number of varieties $M_{q_n}$. However, when $\eta = 0$, the CES price index of a quality level, $P_{q_n}$, is inversely related to the number of varieties of that quality ($M_{q_n}$) available in the economy. In the counterfactual, as income levels increase, demand shifts towards higher quality, and this induces more entrants of the higher quality levels. The increase in the number of varieties of high quality intermediate producers causes the relative price of high quality goods to fall in the counterfactual when $\eta = 0$. This causes a further shift in demand towards high quality, which in turn causes more entry into higher quality goods. The additional increase in demand for high quality, which acts through relative price changes due to the change in the number of varieties, does not occur in the baseline specification when $\eta = \frac{1}{\sigma-1}$. Hence, the change in size distribution in the counterfactual in the baseline specification is less than in the case with love of variety. In effect, the baseline specification focuses attention on the changes in demand caused by changes in income levels alone.
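To illustrate how the number of varieties enters the CES price index, the following is a minimal sketch that evaluates a discrete analogue of the price index above, replacing the integral over the productivity density with a simple average over a finite list of intermediate prices. The function name `ces_price_index`, the value of `sigma`, and the unit prices are illustrative assumptions, not values from the paper's calibration.

```python
def ces_price_index(prices, sigma, eta):
    """Discrete analogue of the CES price index for one quality level.

    P = M**(eta - 1/(sigma - 1)) * (mean of p_i**(1 - sigma))**(1/(1 - sigma)),
    where M is the number of varieties and the mean stands in for the
    integral over the productivity distribution.
    """
    m = len(prices)
    mean_term = sum(p ** (1.0 - sigma) for p in prices) / m
    return m ** (eta - 1.0 / (sigma - 1.0)) * mean_term ** (1.0 / (1.0 - sigma))


sigma = 4.0                      # illustrative elasticity of substitution
no_love = 1.0 / (sigma - 1.0)    # baseline: no love of variety
full_love = 0.0                  # full love of variety

for m in (10, 100):
    prices = [1.0] * m           # identical intermediate prices, for clarity
    print(m,
          ces_price_index(prices, sigma, no_love),
          ces_price_index(prices, sigma, full_love))
```

Under these assumptions the index does not move with the number of varieties in the no love of variety case, while in the full love of variety case it falls as varieties are added, which is the relative price channel described in the text.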
The baseline thus abstracts away from any changes in relative prices caused by changes in the number of varieties in the counterfactual.

Furthermore, when allowing for love of variety, the change in the size distribution in the counterfactual becomes more sensitive to the choice of $q_1$, the quality index for the lowest quality level. When $q_1$ is set to 0.1 and $\eta = 0$, the model counterfactual explains only 53.1 percent of the difference in size distribution as opposed to 71.2 percent when $q_1 = 1$. As shown in Section 3.1, the share of households with wage $w$ who choose quality level $q_n$ is given by

$$\rho(q_n \mid w) = \frac{e^{a_{q_n}} \left( \frac{w}{P_{q_n}} \right)^{q_n}}{\sum_{i=1}^{N} e^{a_{q_i}} \left( \frac{w}{P_{q_i}} \right)^{q_i}} \quad \forall q_n \in Q.$$

As $P_{q_n}$ is raised to the power $q_n$ in the numerator, the absolute levels of $q_n$ approximately determine the own price elasticity of demand for a quality level. Lower absolute levels of the quality indexes imply that demand is less sensitive to changes in relative prices (of the CES price indexes). Therefore, a lower value for $q_1$ (which translates into lower values for all the quality indexes) makes the model less sensitive to the changes in relative prices induced by changes in varieties.

6. Conclusion

The size distribution in developing countries usually has a thick left tail compared to developed countries. The same holds across Indian states, with richer states usually having a much smaller share of their manufacturing employment in small plants. This paper explores the hypothesis that this income-size relationship arises from the fact that low income countries and states have high demand for low quality products which can be produced efficiently in small plants. We find compelling consumer and producer evidence that is consistent with this hypothesis. We show that richer households buy higher price goods and larger plants produce more expensive products (and use more expensive inputs). Finally, we develop a model that features non-homothetic preferences with respect to quality and is calibrated to match the cross-sectional empirical findings. Our calibrated model indicates that up to 41 percent of the cross-state variation seen in the left tail of manufacturing plants in India can be explained by the model.
A key simplification in our model is the assumption that product prices are determined entirely by input costs. This follows directly from our model of free entry and zero profits for final producers and monopolistic competition with constant markups for intermediate producers, and it abstracts from the possibility that consumers may pay explicitly for quality. This assumption allows us to focus sharply on how technology differences impact production of quality and firm size. However, it also limits the model's ability to capture settings where firms can charge quality-dependent markups or where prices reflect consumer willingness to pay. We view this as an important avenue for future research. Extending the model to allow interaction between the demand side for quality and firm pricing decisions would allow for a richer mapping between quality, price, and firm scale, potentially shedding further light on the variation observed across industries and countries.

To sum up, this paper suggests that a large part of the differences in size distribution that we see across countries and states is a natural consequence of the low levels of income in developing countries and is not caused by policies which discriminate against large productive plants in favor of small unproductive plants. The presence of small plants in developing countries should not be viewed as originating necessarily from policy failures.

References

M. Aguiar and E. Hurst. Life-cycle prices and production. American Economic Review, 97(5):1533–1559, December 2007. URL http://ideas.repec.org/a/aea/aecrev/v97y2007i5p1533-1559.html.

U. Akcigit, H. Alp, and M. Peters. Lack of selection and limits to delegation: firm dynamics in developing countries. American Economic Review, 111(1):231–275, 2021.

L. Alfaro, A. Charlton, and F. Kanczuk. Plant-size distribution and cross-country income differences. In NBER International Seminar on Macroeconomics 2008, NBER Chapters, pages 243–272. National Bureau of Economic Research, Inc, April 2009. URL http://ideas.repec.org/h/nbr/nberch/8244.html.

D. Atkin and D. Donaldson. Who's getting globalized? The size and nature of intranational trade costs. Technical report, Yale University, 2012.

D. Atkin, A. K. Khandelwal, and A. Osman. Exporting and firm performance: Evidence from a randomized experiment. The Quarterly Journal of Economics, 132(2):551–615, 2017.

O. P. Attanasio and C. Frayne. Do the poor pay more? Technical report, January 2006.

A. Banerji and S. Jain. Quality dualism. Journal of Development Economics, 84(1):234–250, September 2007. URL http://ideas.repec.org/a/eee/deveco/v84y2007i1p234-250.html.

L. Barseghyan and R. DiCecio. Entry costs, industry structure, and cross-country income and TFP differences. Journal of Economic Theory, 146(5):1828–1851, 2011.

V. Bassi, J. H. Lee, A. Peter, T. Porzio, R. Sen, and E. Tugume. Self-employment within the firm. Technical report, National Bureau of Economic Research, 2023.

A. Behar. Directed technical change, the elasticity of substitution and wage inequality in developing countries. Economics Series Working Papers 467, University of Oxford, Department of Economics, Dec 2009. URL http://ideas.repec.org/p/oxf/wpaper/467.html.

P. Bento and D. Restuccia. Misallocation, establishment size, and productivity. American Economic Journal: Macroeconomics, 9(3):267–303, 2017. doi: 10.1257/mac.20140137.

P. Bento and D. Restuccia. Financial frictions at entry, average firm size, and productivity. B.E. Journal of Macroeconomics, forthcoming.

M. Bils and P. J. Klenow. Quantifying quality growth. American Economic Review, 91(4):1006–1030, September 2001. URL http://ideas.repec.org/a/aea/aecrev/v91y2001i4p1006-1030.html.

N. Bloom and J. Van Reenen. Measuring and explaining management practices across firms and countries. The Quarterly Journal of Economics, 122(4):1351–1408, 2007.

C. Broda and D. E. Weinstein. Globalization and the gains from variety. The Quarterly Journal of Economics, 121(2):541–585, May 2006. URL http://ideas.repec.org/a/tpr/qjecon/v121y2006i2p541-585.html.

A. Chanda. Accounting for Bihar's productivity relative to India's: What can we learn from recent developments in growth theory. Technical Report 11/0759, International Growth Centre, August 2011.

Y. C. Choi, D. Hummels, and C. Xiang. Explaining import quality: The role of the income distribution. Journal of International Economics, 77(2):265–275, April 2009. URL http://ideas.repec.org/a/eee/inecon/v77y2009i2p265-275.html.

X. Cirera, D. Comin, and M. Cruz. Bridging the technological divide: Technology adoption by firms in developing countries. World Bank Publications, 2022.

M. Dalgin, D. Mitra, and V. Trindade. Inequality, nonhomothetic preferences, and trade: A gravity approach. Southern Economic Journal, 74(3):747–774, January 2008. URL http://ideas.repec.org/a/sej/ancoec/v743y2008p747-774.html.

H. De Soto. The other path. Harper & Row New York, 1989.

A. Deaton and O. Dupriez. Spatial price differences within large countries. Working Papers 1321, Princeton University, Woodrow Wilson School of Public and International Affairs, Research Program in Development Studies., July 2011. URL http://ideas.repec.org/p/pri/rpdevs/1321.html.

Y. Dikhanov. Income effect and urban-rural price differentials from the household survey perspective. Technical report, ICP Global Office, 2010.

S. Djankov, R. L. Porta, F. Lopez-De-Silanes, and A. Shleifer. The regulation of entry. The Quarterly Journal of Economics, 117(1):1–37, February 2002. URL http://ideas.repec.org/a/tpr/qjecon/v117y2002i1p1-37.html.

P. Fajgelbaum, G. M. Grossman, and E. Helpman. Income distribution, product quality, and international trade. Journal of Political Economy, 119(4):721–765, 2011. URL http://ideas.repec.org/a/ucp/jpolec/doi10.1086-662628.html.

H. Flam and E. Helpman. Vertical product differentiation and north-south trade. American Economic Review, 77(5):810–22, December 1987. URL http://ideas.repec.org/a/aea/aecrev/v77y1987i5p810-22.html.

M. García-Santana and J. Pijoan-Mas. Small scale reservation laws and the misallocation of talent. Working papers, CEMFI, Dec 2010. URL http://ideas.repec.org/p/cmf/wpaper/wp2010_1010.html.

L. Garicano, C. Lelarge, and J. Van Reenen. Firm size distortions and the productivity distribution: Evidence from France. American Economic Review, 106(11):3439–3479, 2016.

E. Ghani, A. G. Goswami, and W. R. Kerr. Is India's manufacturing sector moving away from cities? Policy Research Working Paper Series 6271, The World Bank, Nov. 2012. URL http://ideas.repec.org/p/wbk/wbrwps/6271.html.

D. Gollin. Do taxes on large firms impede growth? Evidence from Ghana. Bulletins 7488, University of Minnesota, Economic Development Center, 1995. URL http://ideas.repec.org/p/ags/umedbu/7488.html.

N. Guner, G. Ventura, and X. Yi. Macroeconomic implications of size-dependent policies. Review of Economic Dynamics, 11(4):721–744, October 2008. URL http://ideas.repec.org/a/red/issued/07-73.html.

J. C. Hallak. Product quality and the direction of trade. Journal of International Economics, 68(1):238–265, January 2006. URL http://ideas.repec.org/a/eee/inecon/v68y2006i1p238-265.html.

J. C. Hallak and J. Sivadasan. Product and process productivity: Implications for quality choice and conditional exporter premia. Technical Report 1, 2013.

R. Hasan and K. R. Jandoc. The distribution of firm size in India: What can survey data tell us? Asian Development Bank Economics Working Paper Series, (213), 2010.

R. Hillberry and D. Hummels. Trade responses to geographic frictions: A decomposition using micro-data. European Economic Review, 52(3):527–550, April 2008. URL http://ideas.repec.org/a/eee/eecrev/v52y2008i3p527-550.html.

C.-T. Hsieh and P. J. Klenow. The life cycle of plants in India and Mexico. The Quarterly Journal of Economics, 129(3):1035–1084, 2014.

C.-T. Hsieh and B. A. Olken. The missing 'missing middle'. Journal of Economic Perspectives, 28(3):89–108, 2014.

D. Hummels and P. J. Klenow. The variety and quality of a nation's exports. American Economic Review, 95(3):704–723, June 2005. URL http://ideas.repec.org/a/aea/aecrev/v95y2005i3p704-723.html.

L. Iacovone and B. Javorcik. Getting ready: Preparation for exporting. CEPR Discussion Papers 8926, C.E.P.R. Discussion Papers, Apr. 2012. URL http://ideas.repec.org/p/cpr/ceprdp/8926.html.

M. Kugler and E. Verhoogen. Prices, plant size, and product quality. Review of Economic Studies, 79(1):307–339, 2012. URL http://ideas.repec.org/a/oup/restud/v79y2012i1p307-339.html.

R. La Porta and A. Shleifer. The unofficial economy and economic development. NBER Working Papers 14520, National Bureau of Economic Research, Inc, Dec. 2008. URL http://ideas.repec.org/p/nbr/nberwo/14520.html.

D. Lagakos. Explaining cross-country productivity differences in retail trade. Journal of Political Economy, 124(2):579–620, 2016.

I. Little, D. Mazumdar, and J. M. Page Jr. Small Manufacturing Enterprises: A Comparative Analysis of India and Other Economies. NY: Oxford U. Press, 1987.

N. V. Loayza. The economics of the informal sector: A simple model and some empirical evidence from Latin America. Carnegie-Rochester Conference Series on Public Policy, 45(0):129–162, 1996. ISSN 0167-2231. doi: http://dx.doi.org/10.1016/S0167-2231(96)00021-8. URL http://www.sciencedirect.com/science/article/pii/S0167223196000218.

N. V. Loayza, A. M. Oviedo, and L. Serven. The impact of regulation on growth and informality - cross-country evidence. Policy Research Working Paper Series 3623, The World Bank, May 2005. URL http://ideas.repec.org/p/wbk/wbrwps/3623.html.

N. V. Loayza, L. Serven, and N. Sugawara. Informality in Latin America and the Caribbean. Policy Research Working Paper Series 4888, The World Bank, Mar. 2009. URL http://ideas.repec.org/p/wbk/wbrwps/4888.html.

B. R. Mandel. Heterogeneous firms and import quality: evidence from transaction-level prices. International Finance Discussion Papers 991, Board of Governors of the Federal Reserve System (U.S.), 2010. URL http://ideas.repec.org/p/fip/fedgif/991.html.

K. Manova and Z. Zhang. Export prices across firms and destinations. The Quarterly Journal of Economics, 127(1):379–436, 2012. URL http://ideas.repec.org/a/oup/qjecon/v127y2012i1p379-436.html.

D. F. McFadden. Conditional Logit Analysis of Qualitative Choice Behavior, pages 105–142. Academic Press: New York, 1974.

D. Mitra and V. Trindade. Inequality and trade. Canadian Journal of Economics, 38(4):1253–1271, November 2005. URL http://ideas.repec.org/a/cje/issued/v38y2005i4p1253-1271.html.

S. Nataraj. The impact of trade liberalization on productivity: Evidence from India's formal and informal manufacturing sectors. Journal of International Economics, 85(2):292–301, 2011. URL http://ideas.repec.org/a/eee/inecon/v85y2011i2p292-301.html.

M. Poschke. The firm size distribution across countries and skill-biased change in entrepreneurial technology. American Economic Journal: Macroeconomics, 10(3):1–41, 2018.

D. Restuccia and R. Rogerson. Misallocation and productivity. Review of Economic Dynamics, 16(1):1–10, January 2013. URL http://ideas.repec.org/a/red/issued/13-0.html.

P. K. Schott. Across-product versus within-product specialization in international trade. The Quarterly Journal of Economics, 119(2):646–677, May 2004. URL http://ideas.repec.org/a/tpr/qjecon/v119y2004i2p646-677.html.

D. Scur, S. Ohlmacher, J. V. Reenen, M. Bennedsen, N. Bloom, A. Choudhary, L. Foster, J. Groenewegen, A. Grover, S. Hardeman, L. Iacovone, R. Kambayashi, M.-C. Laible, R. Lemos, H. Li, A. Linarello, M. Maliranta, D. Medvedev, C. Meng, J. M. Touya, N. Mandirola, R. Ohlsbom, A. Ohyama, M. Patnaik, M. Pereira-Lopez, R. Sadun, T. Senga, F. Qian, and F. Zimmermann. The international empirics of management. Proceedings of the National Academy of Sciences, 121(45):e2412205121, 2024.

A. Shaked and J. Sutton. Product differentiation and industrial structure. The Journal of Industrial Economics, pages 131–146, 1987.

J. Sutton. Sunk costs and market structure: Price competition, advertising, and the evolution of concentration. MIT Press, 1991.

K. Train. Discrete Choice Methods with Simulation. Cambridge University Press, 2009.

J. R. Tybout. Manufacturing firms in developing countries: How well do they do, and why? Journal of Economic Literature, 38(1):11–44, March 2000. URL http://ideas.repec.org/a/aea/jeclit/v38y2000i1p11-44.html.

G. Ulyssea. Firms, informality, and development: Theory and evidence from Brazil. American Economic Review, 108(8):2015–2047, 2018.

E. Verhoogen. Firm-level upgrading in developing countries. Journal of Economic Literature, 61(4):1410–1464, 2023.

E. A. Verhoogen. Trade, quality upgrading, and wage inequality in the Mexican manufacturing sector. The Quarterly Journal of Economics, 123(2):489–530, 05 2008. URL http://ideas.repec.org/a/tpr/qjecon/v123y2008i2p489-530.html.

Appendix A. Data ThispaperusesdatafromthefollowingsurveysfromIndia: 1. AnnualSurveyofIndustriesof2005-06,1989-90,1994-95,2000-01,and2009-10 2. SurveyofUnorganizedManufacturingof2005-06,1989-90,1994-95,2000-01,and2010-11 3. ConsumerExpenditureSurveyofIndiaof2003and2004-05 4. Employment-UnemploymentSurveyofIndiaof2004-05 Thissectionprovidessomemoredetailsregardingthesesurveys. Italsoprovidesabriefdescription oftheCountyBusinessDatabaseoftheUS. A.1. Annual Survey of Industries TheAnnualSurveyofIndustries(ASI)isconductedbytheCentralStatisticsOfficeoftheGovernment of India every year. It covers all factories registered under Sections 2m(i) and 2m(ii) of the FactoriesAct,1948,thatis,thosefactoriesemployingtenormoreworkersusingpower,andthose employingtwentyormoreworkerswithoutusingpower. The paper primarily uses data from the 2005-06 ASI (as the SUM was also conducted in 2005- 06)whichreportsdataforthefinancialyearendingMarch2006. Thegeographicalcoverageofthe 2005-06 ASI was all of India except the states of Arunachal Pradesh, Mizoram, and Sikkim and theUnionTerritoryofLakshadweep. ASI 2005-06 uses the National Industrial Classification (NIC) 2004 (which is closely based on International Standard of Industrial Classification (ISIC) Rev 3.1) to classify economic activity. For all the analysis done in the paper, we restrict the sample to plants which report a 2-digit NIC 64

between 15 to 36 as this constitutes the manufacturing sector and matches the coverage of the SUM.Furthermore,forsomeofthefigures,attentionisrestrictedto15largeIndianstates.39 Themainvariablesusedinthepaperaretotalemploymentleveloftheplant,anddetailsregardingtheproductsproducedandinputsused(quantitiesandrupeevalues)byeachplant. Plants report the average number of employees working in the plant for seven different categories,namely: maleworkersemployeddirectly,femaleworkersemployeddirectly,childworkers employeddirectly,workersemployedthroughcontractors,supervisoryandmanagerialstaff,other employees, and unpaid family workers. The size of the plant (total employment) is defined as the sumacrossallthesecategories. All plants report the output they produce using a standardized classification of products called the ASICC product classification. The ASICC has about 5,500 product categories. Plants can report up to ten main products produced in terms of this ASICC classification. Each product category has an associated standardized unit (kilograms, tonnes, numbers, etc) in terms of which thequantityproducedistobereported. Plantsalsoreportthetotalvalueofproductionbeforetaxes and distribution costs for each product which can be combined with the information on quantity produced to infer per-unit prices. As all plants are supposed to report quantities in standardized units,thepricesinferredshouldbecomparableforallplantsproducingthesameproduct. However, there seems to be some misreporting in units and this issue is discussed further in Section F. The same commodity classification and units are used to report the quantity and value of materials inputsused. Inadditiontothe2005-06ASI,Section5.2alsousesdataonlevelofemploymentofeachplant from four other years of the ASI, namely 1989-90, 1994-95, 2000-01, and 2009-10. As with the 2005-06survey,thebroadestdefinitionofemploymentwasusedforallyearswhichincludedparttime workers and unpaid workers. 
Arunachal Pradesh, Mizoram, Sikkim, and Lakshadweep were excluded from the sample for all years as these states were not covered in the ASI for many of the waves. Different years of the survey used different industrial classifications (NIC 1987, NIC 39The main states included are: Andhra Pradesh, Bihar, Gujarat, Haryana, Himachal Pradesh, Karnataka, Kerala, MadhyaPradesh,Maharashtra,Orissa,Punjab,Rajasthan,TamilNadu,UttarPradesh,andWestBengal. ThreeIndian statesweresplitintotwoin2000. Inordertomaintaincomparabilitywithsomeofthetime-seriesresultsinSection 5.2andD,thepre-splitdefinitionofstatesisusedthroughoutthepaper. 65

1998, NIC 2004, and NIC 2008). We created a concordance across these different classifications and only industries which corresponded to 2-digit NIC 2004 between 15 and 36 were included in thesample. Table 17 reports the number of observations, estimated number of establishments (using samplingweightsprovidedbytheASI),andtheestimatedtotalnumberofworkersemployedbasedon theASIforallfiveyearsthatareusedinthepaper. More details about the ASI can be found on the website of the Ministry of Statistics and ProgrammeImplementation,GovernmentofIndia(http://mospi.nic.in/). A.2. Survey of Unorganized Manufacturing The Survey of Unorganized Manufacturing (SUM) is conducted by the National Sample Survey Office (NSS) of India. The coverage of the survey includes all manufacturing enterprises not registeredunderSections2m(i)and2m(ii)oftheFactoriesAct,1948. TheSUMisusuallyconducted every five years. The last five waves were done in 1989-90, 1994-95, 2000-01, 2005-06, and 2010-11. The paper primarily uses data from the 2005-06 SUM (62nd Round of the NSS). The survey periodwasfromJuly2005toJune2006.40 ThegeographicalcoverageofthesurveywascomprehensiveandincludedallStatesandUnion- Territories of India, with only Leh and Kargil districts of Jammu and Kashmir and a few remote villages in Nagaland and Andaman and Nicobar Islands being excluded. The states of Arunachal Pradesh, Mizoram, and Sikkim and Union Territory of Lakshadweep were dropped to maintain comparabilitywiththecoverageofthe2005-06ASI. Like the ASI, the SUM 2005-06 uses the National Industrial Classification (NIC) 2004 to classifyeconomicactivity. Foralltheanalysisdoneinthepaper,werestrictthesampletoplantswhich report a 2-digit NIC between 15 to 36. Furthermore, for some of the figures, attention is restricted to15largeIndianstates. Themainvariablesusedinthepaperarethetotalemployment,anddetailsregardingtheproducts 40NotethatthereisathreemonthdifferenceincoverageperiodbetweentheASIandSUM. 66

producedandinputsused(quantitiesandrupeevalues)byeachplant. Plantsreporttheaveragenumberofemployeesworkingintheplantforthereferenceperiodfor which the data is collected (for most plants this was one month). The plants reported the average number of hired workers, working owners, and other workers that they employed on a part-time and full-time basis. Like the ASI, the broadest definition of employment is used with the size of theplant(totalemployment)beingdefinedasthesumacrossallthesecategories. All plants report the output they produce and material inputs consumed using the same standardized classification of products as is used by the ASI plants. Plants can report up to five main products produced in terms of this product classification. However, unlike the ASI, SUM plants canchoosetheunitsinwhichtheyarereportingquantitiesandprices. Forexample,allASIplants which produce matchsticks must report quantities in kilograms. However, different SUM plants reportquantitiesandpricesofmatchsticksindifferentunitsincludingkilograms,tonnes,andnumbers (number of matchsticks). We concord units across the two surveys when combining the two surveys. If the same product is being reported in different units which are simple scalar multiples of each other (kilograms and tonnes for example), then we convert the units so that all quantities and prices are being measured in the same unit i.e., divide quantities and prices of all SUM units which report quantities of matchsticks in tonnes by 1000, to get per kilogram prices which are comparabletoASIprices. However,ifaSUMplantisreportingtheoutputofmatchsticksinnumbers, then it is not possible to make this comparable to the the ASI plants which are reporting in kilograms. Insuchcases,theSUMproductsaretreatedasaseparateproductcategory. In addition to the 2005-06 SUM, Section 5.2 also uses data on level of employment of each plant from four other years of the SUM, namely 1989-90, 1994-95, 2000-01, 2005-06, and 2010- 11. 
As with the 2005-06 survey, the broadest definition of employment was used for all years, which included part-time workers and unpaid workers. The same sampling on states and industries was done as in the ASI.

Table 17 reports the sample size, number of establishments (using sampling weights provided by the SUM), and the total number of workers employed based on the SUM for all five years that are used in the paper.

More details about the SUM can be found on the website of the Ministry of Statistics and Programme Implementation, Government of India (http://mospi.nic.in/).

A.3. Consumer Expenditure Surveys

The National Sample Survey Office of India (NSS) conducts an annual Consumer Expenditure Survey (Schedule 1.0) in India. From 1972-73, the NSS started a quinquennial series in which, every five years, it conducts a survey with a sample size which is about four times larger than the annual survey.

The paper uses data mainly from the 2004-05 (61st Round of the NSS) Consumer Expenditure Survey, which was part of the quinquennial series and interviewed about 125,000 households. The geographical coverage of the survey was comprehensive and included all States and Union Territories of India, with only Leh and Kargil districts of Jammu and Kashmir and a few remote villages in Nagaland and Andaman and Nicobar Islands being excluded. The survey period was from July 2004 to June 2005.

The Consumer Expenditure Survey of 2004-05 asks households to report the value of consumption for 339 different goods. Households report quantities and rupee values separately for 209 goods, which can be used to compute prices for these goods. 156 of these 209 goods are food items, 10 fall under the “fuel and light” category, another 24 are clothing and footwear, while the remaining are durables.

For food items, households report consumption out of home production (quantities and imputed rupee values) and total consumption (which includes home production and market purchases). The price computed divides total value of consumption by total quantity consumed, thus averaging across home and market consumption.

The reference period for consumption of all food items is 30 days, i.e., households report quantity consumed and rupee values for food consumption for the last 30 days. For clothing and footwear categories, households report consumption for a reference period of 30 days as well as 365 days.
The 365-day reference period for these categories is used, as many households report zero purchases for these items for the 30-day reference period but positive amounts for the 365-day reference period.
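The price measure described above (total value of consumption divided by total quantity consumed, pooling home production and market purchases) can be illustrated with a minimal sketch. The record layout and field names are hypothetical and are not the variable names of the NSS data files; the rice example is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemRecord:
    """One household-item row; field names are illustrative, not NSS variable names."""
    total_quantity: float  # home-produced plus purchased quantity in the reference period
    total_value: float     # rupee value of that total consumption


def unit_value(record: ItemRecord) -> Optional[float]:
    """Return rupees per unit, or None when no quantity was consumed."""
    if record.total_quantity <= 0:
        return None
    return record.total_value / record.total_quantity


# A made-up household that consumed 10 kg of rice worth 150 rupees in total.
rice = ItemRecord(total_quantity=10.0, total_value=150.0)
print(unit_value(rice))  # 15.0
```

Returning None for zero quantities mirrors the point made in the text that a price can only be computed where a positive quantity is reported, which is why the 365-day reference period is preferred for clothing and footwear.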

Table 14 uses data from the 2003 (59th Round) Consumer Expenditure Survey, which was not part of the quinquennial series and interviewed about 41,000 households. The geographical coverage of the 2003 survey was similar to the 2004-05 survey. The survey period was from January 2003 to December 2003. The consumption items recorded across the two surveys were also very similar, with only a few minor differences.

Table 18 reports some summary statistics for the Consumer Expenditure Survey of 2004-05. It reports the number of items and share of expenditure for five broad expenditure headings and also the share of expenditure within each heading for which prices could be computed. The summary statistics for the 2003 survey are very similar and are not reported.

More details about the dataset can be found on the website of the Ministry of Statistics and Programme Implementation, Government of India (http://mospi.nic.in).

A.4. Employment-Unemployment Survey

The National Sample Survey Office of India (NSS) conducts an Employment-Unemployment Survey (Schedule 10.0) as part of its quinquennial series. This paper uses the Employment-Unemployment Survey of 2004-05 (61st Round of the NSS). The geographical coverage of the survey was comprehensive and included all States and Union Territories of India, with only Leh and Kargil districts of Jammu and Kashmir and a few remote villages in Nagaland and Andaman and Nicobar Islands being excluded. The survey period was from July 2004 to June 2005. The survey interviewed about 125,000 households (about 600,000 individuals).

The survey asks all the individuals in the household to report demographic characteristics like age, education, etc. It also asks individuals to report the main industry in which they work, the size of the establishment in which they work, and the wage they earned in the last week. To maintain comparability with the production surveys, only individuals who report a 2-digit NIC 2004 between 15 and 36 are used.
The main variables used from this survey are the education level of individuals and the size category of the establishment in which they work.

The survey asks individuals to report the level of general education that they have achieved.

The possible responses are: illiterate, literate but not through formal schooling, primary, middle, secondary, higher secondary, diploma/certificate course, graduate, and postgraduate or above. For the purpose of the model, a person was defined as skilled if he or she had finished at least secondary education (Grade ten).

Individuals were also asked to report the size category of the establishment in which they worked. They could report one of the following options: establishment of size less than 6, between 6 and 9, between 10 and 19, 20 or greater, and unknown size.

The calibration of the wage premium in Section 4.1 also makes use of wage data from this survey. More details regarding the construction of this variable, along with the Mincerian regression results, are provided in Section G.

More details about the Employment-Unemployment Survey can be found on the website of the Ministry of Statistics and Programme Implementation, Government of India (http://mospi.nic.in/).

A.5. County Business Patterns Database (US)

The County Business Patterns Database maintained by the US Census Bureau provides the level of employment for each 6-digit NAICS industry for each US county. The employment level is as of the week of March 12th of that year.

The paper uses the 2006 release of the data. For many industry-county cells the exact level of employment is not reported. Instead, the dataset reports an employment size class for that cell. In these cases, the employment in the cell is assigned the midpoint of the size class reported. For example, if a NAICS-county cell reports employment in the size class ‘B’, which represents 20-99 employees, then the cell is assigned an employment level of 60.

The data can be downloaded from http://www.census.gov/econ/cbp/.
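The midpoint imputation for suppressed cells can be sketched as follows. This is a minimal illustration, not the authors' code: only size class ‘B’ (20-99 employees) is taken from the text, the function and dictionary names are hypothetical, and rounding the midpoint half up is an assumption chosen because it reproduces the value of 60 quoted above.

```python
import math

# Only class 'B' (20-99 employees) comes from the text; any other class
# would have to be added from the Census Bureau's own documentation.
SIZE_CLASS_BOUNDS = {"B": (20, 99)}


def impute_employment(exact, size_class, bounds=SIZE_CLASS_BOUNDS):
    """Use the exact count when it is reported, else the size-class midpoint."""
    if exact is not None:
        return exact
    low, high = bounds[size_class]
    # Rounding half up is an assumption; (20 + 99) / 2 = 59.5 becomes 60.
    return math.floor((low + high) / 2 + 0.5)


print(impute_employment(None, "B"))  # 60
print(impute_employment(37, "B"))    # 37
```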

B. Case study: Finished Cotton Cloth

To build intuition for why larger firms, on average, are more likely to sell higher priced products, we present a deep dive on firms that produce “finished cotton cloth.” To ensure comparability and a more detailed analysis, we restrict attention to the firms that are reported in the ASI. First, we detail the observed variation in this industry; second, we establish that larger firms charge higher prices, on average; and finally, we explain why larger firms have a comparative advantage in producing more expensive goods.

There is substantial variation in the price charged for a meter of finished cotton cloth (left panel of figure 14). The median firm charges an average of 51 Indian rupees, but a firm at the 10th percentile charges only 22 rupees (equivalent to US $0.50 in 2006), and the 90th percentile firm charges 140 rupees (equivalent to US $3.16), a difference of more than 500 percent. Alongside this variation in the price charged, the variation in firm size is substantially larger (right panel of figure 14). The median firm employs 197 people, but there is a substantial mass of firms that are very small, with 20 percent of firms employing fewer than 35 people.41 At the same time, there are two very large firms with close to 4000 and 5000 employees each.

41 Recall that the ASI omits the smallest firms since the ASI only includes manufacturing plants employing twenty or more workers and not using electricity or employing ten or more workers and using electricity.

Figure 14: Distribution of prices and employees for firms that produce “Finished Cotton Cloth”
This figure shows the histogram of average price charged (left) and the number of employees (right) by each firm that produces “finished cotton cloth” in the Annual Survey of Industries.

Consistent with the main argument in our paper, we find substantial correlation between these two measures. Figure 15 shows that, on average, larger cloth firms charge a higher price (we plot the residuals for the natural logarithm of employees in a firm and for the natural logarithm of product price after controlling for the interaction of state and rural fixed effects). Moreover, consistent with our earlier argument, on average, larger firms also employ significantly more capital (figure 16), suggesting that higher quality products require more capital investment.

Figure 15: Finished cotton cloth: Relationship between size of firm and prices charged
This figure shows the binned scatterplot and line of best fit for the relationship between the price charged and the number of employees in each firm in the finished cotton cloth industries using the ASI. We plot the residuals for log(“price”) and log(“labor employed”) after controlling for state and rural fixed effects.

Figure 16: Finished cotton cloth: Relationship between size of firm and capital stock
This figure shows the binned scatterplot and line of best fit for the relationship between the price charged and the number of employees in each firm in the finished cotton cloth industries using the ASI. We plot the residuals for log(“price”) and log(“labor employed”) after controlling for state and rural fixed effects.

To provide more color beyond what is available in the survey data, we can examine the largest cloth manufacturing companies in India. The largest primarily textile company with financial information on Dun & Bradstreet (a commercial data provider with a large Indian presence) is Vardhman Textiles Limited, with more than $1 billion in sales (2024). Given the size of the company, it retails products at many different price points. However, a close examination of a single line of products (men’s formal shirts) suggests a higher-end product, with a back-of-the-envelope calculation for the average shirt price of $8.40 (which is likely the sale price to other intermediaries; the final retail price will be considerably higher).42 Other evidence that is consistent with a higher-end product:

• Large capital expenditures: the latest available audited accounts list US $450 million in fixed assets (“property, plant, and machinery”). Therefore, it has substantial capital relative to total sales (over 30 percent). Moreover, the machinery suggests high-quality equipment, with many of these machines imported from more economically advanced countries.

• The company boasts of several textile certifications on its website, such as OEKO-Tex Standard 100. This is consistent with the results in Verhoogen [2008], who proxies product quality using international certifications.43

• Customers include high-end global brands (such as “United Colors of Benetton”) and Indian domestic brands such as “Raymond.”

42 Specifically, the 2023-2024 annual report details revenue of INR 1.12 billion and 1.5 million garments produced.

43 The OEKO-Tex Standard 100 is a label for textiles tested for harmful substances, https://www.oekotex.com/en/our-standards/oeko-tex-standard-100. Items bearing the STANDARD 100 label are certified as having passed safety tests for the presence of harmful substances.

C. Could higher opportunity cost cause richer households to pay more for the same good?

In section (2.1) we showed that richer households buy goods at a higher unit price, which is consistent with the hypothesis that they buy higher quality goods. However, as documented by Aguiar and Hurst [2007], households might be paying different prices for similar goods because households with higher opportunity cost of time tend to shop around less for lower prices. If richer households have a higher opportunity cost of time, then the findings in Table 1 might be a result of less time spent shopping by richer households and not because of purchase of higher quality goods.44

44 For developing countries, there is evidence that poorer households might in fact be paying more for the same product as opposed to rich households, which would imply that the estimates for β are a lower bound for the quality-income relation. For example, Attanasio and Frayne [2006] find that poor people in rural Colombia are less likely to avail of bulk discounts and thus end up paying more for the same product as compared to richer households.

The 2003 Consumer Expenditure Survey asked each individual in the household the main activity they were engaged in (whether they were employed, studying, attending to domestic duties, retired, etc.).45 We use this to construct a proxy variable which takes value 1 if the household has at least one member between the age of 15 and 70 who is only attending to domestic duties or is retired, and 0 otherwise.46 We interpret households with a non-worker present as households with low opportunity cost of time and include this variable as a control in the regressions.

45 Unfortunately, the 2004-05 Consumer Expenditure Survey does not ask this question, so this exercise cannot be conducted using the same data used in Table 1. The 2003 survey has only one fourth the number of households as the 2004-05 survey. However, the point estimates for the elasticity of price with respect to per-capita expenditure (β) are quite similar across the two surveys.

46 Table 19 in the appendix lists the possible responses for the question regarding main activity of the individual. People who reported codes 92, 93, 94, or 97 were classified as non-workers.

Column 1 of Table 14 repeats the regression from Column 1 of Table 1, but with the 2003 data instead of the 2004-05 data. Column 2 of Table 14 adds the measure of “non-worker present” as an additional control. Although the coefficient on the “non-worker present” variable is positive, the key point is that the coefficient of per-capita expenditure does not change substantially. Column 3 also includes the interaction of the “non-worker present” variable with per-capita expenditure, and this does not change the results substantially either. Columns 4, 5, and 6 repeat the regressions from columns 1, 2, and 3 respectively, but restrict the sample to include households with one or two members only. This controls for the fact that larger households are more likely to have non-working adults. Again, the coefficient on per-capita expenditure does not change substantially when including the “non-worker present” variable as a control.

Table 14: Household Regressions: Controlling for Opportunity Cost of Time

Dependent Variable: log(price)
                                  (1)        (2)        (3)        (4)        (5)        (6)
log(per-capita expenditure)    0.102***   0.102***   0.094***   0.105***   0.104***   0.099***
                               (0.0010)   (0.0010)   (0.0015)   (0.0029)   (0.0029)   (0.0036)
non-worker present                        0.020***   -0.059***             0.017***   -0.065**
                                          (0.0011)   (0.0113)              (0.0027)   (0.0318)
(non-worker present)*pce                             0.012***                         0.011**
                                                     (0.0017)                         (0.0045)
Household Size                    All        All        All      1 and 2    1 and 2    1 and 2
Observations                   1,822,762  1,822,762  1,822,762   219,390    219,390    219,390
Number of products                169        169        169        169        169        169
Clusters                        41,013     41,013     41,013      6,161      6,161      6,161

Notes: The data is from the Consumer Expenditure Survey of 2003. Column 1 reports results for the regression of log of price paid by households for different goods on log of per-capita expenditure (replicating Column 1 of Table 1). Column 2 includes a control for opportunity cost of time, namely a variable which takes value 1 if there is at least one non-working adult in the household. Column 3 also includes the interaction of this variable with per-capita expenditure. Columns 4, 5, and 6 repeat the specifications in 1, 2, and 3 but restrict the sample to households of size 1 and 2 only. Regressions include the triple interaction of good interacted with state and rural-urban fixed effect. Standard errors are clustered at the household level. *** p<0.01, ** p<0.05.

D. Inter-State Trade

The model presented in the main paper implicitly assumed that each state in India can be treated as a closed economy and that differences in income levels across states translate into differences in demand and in the size distribution at the state level. How would the possibility of inter-state trade affect the hypothesis presented in the paper?

A potential confounding effect of inter-state trade could come through the location choice of large plants. For example, if the richer states are more suited for operating large plants (due to availability of skilled labor, better labor laws, etc.), then all the larger plants might choose to locate in these states and ship their goods to the poor states. In this case, the fact that richer states have a smaller share of employment in small plants would not reflect differences in demand across states but rather just the spatial location choice of large plants.

To address this concern, it would be ideal to have a measure of inter-state trade flows (similar to the Commodity Flow Survey in the US) to see how important this channel could be. Unfortunately, data on the extent of inter-state trade is not collected in India. Here we provide indirect evidence to suggest that inter-state trade is not the principal driver of cross-state employment differences.

Firstly, transportation costs in developing countries are often very high, which makes it harder for plants to transport goods over large distances to poorer states. Atkin and Donaldson [2012] show that intranational transportation costs in two African countries are seven to fifteen times larger than similar estimates for the US. Furthermore, Hillberry and Hummels [2008] show that even in the US, manufacturing production is extremely localized, with local shipment volumes being three times larger than shipments to more distant locations. This suggests that local demand is likely to be a predominant determinant of the size distribution in any region, especially in developing countries.

Furthermore, if inter-state trade is driving the cross-state employment differences, then we would expect more tradable industries to exhibit larger differences in share of employment in small plants across states as compared to less tradable industries.
To test this, we construct two measures of tradability (within manufacturing) at the 3-digit level of the National Industrial Classification (NIC) of 2004.47 These are:

1. Herfindahl index of geographical concentration in the US: The County Business Patterns Database of 2005 released by the United States Census Bureau provides information regarding the number of people working in each 6-digit industry of the North American Industry Classification System (NAICS) for each county in the US.48 As the tradability index is to be applied to the Indian industry classification, we first create a concordance from 6-digit NAICS to 3-digit NIC and then construct a Herfindahl Index (H-index) of geographical concentration of each 3-digit NIC across US counties.49 The H-index is defined as

$$H_i = \sum_{c=1}^{C} \left( sh^{L}_{i,c} \right)^2,$$

where ‘i’ indexes industry (according to NIC), ‘c’ indexes counties, and $sh^{L}_{i,c}$ represents the share of industry ‘i’ employment which is in county ‘c’. The H-index for industry ‘i’ is simply the sum across counties of the square of the share of the industry’s employment which is present in county ‘c’. The industries which are highly concentrated in a few counties in the US (have a high value for the Herfindahl index) are considered to be tradable industries, while industries which have employment spread over lots of counties (have a low value for the Herfindahl index) are considered non-tradable industries. This measure for tradability of an industry based on US levels of concentration is applied to India.

2. Degree of international trade in India: For each 3-digit NIC in the manufacturing sector, we construct a measure of the degree of international trade carried out in the industry as a share of domestic production. In particular, we define this measure of international trade as the exports plus imports in that industry as a share of gross production of that industry carried out by domestic plants in 2005-06. The data for exports and imports for India is taken from the website of the Department of Commerce, Government of India.50 The imports and exports data is not at the industry level but rather classified according to the Harmonized Commodity Description and Coding System (HS) product classification. This is converted to 3-digit NIC using the products to industry concordance developed by World Integrated Trade Solutions (WITS).51 The data on gross domestic production for each industry is computed by combining the ASI and the SUM. Industries in which international trade is a large percent of domestic production are considered to be more tradable.

47 Economic activity in India is classified according to the National Industrial Classification (NIC), which closely follows the United Nation’s International Standard Industrial Classification (ISIC). Details regarding different NIC revisions and the concordance used between them are given in Appendix E.3.

48 The data can be found at http://www.census.gov/econ/cbp/. The exact number of people in many industry-county cells is masked. Instead, the dataset reports an employment size class for that cell. In these cases, the employment in the cell is assigned the midpoint of the size class reported.

49 The concordance from 6-digit NAICS to 3-digit ISIC Rev 3.1 was based on the Census Bureau’s concordance file available at http://www.census.gov/eos/www/naics/concordances/concordances.html. ISIC Rev 3.1 to NIC 2004 is a one to one correspondence at the 3-digit level. Appendix E.1 has more details regarding the concordance.

50 The data is available from http://commerce.nic.in/eidb/default.asp.

51 WITS is based on a collaboration of the World Bank with UNCTAD, WTO and other international organizations associated with international trade data. More details regarding the concordance can be found in Appendix E.2.

Table 21 in the appendix lists the 3-digit industries which lie above and below the median value of the two indexes of tradability. The two measures of tradability are weakly positively correlated, with the rank correlation coefficient between them being 0.25.

We run regressions of the form

$$sd_{i,s,t} = \alpha_{i,t} + \alpha_{s,t} + \gamma \ln(SNDP_{s,t}) \ast tradability_i + \varepsilon_{i,s,t} \qquad (14)$$

where $sd_{i,s,t}$ is the share of employment in plants of size five or less in industry ‘i’ in state ‘s’ at time ‘t’, $SNDP_{s,t}$ is the per-capita NDP of state ‘s’ at time ‘t’, and $tradability_i$ is a dummy variable which takes value 1 if an industry is classified as tradable. $\alpha_{i,t}$ represents fixed effects for industry interacted with time, and it controls for the fact that different industries might have different average levels for the share of employment in small plants. $\alpha_{s,t}$ represents fixed effects for state interacted with time and controls for the fact that rich states on average have a lower share of employment in small plants.

The coefficient of interest is γ, the coefficient on the interaction of state per-capita income and the tradability dummy. A positive γ implies that the relationship between the share of employment in small plants and log of per-capita income across states is stronger for non-tradables. This is because the share of employment in small plants and per-capita NDP are negatively related, and therefore a positive interaction term implies that the slope for tradable industries is less negative compared to non-tradables. Therefore, a positive value of γ is supportive of the view that inter-state trade is not a major driving force behind the size distribution of plants across states.

An industry is classified as tradable if the tradability index for the industry lies above the median (or in the top quartile) of the index across industries. Data for five waves of the SUM is combined with the corresponding year of the ASI (1989, 1994, 2000, 2005, and 2010). Only the fifteen large Indian states mentioned in footnote 39 are included, as the smaller states often have no observations for many industries at the 3-digit level.

Table 15: Size Income Relation Across States for Tradables vs. Non-tradables

Dependent Variable: share of employment in <=5 in industry ‘i’, state ‘s’, time ‘t’
                                        (1)        (2)        (3)        (4)
log(per-capita SNDP)*tradability      0.068*     0.052     -0.010      0.000
                                     (0.0351)   (0.0394)   (0.0469)   (0.0498)
Index                                 H-index    H-index    Exp-Imp    Exp-Imp
Cutoff                                Median     Quartile   Median     Quartile
Observations                          3,885      1,826      3,899      1,959

Notes: The data is from five rounds of the ASI and SUM. The table reports regression results for the share of employment in plants of size 5 or less in industry ‘i’ in state ‘s’ at time ‘t’ on log per-capita state NDP interacted with a dummy which takes value 1 if industry ‘i’ is classified as a tradable industry. Column 1 classifies an industry as tradable if the Herfindahl Index across US counties for the industry was above the median of Herfindahl Indexes, and non-tradable if it was below the median. Column 2 uses top and bottom quartiles of the Herfindahl Index as cutoffs. Columns 3 and 4 use the tradability index based on Indian exports and imports and use the median and the top and bottom quartiles as cutoffs respectively. All regressions include fixed effects for industry interacted with time and state interacted with time. Each observation is weighted by the share of observations in the state-industry cell out of the total observations in the ASI and SUM combined for the given year. Standard errors are clustered at the state level. * p<0.1.

Table 15 reports results for equation (14) for both the measures of tradability. Each observation is weighted by the share of observations in the state-industry cell out of the total observations in the ASI and SUM combined for the given year.52 Column 1 uses the Herfindahl index and classifies an industry as tradable if its Herfindahl Index is above the median value of the Herfindahl Index across industries. The coefficient on the interaction of per-capita NDP and the tradability index is positive and marginally significant at the 10 percent level. Column 2 classifies an industry as tradable if it is in the top quartile in terms of the Herfindahl Index and non-tradable if it is in the bottom quartile. The results are very similar to the first column. Columns 3 and 4 use the median and quartile of the tradability measure based on exports and imports in India. The point estimates of the coefficient on the interaction of per-capita NDP and the tradability index are much smaller in absolute value and statistically insignificant.

52 This weighting scheme accounts for the fact that the size distribution variable (dependent variable) for some industry-state pairs is based on a lot fewer observations than other cells, and is therefore likely to be measured with less precision. Table 22 in the appendix reports results when all observations are weighted equally. Table 23 reports results when industrial categories which are residual categories (industry categories with descriptions which include words like “not elsewhere covered” or “others”) are excluded.
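As a concrete illustration of the first tradability measure used in Table 15, the Herfindahl index of geographical concentration and the median cutoff can be sketched as below. This is only a sketch under stated assumptions: the function names and the toy employment cells are made up, and treating an industry that sits exactly at the median as non-tradable is an assumption, since the text only speaks of industries above and below the median.

```python
from collections import defaultdict
from statistics import median


def herfindahl_by_industry(cells):
    """cells: iterable of (industry, county, employment) tuples.

    Returns {industry: sum over counties of squared employment shares}.
    """
    by_industry = defaultdict(lambda: defaultdict(float))
    for industry, county, employment in cells:
        by_industry[industry][county] += employment

    h_index = {}
    for industry, counties in by_industry.items():
        total = sum(counties.values())
        if total > 0:
            h_index[industry] = sum((emp / total) ** 2 for emp in counties.values())
    return h_index


def tradable_dummy(h_index):
    """1 if an industry's H-index is strictly above the cross-industry median, else 0."""
    cutoff = median(h_index.values())
    return {industry: int(h > cutoff) for industry, h in h_index.items()}


# Toy data (made up): industry "A" is concentrated, "B" and "C" are spread out.
cells = [
    ("A", "c1", 90), ("A", "c2", 10),
    ("B", "c1", 50), ("B", "c2", 50),
    ("C", "c1", 60), ("C", "c2", 40),
]
h = herfindahl_by_industry(cells)
print(h)
print(tradable_dummy(h))
```

In the toy data, industry "A" has shares 0.9 and 0.1, so its index is 0.81 + 0.01 = 0.82, the highest of the three, and it is the only industry flagged as tradable.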

The results in Table 15 suggest that the size-income relationship across states is not stronger for tradable industries as compared to non-tradable industries.

E. Inter-State Trade: Concordances

E.1. NAICS 2002 to NIC 2004 Concordance for Herfindahl Index

Section D uses a Herfindahl Index of employment concentration across US counties as a measure of tradability for industries in India. While the US County Business Patterns Database uses NAICS to classify economic activity, the Herfindahl Index needs to be based on the Indian classification of economic activity (NIC 2004) for it to be applied using Indian data. In order to construct this index, we created a concordance between 6-digit NAICS 2002 and 3-digit ISIC Rev 3.1 (the classification used by the ASI and SUM is the NIC 2004, which is a one to one match to ISIC Rev 3.1 at the 3-digit level).

The concordance between 6-digit NAICS and 3-digit ISIC Rev 3.1 was based on the Census Bureau’s concordance file available at http://www.census.gov/eos/www/naics/concordances/concordances.html. Although this file gives a many to many concordance, this was reduced to a one to one concordance by taking the 3-digit ISIC which was the closest fit for each 6-digit NAICS.

Of the 59 3-digit ISIC industries in the manufacturing sector, three industries (182, 231, and 233) were not represented in this concordance, i.e., none of the 6-digit NAICS industries were mapped into these 3-digit ISIC industries. These industries employed only 0.16 percent of the total manufacturing workforce in India in 2005. These industries are dropped for all the analysis which uses the Herfindahl Index.

E.2. HS Product Classification to NIC 2004 Concordance for Export-Import Index

Section D also uses a measure of international trade as a proportion of domestic production in India at the 3-digit level for NIC 2004. The export and import data for India was not at the industry level but rather at the product level using the Harmonized Commodity Description and Coding System (HS). The World Integrated Trade Solution (WITS) provides a one to one concordance from HS 2002 to ISIC Rev 3. WITS is based on a collaboration of the World Bank with UNCTAD, WTO and other international organizations associated with international trade data.53

53 The concordance can be found at http://wits.worldbank.org/wits/product_concordance.html.

Two NIC 04 industries (223 and 273) were not represented in this concordance, i.e., none of the HS codes mapped into these industries. These industries employed only 0.37 percent of the total manufacturing workforce in India in 2005. These industries are dropped for all the analysis which uses the Export-Import Index. Furthermore, industry 233 (nuclear fuel) had some imports in the trade data but no local production in India. This industry was also dropped.

E.3. Concordances Across Different NIC Revisions

Different years of the ASI and SUM use different revisions of the NIC. The 1989 and 1994 surveys use NIC87, the 2000 surveys use NIC98, the 2005 surveys use NIC04, and the 2010 surveys use NIC08. We create a concordance from the different NIC revisions to NIC04 at the 3-digit level, as the tradability indexes are constructed for NIC04. The concordances were based on official concordance tables which can be found at http://mospi.nic.in under the “Economic and Social Classification Heading”.

The NIC04 industries 341 (Manufacturing of motor vehicles) and 342 (Manufacture of bodies of motor vehicles, trailers, and semi-trailers) cannot be separately identified in NIC87. Hence, these two industries are merged into one industry group for all the tradability regressions.

F. Units Misreporting Problem in the ASI

As mentioned in footnote 10, there seems to be a misreporting of units and quantities in the ASI. We discuss an example here to clarify the problem. ASICC code 11401 stands for “milk”. All plants who report that they produce milk are supposed to report the quantity produced in kiloliters (1000 liters), which should mean that when we divide rupee values by the quantity, it should yield the price of milk that the plant charges per kiloliter. Figure 17 plots the log of price charged for milk by different plants in the ASI against the log of the number of employees in the plant. As can be seen, the log of the price charged by most plants is about ten. However, there is a group of plants who report a price which is seven log points lower, or about 1000 times lower (exp(7) ≈ 1,097). This is clearly a case of some plants reporting quantities in liters instead of kiloliters, which makes the price computed a price per liter.

Such misreporting can potentially bias the results from regressions of price on size if larger plants are more likely to misreport quantities in terms of larger units. To account for this problem, we manually go through about 1,000 product categories to see which product categories suffer from this problem. We split products which suffer from this problem into two separate product categories based on a sensible price cutoff (for the milk example, all plants charging a log price greater than six were placed in a different product category).54 As a different product fixed effect is allowed for this new product category, the regressions control for the price level differences arising from misreporting of quantity units. However, the clustering when computing standard errors does not treat the new product category as a separate category, which is why the number of product fixed effects exceeds the number of clusters in these regressions.

Table 24 compares the results when the units correction described above is implemented versus when it is not implemented. Column 1 is the same as the first column of Table 3 (it corrects for the units problem). Column 2 repeats the regression but does not split products with the units problem into different categories. As can be seen, the price elasticity with respect to employment is smaller when the units problem is corrected, implying that the misreporting of units is correlated with size.

In addition to the manual correction, we also implement an algorithm which identifies product categories for which units have been potentially misreported.
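As a rough illustration of the manual correction described above, the sketch below relabels plants whose log price lies above a chosen cutoff into a separate product category, so that a separate product fixed effect can absorb the level difference. The function name, the "_hi" label suffix, and the three prices are all made up for illustration; only the milk code 11401 and the log-price cutoff of six come from the text.

```python
import math


def split_by_log_price(plants, product_code, log_price_cutoff):
    """Relabel plants above the cutoff into a separate product category.

    plants: list of (plant_id, price) pairs for one product.
    Returns {plant_id: category_label}; the "_hi" suffix is a made-up label.
    """
    labels = {}
    for plant_id, price in plants:
        above = math.log(price) > log_price_cutoff
        labels[plant_id] = f"{product_code}_hi" if above else str(product_code)
    return labels


# Seven log points correspond to a factor of roughly 1,000.
print(round(math.exp(7)))  # 1097

# Made-up prices: two plants near log price 10, one near log price 3.
milk = [("p1", 22000.0), ("p2", 21000.0), ("p3", 21.0)]
labels = split_by_log_price(milk, 11401, log_price_cutoff=6.0)
print(labels)
```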
The algorithm consists of the following steps:

1. If the maximum price reported for a product is less than 50 times the minimum price, then the product is classified as one with no units misreporting.

2. We first arrange prices in ascending order within a product category. If there are two consecutive prices which differ by at least a factor of 20, and the average price above the jump is between 500 and 2000 times the average price below the jump, then the product is classified as one with a units misreporting problem and is split into two product categories.

3. We run regressions of log of price on log of employment with a dummy which takes value 1 for all observations below a given price. That is, if a product category has 50 plants producing it, we run 50 separate regressions: in the first, the dummy only takes the value 1 for the lowest price plant; in the next regression, the dummy takes value 1 for the two lowest prices; and so on. We then compare the highest R-square that we get with the dummies (within the product category) with the R-square when we run a regression of log of price on log of employment with no dummy. If the difference in R-square is more than 0.75, and the mean price above the dummy cutoff (for the highest R-square) is at least 300 times the average price below the dummy cutoff, then the product is classified as one with units misreporting and is split into two product categories.

When implementing the algorithm instead of the manual correction, the elasticity of price to size is 0.1037 in the ASI. The result is not very sensitive to using slightly different thresholds in the three stages of the algorithm. For example, changing the threshold for the R-square in step 3 to 0.8 and 0.7 changes the estimated elasticity to 0.1091 and 0.0986 respectively.

We also implement the algorithm for the input prices regressions, and the results are similar to the ones with the manual correction.

54 While in the milk example presented here, the units problem and the appropriate price cutoff was obvious, for some other products the problem is harder to clearly identify. In these cases we use our judgment to decide on the price cutoff.

G. Calibrating Production Parameters - $\theta_{qn}$

This section provides more details on the calibration of $\theta_{qn}$, the share of unskilled workers in the intermediate producers’ production function. As mentioned in the paper, $\theta_{qn}$ is chosen to match the wage premium and the ratio of unskilled to skilled workers for different qualities relative to the lowest quality level.

The target for the wage premium is obtained by running a Mincerian-type regression using the Employment-Unemployment Survey of 2004-5.
Each individual is asked to report the main activities he or she undertook in the last seven days. Individuals can report multiple activities, and report if they were involved in the activity with "full intensity" or "half intensity". The wages earned in the last week are reported for all activities separately (if that activity generated wages). The average wage earned by each individual is computed by dividing the total wage earned for each activity over the last seven days by the number of intensity-days worked in that activity (summing across days and treating full intensity as 1 day and half intensity as 0.5 days).

The wage premium for skilled workers is computed by running a regression of log of wages on a dummy which takes the value 1 if the worker is skilled (ten or more years of education), controlling for potential experience (age minus years of education minus four) and its square, and dummies for each 4-digit industry, 2-digit occupation, state, sector (urban or rural), and sex. We restrict the sample to workers reporting their industry as manufacturing (2-digit NIC between 15 and 36) and to individuals between the ages of 15 and 65.

Column 1 of Table 20 reports the results for the regression. The coefficient on the dummy which takes value 1 if a person is classified as skilled is 0.45, implying a wage premium of 56.8 percent (exp(0.45) − 1 ≈ 0.568), which is rounded up to 60 percent when calibrating the model.

Calibrating $\theta_{qn}$ also requires $N-1$ ratios for equation 12, the unskilled to skilled ratio for different qualities relative to the lowest quality. As mentioned in the paper, size categories reported in the Employment-Unemployment survey are very coarse, and therefore cannot be used to compute eleven ratios for equation 12 for eleven different quality (size) levels. Instead the relation between the size of an establishment and the ratio of unskilled to skilled workers is extrapolated based on the values reported in Table 11. The table reports that plants of size five or less have an unskilled to skilled ratio of 5.05 while plants of size 5 to 20 have an unskilled to skilled ratio of 2.92.
These two points are used to extrapolate the unskilled to skilled ratio for larger sized plants, with the ratio taking a minimum value of 0.5 (hire twice as many skilled as compared to unskilled workers). These extrapolated values are used to compute equation 12 for different quality levels given the average size of each quality level.
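The arithmetic behind these calibration targets can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' procedure: the function names are hypothetical, the worker in the example is invented, and the paper does not state the functional form of the extrapolation. The sketch assumes a straight line in log size through the two reported points, and it assumes representative sizes of 3 and 12 employees for the two size bins.

```python
import math

def wage_per_intensity_day(total_wage, intensities):
    """Average daily wage in one activity over the reference week.

    `intensities` has one entry per day worked in the activity:
    1.0 for full intensity and 0.5 for half intensity.
    """
    intensity_days = sum(intensities)
    return total_wage / intensity_days

def wage_premium(log_coefficient):
    """Convert a coefficient from a log-wage regression into a wage ratio."""
    return math.exp(log_coefficient)

def extrapolated_ratio(size, size_lo, ratio_lo, size_hi, ratio_hi, floor=0.5):
    """Unskilled-to-skilled ratio at `size`, extrapolated through two points.

    ASSUMPTION: the line is drawn in log(size); the paper does not specify
    the functional form. The result is bounded below by `floor`.
    """
    slope = (ratio_hi - ratio_lo) / (math.log(size_hi) - math.log(size_lo))
    value = ratio_lo + slope * (math.log(size) - math.log(size_lo))
    return max(floor, value)

# Hypothetical worker: 3 full-intensity days, 2 half-intensity days,
# and 400 in wages for the activity, so 400 / 4 intensity-days.
daily = wage_per_intensity_day(400.0, [1.0, 1.0, 1.0, 0.5, 0.5])

# Coefficient from Column 1 of Table 20.
premium = wage_premium(0.45)

# ASSUMPTION: sizes 3 and 12 stand in for the "five or less" and
# "5 to 20" bins; the ratios 5.05 and 2.92 are the values cited from Table 11.
small = extrapolated_ratio(3, 3, 5.05, 12, 2.92)
large = extrapolated_ratio(500, 3, 5.05, 12, 2.92)
```

Under these assumptions the extrapolated ratio falls quickly with size, so the floor of 0.5 binds for large plants; a different functional form would change where the floor starts to bind but not the two anchor values.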

H. Additional figures and tables

Table 16: Plants with more capital use more expensive input goods

                                      (1)               (2)               (3)               (4)
                                      log(input price)  log(input price)  log(input price)  log(input price)
Machinery Capital/Labor ratio         0.019***
                                      (0.0056)
Machinery Investment/Labor ratio                        0.018
                                                        (0.012)
Total Capital/Labor ratio                                                 0.013***
                                                                          (0.0040)
Total Capital Investment/Labor ratio                                                        0.014*
                                                                                            (0.0079)
Adjusted R2                           0.892             0.892             0.893             0.893
Winsor                                Yes               Yes               Yes               Yes
Observations                          105197            105197            106749            106749
State x Rural x Product FE            Yes               Yes               Yes               Yes
Number of Products                    2190              2190              2190              2190
SE clusters:                          Product           Product           Product           Product
Number of Clusters                    1485              1485              1487              1487
Sample                                ASI               ASI               ASI               ASI

Standard errors in parentheses. * p<0.10, ** p<0.05, *** p<0.01

This table examines whether plants with a higher capital to labor ratio use more expensive inputs. Column 1 (3) regresses log input prices on the ratio of the stock of machinery capital (total capital) to labor. Column 2 (4) regresses log input prices on the ratio of machinery investment (total investment) to labor. To prevent our results being distorted by outliers, the input prices are winsorized at the 1 percent level. Each regression includes the triple interaction of state, rural, and input product fixed effects. We cluster our standard errors for each input product.

Table 17: Summary Statistics: ASI and SUM

          Annual Survey of Industries         Survey of Unorg. Manufacturing
          Observations  Plants   Employment   Observations  Plants   Employment
1989-90   45            88       6,999        94            13,279   26,968
1994-95   52            105      7,853        156           12,114   29,924
2000-05   30            119      7,762        220           16,994   37,016
2005-06   42            125      8,811        82            17,037   36,376
2009-10   41            144      11,506       98            17,211   34,910

Notes: All numbers are in thousands ('000). The data is from the Annual Survey of Industries and the Survey of Unorganized Manufacturing for five different years. The row labeled 2009-10 reports results for the ASI of 2009-10 but the SUM of 2010-11. The column "Observations" reports the total number of observations surveyed in the year. The "Plants" and "Employment" columns report the total plants and the total employment in these plants after taking into account the survey weights provided with the surveys. Four states were excluded due to lack of coverage in some years of the ASI. Only plants which reported industries which corresponded to the 2-digit NIC 2004 classification between 15 and 36 were included.

Table 18: Summary Statistics: Consumer Expenditure Survey

                           Items   Share of Expenses   Items with Prices   Share with Prices
Food                       161     0.499               156                 0.980
Fuel and Light             13      0.094               10                  0.932
Clothing and Footwear      27      0.075               24                  0.967
Other goods and services   86      0.288               0                   0.000
Durables                   52      0.044               19                  0.449

Notes: The data is from the Consumer Expenditure Survey conducted by the NSS in 2004-05. The rows represent broad expenditure categories. The column "Items" gives the number of distinct goods in the category for which consumption was reported. "Share of Expenses" gives the share of total expenditure that was devoted to the particular expenditure category when summing over all households. "Items with Prices" reports the number of items in the category for which values and quantities were reported, allowing calculation of prices. "Share with Prices" reports the share of expenditure within the category which was devoted to items for which the price could be computed.

Table 19: Main Activity of Individual - 2003 Consumer Expenditure Survey

Code   Description
11     Worked in hh enterprise (self-employed): own account worker
12     Worked in hh enterprise (self-employed): employer
21     Worked as helper in hh enterprise
31     Worked as regular salaried/wage employee
41     Worked as casual wage labor: in public works
51     Worked as casual wage labor: in other types of work
81     Did not work but was seeking and/or available for work
91     Attended educational institution
92     Attended domestic duties only
93     Domestic duties and engaged in free collection of goods, sewing, tailoring, etc. for household use
94     Rentiers, pensioner, remittance recipients etc
95     Not able to work due to disability
96     Beggars, prostitutes
97     Others

Notes: The 2003 consumer expenditure survey asks each individual in the household to report their main activity during the year. The table lists the different activities which the individuals could report. People who reported codes 92, 93, 94, or 97 were classified as non-workers and households which had at least one person between the age of 15 and 70 who was classified as a non-worker were considered to have low opportunity cost of time.

Table 20: Wage Premium from Employment-Unemployment Survey

Dependent variable: log(wage)

                (1)        (2)
skilled         0.450***   0.445***
                (0.0150)   (0.0147)
Wage Premium    1.568      1.560
Winsorize 1%               Y
Observations    11,003     11,003

Notes: The data is from the Employment-Unemployment Survey of 2004-05. Column 1 reports results for the regression of log of wages earned by an individual on a dummy which takes value 1 if the individual has 10 or more years of education. Column 2 winsorizes 1 percent tails of wages. All regressions include controls for potential experience (age minus years of schooling minus 4) and its square, and dummies for each 4-digit NIC industry, 2-digit occupation, state, sector (urban or rural), and sex. The wage premium implied by the coefficient estimate for skilled is given in the row labeled "Wage Premium". Robust standard errors are reported. *** p<0.01.

Table 21: Ranking of Industries Based on Tradability Index

Industries Below Median (Non-tradable)
  Herfindahl Index:    151, 152, 153, 154, 155, 171, 201, 202, 210, 221, 222, 241, 242, 251, 252, 261, 269, 272, 273, 281, 289, 291, 292, 311, 312, 343, 361, 369
  Export-Import Index: 152, 153, 154, 155, 160, 171, 182, 201, 202, 210, 222, 231, 251, 252, 269, 271, 281, 293, 311, 313, 314, 315, 341, 342, 343, 352, 359, 361

Industries Above Median (Tradable)
  Herfindahl Index:    160, 172, 173, 181, 191, 192, 223, 232, 243, 271, 293, 300, 313, 314, 315, 319, 321, 322, 323, 331, 332, 333, 341, 342, 351, 352, 353, 359
  Export-Import Index: 151, 172, 173, 181, 191, 192, 221, 232, 241, 242, 243, 261, 272, 289, 291, 292, 300, 312, 319, 321, 322, 323, 331, 332, 333, 351, 353, 369

Notes: The table lists the 3-digit industries (NIC04) which fall above and below the median for the two tradability indexes.

Table 22: Size Income Relation Across States for Tradables vs. Non-tradables: No Weighting

Dependent Variable: share of employment in <=5 in industry ‘i’, state ‘s’, time ‘t’

                                    (1)        (2)        (3)        (4)
log(per-capita SNDP)*tradability    0.015      -0.026     -0.043     -0.006
                                    (0.0376)   (0.0705)   (0.0278)   (0.0415)
Index                               H-index    H-index    Exp-Imp    Exp-Imp
Cutoff                              Median     Quartile   Median     Quartile
Observations                        3,885      1,826      3,899      1,959

Notes: The data is from five rounds of the ASI and SUM. The table reports regression results for the share of employment in plants of size 5 or less in industry ‘i’ in state ‘s’ at time ‘t’ on log per-capita state NDP interacted with a dummy which takes value 1 if industry ‘i’ is classified as a tradable industry and 0 if it is classified as non-tradable. Column 1 classifies an industry as tradable if the Herfindahl Index across US counties for the industry was above the median of Herfindahl Indexes, and non-tradable if it was below the median. Column 2 uses top and bottom quartiles of the Herfindahl Index as cutoffs. Columns 3 and 4 use the tradability index based on Indian exports and imports and use the median and the top and bottom quartiles as cutoffs respectively. All regressions include fixed effects for industry interacted with time and state interacted with time. No weights are applied to the observations in the regressions. Standard errors are clustered at the state level.

Table 23: Size Income Relation Across States for Tradables vs. Non-tradables: Exclude “NEC” and “Others”

Dependent Variable: share of employment in <=5 in industry ‘i’, state ‘s’, time ‘t’

                                    (1)        (2)        (3)        (4)
log(per-capita SNDP)*tradability    0.050      0.036      0.002      -0.006
                                    (0.0399)   (0.0525)   (0.0500)   (0.0549)
Index                               H-index    H-index    Exp-Imp    Exp-Imp
Cutoff                              Median     Quartile   Median     Quartile
Observations                        3,219      1,531      3,233      1,593

Notes: The data is from five rounds of the ASI and SUM. Residual industries (with words "NEC" or "other") are removed. The table reports regression results for the share of employment in plants of size 5 or less in industry ‘i’ in state ‘s’ at time ‘t’ on log per-capita state NDP interacted with a dummy which takes value 1 if industry ‘i’ is classified as a tradable industry and 0 if it is classified as non-tradable. Column 1 classifies an industry as tradable if the Herfindahl Index across US counties for the industry was above the median of Herfindahl Indexes, and non-tradable if it was below the median. Column 2 uses top and bottom quartiles of the Herfindahl Index as cutoffs. Columns 3 and 4 use the tradability index based on Indian exports and imports and use the median and the top and bottom quartiles as cutoffs respectively. All regressions include fixed effects for industry interacted with time and state interacted with time. Each observation is weighted by the share of observations in the state-industry cell out of the total observations in the ASI and SUM combined for the given year. Standard errors are clustered at the state level.

Table 24: Units Misreporting Problem in the ASI

Dependent Variable: log(output price)

                               (1)        (2)        (3)        (4)
log(labor)                     0.096***   0.155***   0.106***   0.125***
                               (0.0087)   (0.0125)   (0.0133)   (0.0152)
Units Problem Accounted For    Y          N          Y          N
Sample                         ASI        ASI        Both       Both
Observations                   46,704     46,704     75,161     75,161
Number of products             1,217      1,077      3,181      3,041
Number of clusters             1,078      1,078      3,042      3,042

Notes: The data is from the ASI and SUM of 2005-06. All columns report results for regressions of log of price charged by plants for their products on log of number of employees hired by the plant. Columns 1 and 2 restrict the sample to the ASI alone while columns 3 and 4 combine the ASI and the SUM. Columns 1 and 3 implement the manual units correction (same as reported in main text) while columns 2 and 4 do not correct for misreporting of units. 1 percent tails of prices (within a product) and plant size are winsorized. All regressions include product fixed effects and state times urban-rural fixed effects. Standard errors are clustered at the product level. *** p<0.01.

Figure 17: Price Charged for Milk by Different Plants Against Size
[Scatter plot. Vertical axis: Log of Price (ticks from 2 to 10). Horizontal axis: Log of Employment (ticks from 2 to 7).]
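The gap visible in Figure 17 is the pattern that steps 1 and 2 of the units-misreporting algorithm look for. A minimal Python sketch of those two steps follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `find_units_split` and the toy prices are hypothetical, prices are assumed to be positive and non-empty, and when several consecutive gaps qualify the sketch uses the largest one, a choice the text does not specify. The thresholds (50, 20, 500, 2000) are the ones given in the steps.

```python
def find_units_split(prices, min_spread=50, min_jump=20, lo=500, hi=2000):
    """Return the index in the sorted prices at which to split, or None.

    Implements steps 1 and 2 only. ASSUMPTION: if several consecutive
    gaps qualify, the largest one is used.
    """
    ordered = sorted(prices)
    # Step 1: a narrow price range means no units misreporting.
    if ordered[-1] < min_spread * ordered[0]:
        return None
    # Step 2: look for a jump of at least `min_jump` between neighbours.
    best_index, best_ratio = None, 0.0
    for i in range(1, len(ordered)):
        ratio = ordered[i] / ordered[i - 1]
        if ratio >= min_jump and ratio > best_ratio:
            best_index, best_ratio = i, ratio
    if best_index is None:
        return None
    below = ordered[:best_index]
    above = ordered[best_index:]
    mean_ratio = (sum(above) / len(above)) / (sum(below) / len(below))
    # The averages on either side must differ by a factor consistent
    # with a thousand-fold units error.
    if lo <= mean_ratio <= hi:
        return best_index
    return None

# Toy data (hypothetical): a milk-like product with a thousand-fold gap,
# a product with a narrow price range, and a product with a large jump
# that is too small to be a liters-versus-kiloliters error.
milk_like = find_units_split([22000.0, 25000.0, 21000.0, 22.0, 24.0])
narrow = find_units_split([10.0, 12.0, 15.0, 11.0])
small_jump = find_units_split([1.0, 2.0, 100.0, 120.0])
```

Only the first toy product is flagged for a split. The third has a jump well above 20 but an average ratio far below 500, so it is left as a single category at this stage.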
