feds · June 1, 2017

Comparing Cross-Country Estimates of Lorenz Curves Using a Dirichlet Distribution Across Estimators and Datasets

Abstract

Chotikapanich and Griffiths (2002) introduced the Dirichlet distribution to the estimation of Lorenz curves. This distribution naturally accommodates the proportional nature of income share data and the dependence structure between the shares. Chotikapanich and Griffiths (2002) fit a family of five Lorenz curves to one year of Swedish and Brazilian income share data using unconstrained maximum likelihood and unconstrained non-linear least squares. We attempt to replicate the authors' results and extend their analyses using both constrained estimation techniques and five additional years of data. We successfully replicate a majority of the authors' results and find that some of their main qualitative conclusions also hold using our constrained estimators and additional data.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Andrew C. Chang, Phillip Li, and Shawn M. Martin

2017-062

Please cite this paper as: Chang, Andrew C., Phillip Li, and Shawn M. Martin (2017). “Comparing Cross-Country Estimates of Lorenz Curves Using a Dirichlet Distribution Across Estimators and Datasets,” Finance and Economics Discussion Series 2017-062. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2017.062.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Andrew C. Chang∗ Phillip Li† Shawn M. Martin‡

April 18, 2017

JEL Codes: C24; C51; C87; D31

Keywords: Constrained Estimation; Dirichlet; Gini Coefficient; Income Distribution; Lorenz Curve; Maximum Likelihood; Non-linear Least Squares; Replication; Share Data

∗Chang: Board of Governors of the Federal Reserve System. 20th St. NW and Constitution Ave., Washington DC 20551 USA. +1 (657) 464-3286. a.christopher.chang@gmail.com. https://sites.google.com/site/andrewchristopherchang/.
†Li: Office of Financial Research. phil.li@gmail.com.
‡Martin: University of Michigan, Ann Arbor. smm332@georgetown.edu.
§The views and opinions expressed here are those of the authors and are not necessarily those of the Board of Governors of the Federal Reserve System, Department of the Treasury, or the Office of Financial Research. We thank conference participants at the 2015 International Atlantic Economic Conference, especially Tiago Pires and Keshab Bhattarai, for helpful comments. We are responsible for any errors.

Introduction

The Lorenz curve is a commonly used tool to illustrate income distributions and income inequality. It is constructed by relating ordered cumulative proportions of income to ordered cumulative population shares. The curve is then used to estimate income inequality measures, such as the Gini coefficient or Atkinson’s inequality measure. Unfortunately, estimates of inequality from Lorenz curves can depend crucially on distributional assumptions, functional form assumptions, and estimation methodologies (Cheong, 2002; Chotikapanich and Griffiths, 2002, 2005; Abdalla and Hassan, 2004). Therefore, the literature proposes different functional forms and re-parameterizations for both the Lorenz curve and income distributions.¹ Estimation is commonly based on least squares techniques, with more recent studies using Bayesian and maximum likelihood estimation.²

¹ For example, Kakwani (1980), Rasche et al. (1980), Ortega et al. (1991), Chotikapanich (1993), Sarabia et al. (1999, 2001, 2005), Rohde (2009), Helene (2010), and Wang and Smyth (2015).
² See Chotikapanich and Griffiths (2002, 2008), Hasegawa and Kozumi (2003).

We have three main objectives in this paper. For our first objective, we attempt a narrow replication of Chotikapanich and Griffiths (2002), hereafter CG, who propose using a Dirichlet distribution to model cumulative income share data. The Dirichlet distribution naturally accommodates the proportional nature and dependence structure of income share data, which are characteristics of income share data that often lack recognition (Chotikapanich and Griffiths, 2002). CG estimate five Lorenz curves using both maximum likelihood (ML) and non-linear least squares (NL) on one year of Brazilian and Swedish data, obtaining implied Gini coefficients. CG have three main findings: (1) the point estimates of the parameters and of the Gini coefficients are generally insensitive to the choice of Lorenz curve specification and estimator, (2) the standard errors are sensitive to the specification and estimator, and (3) ML under the Dirichlet distributional assumption performs better than NL for all Lorenz curve specifications. We replicate a majority of CG’s three main findings. For less parameterized Lorenz curves, our point estimates and standard errors match CG. We experience considerable instability in estimating the more parameterized Lorenz curves, consistent with CG. Our successful narrow replication contributes to the current push for replication and robustness in economics research (Chang and Li, 2015; Welch, 2015; Zimmermann, 2015).

Our second objective is to extend CG by using constrained estimators. We apply constrained maximum likelihood (CML) and constrained non-linear least squares (CNL) to the same functional forms and data as CG. We use constrained estimators because the parameters from the Lorenz curve specifications in CG should be constrained to ensure that the curves are invariant to increasing convex exponential and power transformations (Sarabia et al., 1999). Although these restrictions are mentioned in CG, some of CG’s estimates violate the constraints. We find that some parameter estimates differ between constrained and unconstrained estimators, but the implied Gini coefficients are similar between constrained and unconstrained estimators.

Our third objective is to fit the various Lorenz curve specifications with both constrained and unconstrained estimators on five additional years of Swedish and Brazilian income distribution data from the World Bank: data not used by CG. We find that a few of the main conclusions from CG also hold using the constrained estimators and these additional data. Similar to Abdalla and Hassan (2004), who apply the methodologies from CG to data from the Abu Dhabi Emirate and their own Lorenz curve form, we find that Gini coefficient point estimates are robust to different functional forms and estimation methods when applied to additional data.

Narrow Replication

The data are the cumulative proportions of income ($\eta_1, \eta_2, \ldots, \eta_M$ with $\eta_M = 1$) and corresponding cumulative population shares ($\pi_1, \pi_2, \ldots, \pi_M$ with $\pi_M = 1$).³ Let $q_i = \eta_i - \eta_{i-1}$ be the income shares. CG assume that $(q_1, \ldots, q_M)$ has a Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_M)$, where $\alpha_i = \lambda[L(\pi_i; \beta) - L(\pi_{i-1}; \beta)]$. $L(\cdot)$ is the Lorenz curve specification with an associated vector of unknown parameters $\beta$, and $\lambda > 0$ is an unknown scalar parameter from the Dirichlet distribution.

³ For this paper, we conduct the replications without assistance from the authors and without their code, using data from the original source (Jain, 1975). We use Matlab R2013a and Stata 13MP on the Windows 7 Enterprise (64-bit) and OS X Version 10.9.5 operating systems respectively.
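As a rough illustration of the setup above, the following Python sketch builds the income shares $q_i$ and the Dirichlet parameters $\alpha_i$ from cumulative data. The data values, the parameter values, the use of the one-parameter exponential curve that CG label $L_1$, and all function names are assumptions made for illustration, and the sketch takes $\eta_0 = \pi_0 = 0$. It is not the Matlab or Stata code used in this paper.

```python
import math


def lorenz_exponential(p, k):
    """Exponential Lorenz curve (CG's L_1): (e^{k p} - 1) / (e^k - 1), k > 0."""
    return math.expm1(k * p) / math.expm1(k)


def income_shares(eta):
    """q_i = eta_i - eta_{i-1}, taking eta_0 = 0 (an assumption of this sketch)."""
    previous = [0.0] + list(eta[:-1])
    return [e - e0 for e, e0 in zip(eta, previous)]


def dirichlet_alphas(pi, lorenz, beta, lam):
    """alpha_i = lam * [L(pi_i; beta) - L(pi_{i-1}; beta)], taking pi_0 = 0."""
    cum = [lorenz(p, *beta) for p in pi]
    previous = [0.0] + cum[:-1]  # L(0; beta) = 0 for a Lorenz curve
    return [lam * (c - c0) for c, c0 in zip(cum, previous)]


# Hypothetical quintile data, not taken from Jain (1975) or the World Bank.
pi = [0.2, 0.4, 0.6, 0.8, 1.0]
eta = [0.05, 0.15, 0.30, 0.55, 1.0]

q = income_shares(eta)
alphas = dirichlet_alphas(pi, lorenz_exponential, beta=(2.5,), lam=100.0)
print(q)
print(alphas)
```

Because a Lorenz curve satisfies $L(0) = 0$ and $L(1) = 1$, the differences telescope and the $\alpha_i$ sum to $\lambda$.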

CG apply five Lorenz curve specifications to one year of Brazilian and Swedish data:

$$L_1(\pi; k) = \frac{e^{k\pi} - 1}{e^{k} - 1}, \quad k > 0 \tag{1}$$

$$L_2(\pi; \alpha, \delta) = \pi^{\alpha}[1 - (1 - \pi)^{\delta}], \quad \alpha \ge 0,\ 0 < \delta \le 1 \tag{2}$$

$$L_3(\pi; \delta, \gamma) = [1 - (1 - \pi)^{\delta}]^{\gamma}, \quad \gamma \ge 1,\ 0 < \delta \le 1 \tag{3}$$

$$L_4(\pi; \alpha, \delta, \gamma) = \pi^{\alpha}[1 - (1 - \pi)^{\delta}]^{\gamma}, \quad \alpha \ge 0,\ \gamma \ge 1,\ 0 < \delta \le 1 \tag{4}$$

$$L_5(\pi; a, d, b) = \pi - a\pi^{d}(1 - \pi)^{b}, \quad a > 0,\ 0 < d \le 1,\ 0 < b \le 1 \tag{5}$$

Each specification is then estimated with ML based on the Dirichlet distributional assumption or with NL without the distributional assumption. Functions $L_2$ and $L_3$ are nested in function $L_4$ when $\gamma = 1$ and $\alpha = 0$, respectively. $L_5$ is the “beta” function (see Kakwani, 1980) and can yield $L_2$ when $a$ and $d$ are 1 in $L_5$ and $\alpha = 1$ in $L_2$.

The log-likelihood of the $j$-th Lorenz curve specification and the Dirichlet distribution is

$$\log[f(q \mid \theta)] = \log\Gamma(\lambda) + \sum_{i=1}^{M} \left(\lambda[L_j(\pi_i; \beta) - L_j(\pi_{i-1}; \beta)] - 1\right) \log q_i - \sum_{i=1}^{M} \log\Gamma\left(\lambda[L_j(\pi_i; \beta) - L_j(\pi_{i-1}; \beta)]\right). \tag{6}$$

ML standard errors are derived from the negative inverse of the numeric Hessian matrix evaluated at the maximum. We use the Matlab function fminunc to perform the optimizations.

The NL objective function is

$$R = \sum_{i=1}^{M} (\eta_i - L_j(\pi_i; \beta))^2. \tag{7}$$

We use the Matlab function lsqcurvefit and the Stata command nl for the optimizations. For NL, CG suggest using Newey and West (1987) standard errors.⁴

⁴ We implement nl in Stata with different lag values for the Newey-West standard errors and find that a lag of 2 matches the standard errors reported by CG. These are the standard errors we report. We use the Stata option vce(hac nwest 2) in the nl command.

Tables 1 and 2 show our narrow replication results. For Lorenz curves $L_1$ to $L_3$ and for both countries, our ML point estimates and standard errors approximately match those from CG. Our ML estimation for $L_4$ is unstable, with more stable estimation using Brazilian data than Swedish data, consistent with CG. However, the Swedish ML point estimates for $\alpha$ fluctuate around values that are often greater than CG’s estimates. When we perform ML with random starting values on Swedish data, the point estimates are similar to CG’s but the standard errors are unstable.⁵ This instability may indicate that the area around the maximum is flat, yielding point estimates and variances that are not unique (Gill and King, 2003). In addition, the numeric variance-covariance matrix evaluated at the converged values is not positive definite for over 50% of the random starting values. As a result, we do not report ML standard errors for $L_4$ with Swedish data. For $L_4$ with Brazilian data, our point estimates and standard errors approximately match those from CG.

We are unable to replicate CG’s ML results for $L_5$ for both countries, despite attempting estimation using a grid of starting values. As noted in Ortega et al. (1991) and Sarabia et al. (1999), $L_5$ can result in a negative income share $\eta_i$ for a population share $\pi_i$, leading to the difference $L_5(\pi_i; \beta) - L_5(\pi_{i-1}; \beta)$ being negative and the term $\log\Gamma(\lambda[L_5(\pi_i; \beta) - L_5(\pi_{i-1}; \beta)])$ from (6) being computationally infeasible.

We use NL for each Lorenz curve, initialized over a grid of starting values that spans the support of the parameters. We find that all Lorenz curve specifications except $L_1$ display some instability.⁶ Instability is most frequent for $L_4$ and $L_5$. However, the parameter estimates that minimize the NL objective function and the corresponding standard errors are equivalent to CG’s estimates.

For both ML and NL, we also attempt to replicate the Gini coefficient $G = 1 - 2\int_0^1 L_j(\pi; \beta)\, d\pi$, which is an income inequality measure.
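To make the two objective functions concrete, here is a minimal Python sketch of the Dirichlet log-likelihood in (6) and the NL sum of squares in (7). The function names, the choice of $L_2$ as the example curve, and the decision to return negative infinity when a curve difference is not positive are assumptions of this sketch; it does not reproduce the fminunc, lsqcurvefit, or nl routines used in the paper, and it takes $\pi_0 = 0$.

```python
import math


def lorenz_l2(p, alpha, delta):
    """CG's L_2 curve: p^alpha * [1 - (1 - p)^delta]."""
    return p ** alpha * (1.0 - (1.0 - p) ** delta)


def curve_differences(pi, lorenz, beta):
    """L(pi_i; beta) - L(pi_{i-1}; beta) for each i, taking pi_0 = 0."""
    cum = [lorenz(p, *beta) for p in pi]
    return [c - c0 for c, c0 in zip(cum, [0.0] + cum[:-1])]


def dirichlet_loglik(q, pi, lorenz, beta, lam):
    """Log-likelihood (6). Returns -inf when lam <= 0 or any curve
    difference is non-positive (a guard chosen for this sketch)."""
    diffs = curve_differences(pi, lorenz, beta)
    if lam <= 0 or any(d <= 0 for d in diffs):
        return -math.inf
    total = math.lgamma(lam)
    for qi, d in zip(q, diffs):
        total += (lam * d - 1.0) * math.log(qi) - math.lgamma(lam * d)
    return total


def nl_objective(eta, pi, lorenz, beta):
    """Sum of squared residuals (7) between cumulative income shares
    and the fitted Lorenz curve."""
    return sum((e - lorenz(p, *beta)) ** 2 for e, p in zip(eta, pi))


# Hypothetical data and parameter values, for illustration only.
pi = [0.2, 0.4, 0.6, 0.8, 1.0]
eta = [0.05, 0.15, 0.30, 0.55, 1.0]
q = [e - e0 for e, e0 in zip(eta, [0.0] + eta[:-1])]
print(dirichlet_loglik(q, pi, lorenz_l2, beta=(0.6, 0.64), lam=100.0))
print(nl_objective(eta, pi, lorenz_l2, beta=(0.6, 0.64)))
```

The guard mirrors the infeasibility described for $L_5$: a negative curve difference leaves the log-gamma term in (6) without a usable value, so a maximizer needs some rule for rejecting such parameter values.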
Following CG, we obtain point estimates of $G$ by replacing $\beta$ with the ML or NL $\hat{\beta}$s for each Lorenz curve specification. With the exception of $L_1$, we successfully replicate the Gini point estimates and standard errors for all estimation techniques and Lorenz curve specifications. Our initial inability to replicate the ML standard errors for the $L_1$ Gini coefficients led us to analytically verify the formula for the variance of the Gini coefficient, $\mathrm{var}(\hat{G})$. We find a typo in CG’s $L_1$ formula for $\mathrm{var}(\hat{G})$ but are able to replicate the ML standard errors for the Gini coefficient with our corrected formula.⁷ Also, we discover a minor computational issue in the calculation of the NL standard errors for $L_1$ by CG.⁸ We report the corrected quantities in our tables.

⁵ We use 2000 sets of random starting values from a standard normal distribution.
⁶ A majority of the parameter estimates are similar. However, some initial values lead to NL point estimates with larger residual sum of squares, and in some cases infinite Gini coefficients.
⁷ CG report $\mathrm{var}(\hat{G}) = \left[\frac{2(e^{\hat{k}}(e^{2} - \hat{k}^2 - 2) + 1)}{(\hat{k}(e^{\hat{k}} - 1))^2}\right]^2 \mathrm{var}(\hat{k})$ but we analytically find $\mathrm{var}(\hat{G}) = \left[\frac{2(e^{\hat{k}}(e^{\hat{k}} - \hat{k}^2 - 2) + 1)}{(\hat{k}(e^{\hat{k}} - 1))^2}\right]^2 \mathrm{var}(\hat{k})$.
⁸ We find that the CG standard errors for the $L_1$ NL Gini coefficient are calculated as $\mathrm{var}(\hat{G}) = \frac{\partial G}{\partial \beta'} \mathrm{var}(\hat{k})$ when the correct formula is $\mathrm{var}(\hat{G}) = \frac{\partial G}{\partial \beta'} \mathrm{var}(\hat{k}) \frac{\partial G}{\partial \beta}$. We verify this using CG’s reported Brazilian values for $\mathrm{SE}(\hat{G})$ and $\mathrm{SE}(\hat{k})$, .1647 and .6726, in the formula of $\mathrm{var}(\hat{G})$ corrected for the typo detailed in footnote 7: $.1647^2 = \frac{\partial G}{\partial \beta'} \times .6726^2 \times \frac{\partial G}{\partial \beta}$, which implies $\left[\frac{\partial G}{\partial \beta}\right]^2 = .0600$ and $\frac{\partial G}{\partial \beta} = .2449$; however, $\frac{\partial G}{\partial \beta}$ evaluated at $\hat{k}$ = .0600. Therefore the variance of $\hat{G}$ should be $\mathrm{var}(\hat{G}) = .0600 \times .6726^2 \times .0600 = .0016$ and $\mathrm{SE}(\hat{G}) = .0403$. A similar computational error occurs for Swedish data.

Similar to CG, we find that the Gini point estimates are insensitive to the choice of Lorenz curve specification and estimator, although $L_1$ fitted with Brazilian data is an exception. Given our inability to estimate $L_5$ using ML and the non-positive definite numeric Hessian for $L_4$ using Swedish data, we do not report an ML Gini coefficient for $L_5$ or standard errors of the Gini coefficient for both $L_4$ and $L_5$.

We also successfully replicate the information inaccuracy measures suggested by Theil (1967) and the likelihood ratio test (LRT) results except for $L_5$ vs. $L_2$ (with $\alpha = 1$) for Brazil.⁹ We obtained 51.355 as the test statistic compared to 31.355 from CG. Both likelihood ratio statistics, however, lead to the same conclusion that the functional form $L_2$, with $\alpha = 1$, is rejected relative to $L_5$. The $L_5$ LRT and information inaccuracy measure for $L_5$ are calculated using CG’s reported point estimates.

⁹ The Theil (1967) information inaccuracy measure, $I = \sum_{i=1}^{M} q_i \log(q_i / \hat{q}_i)$, compares actual income shares, $q_i$, to predicted income shares, $\hat{q}_i$. Smaller values of $I$ indicate a better fit.

Scientific Replication: Constrained Optimization

Although the parameters for each Lorenz curve specification in (1) to (5) should be constrained to ensure that the Lorenz curves are invariant to increasing convex exponential and power transformations, we believe that CG did not enforce the constraints as some of their estimates violate the ranges. Therefore, we reestimate the models with the constraints imposed.¹⁰ Our results are detailed in Tables 1 and 2.

¹⁰ We use the Matlab functions fmincon and lsqcurvefit. In unreported results we also attempt to use the Matlab function patternsearch to apply ML and CML. Patternsearch yields parameter estimates that are either identical to fmincon or imply a smaller log-likelihood; it also tends to be less stable than fmincon.

Point estimates for $L_1$ to $L_3$ are identical for constrained and unconstrained estimation. For $L_4$ with Swedish data, the CNL estimates deviate the most from the NL estimates. For example, the CNL estimate for $\alpha$ is close to 0 while the NL estimate is −0.7549. For $L_4$ with Brazilian data, the CNL estimates are close to the NL estimates. In terms of CML results for $L_4$, the constrained estimates and standard errors match the unconstrained quantities. Though we were unable to replicate unconstrained ML point estimates for $L_5$, with parameter constraints imposed the CML point estimates are close to the unconstrained estimates from CG; we were unable to generate standard error estimates as the numeric Hessians were quite unstable across different sets of starting values. The CNL point estimates for $L_5$ are either identical to or very close to the NL quantities.

Overall, we find that the CML and CNL estimates of the model parameters can differ from their ML and NL counterparts. However, the implied Gini coefficients are similar even when the unconstrained and constrained parameter estimates differ.

Scientific Replication: Extension to World Bank Data

We further extend CG using data from the World Bank Poverty and Equity Database (World Bank, 2015b).¹¹ We construct a dataset of seven quantiles of cumulative income shares for Brazil in 1987, 1992, 1995, 2001 and 2005 and for the equivalent years for Sweden, with 2001 replaced by 2000.¹²

¹¹ We have agreed to the terms of use as described at http://go.worldbank.org/OJC02YMLA0.
¹² World Bank Poverty and Equity Database variables used include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. Swedish data for 2001 and Brazilian data for 2000 are unavailable. At the time of submission, Swedish data for these years and variables were no longer available in the World Bank DataBank, but they are available from the authors upon request.

Tables 3 and 4 show our results using these World Bank data. Unconstrained and constrained estimation applied to World Bank data yield qualitative conclusions similar to those reported by CG, who use data from Jain (1975). With the exception of $L_4$, the point estimates of the parameters for all Lorenz curve specifications are similar across estimation techniques, but there are differences in the standard errors. ML and NL point estimates for $L_4$ differ for all years of Brazilian and Swedish World Bank data. Similar to our narrow replication, we experience the same computational instability with unconstrained ML for $L_4$ and computational infeasibility for $L_5$ with World Bank data. We employ methods from our narrow replication and constrained estimation to obtain point estimates for $L_4$ and $L_5$.

We also find that, for a given year, Gini coefficients are similar across Lorenz curve specifications and estimators, with the exception of $L_1$ with Brazilian data. Although some unconstrained parameter estimates violate the restricted ranges and are different from the constrained estimates, the estimates still yield similar point estimates of the Gini coefficients. For Brazil, ML estimation of $L_1$ results in Gini coefficients that are lower than other functional forms, and NL estimation results in higher Gini coefficients. In addition, the point estimates of the Gini coefficients obtained in our analysis are similar to those officially reported by the World Bank (see Table 5). World Bank Gini coefficients are based on the generalized quadratic and beta parameterizations of the Lorenz curves, suggested by Villasenor and Arnold (1989) and Kakwani (1980).¹³

¹³ Data are from nationally representative household surveys conducted by national statistical offices or by private agencies under the supervision of government or international agencies. Parametric Lorenz curves are used with grouped distributional data when they are expected to provide close estimates to the micro data. If estimation using parametric Lorenz curves is unlikely to work well, estimation is done directly from micro data obtained from nationally representative household surveys (World Bank, 2015a).

Table 6 compares the fit using the Theil (1967) information inaccuracy measure. Similar to CG, we find ML estimation with Swedish data provides a better fit than NL for all Lorenz curve specifications, with the largest differences observed for $L_4$ and $L_5$. CG’s conclusion is also consistent for the Swedish World Bank data with the exception of 2001 and 2005 for $L_1$ and 2005 for $L_4$. For Brazil, CG find that ML provides a better fit than NL for $L_2$, $L_3$, and $L_4$, a worse fit for $L_1$ and an equivalent fit for $L_5$. We find that NL is a better fit in 4 of the 5 years of World Bank data for $L_1$ and in all years for $L_4$, but ML is a better fit for all years with functions $L_2$, $L_3$ and $L_5$. For both $L_4$ fit to Swedish data and $L_5$ fit to Brazilian data, CNL has a smaller information measure than NL, suggesting that CNL provides a better fit relative to NL.¹⁴ A closer examination shows that for both Brazilian and Swedish data NL overpredicts $q_1$ and underpredicts $q_2$ and $q_3$, relative to $q_i$, by a larger margin than CNL.

¹⁴ The NL objective function, however, is lower for unconstrained NL.
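The fit comparison above relies on Theil's information inaccuracy measure, $I = \sum_i q_i \log(q_i / \hat{q}_i)$. A minimal Python sketch follows. It assumes that predicted shares are the first differences of the fitted Lorenz curve, $\hat{q}_i = L(\pi_i; \hat{\beta}) - L(\pi_{i-1}; \hat{\beta})$ with $\pi_0 = 0$; the data, parameter values, and function names are hypothetical and are not taken from Table 6.

```python
import math


def lorenz_l3(p, delta, gamma):
    """CG's L_3 curve: [1 - (1 - p)^delta]^gamma."""
    return (1.0 - (1.0 - p) ** delta) ** gamma


def predicted_shares(pi, lorenz, beta):
    """Assumed construction: q_hat_i = L(pi_i) - L(pi_{i-1}), with pi_0 = 0."""
    cum = [lorenz(p, *beta) for p in pi]
    return [c - c0 for c, c0 in zip(cum, [0.0] + cum[:-1])]


def theil_inaccuracy(q, q_hat):
    """I = sum_i q_i * log(q_i / q_hat_i); smaller values indicate a better fit."""
    return sum(qi * math.log(qi / qh) for qi, qh in zip(q, q_hat))


# Hypothetical observed quintile shares and parameter values.
pi = [0.2, 0.4, 0.6, 0.8, 1.0]
q = [0.08, 0.13, 0.17, 0.23, 0.39]
q_hat = predicted_shares(pi, lorenz_l3, beta=(0.75, 1.3))
print(theil_inaccuracy(q, q_hat))
```

When both share vectors sum to one, the measure is zero only if the predicted shares equal the actual shares and is positive otherwise, so it can rank two estimators on the same data.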

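The corrected delta-method calculation for the $L_1$ Gini standard error, discussed in the narrow replication, can be checked numerically. The sketch below uses the closed form $G = 1 - 2/k + 2/(e^k - 1)$, which follows from integrating $L_1$ in $G = 1 - 2\int_0^1 L_1(\pi; k)\, d\pi$, and the corrected derivative from footnote 7. The inputs are the NL estimates of $k$ and their standard errors from the "our results" columns of Tables 1 and 2; the function names are ours and the code is an illustrative check rather than the paper's implementation.

```python
import math


def gini_l1(k):
    """Closed-form Gini for L_1: G = 1 - 2/k + 2/(e^k - 1)."""
    return 1.0 - 2.0 / k + 2.0 / math.expm1(k)


def dgini_dk(k):
    """dG/dk = 2 (e^k (e^k - k^2 - 2) + 1) / (k (e^k - 1))^2,
    the bracketed term of the corrected formula in footnote 7."""
    ek = math.exp(k)
    return 2.0 * (ek * (ek - k * k - 2.0) + 1.0) / (k * (ek - 1.0)) ** 2


def gini_se(k, se_k):
    """Delta method with a scalar parameter: SE(G) = |dG/dk| * SE(k)."""
    return abs(dgini_dk(k)) * se_k


# (k_hat, SE(k_hat)) for the NL fits of L_1, as reported in Tables 1 and 2.
sweden = (2.5029, 0.0825)
brazil = (5.3685, 0.6726)

for name, (k_hat, se_k) in (("Sweden", sweden), ("Brazil", brazil)):
    print(name, round(gini_l1(k_hat), 4), round(gini_se(k_hat, se_k), 4))
```

With these inputs the sketch agrees with the NL Gini estimates and corrected standard errors reported for $L_1$ in Tables 1 and 2 to within rounding.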
Conclusion

Our narrow replication of CG verifies a majority of their results. However, we discover a few minor computational and presentational issues in CG. These issues do not affect CG’s qualitative conclusions.

Our scientific replication extends the analysis from CG to constrained estimators and additional data. We conclude that some of the qualitative results from CG also hold with constrained estimators and additional data. Some of our constrained parameter estimates are different from CG’s corresponding unconstrained estimates. However, the Gini coefficient estimates from both sets of estimates are similar.

Although we have explored different functional forms and estimators for modeling Lorenz curves, it is difficult for us to make a sweeping recommendation as to which estimator and functional form researchers should use. However, if only the Gini coefficient matters, and not the fit of actual income shares, then we feel the parsimonious $L_1$ is the best option. $L_1$’s implied Gini coefficient is relatively, though not completely, invariant to estimator choice and is also stable across starting values.

References

Abdalla, I. M., Hassan, M. Y., 2004. Maximum likelihood estimation of Lorenz curves using alternative parametric model. Metodoloski Zvezki 1 (1), 109–118.
Chang, A. C., Li, P., 2015. Is economics research replicable? Sixty published papers from thirteen journals say “usually not”. Finance and Economics Discussion Series 2015-083. Washington: Board of Governors of the Federal Reserve System.
Cheong, K. S., 2002. An empirical comparison of alternative functional forms for the Lorenz curve. Applied Economics Letters 9 (3), 171–176.
Chotikapanich, D., 1993. A comparison of alternative functional forms for the Lorenz curve. Economics Letters 41 (2), 129–138.
Chotikapanich, D., Griffiths, W. E., 2002. Estimating Lorenz curves using a Dirichlet distribution. Journal of Business & Economic Statistics 20 (2), 290–295.
Chotikapanich, D., Griffiths, W. E., 2005. Averaging Lorenz curves. The Journal of Economic Inequality 3 (1), 1–19.
Chotikapanich, D., Griffiths, W. E., 2008. Estimating Income Distributions Using a Mixture of Gamma Densities. Springer.

Gill, J., King, G., 2003. Numerical issues involved in inverting Hessian matrices. Numerical Issues in Statistical Computing for the Social Scientist.
Hasegawa, H., Kozumi, H., 2003. Estimation of Lorenz curves: a Bayesian nonparametric approach. Journal of Econometrics 115 (2), 277–291.
Helene, O., 2010. Fitting Lorenz curves. Economics Letters 108 (2), 153–155.
Jain, S., 1975. Size distribution of income: A compilation of data.
Kakwani, N., 1980. On a class of poverty measures. Econometrica 48 (2), 437–446.
Newey, W. K., West, K. D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix.
Ortega, P., Martin, G., Fernandez, A., Ladoux, M., Garcia, A., 1991. A new functional form for estimating Lorenz curves. Review of Income and Wealth 37 (4), 447–452.
Rasche, R., et al., 1980. Functional forms for estimating the Lorenz curve: Comment. Econometrica 48 (4), 1061–62.
Rohde, N., 2009. An alternative functional form for estimating the Lorenz curve. Economics Letters 105 (1), 61–63.
Sarabia, J. M., Castillo, E., Pascual, M., Sarabia, M., 2005. Mixture Lorenz curves. Economics Letters 89 (1), 89–94.
Sarabia, J.-M., Castillo, E., Slottje, D. J., 1999. An ordered family of Lorenz curves. Journal of Econometrics 91 (1), 43–60.
Sarabia, J.-M., Castillo, E., Slottje, D. J., 2001. An exponential family of Lorenz curves. Southern Economic Journal 67 (3), 748–756.
Theil, H., 1967. Economics and Information Theory. North-Holland Amsterdam.
Villasenor, J., Arnold, B. C., 1989. Elliptical Lorenz curves. Journal of Econometrics 40 (2), 327–338.
Wang, Z., Smyth, R., 2015. A hybrid method for creating Lorenz curves. Economics Letters 133, 59–63.
Welch, I., 2015. Plausibility. Available at SSRN 2570577.
World Bank, 2015a. Povcalnet: an online analysis tool for global poverty monitoring. http://iresearch.worldbank.org/PovcalNet, accessed 2015-08-05.
World Bank, 2015b. Poverty & equity data. http://data.worldbank.org/data-catalog/poverty-and-equity-database, accessed 2015-06-04.
Zimmermann, C., 2015. On the need for a replication journal. FRB St Louis Paper No. FEDLWP2015-016.

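Tables

Table 1: Sweden estimates using data from Jain (1975)ᵃ

Column groups: “CG” is Chotikapanich & Griffiths; “Ours” is Our Unconstrained & Constrained Results. Standard errors are in parentheses below the estimates.

| Curve | Estimator | CG α | CG δ | CG γ | CG Gini | Ours α | Ours δ | Ours γ | Ours Gini |
|---|---|---|---|---|---|---|---|---|---|
| $L_2$ | NL | .5954 | .6352 | | .3880 | .5954 | .6352 | | .3880 |
| | | (.0136) | (.0052) | | (.0013) | (.0136) | (.0052) | | (.0013) |
| | ML | .6068 | .6412 | | .3872 | .6068 | .6412 | | .3872 |
| | | (.0206) | (.0085) | | (.0041) | (.0206) | (.0084) | | (.0040) |
| $L_3$ | NL | | .7269 | 1.5602 | .3871 | | .7269 | 1.5602 | .3871 |
| | | | (.0032) | (.0076) | (.0007) | | (.0032) | (.0076) | (.0007) |
| | ML | | .7335 | 1.5767 | .3877 | | .7335 | 1.5767 | .3877 |
| | | | (.0072) | (.0176) | (.0036) | | (.0081) | (.0190) | (.0038) |
| $L_4$ | CNL | – | – | – | – | .0000 | .7269 | 1.5602 | .3871 |
| | NL | -.7552 | .7931 | 2.2893 | .3864 | -.7549 | .7931 | 2.2890 | .3865 |
| | | (.5638) | (.0366) | (.5458) | (.0000) | (.5643) | (.0366) | (.5462) | (.0006) |
| | CML | – | – | – | – | .0050 | .7330 | 1.5720 | .3877 |
| | ML | .0048 | .7330 | 1.5721 | .3876 | .0045 | .7330 | 1.5724 | .3877 |
| | | (.6612) | (.0756) | (.6369) | (.0036) | – | – | – | – |
| $L_1$ | parameters | k | | | Gini | k | | | Gini |
| | NL | 2.5029 | | | .3792 | 2.5029 | | | .3792 |
| | | (.0826) | | | (.0292) | (.0825) | | | (.0103) |
| | ML | 2.5313 | | | .3828 | 2.5313 | | | .3828 |
| | | (.1831) | | | (.0228) | (.1830) | | | (.0228) |
| $L_5$ | parameters | a | d | b | Gini | a | d | b | Gini |
| | NL | .7664 | .9397 | .5929 | .3876 | .7664 | .9397 | .5929 | .3876 |
| | | (.0148) | (.0138) | (.0108) | (.0010) | (.0148) | (.0138) | (.0108) | (.0011) |
| | CML | – | – | – | – | .7492 | .9200 | .5862 | .3870 |
| | | – | – | – | – | – | – | – | – |
| | ML | .7492 | .9199 | .5862 | .3870 | – | – | – | – |
| | | (.0143) | (.0093) | (.0109) | (.0031) | – | – | – | – |

ᵃ ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.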

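Table 2: Brazil estimates using data from Jain (1975)ᵇ

Column groups: “CG” is Chotikapanich & Griffiths; “Ours” is Our Unconstrained & Constrained Results. Standard errors are in parentheses below the estimates.

| Curve | Estimator | CG α | CG δ | CG γ | CG Gini | Ours α | Ours δ | Ours γ | Ours Gini |
|---|---|---|---|---|---|---|---|---|---|
| $L_2$ | NL | .5727 | .2876 | | .6361 | .5727 | .2876 | | .6361 |
| | | (.0223) | (.0019) | | (.0012) | (.0223) | (.0019) | | (.0012) |
| | ML | .5270 | .2857 | | .6326 | .5270 | .2857 | | .6326 |
| | | (.0383) | (.0053) | | (.0052) | (.0382) | (.0053) | | (.0052) |
| $L_3$ | NL | | .3782 | 1.4357 | .6328 | | .3782 | 1.4357 | .6328 |
| | | | (.0038) | (.0127) | (.0010) | | (.0038) | (.0127) | (.0010) |
| | ML | | .3721 | 1.4160 | .6325 | | .3721 | 1.4160 | .6325 |
| | | | (.0068) | (.0225) | (.0040) | | (.0069) | (.0228) | (.0039) |
| $L_4$ | CNL | – | – | – | – | .2170 | .3467 | 1.2674 | .6340 |
| | NL | .2169 | .3467 | 1.2674 | .6339 | .2169 | .3467 | 1.2674 | .6340 |
| | | (.1950) | (.0289) | (.1473) | (.0013) | (.1954) | (.0289) | (.1474) | (.0013) |
| | ML | .0262 | .3683 | 1.3950 | .6325 | .0262 | .3683 | 1.3950 | .6326 |
| | | (.2148) | (.0318) | (.1734) | (.0039) | (.2229) | (.0330) | (.1800) | (.0039) |
| $L_1$ | parameters | k | | | Gini | k | | | Gini |
| | NL | 5.3685 | | | .6368 | 5.3685 | | | .6368 |
| | | (.6726) | | | (.1647) | (.6726) | | | (.0403) |
| | ML | 3.8438 | | | .5234 | 3.8438 | | | .5234 |
| | | (.8237) | | | (.0747) | (.8237) | | | (.0747) |
| $L_5$ | parameters | a | d | b | Gini | a | d | b | Gini |
| | CNL | – | – | – | – | .9150 | 1.0000 | .2698 | .6349 |
| | NL | .9151 | 1.0001 | .2698 | .6349 | .9151 | 1.0001 | .2698 | .6349 |
| | | (.0030) | (.0024) | (.0016) | (.0003) | (.0030) | (.0024) | (.0016) | (.0003) |
| | CML | – | – | – | – | .9131 | .9991 | .2685 | .6350 |
| | | – | – | – | – | – | – | – | – |
| | ML | .9131 | .9990 | .2685 | .6349 | – | – | – | – |
| | | (.0044) | (.0024) | (.0021) | (.0013) | – | – | – | – |

ᵇ ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.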

Table 3: Sweden Estimates using World Bank Dataᶜ

Standard errors are in parentheses below the estimates.

| Curve | Estimator | 1987 α | 1987 δ | 1987 γ | 1987 Gini | 1992 α | 1992 δ | 1992 γ | 1992 Gini | 1995 α | 1995 δ | 1995 γ | 1995 Gini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $L_2$ | NL | .3128 | .7802 | | .2366 | .3279 | .7563 | | .2539 | .3429 | .7628 | | .2550 |
| | | (.0097) | (.0051) | | (.0011) | (.0085) | (.0042) | | (.0010) | (.0051) | (.0025) | | (.0006) |
| | ML | .3202 | .7849 | | .2365 | .3346 | .7601 | | .2539 | .3461 | .7648 | | .2549 |
| | | (.0189) | (.0116) | | (.0051) | (.0169) | (.0098) | | (.0044) | (.0107) | (.0062) | | (.0028) |
| $L_3$ | NL | | .8209 | 1.3060 | .2365 | | .8017 | 1.3192 | .2538 | | .8091 | 1.3342 | .2548 |
| | | | (.0047) | (.0086) | (.0010) | | (.0037) | (.0072) | (.0008) | | (.0018) | (.0037) | (.0004) |
| | ML | | .8257 | 1.3145 | .2368 | | .8059 | 1.3271 | .2542 | | .8112 | 1.3386 | .2551 |
| | | | (.0111) | (.0186) | (.0050) | | (.0096) | (.0166) | (.0044) | | (.0060) | (.0105) | (.0027) |
| $L_4$ | CNL | .0000 | .8209 | 1.3060 | .2365 | .0000 | .8017 | 1.3192 | .2538 | .0001 | .8091 | 1.3342 | .2548 |
| | NL | -3.3747 | .9403 | 4.6606 | .2364 | -1.0974 | .8780 | 2.4016 | .2536 | -.4101 | .8452 | 1.7376 | .2547 |
| | | (9.1283) | (.1068) | (9.1052) | (.0014) | (1.9457) | (.0819) | (1.9253) | (.0008) | (.6439) | (.0450) | (.6328) | (.0004) |
| | CML | .0000 | .8257 | 1.3145 | .2367 | .0000 | .8059 | 1.3271 | .2542 | .0547 | .8051 | 1.2849 | .2551 |
| | ML | .0000 | .8257 | 1.3145 | .2367 | .0000 | .8059 | 1.3271 | .2542 | .0551 | .8051 | 1.2846 | .2551 |
| | | – | (.0111) | (.0186) | – | – | (.0096) | (.0166) | – | – | – | – | – |
| $L_1$ | parameters | k | | | Gini | k | | | Gini | k | | | Gini |
| | NL | 1.4565 | | | .2346 | 1.5746 | | | .2522 | 1.5790 | | | .2528 |
| | | (.0428) | | | (.0064) | (.0530) | | | (.0078) | (.0487) | | | (.0072) |
| | ML | 1.4691 | | | .2365 | 1.5803 | | | .2530 | 1.5873 | | | .2541 |
| | | (.1295) | | | (.0195) | (.1404) | | | (.0207) | (.1366) | | | (.0202) |
| $L_5$ | parameters | a | d | b | Gini | a | d | b | Gini | a | d | b | Gini |
| | NL | .4744 | .8576 | .6715 | .2368 | .5010 | .8650 | .6481 | .2541 | .5073 | .8628 | .6588 | .2552 |
| | | (.0188) | (.0254) | (.0210) | (.0010) | (.0160) | (.0206) | (.0168) | (.0008) | (.0076) | (.0099) | (.0077) | (.0004) |
| | CML | .4605 | .8337 | .6616 | .2363 | .4891 | .8455 | .6401 | .2538 | .5001 | .8518 | .6535 | .2550 |

Continued on next page

ᶜ We use income data from the World Bank Poverty and Equity Database (http://data.worldbank.org/data-catalog/poverty-and-equity-database). The variables include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.

Table 3: Sweden Estimates using World Bank Dataᵈ

Continued from previous page

Standard errors are in parentheses below the estimates.

| Curve | Estimator | 2001 α | 2001 δ | 2001 γ | 2001 Gini | 2005 α | 2005 δ | 2005 γ | 2005 Gini |
|---|---|---|---|---|---|---|---|---|---|
| $L_2$ | NL | .2949 | .7021 | | .2743 | .2802 | .7183 | | .2605 |
| | | (.0055) | (.0024) | | (.0007) | (.0055) | (.0026) | | (.0007) |
| | ML | .2962 | .7035 | | .2739 | .2807 | .7196 | | .2599 |
| | | (.0084) | (.0044) | | (.0022) | (.0074) | (.0040) | | (.0020) |
| $L_3$ | NL | | .7495 | 1.2831 | .2741 | | .7623 | 1.2702 | .2603 |
| | | | (.0017) | (.0033) | (.0004) | | (.0019) | (.0034) | (.0004) |
| | ML | | .7511 | 1.2859 | .2741 | | .7635 | 1.2720 | .2602 |
| | | | (.0041) | (.0073) | (.0020) | | (.0034) | (.0058) | (.0016) |
| $L_4$ | CNL | .0000 | .7495 | 1.2830 | .2741 | .0000 | .7623 | 1.2702 | .2603 |
| | NL | -.2552 | .7798 | 1.5305 | .2739 | -.4223 | .8075 | 1.6819 | .2602 |
| | | (.2821) | (.0287) | (.2730) | (.0004) | (.3089) | (.0263) | (.3015) | (.0003) |
| | CML | .0000 | .7511 | 1.2859 | .2742 | .0000 | .7635 | 1.2720 | .2602 |
| | ML | .0000 | .7511 | 1.2859 | .2742 | .0000 | .7635 | 1.2720 | .2602 |
| | | – | (.0041) | (.0073) | – | – | (.0034) | (.0058) | – |
| $L_1$ | parameters | k | | | Gini | k | | | Gini |
| | NL | 1.7263 | | | .2744 | 1.6295 | | | .2603 |
| | | (.0892) | | | (.0129) | (.0801) | | | (.0118) |
| | ML | 1.6940 | | | .2697 | 1.6043 | | | .2566 |
| | | (.1681) | | | (.0244) | (.1539) | | | (.0227) |
| $L_5$ | parameters | a | d | b | Gini | a | d | b | Gini |
| | NL | .5175 | .8865 | .5844 | .2745 | .4965 | .8833 | .5972 | .2607 |
| | | (.0094) | (.0118) | (.0095) | (.0005) | (.0089) | (.0113) | (.0096) | (.0004) |
| | CML | .5121 | .8771 | .5815 | .2743 | .4928 | .8761 | .5959 | .2605 |

ᵈ We use income data from the World Bank Poverty and Equity Database (http://data.worldbank.org/data-catalog/poverty-and-equity-database). The variables include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.

Table 4: Brazil Estimates using World Bank Data [e]

| Function | Estimator | 1987 α | 1987 δ | 1987 γ | 1987 Gini | 1992 α | 1992 δ | 1992 γ | 1992 Gini | 1995 α | 1995 δ | 1995 γ | 1995 Gini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L2 | NL | .8719 (.0528) | .3684 (.0064) | | .6028 (.0027) | .8555 (.0379) | .4627 (.0068) | | .5357 (.0023) | .8382 (.0535) | .3642 (.0065) | | .6020 (.0028) |
| L2 | ML | .8124 (.0675) | .3667 (.0111) | | .5970 (.0083) | .8216 (.0466) | .4619 (.0105) | | .5314 (.0065) | .7834 (.0665) | .3632 (.0109) | | .5962 (.0083) |
| L3 | NL | | .5009 (.0055) | 1.7035 (.0205) | .5997 (.0012) | | .5952 (.0048) | 1.7369 (.0160) | .5336 (.0010) | | .4928 (.0060) | 1.6747 (.0216) | .5990 (.0013) |
| L3 | ML | | .4974 (.0074) | 1.6844 (.0256) | .5977 (.0036) | | .5938 (.0060) | 1.7276 (.0188) | .5322 (.0028) | | .4904 (.0076) | 1.6597 (.0258) | .5971 (.0037) |
| L4 | CNL | .0000 | .5009 | 1.7035 | .5997 | .0000 | .5952 | 1.7369 | .5336 | .0000 | .4928 | 1.6747 | .5990 |
| L4 | NL | -.9886 (.0494) | .5973 (.0038) | 2.5619 (.0441) | .5976 (.0002) | -1.1085 (.2198) | .6935 (.0146) | 2.7521 (.2049) | .5321 (.0002) | -1.0078 (.1827) | .5925 (.0144) | 2.5478 (.1621) | .5968 (.0004) |
| L4 | ML | .0000 (–) | .4974 (.0074) | 1.6844 (.0256) | .5978 (–) | .0000 (–) | .5938 (.0060) | 1.7276 (.0188) | .5322 (–) | .0000 (–) | .4904 (.0076) | 1.6597 (.0258) | .5971 (–) |

| Function | Estimator | 1987 k | 1987 Gini | 1992 k | 1992 Gini | 1995 k | 1995 Gini |
|---|---|---|---|---|---|---|---|
| L1 | NL | 5.1102 (.4550) | .6208 (.0293) | 4.0814 (.2837) | .5443 (.0242) | 5.1140 (.4675) | .6210 (.0301) |
| L1 | ML | 4.1721 (.6625) | .5519 (.0550) | 3.7155 (.4157) | .5116 (.0390) | 4.1368 (.6797) | .5490 (.0570) |

| Function | Estimator | 1987 a | 1987 d | 1987 b | 1987 Gini | 1992 a | 1992 d | 1992 b | 1992 Gini | 1995 a | 1995 d | 1995 b | 1995 Gini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L5 | CNL | .9688 | 1.0000 | .3629 | .6017 | .9579 | 1.0000 | .4580 | .5346 | .9618 | 1.0000 | .3578 | .6009 |
| L5 | NL | 1.0025 (.0223) | 1.0286 (.0164) | .3784 (.0108) | .6005 (.0031) | .9758 (.0217) | 1.0150 (.0155) | .4666 (.0113) | .5341 (.0012) | .9914 (.0245) | 1.0255 (.0181) | .3715 (.0120) | .5999 (.0015) |
| L5 | CML | .9622 | 1.0000 | .3624 | .5979 | .9534 | .9971 | .4586 | .5327 | .9575 | .9999 | .3587 | .5976 |

Continued on next page

[e] We use income data from the World Bank Poverty and Equity Database (http://data.worldbank.org/data-catalog/poverty-and-equity-database). The variables include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.

Table 4: Brazil Estimates using World Bank Data [f]
Continued from previous page

| Function | Estimator | 2000 α | 2000 δ | 2000 γ | 2000 Gini | 2005 α | 2005 δ | 2005 γ | 2005 Gini |
|---|---|---|---|---|---|---|---|---|---|
| L2 | NL | .8259 (.0422) | .3668 (.0053) | | .5985 (.0023) | .7159 (.0306) | .3849 (.0043) | | .5708 (.0019) |
| L2 | ML | .7868 (.0521) | .3664 (.0088) | | .5941 (.0066) | .6855 (.0400) | .3845 (.0075) | | .5670 (.0057) |
| L3 | NL | | .4942 (.0035) | 1.6662 (.0124) | .5956 (.0007) | | .5004 (.0019) | 1.5861 (.0060) | .5683 (.0005) |
| L3 | ML | | .4940 (.0047) | 1.6622 (.0157) | .5947 (.0023) | | .5000 (.0030) | 1.5826 (.0095) | .5677 (.0015) |
| L4 | CNL | .0000 | .4942 | 1.6662 | .5956 | .0000 | .5004 | 1.5861 | .5683 |
| L4 | NL | -.5239 (.1680) | .5517 (.0161) | 2.1143 (.1453) | .5943 (.0004) | -.2725 (.0772) | .5338 (.0088) | 1.8194 (.0670) | .5676 (.0002) |
| L4 | ML | .0000 (–) | .4940 (.0047) | 1.6622 (.0157) | .5947 (–) | .0000 (–) | .5000 (.0030) | 1.5826 (.0095) | .5677 (–) |

| Function | Estimator | 2000 k | 2000 Gini | 2005 k | 2005 Gini |
|---|---|---|---|---|---|
| L1 | NL | 5.0577 (.4662) | .6174 (.0304) | 4.6512 (.4410) | .5893 (.0322) |
| L1 | ML | 4.0997 (.6744) | .5459 (.0571) | 3.8252 (.6300) | .5218 (.0574) |

| Function | Estimator | 2000 a | 2000 d | 2000 b | 2000 Gini | 2005 a | 2005 d | 2005 b | 2005 Gini |
|---|---|---|---|---|---|---|---|---|---|
| L5 | CNL | .9595 | 1.0000 | .3604 | .5976 | .9294 | 1.0000 | .3740 | .5698 |
| L5 | NL | .9766 (.0204) | 1.0149 (.0152) | .3685 (.0102) | .5969 (.0012) | .9314 (.0149) | 1.0018 (.0114) | .3750 (.0079) | .5698 (.0009) |
| L5 | CML | .9554 | .9974 | .3610 | .5956 | .9178 | .9897 | .3703 | .5689 |

[f] We use income data from the World Bank Poverty and Equity Database (http://data.worldbank.org/data-catalog/poverty-and-equity-database). The variables include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.

Table 5: World Bank estimates of Gini coefficients [g]

| | 1987 | 1992 | 1995 | 2000 | 2001 | 2005 |
|---|---|---|---|---|---|---|
| WB Brazil [h] | 0.5969 | 0.5317 | 0.5957 | 0.5933 | | 0.5665 |
| Our Brazil [i] | 0.5979 | 0.5327 | 0.5976 | 0.5956 | | 0.5683 |
| WB Sweden | 0.2371 | 0.2542 | 0.2554 | | 0.2748 | 0.2608 |
| Our Sweden | 0.2365 | 0.2539 | 0.2549 | | 0.2741 | 0.2603 |

[g] We use income data from the World Bank Poverty and Equity Database (http://data.worldbank.org/data-catalog/poverty-and-equity-database). The variables include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. World Bank Gini coefficients are from Povcalnet, an online tool for poverty measurement developed by the Development Research Group of the World Bank (http://iresearch.worldbank.org/PovcalNet, World Bank (2015a)).
[h] For the above year-country combinations Povcalnet utilizes income-based data in the format of household level (‘unit record’) data.
[i] Calculated as the median of implied Gini coefficients for unconstrained and constrained ML and NL point estimates of functions L1 to L5.
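The pooling step described in the last footnote to Table 5 can be sketched as follows. The pool here is assumed to be only the twelve Gini values printed for Sweden in 2005 in Table 3. The authors' pool may also count constrained estimates that coincide with their unconstrained counterparts and are therefore not printed, so this sketch need not reproduce the Table 5 entry exactly. The variable names are illustrative.

```python
from statistics import median

# Implied Gini coefficients printed in Table 3 for Sweden, 2005,
# keyed by (Lorenz function, estimator).
sweden_2005_gini = {
    ("L1", "NL"): 0.2603, ("L1", "ML"): 0.2566,
    ("L2", "NL"): 0.2605, ("L2", "ML"): 0.2599,
    ("L3", "NL"): 0.2603, ("L3", "ML"): 0.2602,
    ("L4", "CNL"): 0.2603, ("L4", "NL"): 0.2602,
    ("L4", "CML"): 0.2602, ("L4", "ML"): 0.2602,
    ("L5", "NL"): 0.2607, ("L5", "CML"): 0.2605,
}

# With an even number of values, median() averages the two middle ones.
pooled = median(sweden_2005_gini.values())
print(f"{pooled:.5f}")
```

The result lies between .2602 and .2603, next to the 0.2603 reported for Sweden in 2005.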

Table 6: Information Inaccuracy Measures, World Bank Data [j]

| Function | Estimator | Brazil 1987 | Brazil 1992 | Brazil 1995 | Brazil 2001 | Brazil 2005 | Sweden 1987 | Sweden 1992 | Sweden 1995 | Sweden 2001 | Sweden 2005 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| L1 | ML | .043372 | .021109 | .045875 | .046204 | .044179 | .003607 | .004171 | .003947 | .005840 | .004976 |
| L1 | NL | .043296 | .021403 | .045706 | .045773 | .043774 | .003607 | .004172 | .003948 | .005835 | .004972 |
| L2 | ML | .000986 | .000568 | .000989 | .000622 | .000431 | .000260 | .000200 | .000079 | .000050 | .000040 |
| L2 | NL | .001126 | .000625 | .001109 | .000687 | .000479 | .000268 | .000207 | .000080 | .000051 | .000041 |
| L3 | ML | .000185 | .000107 | .000195 | .000074 | .000032 | .000250 | .000192 | .000075 | .000038 | .000025 |
| L3 | NL | .000204 | .000113 | .000208 | .000076 | .000033 | .000259 | .000199 | .000078 | .000040 | .000026 |
| L4 | ML | .000185 | .000107 | .000195 | .000074 | .000032 | .000250 | .000192 | .000075 | .000038 | .000025 |
| L4 | CML | .000185 | .000107 | .000195 | .000074 | .000032 | .000250 | .000192 | .000075 | .000038 | .000025 |
| L4 | NL | .000003 | .000020 | .000023 | .000037 | .000012 | .000268 | .000209 | .000082 | .000042 | .000025 |
| L4 | CNL | .000204 | .000113 | .000208 | .000076 | .000033 | .000259 | .000199 | .000078 | .000040 | .000026 |
| L5 | CML | .000517 | .000332 | .000559 | .000400 | .000214 | .000175 | .000131 | .000035 | .000046 | .000037 |
| L5 | NL | .000771 | .000538 | .000851 | .000640 | .000313 | .000225 | .000169 | .000047 | .000056 | .000044 |
| L5 | CNL | .000563 | .000345 | .000583 | .000414 | .000288 | .000225 | .000169 | .000047 | .000056 | .000044 |

[j] We use income data from the World Bank Poverty and Equity Database (http://data.worldbank.org/data-catalog/poverty-and-equity-database). The variables include income share held by lowest 10%, lowest 20%, second 20%, third 20%, fourth 20%, highest 20%, and highest 10%. The information inaccuracy measure compares actual income shares, q_i, to the predicted income shares, q̂_i. As the difference between q̂_i and q_i decreases, this measure decreases, suggesting a better fit. ‘ML’: maximum likelihood. ‘NL’: non-linear least squares. ‘CML’: constrained maximum likelihood. ‘CNL’: constrained non-linear least squares. We report constrained estimates only when they differ from the unconstrained estimates. ‘–’ represents estimates we are unable to obtain.
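The footnote to Table 6 describes the information inaccuracy measure only in words. The following is a minimal sketch, assuming the Theil-type definition I = Σ q_i ln(q_i / q̂_i), with predicted shares taken as differences of a fitted Lorenz curve at the cumulative population points. The formula, the exponential L1 form, the function names, and the "actual" shares below are all assumptions for illustration. In particular, the shares are made up and are not the World Bank data.

```python
import math


def lorenz_l1(p, k):
    """Assumed L1 form: (e^{k p} - 1) / (e^k - 1)."""
    return math.expm1(k * p) / math.expm1(k)


def predicted_shares(lorenz, cum_pop):
    """q_hat_i = L(pi_i) - L(pi_{i-1}), with pi_0 = 0."""
    edges = [0.0] + list(cum_pop)
    return [lorenz(hi) - lorenz(lo) for lo, hi in zip(edges, edges[1:])]


def information_inaccuracy(q, q_hat):
    """Assumed definition: sum of q_i * ln(q_i / q_hat_i) over groups with q_i > 0."""
    return sum(qi * math.log(qi / qh) for qi, qh in zip(q, q_hat) if qi > 0)


# Group boundaries implied by the share variables: lowest 10%, next 10%,
# second to fourth 20%, then the highest 20% split into 80-90% and the top 10%.
cum_pop = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0]

# Made-up "actual" shares for illustration only; they sum to 1.
q_illustrative = [0.04, 0.05, 0.14, 0.18, 0.23, 0.15, 0.21]

q_hat = predicted_shares(lambda p: lorenz_l1(p, 1.7263), cum_pop)
print(information_inaccuracy(q_illustrative, q_hat))
```

Under this definition the measure is zero when predicted and actual shares coincide and positive otherwise, which matches the footnote's reading that smaller values suggest a better fit.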

Cite this document

APA

Chang, A. C., Li, P., & Martin, S. M. (2017). Comparing Cross-Country Estimates of Lorenz Curves Using a Dirichlet Distribution Across Estimators and Datasets (FEDS 2017-062). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2017-062

BibTeX

@techreport{wtfs_feds_2017_062,
  author = {Andrew C. Chang and Phillip Li and Shawn M. Martin},
  title = {Comparing Cross-Country Estimates of Lorenz Curves Using a Dirichlet Distribution Across Estimators and Datasets},
  type = {Finance and Economics Discussion Series},
  number = {2017-062},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2017},
  url = {https://whenthefedspeaks.com/doc/feds_2017-062},
  abstract = {Chotikapanich and Griffiths (2002) introduced the Dirichlet distribution to the estimation of Lorenz curves. This distribution naturally accommodates the proportional nature of income share data and the dependence structure between the shares. Chotikapanich and Griffiths (2002) fit a family of five Lorenz curves to one year of Swedish and Brazilian income share data using unconstrained maximum likelihood and unconstrained non-linear least squares. We attempt to replicate the authors' results and extend their analyses using both constrained estimation techniques and five additional years of data. We successfully replicate a majority of the authors' results and find that some of their main qualitative conclusions also hold using our constrained estimators and additional data.},
}