Measuring the Informativeness of Market Statistics
Abstract
Market statistics can be viewed as noisy signals for true variables of interest. These signals are used by individual recipients of the statistics to imperfectly infer different variables of interest. This paper presents a framework under which the 'informativeness' of statistics is defined as their efficacy as the basis of such inference, and is quantified as expected distortion, a concept from information theory. The framework can be used to compare the informativeness of a set of statistics with that of another set or its theoretical limits. Also, the proposed informativeness measure can be computed as solutions to familiar problems under a range of assumptions. As an application, the measure is used to explain the difference in usage levels of temperature derivatives across different base weather stations. The informativeness measure is found to be at least as effective as city size measures in explaining the difference in usage levels.
Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C. Measuring the Informativeness of Market Statistics Kyungmin Kim 2016-076 Please cite this paper as: Kim, Kyungmin (2016). “Measuring the Informativeness of Market Statistics,” Finance and Economics Discussion Series 2016-076. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2016.076. NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.
Kyungmin Kim∗
Federal Reserve Board
September 14, 2016

∗E-mail: kyungmin.kim@frb.gov. I thank Kristin Meier for invaluable research assistance. The paper expresses solely my own view, not those of the Federal Reserve Board or the Federal Reserve System.
1 Introduction

Market statistics convey information on a large number of variables. Individual recipients use market statistics to infer their own variables of interest, but the inference is usually far from perfect. If the set of statistics included all the variables that the recipients were interested in learning about, the statistics would perfectly satisfy the recipients’ informational needs. However, the number of published statistics is typically small and not sufficient to cover all the variables of interest. Given this limitation, it is natural to ask for a measure that quantifies the informativeness of market statistics as an input for such inference.

This paper proposes a framework under which a measure of informativeness can be defined. It is based on the idea that if there are $M$ market statistics that convey information on $K$ variables of interest, it is generally not possible to recover the $K$ variables precisely from the $M$ statistics, especially if $M < K$. Then, the expected distance between the $K$ variables recovered from the $M$ statistics and the actual values of the $K$ variables can serve as a measure of informativeness. A smaller distance means that the statistics are more informative. This formulation of informativeness as the expected distance between actual variables and recovered variables is known as expected distortion in information theory.1

1 For example, Cover and Thomas (2006), chapter 10.

This definition of informativeness is distinct from two popular alternative ways to measure the quantity of information in the economics literature: price impact and entropy. Many studies have measured the change in the price of an asset in response to new information, such as changes in bond ratings (Goh and Ederington (1993) and Pinches and Singleton (1978)) and dividend announcements (Aharony and Swary (1980)). This method can be applied to the problem of this paper by measuring the impact of changes in the market statistics on each variable of interest. This paper’s contribution is to provide a framework under which the impact on multiple variables can be interpreted.

Entropy (Shannon entropy, as in Shannon (1948), or derived measures such as transfer entropy, as in Schreiber (2000)) is sometimes used as a general measure of the quantity of information conveyed by a set of random variables, for example, by Dimpfl and Peter (2012). Rate distortion theory relates entropy-related measures to informativeness, or expected distortion as it is known in information theory (see Cover and Thomas (2006), chapter 10). However, the complex form of the relationship discovered by rate distortion theory is not directly applicable within the scope of this paper.

The framework proposed by this paper consists of four elements: (i) the market statistics, (ii) the variables of interest, (iii) the recovery rule, and (iv) the distance measure. The framework can be used to evaluate a given set of statistics and to establish limits on the expected distance. Specifying the variables of interest is essential, as there is no sense in discussing informativeness without reference to which variables are being represented by the statistics. The recovery rule is restricted to be either linear or discrete, and either the $L_1$ or the $L_2$ norm is used as the distance measure. With these assumptions, the informativeness measure can be computed by routine methods.

The framework is useful both as a practical tool and as a component of empirical research. The measures can be used to compare the informativeness of different statistics and can tell how much potential improvement in informativeness is possible. Such measures can inform a decision between competing choices of statistics to publish or communicate. Also, they can be used to study how the informativeness of available statistics affects economic decisions, as a source of information or as a reference value for financial contracts. The use of informativeness measures can be readily extended to any economic problem in which the distance between actual and recovered variables matters, for example, in assessing the efficiency of insurance contracts.

As an application, this paper computes the informativeness of the temperature recordings at certain weather stations to explain the variation in the trade volume of temperature derivatives across weather stations. The informativeness measures are found to be as effective as city size measures in explaining the variation.

The rest of the paper is organized as follows: Section 2 defines the informativeness measure and discusses its theoretical limits. Section 3 shows how the measure can be computed as solutions to familiar problems, such as group means/medians, linear regression, k-means/k-medians clustering and principal component analysis (PCA), under the $L_1$ and $L_2$ norms. Section 4 applies the informativeness measure to explaining the cross-sectional variation in the trade volume of temperature derivatives across weather stations. Section 5 concludes.

2 Framework

Market statistics imperfectly communicate variables of interest. The informativeness measure represents the degree of imperfection.

Let $I_1, I_2, \ldots, I_M \in \mathbb{R}$ be $M$ market statistics and let $z_1, z_2, \ldots, z_K \in \mathbb{R}$ be $K$ variables of interest. The statistics and the variables of interest follow a static joint probability distribution, whose density is denoted by $p(I_1, \ldots, I_M, z_1, \ldots, z_K)$. Let $g: \mathbb{R}^M \to \mathbb{R}^K$ denote a recovery rule, which gives the value of $(z_1, \ldots, z_K)$ inferred from $(I_1, \ldots, I_M)$. For $i = 1, 2, \ldots, K$, let $g_i$ denote the $i$-th component of $g$, which is the recovery rule for $z_i$.

$d_i: \mathbb{R}^2 \to \mathbb{R}$, $i = 1, \ldots, K$, is the distance or penalty associated with the difference between recovered and actual $z_i$. At this point, $d_i$ can be any function, but for simplicity, $d_i = \alpha_i d$ for every $i = 1, \ldots, K$ and some metric $d: \mathbb{R}^2 \to \mathbb{R}$ that is common across the $K$ variables of interest. $\alpha_i$ is a positive real number representing the weight given to the $i$-th dimension. With these $d_i$’s, the expected distance measure, $D$, is defined as follows:

$$D = E\left[\sum_{i=1}^{K} \alpha_i\, d\big(z_i, g_i(I_1, \ldots, I_M)\big)\right]. \tag{1}$$

As usual, $E$ denotes expected value. The expected distance measure depends on three sets of parameters: $p$, the joint distribution of the $M$ market statistics and the $K$ variables of interest; $(\alpha_1, \ldots, \alpha_K, d)$, representing the penalty for an imperfect recovery; and the recovery rule $g$. Therefore, $D$ can be viewed as a function of the parameters, and written as $D(p, (\alpha_1, \ldots, \alpha_K, d), g)$.

This formulation is known as expected distortion in information theory, as mentioned in the introduction. However, this term will not be used in this paper because distortion has an unrelated established meaning in economics. Instead, the measure $D$ will be simply referred to as the informativeness measure or, sometimes, as expected distance.

The first two parameters, $p$ and $(\alpha_1, \ldots, \alpha_K, d)$, define the objective of communicating market statistics. $g$ can be interpreted either as the recovery rule that the individual recipients use or as the recovery rule that the publisher of statistics expects the recipients to use. Given this ambiguity, there are alternative ways to choose a reasonable $g$. In this paper, $g$ will be defined as the minimizer of $D$ among a class of simple functions. The following two types of functions, or rules, are considered:

(i) Linear rule: $g$ is a linear function of market statistics: for $i = 1, \ldots, K$, $g_i(I_1, \ldots, I_M) = \beta_{i,0} + \sum_{j=1}^{M} \beta_{i,j} I_j$.

(ii) Discrete rule: $M$ partitions of $\mathbb{R}$ into $n$ sets, $S(i,j)$ for $i = 1, \ldots, M$ and $j = 1, \ldots, n$, are given: for each $i$, $S(i,1), S(i,2), \ldots, S(i,n)$ form a partition of $\mathbb{R}$, and $i$ is an index that distinguishes the $M$ partitions. $g$ is constant on each set $\prod_{i=1}^{M} S(i, j_i)$ for any $j_1, j_2, \ldots, j_M \in \{1, \ldots, n\}$.

In reality, the recipients may not achieve the minimum distance under either rule because they do not know the joint distribution $p$. The measures are useful only if the recipients are ‘smart enough’ to derive a reasonably good recovery rule. The simple forms that the linear rule and the discrete rule force on $g$ partially reflect this limitation. In addition, these restrictions make $D$ easily computable.

Let $D_L$ be the minimum of $D$ under linear $g$: $D_L = \min_g D(p, (\alpha_1, \ldots, \alpha_K, d), g)$, where $g$ is a linear rule. Similarly, let $D_D$ be the minimum of $D$ under discrete $g$, with given partitions. $D_L$ and $D_D$ can be used to compare two sets of statistics in terms of their informativeness: the set of statistics with smaller $D_L$ or $D_D$ is a better basis of inference for the variables of interest.

In absolute terms, how does a set of market statistics compare with the best possible set and the worst possible set? This question is useful in thinking about how much improvement can be made by choosing more informative statistics and in interpreting the magnitude of the expected distances $D_L$ and $D_D$. With a fixed number of statistics $M$, the expected distance of the best possible statistics, $\underline{D}_L$, is simply defined as the minimum of $D$ under linear $g$ and under an arbitrary joint density $p$: $\underline{D}_L = \min_{p,g} D(p, (\alpha_1, \ldots, \alpha_K, d), g)$. Similarly, $\underline{D}_D$ is defined as the minimum of $D$ under discrete $g$ for a given partition and an arbitrary $p$.

The worst set of statistics is the set of statistics that does not help infer the variables of interest at all. This happens if the statistics are statistically independent of the variables of interest, or more simply, if the statistics are constant. This condition is equivalent to $g$ being a vector of constants. Let $D_0$ be the expected distance from using the worst set of statistics. Then,

$$D_0 = \min_{g \in \mathbb{R}^K} E\left[\sum_{i=1}^{K} \alpha_i\, d(z_i, g_i)\right]. \tag{2}$$

Proposition 1. $D_0 \ge D_L \ge \underline{D}_L$ and $D_0 \ge D_D \ge \underline{D}_D$.

By definition, $D_L \ge \underline{D}_L$ and $D_D \ge \underline{D}_D$. Also, $D_0 \ge D_L$ because a constant recovery rule is a linear recovery rule with zero slopes. Similarly, $D_0 \ge D_D$ because a constant recovery rule is a discrete recovery rule with the same value on all the $n^M$ products of partitioning sets.

Unfortunately, there is no closed-form solution for $D_L$, $D_D$, $\underline{D}_L$, $\underline{D}_D$ or $D_0$ in general, and no general computational technique to approximate the solutions exists either. However, if $d$ is linear distance (Euclidean), $d(x,y) = |x - y|$, or if $d$ is square distance, $d(x,y) = (x - y)^2$, there are known closed-form solutions or computational techniques. The next section describes them in more detail.
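As a minimal worked illustration of Proposition 1 (an example added under simplifying assumptions, not a case analyzed in the paper), take one variable of interest and one statistic, $K = M = 1$, weight $\alpha_1 = 1$, and square distance. Assume further that $z$ and $I$ have finite variances, that $\mathrm{Var}(I) > 0$, and write $\rho$ for their correlation. The best-case value is written $\underline{D}_L$ here.

```latex
\begin{align*}
D_0 &= \min_{c}\, E\,(z - c)^2 = \operatorname{Var}(z)
      && \text{(best constant: } c = E\,z\text{)} \\
D_L &= \min_{\beta_0,\beta_1} E\,(z - \beta_0 - \beta_1 I)^2
     = (1 - \rho^2)\operatorname{Var}(z)
      && \text{(least squares projection)} \\
\underline{D}_L &= 0
      && \text{(choose the statistic } I = z\text{)}
\end{align*}
```

Since $0 \le \rho^2 \le 1$, these values satisfy $\operatorname{Var}(z) \ge (1 - \rho^2)\operatorname{Var}(z) \ge 0$, which is the ordering of the proposition in this special case. A statistic that is uncorrelated with $z$ ($\rho = 0$) does no better than a constant under a linear rule.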
3 Solutions to the Minimization Problems

In this section, $(I_1, \ldots, I_M, z_1, \ldots, z_K)$ is assumed to follow a sample distribution of size $T$: given $T$ vectors in $\mathbb{R}^{M+K}$ indexed by $t$, denoted $(I_{t,1}, \ldots, I_{t,M}, z_{t,1}, \ldots, z_{t,K})$, the probability distribution of $(I_1, \ldots, I_M, z_1, \ldots, z_K)$ is the discrete distribution over the vectors $(I_{t,1}, \ldots, I_{t,M}, z_{t,1}, \ldots, z_{t,K})$ with uniform probability $1/T$. The following subsections explain how to compute $D_L$, $\underline{D}_L$, $D_D$, $\underline{D}_D$ and $D_0$.
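Under this sample distribution, the expectation in the definition of the expected distance becomes a plain average over the $T$ observations. The following is a minimal sketch, assuming $d(x,y) = |x-y|^p$ with $p \in \{1, 2\}$; the function names and the toy numbers are illustrative assumptions, not part of the paper.

```python
import numpy as np

def expected_distance(Z, Z_hat, alpha, p=2):
    """Sample analogue of the expected distance with d(x, y) = |x - y|**p.

    Z and Z_hat are T-by-K arrays of actual and recovered values;
    alpha holds the K positive weights.
    """
    Z = np.asarray(Z, dtype=float)
    Z_hat = np.asarray(Z_hat, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    per_obs = np.sum(alpha * np.abs(Z - Z_hat) ** p, axis=1)
    return float(per_obs.mean())  # uniform probability 1/T on each observation

def constant_rule_distance(Z, alpha, p=2):
    """Expected distance of the best constant rule (the worst-case benchmark)."""
    Z = np.asarray(Z, dtype=float)
    if p == 2:
        center = Z.mean(axis=0)        # the mean minimizes squared distance
    elif p == 1:
        center = np.median(Z, axis=0)  # the median minimizes absolute distance
    else:
        raise ValueError("this sketch only covers p = 1 and p = 2")
    return expected_distance(Z, np.broadcast_to(center, Z.shape), alpha, p)

# Toy data (hypothetical): T = 3 observations of K = 2 variables.
Z_toy = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]])
alpha_toy = np.array([1.0, 0.5])
print(constant_rule_distance(Z_toy, alpha_toy, p=2))  # 4.0
print(constant_rule_distance(Z_toy, alpha_toy, p=1))  # 2.0
```

Any recovery rule can be evaluated the same way by passing its fitted values as `Z_hat`.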
3.1 Computation of $D_L$

The minimum expected distance under a linear recovery rule, $D_L$, can be computed by linear regression. Let $\hat{\beta}_{i,j}$, $0 \le j \le M$, denote the coefficients of the linear regression of $z_i$ on $I_1, \ldots, I_M$. The coefficients solve the following minimization problem for each $i$:

$$\min_{\beta_0, \beta_1, \ldots, \beta_M} \sum_{t=1}^{T} d\big(z_{t,i},\ \beta_0 + \beta_1 I_{t,1} + \ldots + \beta_M I_{t,M}\big).$$

$SR_i$ is the sum of residuals transformed by $d$:

$$SR_i = \sum_{t=1}^{T} d\big(z_{t,i},\ \hat{\beta}_{i,0} + \hat{\beta}_{i,1} I_{t,1} + \ldots + \hat{\beta}_{i,M} I_{t,M}\big).$$

Proposition 2. $D_L = \frac{1}{T} \sum_{i=1}^{K} \alpha_i SR_i$.

Proof. By definition,

$$D_L = \min_{\beta_{j,k}} \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{K} \alpha_i\, d\big(z_{t,i},\ \beta_{i,0} + \beta_{i,1} I_{t,1} + \ldots + \beta_{i,M} I_{t,M}\big). \tag{3}$$

Since the term $d(z_{t,i}, \beta_{i,0} + \beta_{i,1} I_{t,1} + \ldots + \beta_{i,M} I_{t,M})$ does not include any $\beta_{j,k}$ for $j \ne i$, the minimization can be done before the summation over $i$, and the order of the summations can be switched:

$$D_L = \frac{1}{T} \sum_{i=1}^{K} \alpha_i SR_i, \tag{4}$$

where $SR_i = \min_{\beta_{i,0}, \ldots, \beta_{i,M}} \sum_{t=1}^{T} d(z_{t,i}, \beta_{i,0} + \beta_{i,1} I_{t,1} + \ldots + \beta_{i,M} I_{t,M})$. This completes the proof.

If $d(x,y) = (x-y)^2$, $SR_i$ can be simply computed from the least squares regression of $z_i$ on $I_1, \ldots, I_M$. If $d(x,y) = |x-y|$, $SR_i$ can be computed from median regression (see Portnoy and Koenker (1997), for example). More generally, for $d(x,y) = |x-y|^p$, $p \ge 1$, the linear regression is a convex optimization problem, also known as $\ell_p$ regression. For an example of computational methods to solve the optimization problem, see Dasgupta et al. (2009).

3.2 Computation of $\underline{D}_L$

Computing $\underline{D}_L$ requires finding $I_{t,i}$ that minimize $D_L$ given $z_{t,i}$. If $M \ge K$, $\underline{D}_L = 0$ because the first $K$ statistics, $I_{t,1}, \ldots, I_{t,K}$, can simply be set equal to $z_{t,1}, \ldots, z_{t,K}$, and $D_L$ will be zero as a consequence. Therefore, it is assumed that $M < K$ in the following discussion of $\underline{D}_L$.

The problem of $\underline{D}_L$ can be transformed to a problem similar to principal component analysis (PCA):
Proposition 3. For $z, w \in \mathbb{R}^K$, let $d_v(z,w) = \sum_{i=1}^{K} \alpha_i d(z_i, w_i)$, where the subscript $i$ on a vector denotes its $i$-th component. Then,

$$\underline{D}_L = \frac{1}{T} \min_{B_0, B_1} \left[\sum_{t=1}^{T} \min_{u} d_v(z_t, B_0 + B_1 u)\right], \tag{5}$$

where $z_t = (z_{t,1}, z_{t,2}, \ldots, z_{t,K})$, $u \in \mathbb{R}^M$, $B_0 \in \mathbb{R}^K$ and $B_1$ is a $K \times M$ matrix.

Proof.

$$\underline{D}_L = \frac{1}{T} \min_{B_0, B_1, I_{t,i}} \left[\sum_{t=1}^{T} d_v(z_t, B_0 + B_1 I_t)\right], \tag{6}$$

where $I_t = (I_{t,1}, \ldots, I_{t,M})$. Equation (5) is obtained by moving the minimum over $I_{t,i}$ inside the summation and replacing $I_t$ by $u$.

In particular, with $d(x,y) = |x-y|^p$ for $p \ge 1$, the minimum exists: $B_0$ and $u$ can be bounded given the $z_t$’s and the columns of $B_1$ can be restricted to be orthonormal.

In addition, $d_v(z_t, w_t) = \sum_{i=1}^{K} \alpha_i |z_{t,i} - w_{t,i}|^p = \sum_{i=1}^{K} |\alpha_i^{1/p} z_{t,i} - \alpha_i^{1/p} w_{t,i}|^p$ for any $z_t, w_t \in \mathbb{R}^K$. Therefore, the weights $\alpha_i$ can be moved inside the absolute value expression, which yields

$$\underline{D}_L = \frac{1}{T} \min_{B_0, B_1} \sum_{t=1}^{T} \left[\min_{u} \sum_{i=1}^{K} \left|\alpha_i^{1/p} z_{t,i} - \big(A(B_0 + B_1 u)\big)_i\right|^p\right], \tag{7}$$

where $A$ is a $K \times K$ diagonal matrix with $A_{ii} = \alpha_i^{1/p}$. By redefining $B_0$ and $B_1$ as $AB_0$ and $AB_1$, respectively, $A$ can be simply dropped from the expression for $\underline{D}_L$.

With $p = 2$, $\underline{D}_L$ can be computed by PCA using the following procedure:2

(1) Set $B_{0,i}$ to equal the mean of $z_{t,i}$ over $t$ for each $i = 1, \ldots, K$.
(2) Define $z'_{t,i} = \sqrt{\alpha_i}\,(z_{t,i} - B_{0,i})$.
(3) Set $B_1$ to be the matrix of the first $M$ principal components of $z'_t$, $t = 1, \ldots, T$.
(4) Let $w'_t$ be the projection of $z'_t$ onto the column space of $B_1$.
(5) Sum the square of the Euclidean distance between $z'_t$ and $w'_t$ over $t$ and divide by $T$ to obtain $\underline{D}_L$.

2 The equivalence between the problem of finding the first $M$ principal components and that of finding the minimum distance $M$-dimensional subspace follows from basic properties of PCA. See section 8 of Theil (1983), for example.

With $p = 1$, $\underline{D}_L$ can also be computed by PCA generalized to the $L_1$ norm, by finding an $M$-dimensional subspace that minimizes the sum of distances between $z_t$ and the subspace. Some care must be taken with the use of the term PCA with $L_1$ norm, as it has been used to describe at least three distinct optimization problems in recent studies.3 Step 2 must be changed so that $z_{t,i} - B_{0,i}$ is multiplied by $\alpha_i$, not by $\sqrt{\alpha_i}$. In addition, the objective function to be minimized is not convex and there is no known process to obtain its exact solution, as opposed to the ordinary PCA with square distance. Therefore, only computational solutions of $B_0$ and $B_1$ are available. For example, both Ke and Kanade (2005) and Brooks et al. (2013) develop methods to find $B_1$ with a given $B_0$. $B_0$ is sometimes chosen to be the vector consisting of medians of $z_{t,i}$ over $t$, but such a choice does not guarantee a minimum in general.4

3 Ke and Kanade (2005) and Brooks et al. (2013) present computational techniques to find an approximate minimal distance subspace, and their objective function is the same as that in equation (7). Kwak (2008) analyzes the problem of maximizing the $L_1$ dispersion of the Euclidean projection of given points. Euclidean projection does not find the point on a subspace that is closest to the original point under the $L_1$ norm. In addition, unlike with the $L_2$ norm, maximizing the dispersion of projections is not equivalent to minimizing the distance between original points and their projections with the $L_1$ norm. Park and Klabjan (2014) studies the same problem as Kwak (2008) and another problem of minimizing the $L_1$ distance between given points and their Euclidean projection into a subspace.

4 For example, Brooks and Jot (2012) describes R codes that ‘center’ datapoints with their median.

3.3 Computation of $D_D$

$D_D$ minimizes $\frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{K} \alpha_i d(z_{t,i}, g_i(I_{t,1}, \ldots, I_{t,M}))$ over all possible functions $g$ that are constant on $n^M$ subsets of $\mathbb{R}^M$ that are products of $M$ partitions of $\mathbb{R}$ into $n$ sets. Let $L = n^M$ and let $P_1, P_2, \ldots, P_L$ be the $L$ sets that partition $\mathbb{R}^M$, where $g_i(P_j)$ is a singleton for any $i = 1, \ldots, K$ and $j = 1, \ldots, L$. With $c_{i,j}$ defined as $g_i(P_j)$, $D_D$ can be computed by minimizing the objective function with respect to $c_{i,j}$. This can be achieved by choosing $c_{i,j}$ to minimize the sum of the distance between $c_{i,j}$ and $z_{t,i}$ for $t$ such that $I_t \in P_j$.

Proposition 4.

$$D_D = \frac{1}{T} \sum_{j=1}^{L} \sum_{i=1}^{K} \alpha_i \min_{c} \sum_{I_t \in P_j} d(z_{t,i}, c). \tag{8}$$

Proof.

$$D_D = \frac{1}{T} \min_{c_{i,j}} \sum_{t=1}^{T} \sum_{i=1}^{K} \alpha_i\, d(z_{t,i}, c_{i,j(t)}), \tag{9}$$

where $j(t)$ is the value of $j$ such that $I_t \in P_j$. Dividing the summation over $t$ into $L$ groups according to the value of $j(t)$ and moving the minimum inside the summations produces the equation of the proposition.

If $d(x,y) = |x-y|^p$ and $p \ge 1$, a minimizer $c$ of $\sum_{I_t \in P_j} |z_{t,i} - c|^p$ exists because the minimand is a convex function of $c$. In particular, if $p = 1$ or $p = 2$, the minimizer $c$ is the median or the mean, respectively, of $z_{t,i}$ over $t$ such that $I_t \in P_j$. In other words, $c$ can be computed as the group median or the group mean for each of the $K$ dimensions of $z_t$, and the group is defined by the index $j$ such that $I_t \in P_j$.

3.4 Computation of $\underline{D}_D$

$\underline{D}_D$ can be computed by finding $I_t$ that minimizes $D_D$, given $z_t$ and a partition of $\mathbb{R}^M$ into $L = n^M$ sets. As with $D_D$, the number $L$ can be any number, not just $n^M$, but $L = n^M$ is used to be consistent with earlier sections. Also, the shape of the partition, other than the number of sets in it, is irrelevant because $I_t$ can be chosen in an arbitrary way. Since $I_t$ is arbitrary, it is sufficient to consider a partition of $\{1, 2, \ldots, T\}$ into $L$ sets $P'_1, \ldots, P'_L$. Then, following Proposition 4, computing $\underline{D}_D$ reduces to a problem of finding a partition of $\{1, \ldots, T\}$ into $L$ sets minimizing $D_D$:

Proposition 5.

$$\underline{D}_D = \frac{1}{T} \inf_{P'_1, \ldots, P'_L} \sum_{j=1}^{L} \sum_{i=1}^{K} \alpha_i \min_{c} \sum_{t \in P'_j} d(z_{t,i}, c) \tag{10}$$

With $d(x,y) = |x-y|^p$, the expression for $\underline{D}_D$ can be rewritten as follows:

$$\underline{D}_D = \frac{1}{T} \inf_{P'_1, \ldots, P'_L} \sum_{j=1}^{L} \sum_{i=1}^{K} \inf_{c} \sum_{t \in P'_j} \left|\alpha_i^{1/p} z_{t,i} - c\right|^p \tag{11}$$

This problem is an example of partitional clustering (see Chapter 3.3 of Jain and Dubes (1988)). With $p = 2$, the minimizer $c$ is simply the mean of $\sqrt{\alpha_i}\, z_{t,i}$ over $t \in P'_j$ for each $i = 1, \ldots, K$ and $j = 1, \ldots, L$. Finding the partition $P'_1, \ldots, P'_L$ that minimizes the square distance between the group means and $z_t$ has no closed-form solution, but the solution can be approximated by a k-means algorithm. Similarly, with $p = 1$, the minimizer $c$ is the median of $\alpha_i z_{t,i}$, and a computational solution can be found by a k-medians algorithm.

3.5 Computation of $D_0$

$D_0 = D_D = \underline{D}_D$ if $n = 1$ and hence $L = n^M = 1$. Therefore, $D_0$ can be computed by Proposition 4.
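The computations of this section can be sketched together for the square-distance case ($p = 2$): least squares for the linear rule (Proposition 2), group means for the discrete rule (Proposition 4), and the PCA procedure for the best possible linear case. This is a minimal sketch on synthetic data, not the paper's own code; the function names, the random data, and the median-split partition are assumptions made for illustration. The PCA step uses the fact that the squared residual after projecting onto the first $M$ principal components equals the sum of the remaining squared singular values.

```python
import numpy as np

def d_linear(I, Z, alpha):
    """Linear-rule distance for p = 2: regress each z_i on a constant and I."""
    T = Z.shape[0]
    X = np.column_stack([np.ones(T), I])
    beta, *_ = np.linalg.lstsq(X, Z, rcond=None)  # one column of beta per z_i
    resid = Z - X @ beta
    return float(np.sum(alpha * np.sum(resid ** 2, axis=0)) / T)

def d_discrete(cells, Z, alpha):
    """Discrete-rule distance for p = 2: group means within each cell."""
    T = Z.shape[0]
    total = 0.0
    for j in np.unique(cells):
        group = Z[cells == j]
        total += np.sum(alpha * np.sum((group - group.mean(axis=0)) ** 2, axis=0))
    return float(total / T)

def d_linear_best(Z, alpha, M):
    """Best possible linear-rule distance for p = 2 via PCA on weighted data."""
    T, K = Z.shape
    if M >= K:
        return 0.0  # the statistics can simply equal the variables of interest
    Z_prime = np.sqrt(alpha) * (Z - Z.mean(axis=0))      # center and weight
    s = np.linalg.svd(Z_prime, compute_uv=False)          # singular values
    return float(np.sum(s[M:] ** 2) / T)                  # residual after M components

# Synthetic example (hypothetical data): K = 4 variables, M = 2 noisy statistics.
rng = np.random.default_rng(0)
T, K, M = 400, 4, 2
Z = rng.normal(size=(T, K)) @ rng.normal(size=(K, K))
I = Z[:, :M] + 0.5 * rng.normal(size=(T, M))
alpha = np.array([1.0, 2.0, 0.5, 1.0])

# Discrete rule with n = 2: split each statistic at its median, so L = 4 cells.
cells = 2 * (I[:, 0] > np.median(I[:, 0])) + (I[:, 1] > np.median(I[:, 1]))

d0 = d_discrete(np.zeros(T, dtype=int), Z, alpha)  # a single cell gives the constant rule
dl = d_linear(I, Z, alpha)
dd = d_discrete(cells, Z, alpha)
dl_best = d_linear_best(Z, alpha, M)
print(d0, dl, dd, dl_best)
```

On any data set the printed values should respect the ordering of Proposition 1, which gives a quick sanity check on an implementation. The $p = 1$ case would need median regression and an $L_1$ subspace method instead, which this sketch does not attempt.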
4 Use of Temperature Derivatives

4.1 Market Structure and Variation in Trade Volume

The purpose of this section is to demonstrate the usefulness of informativeness measures in explaining the variation in trade volume in monthly HDD (heating degree days) options across 24 cities in the US.

The payoff from temperature derivatives depends on the temperature recorded at certain weather stations. For example, the Chicago Mercantile Exchange (CME) lists various options and futures that are based on HDD and CDD (cooling degree days). HDD is 65 degrees Fahrenheit minus the average of the daily maximum and minimum temperature, if that number is positive, and zero otherwise. Similarly, CDD is the average of the daily maximum and minimum temperature minus 65 degrees Fahrenheit, if that number is positive, and zero otherwise.5 HDD and CDD broadly represent the demand for energy for heating and cooling, respectively.

CME first introduced these temperature derivatives between 1999 and 2000 on 10 weather stations. As trade volume grew, it listed contracts on 14 additional weather stations by 2008. However, trade volume started declining rapidly after 2010, and derivatives on 16 stations have been completely or partially delisted. As of June 2016, only 8 stations have a full range of derivatives, and 8 additional stations have only a limited range of derivatives.6 Table 1 lists the metropolitan areas, or cities, represented by these stations, and figure 1 shows the annual volume of trade on HDD monthly options, which account for a large part of temperature derivatives trading on CME.

Firms whose profit is sensitive to temperature, such as gas and electricity utilities, may trade these derivatives to insure themselves against the risk of too high or too low temperature. Pérez-González and Yun (2013) provides evidence that some firms, especially gas and electricity utilities, use temperature derivatives to hedge risk and argues that such hedging increases the value of those firms.
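The degree-day indices can be written as a short function. This is a minimal sketch, assuming the usual convention in which HDD is positive when the daily average temperature falls below 65 degrees Fahrenheit and CDD is positive when it rises above; the function name and the sample temperatures are illustrative.

```python
def daily_degree_days(t_max, t_min, base=65.0):
    """Return (HDD, CDD) for one day from the daily max and min in Fahrenheit."""
    avg = (t_max + t_min) / 2.0
    hdd = max(base - avg, 0.0)  # heating demand: how far the average is below the base
    cdd = max(avg - base, 0.0)  # cooling demand: how far the average is above the base
    return hdd, cdd

print(daily_degree_days(50.0, 30.0))  # cold day: (25.0, 0.0)
print(daily_degree_days(90.0, 70.0))  # hot day: (0.0, 15.0)
```

At most one of the two indices is positive on any given day, and both are zero when the daily average equals the base temperature.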
5 This definition of HDD and CDD is from Chapter 403 of the CME Rulebook.

6 This history of CME temperature derivatives is based on Purnanandam and Weagley (2016) and public announcements by CME.

The volume of trades on derivatives based on different weather stations varies widely. Across the 24 weather stations, the mean and the standard deviation of the mean annual trade volume of monthly HDD options between 2009 and 2014 are 994 and 1710, respectively. The relatively large standard deviation is driven by a few weather stations with disproportionately large volumes, such as New York.

This variation in trade volume across the weather stations cannot simply be explained by the variation in the size of the cities represented. For example, the average trade volume for Boston and that for Washington were both lower than that for Portland, Sacramento, or Colorado Springs, even though Boston and Washington are both considerably larger than any of the three cities with larger trade volume.

Full listing              | Partial listing   | Delisted
--------------------------|-------------------|----------------------
Atlanta, GA               | Des Moines, IA    | Baltimore, MD
Chicago, IL               | Philadelphia, PA  | Detroit, MI
Cincinnati, OH            | Portland, OR      | Salt Lake City, UT
New York, NY              | Tucson, AZ        | Colorado Springs, CO
Dallas-Fort Worth, TX     | Boston, MA        | Jacksonville, FL
Las Vegas, NV             | Houston, TX       | Little Rock, AR
Minneapolis-St. Paul, MN  | Kansas City, MO   | Los Angeles, CA
Sacramento, CA            | Washington, DC    | Raleigh-Durham, NC

Table 1: List of Represented Cities

[Figure 1: Annual Trade Volume. Vertical axis: volume (log scale); horizontal axis: year, 1998 to 2016.]

The lack of close correlation between trade volume and city size may be explained by the fact that there is a close substitute for Boston- or Washington-based contracts, namely contracts based on New York. New York-based contracts can serve as a substitute because trade volume on New York-based contracts is large and the deviation of daily HDD from its monthly average shows a highly positive correlation between New York, Boston, and Washington, reflecting their geographic proximity.

Given this highly positive correlation, considering the eastern US as a whole, rather than as a collection of individual cities, makes sense in understanding variation in trade volume. However, doing so involves an arbitrary choice of regional boundaries, and does not reflect varying temperature correlation across different pairs of eastern cities. The informativeness measure can address this problem by quantifying the amount of temperature variation represented by any group of cities. Indeed, the informativeness measure does as well as city size measures in explaining the different levels of trade volume, as shown in the rest of this section.

4.2 Data and Method

Three types of data are used: (i) temperature, (ii) city size measures, and (iii) trade volume, on each of the 24 cities. The historical daily HDD values are publicly available on the CME website.7 Six different measures of city size are used, which are population, GDP, annual and winter gas consumption, and annual and winter expenditure on gas.8 City-level population comes from the Census Bureau,9 and GDP data from the Bureau of Economic Analysis.10 The data are reported for each Metropolitan Statistical Area (MSA) and are publicly available on the websites.
State-level gas consumption and price are from the Energy Information Administration, again available on its public website.11 State-level gas consumption is converted to city-level consumption by multiplying state consumption by the ratio of city population to state population. The list of cities in table 1 shows in which state each city is located. However, assigning one state to each city ignores the fact that certain city areas intersect with multiple states. As a consequence, the city-to-state population ratio is close to one for New York and much greater than one for Washington. Trade volume has been collected from a Bloomberg terminal.12

7 CME Group Inc. ‘Heating Degree Day (HDD), Historical Daily Data.’ http://www.cmegroup.com/market-data/reports/historical-weather-data.html (accessed August, 2016).

8 Gas consumption is measured by the volume of gas consumed, while expenditure is volume times price. These two measures are different because different prices apply to different cities. Winter is defined to be the seven months from October to April.

9 US Census Bureau. ‘City-Level Population.’ http://www.census.gov/popest/ (accessed August, 2016).

10 US Department of Commerce. ‘Gross Domestic Product.’ Bureau of Economic Analysis. http://www.bea.gov/national/Index.htm (accessed August, 2016).

11 US Energy Information Administration. ‘State-Level Consumption and Prices.’ http://www.eia.gov/petroleum/data.cfm (accessed August, 2016).

The general idea tested in this section is that the set of cities with large combined trade volume should explain a large amount of variation in daily HDD across cities. More specifically, let the integers $1, 2, \ldots, 24$ denote the 24 cities, with $U = \{1, 2, \ldots, 24\}$ denoting the universe of cities. For a subset $C$ of $U$, $v(C)$ is the sum of trade volumes for the cities in $C$, and $D(C)$ is the informativeness of their HDDs in representing the HDDs of all the cities in $U$ for winter months. Using the language of the previous sections, the HDDs of the cities in $C$ are market statistics and those of the cities in $U$ are variables of interest.

This section tests whether a high $v(C)$ implies a small $D(C)$ (small expected distance means more informative), while a low $v(C)$ does not necessarily imply a large $D(C)$. This hypothesis implies both a smaller mean and a smaller standard deviation of $D(C)$ for larger $v(C)$, considering the distribution of $D(C)$ as functionally dependent on $v(C)$.

The logic behind this hypothesis is as follows: a large $v(C)$ means that it is enough to trade in $C$ to hedge against a large part of temperature variation. Therefore, temperature indices for $C$ must capture a large part of temperature variation across $U$, leading to a small $D(C)$. At the same time, there can be alternative choices of $C$ that have small $D(C)$ but do not have large $v(C)$, so a small $v(C)$ is compatible with both a small $D(C)$ and a large $D(C)$.

In comparing different $C$’s, all possible choices of $C$ as a six-element set have been considered. Sets that have too few or too many elements will automatically have both small $v(C)$ and $D(C)$ or both large $v(C)$ and $D(C)$, respectively.
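The enumeration of fixed-size city sets can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's computation: it swaps in square distance with a least squares linear rule (the paper's main results use linear distance), normalizes by the constant-rule distance, and uses a hypothetical universe of 8 cities with 3-element sets so that it runs instantly. All names and numbers in the code are assumptions.

```python
from itertools import combinations
from math import comb

import numpy as np

def d_linear_sq(I, Z, alpha):
    """Least squares version of the linear-rule distance (square distance)."""
    T = Z.shape[0]
    X = np.column_stack([np.ones(T), I])
    beta, *_ = np.linalg.lstsq(X, Z, rcond=None)
    resid = Z - X @ beta
    return float(np.sum(alpha * np.sum(resid ** 2, axis=0)) / T)

def scan_subsets(hdd_dev, volume, alpha, size):
    """Return (C, v(C), normalized D(C)) for every subset C of the given size."""
    n_cities = hdd_dev.shape[1]
    d0 = float(np.sum(alpha * hdd_dev.var(axis=0)))  # best constant rule, p = 2
    rows = []
    for C in combinations(range(n_cities), size):
        idx = list(C)
        v = float(volume[idx].sum())                       # combined trade volume
        d = d_linear_sq(hdd_dev[:, idx], hdd_dev, alpha) / d0
        rows.append((C, v, d))
    return rows

# With 24 cities and six-element sets, the number of candidate sets is:
print(comb(24, 6))  # 134596

# Hypothetical toy universe: 8 cities sharing one common temperature factor.
rng = np.random.default_rng(1)
T, n_cities = 200, 8
common = rng.normal(size=(T, 1))
hdd_dev = common * rng.uniform(0.5, 1.5, size=n_cities) + rng.normal(size=(T, n_cities))
hdd_dev -= hdd_dev.mean(axis=0)          # stand-in for removing monthly averages
volume = rng.uniform(10, 1000, size=n_cities)
alpha = np.full(n_cities, 1.0 / n_cities)

rows = scan_subsets(hdd_dev, volume, alpha, size=3)
best = min(rows, key=lambda r: r[2])
print(len(rows), best)
```

Plotting the normalized distance against the combined volume over all rows is one way to inspect the hypothesized pattern, in which high-volume sets cluster at small distances.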
Sets with six elements were chosen because they generated a good mix of high v(C) and low v(C), and the number of possibilities was small enough to require only modest computational power.

In computing v(C), the average annual trade volume from 2009 to 2014 was used. In computing D(C), the variables of interest were the deviations of daily HDD from its monthly average for each city. One interpretation of this choice is that mean monthly temperature can be perfectly predicted in advance, and the risk that needs to be insured against is the deviation of daily HDD from the predicted monthly average. Daily HDD data from May 2008 to the end of 2015 are used, because May 2008 is the first month with available historical data from the source.

Finally, for normalization, each of the six city size measures for each year between 2009 and 2014 is normalized by dividing an individual city's measure by the sum of the measure over the 24 cities. Then, the average of each city's size measure over the years from 2009 to 2014 is used as the city's weight in computing D(C). D(C) is computed as D_L under linear distance, and divided by D_0 for normalization. The other choices of informativeness measure, D_L with square distance or D_D with linear or square distance, were tried as well for robustness and produced similar results. For comparison, aggregate city size measures S(C) were also computed, simply as the sum of city size measures over the cities in C.

¹² Bloomberg Finance LP. 'Trade Volume.' (accessed August 2016).

                      Population   GDP    Gas con.²   Gas exp.³   Gas con.    Gas exp.
                                                                  (winter)    (winter)
Mean⁴                 4.2          4.2    4.2         4.2         4.2         4.2
Median                2.4          2.4    2.4         2.5         2.5         2.4
St. dev.              4.3          4.8    5.2         5.4         5.2         5.4
Maximum (New York)    18.8         21.5   22.2        24.4        21.8        24.4
¹ Numbers are in percent.
² Shorthand for consumption.
³ Shorthand for expenditure.
⁴ Mean is identical by construction.
Table 2: Statistics on Size Measures

4.3 Descriptive Statistics

The pairwise correlation between the six city size measures is close to one. Table 2 shows that the mean size measure is larger than the median, which is consistent with the fact that there are a few very large cities. New York is larger than any other city by far, and certainly much larger than the mean. This can be seen from both tables 2 and 3, the latter of which ranks the cities by their population.

Daily HDD minus its monthly average tends to show higher correlation between cities that are geographically closer. Table 4 shows the correlation matrix for the following six cities: New York, Boston, Washington, Portland, Sacramento and Colorado Springs. The highly positive correlation between New York, Boston and Washington is consistent with the idea that temperature risks for Boston and Washington can be effectively insured by New York-based contracts, while temperature risks for the other cities cannot be. Figure 2 plots daily HDD minus monthly average in Boston and Colorado Springs against that in New York.

Under 2                  Between 2 and 5              Over 5
Des Moines (0.6)         Cincinnati (2.0)             Atlanta (5.2)
Colorado Springs (0.6)   Sacramento (2.1)             Washington (5.5)
Little Rock (0.7)        Portland (2.1)               Philadelphia (5.7)
Tucson (1.0)             Baltimore (2.6)              Houston (5.8)
Salt Lake City (1.1)     Minneapolis-St. Paul (3.2)   Dallas-Fort Worth (6.4)
Raleigh-Durham (1.1)     Detroit (4.1)                Chicago (9.1)
Jacksonville (1.3)       Boston (4.4)                 Los Angeles (12.4)
Las Vegas (1.9)                                       New York (18.8)
Kansas City (2.0)
¹ Numbers in parentheses are normalized population.
² The cities are listed in the order of increasing population in each column.
Table 3: Population Ranking of Cities

The city size measures and informativeness measures for individual cities are weakly correlated with trade volumes. The rank correlation between trade volume, v({i}), and population, S({i}), is 0.36, while the rank correlation between trade volume and informativeness, D({i}), using population as weights, is −0.15. Figure 3 plots the rank of population and that of the informativeness measure against the rank of trade volume, which shows no strong relationship.
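The rank correlations quoted above can be reproduced with a few lines of code once per-city values are available. The sketch below assumes Spearman's definition (the Pearson correlation of ranks, with tied values given their average rank); the text only says "rank correlation", so this choice, the function names, and the sample numbers are assumptions.

```python
def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for four cities (not the paper's data).
volume = [0.40, 0.10, 0.30, 0.20]
population = [18.8, 2.0, 9.1, 4.4]
rho = rank_correlation(volume, population)
```

In this toy example the two orderings agree exactly, so `rho` equals 1 up to floating-point error.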
                   New York   Boston   Washington   Portland   Sacramento   Colorado Springs
New York           1.00       0.91     0.89         0.10       0.05         -0.09
Boston                        1.00     0.76         0.11       0.07         -0.08
Washington                             1.00         0.07       0.01         -0.08
Portland                                            1.00       0.49         0.38
Sacramento                                                     1.00         0.32
Table 4: Correlation Matrix of HDD

Figure 2: Correlation in HDD between Cities (panels: Boston, Colorado Springs; horizontal axis: New York; plotted variable: daily HDD minus monthly average)
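The variable behind Table 4 and Figure 2, daily HDD minus its monthly average, can be computed as in the following sketch. It assumes that "monthly average" means the mean over the days of the same calendar month and year in the sample; the text does not spell this out, so that interpretation, the function name, and the toy values are assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def deviations_from_monthly_mean(daily_hdd):
    """Map each date to its HDD minus the mean HDD of that (year, month).

    daily_hdd: dict mapping datetime.date -> HDD for one city.
    """
    by_month = defaultdict(list)
    for day, hdd in daily_hdd.items():
        by_month[(day.year, day.month)].append(hdd)
    monthly_mean = {month: mean(vals) for month, vals in by_month.items()}
    return {
        day: hdd - monthly_mean[(day.year, day.month)]
        for day, hdd in daily_hdd.items()
    }


# Toy series for one hypothetical city (not the paper's data).
toy_hdd = {
    date(2010, 1, 1): 30.0,
    date(2010, 1, 2): 40.0,
    date(2010, 2, 1): 20.0,
}
toy_dev = deviations_from_monthly_mean(toy_hdd)
```

Applying the same function to each city and correlating the resulting series across cities would give a matrix of the kind reported in Table 4.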
Figure 3: Relationship between Trade Volume and City Size or Informativeness (panels: Population Rank, Informativeness Rank; horizontal axis: Trade Volume Rank)
Figure 4: Informativeness and Population as Functions of Trade Volume (panels: Informativeness, Population; horizontal axis: Trade Volume; series: Mean, Mean ± St. Dev.)

4.4 Results

Figure 4 shows the distribution of informativeness and population as a function of trade volume. All possible sets of six cities are divided into 20 equally sized bins in order of increasing total trade volume, v(C). For each bin, the means of informativeness and population are computed, along with their standard deviations, which represent how dispersed informativeness and population are at different levels of trade volume. As a reminder, the measure of informativeness referred to in this section is D_L (linear rule) with linear distance. As mentioned earlier, using other informativeness measures, either with square distance or with D_D (discrete rule), produces similar results.

Consistent with the hypothesis proposed in section 4.2, the informativeness measure D(C) decreases with trade volume, while population increases with trade volume. Consistent with the figure, the rank correlation between the informativeness measure and trade volume is −0.54, and that between population and trade volume is 0.65. Also, the standard deviation tends to be smaller for larger trade volume.

A potential problem with looking at every possible six-element set of cities is that New York dominates all other cities both in trade volume and population. Therefore, the negative relationship between the informativeness measure and trade volume and the positive relationship between population and trade volume may just reflect the difference between the sets of cities that contain New York and those that do not. To address this issue, the exercise is repeated using only the six-element sets that contain New York.
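The binning behind Figure 4 can be sketched as follows: sort the sets by v(C), cut the sorted list into bins of (nearly) equal size, and summarize D(C) within each bin. The function name, the use of the population standard deviation, and the synthetic records are assumptions for illustration; the text does not say which standard deviation formula was used.

```python
from statistics import mean, pstdev


def bin_statistics(records, n_bins=20):
    """Summarize D(C) within bins of increasing v(C).

    records: list of (v, d) pairs, one per candidate set C.
    Returns a list of (mean v, mean d, st. dev. of d), one tuple per bin.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    summary = []
    for b in range(n_bins):
        # Integer cut points keep bin sizes within one element of each other.
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = ordered[lo:hi]
        if not chunk:
            continue
        vs = [v for v, _ in chunk]
        ds = [d for _, d in chunk]
        summary.append((mean(vs), mean(ds), pstdev(ds)))
    return summary


# Synthetic records in which d falls as v rises (not the paper's data).
records = [(i / 10, 1 - i / 10) for i in range(40)]
stats = bin_statistics(records, n_bins=4)
```

The same function applies to quintiles by setting `n_bins=5`, and to the New York exercise by passing only the records of sets that contain New York.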
Figure 5: Mean of Informativeness and Population with New York (panels: Informativeness, Population; horizontal axis: Trade Volume)

The result of this second exercise is consistent with that from the first exercise. Figure 5 shows the mean of informativeness and population as a function of trade volume, and figure 6 shows their standard deviation. The decrease in standard deviation is more evident with the informativeness measure. In addition, the rank correlation between informativeness and trade volume is stronger at −0.48, compared with 0.28 between population and trade volume. This result suggests that the dominating effect of New York is stronger with population than with the informativeness measure.

Figure 6: St. Dev. of Informativeness and Population with New York (panels: St. Dev. of Informativeness, St. Dev. of Population; horizontal axis: Trade Volume)

Among all the six-element sets, the mean value of informativeness for the top quintile of trade volume is 0.35, compared with 0.44 on average for the other four quintiles. Similarly, the mean population for the top quintile is 0.37, compared with 0.22 for the others. The standard deviation of the informativeness measure for the top quintile is 0.042, compared with 0.072 for the others. This decrease in the standard deviation is more pronounced than that for population, with 0.059 for the top quintile and 0.065 for the rest. Table 5 shows the mean and the standard deviation for each quintile.

All Sets
Quintiles   Informativeness   Population      Trade Volume
Top         0.35 (0.042)¹     0.37 (0.059)    0.50 (0.057)
Second      0.41 (0.073)      0.26 (0.074)    0.31 (0.057)
Third       0.43 (0.076)      0.22 (0.061)    0.20 (0.017)
Fourth      0.44 (0.073)      0.21 (0.062)    0.15 (0.015)
Bottom      0.46 (0.066)      0.19 (0.061)    0.08 (0.029)
¹ Numbers in ( ) are standard deviations.
Table 5: Informativeness and Population for Each Volume Quintile

Redoing the exercise only with sets containing New York produces similar results, and the standard deviation of the informativeness measure decreases more consistently with trade volume than that of population. Table 6 shows the mean and the standard deviation for each quintile, using only the sets containing New York.

Sets Containing New York
Quintiles   Informativeness   Population      Trade Volume
Top         0.33 (0.036)¹     0.39 (0.054)    0.58 (0.035)
Second      0.35 (0.038)      0.37 (0.057)    0.52 (0.013)
Third       0.36 (0.040)      0.36 (0.058)    0.48 (0.010)
Fourth      0.37 (0.042)      0.35 (0.059)    0.44 (0.012)
Bottom      0.39 (0.045)      0.34 (0.057)    0.39 (0.024)
¹ Numbers in ( ) are standard deviations.
Table 6: Informativeness and Population for Each Volume Quintile for Sets Containing New York

Overall, both the informativeness measure and population show behavior consistent with the hypothesis that (i) a smaller informativeness measure (more informative) and a larger size are associated with higher trade volume, and (ii) the variability in informativeness and size is smaller for higher trade volume.

5 Conclusion

This paper introduced a framework to quantify the informativeness of market statistics under different assumptions on how the recipients of the statistics infer their variables of interest from the statistics and how the deviation from the true value is penalized. In particular, linear and discrete inference rules and linear and square penalties for deviation were shown to lead to easily computable measures for both informativeness and its theoretical bounds.

This paper also used the informativeness measures to explain the different levels of trade volume in temperature derivatives across different weather stations within the US. Informativeness explains the variation in trade volume at least as well as city size measures do. This example shows that the proposed informativeness measures are useful in understanding why some market statistics, or 'numbers/variables' more generally, are more frequently referred to or adopted as bases of financial contracts.