feds · March 31, 2016

Microstructure Invariance in U.S. Stock Market Trades

Abstract

This paper studies invariance relationships in tick-by-tick transaction data in the U.S. stock market. Over the 1993–2001 period, the estimated monthly regression coefficients of the log of trade arrival rate on the log of trading activity have an almost constant value of 0.666, strikingly close to the value of 2/3 predicted by the invariance hypothesis. Over the 2001–2014 period, the estimated coefficients rise, and their average value is equal to 0.79, suggesting that the reduction in tick size in 2001 and the subsequent increase in algorithmic trading resulted in more intense order shredding in more liquid stocks. The distributions of trade sizes, adjusted for differences in trading activity, resemble a log-normal before 2001; there is clearly visible truncation at the round-lot boundary and clustering of trades at even levels. These distributions change dramatically over the 2001–2014 period, with their means shifting downward. The invariance hypothesis explains about 88% of the cross-sectional variation in trade arrival rates and average trade sizes; additional explanatory variables include the invariance-implied measure of effective price volatility.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Microstructure Invariance in U.S. Stock Market Trades
Albert S. Kyle, Anna A. Obizhaeva, and Tugkan Tuzun
2016-034

Please cite this paper as: Kyle, Albert S., Anna A. Obizhaeva, and Tugkan Tuzun (2016). “Microstructure Invariance in U.S. Stock Market Trades,” Finance and Economics Discussion Series 2016-034. Washington: Board of Governors of the Federal Reserve System, http://dx.doi.org/10.17016/FEDS.2016.034.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Albert S. Kyle, Anna A. Obizhaeva, and Tugkan Tuzun∗
First Draft: September 1, 2010. This Draft: April 19, 2016.

JEL: G10, G23
Keywords: market microstructure, transactions data, market frictions, trade size, tick size, order shredding, clustering, TAQ data.

∗ Kyle: Robert H. Smith School of Business, University of Maryland, College Park, MD 20742 USA, akyle@rhsmith.umd.edu. Obizhaeva: New Economic School, Moscow, Russia, aobizhaeva@nes.ru. Tuzun: Board of Governors of the Federal Reserve System, Washington, DC 20551 USA, tugkan.tuzun@frb.gov. We are grateful to Elena Asparouhova, Peter Bossaerts, and Stewart Mayhew for very helpful comments on an earlier draft of this paper. Joseph Saia provided excellent research assistance. Kyle has worked as a consultant on finance topics for companies, banks, stock exchanges, and various U.S. federal agencies including the Securities and Exchange Commission, the Commodity Futures Trading Commission, and the Department of Justice. He also serves as a non-executive director of a U.S.-based asset manager. The views expressed herein are those of the authors and do not necessarily reflect the views of the Board of Governors or the staff of the Federal Reserve System.

Over the past two decades, the U.S. stock market has undergone notable transformations. Technological changes and regulatory reforms have significantly influenced the way stocks are traded. This paper uses market microstructure invariance to define benchmarks for examining how changing market frictions are reflected in cross-sectional and time-series variation in the number and size of trades reported in public transaction data feeds. Large variation in transaction data makes such analysis challenging. Relying on the benchmarks imposed by the market microstructure invariance of Kyle and Obizhaeva (2016) enables us to filter out the substantial “natural” variation in trading activity across markets and analyze properties of the data due to effects of market frictions.

The invariance hypothesis is based on the intuition that trading in securities markets can be modeled as trading games played at different speeds. Asset managers place bets, or meta-orders, which approximately represent uncorrelated decisions to buy or sell specific numbers of shares. A bet may be executed as many smaller orders. The speed with which business time passes is the speed with which new bets are made. In markets for liquid stocks, trading occurs at fast speeds and bets arrive over short horizons, perhaps only a few minutes. In markets for illiquid stocks, trading takes place slowly and bets arrive over longer horizons, perhaps a few months. The invariance hypothesis conjectures that the dollar risk transferred by bets and the dollar costs of executing bets are the same across markets when measured in business-time units corresponding to the rate at which bets occur. This hypothesis implies a specific decomposition of the order flow. Trading activity, a measure of aggregate risk transfer, is defined as the product of dollar volume and return volatility.

In frictionless markets, invariance implies that the number of bets is proportional to the 2/3 power of trading activity, and that bet sizes, as a fraction of daily volume, are distributed as a common random variable scaled by the −2/3 power of trading activity. These invariance principles define frictionless-market benchmarks for examining the number of trades and the distribution of trade sizes in the Trades and Quotes (TAQ) dataset, which contains tick-by-tick transactions for stocks listed in the U.S. market between 1993 and 2014.

Microstructure invariance is ultimately an empirical hypothesis. Over the 1993–2001 subperiod, a time series of month-by-month regression coefficients of the log of trade arrival rates on the log of trading activity shows that the estimated coefficients remained virtually constant. The estimated coefficient of 0.666 is indeed strikingly close to the benchmark invariance prediction of 2/3. After 2001, the monthly estimates increase from about 0.690 in 2001 to about 0.77 in 2014; this breakdown in the invariance relationships is both statistically and economically significant.

For the years 1993, 2001, and 2014, the empirical distributions of logs of scaled print sizes for stocks sorted into 10 dollar-volume groups and 4 price-volatility groups tell a similar story (see figures 6–10). In 1993, consistent with invariance, all 40 empirical distributions resemble a bell-shaped normal density with a common mean and variance across the 40 subgroups. In 2001 and 2014, the distributions look much less like a normal distribution than in 1993. Furthermore, average scaled trade sizes decrease by a factor of about 2 over the 1993–2014 period. Statistical tests clearly reject the hypothesis that scaled trade sizes are distributed as a common log-normal random variable. The rejection arises from clearly visible microstructure effects such as the one-cent tick size, censoring of trades at the minimum round-lot threshold, and clustering of trades at round-lot sizes such as 100, 1,000, and 5,000 shares, consistent with O’Hara, Yao and Ye (2012) and Alexander and Peterson (2007).

Invariance explains a substantial fraction of the variation in trade arrival rates and average trade sizes across stocks, especially in the first half of the sample. Specifically, when the slope coefficient is restricted to the value implied by the invariance hypothesis (+2/3 for trade arrival rates and −2/3 for average trade sizes) and only intercepts are estimated in separate month-by-month regressions, the time series of R² fluctuates around 0.88. Glosten and Harris (1988) find that average trade size (in shares) is negatively related to market depth. Brennan and Subrahmanyam (1998) regress average trade sizes on return volatility, the standard deviation of trading volume, market capitalization, the number of analysts following a stock, the number of institutional investors holding a stock, and the proportion of shares institutional investors hold.
The R² of 0.92 in their cross-sectional regressions with multiple explanatory variables is only modestly larger than the average R² of 0.88 in our restricted regressions. This small difference suggests that other variables offer only limited improvement in explanatory power over the invariance hypothesis. We attribute the remaining variation to differences in market frictions resulting from how lot size and tick size are related to volume, volatility, and stock price. These market frictions are studied by Harris (1994), Angel (1997), Goldstein and Kavajecz (2000), and Schultz (2000).

A new perspective on these market frictions results from examining the data through the lens of the invariance hypothesis. We introduce two new measures that take into account that trading games run at different speeds in different financial markets. Effective tick size is defined as the ratio of tick size to price volatility in business time. Effective lot size is defined as the ratio of lot size to a median bet, or equivalently to volume in business time. Both measures are closely related to effective price volatility, defined as price volatility in business time. In addition to the 88 percent of the variation in print arrival rates and average print sizes explained by the invariance benchmark, an additional 4.5 percent and 5.5 percent can be attributed to variation in effective price volatility during the 1993–2000 and 2001–14 periods, respectively. It is difficult to further disentangle the effects of lot size and tick size because the effects go in opposite directions.

The invariance hypotheses of Kyle and Obizhaeva (2016) make predictions about bets rather than about actual trades or prints, which result from shredding bets into smaller pieces over time and across trading venues via order execution algorithms. Because there is an important distinction between bets and trades, we do not expect estimates based on prints to precisely match the predictions of the invariance hypothesis for bets. Conceptually, the invariance-implied benchmarks are appropriate only under the assumptions that the ratio of bet size to trade size and the ratio of intermediation volume to bet volume remain constant across stocks and time. Our results suggest that order shredding and the degree of intermediation may have increased over time, with bets in more liquid stocks recently generating more trades, and perhaps more intermediation trades, per bet than bets in illiquid stocks.

Our results supplement the findings of Kyle and Obizhaeva (2016), who provide evidence in favor of the invariance hypothesis using a sample of portfolio transitions. Portfolio transition orders are better suited for testing the invariance hypothesis, as they can be thought of as good proxies for bets, but they represent only a small subset of transactions in the U.S. stock market. In contrast, this study is based on a much broader sample that includes all reported transactions in the U.S. stock market; this advantage comes at the expense of having to deal with transaction data affected by order shredding and intermediation.

The remainder of this paper states the implications of the invariance hypothesis, discusses the design of our empirical tests, and presents our results.

1. Testable Implications of the Invariance Hypothesis.

An Invariance Hypothesis.
We first review the empirical hypothesis of market microstructure invariance developed in Kyle and Obizhaeva (2016). Market microstructure invariance is based on the intuition that trading in speculative markets can be thought of as the same trading game being played out in different markets at different speeds. This game takes place at a fast speed in active, liquid markets and at a slow speed in inactive, illiquid markets. Asset managers buy and sell securities by placing bets that represent decisions to acquire a long-term position of a specific size, distributed approximately independently from other such decisions. Intermediaries with short-term trading strategies clear markets by taking the other side of bets.

Suppose bets arrive at an expected rate of $\gamma_{jt}$ bets per day and their size is $\tilde{Q}_{jt}$ shares in asset $j$ and time $t$. The bet arrival rate $\gamma_{jt}$ measures the speed of the market. The random variable $\tilde{Q}_{jt}$ has a zero mean; positive values represent buying, and negative values represent selling. Let $P_{jt}$ denote the share price in dollars, and let $V_{jt}$ denote expected daily volume in shares:

$$ V_{jt} = \frac{\zeta_{jt}}{2} \cdot \gamma_{jt} \cdot E|\tilde{Q}_{jt}|. \tag{1} $$

In this equation, expected daily volume is equal to the product of the expected number of bets per day $\gamma_{jt}$ and their average size $E|\tilde{Q}_{jt}|$, adjusted for the volume multiplier $\zeta_{jt}$.

The parameter $\zeta_{jt}$ in equation (1) measures short-term intermediation trading as the ratio of total volume to bet volume. Non-bet intermediation volume includes trading by market makers, high-frequency traders, and other arbitragers who intermediate among long-term bets. The parameter $\zeta_{jt}$ intuitively reflects the typical length of intermediation chains in the market; the longer the intermediation chains, the larger the $\zeta_{jt}$. Volume is divided by 2 because each unit of volume has both a buy side and a sell side. If there is no intermediation, then $\zeta_{jt} = 1$. For example, if each bet is intermediated by a single market maker, similar to a New York Stock Exchange (NYSE) specialist intermediating every bet, then $\zeta_{jt} = 2$. If each bet is intermediated by two market makers, who lay off positions trading with one another, similar to Nasdaq dealers in the early 1990s, then $\zeta_{jt} = 3$. If each bet goes through the hands of multiple short-term intermediaries before finding its place in portfolios of long-term traders, then $\zeta_{jt} > 3$.

Let $W_{jt}$ denote trading activity, defined as the product of daily returns volatility $\sigma_{jt}$ and expected daily dollar volume $P_{jt} \cdot V_{jt}$:

$$ W_{jt} = \sigma_{jt} \cdot P_{jt} \cdot V_{jt}. \tag{2} $$

Trading activity measures aggregate risk transfer taking place in the market during the day. It can be easily calculated, as data are usually available for prices, volume, and volatility. Plugging equation (1) into equation (2) shows that trading activity may be written in terms of less easily observable characteristics of order flow, such as bet arrival rates $\gamma_{jt}$ and bet sizes $E|\tilde{Q}_{jt}|$:

$$ W_{jt} = \frac{\zeta_{jt}}{2} \cdot \sigma_{jt} \cdot P_{jt} \cdot \gamma_{jt} \cdot E|\tilde{Q}_{jt}|. \tag{3} $$

The invariance hypothesis predicts how these characteristics $\gamma_{jt}$ and $\tilde{Q}_{jt}$ vary across markets with different levels of trading activity $W_{jt}$.

Business time is measured by the expected arrival rate of bets $\gamma_{jt}$, and the dollar risks transferred by bets per unit of business time are the same across assets and time. More specifically, the random variable $\tilde{I}_{jt}$, defined by

$$ \tilde{I}_{jt} := \frac{P_{jt} \cdot \tilde{Q}_{jt} \cdot \sigma_{jt}}{\gamma_{jt}^{1/2}} \overset{d}{=} \tilde{I}, \tag{4} $$

has an invariant probability distribution $\tilde{I}$. Here, the risk transferred by a bet per unit of business time is the product of dollar bet size $P_{jt}\tilde{Q}_{jt}$ and returns volatility per unit of business time $\sigma_{jt} \cdot \gamma_{jt}^{-1/2}$.
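A minimal numerical sketch of equations (1) through (4) may help fix the units. It assumes a made-up stock (40 bets per day averaging 2,000 shares, one intermediary per bet so that ζ = 2, 2 percent daily volatility, a $50 price); the function names `daily_volume`, `trading_activity`, and `risk_per_bet` are hypothetical illustrations, not code from the paper. The only point is that equation (3) is equation (1) substituted into equation (2).

```python
import math


def daily_volume(gamma, mean_abs_bet, zeta):
    """Eq. (1): expected daily share volume V = (zeta / 2) * gamma * E|Q|."""
    return zeta / 2.0 * gamma * mean_abs_bet


def trading_activity(sigma, price, volume):
    """Eq. (2): trading activity W = sigma * P * V, in dollars per day."""
    return sigma * price * volume


def risk_per_bet(price, bet_shares, sigma, gamma):
    """Eq. (4): dollar risk of one bet per unit of business time, P * Q * sigma / sqrt(gamma)."""
    return price * bet_shares * sigma / math.sqrt(gamma)


# Hypothetical stock: 40 bets per day averaging 2,000 shares, one intermediary per
# bet (zeta = 2), 2 percent daily volatility, and a $50 share price.
gamma, mean_abs_bet, zeta, sigma, price = 40.0, 2_000.0, 2.0, 0.02, 50.0

V = daily_volume(gamma, mean_abs_bet, zeta)
W = trading_activity(sigma, price, V)
W_eq3 = zeta / 2.0 * sigma * price * gamma * mean_abs_bet   # eq. (3), same quantity
print(V, W, math.isclose(W, W_eq3), risk_per_bet(price, mean_abs_bet, sigma, gamma))
```

With ζ = 2, bet volume and total volume coincide in equation (1); setting ζ = 1 in the same call halves V, which is the "no intermediation" case described above.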

The invariance hypothesis (4) and equation (3) yield the following testable predictions concerning how the bet arrival rate $\gamma_{jt}$ and bet size $\tilde{Q}_{jt}$ vary with trading activity:

$$ \gamma_{jt} = \mu_\gamma \cdot W_{jt}^{2/3}, \tag{5} $$

$$ \frac{|\tilde{Q}_{jt}|}{V_{jt}} \overset{d}{=} \mu_q \cdot W_{jt}^{-2/3} \cdot |\tilde{I}|, \tag{6} $$

where the parameters $\mu_\gamma$ and $\mu_q$ depend on the volume multiplier $\zeta_{jt}$ and the moments of $|\tilde{I}|$.¹ In these equations, the distribution of the random variable $\tilde{I}$ is the same across assets and time. In what follows, we make the identifying assumption that the volume multiplier $\zeta_{jt}$ is an invariant constant, and therefore $\mu_\gamma$ does not have indices $j$ and $t$. Under this assumption, the equations imply that the scaled bet arrival rate $W_{jt}^{-2/3} \cdot \gamma_{jt}$ and the distribution of scaled bet sizes $W_{jt}^{2/3} \cdot |\tilde{Q}_{jt}|/V_{jt}$ (that is, all of its percentiles) are invariant across markets. For example, Kyle and Obizhaeva (2016) find that the distribution of logs of scaled bet sizes, $\tfrac{2}{3}\ln(W_{jt}) + \ln(|\tilde{Q}_{jt}|/V_{jt})$, is close to a normal with log-variance 2.53.

Equations (5) and (6) fully describe the composition of the order flow. Changes in trading activity come from both changes in bet sizes and changes in bet arrival rates. Specifically, if $\zeta_{jt}$ and $\tilde{I}$ are invariant and $\sigma_{jt}$ is held constant, then a 1 percent increase in trading activity $W_{jt}$ is associated with an increase of 2/3 of one percent in the bet arrival rate $\gamma_{jt}$ and an upward shift of 1/3 of one percent in the entire distribution of unsigned dollar bet sizes $P_{jt}|\tilde{Q}_{jt}|$. The latter is equivalent to a downward shift of 2/3 of one percent in the distribution of unsigned bet sizes as a fraction of share volume, $|\tilde{Q}_{jt}|/V_{jt}$.

Kyle and Obizhaeva (2016) also discuss an invariance hypothesis related to transaction costs. In this paper, we focus only on order flow and leave testing the implications for market impact and bid-ask spreads for future research.

Invariance Implications for TAQ Print Data.
The TAQ dataset reports transaction prices and share quantities for all trades in stocks listed in the United States from 1993 to 2014. Each report of a trade execution is called a “print.” We test implications of the invariance hypothesis using data on TAQ print sizes and the number of TAQ prints recorded per day. Testing invariance this way is not straightforward because prints are different from bets. One bet may generate multiple prints. To minimize transaction costs, traders often break bets or meta-orders into smaller pieces (as documented in Keim and Madhavan (1995), among others) and execute them at several venues, trading with multiple counterparties at multiple prices.

¹ The specific parameter values are $\mu_\gamma := E\left[\frac{\zeta_{jt}}{2} \cdot |\tilde{I}|\right]^{-2/3}$ and $\mu_q := E\left[\frac{\zeta_{jt}}{2} \cdot |\tilde{I}|\right]^{-1/3}$.
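The step from (3) and (4) to (5) and (6) can be checked numerically. Taking expectations of the absolute value in (4) gives $E|\tilde{Q}_{jt}| = \gamma_{jt}^{1/2} E|\tilde{I}| / (P_{jt}\sigma_{jt})$, and substituting this into (3) gives $W_{jt} = \frac{\zeta_{jt}}{2}\gamma_{jt}^{3/2} E|\tilde{I}|$, which solves to (5) with the constant in footnote 1. The sketch below assumes ζ = 2 and a hypothetical value E|Ĩ| = 1 (the true value is an empirical constant); the helper names are illustrative only. It confirms the round trip and the 2/3 and −2/3 elasticities stated above.

```python
import math

# Assumed values for illustration only: ZETA is the invariant volume multiplier
# (one intermediary per bet) and MEAN_ABS_I is a hypothetical value of E|I|.
ZETA = 2.0
MEAN_ABS_I = 1.0

MU_GAMMA = (ZETA / 2.0 * MEAN_ABS_I) ** (-2.0 / 3.0)   # footnote 1
MU_Q = (ZETA / 2.0 * MEAN_ABS_I) ** (-1.0 / 3.0)       # footnote 1


def bet_arrival_rate(w):
    """Eq. (5): expected bets per day, gamma = mu_gamma * W^(2/3)."""
    return MU_GAMMA * w ** (2.0 / 3.0)


def scaled_bet_size(w, abs_i):
    """Eq. (6): |Q|/V = mu_q * W^(-2/3) * |I| for one realization |I| = abs_i."""
    return MU_Q * w ** (-2.0 / 3.0) * abs_i


w = 1.0e6
gamma = bet_arrival_rate(w)

# Round trip through eqs. (3)-(4): W = (zeta / 2) * gamma^(3/2) * E|I|.
w_back = ZETA / 2.0 * gamma ** 1.5 * MEAN_ABS_I

# Elasticities: log change in each quantity per log change in trading activity.
step = math.log(1.01)
elast_gamma = math.log(bet_arrival_rate(1.01 * w) / gamma) / step
elast_size = math.log(scaled_bet_size(1.01 * w, 1.0) / scaled_bet_size(w, 1.0)) / step

# Invariance: W^(2/3) * |Q| / V does not depend on W.
scaled = [v ** (2.0 / 3.0) * scaled_bet_size(v, 1.0) for v in (1e3, 1e6, 1e9)]
print(elast_gamma, elast_size, math.isclose(w, w_back), scaled)
```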

Let $\tilde{X}_{jt}$ denote the unsigned number of shares in a single print for asset $j$ at time $t$. Let $\xi_{jt}$ denote the ratio of the average size of a bet to the average size of a print, so that $\xi_{jt}$ represents the average number of prints per bet. In practice, we expect tiny orders (for example, for the minimum round-lot size of 100 shares) to be executed as one print and gigantic orders to be executed as thousands of small prints or as one print of a gigantic block trade. This multiplier depends on specific details of the order-shredding algorithms used by traders, as modeled by Almgren and Chriss (2000) and Obizhaeva and Wang (2013), and may vary across stocks in a complex and systematic manner that depends on tick size, lot size, and perhaps other factors.

The distribution of print sizes $\tilde{X}_{jt}$ differs from the distribution of unsigned bet sizes $|\tilde{Q}_{jt}|$ by the factor $\xi_{jt}$:

$$ \tilde{X}_{jt} = \xi_{jt}^{-1} \cdot |\tilde{Q}_{jt}|. \tag{7} $$

Let $N_{jt}$ denote the expected number of prints per day for asset $j$ at time $t$. Each bet $\tilde{Q}_{jt}$ results on average in $\xi_{jt}$ prints, and its execution inflates volume by a factor of $\zeta_{jt}/2$ due to induced intermediation volume. The expected number of prints $N_{jt}$ therefore differs from the expected number of bets $\gamma_{jt}$ by a factor of $\xi_{jt} \cdot \zeta_{jt}/2$:

$$ N_{jt} = \xi_{jt} \cdot \frac{\zeta_{jt}}{2} \cdot \gamma_{jt}. \tag{8} $$

It is easy to show that equations (5) and (6) imply the following testable implications for the number of prints and their sizes:

$$ N_{jt} = \mu_n \cdot W_{jt}^{\alpha_n}, \tag{9} $$

$$ \frac{|\tilde{X}_{jt}|}{V_{jt}} \overset{d}{=} \mu_x \cdot W_{jt}^{\alpha_x} \cdot |\tilde{I}|, \tag{10} $$

where $\alpha_n = -\alpha_x = 2/3$ and the parameters $\mu_n$ and $\mu_x$ depend on the volume multiplier $\zeta_{jt}$, the order-shredding multiplier $\xi_{jt}$, and the moments of $|\tilde{I}|$.²

As a benchmark for interpreting our empirical results, we make two identifying assumptions. First, assume that there exists an invariant order-shredding multiplier $\xi$ such that $\xi_{jt} = \xi$ for any asset $j$ and time $t$. Second, assume that there exists an invariant volume multiplier $\zeta$ such that $\zeta_{jt} = \zeta$ for any asset $j$ and time $t$.
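Taking logs of equation (9) turns the print-count prediction into a linear cross-sectional regression of log prints per day on log trading activity, with slope $\alpha_n = 2/3$. The sketch below runs that regression on a synthetic month of 2,000 stocks, assuming the benchmark values ξ = 1 and ζ = 2, a hypothetical E|Ĩ| = 1, and normal noise in logs standing in for frictions; it is an illustration of the idea, not the paper's estimation code. It also computes the restricted R² with the slope fixed at 2/3 and only the intercept estimated, the kind of restricted regression summarized in the introduction.

```python
import math
import random

# Benchmark identifying assumptions (xi = 1, zeta = 2) and a hypothetical E|I| = 1.
XI, ZETA, MEAN_ABS_I = 1.0, 2.0, 1.0
MU_GAMMA = (ZETA / 2.0 * MEAN_ABS_I) ** (-2.0 / 3.0)
MU_N = XI * ZETA / 2.0 * MU_GAMMA          # eq. (8) applied to eq. (5)


def ols(x, y):
    """OLS of y on x with an intercept; returns (slope, intercept, R^2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sst = sum((b - my) ** 2 for b in y)
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ssr / sst


def restricted_r2(x, y, slope):
    """R^2 when the slope is fixed in advance and only the intercept is estimated."""
    intercept = sum(b - slope * a for a, b in zip(x, y)) / len(x)
    my = sum(y) / len(y)
    sst = sum((b - my) ** 2 for b in y)
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return 1.0 - ssr / sst


# Synthetic month: 2,000 stocks with trading activity spread over six orders of magnitude;
# log prints per day follow eq. (9) plus normal noise standing in for market frictions.
rng = random.Random(0)
log_w = [rng.uniform(math.log(1e3), math.log(1e9)) for _ in range(2_000)]
log_n = [math.log(MU_N) + (2.0 / 3.0) * lw + rng.gauss(0.0, 0.5) for lw in log_w]

slope, intercept, r2 = ols(log_w, log_n)
r2_restricted = restricted_r2(log_w, log_n, 2.0 / 3.0)
print(f"slope={slope:.3f}  R2={r2:.3f}  restricted R2={r2_restricted:.3f}")
```

Because OLS minimizes the residual sum of squares, the restricted R² can never exceed the unrestricted one; the gap between the two is a simple gauge of how far the data depart from the 2/3 benchmark.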
For simplicity of exposition, results may be interpreted under the identifying assumptions that $\xi = 1$ and $\zeta = 2$. This case corresponds to the hypothesis that each bet is executed as one print against a single intermediary, which makes $\mu_n$ and $\mu_x$ in equations (9) and (10) constant across markets, with no indices $j$ and $t$. It is also straightforward to write these predictions in the form of regressions. In actual markets, $\xi_{jt}$ may certainly deviate from $\xi = 1$, and $\zeta_{jt}$ may also deviate from $\zeta = 2$. The order-shredding algorithms that determine $\xi_{jt}$ may depend on order sizes and tick size. The amount of intermediation may fluctuate with volatility and trading volume. The effect of relaxing these assumptions is discussed later.

² Specific parameter values are $\mu_n := \xi_{jt} \cdot \frac{\zeta_{jt}}{2} \cdot E\left[\frac{\zeta_{jt}}{2} \cdot |\tilde{I}|\right]^{-2/3}$ and $\mu_x := \xi_{jt}^{-1} \cdot E\left[\frac{\zeta_{jt}}{2} \cdot |\tilde{I}|\right]^{-1/3}$.

Alternative Hypotheses.

We also consider two alternative benchmark hypotheses that make different predictions about the implied exponents on trading activity in equations (9) and (10).

The first alternative hypothesis, invariance of print frequency, asserts that the expected number of prints is the same for all stocks, implying $\alpha_n = -\alpha_x = 0$ in equations (9) and (10). Of course, this hypothesis is empirically unrealistic; it is well known that actively traded stocks have more prints than inactively traded stocks. This benchmark is potentially interesting because the illiquidity measure of Amihud (2002) is based on the assumption that order imbalances are proportional to trading volume. This proportionality holds only if the number of bets is constant across markets.

The second alternative hypothesis, invariance of print sizes, asserts that the number of prints is proportional to trading activity, implying that $\alpha_n = -\alpha_x = 1$ in equations (9) and (10). Related models are discussed in Tauchen and Pitts (1983), Harris (1987), Jones, Kaul and Lipson (1994), Hasbrouck (1999), and Ané and Geman (2000), among others. Although extreme, these two hypotheses provide convenient benchmarks for thinking about the relationship between invariance and the existing literature on trade size and trade frequency.

Institutional Details Related to the Microstructure of TAQ Data.
Although we make the identifying assumptions that the order-shredding multiplier is invariant ($\xi_{jt} = \xi$) and the volume multiplier is invariant ($\zeta_{jt} = \zeta$), we do not expect these assumptions to perfectly describe prints in the U.S. equity market. Instead, these assumptions generate benchmarks that can be used to evaluate the economic significance of deviations resulting from changes in various institutional arrangements related to tick size, lot size, and intermediation. We next discuss how changing institutional arrangements may have affected the order-shredding multiplier and volume multiplier over the 1993–2014 sample period.

Progress in computing technology and regulatory changes have significantly affected trading patterns. As a result, the degree of order shredding varies with order size, ticker symbol, and time. At the NYSE in the early 1990s, traders typically executed large bets as block trades in the “upstairs” market, in which case at least one side of a reported block trade might correspond precisely to a bet. Prior to the changes in order handling rules in 1997, Nasdaq dealers often took the other side of entire bets because customers themselves could not place their own orders into a central limit order book, and Nasdaq dealers were unhappy if customers “bagged” them by dumping many blocks into the market one after another. As the NYSE’s Direct Order Transfer (DOT) system became more commonly used by professional traders in the 1990s, the use of electronic order submission strategies and order shredding increased. For Nasdaq stocks, this practice accelerated after new order handling rules were implemented in the late 1990s. Bets were shredded more in the second half of our sample (2001–14) than in the first half (1993–2000).

In 2001, the tick size was cut from 6.25 cents (1/16 of a dollar) to one cent. As a result, quoted bid-ask spreads decreased, and fewer shares were shown at the best bid and best offer. Traders used electronic interfaces to place scaled limit orders of small size at adjacent price points separated by one cent, and this led to smaller print sizes for bets of the same size. Regulation National Market System (NMS), introduced in 2005, further encouraged market fragmentation and competition among multiple trading venues based on the speed and efficiency of electronic interfaces, which led to significant order shredding across both time and trading venues. In the past decade, continued improvements in computer technology have widened the use of electronic order handling systems and have made it practical to shred bets for many thousands of shares into tiny pieces of 100 shares or fewer. To summarize, we expect prints to closely resemble bets in the earlier part of our sample but to become less representative of bets more recently. We also expect that large bets may result in more prints than small bets.
Several institutional details associated with trade reporting may have further influenced order shredding:

• Although traders may have increasingly shredded orders into “odd lots” of fewer than 100 shares, some traders probably resist shredding orders into numerous odd lots.
• Under NYSE Rule 411(b), broker-dealer member firms have an obligation to consolidate a customer’s odd-lot orders if the share amount of such orders exceeds 100 shares. Other exchanges have similar provisions and have brought enforcement cases against member firms that did not comply with those rules.
• During the sample period, odd-lot transactions were executed through a separate odd-lot trading system, and these small trades were not reported for dissemination on the consolidated tape, as discussed by O’Hara, Yao and Ye (2012).
• “Tape shredding” affects trading patterns. As suggested by Caglio and Mayhew (2012), large orders may be broken up into more trades than small orders to generate additional revenue from sales of consolidated trade and quote data.
• There is some asymmetry in the treatment of buy and sell orders. Large bets are likely to be matched against multiple bets of smaller sizes, also resulting in more TAQ prints for large bets than for smaller ones. According to the Consolidated Tape Association (CTA), the exchanges are required to collect and report last sale data (CTA Plan (1992) Section VII). At the NYSE, for example, it is the duty of the member representing the seller to ensure that a trade has been reported. Because the rules required reporting of “sales” and not “trades,” order splitting may be intrinsically more important for intended buy orders than for intended sell orders.

The volume multiplier may also vary with intended order size, ticker symbol, and time. In the beginning of our sample, for example, the volume multiplier was probably larger for orders traded on Nasdaq than for orders traded on the NYSE. Atkyn and Dyl (1997) claim that because Nasdaq dealers were either buyers or sellers in almost every trade at the Nasdaq, the Nasdaq trading volume was inflated by at least a factor of 2 relative to the number of trades actually occurring between end investors. Over time, these patterns may have changed, as dealers’ participation rate in trade facilitation has decreased and trades from other trading systems have begun to be reported on the consolidated tape through the Nasdaq system.

Technological developments have most likely increased the amount of intermediation in securities markets in the second half of the sample. The number of TAQ prints has soared because of order shredding and intermediation by high-frequency traders, who now account for a significant share of volume, as described in Chordia, Roll and Subrahmanyam (2011) and Hendershott, Jones and Subrahmanyam (2011). For example, Kirilenko et al. (2015) find that high-frequency traders account for more than 30 percent of stock index futures trading volume but hold their inventories for only a few minutes.
Another important market friction is the tick size. The tick size was reduced from 12.5 cents (1/8 of a dollar) to 6.25 cents (1/16 of a dollar) in 1997 and to one cent in 2001. Changes in tick size affect trading decisions by changing incentives to provide liquidity and shred orders, as discussed in Harris (1994). When volatility and the stock price are high, the tick size is small relative to a typical day’s trading range, and thus there are better opportunities for order shredding and making intermediation trades. Although firms can use stock splits to adjust percentage tick sizes, these adjustments occur infrequently and with long time lags, as noted in Angel (1997). As we discuss later, the invariance hypothesis suggests that it is necessary to take into account differences in the speed of trading games across markets to properly analyze the effects of market frictions.
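The difference between a tick measured against the daily price range and a tick measured in business time can be illustrated with two hypothetical stocks that share a price, a volatility, and a one-cent tick but differ in dollar volume. The sketch assumes that price volatility in business time is $P_{jt}\sigma_{jt}\gamma_{jt}^{-1/2}$, as in equation (4), with $\gamma_{jt}$ implied by equation (5) and a hypothetical constant $\mu_\gamma = 1$; the exact effective-tick definition used in the paper's empirical work may differ, so treat this as an illustration of the idea only.

```python
import math

# Hypothetical constant in eq. (5), gamma = MU_GAMMA * W^(2/3); only ratios across stocks
# are meaningful in this sketch, not the absolute number of bets.
MU_GAMMA = 1.0


def daily_relative_tick(tick, price, sigma):
    """Tick size relative to a typical day's dollar price move, P * sigma (no business time)."""
    return tick / (price * sigma)


def effective_tick_size(tick, price, sigma, dollar_volume, mu_gamma=MU_GAMMA):
    """Tick size relative to dollar price volatility per unit of business time.

    Assumes business-time volatility is P * sigma / sqrt(gamma), as in eq. (4), with the
    bet arrival rate gamma implied by trading activity through eq. (5).
    """
    w = sigma * dollar_volume                  # eq. (2): W = sigma * P * V
    gamma = mu_gamma * w ** (2.0 / 3.0)        # implied bets per day
    return tick / (price * sigma / math.sqrt(gamma))


# Two hypothetical $40 stocks with 2 percent daily volatility and a one-cent tick,
# differing only in dollar volume ($1 million versus $1 billion per day).
thin = effective_tick_size(0.01, 40.0, 0.02, 1e6)
thick = effective_tick_size(0.01, 40.0, 0.02, 1e9)
print(daily_relative_tick(0.01, 40.0, 0.02), thin, thick, thick / thin)
```

Both stocks have the same tick relative to the daily range, but a 1,000-fold increase in dollar volume raises the implied number of bets 100-fold and shrinks business-time volatility 10-fold, so the same one-cent tick is ten times larger in effective terms for the more active stock.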

2. Data.

2.1. Data Description

The NYSE TAQ database, which is accessed through Wharton Research Data Services (WRDS), contains trades and quotes reported on the consolidated tape by each participant in the Consolidated Tape Association (CTA) for all stocks listed on all exchanges during the entire 1993–2014 sample period. Because we do not attempt to sign trades as buys or sells based on whether they are executed at the bid or ask price, our analysis employs only data on trades, not quotes. For each trade, the data record the time, exchange, ticker symbol, number of shares traded, execution price, trade condition, and other parameters. The data set contains about 50 billion records, with the number of entries increasing exponentially over time, from over 5 million records per month in 1993 to over 400 million records per month in 2014.

We transform the very large raw data files into a smaller dataset convenient for subsequent analysis. First, bad records are removed using standard filters. The TAQ database provides information about the quality of recorded trades using condition and correction codes. We eliminate prints with condition codes 8, 9, A, C, D, G, L, N, O, R, X, or Z, or with correction codes greater than 1. The correction code 8 indicates, for example, that the trade was canceled. The remaining prints are aggregated in a specific way to reduce the size of the data set while preserving information about the monthly distributions of trade sizes. For each ticker symbol and each day, each print is placed into one of 55 bins constructed based on the number of shares traded.
Letting X denote the size of a print in shares, the 27 “even” bins correspond to prints of the following exact sizes: X = 100, 200, 300, 400, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 10,000, 15,000, 20,000, 25,000, 30,000, 40,000, 50,000, 60,000, 70,000, 75,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, and 500,000 shares. The 28 “odd” bins correspond to prints with sizes X between adjacent even bins, that is, X < 100, 100 < X < 200, ..., 400,000 < X < 500,000, and X > 500,000. Note that the bin boundaries are spaced roughly evenly on a log scale. Prints with even sizes are considered separately because trades tend to cluster at round-lot sizes. The result is a much smaller dataset storing the number of trades by day, ticker symbol, and bin. To simplify the subsequent analysis, we approximate the average print size in a bin (in shares) by the midpoint of that bin. If print size is larger than 500,000 shares, we assign it to the 55th bin and assume its size to be 1,000,000 shares. This simplified aggregation makes it possible to capture the most important properties of print-size distributions while implementing our analysis efficiently. The convenience comes, however, at the expense of introducing some additional noise, which may slightly affect results.
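The filtering and binning steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each print arrives as a hypothetical `(condition, correction, shares)` tuple, that a condition field may hold several one-character codes (a print is dropped if any of them is excluded), and that the lower edge of the "X < 100" bin is 0 when computing its midpoint. The helper names `keep_print`, `bin_index`, and `bin_midpoint` are illustrative.

```python
import bisect
from collections import Counter

# Filters from the text: drop these TAQ condition codes and correction codes above 1.
EXCLUDED_CONDITIONS = set("89ACDGLNORXZ")
MAX_CORRECTION = 1

# The 27 "even" print sizes; each is its own bin, and the gaps between them are "odd" bins.
EVEN_SIZES = [100, 200, 300, 400, 500, 1_000, 2_000, 3_000, 4_000, 5_000,
              10_000, 15_000, 20_000, 25_000, 30_000, 40_000, 50_000, 60_000,
              70_000, 75_000, 80_000, 90_000, 100_000, 200_000, 300_000,
              400_000, 500_000]
N_BINS = 2 * len(EVEN_SIZES) + 1   # 27 even bins + 28 odd bins = 55


def keep_print(cond: str, corr: int) -> bool:
    """True if a print survives the filters.

    Assumption: `cond` holds zero or more one-character condition codes, and a print
    is dropped if any of them is excluded.
    """
    if corr > MAX_CORRECTION:
        return False
    return not any(code in EXCLUDED_CONDITIONS for code in cond.strip())


def bin_index(shares: int) -> int:
    """0-based bin: 0 is X < 100, 1 is X = 100, 2 is 100 < X < 200, ..., 54 is X > 500,000."""
    i = bisect.bisect_left(EVEN_SIZES, shares)
    if i < len(EVEN_SIZES) and EVEN_SIZES[i] == shares:
        return 2 * i + 1            # exact even size
    return 2 * i                    # strictly between two even sizes (or beyond the last)


def bin_midpoint(b: int) -> float:
    """Assumed average print size in bin b (midpoint rule; lower edge of bin 0 taken as 0)."""
    if b == N_BINS - 1:
        return 1_000_000.0          # prints above 500,000 shares are assigned 1,000,000
    if b % 2 == 1:
        return float(EVEN_SIZES[b // 2])
    lower = 0 if b == 0 else EVEN_SIZES[b // 2 - 1]
    return (lower + EVEN_SIZES[b // 2]) / 2.0


# Hypothetical (condition, correction, shares) records for one ticker-day.
records = [("", 0, 50), ("", 0, 100), ("", 0, 100), ("Z", 0, 300),
           ("", 8, 400), ("", 0, 1_234), ("", 0, 750_000)]
counts = Counter(bin_index(x) for cond, corr, x in records if keep_print(cond, corr))
approx_volume = sum(n * bin_midpoint(b) for b, n in counts.items())
print(sorted(counts.items()), approx_volume)
```

Storing only `counts` per ticker-day is what makes the reduced dataset small; `approx_volume` shows how the midpoint rule introduces the extra noise mentioned above (the 1,234-share print is counted as 1,500 shares).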

For each day and each ticker symbol, the smaller database also stores the open price; the close price; the number of trades per day; the dollar volume per day; the share volume per day; the close-to-close return; and the volatility, defined as the daily standard deviation of returns over the past 20 trading days from the TAQ data. For each stock, each bin, and each month, the number of prints is summed over all the days in the month to calculate the frequency of trade sizes. Later we average the frequency distributions of scaled print sizes to construct an empirical distribution of print sizes (in shares) for each stock and each month in the sample. Aggregation by month makes it possible to build better empirical approximations to theoretical distributions for the many inactively traded stocks with few daily prints. In addition to calculating the average number of prints per day, we calculate several statistics describing the possibly complicated shape of the distribution of print sizes. We consider the average print size and various percentiles of trade-size distributions. We refer to these percentiles as trade-weighted percentiles. For example, the xth trade-weighted percentile corresponds to a print size such that prints with sizes below this threshold constitute x percent of all prints for a given stock in a given month. Note that trade-weighted percentiles effectively put the same weight on prints of different sizes, which tends to emphasize small trades. For example, if there are 99 prints of 100 shares and one print of 100,000 shares, then the distribution of print sizes is mostly concentrated at the 100-share level, and all trade-weighted percentiles below the 99th percentile are equal to 100 shares. The total trading volume (109,900 shares) and the average print size (1,099 shares), however, are largely determined by the one big print of 100,000 shares.
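A minimal sketch of the trade-weighted percentile just defined, applied to the example of 99 prints of 100 shares and one print of 100,000 shares. The function name and the tie-breaking convention (return the smallest size whose cumulative share reaches at least x percent) are our assumptions, since with discrete bins "below this threshold" can be read more than one way. Passing `by_volume=True` weights each print by its size, which gives the volume-weighted percentiles the paper turns to next.

```python
def weighted_percentile(sizes_counts, pct, by_volume=False):
    """Print size at the pct-th percentile of a print-size distribution.

    sizes_counts: (size, count) pairs, e.g. bin sizes and print counts.
    Trade-weighted (default): every print gets weight 1.
    Volume-weighted: every print is weighted by its size (its share of volume).
    Returns the smallest size whose cumulative weight reaches at least pct percent.
    """
    items = sorted(sizes_counts)
    weights = [size * count if by_volume else count for size, count in items]
    total = sum(weights)
    cumulative = 0
    for (size, _), weight in zip(items, weights):
        cumulative += weight
        # Compare 100 * cumulative with pct * total to avoid dividing integers.
        if 100 * cumulative >= pct * total:
            return size
    return items[-1][0]


# 99 prints of 100 shares and one print of 100,000 shares.
example = [(100, 99), (100_000, 1)]
print([weighted_percentile(example, p) for p in (50, 99)])                  # [100, 100]
print([weighted_percentile(example, p, by_volume=True) for p in (9, 10, 50)])  # [100, 100000, 100000]
```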
Because large trades are economically more important than small trades, we also investigate the right tail of print-size distributions in more detail by examining volume-weighted percentiles based on trades’ contributions to total volume. The contribution to total volume by trades from a given print-size bin is calculated based on the bin midpoint. The volume-weighted distributions give the percentage of trading volume resulting from prints of different sizes. The xth volume-weighted percentile corresponds to a trade size such that trades with sizes below this threshold constitute x percent of total trading volume. In the example in the previous paragraph, the 100-share prints account for 9,900 of 109,900 shares (about 9 percent of volume), so volume-weighted percentiles 1–9 are 100 shares and percentiles 10–99 are 100,000 shares. We report empirical results for both trade- and volume-weighted distributions. Of course, if we know a trade-weighted distribution of print sizes, then we can easily calculate a volume-weighted distribution as well. For the purpose of comparing trade-weighted and volume-weighted distributions, the log-normal is a useful benchmark. It is a straightforward exercise (involving a change in probability measure) to show that if the log of trade-weighted print size is distributed as \(N(\mu, \sigma^2)\), then the log of volume-weighted print size is distributed as \(N(\mu + \sigma^2, \sigma^2)\). The only difference between the two distributions is the shift in mean; the log-variance remains the same.
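The change of measure can be written out explicitly. Let \(Y = \ln X\) have trade-weighted density \(\sigma^{-1}\phi((y-\mu)/\sigma)\), where \(\phi\) is the standard normal density. Weighting each print by its size \(X = e^{Y}\) and renormalizing by \(E\{X\}\) gives the volume-weighted density \(g\); completing the square shows that it is again normal with the same variance:

```latex
\begin{aligned}
g(y) &= \frac{e^{y}\,\sigma^{-1}\phi\!\left(\frac{y-\mu}{\sigma}\right)}{E\{X\}},
  \qquad E\{X\} = e^{\mu + \sigma^2/2}, \\
y - \frac{(y-\mu)^2}{2\sigma^2}
  &= \mu + \frac{\sigma^2}{2} - \frac{\left(y - \mu - \sigma^2\right)^2}{2\sigma^2}, \\
\Longrightarrow\quad g(y) &= \frac{1}{\sigma}\,\phi\!\left(\frac{y - \mu - \sigma^2}{\sigma}\right),
  \quad\text{i.e. } Y \sim N\!\left(\mu + \sigma^2,\ \sigma^2\right) \text{ under volume weighting.}
\end{aligned}
```

The factor \(e^{\mu + \sigma^2/2}\) produced by completing the square cancels the normalizing constant exactly, so the shift in the log-mean equals the trade-weighted log-variance.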

To acquire share and exchange codes for stocks in our sample, the monthly data are matched with the Center for Research in Security Prices (CRSP) data, which are accessed through WRDS. Only common stocks listed on the NYSE, American Stock Exchange (AMEX), and Nasdaq from 1993 through 2014 are included in our study. Stocks that had splits or reverse splits in a given month are eliminated from the sample for that month. For each stock and each month, the data are also augmented by adding average daily volume (in dollars and in shares), average price, and the historical volatility. Our final sample includes 1,383,857 stock-month observations. For each of the 263 months between February 1993 and December 2014, there are, on average, observations for about 5,262 stocks.

2.2. Descriptive Statistics

Table 1 describes the data. Panel A reports statistics for the 1993–2000 subperiod. Panel B reports statistics for the 2001–14 subperiod. These statistics are reported separately because the properties of the data changed substantially following decimalization in 2001. Statistics are calculated for all securities in aggregate as well as separately for 10 groups of stocks sorted by average dollar volume. Instead of dividing the securities into 10 deciles with the same number of securities, volume break points are set at the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of dollar volume for the universe of stocks listed on the NYSE with CRSP share codes of 10 and 11. Group 1 contains stocks below the 30th percentile. Group 10 contains stocks above the 95th percentile and approximately corresponds to the universe of S&P 100 stocks. The top five groups approximately cover the universe of S&P 500 stocks. Using narrower percentile ranges for the more active stocks makes it possible to focus on the stocks that are the most economically important. For each month, the thresholds are recalculated and stocks are reshuffled across groups.

Summary Statistics before 2001.
Panel A of table 1 reports statistical properties of securities and prints in the sample before 2001. For the entire sample of stocks, the average trading volume is $6.186 million per day, ranging from $0.15 million for the lowest-volume group to $176.99 million for the highest-volume group. The average volatility for the entire sample is equal to 4 percent per day. Volatility tends to be higher for smaller stocks: it is 4.5 percent for the lowest-volume group and 2.9 percent for the highest-volume group. Thus, the measure of trading activity, equal to the product of dollar volume and volatility, increases from 0.15 · 0.045 to 176.99 · 0.029, that is, by a factor of about 760 from the lowest- to the highest-volume group. Before 2001, the average print size is equal to $23,629, ranging from $11,441 for low-volume stocks to $89,338 for high-volume stocks. This corresponds to a decrease from 7.6 percent to 0.05 percent of daily volume from the lowest- to the highest-volume group. The median is much lower than the mean, as large prints make the distribution of print sizes positively skewed. The trade-weighted median ranges from $5,682 for low-volume stocks to $28,567 for high-volume stocks, corresponding to a decrease from 3.8 percent to 0.016 percent of daily volume.

Note that the invariance hypothesis predicts that the shape of the distributions of trade sizes as a fraction of daily volume must be similar across stocks, the only difference being that their log-means shift downward by two-thirds of the increase in log trading activity (equation (10)). Because trading activity increases by a factor of 760 from the lowest to the highest group, a back-of-the-envelope calculation suggests that the distributions of trade sizes as a fraction of volume should be shifted downward by a factor of \(760^{2/3} \approx 83\). This estimate is less than the observed differences in means of 7.6%/0.05% ≈ 150 and medians of 3.8%/0.016% ≈ 240 between the highest- and lowest-volume groups.

In the 1993–2000 subperiod, the average number of prints recorded per day is 142 for the entire sample, increasing monotonically from 17 to 2,830 from the first to the tenth volume group, that is, by a factor of 2,830/17 ≈ 166. The invariance hypothesis predicts that the log of the expected number of prints should increase by two-thirds of the increase in log trading activity (equation (9)), that is, by a factor of \(760^{2/3} \approx 83\). This back-of-the-envelope calculation suggests that the number of prints increases more than predicted, potentially reflecting more intensive order shredding in high-volume groups, but further investigation is certainly warranted.

Some print sizes are unusually common in the TAQ data. Before 2001, even-sized trades account for over 61 percent of volume traded and 80 percent of trades executed. The fraction of even-sized prints is stable across volume groups. The prevalence of these prints validates our choice to treat even-size bins separately. About 16 percent of all transactions and 2 percent of volume traded are executed in 100-share prints.
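The back-of-the-envelope comparison of the pre-2001 volume groups can be reproduced from the group averages quoted in the text. This is only a rough sketch: it treats each group's averages as if they described a single representative stock, which ignores the nonlinearity of the averaged quantities, and the dictionary and function names are ours. It gives a trading-activity ratio of about 760 and an invariance factor of about 83, smaller than each of the observed ratios.

```python
# Group averages from table 1, panel A, as quoted in the text (dollar volume in dollars,
# volatility as a daily fraction, print-size mean/median as a percent of daily volume).
lowest = {"dollar_volume": 0.15e6, "volatility": 0.045,
          "mean_pct": 7.6, "median_pct": 3.8, "prints_per_day": 17}
highest = {"dollar_volume": 176.99e6, "volatility": 0.029,
           "mean_pct": 0.05, "median_pct": 0.016, "prints_per_day": 2_830}


def trading_activity(group):
    """Trading activity W: dollar volume times daily return volatility."""
    return group["dollar_volume"] * group["volatility"]


activity_ratio = trading_activity(highest) / trading_activity(lowest)
# Invariance: prints rise, and sizes relative to volume fall, by a factor of W^(2/3).
predicted_factor = activity_ratio ** (2 / 3)

observed = {
    "mean size / volume": lowest["mean_pct"] / highest["mean_pct"],
    "median size / volume": lowest["median_pct"] / highest["median_pct"],
    "prints per day": highest["prints_per_day"] / lowest["prints_per_day"],
}

print(f"activity ratio: {activity_ratio:.1f}, invariance factor: {predicted_factor:.1f}")
for name, ratio in observed.items():
    print(f"observed {name}: {ratio:.1f}")
```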
These trades represent 15 percent of transactions for low-volume stocks and 25 percent of transactions for high-volume stocks. There is also a significant number of 1,000-share prints. The larger fraction of 1,000-share prints for low-volume stocks relative to high-volume stocks (18 percent versus 14 percent) probably reflects the regulatory rule under which Nasdaq market makers had to post quotes for at least 1,000 shares prior to 1997.

Summary Statistics after 2001.

Panel B of table 1 describes statistical properties of the sample after 2001. Daily volume more than tripled from $6.18 million before 2001 to over $24.4 million after 2001. Volatility decreased from 4 to 3.1 percent per day. These numbers imply that trading activity roughly tripled, since (24.4 · 0.031)/(6.186 · 0.04) ≈ 3.1, from the 1993–2000 to the 2001–14 subperiod. The average number of prints increased by a factor of 21, from 143 to 3,013, and the average print size decreased by more than a factor of 3, from $23,629 to $6,424. Back-of-the-envelope calculations implied by the invariance hypothesis suggest that these changes in print arrival rates and print sizes cannot be explained only by differences in levels of trading activity between the two subperiods but must be attributed to other factors: a roughly threefold increase in trading activity implies only about a twofold (\(3^{2/3} \approx 2.1\)) increase in the number of prints. One of these other factors is order shredding, which became increasingly prevalent over time, especially after the reduction in tick size to one cent on January 29, 2001, for NYSE stocks and on April 9, 2001, for Nasdaq stocks. During the 2001–14 subperiod, for example, 100-share trades account for 57 percent of all transactions and 25 percent of volume traded, with these numbers reaching their peaks at 73 percent and 41 percent, respectively, in 2013. Trades of 1,000 shares became less important in this half of the sample. This evident order shredding will significantly affect tests of the invariance hypothesis using TAQ data after 2001.

Frequency and Sizes of TAQ Prints during the 1993–2014 Period.

For the period from February 1993 to December 2014, figure 1 plots the 263 monthly values of the scaled mean of the number of prints per month, calculated as \(\bar{N}_{m,i} \cdot W_i^{-2/3}\), where \(\bar{N}_{m,i}\) is the number of prints per month for stock i and \(W_i\) is its trading activity. Figure 1 also plots the averages of the 20th, 50th, and 80th percentiles of the trade- and volume-weighted distributions of logs of scaled print sizes, calculated as \(\ln\left(|\tilde{X}_i|/V_i \cdot W_i^{2/3}\right)\), where \(|\tilde{X}_i|\) is the print size and \(V_i\) is daily volume. To facilitate comparison across stocks and across time, the number of prints and print sizes are scaled by \(W_i^{-2/3}\) and \(W_i^{2/3}\), respectively, as implied by the invariance hypothesis. The left panel shows the scaled variables averaged across low-volume stocks (group 1). The right panel shows the scaled variables averaged across high-volume stocks (groups 9 and 10). Trading patterns differ significantly across the 1993–2000 and 2001–14 subperiods. For high-volume stocks, the percentiles of print sizes and print rates do not change much prior to the beginning of decimalization in 2001. Afterward, percentiles of print size decrease steadily, and the average number of prints correspondingly increases. For low-volume stocks, similar changes started to occur even before 2001. Because most low-volume stocks are Nasdaq stocks, the pre-2001 decrease in print sizes and increase in print arrival rates may be explained by the reduction in tick size from 12.5 cents (1/8 of a dollar) to 6.25 cents (1/16 of a dollar) announced at Nasdaq in 1997.
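The two scalings used in figure 1 can be written as small helper functions. The sketch below uses hypothetical names and made-up inputs: it checks that two stocks whose trading activity differs by a factor of 8 but which otherwise satisfy the invariance relationships (four times as many prints and print sizes one-quarter as large relative to volume, since \(8^{2/3} = 4\)) map to identical scaled values. Print sizes and volume are both assumed to be in shares.

```python
import math


def scaled_print_rate(prints_per_month, activity):
    """Number of prints scaled by W^(-2/3), as in figure 1."""
    return prints_per_month * activity ** (-2 / 3)


def scaled_log_print_sizes(print_sizes, volume, activity):
    """Logs of print sizes relative to volume, scaled by W^(2/3): ln(|X|/V * W^(2/3))."""
    return [math.log(abs(x) / volume * activity ** (2 / 3)) for x in print_sizes]


# Hypothetical stocks A and B: B has 8 times A's trading activity, 4 times as many prints,
# and print sizes one-quarter as large relative to volume, as invariance predicts.
rate_a = scaled_print_rate(1_000, 1e6)
rate_b = scaled_print_rate(4_000, 8e6)
size_a = scaled_log_print_sizes([500], 1e6, 1e6)[0]
size_b = scaled_log_print_sizes([500], 4e6, 8e6)[0]
print(rate_a, rate_b)   # both approximately 0.1
print(size_a, size_b)   # both approximately ln(5)
```

Under the invariance hypothesis, then, stocks of very different sizes should plot on top of each other once scaled, which is why departures from flat lines in figure 1 are informative.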
With the exception of the largest print sizes in high-volume stocks, the downward trend in scaled print sizes and the upward trend in the scaled number of prints seem to end around 2007, after which all variables stabilize at roughly constant levels. A similar pattern can be found in figures of Hendershott, Jones and Subrahmanyam (2011). In the following sections, we examine these patterns in more detail.

3. Empirical Results

The invariance hypothesis and two alternative models make distinctly different predictions (all three nested in equations (9) and (10)) concerning the differences in the distributions of print sizes and their frequencies across stocks and time. We run our tests based on both the number of prints and the distributions of their sizes to determine which of the models provides a more reasonable description of the data.

3.1. Tests Based on Print Frequency

Comparison of Three Models.

According to each model, the scaled number of prints \(W_i^{-a} \cdot \bar{N}_i\) per day is constant across stocks. The three models differ only in the exponent \(a\) used to normalize the average number of prints. The invariance hypothesis implies \(a = 2/3\), the model of invariant print size implies \(a = 1\), and the model of invariant print frequency implies \(a = 0\). Figure 2 has three columns and four rows. Each of the three columns contains plots of the log of the average number of prints per day \(\bar{N}_i\) against the log of trading activity \(W_i\), where the average number of prints is scaled according to each of the three models, respectively. The four rows present results for April 1993 (NYSE and Nasdaq separately), April 2001, and April 2014. Results are presented for different periods because trading has changed dramatically over time. Also, trading patterns of NYSE- and Nasdaq-listed stocks differed historically because of differences in regulatory rules across exchanges during the earlier part of the sample, so our results are presented separately for NYSE- and Nasdaq-listed stocks in April 1993. We choose the month of April to avoid seasonality, as trades tend to cluster much less before the end of a calendar quarter, as shown by Moulton (2005). Each observation corresponds to the average number of prints per day for a given stock in a given month. There are about 6,000 observations in each subplot. If a model is correctly specified, the points are expected to line up along a horizontal line. In subplots for the invariance hypothesis, observations are scattered around horizontal lines for each of the three years. The invariance hypothesis explains the data very well, especially for NYSE stocks traded in April 1993. The levels of the horizontal lines move up from the top to the bottom rows, showing that the average number of prints has increased over time. For April 1993, the average number of prints is slightly higher for NYSE stocks than for Nasdaq stocks. Some Nasdaq stocks with low trading activity have outliers with a very small number of prints.
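To see why only the correctly specified exponent produces a horizontal cloud in figure 2, the sketch below simulates hypothetical stocks whose log print counts follow the invariance relationship with slope 2/3 plus noise (the intercept of 5, noise level of 0.5, sample size, and range of log activity are arbitrary choices), then regresses the log of \(W^{-a} \bar{N}\) on log activity for each model's exponent. The fitted slopes come out near 2/3 for \(a = 0\), near zero for \(a = 2/3\), and near −1/3 for \(a = 1\), matching the positive, flat, and negative patterns described for the three columns.

```python
import random


def ols_slope(xs, ys):
    """Slope from a univariate OLS regression with an intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
# Hypothetical cross-section: log(W/W*) spread over a wide range, and log prints that
# follow the invariance relationship (slope 2/3) plus noise.
log_w = [random.uniform(-6.0, 6.0) for _ in range(2_000)]
log_n = [5.0 + (2 / 3) * x + random.gauss(0.0, 0.5) for x in log_w]

EXPONENTS = {"invariant print frequency": 0.0, "invariance": 2 / 3, "invariant print size": 1.0}
# ln(W^-a * N) = ln N - a * ln W; using log(W/W*) instead of log W only shifts the intercept.
slopes = {
    model: ols_slope(log_w, [y - a * x for x, y in zip(log_w, log_n)])
    for model, a in EXPONENTS.items()
}
for model, slope in slopes.items():
    print(f"{model:>26}: slope {slope:+.3f}")
```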
A few of the most illiquid Nasdaq stocks also have outliers with a very high number of prints in April 1993. In subplots for the model of invariant print frequency, observations are lined up along a line with a positive slope. The model attributes all differences in trading activity entirely to differences in print sizes. Because changes in trading activity are also partially explained by changes in print arrival rates, the model tends to underestimate the number of prints for high-volume stocks and overestimate it for low-volume stocks. In subplots for the model of invariant print size, observations are lined up along a line with a negative slope. The model attributes all differences in trading activity entirely to differences in print arrival rates. Because some part of these differences is actually explained by differences in print sizes, the model tends to overestimate the number of prints for high-volume stocks and underestimate it for low-volume stocks.

OLS Estimates of the Number of TAQ Prints.

The distinctly different predictions of the models can be nested into a simple linear regression

\[ \ln\left[\bar{N}_i\right] = \mu_n + a_n \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde{\epsilon}. \qquad (11) \]

The equation relates the log of the mean number of prints \(\bar{N}_i\) per month for stock i to the level of trading activity \(W_i\). The invariance hypothesis predicts \(a_n = 2/3\), the model of invariant print frequency predicts \(a_n = 0\), and the model of invariant print size predicts \(a_n = 1\). For each month, we estimate the parameters \(\mu_n\) and \(a_n\) using an OLS regression, in which there is one observation per stock per month. The constant term \(\mu_n\) is scaled to represent the log of the expected number of prints for a benchmark stock with trading activity \(W_*\). The scaling constant \(W_* = (40)(10^6)(0.02)\) measures trading activity for an arbitrary benchmark stock with a price of $40 per share, trading volume of 1 million shares per day, and volatility of 2 percent per day.

Table 2 presents the results of regression (11) pooled over time. The six columns show the results for all stocks, the subset of NYSE-/AMEX-listed stocks, and the subset of Nasdaq-listed stocks, each shown during the 1993–2000 and 2001–14 subperiods. The table reports Fama-MacBeth estimates of the coefficients. To account for trends, Newey-West standard errors are calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated coefficients \(\hat{\mu}_{n,T}\) and \(\hat{a}_{n,T}\) for each month. The specific equations from which the linear time trend is estimated are \(\hat{\mu}_{n,T} = \mu_{n,0} + \mu_{n,t} \cdot (T - \bar{T})/12 + \tilde{\epsilon}_T\) and \(\hat{a}_{n,T} = a_{n,0} + a_{n,t} \cdot (T - \bar{T})/12 + \tilde{\epsilon}_T\), where \(T\) is the number of months from the beginning of the subsample and \(\bar{T}\) is the mean month in the subsample. For the subperiod February 1993 to December 2000, \(T = 1\) for February 1993 and \(T = 95\) for December 2000, so that \(\bar{T} = 48\). For the subperiod January 2001 to December 2014, \(T = 1\) for January 2001 and \(T = 168\) for December 2014, so that \(\bar{T} = 84.5\).
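The estimation steps behind table 2 can be sketched in pure Python. First, run one cross-sectional OLS of \(\ln \bar{N}_i\) on \(\ln(W_i/W_*)\) per month. Then regress the monthly coefficients on a centered linear trend measured in years, with Newey-West standard errors using three lags. This is our reading of the procedure, not the authors' code. The function names are hypothetical, the usage inputs are made up, and we assume the standard Bartlett weights \(1 - \ell/(L+1)\) for the Newey-West estimator.

```python
import math

# Benchmark stock: $40 price, 1 million shares per day, 2 percent daily volatility.
W_STAR = 40 * 1e6 * 0.02


def ols(xs, ys):
    """Univariate OLS with an intercept; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def monthly_regression(prints, activity):
    """Cross-sectional regression (11) for one month; returns (mu_n, a_n)."""
    xs = [math.log(w / W_STAR) for w in activity]
    ys = [math.log(n) for n in prints]
    return ols(xs, ys)


def trend_newey_west(estimates, lags=3):
    """Regress monthly estimates on (T - T_bar)/12 and return
    (level, trend_per_year, se_level, se_trend) with Newey-West (Bartlett) errors."""
    n = len(estimates)
    t_bar = (n + 1) / 2                      # mean of T = 1, ..., n
    z = [(t - t_bar) / 12 for t in range(1, n + 1)]
    level, trend = ols(z, estimates)
    u = [y - level - trend * zt for y, zt in zip(estimates, z)]
    # Rows x_t = (1, z_t). Because z is centered, X'X = diag(n, sum of z^2).
    xtx_inv = (1 / n, 1 / sum(zt * zt for zt in z))
    s = [[0.0, 0.0], [0.0, 0.0]]             # long-run covariance of x_t * u_t
    for lag in range(lags + 1):
        weight = 1 - lag / (lags + 1)        # Bartlett weight (1 at lag 0)
        for t in range(lag, n):
            x_now, x_lag = (1.0, z[t]), (1.0, z[t - lag])
            for i in range(2):
                for j in range(2):
                    term = x_now[i] * x_lag[j]
                    if lag > 0:
                        term += x_lag[i] * x_now[j]
                    s[i][j] += weight * u[t] * u[t - lag] * term
    # Sandwich formula with a diagonal (X'X)^-1; max() guards against rounding below zero.
    se_level = math.sqrt(max(0.0, xtx_inv[0] ** 2 * s[0][0]))
    se_trend = math.sqrt(max(0.0, xtx_inv[1] ** 2 * s[1][1]))
    return level, trend, se_level, se_trend


# Usage with made-up inputs: an exact invariance cross-section and a trending series.
activity = [1e5, 8e5, 6.4e6]
prints = [math.exp(6.0 + (2 / 3) * math.log(w / W_STAR)) for w in activity]
mu_n, a_n = monthly_regression(prints, activity)                   # close to (6.0, 2/3)
monthly_a = [0.66 + 0.007 * (t - 48) / 12 for t in range(1, 96)]  # 95 months, T_bar = 48
print(mu_n, a_n, trend_newey_west(monthly_a))
```

Centering the trend at \(\bar{T}\) makes the level coefficient the Fama-MacBeth estimate evaluated at the middle of the subsample, and it makes \(X'X\) diagonal, which reduces the sandwich formula to two scalar terms.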
For the 1993–2000 subperiod, the point estimate of \(a_{n,0}\) is equal to 0.666, statistically indistinguishable from the predicted value of 2/3. For the 2001–14 subperiod, the point estimate of \(a_{n,0}\) is equal to 0.79. The standard errors of these estimates are 0.002 and 0.005, respectively. Note also that the alternative models predicting \(a_{n,0} = 0\) and \(a_{n,0} = 1\) are clearly rejected. For the 1993–2000 subperiod, the estimated time-trend coefficient \(a_{n,t}\) of −0.001 per year is statistically insignificant. For the 2001–14 subperiod, the estimated time-trend coefficient \(a_{n,t}\) of 0.007 per year is statistically significant; it approximately corresponds to an increase in \(a_n\) of 0.11, from about 0.66 to about 0.77, over the 13-year period. The estimates \(a_{n,0}\) of 0.626 and 0.76 for NYSE stocks during the 1993–2000 and 2001–14 subperiods are smaller than the corresponding estimates of 0.679 and 0.816 for Nasdaq stocks.

The point estimate of the intercept \(\mu_{n,0}\) is equal to 6.147 for the 1993–2000 subperiod and 8.513 for the 2001–14 subperiod, indicating an increase in the average number of prints over time. The time-trend coefficients \(\mu_{n,t}\) of 0.093 and 0.148 for those subperiods also show statistically significant upward trends, corresponding to growth rates in the number of prints of about 9.3 percent and 14.8 percent per year (in log terms) for the two subperiods, respectively.

Figure 3 shows the time series of coefficients from the monthly regressions between February 1993 and December 2014. Panel A presents the time series of 263 month-by-month regression coefficients \(a_n\) from monthly regression equation (11). The superimposed horizontal line represents \(a_n = 2/3\), the value predicted by the invariance hypothesis. The figure shows that there are two distinct subperiods, 1993–2000 and 2001–14. Both the constant term \(\mu_n\) and the coefficient \(a_n\) change over time and may have time trends that differ between the 1993–2000 and 2001–14 subperiods. Over the 1993–2000 subperiod, all estimated coefficients \(a_n\) remained virtually constant. Their average value of 0.666 is strikingly close to the predicted value of 2/3. Over the 2001–14 subperiod, the estimates clearly begin to drift up from the values implied by the invariance hypothesis, increasing from about 0.65 to about 0.77 by the end of 2014. The combined results for both subperiods are consistent with the following interpretation: Over the 1993–2000 subperiod, the invariance hypothesis held because there was a reasonably close correspondence between TAQ prints and bets. Over the 2001–14 subperiod, the invariance relationship broke down because order shredding and intermediation increased over time and affected high-volume stocks more than low-volume stocks after decimalization in 2001. We discuss these patterns further in section 3.3.

3.2. Tests Based on TAQ Print Sizes

Comparison of Three Models.

Next, we examine the trade- and volume-weighted distributions of print sizes scaled for differences in trading activity as suggested by the three models. The three models predict that the distributions of \(W_i^{a} \cdot |\tilde{X}_i|/V_i\) are constant across stocks and time, but the models make different assumptions about the exponent \(a\).
The invariance model predicts \(a = 2/3\), the model of invariant print frequency predicts \(a = 0\), and the model of invariant print size predicts \(a = 1\). To approximate the distribution, we calculate print size \(|X|\) based on the midpoint of the print-size bin in which the print was placed. For each month and each volume group, the empirical stock-level distributions of scaled print sizes are combined by averaging, across the stocks in the group, the frequency distributions of the number of prints in each bin. The results are plotted in figures 4 and 5. For illustrative purposes, only results for April 1993 are presented, but the data for the entire 1993–2014 period are examined more closely below.

Figure 4 shows empirical distributions of logs of scaled print sizes for NYSE stocks. The figure has three rows and six columns. The three rows contain plots for low-volume stocks in volume group 1, medium-volume stocks in volume groups 2 through 8, and high-volume stocks in volume groups 9 and 10, respectively. The first three columns contain plots of the trade-weighted distributions, with the density of logs of scaled print sizes on the vertical axis. The second three columns contain plots of the volume-weighted distributions, with the volume contribution of these trades on the vertical axis. Within each set of three columns, print sizes are scaled according to the three models. If one of the three models is correct, then the three distributions in the column corresponding to that model should be the same across rows. To make the results easier to interpret, we superimpose the bell-shaped density of a normal distribution whose mean and variance equal the mean and variance of the trade- or volume-weighted distribution of scaled print sizes based on the entire sample. As previously discussed, if trade-weighted distributions are log-normal, then volume-weighted distributions are log-normal as well. If scaled sizes are log-normally distributed, then the three plots in each column are expected to coincide with the superimposed common normal density.

The first column presents the three trade-weighted distributions implied by the invariance hypothesis; they have similar means, variances, and supports. The shapes of the empirical distributions bear some resemblance to the superimposed normal density, but the fit is by no means exact. The low-volume group matches the superimposed normal better than the medium- and high-volume groups. The fourth column presents the volume-weighted distributions implied by the invariance hypothesis; they are even more similar to the superimposed common normal density. Thus, the invariance hypothesis explains a substantial part of the variation in the distribution of print sizes, especially in the distribution of economically important large trades. Print sizes seem to be distributed approximately log-normally; Kyle and Obizhaeva (2016) find that the distribution of portfolio-transition orders is close to a log-normal as well.
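A sketch of the group-level aggregation used for figures 4 and 5. Each stock-month's bin counts become a probability distribution (trade-weighted, or volume-weighted by multiplying counts by sizes). The stock-level distributions are then averaged with equal weight per stock, and the pooled mean and variance define the superimposed normal density. The input format (pairs of log scaled size and print count), the function names, and the equal-weighting reading of "averaging across stocks" are our assumptions.

```python
import math


def stock_distribution(bins, by_volume=False):
    """Turn one stock-month's (log scaled size, print count) pairs into weights summing to 1.

    Volume weighting multiplies counts by exp(log scaled size); within a stock the
    scaling factor is common to all bins, so this is proportional to size times count.
    """
    weights = [count * math.exp(y) if by_volume else count for y, count in bins]
    total = sum(weights)
    return [(y, w / total) for (y, _), w in zip(bins, weights)]


def group_moments(stocks, by_volume=False):
    """Average stock-level distributions with equal weight per stock; return (mean, variance)."""
    pooled = []
    for bins in stocks:
        pooled.extend((y, p / len(stocks)) for y, p in stock_distribution(bins, by_volume))
    mean = sum(p * y for y, p in pooled)
    variance = sum(p * (y - mean) ** 2 for y, p in pooled)
    return mean, variance


def normal_density(y, mean, variance):
    """Density of the superimposed normal benchmark at y."""
    return math.exp(-(y - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


# Two hypothetical stocks: their print counts differ a lot, but each stock gets equal weight.
stocks = [[(-1.0, 10)], [(1.0, 1_000)]]
mean, variance = group_moments(stocks)
print(mean, variance, normal_density(0.0, mean, variance))
```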
For the model of invariant print frequency, the trade-weighted densities are in the second column and the volume-weighted densities are in the fifth column; they are much less stable across volume groups. In both columns, the distributions shift to the left as trading volume increases, which suggests that this first alternative model overstates print sizes for high-volume stocks and understates them for low-volume stocks. The model fails to account for the fact that some variation in trading activity is explained by variation in the number of prints. For the model of invariant print size, the trade-weighted densities are in the third column and the volume-weighted densities are in the sixth column; they are clearly unstable across volume groups as well. In both columns, the distributions shift to the right as trading volume increases, so the second alternative model understates print sizes for high-volume stocks and overstates them for low-volume stocks. The alternative models clearly provide worse explanations for the observed variation in print sizes than the invariance hypothesis.

Figure 5 shows our results for the sample of Nasdaq stocks. As with NYSE stocks, the distributions of print sizes are more stable across volume groups when print sizes are scaled according to the invariance hypothesis. Compared with the NYSE distributions, the Nasdaq distributions are less smooth and have more spikes, especially the trade-weighted densities. We attribute these patterns to a regulatory rule that required Nasdaq dealers to quote prices for at least 1,000 shares, leading to a disproportionately large number of 1,000-share Nasdaq trades recorded on the consolidated tape before 1997.

Implications for Log-Normal Distributions.

As previously discussed, if the log of trade-weighted scaled print sizes is distributed as \(N(\mu, \sigma^2)\), then the log of volume-weighted scaled print sizes is distributed as \(N(\mu + \sigma^2, \sigma^2)\). It is interesting to examine how close the log-means and log-variances of the distributions superimposed in figures 4 and 5 for the invariance hypothesis come to satisfying this constraint. For NYSE stocks, the constraint implies that the volume-weighted mean of 0.97 should be the same as the sum of the trade-weighted mean of −1.15 and its variance of 1.90. Since −1.15 + 1.90 = 0.75 ≠ 0.97, the constraint fails to hold, although only by a margin of about 25 percent. The log-variance of 3.04 for the volume-weighted distribution is much larger than the log-variance of 1.90 for the trade-weighted distribution. This discrepancy is inconsistent with log-normality, which implies that these log-variances should be the same. For Nasdaq stocks, the volume-weighted mean of 1.2 should be the same as the sum of the trade-weighted mean of −0.19 and its variance of 1.93. Since −0.19 + 1.93 = 1.74 ≠ 1.2, this constraint fails to hold by a margin of about 30 percent. The log-variance of 1.98 for the volume-weighted distribution is similar to the log-variance of 1.93 for the trade-weighted distribution, consistent with the predictions of log-normality. Because these moment restrictions are not perfectly satisfied in the data, the hypothesis of log-normality can be valid only as a very rough approximation at best.
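The moment checks above are simple enough to script. The sketch below takes the four log-moments quoted for each market, computes the volume-weighted log-mean implied by log-normality, and reports the gap and the ratio of log-variances (which log-normality says should equal 1). Only the numbers come from the text; the names are ours. It gives an implied mean of 0.75 and a variance ratio of 1.6 for NYSE stocks, and an implied mean of 1.74 and a variance ratio of about 1.03 for Nasdaq stocks.

```python
# Log-moments quoted in the text for the invariance-scaled distributions in figures 4 and 5:
# (trade-weighted mean, trade-weighted variance, volume-weighted mean, volume-weighted variance)
MOMENTS = {
    "NYSE": (-1.15, 1.90, 0.97, 3.04),
    "Nasdaq": (-0.19, 1.93, 1.20, 1.98),
}


def lognormal_check(tw_mean, tw_var, vw_mean, vw_var):
    """Compare observed volume-weighted moments with those implied by log-normality."""
    implied_vw_mean = tw_mean + tw_var          # N(mu, s2) -> N(mu + s2, s2)
    return {
        "implied_vw_mean": implied_vw_mean,
        "mean_gap": vw_mean - implied_vw_mean,
        "variance_ratio": vw_var / tw_var,      # equals 1 under log-normality
    }


results = {market: lognormal_check(*m) for market, m in MOMENTS.items()}
for market, r in results.items():
    print(market, {key: round(value, 2) for key, value in r.items()})
```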
Deviations from log-normality include clustering of trades in even-lot sizes (especially prints of 1,000 shares on Nasdaq), censoring and rounding of odd lots, clustering of 100-share trades, and the possibility that very large trades follow a fatter-tailed power-law distribution rather than a log-normal.

OLS Regression Estimates of TAQ Print Sizes, February 1993 to December 2014.

We test implications of the invariance hypothesis for print sizes using OLS regressions in which the left-side variable is either a mean or a percentile of either the trade- or volume-weighted distribution of logs of print sizes as a fraction of daily volume. For each stock in a given month, the empirical trade- and volume-weighted distributions of logs of print sizes are constructed. Letting \(f(\cdot)\) denote a functional that corresponds to either the mean or the pth percentile (20th, 50th, 80th) of these distributions, these variables are regressed on logs of trading activity:

\[ f\left(\ln\left[\frac{|\tilde{X}_i|}{V_i}\right]\right) = \mu_x + a_x \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde{\epsilon}. \qquad (12) \]

There is a clear connection between equations (11) and (12). The expected trading volume is equal to the product of the expected number of prints and the expected print size, \(V = \bar{N} \cdot E\{\tilde{X}\}\), implying that the left side of equation (11) is \(\log \bar{N} = -\log(E\{\tilde{X}\}/V)\). Thus, the left-hand-side variable in equation (12) resembles the left-hand-side variable in equation (11) with its sign reversed. Note that the concavity of the log function implies, by Jensen's inequality, that the expectation of the log is less than the log of the expectation: \(E\{\log(\tilde{X}/V)\} < \log(E\{\tilde{X}/V\})\). For example, if \(\tilde{X}_i/V_i\) were distributed log-normally with the same variance across stocks, then the coefficient estimates \(a_n\) and \(a_x\) would be the same in absolute value but opposite in sign in all of the regressions in equations (11) and (12), but the constant terms \(\mu_n\) and \(\mu_x\) would be different. In our data (as we show below), \(\tilde{X}_i/V_i\) deviates from a log-normal distribution sufficiently to make the coefficients \(a_n\) and \(a_x\) vary across regression specifications.

Panels B and C of figure 3 show the time series of 263 month-by-month regression coefficients \(a_x\) from regression equation (12) for the 20th, 50th, and 80th percentiles of print sizes over the period February 1993 to December 2014. Panel B presents results for trade-weighted distributions. Panel C presents results for volume-weighted distributions. Superimposed horizontal lines represent the level of −2/3, the benchmark predicted by the invariance hypothesis. Again, the figure shows that there are two distinct subperiods. Over the 1993–2000 subperiod, all estimated coefficients remained virtually constant. The estimates of \(a_x\) are slightly lower than the predicted value of −2/3 for the 20th, 50th, and 80th percentiles of trade-weighted distributions; they fluctuate between −0.70 and −0.82, implying that small print sizes as a fraction of volume decrease faster with trading activity than predicted by the invariance hypothesis.
For the 50th and 80th percentiles of volume-weighted distributions, the estimates of \(a_x\) fluctuate between −0.45 and −0.70, somewhat higher than predicted by the invariance hypothesis, and all estimates for the 20th volume-weighted percentiles are very close to −2/3. Over the 2001–14 subperiod, the estimates begin to drift away from their initial levels. For the volume-weighted percentiles, the estimates of \(a_x\) decrease from about −0.45 to −0.82 for the 80th percentile, from about −0.54 to −0.89 for the 50th percentile, and from about −0.645 to −0.75 for the 20th percentile. For the trade-weighted percentiles, the estimates for the 20th and 50th percentiles do not exhibit any definite patterns, but the estimates for the 80th percentile decrease from about −0.71 to −0.77. Overall, changes in the large print sizes (right tails) are more significant than changes in the small print sizes (left tails).

Table 3 reports the estimates from regressions in equation (12) pooled over the 1993–2000 period. The first four columns show estimates for the means and percentiles of the trade-weighted distributions. The last four columns show estimates for the means and percentiles of the volume-weighted distributions. Because the monthly estimates \(\hat{\mu}_{x,T}\) and \(\hat{a}_{x,T}\) change over time, we add a linear time trend. The table reports Fama-MacBeth estimates of the coefficients, with Newey-West standard errors calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated coefficients \(\hat{\mu}_{x,T}\) and \(\hat{a}_{x,T}\) for each month. As before, the equations estimated for the time trend are \(\hat{\mu}_{x,T} = \mu_{x,0} + \mu_{x,t} \cdot (T - \bar{T})/12 + \tilde{\epsilon}_T\) and \(\hat{a}_{x,T} = a_{x,0} + a_{x,t} \cdot (T - \bar{T})/12 + \tilde{\epsilon}_T\), where \(T\) is the number of months from the beginning of the subsample and \(\bar{T}\) is the median month in the subsample.

For the trade-weighted distributions, the estimate of \(a_{x,0}\) is equal to −0.741 for the means. This estimate is larger in absolute value by 0.075 than the estimate of 0.666 for the number of prints in table 2. For the trade-weighted percentiles, the estimated coefficients range from −0.781 for the 20th percentile to −0.725 for the 80th percentile. All of these estimates are larger in absolute value than the value of −2/3 predicted by the invariance hypothesis, which implies that print sizes as a fraction of volume tend to decrease with trading activity faster than implied by the theory. For the volume-weighted distributions, the estimated coefficient \(a_{x,0}\) is equal to −0.56 for the means; the estimates range from −0.661 for the 20th percentile to −0.481 for the 80th percentile. Across means and percentiles, the standard errors of the estimates \(a_{x,0}\) range from 0.002 to 0.003; these values are similar in magnitude to the averages of standard errors of \(a_x\) from the cross-sectional monthly regressions (12), in which those values range from 0.002 to 0.065. These results suggest that the invariance hypothesis \(a_{x,0} = -2/3\) explains the data much better than the alternatives \(a_{x,0} = 0\) and \(a_{x,0} = -1\), but all three models are statistically rejected. For the trade-weighted distributions, the estimated time-trend coefficient \(a_{x,t}\) ranges from 0.008 to 0.012 per year and is statistically significant. For the volume-weighted distributions, the time-trend coefficient is either statistically insignificant or negative.
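The conversions drawn from table 3's intercepts can be checked numerically. Each intercept is a log of print size as a fraction of daily volume for the benchmark stock, so exponentiating it gives the fraction. The log-normal volume-share formula \(1 - N(z - \sigma)\), attributed in the text to Kyle and Obizhaeva (2016), can be evaluated with the Python standard library's `statistics.NormalDist`. Only the numbers come from the text; the variable and function names are ours.

```python
import math
from statistics import NormalDist

# Intercepts from table 3 (trade-weighted regressions), in logs of print size
# as a fraction of daily volume for the benchmark stock.
INTERCEPTS = {"mean of logs": -7.238, "20th percentile": -8.490, "80th percentile": -6.260}
as_percent = {name: 100 * math.exp(mu) for name, mu in INTERCEPTS.items()}


def volume_share_above(z, sigma):
    """Volume share from trades more than z log-standard deviations above the log-mean,
    assuming log-normal trade sizes: 1 - N(z - sigma)."""
    return 1 - NormalDist().cdf(z - sigma)


sigma = math.sqrt(1.90)                    # trade-weighted log-variance from figure 4
share_above_median = volume_share_above(0.0, sigma)
print({name: round(p, 2) for name, p in as_percent.items()})
print(round(share_above_median, 3))
```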
The estimated intercept $\mu_{x,0}$ of −7.238 in the regression for the trade-weighted means implies that the median print size for the benchmark stock is equal to $\exp(-7.238)$, or 0.07 percent of daily volume. The estimated intercepts of −8.490 and −6.260 in the regressions for the 20th and 80th percentiles suggest that the average 20th and 80th print-size percentiles are equal to 0.02 percent and 0.19 percent of daily volume for the benchmark stock, respectively. Under the assumption of log-normality, Kyle and Obizhaeva (2016) note that the fraction of volume generated by trades larger than $z$ standard deviations above the log-mean (which equals the log of the median) is given by $1 - N(z - \sigma)$, where $N(\cdot)$ is the standard normal cumulative distribution function and $\sigma$ is the standard deviation of the log of trade sizes. Based on the trade-weighted variance of 1.90 in figure 4, log-normality would imply that about 91 percent of volume occurs in print sizes larger than 0.07 percent of daily volume (the median trade). The standard errors of $\mu_{x,0}$ in the cross-sectional regressions are similar in magnitude and range between 0.014 and 0.031 across means and percentiles. The negative and statistically significant estimates of the time trend $\mu_{x,t}$ indicate that print sizes gradually decreased during the 1993–2000 subperiod, with a downward drift that is especially pronounced in the right tails of the distributions.

The $R^2$ is lower in regressions based on volume-weighted distributions than in regressions based on trade-weighted distributions. For the means, the values of $R^2$ are 0.90 for the trade-weighted and 0.69 for the volume-weighted distributions. The gap in $R^2$ widens monotonically from 0.87 versus 0.83 for the 20th percentiles to 0.88 versus 0.52 for the 80th percentiles. These numbers show that there is more unexplained variation in large print sizes than in small print sizes. Note that some of this variation may result from the rounding of large odd-size trades to the mid-point of bins or from the small number of observations in the largest bins.

Table 4 reports the estimates for regressions in equation (12) for the 2001–14 subperiod. For the means, the estimates of $a_{x,0}$ are −0.779 for trade-weighted distributions and −0.776 for volume-weighted distributions. These estimates are larger in absolute value than the corresponding estimates of −0.741 and −0.560 for the 1993–2000 subperiod in table 3. All but one of the estimates of −0.757, −0.769, and −0.811 for the 20th, 50th, and 80th trade-weighted percentiles, respectively, and −0.796, −0.843, and −0.779 for the 20th, 50th, and 80th volume-weighted percentiles, respectively, are also higher in absolute value than the estimates of −0.781, −0.750, −0.725, −0.661, −0.579, and −0.481 for the earlier subperiod. The biggest changes occur in the estimates for the 80th percentile of trade-weighted distributions and the 50th and 80th percentiles of volume-weighted distributions, which suggests that recent technological and regulatory changes had the largest effect on the right tail of print-size distributions. The standard errors of $a_{x,0}$ are between 0.003 and 0.004, similar to the averages of the standard errors of $a_x$ in the monthly regressions (12). This fact validates the adjustment for time trend in the Fama-MacBeth procedure; without inclusion of a time trend, the standard errors would have been higher.
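The conversion from table 3 intercepts to print sizes, and the log-normal volume share quoted above, can be checked numerically. This sketch assumes, as in the text, that the log of scaled print size is normal, so that the median print equals $\exp(\mu)$ times daily volume and the share of volume in prints more than $z$ log-standard deviations above the log-median is $1 - N(z - \sigma)$ (weighting by volume shifts the log-mean up by $\sigma^2$). The helper names are ours.

```python
import math


def normal_cdf(x):
    """Standard normal CDF N(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def print_size_pct_of_volume(log_intercept):
    """Convert an intercept on ln(print size / daily volume) into percent of daily volume."""
    return 100.0 * math.exp(log_intercept)


def volume_share_above(z, sigma):
    """Share of volume in prints more than z log-standard deviations above the
    log-median, when log print size is normal with standard deviation sigma."""
    return 1.0 - normal_cdf(z - sigma)


if __name__ == "__main__":
    # Table 3 intercepts: trade-weighted mean, 20th and 80th percentiles.
    for label, mu in [("mean", -7.238), ("20th", -8.490), ("80th", -6.260)]:
        print(f"{label}: {print_size_pct_of_volume(mu):.3f}% of daily volume")
    # Trade-weighted log-variance of 1.90 (figure 4); z = 0 is the median trade.
    print(f"volume share above median: {volume_share_above(0.0, math.sqrt(1.90)):.3f}")
```

With $\sigma = \sqrt{1.90}$ the last line evaluates $N(\sqrt{1.90})$, a little above 0.91, in line with the figure quoted in the text.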
The estimates of the intercept $\mu_{x,0}$ for the 2001–14 subperiod are lower than those for the 1993–2000 subperiod, for the means as well as for all percentiles. For example, for the trade-weighted means, the estimate of −9.029 in table 4 is lower than the corresponding estimate of −7.238 in table 3; these estimates imply that a typical print size for the benchmark stock fell from 0.07 percent to 0.01 percent of daily volume between the 1993–2000 and 2001–14 subperiods. The estimated time trend $\mu_{x,t}$ is negative and statistically significant in all columns except for the 20th percentile of trade-weighted distributions, also implying that the distributions of print sizes have been shifting downward.

3.3. Detailed Analysis of Market Frictions over Time

We next study how market frictions such as tick size and lot size affect the trading process. More specifically, we examine whether these market frictions can help explain variations across markets in the number and size distribution of prints that cannot be explained by the invariance hypothesis. To facilitate our analysis, we introduce several new concepts that measure the restrictiveness of market frictions in the spirit of the invariance hypothesis. We also discuss why these measures are related to price volatility and share volume in business time as well as to each other.

Effective Price Volatility and Effective Tick Size. Tick size imposes a restriction on the minimum price change. The tick size changed from 1/8 of a dollar to 1/16 of a dollar in the late 1990s and then to one cent in 2001. To assess the restrictiveness of this friction, it is reasonable to compare it with price volatility, that is, with a typical price change. Based on the same intuition, practitioners often measure price volatility in units of tick size. For example, if a $40 stock has a volatility of 2 percent per day, then daily dollar price volatility is equal to $0.80 = $40 · 0.02, and a tick size of $0.01 is equal to 1/80 of price volatility.

The invariance hypothesis further suggests that it is more natural to define relative tick size as a fraction of price volatility in business time, not in calendar time. Because business time is proportional to the expected arrival rate of bets $\gamma_{jt}$, which, as shown in equation (5), is in turn proportional to $W_{jt}^{2/3}$, we define effective price volatility in asset $j$ and time $t$ as

$$\text{Effective Price Volatility}_{jt} := P_{jt}\cdot\sigma_{jt}\cdot\left(\frac{W_{jt}}{W_*}\right)^{-1/3}. \tag{13}$$

This measure is normalized by $W_*$, the previously defined trading activity of the benchmark stock, so that effective price volatility is exactly equal to daily dollar price volatility for the benchmark stock. In comparison with calendar-time volatility, effective price volatility is lower for more liquid stocks and higher for less liquid stocks, reflecting the fact that business time runs faster in more liquid securities. Now define effective tick size as the ratio of the dollar tick size to effective price volatility:

$$\text{Effective Tick Size}_{jt} := \frac{\text{Tick Size}_{jt}}{\dfrac{P_{jt}\cdot\sigma_{jt}}{P_*\cdot\sigma_*}\cdot\left(\dfrac{W_{jt}}{W_*}\right)^{-1/3}}. \tag{14}$$

The presence of $P_*\cdot\sigma_*$ in the denominator scales the definition of relative tick size so that it is exactly equal to the dollar tick size for the benchmark stock. A higher effective price volatility makes the effective tick size lower.
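Definitions (13) and (14) translate directly into code. The sketch below assumes the benchmark stock from the notes to table 2 ($P_* = 40$, $V_* = 10^6$ shares, $\sigma_* = 0.02$); the function names are illustrative, not from the paper.

```python
# Benchmark stock used to normalize trading activity (table 2 notes).
P_STAR = 40.0         # dollars per share
V_STAR = 1_000_000.0  # shares per day
SIGMA_STAR = 0.02     # daily return volatility
W_STAR = V_STAR * P_STAR * SIGMA_STAR


def effective_price_volatility(volume, price, sigma):
    """Equation (13): daily dollar volatility P*sigma rescaled to business time.

    Business time runs faster by (W/W_*)^(2/3), so dollar volatility per unit
    of business time scales with (W/W_*)^(-1/3).
    """
    w = volume * price * sigma
    return price * sigma * (w / W_STAR) ** (-1.0 / 3.0)


def effective_tick_size(tick, volume, price, sigma):
    """Equation (14): dollar tick relative to effective price volatility,
    normalized by P_* * sigma_* so that it equals the dollar tick for the
    benchmark stock."""
    relative_vol = effective_price_volatility(volume, price, sigma) / (P_STAR * SIGMA_STAR)
    return tick / relative_vol


if __name__ == "__main__":
    # The benchmark stock, then a hypothetical stock with 8x its share volume.
    for v in (1e6, 8e6):
        print(v, effective_price_volatility(v, 40.0, 0.02),
              effective_tick_size(0.01, v, 40.0, 0.02))
```

For the benchmark stock the functions return the plain daily dollar volatility of $0.80 and an effective tick of $0.01; a stock with eight times the benchmark's trading activity (same price and volatility) has half the effective volatility and therefore twice the effective tick.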
We conjecture that a lower effective tick size may encourage traders to shred meta-orders into a larger number of trades and may lead to more intermediation.

Effective Share Volume and Effective Lot Size. The lot size imposes a restriction on the minimum number of shares in prints on the tape. For most stocks in the sample, the lot size is equal to 100 shares. An odd lot comprises orders smaller than a round lot (for example, 30 shares) or the non-round-lot portion of larger orders (for example, the 30-share portion of a 530-share order); these odd-lot transactions are executed according to special, often less flexible rules, and information about them is not disseminated to the tape, as described in detail by Hasbrouck, Sofianos and Sosebee (1993). To assess how restrictive this friction is, one can compare it with the size of a median bet in the corresponding market. Equation (10) implies that median bet size is also proportional to share volume in business time, which we refer to as effective share volume:

$$\text{Effective Share Volume}_{jt} := V_{jt}\cdot\left(\frac{W_{jt}}{W_*}\right)^{-2/3}. \tag{15}$$

This measure takes into account that business time runs faster by a factor of $W_{jt}^{2/3}$ in more liquid securities. It is normalized by the benchmark level $W_*$ so that effective share volume is exactly equal to daily share volume for the benchmark stock. Now define effective lot size as the ratio of the lot size to effective share volume:

$$\text{Effective Lot Size}_{jt} := \frac{\text{Lot Size}_{jt}}{\dfrac{V_{jt}}{V_*}\cdot\left(\dfrac{W_{jt}}{W_*}\right)^{-2/3}}. \tag{16}$$

Practitioners often measure order size as a fraction of daily volume and restrict their trading rate to a fixed fraction of volume (say, 5 percent) in order to control transaction costs. The presence of $V_*$ in the denominator scales the definition of effective lot size so that it is exactly equal to the lot size for the benchmark stock. A lower effective share volume makes the effective lot size larger and therefore more binding, in the sense that a larger fraction of bets falls below that threshold. Some small bets may be executed as odd lots and not recorded on the consolidated tape, some may not be executed at all, and some may be rounded up to round-lot size. The extent of such censoring and rounding is expected to be related to effective lot size.

Because $W_{jt} = V_{jt}\cdot P_{jt}\cdot\sigma_{jt}$ and $W_* = V_*\cdot P_*\cdot\sigma_*$, the scaling terms in the denominators of (14) and (16) multiply to $(W_{jt}/W_*)\cdot(W_{jt}/W_*)^{-1} = 1$, so the product of effective tick size and effective lot size is a constant equal to the product of tick size and lot size, that is, the dollar value of one tick on a lot-size trade. It reflects the minimum possible dollar profit (and cost) per transaction in the market. For example, when the tick size is one cent and the lot size is 100 shares, this constant is equal to $1; if a trader executes at least 100 shares and the bid-ask spread is at least one cent, so that the half spread is at least half a cent, his or her profit (and cost) is at least $0.50.
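Equations (15) and (16), together with the claim that effective tick size times effective lot size equals tick size times lot size, can be checked with a short sketch. It reuses the benchmark values from the notes to table 2 as assumptions; the sample stock in the usage block is hypothetical.

```python
P_STAR, V_STAR, SIGMA_STAR = 40.0, 1_000_000.0, 0.02  # benchmark stock (table 2)
W_STAR = V_STAR * P_STAR * SIGMA_STAR


def relative_activity(volume, price, sigma):
    """W / W_* for a stock with the given daily share volume, price and volatility."""
    return volume * price * sigma / W_STAR


def effective_share_volume(volume, price, sigma):
    """Equation (15): share volume in business time."""
    return volume * relative_activity(volume, price, sigma) ** (-2.0 / 3.0)


def effective_lot_size(lot, volume, price, sigma):
    """Equation (16): lot size relative to effective share volume, normalized
    so that it equals the lot size for the benchmark stock."""
    return lot / (effective_share_volume(volume, price, sigma) / V_STAR)


def effective_tick_size(tick, volume, price, sigma):
    """Equation (14), restated so that this block is self-contained."""
    eff_vol = price * sigma * relative_activity(volume, price, sigma) ** (-1.0 / 3.0)
    return tick / (eff_vol / (P_STAR * SIGMA_STAR))


if __name__ == "__main__":
    tick, lot = 0.01, 100
    # Hypothetical illiquid stock: 50,000 shares a day at $12, 4% daily volatility.
    v, p, s = 50_000.0, 12.0, 0.04
    elot = effective_lot_size(lot, v, p, s)
    etick = effective_tick_size(tick, v, p, s)
    print(f"effective lot size: {elot:.1f} shares, effective tick: ${etick:.5f}")
    print(f"product: {elot * etick:.6f} (tick * lot = {tick * lot:.6f})")
```

For the hypothetical illiquid stock, the effective lot size is well above 100 shares (the round lot is more binding), while the effective tick is below one cent, and their product stays at $1.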
For a given month in the sample, dollar tick size and lot size are usually constant across most U.S. stocks (for example, one cent and 100 shares, respectively), implying that effective tick size and effective lot size are closely related to each other and to effective price volatility. In the following analysis, we therefore examine how trading patterns differ across stocks with different levels of effective price volatility.

The effects of the two market frictions on the number of prints and print-size distributions are difficult to separate. For example, higher effective price volatility makes effective tick size lower, but it also makes effective lot size higher; lower effective share volume has similar implications. The first effect, operating through lower effective tick size, probably encourages more intermediation and more shredding of bets into smaller trades placed at finer adjacent price points as a strategy to avoid front-running, leading to more prints of smaller sizes. The second effect, operating through higher effective lot size, probably encourages more censoring and rounding up of odd-lot trades, thus leading to fewer prints of larger sizes in the data sample.

Distributions of Print Sizes over Time. We next examine how market frictions affect trading in the U.S. equities market cross-sectionally and through time. After sorting stocks into 10 volume groups and 4 effective price-volatility groups, we analyze distributions of the logs of scaled print sizes $\ln\left(\frac{X_i}{V_i}\cdot W_i^{2/3}\right)$, with the scaling factor $W_i^{2/3}$ implied by the invariance hypothesis. The invariance hypothesis predicts that these 40 distributions are the same in frictionless markets. To examine how these distributions change over the entire period, we examine the shape of the distributions for the months April 1993, April 2001, and April 2014. Because dollar tick size and lot size are constant during the selected months, sorting stocks into effective price-volatility groups allows us to study the effects of variations in both effective tick size and effective lot size.

Trade-Weighted Distributions for NYSE-Listed Stocks, April 1993. In April 1993, NYSE-listed stocks were priced in increments of 12.5 cents (1/8 of a dollar). Most of the stocks were traded in multiples of 100-share lots, although some were traded in multiples of 10-share lots, as described in Hasbrouck, Sofianos and Sosebee (1993). Figure 6 shows the trade-weighted distributions of logs of scaled print sizes for 5 of the 10 volume groups and all 4 effective price-volatility groups for the NYSE-listed stocks in April 1993. The 100-share trades are highlighted in light gray and the 1,000-share trades in dark gray. The number of stocks in each subgroup and the average number of trades per day for these stocks are also reported. On each subplot, the density of a normal distribution with a mean of −1.15 and a standard deviation of 1.38, calculated for the pooled sample in April 1993, is superimposed.
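The scaled print sizes and the difference between trade-weighted and volume-weighted histograms can be illustrated with a short sketch. Here `prints` is a hypothetical array of print sizes in shares for one stock-month; we normalize $W_i$ by the benchmark level $W_*$ from table 2 (our choice, which only shifts every scaled value by the same constant), and the bin grid is an arbitrary assumption rather than the one used for figures 6 and 7.

```python
import numpy as np

# Benchmark trading activity W_* = V_* * P_* * sigma_* (table 2 notes).
W_STAR = 1_000_000.0 * 40.0 * 0.02


def scaled_log_print_sizes(prints, daily_volume, price, sigma):
    """ln((X / V) * (W / W_*)^(2/3)) for each print X in shares.

    daily_volume, price and sigma are the stock's average daily share volume,
    price and daily return volatility for the month.
    """
    w = daily_volume * price * sigma
    x = np.asarray(prints, dtype=float)
    return np.log(x / daily_volume) + (2.0 / 3.0) * np.log(w / W_STAR)


def weighted_histogram(log_sizes, prints, bins, by_volume):
    """Normalized histogram of scaled log print sizes.

    Trade-weighted (by_volume=False): each print counts once.
    Volume-weighted (by_volume=True): each print counts in proportion to its
    size in shares, which moves mass toward large prints.
    """
    weights = np.asarray(prints, dtype=float) if by_volume else None
    counts, edges = np.histogram(log_sizes, bins=bins, weights=weights)
    return counts / counts.sum(), edges


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical prints for the benchmark stock: log-normal sizes rounded to
    # round lots, with small prints raised to 100 shares.
    raw = np.exp(rng.normal(np.log(500.0), 1.2, size=5_000))
    prints = np.maximum(100.0, np.round(raw / 100.0) * 100.0)
    logs = scaled_log_print_sizes(prints, daily_volume=1e6, price=40.0, sigma=0.02)
    bins = np.arange(-14.0, 2.5, 0.5)
    tw, edges = weighted_histogram(logs, prints, bins, by_volume=False)
    vw, _ = weighted_histogram(logs, prints, bins, by_volume=True)
    is_100 = prints == 100.0
    print(f"100-share prints: {is_100.mean():.0%} of trades, "
          f"{prints[is_100].sum() / prints.sum():.0%} of volume")
    print("modal bin (trade-weighted) starts at", edges[np.argmax(tw)])
    print("modal bin (volume-weighted) starts at", edges[np.argmax(vw)])
```

Under invariance, a stock with eight times the benchmark's trading activity should have prints twice as large in shares (eight times the volume times $8^{-2/3}$), and the scaling maps such prints to the same value; the 100-share line shows why small prints carry much less weight once the histogram is weighted by volume.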
If the invariance hypothesis holds, the identification assumptions are valid, and bet sizes are distributed as a log-normal, then all distributions are expected to be invariant across subplots and to coincide with the superimposed normal density. For most subgroups, the distributions are indeed close to the superimposed normal. There is, however, a clear truncation below the 100-share odd-lot boundary, with clustering of 100-share trades, shown in light gray, in the left tails of the distributions. Because variation in levels of dollar volume within groups is small, the 100-share trades usually fall into the same bin or into two adjacent bins. The only exception is the first volume group, where large variation in trading activity spreads the 100-share trades over more than four bins.

The empirical distributions have spikes because of clustering of trades at round-lot levels. There are visible spikes at the 100-share level and the 1,000-share level, marked by clustering of light gray and dark gray columns, as well as two spikes in between, corresponding to the 200-share and 500-share levels. For example, there are more trades of 5,000 shares than of 4,000 or 6,000 shares, and far more than of 4,900 or 5,100 shares. For the subsample of stocks with low volume and low price volatility, large variation in trading activity smooths out the spikes in the distribution of print sizes.

A visual inspection suggests that, holding effective price volatility fixed, the supports of the distributions stay relatively constant across volume groups, but their shapes become more skewed to the right as volume increases, especially when price volatility is low, consistent with large orders being shredded into smaller trades.

Holding dollar volume fixed, the distributions vary across effective price-volatility groups in a systematic way as well. When effective price volatility increases, the 100-share boundary becomes more binding, the truncation threshold shifts to the right, and the effects of censoring and rounding to the 100-share boundary become more pronounced. At the same time, the relative tick size decreases, thus encouraging more order shredding. The first effect seems to dominate, because the average number of prints decreases with price volatility. For high-volume stocks, for example, the average number of prints decreases monotonically from 1,139 prints per day for low-volatility stocks to only 285 prints for high-volatility stocks. In the absence of any market frictions, the number of trades is expected to be relatively constant within a given volume group because volatility does not vary much.

Volume-Weighted Distributions for NYSE-Listed Stocks, April 1993. Figure 7 shows the volume-weighted distributions of logs of scaled print sizes for the NYSE stocks in April 1993. In comparison with the trade-weighted distributions in figure 6, the volume-weighted distributions put more weight on larger trades and show the distribution of large print sizes more clearly. According to Hasbrouck, Sofianos and Sosebee (1993), a block of 10,000 shares was not a particularly large trade in 1993. Some block trades (of 10,000 shares or more), especially block trades in illiquid stocks, were executed in the upstairs market.
The average size of upstairs-facilitated blocks was about 43,000 shares.

In comparison with the trade-weighted distributions in figure 6, the volume-weighted distributions are more stable across subgroups and more closely resemble the superimposed normal distribution. On most plots, the area under the bell-shaped density function is filled. The truncation at the odd-lot boundary is almost invisible, because the numerous 100-share trades almost “disappear” from the left tail of the distribution, as they contribute little to overall volume. Small gaps in the distributions relative to a log-normal can be seen in mid-range print sizes between 1,000 shares and 10,000 shares. Perhaps these gaps represent intended orders shredded into smaller trades. The strong visual resemblance of the graphs to a log-normal, as in Kyle and Obizhaeva (2016), is consistent with the interpretation that most of the largest orders in 1993 were executed as single blocks, generating large print sizes. An exception is the low-volume group, for which the largest orders appear to be shredded because the distributions are skewed to the right. Although the volume-weighted distributions are much smoother than the trade-weighted distributions, small spikes are still detectable. These spikes, which likely correspond to clusters of trades at the levels of 1,000 shares, 5,000 shares, and 10,000 shares, are clearly visible, for example, in distributions for stocks with high volume and high effective price volatility. There are also a few spikes in the far-right tails of several distributions, suggesting that a few very large prints occur in the data more often than log-normality would imply.

Trade-Weighted Distributions for Nasdaq-Listed Stocks, April 1993. In April 1993, Nasdaq-listed stocks usually had a minimum lot size of 100 shares. Quotes were restricted to increments of 1/8 of a dollar if the bid price exceeded $10.00, but trades were permitted in finer increments of 1/64 of a dollar for all stocks, even though these prices were then rounded to the nearest eighth for reporting, as described in Christie, Harris and Schultz (1994) and Smith, Selway and McCormick (1998). Also, in 1988, the Securities and Exchange Commission required Nasdaq market makers to have a quotation size of at least 1,000 shares for most stocks. The rule mostly affected large stocks, and, indeed, we observe larger spikes in subplots for high-volume stocks. For small stocks, the rule was slightly different. For example, orders smaller than 1,000 shares could be executed through the Small Order Execution System (SOES) in stocks that were trading at prices lower than $250 per share. After 1996, the minimum quote size restriction was gradually removed. Under the Actual Size Rule, the minimum quote size was reduced from 1,000 to 100 shares, first for 50 pilot stocks in January 1997, then for an additional 104 stocks in November 1997, and finally for all other stocks. Figure 8 shows the trade-weighted distributions of logs of scaled print sizes for the Nasdaq stocks in April 1993. The biggest difference between the trade-weighted distributions of the Nasdaq stocks and those of the NYSE stocks is the very large fraction of 1,000-share trades, shown as dark gray spikes, typically in the middle of the Nasdaq distributions.
These spikes can be attributed to the requirement to quote at least 1,000 shares. In line with this explanation, we do not observe clustering at the 1,000-share level after 2001 (unreported). Apart from the clustering at the 1,000-share level and the truncation at the 100-share level, the distributions bear some resemblance to the superimposed normal distribution.

Trade-Weighted Distributions for All Stocks, April 2001 and 2014. After decimalization in 2001, U.S. stocks traded in increments of one cent. The round-lot size is 100 shares; odd-lot trades account for a large fraction of trades, but they are not reported in the TAQ dataset. The U.S. stock market is fragmented into multiple trading venues. Angel, Harris and Spatt (2011, 2015) discuss other recent innovations in trading. Figures 9 and 10 show the trade-weighted distributions of logs of scaled print sizes for all stocks traded in April 2001 and April 2014, respectively. In 2001, decimalization and the use of electronic interfaces led to a significant increase in order shredding, the effect of which is clearly seen in both figures. The frequency of trades has increased significantly over time. For high-volume and low-volatility stocks, for example, there were, on average, only 325 trades per day in April 1993, increasing to 16,230 trades per day in April 2001 and 41,295 trades per day in April 2014.

The distributions of scaled print sizes shifted substantially to the left during the 1993–2014 period. Based on the means of the superimposed normals, for example, the median print size dropped from 0.11 percent of daily volume for the NYSE stocks and 0.08 percent for the Nasdaq stocks in April 1993 to 0.003 percent in 2001 and only 0.001 percent in 2014. The market for block trades seems almost to have disappeared, and trading is now dominated by transactions of 100 shares. Indeed, trades of 100 shares constitute about 57 percent of trades executed and 38 percent of volume traded in 2014. Note that the estimated log-variances of 2.05 in 1993, 1.78 in 2001, and 1.21 in 2014 for trade-weighted distributions of scaled print sizes are lower than the variance of 2.50 for the distributions of portfolio-transition orders reported by Kyle and Obizhaeva (2016). A gradual decrease in log-variance is consistent with the hypothesis that large bets in liquid stocks in 2014 result in disproportionately more prints than small bets in illiquid stocks.

Regressions with Effective Price Volatility. Table 5 presents Fama-MacBeth estimates of $\mu_n$ and $a_\sigma$ from the monthly regressions

$$\ln\left[\bar{N}_i\right] = \mu_n + \frac{2}{3}\cdot\ln\left[\frac{W_i}{W_*}\right] + a_\sigma\cdot\ln\left[\frac{P_i\cdot\sigma_i}{P_*\cdot\sigma_*}\cdot\left(\frac{W_i}{W_*}\right)^{-1/3}\right] + \tilde{\epsilon}_i. \tag{17}$$

The regression effectively imposes the invariance restriction $a_n = 2/3$ in regression (11) and adds effective price volatility as an additional explanatory variable to capture the effect of both market frictions. The table reports Fama-MacBeth estimates of the coefficients, with Newey-West standard errors calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated monthly coefficients $\hat{\mu}_{n,T}$ and $\hat{a}_{\sigma,T}$ for each month. Specifically, the specification is

$$\hat{\mu}_{n,T} = \mu_{n,0} + \mu_{n,t}\cdot\frac{T-\bar{T}}{12} + \tilde{\epsilon}_T, \qquad \hat{a}_{\sigma,T} = a_{\sigma,0} + a_{\sigma,t}\cdot\frac{T-\bar{T}}{12} + \tilde{\epsilon}_T,$$

where $T$ is the number of months from the beginning of the subsample, and $\bar{T}$ is the mean month in the subsample. The six columns show the results for the entire sample as well as for the subsets of NYSE/AMEX-listed stocks and Nasdaq-listed stocks during the 1993–2000 and 2001–14 subperiods.

The point estimates of $a_{\sigma,0}$ are negative and statistically significant for all subsamples. The estimates of −0.471, −0.338, and −0.497 for the 1993–2000 subperiod are smaller in absolute value than the corresponding estimates of −0.608, −0.475, and −0.676 for the 2001–14 subperiod. The standard errors range from 0.001 to 0.003. The point estimates of $a_{\sigma,t}$ are equal to −0.007, −0.02, −0.007, −0.027, −0.023, and −0.013, with standard errors between 0.001 and 0.003. There is an inverse relationship between effective price volatility and the number of prints: higher effective volatility implies fewer prints than the invariance hypothesis alone would predict. The estimates of $\mu_{n,0}$ and $\mu_{n,t}$ are not very different from the corresponding estimates in table 2.
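One month's cross-sectional regression (17) can be sketched with ordinary least squares. Because the coefficient on $\ln(W_i/W_*)$ is fixed at 2/3, that term is moved to the left-hand side and only $\mu_n$ and $a_\sigma$ are estimated. The data come from a hypothetical simulator with assumed distributions and parameter values, chosen only to show that the design recovers the coefficients; they are not TAQ data.

```python
import numpy as np

P_STAR, V_STAR, SIGMA_STAR = 40.0, 1_000_000.0, 0.02  # benchmark stock (table 2)
W_STAR = V_STAR * P_STAR * SIGMA_STAR


def log_effective_volatility(volume, price, sigma):
    """ln[(P*sigma)/(P_* sigma_*) * (W/W_*)^(-1/3)], the extra regressor in (17)."""
    rel_w = volume * price * sigma / W_STAR
    return np.log(price * sigma / (P_STAR * SIGMA_STAR)) - np.log(rel_w) / 3.0


def fit_effective_volatility_regression(n_prints, volume, price, sigma):
    """OLS estimates (mu_n, a_sigma) of regression (17) for one month.

    The coefficient on ln(W/W_*) is fixed at 2/3, so that term is subtracted
    from ln(N) before regressing on a constant and log effective volatility.
    """
    rel_w = volume * price * sigma / W_STAR
    y = np.log(n_prints) - (2.0 / 3.0) * np.log(rel_w)
    X = np.column_stack([np.ones_like(y), log_effective_volatility(volume, price, sigma)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_month(rng, n, mu_n, a_sigma, noise=0.2):
    """Hypothetical cross-section generated from (17) with assumed parameters."""
    volume = np.exp(rng.normal(np.log(3e5), 1.5, n))
    price = np.exp(rng.normal(np.log(25.0), 0.7, n))
    sigma = np.exp(rng.normal(np.log(0.03), 0.4, n))
    rel_w = volume * price * sigma / W_STAR
    log_n = (mu_n + (2.0 / 3.0) * np.log(rel_w)
             + a_sigma * log_effective_volatility(volume, price, sigma)
             + rng.normal(0.0, noise, n))
    return np.exp(log_n), volume, price, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    data = simulate_month(rng, n=2_000, mu_n=8.5, a_sigma=-0.6)
    mu_hat, a_hat = fit_effective_volatility_regression(*data)
    print(f"mu_n = {mu_hat:.3f}, a_sigma = {a_hat:.3f}")
```

Repeating the fit month by month and passing the resulting $\hat{a}_{\sigma,T}$ series through a trend-adjusted Newey-West step would mirror the procedure described for table 5.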

The significant increase in the values of $R^2$ in regressions (17) relative to the values of $R^2$ in regression (11) constrained with $a_n = 2/3$ shows that the cross-sectional variation in the number of prints unexplained by the invariance hypothesis can be partially attributed to differences in effective price volatility. For the entire sample, adding effective price volatility as an explanatory variable increases the $R^2$ from 0.873 to 0.918 for the 1993–2000 subperiod and from 0.899 to 0.955 for the 2001–14 subperiod. For NYSE stocks, the $R^2$ increases from 0.908 and 0.917 to 0.924 and 0.955; for Nasdaq stocks, the $R^2$ increases from 0.857 and 0.881 to 0.917 and 0.959 during the two subperiods, respectively.

Finally, we analyze the $R^2$ in the regression that imposes the invariance restriction $a_n = 2/3$ in regression (11) but allows the coefficients on the three components of trading activity $W_i$ (volume $V_i$, price $P_i$, and volatility $\sigma_i$) to vary freely:

$$\ln\left[\bar{N}_i\right] = \mu_n + \frac{2}{3}\ln\left[\frac{W_i}{W_*}\right] + b_1\cdot\ln\left[\frac{V_i}{10^6}\right] + b_2\cdot\ln\left[\frac{P_i}{40}\right] + b_3\cdot\ln\left[\frac{\sigma_i}{0.02}\right] + \tilde{\epsilon}_i. \tag{18}$$

For the entire 1993–2014 period, the estimates are $\hat{b}_1 = 0.20$ for the coefficient on volume $V_i$, $\hat{b}_2 = -0.31$ for the coefficient on price $P_i$, and $\hat{b}_3 = -0.5$ for the coefficient on volatility $\sigma_i$ (not reported). All coefficients are statistically different from zero. Similar patterns are observed for other subperiods and for the subsamples of NYSE stocks and Nasdaq stocks. The exponents for volatility behave similarly to the exponents for price and differently from the exponents for volume, thus implying that the rejection of the invariance hypothesis might depend in a subtle manner on how effective price volatility influences incentives to shred orders and make intermediation trades. Note that for the pooled sample, the $R^2$ increases from 0.873 to 0.928 for the 1993–2000 subperiod and from 0.899 to 0.970 for the 2001–14 subperiod.
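A short piece of algebra (ours, not the authors') shows how regression (17) sits inside regression (18). Since $W_* = (40)(10^6)(0.02)$, we have $\ln(W_i/W_*) = \ln(V_i/10^6) + \ln(P_i/40) + \ln(\sigma_i/0.02)$, so the effective-volatility regressor in (17) expands as

$$\ln\left[\frac{P_i\cdot\sigma_i}{P_*\cdot\sigma_*}\cdot\left(\frac{W_i}{W_*}\right)^{-1/3}\right] = \frac{2}{3}\ln\left[\frac{P_i}{40}\right] + \frac{2}{3}\ln\left[\frac{\sigma_i}{0.02}\right] - \frac{1}{3}\ln\left[\frac{V_i}{10^6}\right].$$

Regression (17) is therefore regression (18) under the restrictions

$$b_1 = -\frac{a_\sigma}{3}, \qquad b_2 = b_3 = \frac{2a_\sigma}{3}.$$

With $a_\sigma < 0$, these restrictions give $b_1 > 0$ and $b_2 = b_3 < 0$, the same sign pattern as the unrestricted estimates $\hat{b}_1 = 0.20$, $\hat{b}_2 = -0.31$, and $\hat{b}_3 = -0.5$; the two additional degrees of freedom in (18) only relax the equalities $b_2 = b_3$ and $b_1 = -b_2/2$.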
The $R^2$ values of 0.928 and 0.970 are only slightly larger than the $R^2$ values of 0.918 and 0.955 in regressions (17), respectively. Although the improvement is statistically significant, the addition of two extra degrees of freedom beyond effective price volatility improves the $R^2$ by only a small amount.

4. Conclusion

The distributions of TAQ print sizes, adjusted for trading activity as suggested by the invariance hypothesis, resemble a log-normal, with truncation below the 100-share odd-lot boundary. The resemblance was stronger during the earlier 1993–2001 period than during the later 2001–14 period, and it shows up more clearly in volume-weighted distributions than in trade-weighted distributions. The invariance hypothesis explains about 88 percent of the variation across stocks in the number of prints. The remaining 12 percent can most likely be attributed to other microstructure effects, such as order shredding, intermediation activity, and market frictions like lot size and tick size. For example, subtle effects of effective price volatility explain an additional 4.5 and 5.5 percentage points of the variation during the 1993–2000 and 2001–14 periods, respectively. An interesting topic for future research would be to analyze these effects at a deeper level by designing more refined econometric tests.

REFERENCES

Alexander, Gordon, and Mark Peterson. 2007. “An Analysis of Trade-Size Clustering and Its Relation to Stealth Trading.” Journal of Financial Economics, 84: 435–471.
Almgren, Robert, and Neil Chriss. 2000. “Optimal Execution of Portfolio Transactions.” Journal of Risk, 3: 5–39.
Amihud, Yakov. 2002. “Illiquidity and Stock Returns: Cross-Section and Time-Series Effects.” Journal of Financial Markets, 5(1): 31–56.
Ané, Thierry, and Helyette Geman. 2000. “Order Flow, Transaction Clock, and Normality of Asset Returns.” Journal of Finance, 55(5): 2259–2284.
Angel, James J. 1997. “Tick Size, Share Prices, and Stock Splits.” Journal of Finance, 52(2): 655–681.
Angel, James J., Lawrence E. Harris, and Chester S. Spatt. 2011. “Equity Trading in the 21st Century.” Quarterly Journal of Finance, 1(1): 1–53.
Angel, James J., Lawrence E. Harris, and Chester S. Spatt. 2015. “Equity Trading in the 21st Century: An Update.” Quarterly Journal of Finance, 5(1): 1550002.
Atkins, Allen B., and Edward A. Dyl. 1997. “Transactions Costs and Holding Periods for Common Stocks.” Journal of Finance, 52(1): 309–325.
Brennan, Michael, and Avanidhar Subrahmanyam. 1998. “The Determinants of Average Trade Size.” Journal of Business, 71(1): 1–25.
Caglio, Cecilia, and Stewart Mayhew. 2012. “Equity Trading and the Allocation of Market Data Revenue.” FEDS Working Paper No. 2012-65.
Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam. 2011. “Recent Trends in Trading Activity and Market Quality.” Journal of Financial Economics, 101: 243–263.
Christie, William G., Jeffrey H. Harris, and Paul H. Schultz. 1994. “Why Did NASDAQ Market Makers Stop Avoiding Odd-Eighth Quotes?” Journal of Finance, 49(5): 1841–1860.
Glosten, Lawrence, and Lawrence Harris. 1988. “Estimating the Components of the Bid-Ask Spread.” Journal of Financial Economics, 21: 123–142.

Goldstein, Michael A., and Kenneth A. Kavajecz. 2000. “Eighths, Sixteenths, and Market Depth: Changes in Tick Size and Liquidity Provision on the NYSE.” Journal of Financial Economics, 56(1): 125–149.
Harris, Lawrence. 1987. “Transaction Data Tests of the Mixture of Distributions Hypothesis.” Journal of Financial and Quantitative Analysis, 22(2): 127–141.
Harris, Lawrence. 1994. “Minimum Price Variations, Discrete Bid-Ask Spreads, and Quotation Sizes.” Review of Financial Studies, 7(1): 149–178.
Hasbrouck, Joel. 1999. “Trading Fast and Slow: Security Market Events in Real Time.” Working Paper, New York University.
Hasbrouck, Joel, George Sofianos, and Deborah Sosebee. 1993. “New York Stock Exchange Systems and Trading Procedures.” NYSE Working Paper 93-01.
Hendershott, Terrence, Charles M. Jones, and Albert J. Menkveld. 2011. “Does Algorithmic Trading Improve Liquidity?” Journal of Finance, 66(1): 1–33.
Jones, Charles M., Gautam Kaul, and Marc L. Lipson. 1994. “Information, Trading, and Volatility.” Journal of Financial Economics, 36(1): 127–154.
Keim, Donald B., and Ananth Madhavan. 1995. “Anatomy of the Trading Process: Empirical Evidence on the Behavior of Institutional Trades.” Journal of Financial Economics, 37(3): 371–398.
Kirilenko, Andrei, Albert S. Kyle, Mehrdad Samadi, and Tugkan Tuzun. 2015. “The Flash Crash: The Impact of High Frequency Trading on an Electronic Market.” Working Paper, University of Maryland.
Kyle, Albert S., and Anna A. Obizhaeva. 2016. “Market Microstructure Invariance: Empirical Hypotheses.” Econometrica, accepted for publication, available at http://dx.doi.org/10.2139/ssrn.2722524.
Moulton, Pamela. 2005. “You Can’t Always Get What You Want: Trade-Size Clustering and Quantity Choice in Liquidity.” Journal of Financial Economics, 78(5): 89–119.
Obizhaeva, Anna A., and Jiang Wang. 2013. “Optimal Trading Strategy and Supply/Demand Dynamics.” Journal of Financial Markets, 16(1): 1–32.
O’Hara, Maureen, Chen Yao, and Mao Ye. 2012. “What Is Not There: The Odd-Lot Bias in TAQ Data.” Working Paper.
Schultz, Paul. 2000. “Stock Splits, Tick Size, and Sponsorship.” Journal of Finance, 55(1): 429–450.
Smith, Jeffrey W., James P. Selway, and Timothy D. McCormick. 1998. “The Nasdaq Stock Market: Historical Background and Current Operation.” NASD Working Paper 98-01.
Tauchen, George E., and Mark Pitts. 1983. “The Price Variability-Volume Relationship on Speculative Markets.” Econometrica, 51(2): 485–505.

Table 1. Descriptive Statistics

Panel A: 1993–2000

| Volume group | All | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Avg. Print Size ($) | 23,629 | 11,441 | 27,378 | 36,432 | 43,558 | 49,274 | 53,421 | 60,364 | 67,904 | 78,139 | 89,338 |
| Med. (VW) Print Size ($) | 111,195 | 48,999 | 158,303 | 190,228 | 218,529 | 218,038 | 250,510 | 255,284 | 291,867 | 317,847 | 373,248 |
| Med. (TW) Print Size ($) | 9,363 | 5,682 | 10,842 | 13,407 | 15,402 | 16,910 | 18,024 | 20,049 | 21,909 | 24,599 | 28,567 |
| Avg. # of Prints, γ | 142 | 17 | 73 | 126 | 185 | 257 | 333 | 409 | 549 | 852 | 2,830 |
| Avg. Daily Volume ($1,000) | 6,186 | 151 | 1,197 | 2,808 | 5,051 | 7,999 | 11,263 | 15,847 | 23,901 | 42,408 | 176,985 |
| Avg. Daily Volatility | 0.040 | 0.045 | 0.031 | 0.032 | 0.030 | 0.030 | 0.030 | 0.029 | 0.029 | 0.029 | 0.029 |
| Avg. Price ($) | 17.58 | 10.25 | 20.38 | 24.50 | 27.84 | 31.57 | 34.73 | 38.34 | 43.03 | 49.98 | 64.43 |
| 100 Shares: %Prints/%Vol | 16/2 | 15/2 | 17/2 | 18/2 | 19/2 | 20/2 | 21/2 | 21/2 | 21/2 | 22/2 | 25/3 |
| 1,000 Shares: %Prints/%Vol | 18/14 | 18/15 | 18/13 | 17/12 | 16/12 | 15/11 | 15/11 | 14/10 | 14/10 | 13/10 | 13/10 |
| Even Lots: %Prints/%Vol | 80/61 | 80/63 | 80/60 | 80/59 | 80/58 | 80/57 | 80/57 | 79/56 | 79/55 | 79/56 | 81/58 |
| # Obs | 634,322 | 391,611 | 93,732 | 37,272 | 33,145 | 14,896 | 14,155 | 13,039 | 12,276 | 11,831 | 12,365 |

Panel B: 2001–14

| Volume group | All | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Avg. Print Size ($) | 6,424 | 3,645 | 6,715 | 8,776 | 10,379 | 11,913 | 13,308 | 14,863 | 17,003 | 19,852 | 26,335 |
| Med. (VW) Print Size ($) | 29,900 | 27,916 | 23,464 | 26,460 | 29,278 | 27,822 | 41,432 | 36,663 | 42,728 | 53,538 | 83,227 |
| Med. (TW) Print Size ($) | 2,895 | 1,642 | 3,244 | 4,163 | 4,793 | 5,419 | 6,005 | 6,647 | 7,469 | 8,378 | 10,533 |
| Avg. # of Prints, γ | 3,013 | 399 | 2,178 | 3,584 | 5,069 | 6,733 | 8,323 | 10,062 | 12,846 | 17,651 | 37,495 |
| Avg. Daily Volume ($1,000) | 24,408 | 937 | 8,295 | 16,961 | 27,429 | 40,200 | 54,090 | 72,872 | 102,484 | 162,619 | 503,229 |
| Avg. Daily Volatility | 0.031 | 0.035 | 0.026 | 0.025 | 0.024 | 0.024 | 0.023 | 0.022 | 0.022 | 0.022 | 0.022 |
| Avg. Price ($) | 21.16 | 12.56 | 26.77 | 32.00 | 34.92 | 37.81 | 40.92 | 44.46 | 48.24 | 51.77 | 63.33 |
| 100 Shares: %Prints/%Vol | 57/25 | 56/23 | 64/33 | 61/31 | 60/29 | 58/28 | 57/28 | 56/28 | 55/27 | 53/25 | 49/20 |
| 1,000 Shares: %Prints/%Vol | 3/6 | 4/6 | 2/4 | 2/4 | 2/4 | 2/4 | 2/4 | 2/4 | 2/4 | 3/4 | 3/5 |
| Even Lots: %Prints/%Vol | 86/63 | 85/60 | 90/68 | 89/68 | 89/67 | 88/66 | 88/65 | 87/65 | 87/64 | 86/62 | 84/59 |
| # Obs | 749,535 | 477,127 | 97,347 | 39,485 | 36,846 | 17,455 | 16,780 | 16,222 | 15,858 | 15,815 | 16,600 |

This table reports descriptive statistics for securities and prints. Each observation represents averages for one security over one month. Panel A reports statistics for data from February 1993 to December 2000. Panel B reports statistics for data from January 2001 to December 2014. Both panels show the average print size, the volume-weighted (VW) median print size, and the trade-weighted (TW) median print size (in dollars), the average number of prints per day, the daily dollar volume (in thousands of dollars), the average volatility of daily returns, the average price, and the percentage of trades and the percentage of volume in the 100-share lot, in the 1,000-share lot, and in even lots, for the full sample (column "All") as well as for 10 volume groups. Volume groups are based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for NYSE-listed common stocks. Volume group 1 has stocks with the lowest volume, and volume group 10 has stocks with the highest volume.
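The volume-group assignment described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the nine breakpoints are linearly interpolated percentiles (NumPy's default) of the NYSE-listed stocks' average daily dollar volume, and that a stock sitting exactly on a breakpoint falls into the higher group. The function `volume_groups` and its inputs are hypothetical.

```python
import numpy as np

# Percentile breakpoints listed in the Table 1 caption.
BREAKPOINT_PCTS = [30, 50, 60, 70, 75, 80, 85, 90, 95]


def volume_groups(dollar_volume, nyse_dollar_volume):
    """Assign volume groups 1 (lowest) to 10 (highest).

    dollar_volume: average daily dollar volume of the stocks to classify.
    nyse_dollar_volume: average daily dollar volume of NYSE-listed common
        stocks, used only to set the nine breakpoints (hypothetical input).
    """
    cutoffs = np.percentile(np.asarray(nyse_dollar_volume, dtype=float), BREAKPOINT_PCTS)
    # side="right": a stock exactly at a breakpoint goes to the higher group
    # (an assumption; the caption does not say how ties are handled).
    return np.searchsorted(cutoffs, np.asarray(dollar_volume, dtype=float), side="right") + 1


# Toy example: breakpoints from 100 hypothetical NYSE stocks with volumes 1..100.
groups = volume_groups([1.0, 31.0, 90.0, 100.0], np.arange(1, 101))
print(groups.tolist())  # [1, 2, 8, 10]
```

Because the breakpoints come only from NYSE-listed stocks, the same cutoffs can be applied to any other stock universe.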

Table 2. OLS Estimates of Number of TAQ Prints

| | All Stocks 1993–2000 | All Stocks 2001–14 | NYSE/AMEX 1993–2000 | NYSE/AMEX 2001–14 | Nasdaq 1993–2000 | Nasdaq 2001–14 |
|---|---|---|---|---|---|---|
| \(\mu_{n,0}\) | 6.147 | 8.513 | 6.109 | 8.396 | 6.143 | 8.646 |
| | (0.017) | (0.041) | (0.012) | (0.040) | (0.020) | (0.048) |
| \(\mu_{n,t}\) | 0.093 | 0.148 | 0.042 | 0.178 | 0.141 | 0.115 |
| | (0.007) | (0.013) | (0.005) | (0.012) | (0.008) | (0.015) |
| \(a_{n,0}\) | 0.666 | 0.790 | 0.626 | 0.760 | 0.679 | 0.816 |
| | (0.002) | (0.005) | (0.001) | (0.005) | (0.002) | (0.006) |
| \(a_{n,t}\) | -0.001 | 0.007 | 0.002 | 0.006 | 0.003 | 0.003 |
| | (0.001) | (0.001) | (0.000) | (0.001) | (0.001) | (0.002) |
| Adj-R² | 0.87 | 0.92 | 0.91 | 0.93 | 0.86 | 0.91 |
| # of Obs | 6,621 | 4,452 | 2,189 | 1,759 | 4,432 | 2,694 |

This table presents Fama-MacBeth estimates \(\mu_n\) and \(a_n\) from monthly regressions

\[ \ln[\bar N_i] = \mu_n + a_n \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde\epsilon_i. \]

For each month, there is one observation for each stock \(i\). The value of \(\bar N_i\) is the average number of prints per day. Trading activity \(W_i\) is the product of average daily dollar volume \(V_i \cdot P_i\) and the percentage standard deviation \(\sigma_i\) of daily returns in a given month. The scaling constant \(W_* = (40)(10^6)(0.02)\) corresponds to the measure of trading activity for a benchmark stock with a price of $40 per share, trading volume of 1 million shares per day, and daily volatility of 0.02. Newey-West standard errors (in parentheses) are calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated coefficients \(\hat\mu_{n,T}\) and \(\hat a_{n,T}\) for each month: \(\hat\mu_{n,T} = \mu_{n,0} + \mu_{n,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\) and \(\hat a_{n,T} = a_{n,0} + a_{n,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\), where \(T\) is the number of months from the beginning of the sample and \(\bar T\) is the mean month. "Adj-R²" denotes the adjusted R² averaged over monthly regressions, and "# of Obs" denotes the number of stocks averaged over monthly regressions. The estimates are reported for the 1993–2000 and 2001–14 subperiods.
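A sketch of the two-step procedure described in the caption, written with NumPy only. It assumes the monthly regressions are plain OLS and that the Newey-West estimator uses Bartlett weights without a small-sample correction; the function names and the synthetic panel are hypothetical. Because the trend regressor \((T - \bar T)/12\) is centered, the fitted constant equals the time-series average of the monthly coefficients, which is the Fama-MacBeth point estimate.

```python
import numpy as np

W_STAR = 40 * 1e6 * 0.02  # benchmark: $40 price, 1 million shares/day, 2% daily volatility


def monthly_coefficients(panel):
    """One cross-sectional OLS per month of ln(N_bar) on ln(W / W_STAR).

    panel: list of (n_bar, w) array pairs, one pair per month, one entry per stock.
    Returns an array of shape (months, 2) holding (mu_n, a_n) for each month.
    """
    coefs = []
    for n_bar, w in panel:
        X = np.column_stack([np.ones_like(w), np.log(w / W_STAR)])
        beta, *_ = np.linalg.lstsq(X, np.log(n_bar), rcond=None)
        coefs.append(beta)
    return np.array(coefs)


def trend_with_newey_west(series, lags=3):
    """Fit series_T = c0 + c1 * (T - T_bar) / 12 + e_T and return (c0, c1) together
    with Newey-West (Bartlett kernel) standard errors, no small-sample correction."""
    series = np.asarray(series, dtype=float)
    T = np.arange(len(series), dtype=float)
    X = np.column_stack([np.ones_like(T), (T - T.mean()) / 12.0])
    beta, *_ = np.linalg.lstsq(X, series, rcond=None)
    scores = X * (series - X @ beta)[:, None]  # rows are x_T * e_T
    meat = scores.T @ scores
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag]
        meat += weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Synthetic check (hypothetical data): 24 months, 200 stocks each; the monthly
# slope wobbles around 0.666 and there is no cross-sectional noise.
rng = np.random.default_rng(0)
true_slopes = 0.666 + 0.01 * rng.normal(size=24)
panel = []
for a in true_slopes:
    w = W_STAR * np.exp(rng.normal(0.0, 2.0, size=200))
    panel.append((np.exp(6.1 + a * np.log(w / W_STAR)), w))
coefs = monthly_coefficients(panel)
(a_n0, a_nt), se = trend_with_newey_west(coefs[:, 1])
```

The same two functions apply unchanged to the intercepts \(\hat\mu_{n,T}\) by passing `coefs[:, 0]` to `trend_with_newey_west`.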

Table 3. Regression Estimates of TAQ Print Sizes, February 1993 to December 2000

TW = trade-weighted distribution; VW = volume-weighted distribution.

| | TW Mean | TW 20th | TW 50th | TW 80th | VW Mean | VW 20th | VW 50th | VW 80th |
|---|---|---|---|---|---|---|---|---|
| \(\mu_{x,0}\) | -7.238 | -8.495 | -7.289 | -6.260 | -4.684 | -6.364 | -4.887 | -3.326 |
| | (0.020) | (0.027) | (0.031) | (0.014) | (0.020) | (0.016) | (0.019) | (0.026) |
| \(\mu_{x,t}\) | -0.047 | -0.039 | -0.046 | -0.064 | -0.137 | -0.101 | -0.157 | -0.153 |
| | (0.008) | (0.011) | (0.012) | (0.006) | (0.008) | (0.006) | (0.008) | (0.011) |
| \(a_{x,0}\) | -0.741 | -0.781 | -0.750 | -0.725 | -0.560 | -0.661 | -0.579 | -0.481 |
| | (0.002) | (0.003) | (0.002) | (0.003) | (0.003) | (0.002) | (0.003) | (0.003) |
| \(a_{x,t}\) | 0.008 | 0.012 | 0.007 | 0.005 | -0.007 | -0.001 | -0.009 | -0.009 |
| | (0.001) | (0.001) | (0.001) | (0.001) | (0.001) | (0.001) | (0.001) | (0.001) |
| Adj-R² | 0.90 | 0.87 | 0.88 | 0.88 | 0.69 | 0.83 | 0.68 | 0.52 |
| # of Obs | 6,621 | 6,621 | 6,621 | 6,621 | 6,621 | 6,621 | 6,621 | 6,621 |

This table presents Fama-MacBeth estimates \(\mu_x\) and \(a_x\) from the monthly regressions of the mean and percentiles of print size on trading activity \(W\) for the sample from February 1993 to December 2000. The coefficients \(\mu_x\) and \(a_x\) are based on monthly regressions

\[ \ln\left[\frac{|X_i|}{V_i}\right] = \mu_x + a_x \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde\epsilon_i, \]

where the left-hand side is either the mean or the p-th (20th, 50th, and 80th) percentile of the distribution of logarithms of (unsigned) print sizes \(|X_i|\), expressed as a fraction of daily volume \(V_i\) in a given month. The means and percentiles are calculated based on both trade-weighted and volume-weighted distributions. For each month, there is one observation for each stock \(i\), with trading activity \(W_i\) defined as the product of the average daily dollar volume \(V_i \cdot P_i\) and the percentage standard deviation \(\sigma_i\) of daily returns. The scaling constant \(W_* = (40)(10^6)(0.02)\) corresponds to the trading activity of the benchmark stock with a price of $40 per share, trading volume of 1 million shares per day, and volatility of 2 percent per day. Newey-West standard errors (in parentheses) are calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated coefficients \(\hat\mu_{x,T}\) and \(\hat a_{x,T}\) for each month: \(\hat\mu_{x,T} = \mu_{x,0} + \mu_{x,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\) and \(\hat a_{x,T} = a_{x,0} + a_{x,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\), where \(T\) is the number of months from the beginning of the sample and \(\bar T\) is the mean month. "Adj-R²" denotes the adjusted R² averaged over monthly regressions, and "# of Obs" denotes the number of stocks averaged over monthly regressions.
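Trade-weighted statistics give each print equal weight, while volume-weighted statistics weight each print by its size in shares. A minimal sketch for one stock-month, assuming a percentile is the smallest value at which the cumulative weight share reaches p percent (one of several interpolation conventions; the paper's exact rule is not stated in this table). The helper names are hypothetical.

```python
import numpy as np


def weighted_percentile(values, weights, p):
    """Smallest value whose cumulative weight share reaches p percent."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    v = values[order]
    cum = np.cumsum(np.asarray(weights, dtype=float)[order])
    cum /= cum[-1]
    idx = np.searchsorted(cum, p / 100.0, side="left")
    return v[min(idx, len(v) - 1)]


def print_size_moments(sizes, daily_volume, p=50):
    """Trade- and volume-weighted mean and p-th percentile of ln(|X| / V) for one stock-month."""
    size = np.abs(np.asarray(sizes, dtype=float))
    x = np.log(size / daily_volume)
    return {
        "tw_mean": x.mean(),
        "vw_mean": np.average(x, weights=size),
        "tw_pct": weighted_percentile(x, np.ones_like(x), p),
        "vw_pct": weighted_percentile(x, size, p),
    }


# Hypothetical stock-month: three 100-share prints and one 10,000-share print,
# with average daily volume of 1 million shares.
m = print_size_moments([100, 100, 100, 10_000], daily_volume=1e6)
```

In this toy case the trade-weighted median is \(\ln(10^{-4})\) because three of four prints are small, while the volume-weighted median is \(\ln(10^{-2})\) because the single large print carries most of the volume. The same gap between the two weightings shows up in the much higher volume-weighted intercepts in the table.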

Table 4. Regression Estimates of TAQ Print Sizes, January 2001 to December 2014

TW = trade-weighted distribution; VW = volume-weighted distribution.

| | TW Mean | TW 20th | TW 50th | TW 80th | VW Mean | VW 20th | VW 50th | VW 80th |
|---|---|---|---|---|---|---|---|---|
| \(\mu_{x,0}\) | -9.029 | -9.519 | -9.268 | -8.633 | -7.420 | -9.004 | -8.123 | -6.304 |
| | (0.030) | (0.024) | (0.038) | (0.039) | (0.058) | (0.047) | (0.059) | (0.059) |
| \(\mu_{x,t}\) | -0.097 | -0.013 | -0.084 | -0.173 | -0.192 | -0.091 | -0.251 | -0.251 |
| | (0.009) | (0.007) | (0.012) | (0.012) | (0.018) | (0.015) | (0.018) | (0.018) |
| \(a_{x,0}\) | -0.779 | -0.757 | -0.769 | -0.811 | -0.776 | -0.796 | -0.843 | -0.779 |
| | (0.003) | (0.003) | (0.003) | (0.004) | (0.007) | (0.006) | (0.009) | (0.010) |
| \(a_{x,t}\) | 0.000 | 0.005 | 0.003 | -0.006 | -0.018 | -0.005 | -0.021 | -0.026 |
| | (0.001) | (0.001) | (0.001) | (0.001) | (0.002) | (0.002) | (0.002) | (0.003) |
| Adj-R² | 0.907 | 0.866 | 0.884 | 0.913 | 0.892 | 0.892 | 0.862 | 0.760 |
| # of Obs | 4,452 | 4,452 | 4,452 | 4,452 | 4,452 | 4,452 | 4,452 | 4,452 |

This table presents Fama-MacBeth estimates \(\mu_x\) and \(a_x\) from the monthly regressions of the mean and percentiles of print size on trading activity \(W\) for the sample from January 2001 to December 2014. The coefficients \(\mu_x\) and \(a_x\) are based on monthly regressions

\[ \ln\left[\frac{|X_i|}{V_i}\right] = \mu_x + a_x \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde\epsilon_i, \]

where the left-hand side is either the mean or the p-th (20th, 50th, and 80th) percentile of the distribution of logarithms of (unsigned) print sizes \(|X_i|\), expressed as a fraction of daily volume \(V_i\) in a given month. The means and percentiles are calculated based on both trade- and volume-weighted distributions. For each month, there is one observation for each stock \(i\), with trading activity \(W_i\) defined as the product of the average daily dollar volume \(V_i \cdot P_i\) and the percentage standard deviation \(\sigma_i\) of daily returns. The scaling constant \(W_* = (40)(10^6)(0.02)\) corresponds to the trading activity of the benchmark stock with a price of $40 per share, trading volume of 1 million shares per day, and volatility of 2 percent per day. Newey-West standard errors (in parentheses) are calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated coefficients \(\hat\mu_{x,T}\) and \(\hat a_{x,T}\) for each month: \(\hat\mu_{x,T} = \mu_{x,0} + \mu_{x,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\) and \(\hat a_{x,T} = a_{x,0} + a_{x,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\), where \(T\) is the number of months from the beginning of the sample and \(\bar T\) is the mean month. "Adj-R²" denotes the adjusted R² averaged over monthly regressions, and "# of Obs" denotes the number of stocks averaged over monthly regressions.

Table 5. OLS Estimates of Number of TAQ Prints with Effective Volatility

| | All Stocks 1993–2000 | All Stocks 2001–14 | NYSE/AMEX 1993–2000 | NYSE/AMEX 2001–14 | Nasdaq 1993–2000 | Nasdaq 2001–14 |
|---|---|---|---|---|---|---|
| \(\mu_{n,0}\) | 6.270 | 7.952 | 6.316 | 7.998 | 6.247 | 7.942 |
| | (0.016) | (0.023) | (0.009) | (0.025) | (0.020) | (0.027) |
| \(\mu_{n,t}\) | 0.087 | 0.116 | 0.024 | 0.151 | 0.125 | 0.093 |
| | (0.007) | (0.007) | (0.004) | (0.007) | (0.009) | (0.008) |
| \(a_{\sigma,0}\) | -0.471 | -0.608 | -0.338 | -0.475 | -0.497 | -0.676 |
| | (0.003) | (0.009) | (0.005) | (0.008) | (0.003) | (0.010) |
| \(a_{\sigma,t}\) | -0.007 | -0.020 | -0.007 | -0.027 | -0.023 | -0.013 |
| | (0.001) | (0.003) | (0.002) | (0.002) | (0.001) | (0.003) |
| Adj-R² | 0.918 | 0.955 | 0.924 | 0.955 | 0.917 | 0.959 |
| # of Obs | 6,621 | 4,452 | 2,189 | 1,759 | 4,432 | 2,694 |
| Adj-R², regression with coefficient on effective price volatility \(a_\sigma = 0\) | 0.873 | 0.899 | 0.908 | 0.917 | 0.857 | 0.881 |
| Adj-R², regression with separate coefficients for price, volume, and volatility | 0.928 | 0.970 | 0.940 | 0.974 | 0.923 | 0.975 |

This table presents Fama-MacBeth estimates \(\mu_n\) and \(a_\sigma\) from monthly regressions

\[ \ln[\bar N_i] = \mu_n + \frac{2}{3} \cdot \ln\left[\frac{W_i}{W_*}\right] + a_\sigma \cdot \ln\left[\frac{P_i \cdot \sigma_i}{P_* \cdot \sigma_*} \cdot \left(\frac{W_i}{W_*}\right)^{-1/3}\right] + \tilde\epsilon_i. \]

For each month, there is one observation for each stock \(i\), with trading activity \(W_i\) defined as the product of the average daily dollar volume \(V_i \cdot P_i\) and the percentage standard deviation \(\sigma_i\) of daily returns. Effective price volatility is defined as \(P_i \cdot \sigma_i \cdot (W_i/W_*)^{-1/3}\), with the effective price volatility of the benchmark stock \(P_* \cdot \sigma_*\) equal to \(40 \cdot 0.02\). The value of \(\bar N_i\) is the average number of prints per day. The scaling constant \(W_* = (40)(10^6)(0.02)\) corresponds to the measure of trading activity for the benchmark stock with a price of $40 per share, trading volume of 1 million shares per day, and daily volatility of 2 percent. Newey-West standard errors (in parentheses) are calculated with three lags relative to a linear time trend estimated by OLS regressions from the estimated coefficients \(\hat\mu_{n,T}\) and \(\hat a_{\sigma,T}\) for each month: \(\hat\mu_{n,T} = \mu_{n,0} + \mu_{n,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\) and \(\hat a_{\sigma,T} = a_{\sigma,0} + a_{\sigma,t} \cdot (T - \bar T)/12 + \tilde\epsilon_T\), where \(T\) is the number of months from the beginning of the sample and \(\bar T\) is the mean month. "Adj-R²" denotes the adjusted R² averaged over monthly regressions. The table also reports the average R² from the restricted regressions with \(a_\sigma = 0\) as well as the average R² from the unconstrained regressions

\[ \ln[\bar N_i] = \mu_n + \frac{2}{3} \ln\left[\frac{W_i}{W_*}\right] + b_1 \cdot \ln\left[\frac{V_i}{10^6}\right] + b_2 \cdot \ln\left[\frac{P_i}{40}\right] + b_3 \cdot \ln\left[\frac{\sigma_i}{0.02}\right] + \tilde\epsilon_i. \tag{19} \]

"# of Obs" is the number of stocks averaged over monthly regressions. The estimates are reported for the 1993–2000 and 2001–14 subperiods.
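The restricted regression fixes the slope on \(\ln[W_i/W_*]\) at 2/3, so it can be run by moving that term to the left-hand side and regressing the remainder on the log of scaled effective price volatility. A one-month sketch, assuming plain OLS; the function and variable names are hypothetical, and the Fama-MacBeth averaging and trend step would then be applied to the monthly estimates.

```python
import numpy as np

P_STAR, SIGMA_STAR = 40.0, 0.02
W_STAR = P_STAR * 1e6 * SIGMA_STAR  # benchmark trading activity


def effective_price_volatility(price, sigma, w):
    """Invariance-implied effective price volatility P * sigma * (W / W_STAR) ** (-1/3)."""
    return price * sigma * (w / W_STAR) ** (-1.0 / 3.0)


def table5_month(n_bar, price, volume_shares, sigma):
    """One monthly cross-section: slope on ln(W / W_STAR) fixed at 2/3; returns (mu_n, a_sigma)."""
    w = price * volume_shares * sigma
    # Move the restricted 2/3 term to the left-hand side.
    y = np.log(n_bar) - (2.0 / 3.0) * np.log(w / W_STAR)
    z = np.log(effective_price_volatility(price, sigma, w) / (P_STAR * SIGMA_STAR))
    X = np.column_stack([np.ones_like(z), z])
    (mu_n, a_sigma), *_ = np.linalg.lstsq(X, y, rcond=None)
    return mu_n, a_sigma


# Hypothetical cross-section of 500 stocks generated with mu_n = 6.27 and a_sigma = -0.47.
rng = np.random.default_rng(1)
price = np.exp(rng.normal(np.log(30.0), 0.8, 500))
volume = np.exp(rng.normal(np.log(5e5), 1.5, 500))
sigma = np.exp(rng.normal(np.log(0.025), 0.4, 500))
w = price * volume * sigma
z = np.log(effective_price_volatility(price, sigma, w) / (P_STAR * SIGMA_STAR))
n_bar = np.exp(6.27 + (2.0 / 3.0) * np.log(w / W_STAR) - 0.47 * z)
mu_n, a_sigma = table5_month(n_bar, price, volume, sigma)
```

For the benchmark stock (price 40, 1 million shares, volatility 0.02), \(W_i = W_*\) and the effective price volatility equals \(40 \cdot 0.02 = 0.8\), so its scaled log effective volatility is zero.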

Figure 1. Time Series of Percentiles of Scaled TAQ Print Size and Mean Number of Prints, 1993–2014.

[Figure: two columns of panels, Volume Group 1 (Low) and Volume Group 9/10 (High); rows show trade-weighted percentiles, volume-weighted percentiles, and number of trades, plotted by year from 1995 to 2015. Legend: 20th, 50th, and 80th percentiles.]

The figure shows the dynamics of the 20th, 50th, and 80th percentiles for logarithms of the pooled scaled print sizes as well as the means of the scaled number of prints per month from 1993 to 2014. Volume groups are based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for common NYSE-listed stocks. Trade-weighted percentiles and volume-weighted percentiles are shown for stocks in volume group 1 (low volume) and volume groups 9 and 10 (high volume). For each print, the logarithm of scaled print size is calculated based on the midpoint of the print size bin, scaled according to the model of trading game invariance, that is, \(\ln(W_i^{2/3} \cdot |X_i|/V_i)\), where \(|X_i|\) is the midpoint of a print size bin in shares, \(V_i\) is the average daily volume in shares, and \(W_i\) is the measure of trading activity equal to the product of dollar volume and returns standard deviation. The scaled number of prints per month is calculated as \(\bar N_{m,i} \cdot W_i^{-2/3}\), where \(\bar N_{m,i}\) is the number of trades per month. The stock-level distributions of scaled print sizes are averaged across stocks for volume groups 1 and 9–10 in a given month. The trade- and volume-weighted percentiles are plotted on this figure.
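The two scalings used in the figure can be written as small helper functions. This sketch follows the caption in applying \(W_i^{2/3}\) to raw trading activity; normalizing by \(W_*\) instead would shift every stock by the same constant without changing the shape of the plots. The function names and the two toy stocks are hypothetical.

```python
import numpy as np


def scaled_log_print_size(size_shares, daily_volume_shares, w, alpha=2.0 / 3.0):
    """ln(W ** alpha * |X| / V); alpha = 2/3 is the trading-game-invariance scaling."""
    return alpha * np.log(w) + np.log(np.abs(size_shares) / daily_volume_shares)


def scaled_prints_per_month(n_month, w, alpha=2.0 / 3.0):
    """Scaled number of prints N_month * W ** (-alpha)."""
    return n_month * np.power(w, -alpha)


# Two hypothetical stocks whose print sizes follow |X| / V = 0.5 * W ** (-2/3)
# and whose monthly print counts follow N = 3 * W ** (2/3). After scaling, both
# collapse to the same values, which is what invariance predicts.
w = np.array([1e6, 1e9])
volume = np.array([2e4, 5e6])
size = 0.5 * volume * w ** (-2.0 / 3.0)
sizes_scaled = scaled_log_print_size(size, volume, w)
n_scaled = scaled_prints_per_month(3.0 * w ** (2.0 / 3.0), w)
```

If the invariance relationships held exactly, the plotted percentiles and scaled print counts would stay flat across volume groups; drifts over time, as in the 2001–2014 period, show up as shifts in these scaled series.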

Figure 2. The Scaled Number of TAQ Prints Relative to Trading Activity for Three Models.

[Figure: scatter plots of ln N* (vertical axis) against ln W (horizontal axis, 0 to 20); rows NYSE 1993, NASDAQ 1993, 2001, and 2014; columns Trading Game Invariance, Invariant Bet Frequency, and Invariant Bet Size.]

The figure shows the logarithm of the scaled number of prints across different levels of the logarithm of trading activity \(W_i\). The scaled number of prints is defined by \(\bar N_i / W_i^\alpha\), with \(\alpha = 2/3\) for the model of trading game invariance, \(\alpha = 0\) for the model of invariant bet frequency, and \(\alpha = 1\) for the model of invariant bet size. Four subsamples are considered: NYSE-listed stocks in April 1993, Nasdaq-listed stocks in April 1993, both NYSE and Nasdaq stocks in April 2001, and both NYSE and Nasdaq stocks in April 2014. Trading activity \(W_i\) is calculated as the product of average daily dollar volume \(P_i \cdot V_i\) and the percentage standard deviation of daily returns \(\sigma_i\) for a given month.
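Under the assumption that the number of prints is exactly proportional to \(W_i^{2/3}\), the slope of \(\ln[\bar N_i / W_i^\alpha]\) on \(\ln W_i\) is \(2/3 - \alpha\): zero for trading game invariance, 2/3 for invariant bet frequency, and −1/3 for invariant bet size. The following sketch checks this on hypothetical data with `numpy.polyfit`; the names are illustrative only.

```python
import numpy as np

MODELS = {
    "trading game invariance": 2.0 / 3.0,
    "invariant bet frequency": 0.0,
    "invariant bet size": 1.0,
}


def scaled_slope(n_bar, w, alpha):
    """OLS slope of ln(N_bar / W ** alpha) on ln W."""
    y = np.log(n_bar) - alpha * np.log(w)
    slope, _intercept = np.polyfit(np.log(w), y, 1)
    return slope


# Hypothetical stocks with ln W spread over 0..20 and N_bar proportional to W ** (2/3).
w = np.exp(np.linspace(0.0, 20.0, 41))
n_bar = 0.01 * w ** (2.0 / 3.0)
slopes = {name: scaled_slope(n_bar, w, a) for name, a in MODELS.items()}
```

A flat cloud in the first column and tilted clouds in the other two is therefore the visual signature of the 2/3 exponent.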

Figure 3. Time Series of Monthly OLS Coefficient Estimates for Number of Trades, Trade-Weighted Percentiles, and Volume-Weighted Percentiles, 1993–2014.

[Figure: Panel A, Coefficients for Number of Trades (vertical axis 0.65 to 0.85, reference line at 2/3); Panel B, Coefficients for Trade-Weighted Percentiles (reference line at −2/3); Panel C, Coefficients for Volume-Weighted Percentiles (vertical axis −1.0 to −0.4, reference line at −2/3). Horizontal axes run from 1995 to 2015. Legend: 20th, 50th, and 80th percentiles.]

The figure shows the dynamics of coefficients from regressions of the number of prints and various percentiles on the measure of trading activity \(W_i\) from 1993 to 2014. Panel A shows the coefficient \(a_n\) from monthly regressions

\[ \ln[\bar N_i] = \mu_n + a_n \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde\epsilon_i, \]

where \(\bar N_i\) is the average number of prints per day in a given month. The model of trading game invariance predicts \(a_n = 2/3\), and alternative models predict that \(a_n = 0\) or \(a_n = 1\). Panel B shows the coefficient \(a_x\) from monthly regressions

\[ \ln\left[\frac{\tilde X_i}{V_i}\right] = \mu_x + a_x \cdot \ln\left[\frac{W_i}{W_*}\right] + \tilde\epsilon_i, \]

where the left-hand side is the p-th (20th, 50th, and 80th) percentile of the distribution of logarithms of print sizes \(\tilde X_i\), expressed as a fraction of daily volume \(V_i\). The model of trading game invariance predicts \(a_x = -2/3\), and alternative models predict that \(a_x = 0\) or \(a_x = -1\). Panel C shows the coefficient \(a_x\) from similar monthly regressions, but these regressions are based on percentiles \(Q_i^p\) calculated based on the contribution to total trading volume. The model of trading game invariance predicts \(a_x = -2/3\), and alternative models predict that \(a_x = 0\) or \(a_x = -1\). Trading activity \(W_i\) is defined as the product of dollar volume and the daily percentage standard deviation of returns, and \(W_*\) measures the trading activity of the benchmark stock.
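The predicted slopes for the number of prints and for print sizes mirror each other. A short accounting argument (a sketch added here, not taken from the paper) shows why for the mean print size in levels, since the average print size is average daily volume divided by the average number of prints per day:

```latex
\[
\overline{|X_i|} = \frac{V_i}{\bar N_i}
\quad\Longrightarrow\quad
\ln\left[\frac{\overline{|X_i|}}{V_i}\right] = -\ln[\bar N_i]
= -\mu_n - a_n \ln\left[\frac{W_i}{W_*}\right] - \tilde\epsilon_i ,
\]
so a regression of \(\ln[\overline{|X_i|}/V_i]\) on \(\ln[W_i/W_*]\) has slope \(a_x = -a_n\):
\(a_n = 2/3 \Leftrightarrow a_x = -2/3\), \(a_n = 0 \Leftrightarrow a_x = 0\), \(a_n = 1 \Leftrightarrow a_x = -1\).
```

The identity is exact only for the logarithm of the mean print size. The percentiles in Panels B and C, and the means of log print sizes, differ from it by a term that depends on the shape of each stock's print-size distribution, so their estimated slopes need not equal \(-a_n\) exactly.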

Figure 4. Trade-Weighted and Volume-Weighted Distributions of Scaled TAQ Print Size for Three Models, NYSE-Listed Stocks, April 1993

[Figure: Panel A, Trade-Weighted Distributions, and Panel B, Volume-Weighted Distributions; in each panel, columns Trading Game Invariance, Invariant Bet Frequency, and Invariant Bet Size; rows volume group 1, volume groups 2–8, and volume groups 9–10. Horizontal axes: logarithm of scaled print size.]

This figure shows the distribution of the logarithm of scaled print sizes for three different models for NYSE-listed stocks traded in April 1993. The print sizes are scaled as \(W_i^\alpha \cdot |X_i|/V_i\), with \(\alpha = 2/3\) for the model of trading game invariance, \(\alpha = 0\) for the model of invariant bet frequency, and \(\alpha = 1\) for the model of invariant bet size. Trading activity \(W_i\) is calculated as the product of dollar volume \(P_i \cdot V_i\) and the daily percentage standard deviation of returns \(\sigma_i\). Panel A shows trade-weighted distributions, and panel B shows volume-weighted distributions. The subplots show stock-level distributions averaged across stocks in volume group 1 (low volume), volume groups 2–8, and volume groups 9–10 (high volume). Volume groups are based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for NYSE-listed common stocks.
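The panels can be reproduced in outline by computing, for each stock, trade-weighted and volume-weighted histograms of \(\ln(W_i^\alpha \cdot |X_i|/V_i)\) and then averaging the stock-level distributions with equal weight per stock. A minimal sketch, assuming the bin grid covers every print and taking print sizes as given (the paper uses bin midpoints); the names and toy data are hypothetical.

```python
import numpy as np

ALPHAS = {
    "trading game invariance": 2.0 / 3.0,
    "invariant bet frequency": 0.0,
    "invariant bet size": 1.0,
}


def stock_histograms(sizes, daily_volume, w, alpha, bins):
    """Trade- and volume-weighted histograms of ln(W ** alpha * |X| / V) for one stock.

    Assumes `bins` covers every print, so neither histogram sums to zero.
    """
    size = np.abs(np.asarray(sizes, dtype=float))
    x = alpha * np.log(w) + np.log(size / daily_volume)
    tw, _ = np.histogram(x, bins=bins)                # each print counts once
    vw, _ = np.histogram(x, bins=bins, weights=size)  # each print counts by its shares
    return tw / tw.sum(), vw / vw.sum()


def group_average(stocks, alpha, bins):
    """Average the stock-level distributions across stocks, one weight per stock."""
    tws, vws = zip(*(stock_histograms(s, v, w, alpha, bins) for s, v, w in stocks))
    return np.mean(tws, axis=0), np.mean(vws, axis=0)


# Two hypothetical stocks given as (print sizes in shares, daily volume, W).
toy = [([100, 100, 1000], 1e4, 1.0), ([1000], 1e4, 1.0)]
bins = np.array([-6.0, -3.0, 0.0])
tw_avg, vw_avg = group_average(toy, ALPHAS["invariant bet frequency"], bins)
```

Equal weighting across stocks matches the caption's "stock-level distributions averaged across stocks"; weighting by each stock's print count instead would let a few very active stocks dominate the picture.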

Figure 5. Trade-Weighted and Volume-Weighted Distributions of Scaled TAQ Print Size for Three Models, Nasdaq-Listed Stocks, April 1993

[Figure: Panel A, Trade-Weighted Distributions, and Panel B, Volume-Weighted Distributions; in each panel, columns Trading Game Invariance, Invariant Bet Frequency, and Invariant Bet Size; rows volume group 1, volume groups 2–8, and volume groups 9–10. Horizontal axes: logarithm of scaled print size.]

This figure shows the distribution of the logarithm of scaled print sizes for three different models for Nasdaq-listed stocks traded in April 1993. The print sizes are scaled as \(W_i^\alpha \cdot |X_i|/V_i\), with \(\alpha = 2/3\) for the model of trading game invariance, \(\alpha = 0\) for the model of invariant bet frequency, and \(\alpha = 1\) for the model of invariant bet size. Trading activity \(W_i\) is calculated as the product of dollar volume \(P_i \cdot V_i\) and the daily percentage standard deviation of returns \(\sigma_i\). Panel A shows trade-weighted distributions, and panel B shows volume-weighted distributions. The subplots show stock-level distributions averaged across stocks in volume group 1 (low volume), volume groups 2–8, and volume groups 9–10 (high volume). Volume groups are based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for Nasdaq-listed common stocks.

Figure 6. Trade-Weighted Distributions of Scaled TAQ Print Sizes, NYSE-Listed Stocks, April 1993

[Figure: grid of histograms; rows are effective price volatility groups 1–4, columns are volume groups 1, 4, 7, 9, and 10; horizontal axes show the log scaled print size from −7 to 5. Panel annotations (M = number of stocks, N = average number of prints per day):]

| Effective price volatility group | Volume group 1 | Volume group 4 | Volume group 7 | Volume group 9 | Volume group 10 |
|---|---|---|---|---|---|
| 1 (low) | M=111, N=27 | M=38, N=126 | M=14, N=301 | M=15, N=667 | M=25, N=1,139 |
| 2 | M=160, N=18 | M=37, N=106 | M=26, N=178 | M=20, N=295 | M=14, N=553 |
| 3 | M=181, N=14 | M=36, N=82 | M=10, N=161 | M=12, N=237 | M=7, N=1,028 |
| 4 (high) | M=155, N=12 | M=9, N=71 | M=5, N=100 | M=4, N=234 | M=1, N=285 |

This figure shows distributions of the logarithms of scaled print sizes for NYSE stocks in April 1993. For each trade, the scaled print size is calculated as \(\ln(W_i^{2/3} \cdot |X_i|/V_i)\) based on the invariance hypothesis, where \(|X_i|\) is the midpoint of the print size bin in shares, \(V_i\) is the average daily volume in shares, and \(W_i\) measures trading activity as the product of dollar volume and the daily percentage standard deviation of returns. Ten volume groups are constructed based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for common NYSE-listed stocks. Four equally spaced volatility groups are constructed based on effective price volatility, defined as \(P_i \cdot \sigma_i \cdot (W_i/W_*)^{-1/3}\). The subplots show stock-level distributions averaged across stocks for volume groups 1 (low volume), 4, 7, 9, and 10 (high volume) and for all four price volatility groups 1 (low price volatility), 2, 3, and 4 (high price volatility). The 100-share trades are highlighted in light gray; the 1,000-share trades are highlighted in dark gray. Each subplot also shows a normal distribution with the pooled average print size mean of −1.15 and standard deviation of 1.38. M is the number of stocks, and N is the average number of prints per day for the stocks in a given subgroup.
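The shaded 100-share and 1,000-share prints correspond to the lot categories reported in Table 1. A sketch of the per-stock-month percentages, assuming that an "even lot" means a multiple of 100 shares; Table 1 reports averages of such figures across stock-months, which this single-stock function does not do. The function name is hypothetical.

```python
import numpy as np


def lot_shares(sizes):
    """Percent of prints and percent of volume in 100-share, 1,000-share, and even-lot prints
    for one stock-month. "Even lot" is assumed here to mean a multiple of 100 shares."""
    size = np.abs(np.asarray(sizes, dtype=np.int64))
    masks = {
        "100-share": size == 100,
        "1,000-share": size == 1000,
        "even lot": size % 100 == 0,
    }
    n_prints, volume = size.size, size.sum()
    return {
        name: (100.0 * mask.sum() / n_prints, 100.0 * size[mask].sum() / volume)
        for name, mask in masks.items()
    }


# Hypothetical stock-month with four prints.
shares = lot_shares([100, 100, 1000, 250])
```

Because a 100-share print is small, its share of prints is far larger than its share of volume, which is the pattern visible in the "100 Shares" rows of Table 1.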

Figure 7. Volume-Weighted Distributions of Scaled TAQ Print Sizes, NYSE-Listed Stocks, April 1993

[Figure: grid of histograms; rows are effective price volatility groups 1–4, columns are volume groups 1, 4, 7, 9, and 10; horizontal axes show the log scaled print size from −5 to 7. Panel annotations (M = number of stocks, N = average number of prints per day):]

| Effective price volatility group | Volume group 1 | Volume group 4 | Volume group 7 | Volume group 9 | Volume group 10 |
|---|---|---|---|---|---|
| 1 (low) | M=111, N=27 | M=38, N=126 | M=14, N=301 | M=15, N=667 | M=25, N=1,139 |
| 2 | M=160, N=18 | M=37, N=106 | M=26, N=178 | M=20, N=295 | M=14, N=553 |
| 3 | M=181, N=14 | M=36, N=82 | M=10, N=161 | M=12, N=237 | M=7, N=1,028 |
| 4 (high) | M=155, N=12 | M=9, N=71 | M=5, N=100 | M=4, N=234 | M=1, N=285 |

This figure shows distributions of total volume across different scaled print size bins for the NYSE stocks in April 1993. For each stock, the volume distribution is calculated as the contribution to the total volume by trades from a given trade size bin. The x-axis is the log of scaled print sizes, defined by \(\ln(W_i^{2/3} \cdot |X_i|/V_i)\) according to the invariance hypothesis, where \(|X_i|\) is a print size in shares (midpoint of a bin), \(V_i\) is the average daily volume in shares, and \(W_i\) is the measure of trading activity equal to the product of dollar volume and returns standard deviation. Ten volume groups are constructed based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for NYSE-listed common stocks. Four equally spaced volatility groups are constructed based on effective price volatility, defined as \(P_i \cdot \sigma_i \cdot (W_i/W_*)^{-1/3}\). The subplots show stock-level distributions averaged across stocks for volume groups 1 (low volume), 4, 7, 9, and 10 (high volume) and for all four price volatility groups 1 (low price volatility), 2, 3, and 4 (high price volatility). The 100-share trades are highlighted in light gray, and the 1,000-share trades are highlighted in dark gray. Each subplot also shows a normal distribution with the pooled average print size mean of 1.1 and standard deviation of 1.74. M is the number of stocks, and N is the average number of prints per day for the stocks in a given subgroup.
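The caption does not spell out how the four "equally spaced" volatility groups are formed. One reading, used in this sketch purely as an assumption, is equal-width bins in the log of effective price volatility between the cross-sectional minimum and maximum; equal-count quartiles would be a different choice. The function name is hypothetical.

```python
import numpy as np


def volatility_groups(eff_vol, n_groups=4):
    """Groups 1..n_groups from equal-width bins of ln(effective price volatility).

    Equal width on the log scale between the cross-sectional min and max is an
    assumption; the caption only says the groups are "equally spaced".
    """
    z = np.log(np.asarray(eff_vol, dtype=float))
    edges = np.linspace(z.min(), z.max(), n_groups + 1)
    # Pass only the interior edges so the maximum lands in the top group.
    return np.digitize(z, edges[1:-1]) + 1


# Hypothetical effective volatilities whose logs are 0, 0.5, 1.5, 2.5, and 4.
groups = volatility_groups(np.exp([0.0, 0.5, 1.5, 2.5, 4.0]))
```

With equal-width bins the groups can hold very different numbers of stocks, which is consistent with the uneven M counts across rows of the figure, though that alone does not confirm this reading.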

Figure 8. Trade-Weighted Distributions of Scaled TAQ Print Sizes, Nasdaq-Listed Stocks, April 1993

[Figure: grid of histograms; rows are effective price volatility groups 1–4, columns are volume groups 1, 4, 7, 9, and 10; horizontal axes show the log scaled print size from −6 to 6. Panel annotations (M = number of stocks, N = average number of prints per day; "–" marks a panel with no value shown):]

| Effective price volatility group | Volume group 1 | Volume group 4 | Volume group 7 | Volume group 9 | Volume group 10 |
|---|---|---|---|---|---|
| 1 (low) | M=565, N=16 | M=10, N=204 | M=5, N=247 | M=1, N=526 | M=7, N=857 |
| 2 | M=483, N=14 | M=23, N=117 | M=7, N=226 | M=5, N=413 | M=5, N=1,481 |
| 3 | M=616, N=11 | M=21, N=106 | M=5, N=159 | M=2, N=460 | M=1, N=490 |
| 4 (high) | M=1,391, N=6 | M=12, N=106 | – | M=1, N=365 | M=1, N=512 |

This figure shows distributions of the logarithms of scaled print sizes for Nasdaq stocks in April 1993. For each trade, the scaled print size is calculated as \(\ln(W_i^{2/3} \cdot |X_i|/V_i)\) based on the invariance hypothesis, where \(|X_i|\) is the midpoint of the print size bin in shares, \(V_i\) is the average daily volume in shares, and \(W_i\) measures trading activity as the product of dollar volume and the daily percentage standard deviation of returns. Ten volume groups are constructed based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for common NYSE-listed stocks. Four equally spaced volatility groups are constructed based on effective price volatility, defined as \(P_i \cdot \sigma_i \cdot (W_i/W_*)^{-1/3}\). The subplots show stock-level distributions averaged across stocks for volume groups 1 (low volume), 4, 7, 9, and 10 (high volume) and for all four price volatility groups 1 (low price volatility), 2, 3, and 4 (high price volatility). The 100-share trades are highlighted in light gray; the 1,000-share trades are highlighted in dark gray. Each subplot also shows a normal distribution with the pooled average print size mean of −0.19 and standard deviation of 1.39. M is the number of stocks, and N is the average number of prints per day for the stocks in a given subgroup.

Figure 9. Trade-Weighted Distributions of Scaled TAQ Print Sizes, All Stocks, April 2001

[Figure: grid of histograms; rows are effective price volatility groups 1–4, columns are volume groups 1, 4, 7, 9, and 10; horizontal axes show the log scaled print size from −7 to 5. Panel annotations (M = number of stocks, N = average number of prints per day):]

| Effective price volatility group | Volume group 1 | Volume group 4 | Volume group 7 | Volume group 9 | Volume group 10 |
|---|---|---|---|---|---|
| 1 (low) | M=868, N=103 | M=43, N=1,297 | M=23, N=2,675 | M=22, N=4,510 | M=41, N=16,230 |
| 2 | M=838, N=63 | M=64, N=952 | M=25, N=1,841 | M=28, N=3,243 | M=37, N=17,216 |
| 3 | M=981, N=46 | M=61, N=802 | M=27, N=1,810 | M=21, N=6,568 | M=9, N=12,633 |
| 4 (high) | M=1,666, N=29 | M=57, N=809 | M=15, N=1,765 | M=2, N=1,944 | M=3, N=13,284 |

This figure shows distributions of the logarithms of scaled print sizes for NYSE and Nasdaq stocks in April 2001. For each trade, the scaled print size is calculated as \(\ln(W_i^{2/3} \cdot |X_i|/V_i)\) based on the invariance hypothesis, where \(|X_i|\) is the midpoint of the print size bin in shares, \(V_i\) is the average daily volume in shares, and \(W_i\) measures trading activity as the product of dollar volume and the daily percentage standard deviation of returns. Ten volume groups are constructed based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for common NYSE-listed stocks. Four equally spaced volatility groups are constructed based on effective price volatility, defined as \(P_i \cdot \sigma_i \cdot (W_i/W_*)^{-1/3}\). The subplots show stock-level distributions averaged across stocks for volume groups 1 (low volume), 4, 7, 9, and 10 (high volume) and for all four price volatility groups 1 (low price volatility), 2, 3, and 4 (high price volatility). The 100-share trades are highlighted in light gray; the 1,000-share trades are highlighted in dark gray. Each subplot also shows a normal distribution with the pooled average print size mean of −1.35 and standard deviation of 1.33. M is the number of stocks, and N is the average number of prints per day for the stocks in a given subgroup.

Figure 10. Trade-Weighted Distributions of Scaled TAQ Print Sizes, All Stocks, April 2014

[Figure: grid of histograms; rows are effective price volatility groups 1–4, columns are volume groups 1, 4, 7, 9, and 10; horizontal axes show the log scaled print size from −8 to 4. Panel annotations (M = number of stocks, N = average number of prints per day):]

| Effective price volatility group | Volume group 1 | Volume group 4 | Volume group 7 | Volume group 9 | Volume group 10 |
|---|---|---|---|---|---|
| 1 (low) | M=464, N=2,249 | M=35, N=20,056 | M=22, N=33,004 | M=17, N=47,322 | M=30, N=94,858 |
| 2 | M=483, N=1,499 | M=38, N=13,986 | M=24, N=22,883 | M=22, N=33,185 | M=19, N=57,095 |
| 3 | M=622, N=1,115 | M=50, N=10,246 | M=17, N=16,999 | M=18, N=27,644 | M=5, N=38,318 |
| 4 (high) | M=862, N=654 | M=54, N=7,426 | M=12, N=11,691 | M=11, N=23,617 | M=17, N=41,295 |

This figure shows distributions of the logarithms of scaled print sizes for NYSE and Nasdaq stocks in April 2014. For each trade, the scaled print size is calculated as \(\ln(W_i^{2/3} \cdot |X_i|/V_i)\) based on the invariance hypothesis, where \(|X_i|\) is the midpoint of the print size bin in shares, \(V_i\) is the average daily volume in shares, and \(W_i\) measures trading activity as the product of dollar volume and the daily percentage standard deviation of returns. Ten volume groups are constructed based on average dollar trading volume with thresholds corresponding to the 30th, 50th, 60th, 70th, 75th, 80th, 85th, 90th, and 95th percentiles of the dollar volume for common NYSE-listed stocks. Four equally spaced volatility groups are constructed based on effective price volatility, defined as \(P_i \cdot \sigma_i \cdot (W_i/W_*)^{-1/3}\). The subplots show stock-level distributions averaged across stocks for volume groups 1 (low volume), 4, 7, 9, and 10 (high volume) and for all four price volatility groups 1 (low price volatility), 2, 3, and 4 (high price volatility). The 100-share trades are highlighted in light gray; the 1,000-share trades are highlighted in dark gray. Each subplot also shows a normal distribution with the pooled average print size mean of −2.25 and standard deviation of 1.10. M is the number of stocks, and N is the average number of prints per day for the stocks in a given subgroup.
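Each subplot's overlay is a normal distribution with the pooled mean and standard deviation given in the caption. To compare it with a histogram, the density can be converted into probability mass per bin. This sketch assumes the pooled moments are the population (ddof=0) mean and standard deviation of all scaled log print sizes stacked together, and uses a hypothetical bin grid spanning this figure's horizontal axis (−8 to 4); the function names are illustrative.

```python
import numpy as np
from math import erf, sqrt


def pooled_normal(scaled_log_sizes):
    """Mean and population standard deviation of all scaled log print sizes pooled together."""
    x = np.concatenate([np.asarray(s, dtype=float) for s in scaled_log_sizes])
    return x.mean(), x.std()


def normal_bin_probs(edges, mean, sd):
    """Probability mass that N(mean, sd ** 2) places in each histogram bin."""
    cdf = [0.5 * (1.0 + erf((e - mean) / (sd * sqrt(2.0)))) for e in edges]
    return np.diff(cdf)


# Overlay for the April 2014 panels, using the pooled moments reported in the caption.
probs = normal_bin_probs(np.linspace(-8.0, 4.0, 25), -2.25, 1.10)
```

Comparing the pooled means in Figures 6, 9, and 10 (−1.15, −1.35, and −2.25) gives a compact view of the downward shift in scaled print sizes after 2001.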

Cite this document
APA
Kyle, A. S., Obizhaeva, A. A., & Tuzun, T. (2016). Microstructure Invariance in U.S. Stock Market Trades (FEDS 2016-034). Board of Governors of the Federal Reserve System, Finance and Economics Discussion Series. https://whenthefedspeaks.com/doc/feds_2016-034
BibTeX
@techreport{wtfs_feds_2016_034,
  author = {Albert S. Kyle and Anna A. Obizhaeva and Tugkan Tuzun},
  title = {Microstructure Invariance in U.S. Stock Market Trades},
  type = {Finance and Economics Discussion Series},
  number = {2016-034},
  institution = {Board of Governors of the Federal Reserve System},
  year = {2016},
  url = {https://whenthefedspeaks.com/doc/feds_2016-034},
  abstract = {This paper studies invariance relationships in tick-by-tick transaction data in the U.S. stock market. Over the period 1993-2001, the estimated monthly regression coefficients of the log of trade arrival rate on the log of trading activity have an almost constant value of 0.666, strikingly close to the value of 2/3 predicted by invariance hypothesis. Over the period 2001-2014, the estimated coefficients rise, and their average value is equal to 0.79, suggesting that the reduction in tick size in 2001 and subsequent increase in algorithmic trading resulted in a more intense order shredding in more liquid stocks. The distributions of trade sizes, adjusted for differences in trading activity, resemble a log-normal before 2001; there are clearly visible truncation at the round-lot boundary and clustering of trades at even-levels. These distributions change dramatically over the period 2001-2014 with their means shifting downwards. The invariance hypothesis explains about 88% of the cross-sectional variation in trade arrival rates and average trade sizes; additional explanatory variables include invariance-implied measure of effective price volatility.},
}