feds · October 31, 2008

The Rigidity of Choice. Lifecycle savings with information-processing limits

Abstract

This paper studies the implications of information-processing limits on the consumption and savings behavior of households through time. It presents a dynamic model in which consumers rationally choose the size and scope of the information they want to process concerning their financial possibilities, constrained by a Shannon channel. The model predicts that people with higher degrees of risk aversion rationally choose more information. This happens for precautionary reasons since, with a finite processing rate, risk-averse consumers prefer to be well informed about their financial possibilities before implementing a consumption plan. Moreover, numerical results show that consumers with processing capacity constraints have asymmetric responses to shocks, with negative shocks producing more persistent effects than positive ones. This asymmetry results in more savings. I show that the predictions of the model can be effectively used to study the impact of tax reforms on consumer spending.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

The Rigidity of Choice. Lifecycle savings with information-processing limits
Antonella Tutino
2008-62

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Antonella Tutino*
First version: November 2007; this version: October 2008

* COMMENTS WELCOME. E-mail: Antonella.Tutino@frb.gov. I am deeply indebted to Chris Sims, whose countless suggestions, shaping influence and guidance were essential to improve the quality of this paper. I thank Ricardo Reis for valuable advice and enduring and enthusiastic support. I am grateful to Per Krusell for his insightful advice and stimulating discussions. I would also like to thank Mike Kiley, Nobu Kiyotaki, Angelo Mele, Philippe-Emanuel Petalas, John M. Roberts, Charles Roddie, Sam Schulhofer-Wohl, Tommaso Treu, Mark Watson and Mirko Wiederholt. Finally, I thank numerous seminar participants for many helpful comments and discussions. Any remaining errors are my own. The views in this paper are solely the responsibility of the author and should not be interpreted as reflecting the views of the Federal Reserve Board or any other person associated with the Federal Reserve System.

Information is, we must steadily remember, a measure of one's freedom of choice in selecting a message. The greater this freedom of choice, and hence the greater the information, the greater is the uncertainty that the message actually selected is some particular one. Thus greater freedom of choice, greater uncertainty, greater information go hand in hand. (Claude Shannon, sic.)

1 Introduction

Every day people face an overwhelming amount of data. Every day, though, people use these data for their decisions. In selecting useful information, people face a trade-off between reacting quickly and precisely to news about their financial possibilities and not spending time crunching numbers to figure out their exact net worth. To match these facts, macroeconomists have adopted a number of modelling strategies able to inject inertia within the rational expectation framework. These devices, such as the costly acquisition and diffusion of information, largely rely on ad-hoc technology to generate smooth and delayed responses of consumption to a shock to income consistent with observed data.

Contrary to this approach, this paper proposes a way to relate inertial behavior in consumption and savings to people's preferences. To this end, the paper offers a micro-founded explanation of the nature of inertia in consumption and savings. Following Rational Inattention (Sims, 2003, 2006), I model people's inability to process information at an infinite rate by using Shannon channels. Under this information-processing constraint, individuals choose a signal that conveys information about their financial possibilities. The signal can provide any kind of information as long as its overall content is within the channel's capacity. Consumers base their expectations of the economic conditions on the signal and decide how much to consume.
Thus, in my framework, the delayed and smoothed responses of savings to changes in wealth are the result of a slow information flow due to processing constraints.

Combining the standard utility maximization framework subject to a budget constraint with information-processing limits leads to a departure from rational expectations. My paper shows how to model this formally in an intertemporal setting. In particular, I assume that people do not know the exact value of their wealth but have an idea of their net worth. A way of thinking about this hypothesis is that people do not know exactly what the dollar value of their paycheck (nominal) corresponds to in terms of cups of coffee (real), assuming that this is what they care about. People process information to sharpen their knowledge of how much consumption their wealth can purchase. I model initial uncertainty as a probability distribution over the possible realizations of wealth. In such a framework, it is possible to study how choices of information interact with people's preferences when they decide on consumption throughout their lifetime.

The challenge of this model and, more generally, of models of rational inattention is dealing with the infinite-dimensional state space implied by having a prior as the state. For this reason, the applications of rational inattention have been limited to either a linear quadratic framework where Gaussian uncertainty has been considered (such as Sims 1998, 2003, Luo 2007, Mackowiak and Wiederholt 2007, Mondria 2006, Moscarini 2004) or a two-period consumption-saving problem (Sims 2006) where the choice of optimal ex post uncertainty is analyzed for the case of log utility and two Constant Relative Risk Aversion (CRRA) utility specifications.

The linear quadratic Gaussian (LQG) framework can be seen as a particular instance of rational inattention in which the optimal distribution chosen by the household turns out to be Gaussian. Gaussianity has two main advantages. First, it allows an explicit analytical solution for these models. One can show that the problem can be solved in two steps: first, the information-gathering scheme is found and then, given the optimal information, the consumption profile. Second, it is easy to compare the results to a signal extraction problem. When looking at the behavior of rationally inattentive consumers, it is impossible to separate an exogenously given Gaussian noise in the signal extraction model from an endogenous noise that is optimally chosen to be Gaussian.

The tractability of rational inattention LQG models comes at the cost of restrictive assumptions on preferences and the nature of the signal. Constraining the uncertainty of the individual to a quadratic loss / certainty equivalent setting does not take into account the possibility that the agent is very uncertain about his economic environment; ceteris paribus, more uncertainty generates second-order effects of information that have first-order impact on individuals' decisions. In this sense, rational inattention LQG models are subject to the same limits as methods that use linear approximation of optimality conditions to study stochastic dynamic models.[1] With little uncertainty about the economic environment, linear approximations of the optimality conditions may provide a fairly adequate description of the exact solution of the system. This fact suggests that the uncertainty at the individual level might actually be large, undermining the accuracy of both linearized and rational inattention LQG models. To assess the importance of information choices for people's expectations, it is important to let consumers select their information from a wider set of distributions that includes but is not limited to the Gaussian family.

[1] Since the work of Hall (1982), the assumption of certainty equivalence has also been questioned in the consumption-savings literature with no information friction, starting from e.g. Blanchard and Mankiw (1988).

The theoretical contribution of this paper is to provide the analytical and computational tools necessary to apply information theory in a dynamic context with optimal choice of ex-post uncertainty. I propose a methodology to handle the additional complexity without the LQG setting. I propose a discretization of the framework and derive its theoretical properties. Then, I provide a computational strategy that is able to solve the model.

Several predictions emerge from the model. Evaluating the unconditional moments of the time series of consumption for a given degree of risk aversion, the first result of the paper is that higher information costs are associated with more persistence and higher volatility. The seemingly paradoxical result of having sluggish and volatile consumption at the same time can be reconciled if one considers that information-processing constraints prevent consumers from responding promptly to fluctuations in wealth. To make a concrete example, suppose a person starts off with low wealth and initially chooses to consume a little. If he is risk averse, he may decide not to modify his consumption profile until he acquires more information about his wealth. As he processes information through time, he gets more and more data about his high value of wealth and changes his consumption when he is sure that he has saved enough to afford a higher consumption expenditure. The more risk averse the consumer is, the longer he waits. The longer the wait, the more wealth grows because of the accumulation of savings and current income. The combination of waiting while processing information and sharp changes once information has been processed through time generates sluggishness and volatility in consumption.

Second, by looking at the life-cycle profile of consumption I find that the behavior of consumption is smooth and persistent with several peaks along the simulated path. These peaks in consumption occur later in life for people that have access to low information flow. This effect is stronger as risk aversion increases. The logic behind this result is that risk-averse consumers react to uncertainty by processing more information on their low values of wealth and keep their consumption low as a precaution until the uncertainty is diminished. They accumulate more savings throughout early adult life than their infinite information-processing counterparts. They keep saving until the accumulation of wealth and information indicates that they can enjoy a high consumption profile. This leads also to the finding that individual consumption can have more than one hump along its path as wealth accumulates through time.
The key point is that individuals can vary their information flow during their lifetime. To see why, suppose that a person receives signals that his wealth is low. In this case, he wants to pay attention to his expenses and closely monitor the activity of his account. Once he makes sure that he has saved enough, he may decide to spend less effort monitoring his balance and enjoy consumption. Decumulation of savings continues until he receives information that he has emptied his checking account. This news calls for his attention again, so he starts saving and monitors his balance more frequently than before. These results combined are suggestive of a precautionary motive for savings driven by information-processing limits.

Third, I find that consumers with processing capacity constraints have asymmetric responses to income fluctuations, with negative shocks producing sharper and more persistent effects than positive ones. This effect is stronger as the degree of risk aversion increases. Compared with a situation in which there are no information-processing limits, in a rational-inattention consumption-savings model an adverse temporary income shock makes consumers reduce their consumption for a longer period of time. This happens because risk-averse people who receive bad news about their finances save right away to hedge against the possibility of running out of wealth in the future. Once they have enough savings and information, they gradually increase their consumption and smooth the remaining effect of the shock over time. This result also points toward a precautionary motive due to information-processing limits.

Finally, I find that the predictions of the model can be used to address important policy questions. In the context of fiscal reforms of consumer spending, I show that, as wealth decreases, rationally inattentive consumers respond faster to a tax rebate that increases their income by 10%. For a given level of wealth, the lower the processing capacity, the longer it takes for consumption to react to shocks to disposable income. These findings make intuitive sense. A tax rebate matters more for people with lower income and, as a result, tighter budgetary constraints than for wealthy people.[2] As a result, poorer people acknowledge and react faster to positive income shocks. By contrast, wealthy people do not perceive the increase in disposable income as a significant change in their financial position. Thus, consumption for wealthy people does not change significantly; instead, it adjusts slowly over time.

[2] Another way of looking at it is that people with lower income are generally more liquidity constrained. This makes their marginal propensity to consume out of a positive shock closer to one than that of wealthier people.

Consider an individual that has wealth and infinite processing capacity. The reaction of consumption to a temporary positive income shock would be to adjust immediately to a new higher value of consumption so as to smooth out the effect of the shock throughout time. With limited processing capacity, the individual smooths consumption slowly over time because the effect of the increase in disposable income on wealth spreads out slowly through time. These predictions are in line with the empirical evidence on tax rebates (e.g., Johnson, Parker, and Soules (2006)).

My results are observationally distinct from the previous literature on consumption and information (e.g., Reis (2006)). The distinguishing feature of my model with respect to previous works is its ability to generate endogenously asymmetric responses of consumption to shocks.[3] Finally, my paper contributes to the literature that models how people endogenously form expectations and react to the economy on the basis of their rationally chosen information.[4]

[3] In particular, for a given degree of risk aversion and magnitude of a shock, the response of consumption to a negative shock is stronger on impact and more persistent than the one to a positive shock.

[4] A necessarily non-exhaustive list of papers that address the issue of modeling consumers' expectations includes the absent-minded consumer model proposed by Ameriks, Caplin and Leahy (2003), together with Mullainathan (2002) and Wilson (2005), whose models feature agents with imperfect recall. Mankiw and Reis (2002) develop a different model in which information disseminates slowly due to infrequent updates of information.

The paper is organized as follows. Section 2 lays out the theoretical basis of rational inattention and informally introduces the model. Section 3 states the problem of the consumers as a discrete stochastic dynamic programming problem, while Section 4 derives the properties of the Bellman function. Section 5 provides the numerical methodology used to solve the model. Section 6 delivers its main results. By comparing the predictions of the model with the preliminary evidence on tax rebates, I find that the model can be a valid instrument to address the impact of tax reforms on consumer spending. Section 7 concludes.

2 Foundations of Rational Inattention

Rational inattention (Sims 1988,[5] 1998, 2003, 2005, 2006) blends two main fields: Information Theory and Economics. The first draws mainly on the work of Shannon (1948). The main contribution is to define a measure of the choice involved in the selection of the message and the uncertainty regarding the outcome. The measure used is entropy. Details on this part are in Appendix F. Based on Shannon's apparatus, the economic contribution is that of using Shannon capacity as a technological constraint to capture individuals' inability to process information about the economy at an infinite rate. Given these limits, people reduce their uncertainty by selecting the focus of their attention. The resulting behavior depends on the choices of what to observe of the environment once the information-processing frictions are acknowledged.

[5] The bulk of the idea of rational inattention can be found in C. Sims' 1988 comment in the Brookings Papers on Economic Activity.

2.1 The Economics of Rational Inattention

Consider a person who wants to buy lunch. He doesn't know his exact wealth but he knows that he has some cash and a credit card. Not recalling the expenses charged on the credit card up to that point, he can go to the bank or simply check his wallet. Going to the bank to figure out his wealth for lunch is beyond his time and interest, so he decides to check his wallet. He browses through it thinking about what he wants and what he can afford to buy for lunch. Mapping dollar bills into his knowledge of prices from previous consumption, he realizes he can only afford a sandwich instead of his favorite sushi roll. Then, he uses the receipt to update his prior on the price of sandwiches, what he thinks he has left in his wallet and, ultimately, his wealth. This updated knowledge will be used for his next purchase.

Such a story can be directly mapped into a rational inattention framework. First, the person does not know his wealth, $W$, but he has a prior on it, $p(W)$. Before processing any information, his uncertainty about wealth is the entropy of his prior, $\mathcal{H}(W) \equiv -E[\log_2(p(W))]$, where $E[\cdot]$ denotes the expectation operator.[6] Before processing any information, lunch too is a random variable, $C$, ranging from sandwiches to sushi. To reduce entropy, he can choose whether to have a detailed report from the bank or to look at his wallet. The two options differ in the amount of information and the effort required to process their content. The choice of the option (signal) together with consumption results in a joint probability $p(c,w)$. Both the dollars in the wallet and the knowledge of prices of sandwiches and sushi reduce the uncertainty about wealth to $\mathcal{H}(W \mid C) = -\int p(w,c)\log_2 p(w \mid c)\,dc\,dw$, which is the entropy of $W$ that remains given the knowledge of $C$. The information flow, or maximum reduction of uncertainty about the prior on wealth, is bounded by the information that the selected signal conveys. In formulae:

$$I(C;W) = \mathcal{H}(W) - \mathcal{H}(W \mid C) \le \kappa \tag{1}$$

where $\kappa$ is measured in number of bits transmitted. Finally, the signal (peeking at the wallet, $p(w,c)$) and the receipt for the sandwich, $\bar{c}$, are used to update the prior on wealth via Bayes' rule, and the update is then carried over for future purchases.

[6] Entropy is a universal measure of uncertainty that can be defined for a density against any base measure. The standard convention is to use base 2 for the logarithms, so that the resulting unit of information is binary and called a bit, and to attribute zero entropy to the events for which $p = 0$. Formally, given that $s\log(s)$ is a continuous function on $s \in [0,\infty)$, by l'Hôpital's rule $\lim_{s \to 0} s\log(s) = 0$.

The example illustrates how people handle everyday decisions by weighing the effort of processing all the available information (personal net worth) against the precision of the information they can absorb (walking to the bank versus checking the wallet), guided by their interest (buying lunch). This is the core of rational inattention: information is freely available but people can only process it at a finite rate. Information-processing limits make attention a scarce resource. As for any other scarce resource, rational people use attention optimally according to what they have at stake. By appending an information-processing constraint to an otherwise standard optimization framework, the theory explains why people react to changes in the economic environment with delays and errors.

The appeal of Shannon capacity as a constraint on attention is that it provides a measure of uncertainty which does not depend on the characteristics of the channel. The quantity (1) is a probabilistic measure of the information shared by two random variables and it applies to any channel. Thus, the Shannon capacity does not require explicit modelling of how individuals process information. Moreover, treating processing capacity as a constraint on utility maximization produces inertial reactions to the environment as a result of individual rational choices. A rational person may not find it worthwhile to look beyond his wallet when deciding what to buy for lunch. The dollar bills in his wallet provide little information about current and future activities of his balance.
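As an illustration of the quantity in (1), the following is a minimal sketch of how entropy and information flow can be computed, in bits, for a discrete joint distribution of consumption and wealth. The function names and the two example distributions are hypothetical and are not taken from the paper; the convention $0\log 0 = 0$ follows the footnote on entropy.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution (0*log 0 treated as 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def mutual_information_bits(joint):
    """I(C;W) = H(W) - H(W|C) for a joint pmf; rows index c, columns index w."""
    joint = np.asarray(joint, dtype=float)
    p_c = joint.sum(axis=1)  # marginal of consumption
    p_w = joint.sum(axis=0)  # marginal of wealth (the prior)
    # Chain rule: H(W|C) = H(C,W) - H(C)
    h_w_given_c = entropy_bits(joint) - entropy_bits(p_c)
    return entropy_bits(p_w) - h_w_given_c

# Hypothetical signals over four equally likely wealth levels (prior entropy: 2 bits).
# An uninformative signal: consumption is independent of wealth.
independent = np.outer([0.5, 0.5], [0.25, 0.25, 0.25, 0.25])
# A fully revealing signal: each consumption level pins down one wealth level.
revealing = np.eye(4) / 4.0

flow_independent = mutual_information_bits(independent)
flow_revealing = mutual_information_bits(revealing)
```

Under these assumptions the uninformative signal uses no capacity, while the fully revealing one requires a channel that carries the whole entropy of the prior; any capacity $\kappa$ below that amount rules the revealing signal out.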
Thus, if something happened to his current account, for example a sudden drop in his investment, checking his wallet would give him no acknowledgement of the event. Nevertheless, the signal is capable of guiding the consumer on his lunch decision. Over time and through expenses, the person would figure out the drop in his investment and modify his behavior even with respect to lunch.

3 The Formal Set-up

3.1 The problem of the household

To understand the implications of the limits to information processing, I start with the full information problem.

Let $(\Omega, \mathcal{B})$ be the measurable space where $\Omega$ represents the sample set and $\mathcal{B}$ the event set. States and actions are defined on $(\Omega, \mathcal{B})$. Let $\mathcal{I}_t$ be the $\sigma$-algebra generated by $\{c_t, w_t\}$ up to time $t$, i.e., $\mathcal{I}_t = \sigma(c_t, w_t; c_{t-1}, w_{t-1}; \ldots; c_0, w_0)$. Then, the collection $\{\mathcal{I}_t\}_{t=0}^{\infty}$ such that $\mathcal{I}_t \subset \mathcal{I}_s$ $\forall s \ge t$ is a filtration. Let $u(c)$ be the utility of the household defined over a consumption good, $c$. I assume that the utility belongs to the CRRA family, $u(c) = c^{1-\gamma}/(1-\gamma)$, with $\gamma$ the coefficient of risk aversion. The consumer's problem is:

$$\max_{\{c_t\}_{t=0}^{\infty}} E_0\left\{ \sum_{t=0}^{\infty} \beta^t \left[ \frac{c_t^{1-\gamma}}{1-\gamma} \right] \,\Big|\, \mathcal{I}_0 \right\} \tag{2}$$

s.t.

$$w_{t+1} = R(w_t - c_t) + y_{t+1} \tag{3}$$

$$w_0 \text{ given} \tag{4}$$

where $\beta \in [0,1)$ is the discount factor and $R = 1/\beta$ is the interest on savings, $(w_t - c_t)$. I assume that $y_t \in Y \equiv \{y^1, y^2, \ldots, y^N\}$ follows a stationary Markov process with constant mean $E_t((y_{t+1}) \mid \mathcal{I}_t) = \bar{y}$.

Consider now a consumer who cannot process all the information available in the economy to track his wealth precisely. This not only adds a constraint to the decision problem but fundamentally affects each of the constraints (3)-(4).

First, because the consumer doesn't know his wealth, (4) no longer holds. His uncertainty about wealth is given by the prior $g(w_0)$.

Second, before processing any information, consumption is also a random variable. This is because the uncertainty about wealth translates into a number of possible consumption profiles with various levels of affordability. It follows that, to maximize lifetime utility, the consumer needs to reduce uncertainty about wealth and, at the same time, to choose consumption. Hence, when information cannot flow at an infinite rate, the choice of the consumer is the distribution $p(w,c)$ as opposed to the stream of consumption $\{c_t\}_{t=0}^{\infty}$ in (2). Another way of looking at this is that the consumer chooses a noisy signal on wealth where the noise can assume any distribution selected by the consumer.
Given that the agent has a probability distribution over wealth, choosing this signal is akin to choosing $p(c,w)$. The optimal choice of this distribution is the one that makes the distribution of consumption conditional on wealth as close to the wealth as the limits imposed by the Shannon capacity allow.

Third, with respect to the program (2)-(4), there is a new constraint on the amount of information the consumer can process. The reduction in uncertainty conveyed by the signal depends on the attention allocated by the consumer to track his wealth. Paying attention to reduce uncertainty requires spending some time and effort to process information. I model the task of thinking by appending a Shannon channel to the constraint sets. Limits in the capacity of the consumers are captured by the fact that the reduction in uncertainty conveyed by the signal cannot be higher than a given number, $\bar{\kappa}$. The information flow available to the consumer is a function of the signal, i.e., the joint distribution $p(\cdot_{c_t}, \cdot_{w_t})$. In formulae:

$$\kappa_t \ge I\big(p(\cdot_{c_t}, \cdot_{w_t})\big) = \int p(c_t, w_t) \log\left( \frac{p(c_t, w_t)}{p(c_t)\, g(w_t)} \right) dc_t\, dw_t \tag{5}$$

Fourth, the update of the prior replaces the law of motion of wealth by using the budget constraint in (3). To describe the way individuals transit across states, define the operator $E_{w_t}(E_t(x_{t+1}) \mid c_t) \equiv \hat{x}_{t+1}$, which combines the expectation in period $t$ of a variable in period $t+1$ with the knowledge of consumption in period $t$, $c_t$, and the remaining uncertainty over wealth. Applying $E_{w_t}(E_t(\cdot) \mid c_t)$ to equation (3) leads to:

$$\hat{w}_{t+1} = R(\hat{w}_t - c_t) + \hat{y} \tag{6}$$

where

$$\begin{aligned}
\hat{y} &= E_{w_t}\big(E_t(y_{t+1}) \mid c_t\big) \\
&\equiv E_{w_t}\big(E_t((y_{t+1}) \mid \mathcal{I}_t) \mid c_t\big) + \Big[ E_{w_t}\big(E_t(y_{t+1}) \mid c_t\big) - E_{w_t}\big(E_t((y_{t+1}) \mid \mathcal{I}_t) \mid c_t\big) \Big] \\
&\overset{\text{LIE}}{=} \bar{y} + E_{w_t}\Big[ \big(E_t(y_{t+1}) \mid c_t\big) - \big(E_t(y_{t+1}) \mid c_t\big) \Big] \\
&= \bar{y}.
\end{aligned}$$

To fully characterize the transition from the prior $g(w_t)$ to its posterior distribution, I need to take into account how the choice in time $t$, $p(w_t, c_t)$, affects the distribution of the consumer's belief after observing $c_t$. Given the initial prior state $g(w_0)$, the successor belief state, denoted by $g'_{c_t}(w_{t+1})$, is determined by revising each state probability as displayed by the expression:

$$g'\big(w_{t+1} \mid_{c_t}\big) = \int \tilde{T}(w_{t+1}; w_t, c_t)\, p(w_t \mid c_t)\, dw_t \tag{7}$$

which is known as Bayesian conditioning. In (7), the function $\tilde{T}$ is the transition function representing (6). Note that the belief state itself is completely observable. Meanwhile, Bayesian conditioning satisfies the Markov assumption by keeping a sufficient statistic that summarizes all information needed for optimal control.[7] Thus, (7) replaces (3) in the limited processing world.

[7] See Astrom, K. (1965).

Let $\theta$ be the shadow cost of using the channel (5), and combine all these four ingredients. Then, the program of the household under information frictions is:

$$\max_{\{p(w_t, c_t)\}_{t=0}^{\infty}} E_0\left\{ \sum_{t=0}^{\infty} \beta^t \int \left( \frac{c_t^{1-\gamma}}{1-\gamma} \right) p(c_t, w_t)\, \mu(dc_t, dw_t) \,\Big|\, \mathcal{I}_0 \right\} \tag{8}$$

s.t.

$$(\theta)\quad \kappa_t = I_t\big(p(\cdot_{c_t}, \cdot_{w_t})\big) = \int p(c_t, w_t) \log\left( \frac{p(c_t, w_t)}{\left(\int p(\hat{w}_t, c_t)\, d\hat{w}_t\right) g(w_t)} \right) dc_t\, dw_t \tag{9}$$

$$p(c_t, w_t) \in \mathcal{D}(w, c) \tag{10}$$

$$g'\big(w_{t+1} \mid_{c_t}\big) = \int \tilde{T}(w_{t+1}; w_t, c_t)\, p(w_t \mid c_t)\, dw_t \tag{11}$$

$$g(w_0) \text{ given} \tag{12}$$

where $\mu(\cdot)$ in (8) is the Dirac measure that accounts for discreteness in the optimal choice $p(c,w)$ and $\mathcal{D}(w,c) \equiv \{(c,w) : \int p(c,w)\,dc\,dw = 1,\ p(c,w) \ge 0,\ \forall (c,w)\}$ in (10) restricts the choice of the agent to be drawn from the set of distributions.

This problem is a well-posed mathematical problem with convex objective function and concave constraint sets. What makes it hard to solve is that both the state and the control variables are infinite dimensional. To make progress in solving it, I implement two simplifications: a) I discretize the framework and b) I show that the resulting setting admits a recursive formulation. Then, I study the properties of the Bellman recursion and solve the problem.

Before turning to the solution, I present a brief digression about how constraint (9) operates and how the difference between this model and the existing literature on rational inattention may help to build up the intuition for the solution methodology and the results.

3.2 The role of Shannon's capacity constraint

3.2.1 Shannon's constraint in action

To get a sense of how the Shannon capacity constraints affect the decision of the household, I contrast the optimal policy function $p^*(c,w)$ for consumers that have identical characteristics but differ in their limits of information-processing. A caveat is in order. In order to explore the interaction between information flow and coefficient of risk aversion, I solve the model in (8)-(12) by fixing the shadow cost of processing information, $\theta$, attached to (9) and letting $\kappa$ vary endogenously every period. In this section, I follow a different route.
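To make the belief transition in (11) concrete, the following is a minimal sketch of a discretized version on finite grids. It is an illustration under stated assumptions, not the paper's solution method: the transition $\tilde{T}$ is taken to be the deterministic map $w' = R(w - c) + \bar{y}$ from (6), snapped to the nearest grid point; the function name `belief_update`, the conditional distribution of consumption, and the observed consumption index are all hypothetical.

```python
import numpy as np

def belief_update(joint, w_grid, c_grid, c_index, R, y_bar):
    """Belief over next-period wealth after observing consumption c_grid[c_index].

    joint[i, j] = p(c_i, w_j). Assumption for illustration: the transition is
    deterministic, w' = R * (w - c) + y_bar, mapped to the nearest grid point.
    """
    joint = np.asarray(joint, dtype=float)
    p_c = joint[c_index].sum()
    if p_c <= 0:
        raise ValueError("observed consumption has zero probability")
    post_w = joint[c_index] / p_c  # Bayes' rule: p(w_t | c_t)
    w_next = R * (w_grid - c_grid[c_index]) + y_bar
    w_next = np.clip(w_next, w_grid[0], w_grid[-1])  # keep beliefs on the grid
    nearest = np.abs(w_next[:, None] - w_grid[None, :]).argmin(axis=1)
    g_next = np.zeros_like(w_grid, dtype=float)
    np.add.at(g_next, nearest, post_w)  # accumulate mass landing on each w'
    return g_next

# Hypothetical example: uniform prior and a smooth conditional p(c | w).
w_grid = np.linspace(1.0, 10.0, 20)
c_grid = np.linspace(0.8, 3.0, 20)
g = np.full(w_grid.size, 1.0 / w_grid.size)
kernel = np.exp(-((c_grid[:, None] - 0.3 * w_grid[None, :]) ** 2) / 0.5)
p_c_given_w = kernel / kernel.sum(axis=0, keepdims=True)
joint = p_c_given_w * g[None, :]  # p(c, w) whose wealth marginal is the prior g

g_next = belief_update(joint, w_grid, c_grid, c_index=5, R=1.012, y_bar=1.0)
```

The updated vector `g_next` is again a probability distribution on the wealth grid, so it can serve as the state for the next period, which is the sense in which the belief acts as a sufficient statistic in the recursive formulation.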
To clarify the mechanisms behind Shannon capacity as a constraint on information transmission, I fix the number of bits, $\kappa$, across utilities and adjust the shadow cost $\theta$ to map different coefficients of risk aversion to the same information flow.[8]

First consider $u(c) = \log(c)$. In the full information case,[9] the distribution $g(w)$ is degenerate and the choice of $p(c_t, w_t)$ reduces to that of $c(w_t)$ in (8).[10] The resulting optimal policy is given by

\[ c_t^*(w_t) = (1-\beta)\,w_t + \beta\,\bar{y}. \tag{13} \]

For comparison with the case with finite $\kappa$, I plot the policy function for the (discretized) full information case as the joint distribution $p(c,w)\,\delta_{c^*(w)}(c,w)$, with $\delta_{c^*(w)}$ as the Dirac measure. Figure 1 plots such a distribution for a 20x20 grid where the equi-spaced vector $c$ ranges from 0.8 to 3 and $w$ is also equi-spaced with support in $[1,10]$.

[Figure 1: Joint pdf p(c,w), high capacity. Panel title: log utility, $\kappa \to \infty$. Axes: c, w, p(c,w).]

Suppose now that capacity is low. In this case, rational consumers limit their processing effort by concentrating probability on the highest feasible value(s) of consumption. To see why, recall that consumers are risk averse (log-utility). They process the necessary information to learn where the boundary $c \le w$ is and avoid infeasible consumption bundles.[11] Since the Shannon capacity places a tight restriction on information-processing, this individual consumes roughly the same amount each period, independently of his level of wealth.

This case describes situations in which people have a vague idea of their wealth and prefer default savings/spending options (whether it is a pension plan or health insurance) rather than figuring out the exact size of their net worth. Figure 2 displays the resulting optimal policy.

[Figure 2: Joint pdf p(c,w), low capacity. Panel title: log utility, $\kappa = 0.2$. Axes: c, w, p(c,w).]

Finally, Figure 3 displays the optimal joint distribution for an intermediate case, $0 < \kappa < \infty$. The first observation is that a person with a finite information flow tries to make $p(c \mid w)$ as close to $w$ as the information constraint allows him to.

[Figure 3: Joint distribution p(c,w), intermediate capacity. Panel title: log utility, $\kappa = 2.08$.]

The second observation is that the optimal policy function for the information-constrained consumer places low weight, even no weight, on low values of consumption for high values of wealth. The reason why this happens depends on the utility function. A consumer with log-utility wants to maintain a consumption profile that is fairly smooth throughout the lifetime, as can be seen from (13). To avoid values of consumption that are either too low or too high, he needs to be well informed about such events to reduce the probability of their occurrence. The resulting optimal policy places a higher probability mass on the central values of consumption and wealth.

To see how the allocation of probability changes with the utility function, consider a consumer that differs from the previous one only in the utility specification, which now assumes a CRRA form, $u(c) = c^{1-\gamma}/(1-\gamma)$ with $\gamma = 2$. As in the previous case, the optimal policy function still places a close-to-zero probability on low values of consumption for high values of wealth, but now the CRRA consumer trades off probabilities about modest values of consumption and wealth so that he can have high probability mass on high values of consumption when wealth is high.

[8] To be more specific, I solve the model with a CRRA consumer assuming the same parameters as the baseline model, $(\beta, R, \bar{y}) \equiv (0.9881, 1.012, 1.1)$, the same simplex point (prior) $g(\tilde{w})$, and adjusting the shadow cost of processing capacity, $\theta$, to get roughly the same information capacity ($\kappa_{\log} = 2.08$ and $\kappa_{crra} = 2.13$). This implies that the differences in the allocation of probabilities within the grid are attributable solely to the coefficient of risk aversion $\gamma$. As I will explain in more detail in the solution methodology, the same shadow cost ($\theta$) delivers different information flows ($\kappa$) according to the degree of risk aversion, with more risk averse agents having a higher $\kappa$ for a given $\theta$ than less risk averse ones. To get $\kappa_{\log} \approx \kappa_{crra}$, I set $\theta_{\log} = 0.02$ in Figure 3 and $\theta_{crra} = 0.08$ in Figure 4.

[9] Or, in the wording of my model, when information flows at an infinite rate, $\kappa \to \infty$ in (9).

[10] More formally, for $\mathcal{I}(p(\cdot_w;\cdot_c)) \to \infty$, the probabilities $g(w)$ and $p(\cdot_w;\cdot_c)$ are degenerate. Using Fano's inequality (Thomas and Cover 1991), $c(\mathcal{I}(p(\cdot_w;\cdot_c))) = c(w)$, which makes the first order conditions for this case the full information solutions.

[11] The model (8)-(12) assumes a standard No-Ponzi condition.
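To make the role of capacity in Figures 1 to 3 concrete, the following sketch computes the discrete mutual information, in bits, of two hypothetical joint distributions on a 20x20 grid: one in which consumption reveals wealth exactly and one in which consumption carries no information about wealth. The function name, the uniform prior and the two example matrices are illustrative assumptions, not the solved policies behind the figures; the base-2 logarithm is assumed so that the result reads in bits.

```python
import numpy as np

def mutual_information_bits(joint):
    """Mutual information between C (rows) and W (columns), in bits."""
    joint = np.asarray(joint, dtype=float)
    p_c = joint.sum(axis=1, keepdims=True)   # marginal of consumption
    g_w = joint.sum(axis=0, keepdims=True)   # marginal of wealth
    indep = p_c * g_w                        # product of the marginals
    nz = joint > 0                           # convention: 0 * log(0) = 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))

n = 20
g = np.full(n, 1.0 / n)                      # assumed uniform prior on wealth

# Hypothetical extreme 1: each wealth level maps to its own consumption level.
one_to_one = np.diag(g)

# Hypothetical extreme 2: consumption is drawn independently of wealth.
independent = np.outer(np.full(n, 1.0 / n), g)

print(mutual_information_bits(one_to_one))   # equals log2(20)
print(mutual_information_bits(independent))  # zero up to rounding
```

On a 20x20 grid the information flow cannot exceed $\log_2 20$, roughly 4.32 bits, so the values $\kappa = 2.08$ and $\kappa = 2.13$ used for Figures 3 and 4 lie between these two extremes.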

[Figure 4: Joint distribution p(c,w), CRRA utility. Panel title: CRRA, $\gamma = 2$, $\kappa = 2.13$.]

In other words, with CRRA preferences, individuals want to be better informed about low and middle values of wealth to enjoy high consumption in every period. Figure 4 illustrates this case.

3.2.2 Shannon's channel through the economic literature

The goal of this section is to compare my model with the literature on rational inattention. The first comparison is with the consumption-saving model in the linear quadratic Gaussian (LQG) case.[12] Sims (2003) fully characterizes the analytical solution of a consumption-saving model where utility is quadratic, $u(c) = c - 0.5\,c^2$, constraints are linear and the ex-ante optimal shape of uncertainty is Gaussian. In this LQG setting, the optimal distribution of ex-post uncertainty is also Gaussian. The Gaussian solution makes a model with rational inattention in the LQG case observationally equivalent to a signal extraction problem à la Lucas.

Note that the analytical solution in Sims (2003) cannot be recovered if one assumes a restriction on the support of either $c$ or $w$ (e.g., the conventional $c > 0$) or a no-borrowing constraint (e.g., $c_t < w_t$ $\forall t$). This is because both constraints break the LQ framework, which is necessary to obtain Gaussianity in the optimal ex-post uncertainty.

[12] Cf. Sims (1998, 2003), Luo (2008), Lewis (2007).

The second issue with the LQG approach is that the linear quadratic approximation gives valid predictions when uncertainty is small. This is similar to the argument for linearizing the first order condition of a problem and getting a good local approximation. However, if one wants to explain an observed consumption and savings time series through limited processing constraints, the inertial behavior that we see in the data suggests that uncertainty is fairly big. Thus, the tractability of the LQG framework comes at the expense of effectiveness in matching the data.

The third issue, which is the most important for the purpose of this paper, is that rational inattention LQG models cannot explain the different speeds and sizes of people's reactions to different news about their wealth. For instance, consumption drops faster following a sudden layoff than in the event of a tax break. Moreover, the magnitude of the change in consumption depends on people's attitude towards risk[13] and their income level.[14] The certainty equivalence framework that arises with Gaussian ex-ante uncertainty and quadratic utility does not allow for endogenous differentiation amongst these events. In such a setting, the speed and amount of households' reactions to different news are created by sources of inertia exogenous to the model. This has been one of the criticisms of signal extraction models à la Lucas and applies also to rational inattention LQG.[15] For instance, different reactions are generated by assuming that people have immediate access to some signals and not others, as in Lucas (1973), or that they receive independent information about different news, as in Maćkowiak and Wiederholt (2008).

In this paper, I choose another approach. I assume that information is freely available and I do not constrain ex-ante uncertainty to be Gaussian. Moreover, I explore the link between risk aversion and information-processing limits by allowing utility specifications of the CRRA family.
Before this paper, Sims (2006) solves a two-period model with non-Gaussian ex-ante uncertainty and CRRA preferences. Sims (2006) assumes that agents live two periods: in the first they are inattentive, while in the second their uncertainty is resolved. This paper focuses on a fully dynamic rational inattention model. I depart from the work of Sims (2006) in two main dimensions.

The first is conceptual. A fully dynamic model with rational inattention allows the researcher to investigate time series properties of consumption and savings. The resulting behavior reveals endogenous noise and delays of consumption in response to shocks to income, with negative income shocks producing faster reactions as risk aversion increases. The intuition for this result is that risk averse individuals react to signals that indicate a reduction in wealth by immediately decreasing their consumption for precautionary motives while collecting information over time about the size of their net worth. Complementary to these findings, the richer dynamics make the model suitable to address policy questions such as the reaction to fiscal policy stimulus, as I will show in the last section.

This paper is also distinct from that of Lewis (2008). The most prominent differences are that, in Lewis (2008), households do not see consumption over time and they optimize over a finite horizon. Not observing consumption in turn implies that once the stream of probabilities is chosen at the beginning of the period, the update of the beliefs is deterministic in the choice of the signal. While the framework of Lewis (2008) does deliver upward-sloping age profiles as average consumption over a fixed time length, it does not allow one to study unconditional moments of consumption or the conditional response of consumption to shocks, as my framework does.

The second contribution is methodological. A fully dynamic rational inattention model involves facing an infinite dimensional problem as displayed in (8)-(12). To work with this framework, I developed analytical and computational tools that are suitable to address the dynamics of a non-LQG model.

Moreover, my results are observationally distinct from the previous literature on sticky information (Mankiw and Reis (2002)) and on consumption and information (Reis (2006)). Mankiw and Reis (2002) assume that every period an exogenous fraction of agents (firms) obtain perfect information concerning all current and past disturbances, while all other firms set prices based on old information. Reis (2006) shows that a model with a fixed cost of obtaining perfect information can provide a microfoundation for this kind of slow diffusion of information. My model differs from the literature on inattentiveness in that I assume that information is freely available in each period, but the bounds on information processing given by the Shannon channel force consumers to choose the scope of their information within the limit of their capacity. The interaction of information flow and risk aversion in my model delivers endogenous asymmetry in the response of consumption to shocks, both in terms of speed and amount. This prediction constitutes a distinguishing feature of my model with respect to the literature on inattentiveness and, more generally, to the consumption-saving literature.

[13] Cf., e.g., Gourinchas and Parker (2001).

[14] Cf., e.g., Johnson, Souleles and Parker (2006).

[15] For a discussion of the Gaussian assumption in rational inattention models, see Lewis (2007).
4 Solution Methodology

4.1 Discretizing the Framework

I consider wealth and consumption as defined on compact sets. In particular, admissible consumption profiles belong to $\Omega_c \equiv \{c_{\min}, \dots, c_{\max}\}$. Likewise, wealth has support $\Omega_w \equiv \{w_{\min}, \dots, w_{\max}\}$. I identify by $j$ the elements of set $\Omega_c$ and by $i$ the elements in $\Omega_w$. I approximate the state of the problem, i.e., the distribution of wealth, by using the simplex:

Definition. The set $\Pi$ of all mappings $g : \Omega_w \to \mathbb{R}$ fulfilling $g(w) \ge 0$ for all $w \in \Omega_w$ and $\sum_{w \in \Omega_w} g(w) = 1$ is called a simplex. Elements $w$ of $\Omega_w$ are called vertices of the simplex $\Pi$; functions $g$ are called points of $\Pi$.

Let $|S|$ be the dimension of the belief simplex which approximates the distribution $g(w)$ and let

\[ \Gamma \equiv \left\{ g \in \mathbb{R}^{|S|} : g(i) \ge 0 \text{ for all } i,\; \sum_{i=1}^{|S|} g(i) = 1 \right\} \]

denote the set of all probability distributions on $\Pi$. The initial condition for the problem is $g(w_0)$.

The consumer enters each period choosing the joint distribution of consumption and financial possibilities. From the previous section, the control variable for the discretized set-up is the probability mass function $\Pr(w,c)$, where $c \in \Omega_c$ and $w \in \Omega_w$, constrained to belong to the set of distributions. Given $g(w_0)$ and $\Pr(c_t, w_t)$ and the observation of $c_t$ consumed in period $t$, the belief state is updated using Bayesian conditioning:

\[ g'_{c_t}(w_{t+1}) = \sum_{w_t \in \Omega_w} T(w_{t+1}; w_t, c_t)\,\Pr(w_t \mid c_t) \tag{14} \]

where $T(\cdot)$ is a discrete counterpart of the transition function $\tilde{T}(\cdot)$. Note that $\tilde{T}(\cdot)$ is a density function on the real line while $T(\cdot)$ is a density function on a discrete set with counting measure. The processing constraint, in terms of the discrete mutual information between state and actions, is:

\[ \mathcal{I}_t\big(p(\cdot_{c_t};\cdot_{w_t})\big) = \sum_{w_t \in \Omega_w} \sum_{c_t \in \Omega_c} \Pr(c_t, w_t)\,\log\left(\frac{\Pr(c_t, w_t)}{p(c_t)\,g(w_t)}\right) \tag{15} \]

The interpretation of (15) is akin to its continuous counterpart. The capacity of the agents to process information is constrained by a number, $\bar{\kappa}$, which denotes the upper bound on the rate of information flow between the random variables $C$ and $W$[16] in time $t$. Finally, the objective function (8) in the discrete world amounts to

\[ \max_{\{p(w_t, c_t)\}_{t=0}^{\infty}} E_0\left\{ \sum_{t=0}^{\infty} \beta^t \left[ \sum_{w_t \in \Omega_w} \sum_{c_t \in \Omega_c} \left(\frac{c_t^{1-\gamma}}{1-\gamma}\right) \Pr(c_t, w_t) \right] \,\Bigg|\, \mathcal{I}_0 \right\}. \tag{16} \]

4.2 Recursive Formulation

The purpose of this section is to show that the discrete dynamic programming problem has a solution and to recast it into a Bellman recursion. To show that a solution exists, first note that the set of constraints for the problem is a compact-valued concave correspondence. Second, I need to show that the state space is compact.
Compactness comes from the curvature of the utility function and the fact that the belief space has a bounded support in $[0,1]$. The compact domain of the state and the fact that Bayesian conditioning for the update preserves the Markovianity of the belief state ensure that the transition $Q : (\Omega_w \times Y \times \mathcal{B}) \to [0,1]$ and (14) have the Feller property. Then, the conditions for applying the Theorem of the Maximum are fulfilled, which guarantees the existence of a solution. In the next section, I provide sufficient conditions to guarantee uniqueness.

Casting the problem of the consumer in a recursive Bellman equation formulation, the full discrete-time Markov program amounts to:

\[ V(g(w_t)) = \max_{\Pr(c_t, w_t)} \left[ \sum_{w_t \in \Omega_w} \left( \sum_{c_t \in \Omega_c} u(c_t)\,\Pr(c_t, w_t) \right) + \beta \sum_{w_t \in \Omega_w} \sum_{c_t \in \Omega_c} V\big(g'_{c_t}(w_{t+1})\big)\,\Pr(c_t, w_t) \right] \tag{17} \]

subject to:

\[ (\theta:) \quad \kappa_t = \mathcal{I}_t\big(p(\cdot_{c_t};\cdot_{w_t})\big) = \sum_{w_t \in \Omega_w} \sum_{c_t \in \Omega_c} \Pr(c_t, w_t)\,\log\left(\frac{\Pr(c_t, w_t)}{p(c_t)\,g(w_t)}\right) \tag{18} \]

\[ g'_{c_t}(w_{t+1}) = \sum_{w_t \in \Omega_w} T(w_{t+1}; w_t, c_t)\,\Pr(w_t \mid c_t) \tag{19} \]

\[ \sum_{c_t \in \Omega_c} \Pr(c_t, w_t) = g(w_t) \tag{20} \]

\[ 1 \ge \Pr(c_t, w_t) \ge 0 \quad \forall (c_t, w_t) \in B,\; \forall t \tag{21} \]

where $B \equiv \{(c_t, w_t) : w_t \ge c_t,\; \forall c_t \in \Omega_c,\; \forall w_t \in \Omega_w,\; \forall t\}$ and $\theta$ is the Lagrange multiplier (shadow cost) associated to (18).

The Bellman equation in (17) takes as its argument the marginal distribution of wealth $g(w_t)$ and uses as the control variable the joint distribution of wealth and consumption, $\Pr(c_t, w_t)$. The latter links the behavior of the agent with respect to consumption ($c$), on one hand, and wealth ($w$) on the other, hence specifying the actions over time. The first term on the right hand side of (17) is the utility function $u(\cdot)$. The second term, $\sum_{w_t \in \Omega_w} \sum_{c_t \in \Omega_c} V\big(g'_{c_t}(w_{t+1})\big)\,\Pr(c_t, w_t)$, represents the expected continuation value of being in state $g'(\cdot)$, discounted by the factor $\beta = 1/R = 0.9881$. This corresponds to an interest rate $R = 1.012$, which gives an annualized gross real rate of investment $R^4 = 1.0489$, with a quarterly frequency of the data. The expectation is taken with respect to the endogenously chosen distribution $\Pr(c_t, w_t)$. I have discussed the relations in (18)-(21) earlier.

[16] Recall from the argument in Section 2.1 that both $W$ and $C$ are random variables before the household has acquired and processed any information.
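The belief update in (19) and the consistency requirement in (20) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the joint distribution is stored as a matrix with rows indexing consumption and columns indexing wealth, and the array `trans` is a hypothetical stand-in for $T(\cdot)$, whose exact form depends on the budget constraint of the model. The function names are not taken from the paper.

```python
import numpy as np

def posterior_wealth(joint, j):
    """Pr(w_t | c_t = c_j), where joint[j, i] = Pr(c_j, w_i)."""
    p_c = joint[j].sum()                     # marginal probability of c_j
    if p_c <= 0:
        raise ValueError("consumption level j has zero probability")
    return joint[j] / p_c

def update_belief(joint, trans, j):
    """Belief over next-period wealth after observing c_j, as in (19).

    trans[k, i, j] is an assumed transition array: the probability of
    moving to wealth w_k given current wealth w_i and consumption c_j.
    """
    return trans[:, :, j] @ posterior_wealth(joint, j)

def prior_from_joint(joint):
    """Column sums of the joint, which (20) requires to equal g(w_t)."""
    return joint.sum(axis=0)
```

With a transition array whose columns sum to one, the updated belief is again a point of the simplex, which is what allows the recursion in (17) to take $g$ as its only state.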
In addition, I appended equation (20), which constrains the choice of the distribution to be consistent with the initial prior $g(w_t)$.

Next, I analyze the main properties of the Bellman recursion (17), derive conditions under which it is a contraction mapping, and show that the mapping is isotone.

4.3 Properties of the Bellman Recursion

To prove that the value recursion is a contraction and an isotonic mapping, I shall introduce the relevant definitions. Let me restrict attention to choices of probability distributions that satisfy the constraints (18)-(21). To make the notation more compact, let $p \equiv \Pr(c_j \mid w_i)$, $\forall c_j \in \Omega_c$, $\forall w_i \in \Omega_w$, and let $\Gamma$ be the set that contains (18)-(21). I introduce the following definitions:

D1. A control probability distribution $p \equiv \Pr(c_j, w_i)$ is feasible for the problem (17)-(21) if $p \in \Gamma$.

Let $|W|$ be the cardinality of $\Omega_w$ and let

\[ \mathcal{G} \equiv \left\{ g \in \mathbb{R}^{|W|} : g(w_i) \ge 0\; \forall i,\; \sum_{i=1}^{|W|} g(w_i) = 1 \right\} \]

denote the set of all probability distributions on $\Omega_w$. An optimal policy has a value function that satisfies the Bellman optimality equation in (17):

\[ V^*(g) = \max_{p \in \Gamma} \left[ \sum_{w \in \Omega_w} \left( \sum_{c \in \Omega_c} u(c)\,p(c \mid w) \right) g(w) + \beta \sum_{w \in \Omega_w} \sum_{c \in \Omega_c} V^*\big(g'_c(\cdot)\big)\,p(c \mid w)\,g(w) \right] \tag{22} \]

The Bellman optimality equation can be expressed in value function mapping form. Let $\mathcal{V}$ be the set of all bounded real-valued functions $V$ on $\mathcal{G}$ and let $h : \mathcal{G} \times (\Omega_w \times \Omega_c) \times \mathcal{V} \to \mathbb{R}$ be defined as follows:

\[ h(g, p, V) = \sum_{w \in \Omega_w} \left( \sum_{c \in \Omega_c} u(c)\,p(c \mid w) \right) g(w) + \beta \sum_{w \in \Omega_w} \sum_{c \in \Omega_c} V\big(g'_c(\cdot)\big)\,p(c \mid w)\,g(w). \]

Define the value function mapping $H : \mathcal{V} \to \mathcal{V}$ as $(HV)(g) = \max_{p \in \Gamma} h(g, p, V)$.

D2. A value function $V$ dominates another value function $U$ if $V(g) \ge U(g)$ for all $g \in \mathcal{G}$.

D3. A mapping $H$ is isotone if $V, U \in \mathcal{V}$ and $V \ge U$ imply $HV \ge HU$.

D4. The supremum norm of the difference of two value functions $V, U \in \mathcal{V}$ over $\mathcal{G}$ is defined as

\[ \|V - U\| = \max_{g \in \mathcal{G}} |V(g) - U(g)|. \]

D5. A mapping $H$ is a contraction under the supremum norm if, for all $V, U \in \mathcal{V}$,

\[ \|HV - HU\| \le \beta\,\|V - U\| \]

holds for some $0 \le \beta < 1$.

Endowed with these notions, it is possible to derive some properties of the solution to the Bellman equation. First, note that the uniqueness of the solution to which the value function converges requires concavity of the constraints and convexity of the objective function. It is immediate to see that all the constraints but (18) are actually linear in $p(c,w)$ and $g(w)$.
For (18), the concavity in $p(c,w)$ is guaranteed by Theorem (16.1.6) of Thomas and Cover (1991). The concavity in $g(w)$ is the result of the following:

Lemma 1. For a given $p(c \mid w)$, the expression (18) is concave in $g(w)$.

Proof. See Appendix B.

Next, I need to prove the convexity of the value function and the fact that the value iteration is a contraction mapping. All the proofs are in Appendix A.

Proposition 1. For the discrete Rational Inattention Consumption Saving value recursion $H$ and two given functions $V$ and $U$, it holds that

\[ \|HV - HU\| \le \beta\,\|V - U\|, \]

with $0 \le \beta < 1$ and $\|\cdot\|$ the supremum norm. That is, the value recursion $H$ is a contraction mapping.

Proposition 1 can be explained as follows. The space of bounded value functions is a vector space that is complete under the supremum norm, so the two together form a Banach space. Since $H$ is a contraction on this space, the Banach fixed-point theorem ensures (a) the existence of a single fixed point and (b) that the value recursion always converges to this fixed point (see Theorem 6 of Alvarez and Stokey, 1998 and Theorem 6.2.3 of Puterman, 1994).

Corollary. For the discrete Rational Inattention Consumption Saving value recursion $H$ and two given functions $V$ and $U$, it holds that

\[ V \le U \implies HV \le HU, \]

that is, the value recursion $H$ is an isotonic mapping.

The isotonic property of the value recursion ensures that the value iteration converges monotonically. These theoretical results establish that in principle there is no barrier to defining value iteration algorithms for the Bellman recursion of the discrete rational inattention consumption-savings model.

5 Numerical Technique and its Predictions

I solve the model by transforming the underlying partially observable Markov decision process into an equivalent, fully observable Markov decision process with a state space that consists of all probability distributions over the core[17] state of the model (wealth). For a model with $n$ core states, $w_1, \dots, w_n$, the transformed state space is the $(n-1)$-dimensional simplex, or belief simplex.
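As an illustration of Proposition 1 and its corollary, the sketch below checks the contraction and isotone properties numerically on a toy, fully observed Markov decision problem with random rewards and transitions. The toy problem, the function names and the array layout are assumptions made for illustration only; it is not the rational inattention model, whose state is a belief and whose control is a joint distribution.

```python
import numpy as np

def bellman_operator(V, reward, trans, beta):
    """One application of (HV)(s) = max_a [ r(s,a) + beta * E V(s') ].

    reward[s, a] is the one-period payoff; trans[a, s, t] is the
    probability of moving from state s to state t under action a.
    """
    continuation = np.einsum("ast,t->sa", trans, V)
    return np.max(reward + beta * continuation, axis=1)

def make_toy_problem(n_states=6, n_actions=4, seed=0):
    """Random rewards and row-stochastic transitions (purely illustrative)."""
    rng = np.random.default_rng(seed)
    reward = rng.normal(size=(n_states, n_actions))
    trans = rng.random((n_actions, n_states, n_states))
    trans /= trans.sum(axis=2, keepdims=True)
    return reward, trans

def sup_norm(x):
    return float(np.max(np.abs(x)))

beta = 0.9881
reward, trans = make_toy_problem()
rng = np.random.default_rng(1)
V, U = rng.normal(size=6), rng.normal(size=6)

# Contraction: the distance between HV and HU shrinks by at least beta.
lhs = sup_norm(bellman_operator(V, reward, trans, beta)
               - bellman_operator(U, reward, trans, beta))
print(lhs <= beta * sup_norm(V - U))
```

The same operator can be applied to a pair of functions ordered pointwise to check the isotone property in the corollary.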
Expressed in plain terms, a belief simplex is a point, a line segment, a triangle or a tetrahedron in a single, two, three or four-dimensional space, respectively. Formally, a belief simplex is defined as the convex hull[18] of belief states from an affinely independent[19] set $B$. The points of $B$ are the vertices of the belief simplex. The convex hull formed by any subset of $B$ is a face of the belief simplex.

To address the issue of dimensionality in the state space of my model, I use a grid-based approximation approach. The idea of a grid-based approach is to use a finite grid to discretize the uncountably infinite continuous state space. The implementation has the following steps: I place a finite grid over the simplex, I compute the values for points in the grid, and I use a kernel regression to interpolate solution points that fall outside the grid.

5.1 Belief Simplex and Dynamic Programming

If full information were available, the previous history of the process would be irrelevant to the problem. However, because the consumer cannot completely observe wealth, he may require all the past information about the system to behave optimally. The most general approach is to keep track of the entire history of his previous consumption purchases up to time $t$, denoted $H_t = \{g_0, c_1, \dots, c_{t-1}\}$. For any given initial state probability distribution $g_0$, the number of possible histories is $(|\mathcal{C}|)^t$, with $\mathcal{C}$ denoting the set of consumption behavior up to time $t$. This number goes to infinity as the decision horizon approaches infinity, which makes this method of representing history useless for infinite-horizon problems.

To overcome this issue, Astrom (1965) proposed an information state approach. It is based upon the idea that all the information needed to act optimally can be summarized by a vector of probabilities over the system, the belief state. Let $g(w)$ denote the probability that the wealth is in state $w \in \Omega_w$, where $\Omega_w$ is assumed to be a finite set. Probability distributions such as $g(w)$ that are defined on finite sets are in fact points of a simplex. Let $n$ be the number of possible values that $w$ can assume.

[17] The state of the model is a probability distribution of wealth, i.e., $g(w)$. For lack of a better alternative, I call core state the random variable $w$ whose distribution is the state of the model. This nomenclature is borrowed from the information theory and AI literature; cf. Puterman (1994).

[18] A convex hull of a set of points is defined as the closure of the set under convex combination.
The discretization of the core state is an equi-spaced grid with $n = 20$ values of $w$ ranging from 1 to 10. The points in the simplex $\Delta$ are $n$ distinct values for the marginal pdf $g(w)$ in the interval $I \equiv [0,1]$. The simplex is constructed using uniform random samples from the unit simplex. The reason why I use this methodology is that it is computationally faster than a non-uniform grid and it is able to handle higher dimensional spaces.[20] In my model, each point in the simplex is an $n$-array whose column contains $m$ random values in the $[0,1]$ range and whose sum per row is 1. To span the simplex I use $m = (n-1)!$.[21] The distribution of values within the simplex is uniform in the sense that it has the conditional probability of a uniform distribution over the whole $m$-cube, given that the sum per row is 1. The algorithm calls three types of random processes that determine the placement of random points in the $(n-1)$-dimensional simplex. The first process considers values uniformly within each simplex. The second random process selects samples of different types of simplex in proportion to their volume. Finally, the third process implements a random permutation in order to have an even distribution of simplex choices among types.

For each simplex point, I initialize the corresponding joint distribution of consumption $c$ and wealth $w$. I assume $n = 20$ equi-spaced values for $c$ ranging in $\Omega_c \equiv [0.8, 3]$. The values in $\Omega_c$ are chosen so that $w$ is about 3 times $c$, roughly consistent with individual data on consumption and wealth.

Let core states and behavior states be sorted in descending order. I impose the constraint $c < w$.[22] Then, given the symmetry in the dimensionality of $\Omega_c$ and $\Omega_w$, the joint distribution of consumption and wealth for a given multidimensional grid point is a square matrix with rows corresponding to levels of consumption. Summing the matrix per row results in the marginal distribution of consumption, $p(c)$. Likewise, the columns of the matrix correspond to levels of wealth. Evaluating the sum per column of the matrix amounts to the marginal pdf of wealth, $g(w)$. Given the initial belief simplex, its successor belief states can be determined by Bayesian conditioning at each multidimensional point of the simplex, which gives the expression:

\[ g'_{c}(w') = \sum_{i} T(w'; w_i, c)\,\Pr(w_i \mid c) = \Pr(w' \mid c). \tag{23} \]

Let $\mathcal{V}$ be the set of all bounded real-valued functions $V$ on $\mathcal{G}$. Then, the Bellman optimality equation of the household is described by (17)-(21). Without loss of generality, I restrict the columns of the matrix $\Pr(c,w)$ to sum to the marginal pdf of wealth in the main diagonal. Moreover, because some of the values of the marginal $g(w)$ per simplex point are exactly zero given the definition of the envelope for the simplex, I constrain the choices of the joint distributions corresponding to those values to be zero.

[19] A set of belief states $\{g_i\}$, $1 \le i \le z$, is called affinely independent when the vectors $\{g_i - g_z\}$ are linearly independent for $1 \le i < z$.

[20] At least compared to the ndgrid library function in Matlab. This is because the algorithm creates the simplex directly, while when using ndgrid it is necessary to define a uniform grid over the whole $(n-1)$ space and then section the resulting grid so that each simplex point sums to one.

[21] With $n = 20$, the proposed sampling produces the same results for sample sizes of $m = (n-k)!$, for $k = 1, \dots, 5$. I have not tried cases with $k < 5$. When $k > 1$, even if the algorithm produces the same results, it takes longer to converge (about 3 minutes more per iteration).
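The construction described above can be sketched as follows. The sketch draws belief points uniformly from the unit simplex using a flat Dirichlet distribution, which is one standard way to obtain such draws and may differ from the three-step random procedure used in the paper. It then applies the feasibility constraint on grid indices (an assumed reading, chosen because it leaves $n(n+1)/2 = 210$ admissible cells when $n = 20$) and initializes a joint distribution whose columns sum to $g(w)$. All function names are hypothetical, and the small number of belief points is for illustration only.

```python
import numpy as np

def sample_belief_points(m, n, seed=0):
    """Draw m points uniformly from the (n-1)-dimensional unit simplex.

    A Dirichlet distribution with all parameters equal to one is the
    uniform distribution on the simplex.
    """
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(n), size=m)

def feasibility_mask(n):
    """Admissible (consumption row, wealth column) cells, by grid index.

    Assumption: row j is admissible for column i when j <= i.
    """
    return np.triu(np.ones((n, n), dtype=bool))

def initial_joint(g, mask):
    """Spread each column's mass g(w_i) evenly over its admissible rows."""
    weights = mask / mask.sum(axis=0, keepdims=True)
    return weights * g[None, :]

beliefs = sample_belief_points(m=5, n=20)
mask = feasibility_mask(20)
joint = initial_joint(beliefs[0], mask)

p_c = joint.sum(axis=1)   # marginal of consumption (row sums)
g_w = joint.sum(axis=0)   # marginal of wealth (column sums)
print(int(mask.sum()))    # number of admissible cells
```

Starting the optimizer from a joint distribution that already satisfies the column-sum restriction keeps every iterate consistent with the prior $g(w)$ by construction.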
This handling of the zeroes makes the parameter vector being optimized over have different lengths for different rows of the simplex. Hence the degrees of freedom in the choice of the control variables for simplex points vary from a minimum of 0 to a maximum of $n(n-1)/2$.[23] Once the belief simplex is set up, I initialize the joint probability distribution of consumption and wealth per belief point and solve the program of the household by backward induction, iterating on the value function $V(g(w))$. To keep the finer state space within what Matlab can handle, I interpolate the value function with the new values of (23) using a kernel regression of $V(\cdot)$ on $g'_{c}(w')$. I use an Epanechnikov kernel with smoothing parameter $h = 2.7$.[24] A kernel regression approximates the exact nonlinear value function in (17) with a piecewise linear function. The following propositions illustrate this point.

Proposition 2. If the utility is CRRA with a parameter of risk aversion $\gamma \in (0, +\infty)$ and if $\Pr(c_j, w_i)$ satisfies (18)-(21), then the optimal $n$-step value function $V_n(g)$ defined over $\mathcal{G}$ can be expressed as:

\[ V_n(g) = \max_{\{\alpha_n^i\}_i} \sum_{i} \alpha_n(w_i)\,g(w_i) \]

where the $\alpha$-vectors, $\alpha : \Omega_w \to \mathbb{R}$, are $|W|$-dimensional hyperplanes.

Intuitively, each $\alpha_n$-vector corresponds to a plan, and the action associated with a given $\alpha_n$-vector is the optimal action for planning horizon $n$ for all priors that have such a function as the maximizing one. With the above definition, the value function amounts to:

\[ V_n(g) = \max_{\{\alpha_n^i\}_i} \langle \alpha_n^i, g \rangle, \]

and thus the proposition holds.

Using the above proposition and the fact that the set of all consumption profiles $P \equiv \{c < w : p(c) > 0\}$ is discrete, it is possible to show directly the convexity of the value function. For fixed $\alpha_n^i$-vectors, the $\langle \alpha_n^i, g \rangle$ operator is linear in the belief space. Therefore, convexity follows from the fact that $V_n$ is defined as the maximum of a set of convex (linear) functions and is thus itself convex. The optimal value function $V^*$ is the limit for $n \to \infty$ and, because all the $V_n$ are convex functions, so is $V^*$.

Proposition 3. Assuming the CRRA utility function and the conditions of Proposition 1, let $V_0$ be an initial value function that is piecewise linear and convex. Then the $i$-th value function obtained after a finite number of update steps for a rational inattention consumption-saving problem is also finite, piecewise linear and convex (PCWL).

[22] The constraint $c < w$ makes economic sense since there is no borrowing in this economy. To encode this constraint without complicating the model, one may assume that $\kappa_t$ in (18) is the capacity left after the consumer has processed his spending limits. Note also that this constraint is computationally convenient, reducing the number of choice variables from $n^2 = 400$ to $n(n+1)/2 = 210$ per iteration.

[23] To illustrate this point, two examples in which the 0 degrees of freedom and the $n(n-1)/2$ degrees of freedom occur are as follows. Suppose for simplicity that $n = 3$. Then, if a simplex point has realization $g \equiv \{1, 0, 0\}$, the joint pdf of consumption and wealth turns out to be

\[ p(c,w) = \begin{bmatrix} 1 & 0 & 0 \\ & 0 & 0 \\ & & 0 \end{bmatrix}, \]

leaving zero degrees of freedom. If, instead, e.g., $g \equiv \{\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\}$, the consumer has to choose $3 \cdot 2 / 2 = 3$ points on the joint distribution, $\{p_1, p_2, p_3\}$, placed as:

\[ p(c,w) = \begin{bmatrix} \tfrac{1}{3} & p_1 & p_2 \\ & \tfrac{1}{3} & p_3 \\ & & \tfrac{1}{3} \end{bmatrix}. \]

[24] The Epanechnikov kernel is an optimal choice for smoothing because it minimizes the asymptotic mean integrated squared error (cf. Marron, J. S. and Nolan, D. (1988)). I use the algorithm proposed in Beresteanu, A. and C. F. Manski (2000) and experiment with smoothing parameter $h \in [0.3 : 0.3 : 4.2]$. For the characteristics of the problem and the optimization routine used (csminwel), for different specifications of utility functions and Lagrange multiplier $\theta$, the parameter $h = 2.7$ performs best in terms of computational time and convergence of the value function.
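Propositions 2 and 3 describe the value function as the upper envelope of finitely many hyperplanes. The sketch below evaluates such an envelope at belief points and checks the convexity inequality along a segment of the simplex. The $\alpha$-vectors here are random placeholders rather than solved plans, and the function names are hypothetical.

```python
import numpy as np

def pwlc_value(alphas, g):
    """V(g) = max_i <alpha_i, g>, with one alpha-vector per row of alphas."""
    return float(np.max(alphas @ g))

def convexity_gap(alphas, g1, g2, lam):
    """lam*V(g1) + (1-lam)*V(g2) - V(lam*g1 + (1-lam)*g2).

    The gap is non-negative for any upper envelope of linear functions.
    """
    mix = lam * g1 + (1.0 - lam) * g2
    chord = lam * pwlc_value(alphas, g1) + (1.0 - lam) * pwlc_value(alphas, g2)
    return chord - pwlc_value(alphas, mix)

rng = np.random.default_rng(0)
alphas = rng.normal(size=(4, 3))     # four placeholder plans, three wealth states
g1 = np.array([0.7, 0.2, 0.1])       # two belief points on the simplex
g2 = np.array([0.1, 0.3, 0.6])

gaps = [convexity_gap(alphas, g1, g2, lam) for lam in np.linspace(0.0, 1.0, 11)]
print(min(gaps) >= -1e-12)
```

Storing the value function as a set of $\alpha$-vectors means that evaluating it at a new belief costs one matrix-vector product and one maximum.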

To implement numerically the optimization of the value function at each point of the simplex, I use Sims' csminwel as a gradient-based search method and iterate on the value function up to convergence. The value iteration converges in about 202 iterations. Table 1 reports the benchmark parameter values and the grids. I simulate the model for $T = 80$ periods by drawing from the optimal policy function, $p^*(c,w)$, and generate the time series path of consumption, wealth and expected wealth. For each $t = 1, \dots, T$, I use the joint distribution $p_t^*(c,w)$ to evaluate the time path of information flow,

\[ \kappa_t^* \equiv \sum_{i} \sum_{j} p_t^*(c_j, w_i)\,\log\left(\frac{p_t^*(c_j, w_i)}{p_t^*(c_j)\,g_t^*(w_i)}\right). \]

Finally, I derive the impulse response functions for the economy by assuming temporary shocks to the mean of income, $\bar{y}$. A pseudocode that implements the procedure is in Appendix C.

Table 1: Benchmark values and discretization

Parameter                                            Value
Wealth space, W                                      [1 : 0.4737 : 10]
Consumption space, C                                 [0.8 : 0.1158 : 3]
Mean of income, $\bar{y}$                            1.1
Joint distribution per simplex point, p(c,w)         20 x 20
Marginal C                                           20 x 1
Marginal W                                           20 x 1
Coefficient of risk aversion, $\gamma$               1
Interest rate, R                                     1.012
Discount factor, $\beta$                             0.9881

6 Results

In this section, I investigate the dynamic interplay of information flow and degree of risk aversion. In particular, I study different specifications of the baseline model, changing degrees of risk aversion, $\gamma \in \{0.5, 1, 2, 5, 7\}$, and different Lagrange multipliers, $\theta \in [0.2, 4]$, representing the shadow costs of processing information in (18). Time paths for each individual are averages across simplex points. For the time series of the aggregate economy, I perform 10,000 Monte Carlo runs and simulate the model for each path for $T = 80$ periods. Then, I compute averages across runs and simplex points. Sample statistics are calculated after I compute these averages.
I choose this way of calculating averages to compare my model, tailored for individual behavior, to aggregate data. I divide the results into three parts: (1) interaction of information flow and risk aversion; (2) implications of the information constraint for lifetime consumption; and (3) consumption reaction to temporary income shocks.

Table 2

    θ = 0.2     CRRA γ = 7    CRRA γ = 5    Log Utility    CRRA γ = 0.5
    E(C)        1.14          1.09          1.08           1.02
    std(C)      0.08          0.09          0.11           0.14
    κ           2.03          1.99          1.87           1.72

    θ = 2       CRRA γ = 7    CRRA γ = 5    Log Utility    CRRA γ = 0.5
    E(C)        1.01          0.98          0.91           0.83
    std(C)      0.15          0.18          0.21           0.33
    κ           1.41          1.20          0.86           0.78

Result 1. Information flow and risk aversion. In the discrete rational inattention consumption-savings model, higher degrees of risk aversion result in a higher amount of information processed for a given processing cost. Moreover, for a given degree of risk aversion, as the information flow decreases, the volatility of consumption increases.

This finding is documented in Table 2 and in Figures 5-6. Figure 5a plots the difference between the mean of the time series of consumption for θ = 0 and for θ > 0. After deriving the time path of consumption as described above, I calculate the mean and standard deviation of the average of the time path and subtract from it the mean of the time path for the full-information equivalent (θ = 0).^25 Figure 5a shows how this difference changes as θ varies when utility is logarithmic. Figure 5b plots the corresponding difference in the standard deviation of consumption as a function of θ.

^25 For the parameters of the model, when θ = 0 the full-information solution c^f_t = β w_t + (1 − β) ȳ has mean E(c^f_t) = 1.124 and standard deviation std(c^f_t) = 0.0713.
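The information flow κ reported in Table 2 is the mutual information between consumption and wealth under the joint distribution p(c, w). A minimal sketch of that calculation follows, assuming the 20-point grids of Table 1 and base-2 logarithms (so that κ is in bits). The function name `information_flow` and the two example distributions are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

# Grids from Table 1: 20 equi-spaced points each.
W = np.linspace(1.0, 10.0, 20)   # wealth, step (10 - 1) / 19, about 0.4737
C = np.linspace(0.8, 3.0, 20)    # consumption, step (3 - 0.8) / 19, about 0.1158


def information_flow(p_cw, base=2.0):
    """Mutual information between consumption and wealth.

    p_cw[j, i] is the joint probability of (c_j, w_i); entries must be
    non-negative and sum to one. With base=2 the result is in bits
    (the choice of base is an assumption of this sketch).
    """
    p_cw = np.asarray(p_cw, dtype=float)
    p_c = p_cw.sum(axis=1, keepdims=True)   # marginal of consumption, p(c_j)
    g_w = p_cw.sum(axis=0, keepdims=True)   # marginal of wealth, g(w_i)
    mask = p_cw > 0                          # 0 * log 0 is treated as 0
    ratio = p_cw[mask] / (p_c @ g_w)[mask]   # p(c, w) / (p(c) g(w))
    return float(np.sum(p_cw[mask] * np.log(ratio)) / np.log(base))


# Hypothetical example distributions (not the paper's solution).
independent = np.full((20, 20), 1.0 / 400)   # consumption carries no news about wealth
revealing = np.eye(20) / 20                  # consumption pins down wealth exactly
```

The two examples bracket the possible values on this grid: the independent joint distribution has zero information flow, while the fully revealing one attains the maximum for a 20-point uniform marginal, log2(20) bits. The κ values in Table 2 lie between these extremes.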

[Figure 5a. Loss in consumption due to increasing processing effort, log utility: difference in mean consumption between θ = 0 and θ > 0, plotted against θ. Figure 5b. Difference in std. of consumption due to processing effort, log utility: difference in standard deviations between θ = 0 and θ > 0, plotted against θ.]

To understand this result, consider what happens in the full-information (θ = 0) case. With Rβ = 1, the agent smooths consumption regardless of his utility. To appreciate how preferences towards risk play out with processing limits (θ > 0), consider Figure 6c. It plots the optimal distribution of consumption for two individuals (γ → 1 and γ = 5) when information is very costly to process (θ = 3). In this case, a rational agent consumes a fixed amount every period within the limits of his net worth. This requires very few bits of information. In Figure 6c, note how a person with log utility puts probability mass mostly on the lower values of consumption, while a more risk averse agent sacrifices consumption smoothing to allocate some probability to higher values of consumption. Since θ is the same for both, the resulting effect is solely due to consumer preferences. Now consider Table 2 and Figures 6a-6b.

[Figure 6a. Marginal distribution of consumption, log utility, for θ = 0.2 and θ = 2. Figure 6b. Marginal distribution of consumption, CRRA γ = 5, for θ = 0.2 and θ = 2. Figure 6c. Marginal distribution of consumption, θ = 3, for log utility and γ = 5. Horizontal axis: consumption; vertical axis: probability.]

When θ = 2, people select how much information they want to process and which values of wealth to be better informed about according to their utility. Also in this case, the higher the degree of risk aversion, the higher the quest for information (κ). This is exactly what Table 2 shows. In the table, the higher the coefficient of risk aversion, γ, the higher the information collected by the agent, κ, and the higher the mean of consumption. The same story can be told in terms of probability distributions, as in Figures 6a-6c. For a given level of θ, a person with log utility would be better informed about extreme values of wealth to avoid such values. This knowledge makes it possible to assign high probability to the middle values of consumption, as his utility commands. By contrast, a consumer with CRRA, γ = 5, wants to avoid low values of consumption for high values of wealth. Processing information about these events decreases the likelihood of their occurrence and makes it possible to place high probability on high values of consumption. This mechanism makes consumption more persistent for people with a higher degree of risk aversion (cfr. Figures 6a-6c).

Processing capacity (κ) strengthens this effect. This is because a high information flow allows consumers to enjoy high and smooth consumption throughout their lifetime. If information flows at a very low rate, households update their knowledge slowly over time and wait to modify their behavior until they have sufficient knowledge of their financial possibilities. Inertial behavior of consumption due to low information flow induces sharp changes in consumption after the consumer accumulates information. This mechanism makes consumption more volatile for people with lower information flow.

Figure 5b plots the standard deviation of consumption for several values of θ. As pointed out, for a very high shadow cost of processing information, θ > 3, consumption does not vary over time. For 0 < θ < 3, the volatility of consumption increases with θ. This result makes sense. To see why, consider again the full-information version of the model. People's desire to smooth consumption as under full information is limited by the finite flow of information available. When deciding on the precision of their signals, risk averse people trade off lower volatility in consumption for better knowledge of low values of wealth.

The time series paths of consumption, wealth and information flow drawn from the optimal policy p*(c, w) confirm this result and offer further insights into the properties of the model.
[Figure 7a. Aggregate consumption over time, log utility, for θ = 0.2 and θ = 2. Figure 7b. Aggregate consumption over time, CRRA γ = 2, for θ = 0.2 and θ = 2.]

[Figure 7c. Aggregate consumption and information flow (κ) over time. Four panels: θ = 0.2 and θ = 2, each for log utility and CRRA γ = 2.]

Result 2. Time path of consumption and savings. Changes in consumption over time are infrequent and significant. Moreover:

1. Consumption is hump-shaped. It reaches its peak later for individuals that have low information flow. The effect is stronger as the degree of risk aversion increases.

2. Individuals with high information flow, by having sharper signals on their wealth, have savings behavior that closely follows their wealth. Furthermore, the lower the degree of risk aversion, the higher the fluctuations of savings per period.

3. Individuals with low information flow tend to consume a constant amount every period. They increase their consumption only if the information they process points them towards a significant increase in wealth. The higher the degree of risk aversion, the less volatile the time path of consumption for these types.

Figures 7-8 illustrate these points for aggregate and individual time series behavior, respectively. The simulations are derived by drawing the time path of consumption and wealth from p*(c, w), after the value iteration has converged. Figures 7a-7c plot the average across the Monte Carlo runs and simplex points (i.e., initial beliefs about wealth). Individual time series (Figures 8a-8b) are averages over initial beliefs. To have some interesting transitional dynamics, I begin the simulation with an initial condition for wealth far from the steady state.^26
^26 For the grid in the model, the steady state value of wealth is ≅ 5.65 and I initialize the simulation with w_0 = 3.
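The simulation step described above, drawing consumption and wealth from p*(c, w), can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: it holds a single joint distribution fixed (in the model the joint distribution is indexed by the simplex point, i.e. by beliefs), it assumes the law of motion w' = R(w − c) + y, and it clips wealth to the grid. The helper names `draw_consumption`, `nearest_index` and `simulate_path` are hypothetical.

```python
import numpy as np


def draw_consumption(p_cw, c_grid, w_index, rng):
    """Draw c from the conditional p(c | w_i) = p(c_j, w_i) / g(w_i).

    p_cw[j, i] is the joint probability of (c_j, w_i); w_index is the grid
    position of the true wealth level. Assumes g(w_i) > 0.
    """
    column = np.asarray(p_cw, dtype=float)[:, w_index]
    conditional = column / column.sum()      # normalise by the marginal g(w_i)
    return rng.choice(c_grid, p=conditional)


def nearest_index(grid, value):
    """Map a wealth level back onto the discretised wealth grid."""
    return int(np.argmin(np.abs(grid - value)))


def simulate_path(p_cw, c_grid, w_grid, w0, income, R, rng):
    """Simulate consumption and wealth for len(income) periods.

    Assumed law of motion (not stated in this section): w' = R * (w - c) + y,
    with next-period wealth clipped to the bounds of the grid.
    """
    w = float(w0)
    cons, wealth = [], []
    for y in income:
        i = nearest_index(w_grid, w)
        c = min(draw_consumption(p_cw, c_grid, i, rng), w)  # cannot consume more than wealth
        cons.append(c)
        wealth.append(w)
        w = float(np.clip(R * (w - c) + y, w_grid[0], w_grid[-1]))
    return np.array(cons), np.array(wealth)
```

A call such as `simulate_path(p, C, W, 3.0, income, 1.012, np.random.default_rng(0))` with an 80-element `income` array would mirror the T = 80 horizon and the initial condition w_0 = 3 of footnote 26; averaging many such paths gives the aggregate series.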

[Figure 8a. Individual consumption over time, for log utility and CRRA utility γ = 2, with θ = 0.2 and θ = 2. Figure 8b. Individual savings and wealth over time. Four panels: log utility and CRRA γ = 2, each for θ = 0.2 and θ = 2.]

To appreciate the results, consider what would happen with full information. In such a case, consumption smoothing (Rβ = 1) implies an immediate (T = 1) adjustment of consumption to its long-run optimal value and no transient behavior. Thus, in that case, from T = 2 onwards the simulations lead to a constant time path. Now consider Figures 7a-7c and 8a-8b. The hump in consumption comes from Result 1 and a simple intuition: information-constrained people are cautious (degree of risk aversion γ ≥ 1), consume little and collect information about wealth before they change consumption. For a fixed θ, the more risk averse they are (cfr. Figure 7a with log utility and Figure 7b with CRRA, γ = 2), the longer they wait before increasing their consumption. This inertial behavior in consumption leads to an increase in savings and, as a result, in wealth (cfr. Figures 8a-8b). Processed information keeps signaling the increase in wealth until households realize that they are wealthy enough to increase their consumption. Thus, the hump in consumption is the mirror image of the rise (until people know they are rich) and fall (once people know they are rich) in wealth.

Note that, depending on the history of income shocks, consumption can have more than one hump in its path. To see why, consider a high realization of income occurring after a hump in consumption. Over time, signals about wealth convey such information, consumers start saving, and history as well as humps repeat themselves. These effects are enhanced by the shadow cost of processing information, θ, with higher costs forcing long periods of inertia in consumption followed by sizeable changes. Note also the relationship between consumption and information flow (Figure 7c): risk averse agents would rather push consumption forward in periods in which they are processing information about wealth. Finally, note from Figures 7a-7b and 8a-8b how the peak in consumption occurs later for an individual with a higher degree of risk aversion and lower information flow. The rationale for this result is that more cautious people wait to be better informed about their wealth before modifying their consumption behavior. In particular, since a consumer with CRRA utility (γ = 2) chooses to be better informed about low values of wealth than a log utility consumer (cfr. Figures 7a and 7b), he processes news about high values of wealth more slowly than his log counterpart. The resulting additional savings for precautionary motives are triggered by both the curvature of the utility function and the bound on information processing.

The last result comes from studying how consumers with limited processing capacity react to temporary shocks to income (y). Before stating the result, it is worth comparing it to the predictions of the standard consumption-saving literature. With full information, the responses of consumption to negative and positive temporary income shocks are immediate: consumption adjusts in period T = 0 by an amount exactly equal to the discounted present value of the shock, |Δy|. This is the case regardless of whether the shock is adverse or favorable, so long as the absolute values of the shocks match. The same holds true under certainty equivalence in a framework with linear constraints and quadratic utility (LQ). With risk averse agents and information-processing limits, it happens that:

Result 3. Persistent stickiness and asymmetric response to shocks. Consumption's response to temporary fluctuations of wealth is asymmetric: negative shocks trigger a sharper reaction and higher persistence of consumption than positive ones.

The logic behind this result is easily understood by considering the interdependence of information flow and the coefficient of risk aversion. A risk averse person is more likely to be affected by negative events than by positive ones. As soon as he receives signals that his wealth is lower than he thought, he reacts by decreasing his consumption. The change in behavior and its persistence are more substantial the more risk averse and uninformed the consumer is.
This occurs because consumers wait to gather more information before changing their behavior and, in the meantime, build up a savings buffer. Thus, the temporary change in income propagates slowly over time. A positive temporary income shock triggers the opposite behavior in a risk averse, uninformed person. The intuition is that this type of consumer is concerned about negative wealth fluctuations and allocates most of his information capacity to preventing this event. A signal that indicates positive wealth may be ignored, generating extra savings in the meantime. Once this is acknowledged, a prudent consumer distributes the additional consumption driven by the income shock plus savings throughout his lifetime. This pattern of consumption behavior matches what we observe in macro data on consumption, documented in the literature as excess smoothness. Furthermore, the discrete rational inattention consumption-saving model provides a rationale for excess sensitivity in response to news on wealth.^27

^27 Excess sensitivity (Flavin, 1981) of consumption refers to the empirical evidence that aggregate consumption reacts with delays to anticipated changes in income, while excess smoothness (Deaton, 1987) refers to the observation that aggregate consumption is smoother than permanent income, in that it reacts less than one-to-one to shocks to permanent income.

[Figure 9a. IRF of consumption to a temporary increase in income. Figure 9b. IRF of consumption to a temporary decrease in income. Each figure has four panels: log utility and CRRA γ = 2, for θ = 0.2 and θ = 2. Horizontal axis: time.]

Since the model is non-linear, let me first explain how the impulse responses are generated and then focus on the intuition that the graphs suggest. I simulate the model drawing 10,000 times from the same optimal policy distribution under two scenarios. In the first, I draw from a distribution with a constant mean of the shock to income. In the second, I assume that the mean of the shock increases or decreases in the very first period (a one-time shock) and then reverts to its original value. Impulse responses of consumption are the difference between consumption under the two income paths, averaged over simplex points and 10,000 Monte Carlo draws of income.

The impulse response functions are plotted in Figures 9a-9b. They display the response to a positive (Figure 9a) and a negative (Figure 9b) shock to income, respectively. Note that for both log and CRRA γ = 2 utility, and for different values of the shadow cost (θ = 0.2 or θ = 2), the reaction to a negative shock (|Δy| = 1) starts from the very first period. However, the extent of the reaction varies across utilities and information costs.
When θ = 0.2, a log-utility consumer reacts on impact by increasing savings by less than the size of the shock. He then adjusts savings and consumption so as to distribute the adverse shock over time. The same log type but with θ = 1 decreases consumption on impact by more than his θ = 0.2 counterpart. He increases consumption slowly over time until it reaches its new long-run value. Likewise, a consumer with risk aversion γ = 2 varies his saving when the shock hits to an extent that depends on his information flow. In particular, note that for θ = 2 the decrease in consumption on impact and in the following periods is so significant that consumers can use the accumulated savings to restore their original consumption plan.

The endogenous asymmetric response to shocks, even in this very simple setting, makes rational inattention models observationally distinct from any other standard macroeconomic model. In those frameworks, either there is no asymmetric reaction (as in LQG) or the asymmetric response is due to the asymmetric magnitude of the shocks (as in models a la Lucas). These implications make the theory appealing from an empirical standpoint (e.g., think about consumers' reactions to a tax break vs. being fired from a job). Moreover, they make the theory suitable for studying the impact of policy changes on private-sector decisions. The 2008 Tax Rebate provides one such example.

6.1 Sensitivity analysis and policy implications

A feature of the model worth exploring is how consumption's reaction to shocks depends on the initial value of wealth. Drawing a time series from the probability distribution that solves the model, it is natural that the farther away wealth is from its steady state, the more consumption reacts to a shock to wealth. The interesting prediction of the model with an information-processing constraint is that, for either log or CRRA (γ > 1) utility, it does matter for the impulse response whether we start from a value of wealth above or below the steady state. In both cases, the reactions are faster in the case of a negative shock than a positive one. However, extent and timing differ, with wealthier people reacting faster and with a sharper decrease in consumption to a negative shock than poorer people do when facing the same kind of shock. This is due to the fact that poorer people already consume a small amount, so that when a negative shock hits, even if they immediately receive a signal of the news, they only gradually reduce their consumption. Savings slowly accumulate over time until the shock is absorbed. For a given processing capacity, wealthier people can afford to reduce their consumption as soon as they acknowledge the negative shock. The jump start in savings makes it possible for them to absorb the shock faster. By contrast, a positive shock has a stronger effect on poorer people than on wealthier ones. To see why, consider a tax rebate.
Taking two individuals with the same characteristics in terms of risk aversion and information-processing constraints but different initial net worth, the wealthier person takes longer to change his consumption behavior. When the change does occur, its magnitude is smaller than for a poorer person. The intuition for this result is that an increase in disposable income implies a more sizeable financial break for a less wealthy person than the same amount does for a wealthy person. Risk aversion prevents both types of consumers from immediately disposing of the additional credit, but it has a bigger effect on impact for the more constrained consumer.

[Figure 10. Impulse response function of consumption as a function of wealth, Δy = 0.02. Four panels: log utility and CRRA γ = 5, each for κ = 2.5 and κ = 0.88. Horizontal axis: quarters I-V; vertical axis: IRF of consumption. Solid: w_0 = 1.94; star dashed: w_0 = 3.3; dotted: w_0 = 5.2.]

Even in its simplicity, the model can be used to address important policy questions. In particular, it can be used to analyze the effectiveness of tax policy reforms on individual consumption and savings decisions. Figure 10 displays the impulse response function of consumption to a stimulus payment which increases income by 2% with respect to its (constant) long-run level. The discretized solutions are generated using equi-spaced grids of consumption and wealth, with 50 points each. Consumption takes values in [0.5, 3] while wealth ranges from 1 to 10. I use the same parameters (R = 1.012 and β = 1/R) as the baseline model, a simplex of size (50!) * (49), and two specifications of the utility function. In both cases I choose θ so that the capacities correspond to κ = 2.5 bits and κ ≅ 0.88 bits.^28 Once the value iteration has converged, I generate the impulse response function by simulating a time series path of consumption and wealth with 10,000 Monte Carlo runs for each initial condition on wealth. I consider three initial values of wealth as a proxy for populations with low, middle and high net worth. I then average the time series by quarter and across simplex points.

Figure 10 gives interesting insights into the effect of the stimulus on consumer spending. For the degrees of risk aversion and information capacity considered, the reaction to the stimulus is higher the lower the initial wealth. This is not surprising, as the stimulus payments have a bigger impact on the disposable income of credit-constrained consumers than on that of richer people. For a given amount of information capacity and wealth, the higher the risk aversion, the lower the spending in the first quarter. This result also makes sense. If a consumer is risk averse and has no credit frictions, he allocates more attention to processing information about low values of wealth. This leads to slower information processing and, in turn, a slower reaction to positive news about income (Result 3).
Finally, for a given wealth and degree of risk aversion, the lower the information-processing capacity, the lower the response of consumption spending to the rebate. The findings in Figure 10 can be summarized as:

Result 4. Economic stimulus and rational inattention. The impact of a one-time tax rebate on rationally inattentive consumers:

1. is stronger the lower the initial net worth.

2. is more delayed the higher the degree of risk aversion.

3. is more persistent but less effective the lower the information-processing capacity.

The insights one can gather from the model have strong policy implications for the effectiveness of tax reform on people's behavior. The 2008 tax rebate provides one such example. The model predicts that such a policy has a greater response on impact for individuals with low net worth. Figure 10 also suggests that the effect will be mild and spread out over several quarters for middle-to-high income households. These findings are consistent with the empirical evidence on consumer spending of the 2001 tax rebates (cfr. Johnson, Parker and Souleles (2006)).

^28 The constraint κ = 2.5 corresponds to θ_log = 0.01 and θ_crra = 0.05 for the log case and the CRRA, γ = 2 case respectively, while κ = 0.88 is given by θ_log = 0.1 and θ_crra = 0.9.
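The impulse responses used in this section are differences between two Monte Carlo averages: one under baseline income and one in which first-period income is shifted by the shock. The sketch below illustrates only that bookkeeping. It assumes a hypothetical timing convention (cash on hand x = a + y, savings carried forward at gross rate R) and uses a linear consumption rule as a stand-in for the optimal policy; the income distribution and all function names are likewise assumptions made for illustration.

```python
import numpy as np


def consumption_paths(policy, a0, income, R):
    """Consumption for each run and period; income has shape (runs, T).

    Assumed timing: cash on hand x = a + y, consume c = policy(x),
    carry a' = R * (x - c) into the next period.
    """
    runs, T = income.shape
    a = np.full(runs, float(a0))
    c = np.empty((runs, T))
    for t in range(T):
        x = a + income[:, t]
        c[:, t] = policy(x)
        a = R * (x - c[:, t])
    return c


def impulse_response(policy, a0, income, shock, R):
    """Average consumption gap between shocked and baseline income paths.

    The shock is added to first-period income only; both scenarios share
    the same income draws, so the gap isolates the effect of the shock.
    """
    shocked = income.copy()
    shocked[:, 0] += shock
    gap = consumption_paths(policy, a0, shocked, R) - consumption_paths(policy, a0, income, R)
    return gap.mean(axis=0)


# Hypothetical inputs for illustration only.
R = 1.012
beta = 1.0 / R
rng = np.random.default_rng(0)
income = rng.normal(1.1, 0.1, size=(10_000, 50))   # 10,000 Monte Carlo draws, 50 periods
linear_rule = lambda x: (1.0 - beta) * x           # stand-in rule, not the paper's policy
irf_up = impulse_response(linear_rule, 3.0, income, +0.02, R)
irf_down = impulse_response(linear_rule, 3.0, income, -0.02, R)
```

With the linear stand-in rule, `irf_up` and `irf_down` are exact mirror images, which is the symmetric benchmark the text attributes to full-information and LQ settings. Replacing `linear_rule` with draws from an information-constrained policy is what would produce the asymmetry described in Result 3.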

7 Conclusions

This paper applies rational inattention to a dynamic model of consumption and savings. Consumers rationally choose the nature of the signal they want to acquire subject to the limits of their information-processing capacity. The dynamic interaction of risk aversion and endogenous choice of information flow enhances precautionary savings. I showed that for a given degree of risk aversion, the lower the information flow, the flatter the consumption path. The model predicts that for a given information flow, the higher the degree of risk aversion, the more persistent consumption. Also, for a given degree of risk aversion, the lower the information flow, the more volatile consumption.

Furthermore, the model predicts that the consumption path has humps. Under information-processing constraints, a hump occurs when people consume little and save a lot while collecting information about wealth. When consumers realize that they are rich, they increase consumption and decumulate savings. This increase stops when they acknowledge that their wealth is low again: they start to save and process more information. Thus, consumption decreases. Consistent with the previous two results, I find that the peak in consumption is delayed the more risk averse the individual is. Differing from other life-cycle models, in my model there can be more than one hump in the consumption path. Depending on the history of the income shocks, a very low or very high realization of income affects consumers' signals through its effect on wealth. Consumers react to the news by varying savings and information over time, thereby generating another hump.

Finally, the model predicts that consumers with processing capacity constraints have asymmetric responses to shocks, with negative shocks producing more persistent effects than positive ones. This asymmetry, observed in actual data, is novel to the theoretical literature on consumption and savings.
Studying the reactions of rationally inattentive people to temporary income shocks can also be used to assess the effectiveness of policy reforms on consumption spending. The model predicts that, for a given level of wealth, the speed and magnitude of the consumption adjustment to the income shock depend on consumers' processing capacity. Moreover, consumers with low wealth react faster to temporary tax relief than wealthier people. The results agree with both intuition and preliminary data on consumer spending. They suggest that enriching the standard macroeconomic toolbox with rational inattention theory is a step worth taking.

References

[1] Allen, F., S. Morris and H. S. Shin (2006), Beauty Contests, Bubbles and Iterated Expectations in Asset Markets, Review of Financial Studies, forthcoming.

[2] Alvarez, F. and N. Stokey (1998), Dynamic Programming with Homogeneous Functions, Journal of Economic Theory, 82, pp. 167-189.

[3] Amato, J., S. Morris and H. S. Shin (2002), Communication and Monetary Policy, Oxford Review of Economic Policy, 18, pp. 495-503.

[4] Ameriks, J., A. Caplin and J. Leahy (2003b), The Absent-Minded Consumer, unpublished, New York University.

[5] Angeletos, G.-M. and A. Pavan (2004), Transparency of Information and Coordination in Economies with Investment Complementarities, American Economic Review, 94 (2).

[6] Aoki, K. (2003), On the Optimal Monetary Policy Response to Noisy Indicators, Journal of Monetary Economics, 50, pp. 501-523.

[7] Åström, K. (1965), Optimal Control of Markov Decision Processes with Incomplete State Estimation, Journal of Mathematical Analysis and Applications, 10, pp. 174-205.

[8] Broda, C. and J. Parker (2008), A Preliminary Analysis of How Household Spending Changed in Response to the Receipt of a 2008 Economic Stimulus Payment, mimeo.

[9] Caballero, R. J. (1990), Expenditure on Durable Goods: A Case for Slow Adjustment, The Quarterly Journal of Economics, 105 (3), pp. 727-743.

[10] Caballero, R. J. (1995), Near-Rationality, Heterogeneity and Aggregate Consumption, Journal of Money, Credit and Banking, 27 (1), pp. 29-48.

[11] Campbell, J. Y. (1987), Does Savings Anticipate Declining Labor Income? An Alternative Test of the Permanent Income Hypothesis, Econometrica, 55, pp. 1249-1273.

[12] Campbell, J. Y. and A. Deaton (1989), Why is Consumption so Smooth?, Review of Economic Studies, 56, pp. 357-374.

[13] Campbell, J. Y. and N. G. Mankiw (1989), Consumption, Income and Interest Rates: Reinterpreting the Time Series Evidence, NBER Macroeconomics Annual, 4, pp. 185-216.

[14] Campbell, J. Y. and N. G. Mankiw (1989), Permanent Income, Current Income and Consumption, Journal of Business and Economic Statistics, 8 (3), pp. 265-279.

[15] Carroll, C. D. (2003), Macroeconomic Expectations of Households and Professional Forecasters, Quarterly Journal of Economics, 118 (1), pp. 269-298.

[16] Cochrane, J. H. (1989), The Sensitivity of Tests of the Intertemporal Allocation of Consumption to Near-Rational Alternatives, American Economic Review, 90, pp. 319-337.

[17] Cover, T. M. and J. A. Thomas (1991), Elements of Information Theory, John Wiley & Sons, Inc.

[18] Deaton, A. (1987), Life-Cycle Models of Consumption: Is the Evidence Consistent with the Theory?, in Advances in Econometrics, Fifth World Congress, vol. 2, ed. Truman Bewley, Cambridge, Cambridge University Press.

[19] Deaton, A. (1992), Understanding Consumption, Oxford, Oxford University Press.

[20] Dynan, K. (2000), Habit Formation in Consumer Preferences: Evidence from Panel Data, American Economic Review, 90, pp. 391-406.

[21] Flavin, M. A. (1981), The Adjustment of Consumption to Changing Expectations about Future Income, Journal of Political Economy, 89, pp. 974-1009.

[22] Friedman, M. (1957), A Theory of the Consumption Function, Princeton, Princeton University Press.

[23] Goodfriend, M. (1992), Information-Aggregation Bias, American Economic Review, 82, pp. 508-519.

[24] Gourinchas, P.-O. and J. A. Parker (2002), Consumption over the Life-Cycle, Econometrica, 70 (1), pp. 47-89.

[25] Hall, R. (1978), Stochastic Implications of the Life Cycle-Permanent Income Hypothesis: Theory and Evidence, Journal of Political Economy, 86, pp. 971-987.

[26] Hellwig, C. (2004), Heterogeneous Information and the Benefits of Transparency, Discussion paper, UCLA.

[27] Hellwig, C. and L. Veldkamp (2006), Knowing What Others Know: Coordination Motives in Information Acquisition, NYU, mimeo.

[28] Johnson, D., J. Parker and N. Souleles (2006), Household Expenditure and the Income Tax Rebates of 2001, American Economic Review, 96 (5).

[29] Kim, J., S. Kim, E. Schaumburg and C. Sims (2005), Calculating and Using Second Order Accurate Solutions of Discrete Time Dynamic Equilibrium Models, mimeo.

[30] Lewis, K. (2007), The Life-Cycle Effects of Information-Processing Constraints, Working Paper, University of Iowa.

[31] Lewis, K. (2008), The Two-Period Rational Inattention Model: Accelerations and Analyses, Computational Economics, forthcoming. Currently available as Federal Reserve Finance and Economics Discussion Series Paper No. 2008-22, Board of Governors.

[32] Lorenzoni, G. (2006), Demand Shocks and Monetary Policy, MIT, mimeo.

[33] Lucas, R. E., Jr. (1973), Some International Evidence on Output-Inflation Tradeoffs, American Economic Review, 63 (3), pp. 326-334.

[34] Luo, Y. (2007), Consumption Dynamics, Asset Pricing, and Welfare Effects under Information Processing Constraints, forthcoming in Review of Dynamics.

[35] Lusardi, A. (1999), Information, Expectations, and Savings, in Behavioral Dimensions of Retirement Economics, ed. Henry Aaron, Brookings Institution Press/Russell Sage Foundation, New York, pp. 81-115.

[36] Lusardi, A. (2003), Planning and Savings for Retirement, Dartmouth College, unpublished.

[37] MacKay, D. J. C. (2003), Information Theory, Inference, and Learning Algorithms, Cambridge University Press.

[38] Maćkowiak, B. and M. Wiederholt (2007), Optimal Sticky Prices under Rational Inattention, Discussion paper, Northwestern/ECB, mimeo.

[39] Mankiw, N. G. and R. Reis (2002), Sticky Information versus Sticky Prices: A Proposal to Replace the New Keynesian Phillips Curve, Quarterly Journal of Economics, 117, pp. 1295-1328.

[40] Mankiw, N. G., R. Reis and J. Wolfers (2004), Disagreement in Inflation Expectations, NBER Macroeconomics Annual 2003, vol. 18, pp. 209-248.

[41] Mondria, J. (2006), Financial Contagion and Attention Allocation, Working paper, University of Toronto.

[42] Morris, S. and H. S. Shin (2002), The Social Value of Public Information, American Economic Review, 92, pp. 1521-1534.

[43] Moscarini, G. (2004), Limited Information Capacity as a Source of Inertia, Journal of Economic Dynamics and Control, pp. 2003-2035.

[44] Mullainathan, S. (2002), A Memory-Based Model of Bounded Rationality, Quarterly Journal of Economics, 117 (3), pp. 735-774.

[45] Orphanides, A. (2003), Monetary Policy Evaluation with Noisy Information, Journal of Monetary Economics, 50 (3), pp. 605-631.

[46] Parker, J. (1999), The Reaction of Household Consumption to Predictable Changes in Social Security Taxes, American Economic Review, 89 (4), pp. 959-973.

[47] Peng, L. (2005), Learning with Information Capacity Constraints, Journal of Financial and Quantitative Analysis, 40 (2), pp. 307-329.

[48] Peng, L., and W. Xiong (2005), Investor Attention, Overconfidence and Category Learning, Discussion paper, Princeton University.
[49] Pischke, Jörn-Steffen (1995), Individual Income, Incomplete Information, and Aggregate Consumption, Econometrica, 63(4), pp. 805–840.
[50] Puterman, M. L. (1994), Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons, Inc.
[51] Reis, R. A. (2006), Inattentive Consumers, Journal of Monetary Economics, 53(8), 1761–1800.
[52] Schmitt-Grohé, S., and M. Uribe (2004), Solving Dynamic General Equilibrium Models Using a Second-Order Approximation to the Policy Function, Journal of Economic Dynamics and Control, 28, pp. 755–75.
[53] Rotemberg, Julio J., and Michael Woodford (1999), The Cyclical Behavior of Prices and Costs, in John B. Taylor and Michael Woodford (eds.), Handbook of Macroeconomics.
[54] Shimer, Robert (2005), The Cyclical Behavior of Equilibrium Unemployment and Vacancies, American Economic Review, 95(1), 25–49.
[55] Smets, Frank, and Rafael Wouters (2003), An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area, Journal of the European Economic Association, 1(5), 1123–1175.
[56] Smets, Frank, and Rafael Wouters (2007), Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach, American Economic Review, 97(3), 586–606.
[57] Sims, C. A. (1998), Stickiness, Carnegie-Rochester Conference Series on Public Policy, 49(1), 317–356.
[58] Sims, C. A. (2003), Implications of Rational Inattention, Journal of Monetary Economics, 50(3), 665–690.
[59] Sims, C. A. (2005), Rational Inattention: a Research Agenda, Princeton University, mimeo.
[60] Van Nieuwerburgh, S., and L. Veldkamp (2004a), Information Acquisition and Portfolio Under-Diversification, Discussion paper, Stern School of Business, NYU.
[61] Stokey, N. L., and R. E. Lucas Jr., with Edward C. Prescott (1989), Recursive Methods in Economic Dynamics, Cambridge, Harvard University Press.
[62] Van Nieuwerburgh, S., and L. Veldkamp (2004b), Information Immobility and the Home Bias Puzzle, Discussion paper, Stern School of Business, NYU.

[63] Van Nieuwerburgh, S., and L. Veldkamp (2006), Learning Asymmetries in Real Business Cycles, Journal of Monetary Economics, 53(4), pp. 753–772.
[64] Wilson, Andrea (2003), Bounded Memory and Biases in Information Processing, unpublished, Princeton University.
[65] Woodford, M. (2002), Imperfect Common Knowledge and the Effects of Monetary Policy, in P. Aghion, R. Frydman, J. Stiglitz, and M. Woodford, eds., Knowledge, Information, and Expectations in Modern Macroeconomics: In Honor of Edmund S. Phelps, Princeton: Princeton University Press.
[66] Woodford, M. (2003), Interest and Prices, Princeton: Princeton University Press.
[67] Zeldes, S. (1989), Optimal Consumption with Stochastic Income: Deviations from Certainty Equivalence, Quarterly Journal of Economics, May 1989, 275–98.

8 Appendix A

8.1 Proof of Proposition 1.

The Bellman Recursion in the discrete Rational Inattention Consumption-Saving Model is a Contraction Mapping.

Proof. The $H$ mapping displays:
$$HV(g) = \max_{p} H^{p}V(g),$$
with
$$H^{p}V(g) = \left[\sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} u(c)\,p(c|w)\right) g(w) + \beta \sum_{w\in\Omega_w}\sum_{c\in\Omega_c} V\!\left(g'_{c}(\cdot)\right) p(c|w)\,g(w)\right].$$
Suppose that $\|HV - HU\|$ attains its maximum at the point $g$. Let $p_1$ denote the optimal control for $HV$ under $g$ and $p_2$ the optimal one for $HU$:
$$HV(g) = H^{p_1}V(g), \qquad HU(g) = H^{p_2}U(g).$$
Suppose WLOG that $HV(g) \le HU(g)$. Since $p_1$ maximizes $HV$ at $g$, I get
$$H^{p_2}V(g) \le H^{p_1}V(g).$$
Hence,
$$
\begin{aligned}
\|HV - HU\| &= HU(g) - HV(g) \\
&= H^{p_2}U(g) - H^{p_1}V(g) \\
&\le H^{p_2}U(g) - H^{p_2}V(g) \\
&= \beta \sum_{w\in\Omega_w}\sum_{c\in\Omega_c} \left[U\!\left(g'_{c}(\cdot)\right) - V\!\left(g'_{c}(\cdot)\right)\right] p_2(c|w)\,g(w) \\
&\le \beta \sum_{w\in\Omega_w}\sum_{c\in\Omega_c} \|V - U\|\; p_2(c|w)\,g(w) \\
&= \beta\,\|V - U\|,
\end{aligned}
$$
where the last equality uses $\sum_{w}\sum_{c} p_2(c|w)\,g(w) = 1$. Recalling that $0 \le \beta < 1$ completes the proof.
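The inequality $\|HV - HU\| \le \beta\|V - U\|$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it replaces the belief simplex with a small finite state space, and the arrays `reward` and `trans` are randomly generated stand-ins (hypothetical names, not objects from the model).

```python
import numpy as np

def bellman(V, reward, trans, beta):
    """One application of a finite-state Bellman operator.

    reward[s, a] is the period payoff, trans[a, s, n] the probability of
    moving from state s to state n under action a (hypothetical inputs).
    """
    # q[s, a] = reward[s, a] + beta * sum_n trans[a, s, n] * V[n]
    q = reward + beta * np.einsum("asn,n->sa", trans, V)
    return q.max(axis=1)

rng = np.random.default_rng(0)
n_states, n_actions, beta = 5, 3, 0.95

reward = rng.normal(size=(n_states, n_actions))
trans = rng.random(size=(n_actions, n_states, n_states))
trans /= trans.sum(axis=2, keepdims=True)  # each row is a probability vector

V = rng.normal(size=n_states)
U = rng.normal(size=n_states)

lhs = np.max(np.abs(bellman(V, reward, trans, beta) - bellman(U, reward, trans, beta)))
rhs = beta * np.max(np.abs(V - U))
print(f"||HV - HU|| = {lhs:.4f} <= beta * ||V - U|| = {rhs:.4f}: {lhs <= rhs + 1e-12}")
```

Any other draw of `V` and `U` should satisfy the same bound, since the argument in the proof does not depend on the particular value functions.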

8.2 Proof of Corollary.

The Bellman Recursion in the discrete Rational Inattention Consumption-Saving Model is an Isotonic Mapping.

Proof. Let $p_1$ denote the optimal control for $HV$ under $g$ and $p_2$ the optimal one for $HU$:
$$HV(g) = H^{p_1}V(g), \qquad HU(g) = H^{p_2}U(g).$$
By definition,
$$H^{p_1}U(g) \le H^{p_2}U(g).$$
From a given $g$, it is possible to compute $g'_{c}(\cdot)|_{p_1}$ for an arbitrary $c$, and then the following will hold:
$$
\begin{aligned}
V \le U &\implies \forall\, g(w), c:\quad V\!\left(g'_{c}(\cdot)|_{p_1}\right) \le U\!\left(g'_{c}(\cdot)|_{p_1}\right) \\
&\implies \sum_{c\in\Omega_c} V\!\left(g'_{c}(\cdot)|_{p_1}\right) p_1\, g \;\le\; \sum_{c\in\Omega_c} U\!\left(g'_{c}(\cdot)|_{p_1}\right) p_1\, g \\
&\implies \sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} u(c)\,p_1\right) g(w) + \beta \sum_{c\in\Omega_c} V\!\left(g'_{c}(\cdot)|_{p_1}\right) p_1\, g \;\le\; \sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} u(c)\,p_1\right) g(w) + \beta \sum_{c\in\Omega_c} U\!\left(g'_{c}(\cdot)|_{p_1}\right) p_1\, g \\
&\implies H^{p_1}V(g) \le H^{p_1}U(g) \\
&\implies H^{p_1}V(g) \le H^{p_2}U(g) \\
&\implies HV(g) \le HU(g) \\
&\implies HV \le HU.
\end{aligned}
$$
Note that $g$ was chosen arbitrarily and, from it, $g'_{c}(\cdot)|_{p_1}$ completes the argument that the value function is isotone.

8.3 Proof of Proposition 2.

The Optimal Value Function in the discrete Rational Inattention Consumption-Saving Model is Piecewise Linear and Convex (PCWL).

Proof. The proof is done via induction. I assume that all the operations are well defined in their corresponding spaces. For planning horizon $n = 0$, I have only to take into account the immediate expected rewards and thus I have that:
$$V_0(g) = \max_{p\in\Gamma}\left[\sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} u(c)\,p\right) g(w)\right] \tag{24}$$
and therefore if I define the vectors
$$\left\{\alpha_0^i(w)\right\}_i \equiv \left(\sum_{c\in\Omega_c} u(c)\,p\right)_{p\in\Gamma} \tag{25}$$
I have the desired
$$V_0(g) = \max_{\{\alpha_0^i(w)\}_i} \left\langle \alpha_0^i, g\right\rangle \tag{26}$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product $\langle \alpha_0^i, g\rangle \equiv \sum_{w\in\Omega_w} \alpha_0^i(w)\,g(w)$. For the general case, using equation (22):
$$V_n(g) = \max_{p\in\Gamma}\left[\sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} u(c)\,p(c|w)\right) g(w) + \beta \sum_{w\in\Omega_w}\sum_{c\in\Omega_c} V_{n-1}\!\left(g'_{c}(\cdot)\right) p(c|w)\,g(w)\right] \tag{27}$$
by the induction hypothesis
$$V_{n-1}\!\left(g'_{c}(\cdot)\right) = \max_{\{\alpha_{n-1}^i\}_i} \left\langle \alpha_{n-1}^i, g'_{c}(\cdot)\right\rangle. \tag{28}$$
Plugging equation (19) into the above and using the definition of $\langle\cdot,\cdot\rangle$,
$$V_{n-1}\!\left(g'_{c}(\cdot)\right) = \max_{\{\alpha_{n-1}^i\}_i} \sum_{w'\in\Omega_w} \alpha_{n-1}^i(w')\left(\sum_{w\in\Omega_w}\sum_{c\in\Omega_c} T(\cdot;w,c)\,\frac{\Pr(w,c)}{\Pr(c)}\right). \tag{29}$$
With the above:
$$
\begin{aligned}
V_n(g) &= \max_{p\in\Gamma}\left[\sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} u(c)\,p\right) g(w) + \beta \max_{\{\alpha_{n-1}^i\}_i} \sum_{w'\in\Omega_w} \alpha_{n-1}^i(w')\left(\sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} \frac{T(\cdot;w,c)}{\Pr(c)}\cdot p\right) g(w)\right)\right] \\
&= \max_{p\in\Gamma}\left[\left\langle u(c)\cdot p,\; g(w)\right\rangle + \beta \sum_{c\in\Omega_c} \frac{1}{\Pr(c)} \max_{\{\alpha_{n-1}^i\}_i} \left\langle \sum_{w'\in\Omega_w} \alpha_{n-1}^i(w')\,T(\cdot;w,c)\cdot p,\; g\right\rangle\right].
\end{aligned}
\tag{30}
$$

At this point, it is possible to define
$$\alpha_{p,c}^{j}(w) = \sum_{w'\in\Omega_w} \alpha_{n-1}^{i}(w')\,T(\cdot\,; w,c)\cdot p. \tag{31}$$
Note that these hyperplanes are independent of the prior $g$ for which I am computing $V_n$. Thus, the value function amounts to
$$V_n(g) = \max_{p\in\Gamma}\left[\left\langle u(c)\cdot p,\; g\right\rangle + \beta \sum_{c\in\Omega_c} \frac{1}{\Pr(c)} \max_{\{\alpha_{p,c}^{j}\}_j} \left\langle \alpha_{p,c}^{j},\, g\right\rangle\right], \tag{32}$$
and define:
$$\alpha_{p,c,g} = \arg\max_{\{\alpha_{p,c}^{j}\}_j} \left\langle \alpha_{p,c}^{j},\, g\right\rangle. \tag{33}$$
Note that $\alpha_{p,c,g}$ is a subset of $\alpha_{p,c}^{j}$, and using this subset results in
$$
\begin{aligned}
V_n(g) &= \max_{p\in\Gamma}\left[\left\langle u(c)\cdot p,\; g\right\rangle + \beta \sum_{c\in\Omega_c} \frac{1}{\Pr(c)} \left\langle \alpha_{p,c,g},\, g\right\rangle\right] \\
&= \max_{p\in\Gamma}\left\langle u(c)\cdot p + \beta \sum_{c\in\Omega_c} \frac{1}{\Pr(c)}\,\alpha_{p,c,g},\; g\right\rangle.
\end{aligned}
\tag{34}
$$
Now
$$\left\{\alpha_n^i\right\}_i = \bigcup_{\forall g}\left\{u(c)\cdot p + \beta \sum_{c\in\Omega_c} \frac{1}{\Pr(c)}\,\alpha_{p,c,g}\right\}_{p\in\Gamma} \tag{35}$$
is a finite set of linear functions parametrized in the action set.

8.4 Proof of Proposition 3.

Proof. The first task is to prove that the sets $\{\alpha_n^i\}_i$ are discrete for all $n$. The proof proceeds via induction. Assuming CRRA utility and since the optimal policy belongs to $\Gamma$, it is straightforward to see through (25) that the set of vectors $\{\alpha_0^i\}_i$,
$$\left\{\alpha_0^i\right\}_i \equiv \left(\sum_{w\in\Omega_w}\left(\sum_{c\in\Omega_c} \frac{c^{1-\gamma}}{1-\gamma}\,p(c|w)\right) g(w)\right)_{p\in\Gamma},$$
is discrete. For the general case, observe that for discrete controls and assuming $M = \left|\{\alpha_{n-1}^{j}\}\right|$, the sets $\{\alpha_{p,c}^{j}\}$ are discrete: for a given action $p$ and consumption $c$, I can only generate $M$ vectors $\alpha_{p,c}^{j}$. Now, fixing $p$, it is possible to select one of the $M$ vectors $\alpha_{p,c}^{j}$ for each one of the observed consumption values $c$ and, thus, $\{\alpha_n^{j}\}_i$ is a discrete set. The previous proposition shows the value function to be convex. The piecewise-linear component of the properties comes from the fact that the set $\{\alpha_n^{j}\}_i$ is of finite cardinality. It follows that $V_n$ is defined as a finite set of linear functions.
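To make the representation concrete, the sketch below evaluates a value function of the form $V(g) = \max_i \langle \alpha^i, g\rangle$ over a finite set of vectors and checks convexity along one segment of the simplex. The $\alpha$ vectors and beliefs are made-up numbers chosen for illustration, not output of the model.

```python
import numpy as np

def pwlc_value(alphas, g):
    """Return max_i <alpha_i, g> and the index of the maximizing vector."""
    scores = alphas @ g            # one inner product per alpha vector
    best = int(np.argmax(scores))
    return float(scores[best]), best

# Hypothetical alpha vectors over a three-point wealth support.
alphas = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.4, 0.4, 0.4],
])

g_a = np.array([0.8, 0.1, 0.1])    # two beliefs on the simplex
g_b = np.array([0.1, 0.8, 0.1])
g_mid = 0.5 * g_a + 0.5 * g_b      # their midpoint

v_a, _ = pwlc_value(alphas, g_a)
v_b, _ = pwlc_value(alphas, g_b)
v_mid, _ = pwlc_value(alphas, g_mid)

# A maximum of linear functions is convex: V(mid) <= average of endpoints.
print(v_mid <= 0.5 * v_a + 0.5 * v_b)
```

Because the maximum is taken over finitely many linear functions, the simplex is partitioned into finitely many regions on each of which a single $\alpha$ vector is active.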

9 Appendix B

9.1 Concavity of Mutual Information in the Belief State.

For a given $p(c|w)$, mutual information is concave in $g(w)$.

Proof. Let $Z$ be the binary random variable with $P(Z = 0) = \lambda$, and let $W = W_1$ if $Z = 0$ and $W = W_2$ if $Z = 1$. Consider
$$
\begin{aligned}
I(W, Z; C) &= I(W; C) + I(Z; C \,|\, W) \\
&= I(W; C \,|\, Z) + I(Z; C).
\end{aligned}
$$
Conditional on $W$, $C$ and $Z$ are independent, so $I(C; Z \,|\, W) = 0$. Thus,
$$
\begin{aligned}
I(W; C) &\ge I(W; C \,|\, Z) \\
&= \lambda\, I(W; C \,|\, Z = 0) + (1-\lambda)\, I(W; C \,|\, Z = 1) \\
&= \lambda\, I(W_1; C) + (1-\lambda)\, I(W_2; C).
\end{aligned}
$$
Q.E.D.

10 Appendix C

Pseudocode

Let $\theta$ be the shadow cost associated to $\kappa_t = I_t(C_t; W_t)$. Define a Model as a pair $(\gamma, \theta)$. For a given specification:

• Step 1: Build the simplex. Construct an equi-spaced grid to approximate each $g(w_t)$-simplex point.

• Step 2: For each simplex point, define $p(c_t, w_t)$ and initialize with $V\!\left(g'_{c}(\cdot)\right) = 0$.

• Step 3: For each simplex point, find $p^*(c, w)$ s.t.
$$V_0\!\left(g(w_t)\right)\big|_{p^*(c_t,w_t)} = \max_{p(c_t,w_t)}\left\{\sum_{w_t\in\Omega_w}\sum_{c_t\in\Omega_c} \frac{c_t^{1-\gamma}}{1-\gamma}\,p^*(c_t,w_t) - \theta\left[I_t(C_t; W_t)\right]\right\}.$$

• Step 4: For each simplex point, compute $g'_{c}(\cdot) = \sum_{w_t\in\Omega_w} T(\cdot\,; w_t, c_t)\,p^*(w_t|c_t)$. Use a kernel regression to interpolate $V_0(g(w_t))$ into $g'_{c}(\cdot)$.

• Step 5: Optimize using csminwel and iterate on the value function up to convergence.

Obs. Convergence and computation time vary with the specification $(\gamma, \theta)$: 180-320 iterations, each taking 8-20 minutes.

• Step 6: For each model $(\gamma, \theta)$, draw from the ergodic $p^*(c, w)$ a sample $(c_t, w_t)$ and simulate the time series of consumption, wealth, expected wealth and information flow by averaging over 1000 draws.

• Step 7: Generate histograms of consumption and impulse response functions of consumption to temporary positive and negative shocks to income.

11 Appendix D

11.1 Optimality Conditions

In this section I incorporate explicitly the constraint on information processing and derive the Euler equations that characterize its solution.

The main feature of this section is to relate the output of the channel, consumption, to the capacity chosen by the agent. In deriving the optimality conditions, I incorporate the consistency assumption (20) in the main diagonal of the joint distribution to be chosen, $\Pr_t(c_j, w_i)$. Note that such a restriction is WLOG. I then show the analytical details of the derivatives with respect to control and states.

11.2 First Order Conditions

To evaluate the derivative of the Bellman equation with respect to a generic distribution $\Pr(c_{k_1}, w_{k_2})$, define the differential operator $\Delta_k v(l) \equiv v(l_{k_1}) - v(l_{k_2})$ and $\theta$ as the shadow cost of processing information. Then, the optimal control for the program (17)-(21) amounts to:
$$\partial p^*(c_{k_1}, w_{k_2}):\quad \Delta_k u(c) + \beta\,\Delta_k V\!\left(g'_{c}(\cdot)\right) = p^*(c_{k_1}, w_{k_2})\left(-\Delta_k u'(c)\,\theta\,p^*(w_{k_2}|c_{k_1}) - \beta\,\Delta_k V'_{p^*}\!\left(g'_{c}(\cdot)\right)\right) \tag{36}$$
This expression states that the optimal distribution depends on the weighted difference of two consumption profiles, $c_{k_1}$ and $c_{k_2}$, where the weights are given by current and future discounted utilities. Note that the differential of the marginal utility of current consumption is also weighted by the conditional optimal distribution of consumption and wealth.

The interpretation of (36) is that the optimal probability of consumption and wealth depends on both levels of current and intertemporal utility and marginal utility. In

particular, there is an intertemporal trade-off between consuming the maximum value of wealth allowed by the signal, $c_{k_2}$, and a lower consumption $c_{k_1}$. To illustrate the argument, suppose a consumer believes that his wealth is $w_{k_2}$ with high probability. Suppose for simplicity that $w_{k_2}$ allows him to spend $c_{k_1}$ or $c_{k_2}$. The decision of shifting probability from $p(c_{k_2}, w_{k_2})$ to $p(c_{k_1}, w_{k_2})$ depends on four variables. First, the current difference in utility levels, $\Delta_k u(c)$, which tells the immediate satisfaction of consuming $c_{k_1}$ rather than $c_{k_2}$. However, consuming more today has a cost in future consumption and wealth levels tomorrow, $\beta\,\Delta_k V\!\left(g'_{c}(\cdot)\right)$. Optimal allocation of probabilities requires trading off not only intertemporal levels of utility but also marginal intertemporal utilities, where now the current marginal utility of consumption is weighted by the effort required to process information today.

To explore this relation further, I evaluate the derivative of the continuation value for a given optimal $p^*(c_{k_1}, w_{k_2})$, that is $\Delta_k V'_{p^*}\!\left(g'_{c}(\cdot)\right)$. To this end, define the ratio between the differential in utilities (current and discounted future) and the differential in marginal current utility as
$$\Psi^{\kappa} \equiv -\,\frac{\Delta_k u(c(\kappa)) + \beta\,\Delta_k V\!\left(g'_{c}(\cdot)\right)}{\theta\left[u'(c_{k_1}(\kappa)) - u'(c_{k_2}(\kappa))\right]}.$$
Also, let $\Phi^{\kappa}$ be the ratio $\Psi^{\kappa}$ when current levels of utility are equalized and future differential utilities are constant, i.e., $\Delta_k u(c) = 0$ and $\Delta_k V\!\left(g'_{c}(\cdot)\right) = 1$, or
$$\Phi^{\kappa} \equiv -\,\frac{\beta}{\theta\left[u'(c_{k_1}(\kappa)) - u'(c_{k_2}(\kappa))\right]}.$$
Then, an application of the chain rule and point-wise differentiation leads to
$$p^*(c_{k_1}, w_{k_2}) = \Lambda(k_1, k_2)\,p^*(c_{k_1}) \tag{37}$$
where
$$\Lambda(k_1, k_2) \equiv \Lambda_1\!\left(\Psi^{\kappa}, p^*(c_{k_1}, w_{k_2})\right)\cdot \Lambda_2\!\left(\Phi^{\kappa}, g'_{c_{k_1}}(\cdot), p^*(c_{k_1}, w_{k_2})\right)\cdot \Lambda_3\!\left(\Phi^{\kappa}, g'_{c_{k_2}}(\cdot), p^*(c_{k_1}, w_{k_2})\right).$$
Let me focus on the explanation for the terms of $\Lambda(k_1, k_2)$, which characterize the optimal solution of the conditional distribution $p^*(w_{k_2}|c_{k_1})$.

The first term, $\Lambda_1\!\left(\Psi^{\kappa}, p^*(c_{k_1}, w_{k_2})\right) \equiv \exp\!\left(\Psi^{\kappa}/p^*(c_{k_1}, w_{k_2})\right)$, states that the optimal choice of the distribution balances differentials between current and future levels of utilities between high ($k_2$) and low ($k_1$) values of consumption. In the case of log utility, the term $\exp(\Psi^{\kappa})$ is a likelihood ratio between utilities in the two states of the world ($k_1$ and $k_2$), and the interpretation is that the higher the value of the state of the world $k_2$ with respect to $k_1$, as measured by the utility of consumption, the lower the optimal $p^*(c_{k_1}, w_{k_2})$. This matches the intuition because the consumer would like to place more probability on the occurrence of $k_2$ the wider the difference between $c_{k_1}$ and $c_{k_2}$. A perhaps more interesting intertemporal relation is captured by the terms $\Lambda_2$ and $\Lambda_3$, both of which display the occurrence of the update distribution $g'_{c_{k_i}}(\cdot)$, $i = 1, 2$. To disentangle the contribution of each argument of $\Lambda_2$ and $\Lambda_3$, I combine the derivative of the control with the envelope condition. Let $\Lambda'_1$ be the term $\Lambda_1$ led one period and define the differential between the transition from one particular state to another and the transition from one particular state to all the possible states as
$$\Delta\tilde{T}_j \equiv \left(T(\cdot\,; w_{k_2}, c_{k_j}) - \sum_i T(\cdot\,; w_i, c_{k_j})\,p^*(w_i|c_{k_j})\right)$$
for $j = 1, 2$. Evaluating the derivative with respect to the state almost surely reveals

that $\Lambda_2 \equiv \exp\!\left(-\Phi^{\kappa}\,\Lambda'_1\left(\Delta\tilde{T}_1/p(c_{k_1})\right)\right)$ while $\Lambda_3 \equiv \exp\!\left(\Phi^{\kappa}\,\Lambda'_1\left(\Delta\tilde{T}_2/p(c_{k_2})\right)\right)$. The terms $\Lambda_2$ and $\Lambda_3$ reveal that in setting the optimal distribution $p^*(c_{k_1}, w_{k_2})$ consumers take into account not only the differential between levels and marginal utilities but also how the choice of the distribution shrinks or widens the spectrum of states that are reachable after observing the realized consumption profile.

An interesting special case that admits a closed form solution is when the agent is risk neutral. Consider the framework in Section (3.2) and let utility take the form $u(c) = c_t$; then in the region of admissible solutions $c_t < w_t$, the optimal probability distribution makes $c$ independent of $w$. To see this, it is easy to check that in the two period case with no discounting, the utility function reduces to $u(c) = w$, which implies $c|w \propto U(w_{\min}, w_{\max})$. That is, since all the uncertainty is driven by $w$, the consumer does not bother processing information beyond the knowledge of where the limit $c = w$ lies. In other words, the constraint on information flow does not bind. With the continuation value, exploiting risk neutrality, the optimal policy function amounts to:
$$p^*(w_{k_2}|c_{k_1}) = \frac{\exp\!\left(\dfrac{\left[(c_{k_1} - c_{k_2}) + \beta\,\Delta_k \bar{V}\!\left(g'_{c}(\cdot)\right)\right]}{\theta}\right)}{\sum_j \Delta\tilde{T}_j} \tag{38}$$
The solution uncovers some important properties of the interplay between risk neutrality and information flow. First of all, households with linear utility do not spend extra consumption units in sharpening their knowledge of wealth. Because the consumer is risk neutral and, at the margin, costs and benefits of information flows are equalized across periods, there is no need to gather more information than the boundaries of current consumption possibilities. In each period, the presence of the information processing constraint forces the consumer to allocate some utils to learn just enough to prevent violating the non-borrowing constraint. Once those limits are figured out, the consumption profiles in the region $c < w$ are independent of the value of wealth.

Derivative with Respect to Controls

In the main text, I state that the optimal control amounts to:
$$\partial p^*(c_{k_1}, w_{k_2}):\quad \Delta_k u(c(\kappa)) + \beta\,\Delta_k V\!\left(g'_{c}(\cdot)\right) = p^*(c_{k_1}, w_{k_2})\left(-\theta\,\Delta_k u'(c(\kappa)) + \beta\,\Delta_k V'_{p^*}\!\left(g'_{c}(\cdot)\right)\right) \tag{39}$$
which can be rewritten, opening up the operator $\Delta_k$, as:
$$\varphi^{\kappa}_{(c_{k_1}, c_{k_2})} = \Pr(c_{k_1}, w_{k_2})\left(\kappa_{(c_{k_1}, c_{k_2}, \theta)}\,\ln\frac{\Pr(c_{k_1}, w_{k_2})}{\Pr(c_{k_1})} + \beta\left[\frac{\partial V'\!\left(g'_{c_{k_1}}(\cdot)\right)}{\partial \Pr(c_{k_1}, w_{k_2})} - \frac{\partial V'\!\left(g'_{c_{k_2}}(\cdot)\right)}{\partial \Pr(c_{k_1}, w_{k_2})}\right]\right)$$
where

• $\varphi^{\kappa}_{(c_{k_1}, c_{k_2})} \equiv u(c_{k_1}(\kappa)) - u(c_{k_2}(\kappa)) + \beta\left[V\!\left(g'_{c_{k_1}}(\cdot)\right) - V\!\left(g'_{c_{k_2}}(\cdot)\right)\right]$, and

• $\kappa_{(c_{k_1}, c_{k_2}, \theta)} \equiv -\theta\left[u'(c_{k_1}(\kappa)) - u'(c_{k_2}(\kappa))\right]$.

Note that by the chain rule
$$\frac{\partial V'\!\left(g'_{c_{k_j}}(\cdot)\right)}{\partial \Pr(c_{k_j}, w_{k_2})} = \frac{\partial V'\!\left(g'_{c_{k_j}}(\cdot)\right)}{\partial\left(g'_{c_{k_j}}(\cdot)\right)}\;\frac{\partial\left(g'_{c_{k_j}}(\cdot)\right)}{\partial \Pr(c_{k_j}, w_{k_2})}, \qquad j = 1, 2.$$
Plugging (36) into the second term of the above expression and evaluating the derivatives point-wise delivers, for $c_j = c_{k_1}$:
$$
\begin{aligned}
\frac{\partial g'\!\left(\cdot\,|\,c_{k_1}\right)}{\partial \Pr(c_{k_1}, w_{k_2})} &= \frac{\partial\left[\dfrac{1}{p(c_{k_1})}\left(\sum_i T(\cdot\,; w_i, c_{k_1})\Pr(w_i, c_{k_1})\right)\right]}{\partial \Pr(c_{k_1}, w_{k_2})} \\
&= \frac{1}{p(c_{k_1})}\left(T(\cdot\,; w_{k_2}, c_{k_1}) - \frac{\sum_i T(\cdot\,; w_i, c_{k_1})\Pr(w_i, c_{k_1})}{p(c_{k_1})}\right).
\end{aligned}
$$
Define $\Psi^{\kappa} \equiv \dfrac{\varphi^{\kappa}_{(c_{k_1}, c_{k_2})}}{\kappa_{(c_{k_1}, c_{k_2}, \theta)}}$ and $\Phi^{\kappa} \equiv \dfrac{\beta}{\kappa_{(c_{k_1}, c_{k_2}, \theta)}}$; to get rid of cumbersome notation, let $(k_1, k_2) \equiv \left(\Psi^{\kappa}, \Phi^{\kappa}, g'_{c_{k_1}}(\cdot), g'_{c_{k_2}}(\cdot), \Pr(c_{k_1}, w_{k_2})\right)$. Then the first order conditions result in
$$\Pr(c_{k_1}, w_{k_2}) = \Lambda(k_1, k_2)\,\Pr(c_{k_1}) \tag{40}$$
where
$$\Lambda(k_1, k_2) \equiv \Lambda_1\!\left(\Psi^{\kappa}, \Pr(c_{k_1}, w_{k_2})\right)\cdot \Lambda_2\!\left(\Phi^{\kappa}, g'_{c_{k_1}}(\cdot), \Pr(c_{k_1}, w_{k_2})\right)\cdot \Lambda_3\!\left(\Phi^{\kappa}, g'_{c_{k_2}}(\cdot), \Pr(c_{k_1}, w_{k_2})\right)$$
while

• $\Lambda_1\!\left(\Psi^{\kappa}, \Pr(c_{k_1}, w_{k_2})\right) \equiv \exp\!\left(\Psi^{\kappa}\,\dfrac{1}{\Pr(c_{k_1}, w_{k_2})}\right)$;

• $\Lambda_2\!\left(\Phi^{\kappa}, g'_{c_{k_1}}(\cdot), \Pr(c_{k_1}, w_{k_2})\right) \equiv \exp\!\left(-\Phi^{\kappa}\,\dfrac{\partial V\!\left(g'_{c_{k_1}}(\cdot)\right)}{\partial\left(g'_{c_{k_1}}(\cdot)\right)}\,\dfrac{1}{p(c_{k_1})}\left(T(\cdot\,; w_{k_2}, c_{k_1}) - \dfrac{\sum_i T(\cdot\,; w_i, c_{k_1})\Pr(w_i, c_{k_1})}{p(c_{k_1})}\right)\right)$;

• $\Lambda_3(k_1, k_2) \equiv \exp\!\left(\Phi^{\kappa}\,\dfrac{\partial V\!\left(g'_{c_{k_2}}(\cdot)\right)}{\partial\left(g'_{c_{k_2}}(\cdot)\right)}\,\dfrac{1}{p(c_{k_2})}\left(T(\cdot\,; w_{k_2}, c_{k_2}) - \dfrac{\sum_i T(\cdot\,; w_i, c_{k_2})\Pr(w_i, c_{k_2})}{p(c_{k_2})}\right)\right)$.
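The point-wise derivative of the updated belief with respect to one cell of the joint distribution follows from the quotient rule, and can be verified by finite differences. The sketch below does so for a fixed consumption value; the transition matrix `T` and the joint-distribution column `joint_col` are arbitrary illustrative numbers (hypothetical names), not quantities from the model.

```python
import numpy as np

def update(T, joint_col):
    """Posterior over next-period wealth for a fixed c.

    T[:, i] is the distribution of w' given (w_i, c); joint_col[i] = Pr(w_i, c).
    """
    p_c = joint_col.sum()                      # marginal probability of c
    return T @ joint_col / p_c

def d_update(T, joint_col, k):
    """Analytical derivative of the posterior w.r.t. Pr(w_k, c)."""
    p_c = joint_col.sum()
    return (T[:, k] - update(T, joint_col)) / p_c

rng = np.random.default_rng(1)
T = rng.random((3, 3))
T /= T.sum(axis=0, keepdims=True)              # columns sum to one

joint_col = np.array([0.20, 0.10, 0.05])       # Pr(w_i, c) for i = 1, 2, 3
k, h = 1, 1e-6

bumped = joint_col.copy()
bumped[k] += h                                 # perturb a single cell
finite_diff = (update(T, bumped) - update(T, joint_col)) / h
analytic = d_update(T, joint_col, k)

print(np.allclose(finite_diff, analytic, atol=1e-4))
```

The check treats the cells of the joint distribution as free variables, as in the point-wise differentiation above; it does not impose the adding-up restriction across cells.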

Derivative with Respect to States

To derive the envelope condition with respect to a generic state $g(w_k)$ for $k = 1, 2, 3$, let me start by placing the restrictions on the marginal distribution of wealth in the main diagonal of the joint distribution $\Pr(c, w)$. The derivative then amounts to:
$$\frac{\partial \Pr_t(c_j, w_k)}{\partial g(w_k)} = \frac{\partial \Pr_t(c_j)}{\partial g(w_k)} = \begin{cases} 1 & (j = k) \cap (j \ne \max l \in \Omega_c) \\ -1 & \{j = \max l \in \Omega_c\} \\ 0 & \text{otherwise.} \end{cases}$$
Let $l_{\max}$ denote the maximum indicator $l$ belonging to $\Omega_c$. Then the derivative of the state $g(w_k)$ displays:
$$
\begin{aligned}
\frac{\partial V(g(w_k))}{\partial g(w_k)} \overset{a.s.}{=}\; & \left(u(c_k(\kappa)) + \beta V\!\left(g'_{c_k}(\cdot)\right) - \left(u\!\left(c_{l_{\max}}(\kappa)\right) + \beta V\!\left(g'_{c_{l_{\max}}}(\cdot)\right)\right)\right) \\
& - \theta\left(\log\!\left(\frac{\Pr(c_k, w_k)}{p(c_k)\,g(w_k)}\right) u'(c_k(\kappa))\Pr(c_k, w_k) - \log\!\left(\frac{\Pr(c_{l_{\max}}, w_k)}{p(c_{l_{\max}})\,g(w_k)}\right) u'\!\left(c_{l_{\max}}(\kappa)\right)\Pr\!\left(c_{l_{\max}}, w_k\right)\right) \\
& + \beta \sum_j \left[\frac{\partial V\!\left(g'_{c_{k_j}}(\cdot)\right)}{\partial\left(g'_{c_{k_j}}(\cdot)\right)}\;\frac{\partial\left(g'_{c_{k_j}}(\cdot)\right)}{\partial g(w_k)}\;\Pr(c_j, w_k)\right].
\end{aligned}
$$
Combining the first order conditions and the envelope condition, after some algebra, leads to the result in (40).

12 Appendix E.

12.1 A simple example

To illustrate how a consumer with information constraints differs from a consumer with full information and a consumer with no information, consider the following model of consumer's choice.

Suppose the household has three wealth possibilities, $w \in W \equiv \{2, 4, 6\}$, and three consumption possibilities, $c \in C \equiv \{2, 4, 6\}$. Before any observation is made, the consumer has the following prior on wealth: $\Pr(w = 2) = 0.5$, $\Pr(w = 4) = 0.25$, $\Pr(w = 6) = 0.25$. Moreover, the consumer cannot borrow, $c \le w$, and if his check bounces he suffers $c = 0$. He derives utility from consumption defined as $u(c) \equiv \log(c)$. His payoff matrix is summarized in Figure a.

$$
\begin{array}{c|ccc}
c \backslash w & 2 & 4 & 6 \\ \hline
2 & 0.7 & 0.7 & 0.7 \\
4 & -\infty & 1.38 & 1.38 \\
6 & -\infty & -\infty & 1.8
\end{array}
$$
Figure a: Payoff Matrix with $u(c) \equiv \log(c)$

If uncertainty in the payoff can be reduced at no cost, the consumer would set $c = w$, $\forall c \in C$, $\forall w \in W$.

In contrast, if he cannot gather any information about wealth besides that provided by the prior, the consumer will avoid unpleasant surprises by setting $c = 2$ whatever the wealth. The difference in bits between the two policies is measured by the mutual information between $C$ and $W$. The ex-ante uncertainty embedded in the prior for $w$ is calculated by evaluating its entropy in bits, i.e.,
$$\mathcal{H}(W) \equiv -\sum_{w\in W} p(w)\cdot\log_2(p(w)) = 0.5\cdot\log_2\!\left(\frac{1}{0.5}\right) + 0.25\cdot\log_2\!\left(\frac{1}{0.25}\right) + 0.25\cdot\log_2\!\left(\frac{1}{0.25}\right) = 1.5 \text{ bits.}$$
Since observation of $c$ provides information on wealth, conditional on the knowledge of consumption, uncertainty about $w$ is reduced to the amount $\mathcal{H}(W|C) \equiv -\sum_{w\in W}\sum_{c\in C} p(c,w)\log_2(p(w|c))$. The mutual information between $C$ and $W$, i.e., the reduction in uncertainty about wealth obtained by observing consumption, is the difference between the ex-ante uncertainty of $W$ ($\mathcal{H}(W)$) and the uncertainty about $W$ that remains given $C$ ($\mathcal{H}(W|C)$). In formulae, the mutual information or capacity of the channel amounts to:
$$I(C; W) = \sum_{w\in W}\sum_{c\in C} p(c,w)\log\!\left(\frac{p(c,w)}{p(c)\,p(w)}\right).$$
To see what this formula implies, consider first the situation in which information can flow at an infinite rate. In this case ex-post uncertainty is fully resolved. Moreover, note that $p(w|c) = 1$, $\forall c \in C$, $\forall w \in W$, since the consumer is setting positive probability on one and only one value of consumption per value of wealth. This in turn implies $\mathcal{H}(W|C) = 0$, so the mutual information in this case will be $I(C; W) = \mathcal{H}(W) = 1.5$ bits.

Instead, if the consumer has zero information flow or, equivalently, if processing information is prohibitively hard for him, his optimal policy of setting $c = 2$ at all times makes consumption and wealth independent of each other. This implies that
$$\mathcal{H}(W|C) = -\sum_{w\in W}\sum_{c\in C} p(c)\,p(w)\log_2\!\left(\frac{p(c)\,p(w)}{p(c)}\right) = \mathcal{H}(W).$$
Hence, in this case $I(C; W) = 0$ and no reduction in the uncertainty about wealth occurs upon observing consumption. The intuition is that if a consumer decides to spend the same amount on consumption regardless of his wealth level, his purchase will tell him nothing about his financial possibilities. The expected utility in the first case is $E^{FullInfo}(u(c)) = (\log(2))\cdot(0.5) + (\log(4) + \log(6))\cdot(0.25) = 1.14$, while in the second case $E^{NoInfo}(u(c)) = 0.7$.

Now, assume that the consumer can allocate some effort in choosing the size and scope of the information about his wealth he wants to process, under the limits imposed by his processing capacity. Let $\bar{\kappa} = 0.3$ be the maximum amount of information flow that the

consumer can process. Let the probability matrix of the consumer be:

$$
\begin{array}{c|ccc}
c \backslash w & P(w = 2) & P(w = 4) & P(w = 6) \\ \hline
P(c = 2) & 0.5 & p_1 & p_2 \\
P(c = 4) & 0 & 0.25 - p_1 & p_3 \\
P(c = 6) & 0 & 0 & 0.25 - p_2 - p_3
\end{array}
$$
Figure b: Probability Matrix

where the zeroes in the lower left corner of the matrix encode a non-borrowing constraint $c \le w$.29 The program of the consumer is:30
$$\max_{\{p_1, p_2, p_3\}} E^{\kappa}(u(c)) \quad \text{s.t.} \quad \bar{\kappa} \ge I(C; W).$$
Given $\bar{\kappa} = 0.3$,31 the optimal policy sets $p_1^* = 0.125$, $p_2^* = 0.125$, $p_3^* = 0.125$, which corresponds to $\Pr(C = 2) = 0.75$, $\Pr(C = 4) = 0.25$, $\Pr(C = 6) = 0$. This leads to an expected utility of $E^{\kappa}(u(c)) = 0.87$. Hence, consumers who invest effort in tracking their wealth using the channel are better off than in the no-information case (higher expected utility), even though they cannot do as well as in the full-information case.

29 I append a non-borrowing constraint $c \le w$ to reduce the number of probabilities to be calculated in Figure (b), thereby keeping the example easy. Figure (b) can be rationalized by assuming that the consumer acquires a signal on wealth $w^s = w + \varepsilon$ and chooses the distribution of $\varepsilon$ always such that the support of $\varepsilon$ is in $(-\infty, 0]$. One can think that $\bar{\kappa}$ is net of the bits used to set the desired support of $\varepsilon$.

30 In detail:
$$
\begin{aligned}
\max_{\{p_1, p_2, p_3\}} E^{\kappa}(u(c)) =\; & (\log(2))\cdot(0.5 + p_1 + p_2) \\
& + (\log(4))\cdot(0.25 - p_1 + p_3) \\
& + (\log(6))\cdot(0.25 - p_2 - p_3)
\end{aligned}
$$
and
$$
\begin{aligned}
\bar{\kappa} \ge I(C; W) =\; & 0.5\log_2\!\left(\frac{0.5}{0.5\,(0.5 + p_1 + p_2)}\right) + p_1\log_2\!\left(\frac{p_1}{0.25\,(0.5 + p_1 + p_2)}\right) \\
& + p_2\log_2\!\left(\frac{p_2}{0.25\,(0.5 + p_1 + p_2)}\right) + (0.25 - p_1)\log_2\!\left(\frac{0.25 - p_1}{0.25\,(0.25 - p_1 + p_3)}\right) \\
& + p_3\log_2\!\left(\frac{p_3}{0.25\,(0.25 - p_1 + p_3)}\right) + (0.25 - p_2 - p_3)\log_2\!\left(\frac{0.25 - p_2 - p_3}{0.25\,(0.25 - p_2 - p_3)}\right).
\end{aligned}
$$

31 Note that such a bound on information flow is unrealistically low. However, I decided to trade off realism for simplicity in this example.

Note that the result of trading information about the highest value to gain a more precise knowledge of the lower value of wealth is driven by the functional form of utility. For

instance, a consumer with the same bound on processing capacity but CRRA utility with coefficient of risk aversion, say, $\gamma = 5$, would have chosen a probability $\Pr(C = 2)$ lower than his log-utility counterpart. This is because higher degrees of risk aversion induce consumers to be better informed about low values of wealth to avoid such occurrences.

The intuition is that because the attention of the consumer within the limits of the Shannon capacity is allocated according to his utility, the degree of risk aversion plays an important role in determining what events receive the consumer's attention. A log-utility consumer wants to be well informed about the middle values of his wealth, while a highly risk averse consumer selects a signal which provides sharper information about the lower values of wealth, so that he can avoid high disutility. The opposite direction is taken by the less risk-averse agent.

12.2 Analytical Results for a three-point distribution

In this section I will focus on the optimality conditions derived above for a three point distribution. The goal is to fully characterize the solution for this particular case and explore its insights.32

32 A three-point distribution is indeed a special case of the more general N-point distribution since two of the states in the event space $\Omega_w$ are absorbing states. This, in turn, sets to zero several dimensions of the problem and allows for a closed form solution of the optimal policies. Although the solution for this particular case does not have a straightforward generalization, it provides useful insights on the optimal choice for the joint probability distribution of wealth and consumption and its relation with the prior distribution of wealth ($g(w)$) and the utility of the consumer.

Let me assume wealth to be a random variable that takes values in $w \in \Omega_w \equiv \{w_1, w_2, w_3\}$ with distribution $g(w_i) = \Pr(w = w_i)$ described by:

$$
\begin{array}{c|ccc}
W & w_l & w_m & w_h \\ \hline
g(w_i) & g_1 & g_2 & 1 - g_1 - g_2
\end{array}
$$

The equation describing the evolution of wealth is displayed by the budget constraint
$$w_{t+1} = R(w_t - c_t) + Y_t$$
where I denote by $Y_t$ the exogenous stochastic income process earned by the household and by $R > 0$ the (constant) interest rate on savings, $(w_t - c_t)$. Like wealth, before processing information, consumption $c_t$ is a random variable. It takes a discrete number of values in the event space $\Omega_c \equiv \{c_1, c_2, c_3\}$. The joint distribution of wealth and consumption, $\Pr_t(c_j, w_i)$, amounts to:

$$
\Pr_t(c_j, w_i):\qquad
\begin{array}{c|ccc}
C \backslash W & w_1 & w_2 & w_3 \\ \hline
c_1 & x_1 & x_2 & x_3 \\
c_2 & 0 & x_4 & x_5 \\
c_3 & 0 & 0 & x_6
\end{array}
$$

where the zeros in the SW end of the matrix encodes the feasibility constraint w (t) i (cid:21) c (t) i (cid:10) ; j (cid:10) and t 0. The additional restrictions to the above matrix are j w c 8 2 2 8 (cid:21) the ones commanded by the marginal on wealth. That is: x = g 1 1 x +x = g 2 4 2 x +x +x = 1 g g 3 5 6 1 2 (cid:0) (cid:0) Withoutlossofgenerality,Iplacethemarginaldistributionofwealthinthemaindiagonal of Pr (c ;w ) and I impose the restrictions above together with the condition that the t j i resulting matrix describes a proper distribution. The joint distribution of wealth and consumption amounts to: Pr(c ;w ) : j i C W w w w 1 2 3 n c g p p 1 1 1 2 (41) c 0 g p p 2 2 1 3 (cid:0) c 0 0 1 (g +g ) (p +p ) 3 1 2 2 3 (cid:0) (cid:0) The resulting marginal distribution of consumption that endogenously depends on the choices of p (cid:146)s, i = 1;2;3; displays: i c w.p g +p +p 1 1 1 2 Pr(C = c ) = c w.p g p +p : j 2 2 1 3 8 (cid:0) c w.p 1 (g +g ) (p +p ) < 3 1 2 2 3 (cid:0) (cid:0) Once the consumer chooses p i (cid:146)s:and observes the realized consumption c t , he updates the marginal distribution of wealth. The latter, g , is obtained combining the 0 (cid:1)jcj joint distribution of wealth and consumption and the(cid:16)tran(cid:17)sition probability function. In formulae, the updated marginal on wealth amounts to: g = T ( ;w ;c )Pr(w c ): (42) 0 (cid:1)jcj (cid:1) i j i j j i (cid:16) (cid:17) P The speci(cid:133)cation of T ( ;w ;c ) adopted in the analytical derivation of the discrete probi j (cid:1) ability distribution as well as in the numerical simulation can be explained as follows. The transition probability function is meant to approximate the expected value of next period wealth: (cid:22) EW = R(w c )+Y: (43) 0 t t (cid:0) The approximation is necessary since (43) cannot hold exactly at the boundaries of the support of the wealth, (cid:10) . 
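The construction in (41) can be checked numerically. The following is a minimal sketch, not the paper's code: it builds the joint matrix for illustrative values of $g_1, g_2, p_1, p_2, p_3$ (the numbers and all function names are assumptions made for the example), verifies that the wealth marginal is recovered, and computes the conditional probabilities $\Pr(w_i \mid c_j)$ that enter the update (42). The transition function $T$ is deliberately left out.

```python
from fractions import Fraction as F

def joint_matrix(g1, g2, p1, p2, p3):
    """Joint Pr(c_j, w_i) as in (41): rows are c1..c3, columns are w1..w3."""
    return [
        [g1, p1, p2],
        [0, g2 - p1, p3],
        [0, 0, 1 - (g1 + g2) - (p2 + p3)],
    ]

def is_proper(joint):
    """All entries non-negative and total mass equal to one."""
    flat = [x for row in joint for x in row]
    return all(x >= 0 for x in flat) and sum(flat) == 1

def wealth_marginal(joint):
    """Column sums: should reproduce (g1, g2, 1 - g1 - g2)."""
    return [sum(row[i] for row in joint) for i in range(3)]

def consumption_marginal(joint):
    """Row sums: the endogenous marginal Pr(C = c_j)."""
    return [sum(row) for row in joint]

def wealth_given_consumption(joint, j):
    """Pr(w_i | c_j), the weights multiplying T(.; w_i, c_j) in (42)."""
    pc = sum(joint[j])
    return [x / pc for x in joint[j]]

# Illustrative numbers only (assumptions, not taken from the paper).
g1, g2 = F(3, 10), F(4, 10)
p1, p2, p3 = F(1, 10), F(1, 20), F(1, 10)
J = joint_matrix(g1, g2, p1, p2, p3)

print(is_proper(J))
print(wealth_marginal(J))
print(consumption_marginal(J))
print(wealth_given_consumption(J, 0))
```

Exact fractions are used so that the adding-up restrictions can be tested with equality rather than with a floating-point tolerance.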
In equation (43), $R$ is the interest rate, assumed to be a given number, while $\bar{Y}$ is the mean of the stochastic income process, $Y_t$. Suppose we have a three-point distribution. Assume WLOG that the values $w_i \in \Omega_w$ are equally spaced. For a given $(w_i, c_j)$ pair, the distribution of next period wealth is concentrated on the three $w'_i$ values closest to $R(w_i - c_j) + \bar{Y}$, which will be denoted by $\omega_1, \omega_2, \omega_3$ with respective probabilities $\pi_1, \pi_2, \pi_3$. The mean of the distribution is $-\pi_1(\omega_2 - \omega_1) + \pi_3(\omega_3 - \omega_2) + \omega_2$. Let $\delta$ be the distance between the values of $w_i$. Then the mean becomes

$$\mu_\omega \equiv \delta(\pi_3 - \pi_1) + \omega_2.$$

The variance of the distribution is then

$$\sigma^2_\omega \equiv \delta^2(\pi_3 + \pi_1) - (\mu_\omega - \omega_2)^2.$$

Since $\pi_2$ is an exact function of $\pi_1$ and $\pi_3$, the equations for the mean and variance of the process constitute two equations in two unknowns. With the additional restriction that all the $\pi_i$'s are positive and sum to one, it is not possible to guarantee the existence of a solution for $R(w_i - c_j) + \bar{Y}$ close to the boundaries of the support of the distribution of wealth. To make sure that there is always a solution for $\mu_\omega \in (\min(w) + .5\delta, \max(w) - .5\delta)$, and that the solution is continuous at points where $\mu_\omega = \frac{w_i + w_{i+1}}{2}$, one has to choose $\sigma^2_\omega = .25\delta^2$. With this choice, writing $m \equiv (\mu_\omega - \omega_2)/\delta \in [-\tfrac{1}{2}, \tfrac{1}{2}]$, the two equations give $\pi_3 - \pi_1 = m$ and $\pi_3 + \pi_1 = \tfrac{1}{4} + m^2$, so that $\pi_1 = \tfrac{1}{2}(m - \tfrac{1}{2})^2$, $\pi_3 = \tfrac{1}{2}(m + \tfrac{1}{2})^2$ and $\pi_2 = \tfrac{3}{4} - m^2$ are all non-negative.

Euler Equations. Making use of the marginal distribution of wealth described above and of (42), together with the specifications of $T(\cdot; w_i, c_j)$ and $\Pr(w_i, c_j)$, I can explicitly evaluate $g'(\cdot \mid c_j)$ point-wise. To illustrate this point, using the numerical values of $T(\cdot; w_i, c_j)$ above, the point-wise evaluations are as follows.
In $c_j = c_1$:

$$g'(\cdot \mid c_1) = \frac{1}{g_1 + p_1 + p_2}\Bigl(T(\cdot; w_1, c_1)\, g_1 + T(\cdot; w_2, c_1)\, p_1 + T(\cdot; w_3, c_1)\, p_2\Bigr)$$

In $c_j = c_2$:

$$g'(\cdot \mid c_2) = \frac{1}{g_2 - p_1 + p_3}\Bigl(T(\cdot; w_2, c_2)(g_2 - p_1) + T(\cdot; w_3, c_2)\, p_3\Bigr)$$

In $c_j = c_3$:

$$g'(\cdot \mid c_3) = T(\cdot; w_3, c_3)$$

Then, the first order conditions and envelope conditions amount to

$$\partial p_1: \quad u(c_1) - u(c_2) + \beta\Bigl[V'\bigl(g_{c'_1}(\cdot)\bigr) - V'\bigl(g_{c'_2}(\cdot)\bigr)\Bigr] - p_1\,\theta\bigl[u'(c_1 - \theta\kappa) - u'(c_2 - \theta\kappa)\bigr]\ln\frac{p_1}{g_1 + p_1 + p_2} + \left(\frac{\partial V'\bigl(g_{c'_1}(\cdot)\bigr)}{\partial g_{c'_1}(\cdot)}\frac{\partial g_{c'_1}(\cdot)}{\partial p_1} - \frac{\partial V'\bigl(g_{c'_2}(\cdot)\bigr)}{\partial g_{c'_2}(\cdot)}\frac{\partial g_{c'_2}(\cdot)}{\partial p_1}\right) = 0$$

Note that $\frac{\partial g_{c'_j}(\cdot)}{\partial p_j} = 0$ for $j \in \{1, 2, 3\}$.[33] This result is not driven by the specification chosen for the transition function $T(\cdot; w_i, c_j)$, but it is a feature of the three-point distribution. Indeed, because two of the three values of wealth are at the boundaries of $\Omega_w$, the absorbing states $w_1$ and $w_3$ place tight restrictions on the continuation value $V'\bigl(g_{c'_j}(\cdot)\bigr)$ through the transition function and, as a result, on the update for the marginal $g_{c'_j}(\cdot)$ according to (42). That is, the marginal probability on wealth $g_{c'_j}(\cdot)$ in this case tends to its ergodic value $\bar{g}_{c_j}(\cdot)$. It follows that $V'\bigl(\bar{g}_{c_j}(\cdot)\bigr) \xrightarrow{a.s.} \bar{V}^*\bigl(\bar{g}_{c_j}(\cdot)\bigr)$, which is a constant since the functional argument is. This is what makes the 3-point distribution tractable.

[33] To see this, plug (42) in $\frac{\partial g_{c'_j}(\cdot)}{\partial p_j}$ for $j \in \{1, 2\}$; evaluating the derivatives pointwise delivers

$$\partial g'(\cdot \mid c_1): \quad \frac{1}{(g_1 + p_1 + p_2)^2}\begin{bmatrix} 0.81 p_2 - 0.15 g_1 \\ -(0.56 p_2 - 0.15 g_1) \\ -0.25 p_2 \end{bmatrix} = 0$$

$$\partial g'(\cdot \mid c_2): \quad \frac{p_3}{(g_2 - p_1 + p_3)^2}\begin{bmatrix} 0.15 \\ -0.15 \\ 0 \end{bmatrix} = 0$$

For the general case, the first order condition with respect to the first control amounts to:

$$\partial p_1: \quad u(c_1(\kappa)) - u(c_2(\kappa)) + \beta\bigl[\bar{V}(\bar{g}_{c_1}(\cdot)) - \bar{V}(\bar{g}_{c_2}(\cdot))\bigr] = p_1\,\theta\bigl[u'(c_1(\kappa)) - u'(c_2(\kappa))\bigr]\ln\left(\frac{p_1}{g_1 + p_1 + p_2}\right) \qquad (44)$$

Similarly, for the second control

$$\partial p_2: \quad u(c_1(\kappa)) - u(c_3(\kappa)) + \beta\bigl[\bar{V}(\bar{g}_{c_1}(\cdot)) - \bar{V}(\bar{g}_{c_3}(\cdot))\bigr] = p_2\,\theta\bigl[u'(c_1(\kappa)) - u'(c_3(\kappa))\bigr]\ln\left(\frac{p_2}{g_1 + p_1 + p_2}\right) \qquad (45)$$

And finally:

$$\partial p_3: \quad u(c_2(\kappa)) - u(c_3(\kappa)) + \beta\bigl[\bar{V}(\bar{g}_{c_2}(\cdot)) - \bar{V}(\bar{g}_{c_3}(\cdot))\bigr] = p_3\,\theta\bigl(u'(c_2(\kappa)) - u'(c_3(\kappa))\bigr)\ln\left(\frac{p_3}{g_2 - p_1 + p_3}\right) \qquad (46)$$

Using the result that the value function converges to $V^*$ when the utility function belongs to the family of constant absolute risk aversion (CARA), I assume the utility takes the specification:

$$u(c_j(\kappa)) = \begin{cases} -\dfrac{e^{-\gamma\, c_j(\kappa)}}{\gamma} & \text{for } \gamma > 0 \\[2ex] \log(c_j(\kappa)) & \text{for } \lim_{\gamma \to 0}\left(-\dfrac{e^{-\gamma\, c_j(\kappa)}}{\gamma}\right) \end{cases}$$

where $\gamma$ is the coefficient of absolute risk aversion and $j \in \Omega_c \equiv \{c_1, c_2, c_3\}$. Moreover, by proposition 1, the value function is PCWL, that is:

$$\bar{V}\bigl(\bar{g}_{c_j}(\cdot)\bigr) = \arg\max_{\{\alpha'_j\}_j} \bigl\langle \alpha'_j, \bar{g}_{c'_j}(\cdot) \bigr\rangle$$

where $\{\alpha'_j\}_j$ are a set of vectors, each of them generated for a particular observation of previous values of consumption $c_j$, and $\langle \cdot, \cdot \rangle$ denotes the inner product $\bigl\langle \alpha'_j, \bar{g}_{c'_j}(\cdot) \bigr\rangle \equiv \sum_{w' \in \Omega_w} \alpha'_j(w')\, T(\cdot; w, c_j)\, p(c_j \mid w)$. To get a closed-form solution, I need to represent the probability distribution of the prior. One of the possibilities is to use a particle-based representation. The latter is performed by using $N$ random samples, or particles, at points $w_i$ and with weights $\varpi_i$. The prior is then

$$g_t(w) = \sum_{i=1}^{N} \varpi_i\, \tilde{\delta}(w - w_i)$$

where $\tilde{\delta}(w - w_i) = \mathrm{Dirac}(w - w_i)$ is the Dirac delta function centered at zero. A particle-based representation can approximate arbitrary probability distributions (with an infinite number of particles in the extreme case), it can accommodate nonlinear transition models without the need to linearize the model, and it allows several quantities of interest to be computed efficiently. In particular, the expected value in the belief update equation becomes:

$$\bar{g}'(\cdot \mid c_j) = \Pr(c_j \mid \cdot) \sum_{i=1}^{N} \varpi_i\, T(\cdot; w_i, c_j)$$

The central issue in the particle filter approach is how to obtain a set of particles to approximate $\bar{g}'(\cdot \mid c_j)$ from the set of particles approximating $g(w)$. The usual Sampling Importance Re-sampling (SIR) approach (Dellaert et al., 1999; Isard and Blake, 1998) samples particles using the motion model $T(\cdot; w_i, c_j)$, then assigns new weights and resamples in order to make all particle weights equal. The trouble with the SIR approach is that it requires many particles to converge when the likelihood $\Pr(c_j \mid \cdot)$ is too peaked or when there is only a small overlap between the prior and the likelihood.
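The SIR step described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used in the paper's simulations: the motion model `transition_sample`, the Gaussian-shaped `likelihood`, and all numerical values are hypothetical stand-ins for $T(\cdot; w_i, c_j)$ and $\Pr(c_j \mid \cdot)$. The effective sample size reported by the function is a standard diagnostic added here for illustration; it is not part of the paper.

```python
import math
import random

def sir_step(particles, weights, transition_sample, likelihood, rng):
    """One Sampling Importance Resampling step on a weighted particle set."""
    n = len(particles)
    # 1. Propagate each particle through the motion model.
    moved = [transition_sample(w, rng) for w in particles]
    # 2. Reweight by the likelihood of the observation and normalize.
    raw = [wt * likelihood(w) for wt, w in zip(weights, moved)]
    total = sum(raw)
    norm = [r / total for r in raw]
    # Effective sample size: small when few particles carry most of the weight.
    ess = 1.0 / sum(x * x for x in norm)
    # 3. Resample with replacement so that all weights are equal again.
    resampled = rng.choices(moved, weights=norm, k=n)
    return resampled, [1.0 / n] * n, ess

# Hypothetical motion model: w' = R (w - c) + Y with a three-point income shock.
def transition_sample(w, rng, R=1.02, c=1.0, incomes=(0.5, 1.0, 1.5)):
    return R * (w - c) + rng.choice(incomes)

# Hypothetical likelihood of the observed consumption given wealth.
def make_likelihood(center, spread):
    return lambda w: math.exp(-0.5 * ((w - center) / spread) ** 2)

rng = random.Random(0)
N = 500
# Particle representation of a three-point prior on wealth.
prior = rng.choices([2.0, 3.0, 4.0], weights=[0.3, 0.4, 0.3], k=N)
equal = [1.0 / N] * N

post_flat, w_flat, ess_flat = sir_step(prior, equal, transition_sample,
                                       make_likelihood(3.0, 5.0), rng)
post_peak, w_peak, ess_peak = sir_step(prior, equal, transition_sample,
                                       make_likelihood(3.0, 0.05), rng)
print(round(ess_flat), round(ess_peak))
```

With the sharply peaked likelihood only a small share of the propagated particles receives non-negligible weight, which is the convergence problem the text attributes to SIR.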
In the auxiliary particle filter, the sampling problem is addressed by inserting the likelihood inside the mixture

$$\bar{g}'(\cdot \mid c_j) \propto \sum_{i=1}^{N} \varpi_i \Pr(c_j \mid \cdot)\, T(\cdot; w_i, c_j).$$

The state $(\cdot)$ used to define the likelihood $\Pr(c_j \mid \cdot)$ is not observed when the particles are resampled, and this calls for the following approximation

$$\bar{g}'(\cdot \mid c_j) \propto \sum_{i=1}^{N} \varpi_i \Pr\bigl(c_j \mid \mu^i_\omega\bigr)\, T(\cdot; w_i, c_j)$$

with $\mu^i_\omega$ any likely value associated with the $i$-th component of the transition density $T(\cdot; w_i, c_j)$, e.g., its mean. In this case, we have that $\mu^i_\omega = w_i + \Delta(c_j)$. Then, $\bar{g}'(\cdot \mid c_j)$ can be regarded as a mixture of $N$ transition components $T(\cdot; w_i, c_j)$ with weights $\varpi_i \Pr(c_j \mid \mu^i_\omega)$. Therefore, sampling a new particle $w'_j$ to approximate $\bar{g}'(\cdot \mid c_j)$ can be carried out by selecting one of the $N$ components, say $i_m$, with probability $\varpi_i \Pr(c_j \mid \mu^i_\omega)$ and then sampling $w'_i$ from the corresponding component $T(\cdot; w_{i_m}, c_j)$. Sampling is performed in the intersection of the prior and the likelihood and, consequently, particles with larger prior and larger likelihood (even if this likelihood is small in absolute value) are more likely to be used. After the set of states for the new particles is obtained using the above procedure, it is necessary to define the weights. This is done using

$$\varpi'_m \propto \frac{\Pr(c_j \mid w'_m)}{\Pr\bigl(c_j \mid \mu^{i_m}_\omega\bigr)}.$$

Using the sample-based belief representation, the averaging operator $\langle \cdot, \cdot \rangle$ can be computed in closed form as:

$$\begin{aligned} \langle \alpha, \bar{g}' \rangle &= \sum_{w \in \Omega_w} \Bigl[\sum_k \varpi_k\, \tau(w \mid w_k, \Sigma_k)\Bigr]\Bigl[\sum_l \varpi'_l\, \tilde{\delta}(w - w_l)\Bigr] \\ &= \sum_k \varpi_k \Bigl(\sum_{w \in \Omega_w} \tau(w \mid w_k, \Sigma_k)\Bigl[\sum_l \varpi'_l\, \tilde{\delta}(w - w_l)\Bigr]\Bigr) \\ &= \sum_k \varpi_k \sum_l \varpi'_l\, \tau(w_l \mid w_k, \Sigma_k) \\ &= \sum_{k,l} \varpi_k\, \varpi'_l\, \tau(w_l \mid w_k, \Sigma_k), \end{aligned}$$

where $\tau(\cdot)$ is the distribution of the r.v. $W'$ that uses the specification of the transition function above, i.e., mean $\mu_\omega \equiv \delta(\pi_3 - \pi_1) + \omega_2$ and variance $\sigma^2_\omega \equiv \delta^2(\pi_3 + \pi_1) - (\mu_\omega - \omega_2)^2$, with $\delta$ the (constant) distance between the values of $w_i$.

Representing priors in this fashion allows an explicit evaluation of the differences in the value functions in the first order conditions, since $V'\bigl(\bar{g}_{c'_j}(\cdot)\bigr) = \arg\max_{\{\alpha'_j\}_j} \bigl\langle \alpha'_j, \bar{g}_{c'_j}(\cdot) \bigr\rangle = \sum_{k,l} \tilde{\varpi}'_k\, \tilde{\varpi}'_l\, \tau(w_l \mid w_k, \Sigma_k)$, where $\tilde{\varpi}'_k \equiv \dfrac{\Pr(c_j \mid w'_k)}{\Pr(c_j \mid \mu^k_\omega)}$ and $\tilde{\varpi}'_l \equiv \dfrac{\Pr(c_j \mid w'_l)}{\Pr(c_j \mid \mu^l_\omega)}$.
l0 ) ) (cid:19) : Since t D he result E of X the argmax is just one of the member of the set (cid:11) and all the elements involved in 0j j the de(cid:133)nition of (cid:11) function in (cid:0) are a (cid:133)nite set of linear function parametrized in the 0j (p) (cid:8) (cid:9) action set, so is the (cid:133)nal result. Let a prime (" ") denote the variables led one period ahead, algebraic manipulation 0 delivers the following optimal control functions: g ( (cid:18)(cid:12)(cid:23) ) p (~g;(cid:18)) = 1 1 (cid:0) 1 ; (47) (cid:3)1 (cid:18)g (LambertW((cid:31) )x LambertW((cid:31) )x )+2g ( (cid:18)(cid:12)v ) 1 1 12 11 11 1 1 1 (cid:0) (cid:0) g ( (cid:18)(cid:12)(cid:23) ) p (~g;(cid:18)) = 1 2 (cid:0) 2 ; (48) (cid:3)2 (cid:18)g (LambertW((cid:31) )x LambertW((cid:31) )x )+2 g ( (cid:18)(cid:12)(cid:23) ) 1 2 21 2 22 2 1 2 3 (cid:0) (cid:0) 59

(cid:18)(cid:12)v p (~g) = 3 (cid:0) 3 (49) (cid:3)3 (cid:18)x LambertW((cid:31) ) 3 3 where e (cid:0) (cid:13)(c2(cid:0) (cid:18)(cid:20))e (cid:0) (cid:13)(c1(cid:0) (cid:18)(cid:20)) ; e (cid:0) (cid:13)(c3(cid:0) (cid:18)(cid:20))e (cid:0) (cid:13)(c1(cid:0) (cid:18)(cid:20)) ; e (cid:0) (cid:13)(c3(cid:0) (cid:18)(cid:20))e (cid:0) (cid:13)(c2(cid:0) (cid:18)(cid:20)) ; (cid:15) 1 (cid:17) (cid:13) 2 (cid:17) (cid:13) 3 (cid:17) (cid:13) (cid:16) (cid:17) (cid:16) (cid:17) (cid:16) (cid:17) (cid:23) g ( )+(g g )( ); (cid:15) 1 (cid:17) 2 03 2 (cid:0) 1 02 (cid:0) 03 (cid:23) g ( )+(g g )( ); (cid:15) 2 (cid:17) 2 01 2 (cid:0) 1 01 (cid:0) 02 (cid:23) (1 g g )( )+(g g )( ) (cid:15) 3 (cid:17) (cid:0) 2 (cid:0) 1 02 2 (cid:0) 1 03 (cid:0) 01 (cid:15) (cid:31) 1 (cid:17) (cid:18) ( g 1 1 ( (cid:0) e(cid:13) (cid:18) ( (cid:12) c2 v (cid:0) 1) c 1) 1 ) ;; x 11 (cid:17) e (cid:0) (cid:13)(c1 (cid:0) (cid:18)(cid:20)); x 12 (cid:17) e (cid:0) (cid:13)(c2 (cid:0) (cid:18)(cid:20)); (cid:15) (cid:31) 21 (cid:17) (cid:18) ( g 1 2 ( (cid:0) e(cid:13) (cid:18) ( (cid:12) c3 v (cid:0) 2) c 1) 2 ) ; x 21 (cid:17) e (cid:0) (cid:13)(c1 (cid:0) (cid:18)(cid:20)); x 22 (cid:17) e (cid:0) (cid:13)(c3 (cid:0) (cid:18)(cid:20)) and (cid:15) (cid:31) 3 (cid:17) (cid:18)g2 (e 3 (cid:13) (cid:0) ( (cid:18) c3 (cid:12) (cid:0) v c 3 2)) ; x 3 (cid:17) e(cid:13)(c3 (cid:0) c2): andLambertW(:)istheLambertWfunctionthatsatis(cid:133)esLambertW(x)eLambertW(x) = x34. The argument of the LambertW is always positive for the (cid:133)rst order conditions derived, implying that for each of the optimal policies the function returns a real solution amongst other complex roots, which is unique and positive. Since @LambertW(x) = @x LambertW(x) it is possible to calculate the derivatives of the above expression with x(1+LambertW(x)) respect to (cid:18);g ;g . However, the sign of the derivatives with respect to those variables 1 2 f g is indeterminate. 
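The two Lambert W properties used above, $\mathrm{LambertW}(x)e^{\mathrm{LambertW}(x)} = x$ and the derivative formula, can be checked numerically. The sketch below is an illustration only: `lambert_w` is a hypothetical helper that computes the real branch for $x \ge 0$ by Newton's method (it is not a library routine), and `solve_p_log` solves the generic equation $p \ln(p/a) = b$ with $a, b > 0$, a simplified stand-in that shows why an unknown multiplying its own logarithm leads to Lambert W. It is not the paper's first order condition, whose denominator also depends on the controls.

```python
import math

def lambert_w(x, tol=1e-14, max_iter=100):
    """Real solution w >= 0 of w * exp(w) = x for x >= 0 (Newton's method)."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess at or above the root; exact at x = 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w

def lambert_w_prime(x):
    """dW/dx = W(x) / (x * (1 + W(x))) for x > 0."""
    w = lambert_w(x)
    return w / (x * (1.0 + w))

def solve_p_log(a, b):
    """Solve p * ln(p / a) = b for p, with a > 0 and b > 0: p = b / W(b / a)."""
    return b / lambert_w(b / a)

x = 2.5
w = lambert_w(x)
print(w * math.exp(w))                 # reproduces x up to rounding
h = 1e-6
finite_diff = (lambert_w(x + h) - lambert_w(x - h)) / (2 * h)
print(finite_diff, lambert_w_prime(x))  # the two derivatives agree closely
p = solve_p_log(a=0.4, b=0.3)
print(p * math.log(p / 0.4))           # reproduces b up to rounding
```

Helpers of this kind make it straightforward to evaluate the derivatives of the policy functions numerically at given parameter values; as stated above, their sign with respect to $\theta$, $g_1$ and $g_2$ is not pinned down in general.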
The rationale behind this result is quite simple. Consider the joint probability distribution $\Pr(c_i, w_j)$. The overall effect of an increase in this probability results from the interplay of several factors. In general, if $\theta$ is low (or, equivalently, the capacity of the channel, $\bar{\kappa}$, in (18) is high), a risk-averse consumer will try to reduce the off-diagonal terms of the joint as much as possible. That is, he would set $p_1 = \Pr(c_1, w_2)$, $p_2 = \Pr(c_1, w_3)$ and $p_3 = \Pr(c_2, w_3)$ as low as his capacity allows in order to sharpen his knowledge of the state. At the opposite extreme, for a very high value of the cost associated with information processing, $\theta$, $p_1$ and $p_2$ will be higher, the higher the prior $g_1 = g(w_1)$ with respect to $g_2 = g(w_2)$ and $g_3 = g(w_3)$. This is due to the fact that when the capacity of the channel is low (or, equivalently, the effort of processing information is high), the first order conditions indicate that it is optimal for the consumer to shift probabilities towards the higher belief state. The intuition is that when it is costly to process information, the household cannot reduce the uncertainty about his wealth. If the individual is risk averse as implied by the CRRA utility function, in each period he would rather specialize in the consumption associated with the higher prior than attempt to consume a different quantity and run out of wealth in the following periods. This intuition leads to an optimal policy of the consumer that assigns high probability to one particular consumption profile and sets the remaining probabilities as low as possible. To illustrate this, consider a consumer who has a high value of $\theta$ and a prior on $w_1$ higher than the other priors. If he cannot sharpen his knowledge of his wealth because the information-processing effort is prohibitive, he will optimize his dynamic problem by placing very high probability on $\Pr(c_1) = g_1 + p_1 + p_2$, i.e., increase $p_1$ and $p_2$ and decrease $p_3$. Likewise, if $g_2$ is higher than the other priors and $\theta$ is high ($\kappa$ is low), optimality commands to decrease both $p_1$ and $p_2$ and increase $p_3$.

[34] Formally, the LambertW function is the inverse of the function $f: \mathbb{C} \to \mathbb{C}$ given by $f(x) \equiv x e^x$. Hence $\mathrm{LambertW}(x)$ is the complex function that satisfies $\mathrm{LambertW}(x)\, e^{\mathrm{LambertW}(x)} = x$ for all $x \in \mathbb{C}$. In practice the definition of LambertW requires a branch cut, which is usually taken along the negative real axis. The LambertW function is sometimes also called the product log function. This function allows one to solve the functional equation $g(x)^{g(x)} = x$, given that $g(x) = e^{\mathrm{LambertW}(\ln(x))}$. See Corless, Gonnet, Hare, Jeffrey and Knuth (1996).

13 Appendix F:

13.1 The Mathematics of Rational Inattention

This part addresses the mathematical foundations of rational inattention. The main reference is the seminal work of Shannon (1948).
Drawing from the information theory literature, I provide an overview of Shannon's axiomatic characterization of entropy and mutual information and show the main theoretical features of these two quantities.

Formally, the starting point is a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$. Suppose for a moment that these probabilities are known but that is all we know concerning which event will occur. The quantity $H = -\sum_i p_i \log p_i$ is called the entropy of the set of probabilities $p_1, \ldots, p_n$. If $x$ is a chance variable, then $H(x)$ indicates its entropy; thus $x$ is not an argument of a function but a label for a number, to differentiate it from $H(y)$, say, the entropy of the chance variable $y$. Quantities of the form $H = -\sum_i p_i \log p_i$ play a central role in Information Theory as measures of information, choice and uncertainty. The quantity $H$ goes by the name of entropy[35] and $p_i$ is the probability of a system being in cell $i$ of its phase space.

[35] See, for example, R. C. Tolman, Principles of Statistical Mechanics, Oxford, Clarendon, 1938.

The measure of how much choice is involved in the selection of the events is $H(p_1, p_2, \ldots, p_n)$ and it has the following properties:

Axiom 1 $H$ is continuous in the $p_i$.

Axiom 2 If all the $p_i$ are equal, $p_i = \frac{1}{n}$, then $H$ should be a monotonic increasing function of $n$. With equally likely events there is more choice, or uncertainty, when there are more possible events.

Axiom 3 If a choice is broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$.

Theorem 2 of Shannon (1948) establishes the following result:

Theorem 1 The only $H$ satisfying the three above assumptions is of the form:

$$H = -K \sum_{i=1}^{n} p_i \log p_i$$

where $K$ is a positive constant to account for the change in unit of measurement.

Figure a: Entropy of two choices with probability $p$ and $q = 1 - p$ as a function of $p$.

Remark 1. $H = 0$ if and only if all the $p_i$ but one are zero, with the one remaining having the value unity. Thus only when we are certain of the outcome does $H$ vanish. Otherwise $H$ is positive.

Remark 2. For a given $n$, $H$ is a maximum and equal to $\log n$ when all the $p_i$ are equal (i.e., $\frac{1}{n}$). This is also intuitively the most uncertain situation.

Remark 3. Suppose there are two random variables, $X$ and $Y$, with joint distribution $p(x, y)$. The entropy of the joint event is $H(X, Y) = -\sum_{x,y} p(x, y) \log p(x, y)$, while

$$H(Y) = -\sum_{x,y} p(x, y) \log \sum_x p(x, y)$$

and $H(X)$ is defined analogously. Moreover,

$$H(X, Y) \le H(X) + H(Y)$$

with equality only if the events are independent (i.e., $p(x, y) = p(x)p(y)$). This means that the uncertainty of a joint event is less than or equal to the sum of the individual uncertainties.

Remark 4. Any change toward equalization of the probabilities $p_1, p_2, \ldots, p_n$ increases $H$. Thus if $p_1 < p_2$, an increase in $p_1$, or a decrease in $p_2$, that makes the two probabilities more alike results in an increase in $H$. The intuition is that equalizing the probabilities of two events makes them harder to tell apart and therefore increases uncertainty about their occurrence. More generally, if we perform any "averaging" operation on the $p_i$ of the form $p'_i = \sum_j a_{ij} p_j$ where $\sum_i a_{ij} = \sum_j a_{ij} = 1$, and all $a_{ij} \ge 0$, then in general $H$ increases.[36]

[36] The only case in which $H$ remains unchanged is when the transformation results in just one permutation of the $p_j$.

Remark 5. Given two random variables $X$ and $Y$ as in Remark 3, not necessarily independent, for any particular value $x$ that $X$ can assume there is a conditional probability $p_x(y)$ that $Y$ has the value $y$. This is given by

$$p_x(y) = \frac{p(x, y)}{\sum_y p(x, y)}.$$

The conditional entropy of $Y$ is then defined as $H_X(Y)$, and it is the average of the entropy of $Y$ for each possible realization of the random variable $X$, weighted according to the probability of getting a particular realization $x$. In formulae,

$$H_X(Y) = -\sum_{x,y} p(x, y) \log p_x(y).$$

This quantity measures the average amount of uncertainty in $Y$ after knowing $X$. Substituting the value of $p_x(y)$ delivers

$$H_X(Y) = -\sum_{x,y} p(x, y) \log p(x, y) + \sum_{x,y} p(x, y) \log \sum_y p(x, y) = H(X, Y) - H(X)$$

or

$$H(X, Y) = H(X) + H_X(Y).$$

This formula has a simple interpretation. The uncertainty (or entropy) of the joint event $X, Y$ is the uncertainty of $X$ plus the uncertainty of $Y$ after learning the realization of $X$.

Remark 6. Combining the results in Remark 3 and Remark 5, it is possible to recover

$$H(X) + H(Y) \ge H(X, Y) = H(X) + H_X(Y).$$

This reads $H(Y) \ge H_X(Y)$ and implies that the uncertainty of $Y$ is never increased by knowledge of $X$. If the two random variables are independent, then the entropy will remain unchanged.
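The remarks above are easy to verify on a small example. The sketch below is illustrative only: it takes $K = 1/\ln 2$ so that entropy is measured in bits, and the joint distribution `joint` is a hypothetical two-by-two table, not data from the paper. It checks Remarks 1, 2 and 4 on simple probability vectors and Remarks 3, 5 and 6 on the joint table.

```python
import math

def entropy(probs):
    """H = -sum p log2 p, in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Marginal distributions of X (rows) and Y (columns)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def joint_entropy(joint):
    """H(X, Y) = -sum_{x,y} p(x, y) log2 p(x, y)."""
    return entropy([p for row in joint for p in row])

def conditional_entropy_y_given_x(joint):
    """H_X(Y) = -sum_{x,y} p(x, y) log2 p_x(y), with p_x(y) = p(x, y) / p(x)."""
    h = 0.0
    for row in joint:
        px = sum(row)
        for pxy in row:
            if pxy > 0:
                h -= pxy * math.log2(pxy / px)
    return h

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
joint = [[0.30, 0.10],
         [0.05, 0.55]]
px, py = marginals(joint)

print(entropy([1.0, 0.0, 0.0]))                 # Remark 1: certainty
print(entropy([0.25] * 4), math.log2(4))        # Remark 2: maximum is log n
print(entropy([0.1, 0.9]), entropy([0.3, 0.7]), entropy([0.5, 0.5]))  # Remark 4
print(joint_entropy(joint), entropy(px) + entropy(py))                # Remark 3
print(entropy(px) + conditional_entropy_y_given_x(joint))             # Remark 5
print(entropy(py), conditional_entropy_y_given_x(joint))              # Remark 6
```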

To substantiate the interpretation of entropy as the rate of generating information, it is necessary to link $H$ with the notion of a channel. A channel is simply the medium used to transmit information from the source to the destination, and its capacity is defined as the rate at which the channel transmits information. A discrete channel is a system through which a sequence of choices from a finite set of elementary symbols $S_1, \ldots, S_n$ can be transmitted from one point to another. Each of the symbols $S_i$ is assumed to have a certain duration in time, $t_i$ seconds. It is not required that all possible sequences of the $S_i$ be capable of transmission on the system; certain sequences only may be allowed. These sequences will be possible signals for the channel. Given a channel, one may be interested in measuring its capacity to transmit information. In general, with different lengths of symbols and constraints on the allowed sequences, the capacity of the channel is defined as:

Definition 2 The capacity $C$ of a discrete channel is given by

$$C = \lim_{T \to \infty} \frac{\log N(T)}{T}$$

where $N(T)$ is the number of allowed signals of duration $T$.

To explain the argument in a very simple case, consider transmitting files via computers. The speed at which one can exchange documents depends on the internet connection and is expressed in bits per second. The maximum number of bits per second that can be transmitted is negotiated with the provider. However, this does not mean that the computer will always be transmitting data at this rate; this is the maximum possible rate, and whether or not the actual rate reaches this maximum depends on the usage and on the source of information that feeds the channel.

The link between channel capacity and entropy is illustrated by the following Theorem 9 of Shannon:

Theorem 3 Let a source have entropy $H$ (bits per symbol) and a channel have a capacity $C$ (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate $\frac{C}{H} - \varepsilon$ symbols per second over the channel, where $\varepsilon$ is arbitrarily small. It is not possible to transmit at an average rate greater than $\frac{C}{H}$.

The intuition behind this result is that, by selecting an appropriate coding scheme, the entropy of the symbols on a channel achieves its maximum at the channel capacity.

Alternatively, channel capacity can be related to mutual information.

Definition 4 The Mutual Information between two random variables $X$ and $Y$ is defined as the average reduction in uncertainty of random variable $X$ achieved upon the knowledge of the random variable $Y$.

In formulae, mutual information is

$$I(X; Y) \equiv H(X) - E\bigl(H(X \mid Y)\bigr),$$

which says that the mutual information is the average reduction in uncertainty of $X$ due to the knowledge of $Y$ or, symmetrically, the reduction in uncertainty of $Y$ due to the knowledge of $X$. Mutual information is invariant to one-to-one transformations of $X$ and $Y$, depending only on their copula. Intuitively, $I(X; Y)$ measures the amount of information that two random variables have in common. The capacity of the channel is then alternatively defined by

$$C = \max_{p(Y)} I(X; Y)$$

where the maximum is with respect to all possible information sources used as input to the channel (i.e., the probability distribution of $Y$, $p(Y)$). If the channel is noiseless, $E\bigl(H_y(x)\bigr) = E\bigl(H(X \mid Y)\bigr) = 0$. For example, think about a newspaper editor who wants to maximize his sales. To do that, he has to choose the allocation of space for his articles in such a way that it is attractive for the consumers. In this example, $Y$ is the random variable space, $X$ the random variable sales, the channel's capacity is the maximum number of pages in the newspaper, and the channel itself is the best allocation of space to the articles, which signals that the journal is worth buying.
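The definitions of mutual information and capacity can be illustrated on the simplest noisy channel. The sketch below is an example under stated assumptions, not part of the paper: it uses a binary symmetric channel with input $Y \sim \mathrm{Bernoulli}(q)$ and error probability `eps` (both hypothetical choices), measures information in bits, and approximates $C = \max_{p(Y)} I(X; Y)$ by a grid search over $q$. For this channel the maximum is known to be $1 - H(\varepsilon, 1 - \varepsilon)$, attained at the uniform input, which the code reproduces.

```python
import math

def entropy(probs):
    """H = -sum p log2 p, in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - E[H(X|Y)] for a joint pmf with rows x and columns y."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_x_given_y = 0.0
    for j, pyj in enumerate(py):
        if pyj > 0:
            h_x_given_y += pyj * entropy([row[j] / pyj for row in joint])
    return entropy(px) - h_x_given_y

def bsc_joint(q, eps):
    """Joint pmf p(x, y): input Y ~ Bernoulli(q), output X flips Y w.p. eps."""
    return [[(1 - q) * (1 - eps), q * eps],
            [(1 - q) * eps, q * (1 - eps)]]

def capacity_by_grid(eps, steps=1000):
    """Approximate max over input distributions of I(X;Y); returns (C, q*)."""
    return max((mutual_information(bsc_joint(k / steps, eps)), k / steps)
               for k in range(steps + 1))

eps = 0.1
cap, q_star = capacity_by_grid(eps)
print(cap, q_star, 1 - entropy([eps, 1 - eps]))

# Noiseless channel: E[H(X|Y)] = 0, so I(X;Y) equals the entropy of the input.
noiseless = mutual_information(bsc_joint(0.3, 0.0))
print(noiseless, entropy([0.3, 0.7]))

# Pure noise (eps = 0.5): input and output are independent, so I(X;Y) = 0.
print(mutual_information(bsc_joint(0.3, 0.5)))
```

The symmetry stated in the text can be checked by transposing the joint table: swapping the roles of $X$ and $Y$ leaves the computed mutual information unchanged.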
