feds · November 2, 2017

Mechanics of linear quadratic Gaussian rational inattention tracking problems

Abstract

This paper presents a general framework for constructing and solving the multivariate static linear quadratic Gaussian (LQG) rational inattention tracking problem. We interpret the nature of the solution and the implied action of the agent, and we construct representations that formalize how the agent processes data. We apply this infrastructure to the rational inattention price-setting problem, confirming the result that a conditional response to economic shocks is possible, but casting doubt on a common assumption made in the literature. We show that multiple equilibria and a social cost of increased attention can arise in these models. We consider the extension to the dynamic problem and provide an approximate solution method that achieves low approximation error for many applications found in the LQG rational inattention literature.

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

Mechanics of linear quadratic Gaussian rational inattention tracking problems

Chad Fulton

2017-109

Please cite this paper as: Fulton, Chad (2017). “Mechanics of linear quadratic Gaussian rational inattention tracking problems,” Finance and Economics Discussion Series 2017-109. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2017.109.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Chad Fulton*

JEL Classification: D81, D83, E31

Keywords: Rational inattention, information acquisition, signal extraction

* chad.t.fulton@frb.gov. The views expressed in this paper are solely the responsibility of the author and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System, or anyone else in the Federal Reserve System.

1 Introduction

Models incorporating rational inattention, in which agents faced with limited information processing capacity optimally allocate their attention across various economic shocks, can accommodate a wide range of behavior that deviates from the rational expectations baseline. They have been used to explain the sluggish responses to shocks observed for many macroeconomic time series, they imply behavior similar to standard logit models when applied to discrete choice problems, and they can result in discrete behavior by agents even when the underlying economic shocks that influence the agent are continuously distributed.1 Despite their appeal, the technical challenges are such that explicit solutions have not been found for most problems. In this paper we derive an explicit solution for and give a comprehensive account of a foundational model: a multivariate static problem in which all shocks are Gaussian and the objective function of the agent is quadratic. These so-called static linear quadratic Gaussian problems are the most tractable class of rational inattention problems, but, even so, a full solution has been previously unknown. In addition, the model considered in this paper serves as an important special case of more complex dynamic models, and has been used to establish baseline results and provide intuition in many applications. Along these lines, much of the analysis and interpretation that we will develop in this paper will extend to the dynamic case.

Our first step is to lay a firm groundwork, since a variety of ways even to formulate the problem have arisen. We begin by writing down our preferred formulation, following Sims (2003) and Sims (2010), and explaining its relation to the classic signal extraction problem.
In short, an agent must choose the optimal posterior covariance matrix for a vector of shocks given a loss function and subject to a constraint on how much uncertainty can be reduced relative to their prior. Our formulation can include an arbitrary number of shocks, potentially correlated, and can incorporate the information constraint in terms of a fixed quantity of information processing capacity or a fixed marginal cost associated with processing additional information. Throughout the paper, we clarify the relationship between this and alternative statements of the problem. In particular, we will take a closer look at the often-used formulation in which agents choose the noise variance of "signals" received by the agent, which, we will argue, can encourage misleading comparisons with the signal extraction problem.

1 For sluggishness in macroeconomic series, see the price-setting model of Maćkowiak and Wiederholt (2009), the permanent income model of Sims (2003), or the numerous references contained in Sims (2010). For rational inattention as applied to discrete choice models, see Matějka and McKay (2015) or Steiner et al. (2017). For discrete actions in continuous settings, see Jung et al. (2015).

After establishing the problem, we immediately present the general solution in two theorems. We show that the crucial element in constructing and understanding the solution lies in recognizing that the agents are not just choosing how much posterior uncertainty about shocks is optimal, they are also choosing the form of the posterior uncertainty. An illuminating example of this is given in Sims (2010): if a rationally inattentive agent wishes to track the sum of $n$ random variables, then they will process information so as to make their posterior uncertainty about those random variables negatively correlated, even if the variables themselves are independent. We show how to construct what we call the canonical synthetic shocks (or just “canonical shocks”), specific linear combinations of the original, or “fundamental”, shocks that capture the optimal form of posterior uncertainty chosen by a rationally inattentive agent. Understanding these canonical shocks is the key to solving the problem and understanding the implications of the solution, and their careful definition is one contribution of this paper.

While the fundamental shocks that exist as part of the formulation of the economic model may appear natural to the modeler, we argue that it is instead the canonical shocks that are natural for the agent within the model.
We show that the canonical shocks represent the separate and distinct elements of uncertainty that actually matter to the agent. In fact, the solution to the problem is exactly constructed by transforming the problem into the “canonical space”, and we provide a straightforward intuition of this by geometrizing the problem in terms of ellipsoids representing uncertainty. Then, given the solution, the agent’s action, that is, their posterior estimate of each individual component of the canonical shock, turns out to be a simple Bayesian update, a weighted average of the agent’s prior for that component and their understanding of the incoming data.

Moreover, using the canonical shocks we can construct a representation of the incoming data as understood by the agent that gives an intuitive sense of how the agent produced their posterior through information processing. While the form these representations take is consistent with the concept of an “observation” or “signal” as in a signal extraction problem, a crucial point is that any given representation is simply a device that assists us in characterizing the agent’s decision. Representations are not unique, and we show how to construct the class of representations that would be valid for a given problem. We characterize the useful subset of these representations as “feasible”, and show that all feasible representations are only transformations of the representation constructed in terms of the canonical shocks. Importantly, we show that whereas this canonical representation always exists, in most cases there does not exist a feasible representation in terms of the “fundamental” shocks. This underscores that while the fundamental shocks may be of interest to the modeler, they are not the objects of interest to the agent.

Finally, we present the “representation form” of the problem, and show that it is less useful than the canonical form. We also describe the related form of the problem, mentioned above, in which agents choose the noise variance of “signals”, and we show how issues can arise through the incautious application of this last formulation.

As an application, we consider the rational inattention price-setting problem of Maćkowiak and Wiederholt (2009).
We start by showing how to cast the static case of their problem, including their “independence assumption”, in the terms of this paper and then solve it along with three new formulations that we introduce. In contrast to the involved derivations that previous papers have often had to rely on, the exact solution to the general problem that we derive in this paper yields the results immediately. In comparing these solutions, we find that the key result of Maćkowiak and Wiederholt (2009), a conditional response to different types of fundamental shocks, survives dropping the independence assumption, and we also present new results, including the introduction of multiple equilibria and the possibility that additional information processing capacity actually increases social costs.2

The more general dynamic RI-LQG tracking problem remains unsolved by the methods of this paper. Despite this, many key concepts, including the canonical synthetic shocks, the agent as a Bayesian updater, and our treatment of representations, do apply in the dynamic problem. We present this problem and show that the sequential application of the static solution combined with iteration of the dynamic transition equation approximates the full dynamic solution, and that the approximation error will be low as long as the parameter capturing the marginal cost of attention is close to zero. Since this condition holds in most existing applications of dynamic RI-LQG tracking problems in the literature, we conclude that the static approximation is a useful tool, particularly since no analytic solution so far exists and numerical solutions can be difficult to obtain.

This paper is most closely related to Sims (2003) and Sims (2010), to which we owe our basic formulation for the class of RI-LQG tracking problems. Additionally, in these two papers can be found the seeds of many of the concepts we make explicit and fully develop here for the static case. This paper is also related and complementary to Matějka et al. (2017), as both of our papers provide explicit solutions for special cases of the dynamic RI-LQG tracking problem.
Whereas we consider the static version of the problem with multiple targets and arbitrary correlations and present an approximate solution in the dynamic case, they consider the dynamic problem with a single ARMA(p,q) target.

2 The latter result recalls Morris and Shin (2002), except that here the incompleteness of information is endogenous.

2 Preliminaries

Here we introduce a few mathematical results related to information theory and generalized eigenvalue problems; these will be used throughout the rest of the paper.

2.1 Information theory

It is most transparent to introduce the concepts of information theory for the case of discrete random variables, and so in what follows we will let $X$ and $Y$ denote random variables with probability mass functions $P_X$ and $P_Y$. For the results in the paper, we will be making use of an extension to the continuous case known as differential entropy. Although this extension is broadly consistent with the discrete case, there are subtleties that must be accounted for; we point out a few examples of this below.

2.1.1 Entropy

The basic quantity in information theory is entropy, a measure of the uncertainty associated with a random variable. Entropy is defined as:3

$$h(X) = -E[\log(P(X))]$$

Entropy is typically measured in “bits”, where a bit is the quantity of uncertainty associated with a Bernoulli trial with probability of success $p = 0.5$. Thus a bit is a quantification of the uncertainty resolved by the realization of a single coin flip.

We can also define joint entropy $h(X, Y) = -E[\log(P(X, Y))]$ and conditional entropy $h(X \mid Y) = -E[\log(P(X \mid Y))]$. Conditional entropy can be thought of as the uncertainty about $X$ that remains after observing $Y$. The “chain rule” of entropy states $h(X, Y) = h(X) + h(Y \mid X)$; in words, the uncertainty about $X$ and $Y$ together is the uncertainty about $X$ plus the uncertainty about $Y$ that remains after observing $X$.

3 Often entropy of a discrete random variable is denoted $H(X)$ and entropy of a continuous random variable, known as differential entropy, is denoted $h(X)$. To simplify notation, we will use $h(\cdot)$ for both cases.

If $X \perp Y$, then $X$ does not resolve any uncertainty about $Y$ and so $h(Y \mid X) = h(Y)$. Then by the chain rule $h(X, Y) = h(X) + h(Y)$, so that the uncertainty about $X$ and $Y$ together is just the sum of the uncertainty about $X$ and $Y$ separately. In the degenerate case that $X = Y$, observing $X$ fully resolves the uncertainty about $Y$. If $X$ and $Y$ are discrete, then $h(Y \mid X) = 0$ and so $h(X, Y) = h(X) + h(Y \mid X) = h(X)$. However, in the continuous case, $h(Y \mid X) = -\infty$; this is an example of one subtlety that arises in information theory when moving from discrete to continuous random variables.

2.1.2 Mutual Information

Mutual information is a measure of the information about one random variable contained in another. Formally:

$$I(X; Y) = h(X) - h(X \mid Y)$$

This can be understood as the quantity of uncertainty about $X$ resolved after the observation of $Y$. For example, if $X \perp Y$, then $h(X \mid Y) = h(X)$ and $I(X; Y) = 0$. This is true in both the discrete and continuous cases.

At the other extreme, if $X = Y$ then in the discrete case $h(X \mid Y) = 0$ and so $I(X; Y) = h(X)$. Thus observing $Y$ resolves all uncertainty about $X$, and since the “quantity” of uncertainty about $X$ is given by the entropy $h(X)$, this is also the quantity of mutual information. However, in the continuous case, $h(X \mid Y) = -\infty$ so that $I(X; Y) = \infty$. In fact, this too is an intuitive result reflecting the fact that a continuous random variable can take on an uncountably infinite number of values. By mapping each possible value to a “message” of arbitrary content, it is clear that we can transmit as much information as we like through the realization of a continuous random variable.

2.1.3 Information theoretic results

Here we state some well-known properties of entropy and mutual information; see for example Cover and Thomas (2006) for details.

Property 1: Entropy is invariant under translation. Let $W, X$ be arbitrary random vectors and let $c$ be a $W$-measurable function. Then:

$$h(X + c(W) \mid W) = h(X \mid W)$$

Corollary: Mutual information is invariant under translation by a constant.

Property 2: Conditioning weakly reduces entropy. Let $W, X$ be arbitrary random vectors. Then:

$$h(X) \geq h(X \mid W)$$

Property 3: Mutual information is invariant under invertible transformations. Let $W, X, Y$ be arbitrary random vectors and let $f, g$ be bijective functions. Then:

$$I(X; Y \mid W) = I(f(X), g(Y) \mid W)$$

Corollary: As a consequence of properties 1 and 3, if $F, G$ are nonsingular conformable matrices and $c, d$ are constants, then:

$$I(X; Y \mid W) = I(FX + c, GY + d \mid W)$$
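The corollary to Property 3 can be checked numerically in the jointly Gaussian case. The sketch below is illustrative only and is not part of the paper's derivations: it assumes the standard log-determinant expression for the mutual information of jointly Gaussian vectors, and the function name `gaussian_mutual_information` and the matrices `F` and `G` are hypothetical choices made for the example. Because the constants $c$ and $d$ do not affect covariances, only the linear maps enter the computation.

```python
import numpy as np


def gaussian_mutual_information(cov, n_x):
    """Mutual information (in nats) between the first n_x coordinates and the
    remaining coordinates of a jointly Gaussian vector with covariance `cov`.

    Uses the standard identity I = 0.5 * (ln|Cov_X| + ln|Cov_Y| - ln|Cov|).
    """
    _, logdet_x = np.linalg.slogdet(cov[:n_x, :n_x])
    _, logdet_y = np.linalg.slogdet(cov[n_x:, n_x:])
    _, logdet_joint = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_joint)


rng = np.random.default_rng(0)
n_x, n_y = 2, 3

# An arbitrary positive definite joint covariance for (X, Y).
A = rng.standard_normal((n_x + n_y, n_x + n_y))
cov = A @ A.T + np.eye(n_x + n_y)

# Hypothetical nonsingular transformations F (2 x 2) and G (3 x 3).
F = np.array([[2.0, 1.0], [0.0, 3.0]])
G = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]])

# Covariance of (F X + c, G Y + d); the constants c and d drop out.
T = np.block([[F, np.zeros((n_x, n_y))], [np.zeros((n_y, n_x)), G]])
cov_transformed = T @ cov @ T.T

mi_original = gaussian_mutual_information(cov, n_x)
mi_transformed = gaussian_mutual_information(cov_transformed, n_x)
print(mi_original, mi_transformed)
```

The two printed values should agree up to floating point error, since the determinants of $F$ and $G$ cancel between the marginal and joint terms.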

Property 4: Let $X$ be a random vector, and consider all possible distributions for $X$ such that $\mathrm{Var}(X) = P$ is fixed. Then the differential entropy is maximized when $X$ is jointly Gaussian.

Property 5: Let $X, Y$ be jointly Gaussian random vectors of dimension $n$, possibly conditional on some information $\mathcal{I}_-$, and let $\mathrm{Var}(X \mid \mathcal{I}_-) = P_-$ and $\mathrm{Var}(X \mid \mathcal{I}_-, Y) = P_+$. Then:

$$h(X \mid \mathcal{I}_-) = \frac{1}{2} \log_b |2 \pi e P_-|$$

$$h(X \mid \mathcal{I}_-, Y) = \frac{1}{2} \log_b |2 \pi e P_+|$$

$$I(X, Y \mid \mathcal{I}_-) = h(X \mid \mathcal{I}_-) - h(X \mid \mathcal{I}_-, Y) = \frac{1}{2} \left( \log_b |P_-| - \log_b |P_+| \right)$$

We have not specified the base of the logarithm in Property 5, since different bases simply correspond to different measures of mutual information; for example, if the base is 2 then mutual information is measured in bits, whereas if the base is $e$ then mutual information is measured in nats.

2.2 Generalized eigenvalue problems

The generalized eigenvalue problem for two matrices $A, B$ is to find scalars $\lambda_i$ and vectors $r_i$ such that the following equation holds:4

$$(A - \lambda_i B) r_i = 0, \quad i = 1, \dots, n$$

In what follows, we will be interested in the specialization in which $A, B$ are symmetric positive semidefinite matrices. In fact, we will usually consider cases in which $B$ is positive definite, and then since $B$ is nonsingular it is easy to see that left multiplication by $B^{-1}$ yields a standard eigenvalue problem $(B^{-1} A - \lambda_i I) r_i = 0$. However it turns out that applying this transformation often obscures the form of the solution since $B^{-1} A$ may not be positive semidefinite and is generally not even symmetric. The matrix $A - \lambda B$ is often referred to as a matrix pencil and denoted by the pair $(A, B)$.

4 In this section, we use the notation $\lambda$ and $\Lambda$ differently than we will in the rest of the paper.

For positive semidefinite matrices $A, B$, the generalized eigenvalue problem can be solved via simultaneous diagonalization of $A, B$ by congruence. We state this result as a lemma.

Lemma 1:5 If $A$ and $B$ are real symmetric positive semidefinite matrices of order $n$ and $\mathrm{rk}(B) = r$, then:

a. There exists a nonsingular matrix $S$ such that $B = S' (I_r \oplus 0_{n-r}) S$ and $A = S' \Lambda S$, in which $\Lambda$ is nonnegative diagonal and $\mathrm{rk}(A) = \mathrm{rk}(\Lambda)$.

b. Defining $R \equiv S^{-1} = [\, r_1 \; \dots \; r_n \,]$ and $\Lambda = \mathrm{diag}(\{\lambda_i\}_{i=1}^n)$, the pairs $(\lambda_i, r_i)$ solve the generalized eigenvalue problem associated with the matrix pencil $(A, B)$. The scalars $\lambda_i$ are called generalized eigenvalues and the vectors $r_i$ are called generalized right eigenvectors.

c. If $B$ is positive definite, there is a unique factorization $M' M = B$, where $M$ is nonsingular. Defining $L = M^{-1}$, we can compute the eigendecomposition $Q \Lambda Q' = L' A L$. Then this matrix $\Lambda$ along with $S = Q' M$ satisfy (a) and (b).

5 Proofs of all results in this paper are given in Appendix A.

An important element of generalized eigenvalue problems is that the matrix containing generalized eigenvectors is not orthogonal with respect to the usual inner product, i.e. in general $R' R \neq I$. However, if $B$ is positive definite, we can define a valid inner product induced by $B$ as $\langle x, y \rangle_B$. That the generalized eigenvectors are $B$-orthogonal, i.e. that $R' B R = I$, follows directly from part (a) of the lemma.

Although the generalized eigenvalue problem will be crucial in several ways in the solution to the rational inattention problem considered in this paper, one important use can be immediately shown to simplify the mutual information of Gaussian random vectors.

Property 6: Let $X, Y, \mathcal{I}_-, P_-, P_+$ all be defined as in Property 5. Then we can write:

$$I(X, Y \mid \mathcal{I}_-) = \frac{1}{2} \sum_{i=1}^{n} \log_b \frac{1}{n_i}$$

where $n_i$ denote the generalized eigenvalues of the matrix pencil $(P_+, P_-)$.6 Importantly, this property applies to both static and dynamic rational inattention problems.

6 We use the notation $n_i$ instead of $\lambda_i$ in order to make a notational connection with the following sections.

3 Problem

Rational inattention problems fall into the larger class of problems in which agents must make decisions under imperfect information. In classical imperfect information problems, the information structure of the economy is often exogenously imposed. The rational inattention approach, introduced by Sims (2003), is one way to endogenize information imperfections as the rational behavior of agents that face constraints on the extent to which they can process and translate information into actions, even in the case that the information itself is freely available.

3.1 Exogenous information imperfections

We begin by briefly describing the classical signal extraction problem, one of the most common models of imperfect information, in which the characteristics of the signal and noise are exogenous. This is valuable because it will turn out that the rational inattention problem can be cast in the form of specific signal extraction problems. However, as we will show below, the signal extraction formulation of the rational inattention problem is not unique. A more fundamental representation of the rational inattention problem is in terms of a generalization of signal extraction problems known as tracking problems, which we also briefly introduce. This will allow us, in the next section, to describe the specific application to rationally inattentive tracking problems, and to present the problem and solution in the further special case known as the linear quadratic Gaussian (LQG) case.

3.1.1 Signal extraction problems

Given an unknown random vector7 of interest $\alpha$ and a given observation vector $y = h(\alpha, \varepsilon)$, where $\varepsilon$ is an independent random vector representing contaminating noise and $h$ is some measurable function, a signal extraction problem is to select a second function $a(y)$ such that the expected distance between $\alpha$ and $a(y)$ is “small” according to some distance, or loss, function $d$. The signal extraction problem can be formulated as:

$$\min_{a(y)} \int d(\alpha, a(y)) f(\alpha \mid y) \, d\alpha$$

If loss is quadratic in $\alpha - a(y)$, so that the problem is to minimize the (weighted) mean square error, then the solution is well known to be the conditional expectation $a(y) = E[\alpha \mid y]$. If it is also the case that $y$ and $\alpha$ are jointly Gaussian, then it is similarly well known that the conditional expectation is a linear function, $a(y) = a_0 + K y$.

7 We derive all results in terms of random vectors, but everything remains valid for the 1-dimensional random variable case.

The well-known Kalman filter recursively solves a dynamic version of the signal extraction problem in which the loss is quadratic, all variables are jointly Gaussian, and the vector of interest $\alpha$ follows a linear transition law. This case is referred to as a linear quadratic Gaussian (LQG) filtering problem. Because the static signal extraction problem introduced above is a special case of the recursive problem, we will also refer to it as an LQG signal extraction problem.
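To make the linear form $a(y) = a_0 + K y$ concrete, the following minimal sketch computes $a_0$ and $K$ for one particular LQG signal extraction problem. It rests on assumptions that are not in the text above: the observation is taken to be additive, $y = \alpha + \varepsilon$, with $\alpha \sim N(\bar{\alpha}, \Omega)$ and independent noise $\varepsilon \sim N(0, \Sigma_\varepsilon)$. The function name `lqg_signal_extraction` and all numerical values are hypothetical.

```python
import numpy as np


def lqg_signal_extraction(alpha_bar, omega, sigma_eps):
    """Static LQG signal extraction, assuming y = alpha + eps with
    alpha ~ N(alpha_bar, omega) and independent eps ~ N(0, sigma_eps).

    Returns (a0, K, posterior_cov) such that E[alpha | y] = a0 + K @ y.
    """
    var_y = omega + sigma_eps
    # K = omega @ inv(var_y); both matrices are symmetric, so solve and transpose.
    K = np.linalg.solve(var_y, omega).T
    a0 = alpha_bar - K @ alpha_bar
    posterior_cov = omega - K @ omega
    return a0, K, posterior_cov


alpha_bar = np.array([1.0, -1.0])
omega = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma_eps = 0.5 * np.eye(2)

a0, K, posterior_cov = lqg_signal_extraction(alpha_bar, omega, sigma_eps)

# Apply the linear rule to one hypothetical observation.
y = np.array([1.5, -0.2])
a_of_y = a0 + K @ y
print(a_of_y)
print(posterior_cov)
```

In this sketch the noise covariance `sigma_eps` is exogenous. The rational inattention problem introduced below differs precisely in that the posterior covariance is chosen by the agent rather than implied by a given observation equation.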

3.1.2 Tracking problems

To more clearly formulate the rational inattention problem and its solution below, we distinguish between a signal extraction problem and a “tracking” problem. Here, a tracking problem is a generalization of a signal extraction problem in which an observation vector is not a given fundamental component. Instead, the problem is:

$$\min_{f(\cdot)} \int d(\alpha, a) f(\alpha, a \mid \mathcal{I}) \, da \, d\alpha$$

such that $f$ is a valid joint density function for $(\alpha, a)$ and is potentially conditional on some given prior information set $\mathcal{I}$. We refer to $\alpha$ as the “target” or “state” and $a$ as the “action”.8 In the case that the loss is quadratic in $\alpha - a$ and the variables are jointly Gaussian, we refer to this as an LQG tracking problem.

8 It may be useful to have in mind some sport in which a player must track the position of a ball (the target) in order to place their foot so that it will meet the ball (their action). Their action depends on where they perceive the ball to be, and they wish to make that perception as close as possible to where the ball actually is.

If there are no constraints, then the solution is to choose $f$ such that $a = \alpha$ with probability 1. Then $f$ is degenerate and expected losses are zero. To specify an interesting tracking problem, some constraint must be added. For example, the signal extraction problem above is a specialization of the tracking problem in which a constraint is placed on the form of $a$, so that $a$ must be a measurable function of an exogenous observation $y$.

For what follows, it is notationally convenient to rewrite the tracking problem as $\min_a E[d(\alpha, a) \mid \mathcal{I}]$ where it is understood that the expectation is with respect to the joint distribution of $(\alpha, a)$ conditional on the marginal distribution of $\alpha$ and the prior information $\mathcal{I}$, and that the minimization is either over that joint distribution directly or, equivalently, over the conditional distribution of $a \mid \alpha, \mathcal{I}$.

3.2 Endogenizing imperfect information through rational inattention In rational inattention problems, all information is generally supposed to be freely observablesaveforaconstraintontheinformationprocessingcapacityoftheagent. Iftherelevant information can be expressed as a random vector 𝛼, then we will write the agent’s perception of that information after processing as 𝑎 . Because the agent wishes to make 𝑎 as + + close to 𝛼 as possible given some constraint, this is often naturally formulated in terms of atrackingproblem,andsowewillrefertothe𝛼 asthetargetand𝑎 astheaction. + Theconstraintinarationalinattentionproblemisformalizedusingthemutualinformation betweentargetandaction,𝐼(𝛼,𝑎 ). Asdescribedabove,thisquantificationof“information + processed” has various desirable properties and a natural interpretation: it is the quantity of uncertainty about the target resolved by the agent in the process of taking their action. There are two primary ways of formulating this constraint. The first allows agents a fixed processing capacity 𝜅 and requires that 𝐼(𝛼,𝑎 ) 𝜅; we will refer to this as the “fixed + ≤ capacity” or “fixed 𝜅” formulation. The second allows agents to access any amount of informationprocessingcapacityatafixedmarginalcost𝜆*;wewillrefertothisasthe“fixed marginal cost” or “fixed 𝜆” formulation. As we will show below, these approaches lead to largelysimilarstatementsoftheproblemandsolution,buttheyhavedifferentimplications incomparativestaticsexercises. 3.3 Rational inattention tracking problems Therationalinattentiontrackingproblemis:9 min𝐸[𝑑(𝛼,𝑎 ) ]+𝜆*𝐼(𝛼,𝑎 ) + − + − 𝑎+ | ℐ | ℐ 9SeeSims(2010)formoredetailsregardingthisformulationoftheproblem. 14

where𝜆*isinterpretedeitherasacostparameterorasaLagrangemultiplierforaconstraint 𝐼(𝛼,𝑎 ) 𝜅; these interpretations correspond respectively to the fixed marginal cost + − | ℐ ≤ and fixed capacity constraints introduced earlier, and we will provide an explicit solution for each case. Note that here and in what follows we will denote the prior information set as and the action as 𝑎 to emphasize the processing of new information. The function − + ℐ 𝐼(𝛼,𝑎 )istheconditionalShannonmutualinformation,introducedabove. + − | ℐ In general this is a difficult problem to solve. However, if the loss is quadratic and 𝛼 is Gaussian,thenananalyticsolutionexists. AsdescribedinSims(2003)andSims(2010),a solutiontothisproblemmakes(𝛼,𝑎 ) jointlyGaussianandwecanwrite𝛼 = 𝑎 +𝜂, + − + | ℐ where𝑎 𝜂. Writing𝛼 𝑁(𝑎 ,𝑃 ),wecanthenspecifythecomponents: + − − − ⊥ | ℐ ∼ 𝑎 𝑁(𝑎 ,𝑃 𝑃 ) + − − − + | ℐ ∼ − 𝜂 𝑁(0,𝑃 ) − + | ℐ ∼ 𝛼 ,𝑎 𝑁(𝑎 ,𝑃 ) − + + + | ℐ ∼ Thenitisclearthatthisoptimalactionisaconditionalexpectation: 𝑎 = 𝐸[𝛼 ],where + + | ℐ denotes the posterior information, with . This of course immediately recalls + − + ℐ ℐ ⊆ ℐ thesolutiontothesignalextractionproblem. Acrucialpointtonoteatthisstage,however, is that we have not been explicit about the contents of the posterior information set, and we have made no mention of an observation or signal vector. In fact, we will develop the complete formulation and solution to this problem with no mention of such a vector, and the fact that we can do this makes the tracking problem, rather than the signal extraction problem, fundamental. Nonetheless, an analogy with the signal extraction problem can be usefulasanaidtointerpretation,andsowewillmaketheanalogypreciseanddrawoutits strengthsandweaknessesasweproceed. Specificationof𝑎 asaconditionalexpectationhasnotfullysolvedtheproblem,butithas + reduced the optimization space and it will allow us to present a simpler formulation. First, 15

wecansimplify𝐸[𝑑(𝛼,𝑎 ) ] = 𝐸[(𝛼 𝑎 )′𝑊(𝛼 𝑎 ) ] = 𝑡𝑟(𝑊𝑃 )where𝑊 is + − + + − + | ℐ − − | ℐ apositivesemidefinitematrixdefiningthelossfunction. Second,fromProperty5,wehave 𝐼(𝛼,𝑎 ) = 1 (log 𝑃 log 𝑃 ).10 Finally, for notational convenience we write + | ℐ − 2 𝑏 | − |− 𝑏 | + | 𝜆 = 𝜆*/(2ln𝑏) to eliminate a constant term from this form of the information constraint, andwewilloftenreferto𝜆asthemarginalcostofattention. This leads us to what might be termed the canonical formulation of the static rational inattention linear quadratic Gaussian (RI-LQG) tracking problem. This formulation is a static versionofthedynamicproblemsdescribedinSims(2003)andSims(2010). Definition1: ThestaticRI-LQGtrackingproblemrepresentedbythetuple(𝑊,𝑎 ,𝑃 )is: − − min𝑡𝑟(𝑊𝑃 )+𝜆(ln 𝑃 ln 𝑃 ) (1) + − + 𝑃+ | |− | | s.t.𝛼 𝑁(𝑎 ,𝑃 ) − − − | ℐ ∼ 𝑃 0 + ≥ 𝑃 𝑃 0 − + − ≥ wherethenotation𝑃 𝑃 0indicatesthatthedifferenceofthesematricesmustbepos- − + − ≥ itivesemidefinite. Wewillgenerallyassumethatthetarget𝛼 isan𝑛 1vectordistributed × 𝑁(𝛼¯,Ω) where rank Ω = 𝑛. Finally, we will refer to 𝛼 and 𝑎 as the “fundamental” target + and action, since we will extensively deal also with transformations of these vectors that wewillcall“synthetic”targetsandactions. Wehavethusreducedtheproblemfromoptimizationoverthespaceofrandomvariablesto optimizationovertheconeofpositivesemidefinitematrices, andwenotethatanysolution 𝑃 determinesaspecificinformationset thatwillbedescribedinmoredetailbelow. + + ℐ The problem as stated has two “positive semidefiniteness” constraints. The first requires 10 Wehaveleftthebaseofthelogarithmunspecifiedhere; inexampleswewillgenerallyassumeinformationtobemeasuredinbits. 16

that $P_+$ is a valid covariance matrix. Given that $P_-$ is full rank, the objective function grows without bound as the smallest eigenvalue of $P_+$ goes to zero, so it is clear that in any solution $P_+$ will be positive definite and this first constraint will not be binding in practice.

The second constraint, sometimes termed the “no-forgetting” constraint, is often binding, and it will turn out that handling that case is central to the full solution of the problem. This latter constraint is necessary because the problem trades off posterior uncertainty among the components of the target, so if the loss matrix $W$ assigns little weight to some component then it can be optimal to assign that component more posterior uncertainty than existed prior uncertainty. Because the introduction of new information cannot achieve this result, the constraint is necessary. Mechanically, this constraint guarantees that our formulation of $a_+ \mid \mathcal{I}_-$, above, is valid.

4 Solution

In this section, we describe the solution to the static RI-LQG tracking problem presented above in Definition 1. To begin with, we will work with the fixed marginal cost formulation, and then show the extension to the fixed capacity case.

4.1 Solution to the static LQG-RI tracking problem

It is easy to check that the first order condition to the problem yields:

\[ P_+^{-1} = W / \lambda \tag{2} \]

We cannot generally write the first order condition in terms of $P_+$, because we have not required $W$ to be nonsingular.[11] Despite this, if the positive semidefiniteness constraints are not binding, then this yields the solution to the static RI-LQG tracking problem. In the general case when the constraints may be binding, particularly the no-forgetting constraint, the solution is more complex. Before presenting the full solution in Theorem 1, some preliminaries are provided in Lemma 2.

[11] For this reason it is sometimes more convenient to work in terms of precision matrices rather than covariance matrices. However, when possible we will present results in terms of covariance matrices.

Lemma 2: Assume that the loss matrix $W$ is positive semidefinite and the prior covariance matrix $P_-$ is positive definite. Then considering the matrix pencil $(W, P_-^{-1})$ we have the following results:
a. The Cholesky factor $LL' = P_-$ is nonsingular, so that $M = L^{-1}$ exists.
b. Define $V = L'WL$. This matrix is positive semidefinite, and its eigendecomposition can be written $QDQ' = V$.
c. The matrix pencil can be simultaneously diagonalized by congruence so that $W = S'DS$ and $P_-^{-1} = S'IS$, where $S = Q'M$.
d. The generalized eigenvalues of the matrix pencil, denoted $d_i$, are the diagonal elements of the matrix $D$. It will be convenient to always arrange the generalized eigenvalues in nonincreasing order.
e. The generalized right eigenvectors of the matrix pencil, denoted $r_i$, are the columns of the matrix $R = S^{-1}$.

Theorem 1: The solution to the fixed marginal cost static RI-LQG tracking problem is given by:

\[ P_+ = R N^+ R' \tag{3} \]

where $N^+$ is a diagonal matrix with entries $n_i^+$. These diagonal elements are defined by $n_i^+ = 1/\delta_i^+$, where $\delta_i^+ = \max\{d_i/\lambda, 1\}$ and $d_i$ and $R$ are as defined in Lemma 2. As a consequence of assuming that the generalized eigenvalues $d_i$ are in nonincreasing order,
the values $n_i^+$ will be in nondecreasing order. In the following two corollaries, we state an even more explicit solution for the useful special case in which the loss matrix is rank one and we show how the elements of the solution are related to a matrix pencil of interest, $(P_+, P_-)$.

Corollary 1: If the loss matrix is rank one then we can decompose it as $W = ww'$, with $w$ an $n \times 1$ vector, and the solution to the fixed marginal cost static RI-LQG tracking problem can be written:

\[ P_+ = P_- - \frac{1 - n_1^+}{\lVert L'w \rVert^2} \, P_- W P_- \]

Corollary 2: Let $P_+$ denote the posterior covariance matrix solving the static RI-LQG tracking problem and let $s_i'$ denote the $i$-th row of the matrix $S$, defined in Lemma 2. Then $n_i^+$ is the generalized eigenvalue of the matrix pencil $(P_+, P_-)$ associated with the left generalized eigenvector $s_i'$.

In order to solve the fixed capacity version of the problem, it is useful to first define a new quantity $r$ as the integer such that $d_r > \lambda \ge d_{r+1}$ and define $d_0 = \infty$ and $d_{n+1} = -\infty$ to encompass degenerate and full rank solutions.

Theorem 2: The solution to the fixed capacity static RI-LQG tracking problem with $\kappa$ measured in base $b$ is as given in Theorem 1, except that $\lambda$ is interpreted as a shadow cost. The value of $\lambda$ that solves the problem is:

\[ \lambda = \left[ b^{-2\kappa} \prod_{i=1}^{r} d_i \right]^{1/r} \tag{4} \]

as long as $\kappa > 0$ and is undefined otherwise. The quantity $b$ is the base of the logarithm that defines the unit of information ($b = 2$ if information is measured in bits), and the quantity $r$ is defined as above, but now is determined in concert with $\lambda$. The procedure for computing
$r$ and $\lambda$ is as follows:
a. Set $r = n$.
b. Compute $\lambda$ according to equation (4), given $r$.
c. If $d_i > \lambda$ for $i = 1, \dots, r$, then this pair $(r, \lambda)$ describes the solution. Otherwise, set $r = r - 1$ and repeat from step (b).

Corollary: For the fixed capacity static RI-LQG tracking problem:
a. The shadow cost $\lambda$ is monotonic decreasing in $\kappa$, for $\kappa \in (0, \infty)$.
b. The quantity $r$ is nondecreasing in $\kappa$.

4.1.1 Canonical synthetic target

Before proceeding with implications of these theorems, we first define a new random vector that is instrumental in understanding the solution to the static RI-LQG tracking problem.

Definition 2: We define the canonical synthetic target (briefly the canonical target) as the vector $\beta_c = S\alpha$, where $S$ is the matrix of left generalized eigenvectors from the second Corollary to Theorem 1.

The canonical synthetic target is a transformation of the target vector into a new set of coordinates. The importance of this transformation and insight into the new coordinate space is given in the next lemma.

Lemma 3: The canonical synthetic target $\beta_c$ satisfies the following:
a. $\beta_c \mid \mathcal{I}_+ \sim N(b_{c,+}, N^+)$ where $b_{c,+} = S a_+$.
b. $\beta_c \mid \mathcal{I}_- \sim N(b_{c,-}, I)$ where $b_{c,-} = S a_-$.
c. $E[(\alpha - a_+)' W (\alpha - a_+) \mid \mathcal{I}_-] = E[(\beta_c - b_{c,+})' D (\beta_c - b_{c,+}) \mid \mathcal{I}_-]$
d. $I(\alpha, a_+ \mid \mathcal{I}_-) = I(\beta_c, b_{c,+} \mid \mathcal{I}_-)$
e. $I(\beta_c, b_{c,+} \mid \mathcal{I}_-) = \sum_{i=1}^{n} I(\beta_{i,c}, b_{i,c,+} \mid \mathcal{I}_-)$ where $\beta_c = (\beta_{1,c}, \cdots, \beta_{n,c})'$

Parts (c) and (d) demonstrate that the objective function can be rewritten entirely in terms of $\beta_c$. It is because of these results that we call $\beta_c$ a “synthetic” target. As we will show later, there are many transformations that allow us to reformulate the problem in terms of a variety of synthetic target vectors. Parts (b), (c), and (e) demonstrate that the elements of the canonical synthetic target are separable with respect to prior uncertainty, the loss function, and mutual information; this is the essence of the new coordinate space and, because such a vector can always be constructed, we refer to this as the canonical synthetic target. Moreover, part (a) demonstrates that the elements of the canonical synthetic target remain separable in the posterior.

Part (c) furnishes us an intuition for the generalized eigenvalues $d_i$: they define the loss function as associated with the canonical synthetic target. Because $D$ is diagonal, the element $d_i$ captures the full loss associated with the element $\beta_{i,c}$, and we thus refer to the elements $d_i$ as the canonical loss weights.

We are now in a position to state some results following from Theorems 1 and 2. These results will equally apply to the fixed $\lambda$ or fixed $\kappa$ formulations, unless otherwise noted.

4.1.2 Rank of the solution

Definition 3: We refer to $r$ as the rank of the solution to the static RI-LQG tracking problem, and we say that the solution is full rank if $r = n$.

Lemma 4:
a. $r = \mathrm{rk}(P_- - P_+)$, so the solution is full rank if and only if the no-forgetting constraint is not binding. If the solution is full rank, then the solution is given by the first-order condition.
b. $r \le \mathrm{rk}(W)$, so if $W$ is singular then the solution cannot be full rank.
c. $r$ is the number of elements for which the loss in utility caused by increased uncertainty, as measured by the canonical loss weight $d_i$, is greater than the marginal cost of additional attention, as measured by $\lambda$.
d. $r$ is the number of elements in the canonical synthetic target for which the agent processes new information.
e. In the fixed $\kappa$ formulation, if $\kappa > 0$ and $\mathrm{rk}(W) \ge 1$, then $r \ge 1$. This is in contrast to the fixed $\lambda$ case, which may have $r = 0$ even if $W$ is full rank.

4.1.3 Information capacity allocations

Definition 4: The total quantity of information capacity used by the agent, measured in base $b$, is:

\[ \kappa \equiv I(\alpha, a_+ \mid \mathcal{I}_-) = \frac{1}{2} \sum_{i=1}^{r} \log_b \frac{1}{n_i^+} \tag{5} \]

where we could have also used $n$ as the upper limit of summation, since for $i > r$, $\log_b (1/n_i^+) = \log_b 1 = 0$. Alternatively, given the definition of $\lambda$ from Theorem 2, we can also write:

\[ \kappa = \frac{1}{2} \left[ \sum_{i=1}^{r} \log_b d_i - r \log_b \lambda \right] \]

These formulas are equivalent (although in the latter formula we cannot use $n$ as the upper limit of summation), and so this latter formula is also valid in the fixed $\lambda$ formulation.

Definition 5: The information capacity allocated to processing the $i$-th element of the canonical synthetic target $\beta_c$ is:

\[ \kappa_i \equiv I(\beta_{i,c}, b_{i,c,+} \mid \mathcal{I}_-) = \frac{1}{2} \log_b \frac{1}{n_i^+} = \begin{cases} \dfrac{\kappa}{r} + \log_b \left[ \dfrac{\sqrt{d_i}}{\left( \prod_{j=1}^{r} \sqrt{d_j} \right)^{1/r}} \right] & i = 1, \dots, r \\ 0 & i = r+1, \dots, n \end{cases} \tag{6} \]
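The results of Lemma 2, Theorems 1 and 2, and Definitions 4 and 5 can be combined into a short numerical sketch. The code below is only an illustration under stated assumptions: the function names, the use of NumPy, and the 2x2 test matrices are not from the paper, and the final check uses the standard Gaussian log-determinant expression for mutual information.

```python
import numpy as np

def canonical_decomposition(W, P_prior):
    """Lemma 2: return (d, S, R) with W = S' diag(d) S and inv(P_prior) = S' S."""
    L = np.linalg.cholesky(P_prior)             # lower-triangular, P_prior = L L'
    d, Q = np.linalg.eigh(L.T @ W @ L)          # V = L' W L = Q D Q'
    order = np.argsort(d)[::-1]                 # nonincreasing generalized eigenvalues
    d, Q = np.clip(d[order], 0.0, None), Q[:, order]
    R = L @ Q                                   # right generalized eigenvectors
    S = np.linalg.inv(R)                        # S = Q' L^{-1}
    return d, S, R

def solve_fixed_cost(W, P_prior, lam):
    """Theorem 1: posterior covariance for a fixed marginal cost lam > 0."""
    d, _, R = canonical_decomposition(W, P_prior)
    n_plus = 1.0 / np.maximum(d / lam, 1.0)     # n_i^+ = 1 / max(d_i / lam, 1)
    return R @ np.diag(n_plus) @ R.T, n_plus, d

def shadow_cost(d, kappa, base=2.0):
    """Theorem 2 procedure: rank r and shadow cost lam for capacity kappa > 0."""
    for r in range(len(d), 0, -1):              # start at r = n, then decrease
        lam = (base ** (-2.0 * kappa) * np.prod(d[:r])) ** (1.0 / r)
        if np.all(d[:r] > lam):
            return r, lam
    raise ValueError("requires kappa > 0 and a nonzero loss matrix")

def capacity_allocations(n_plus, base=2.0):
    """Definition 5: kappa_i = (1/2) log_b(1 / n_i^+)."""
    return 0.5 * np.log(1.0 / n_plus) / np.log(base)

# Hypothetical inputs chosen only for illustration.
W = np.array([[2.0, 1.0], [1.0, 1.0]])
P_prior = np.array([[1.0, 0.5], [0.5, 2.0]])

P_post, n_plus, d = solve_fixed_cost(W, P_prior, lam=1.0)
kappa_i = capacity_allocations(n_plus)
# Feeding the implied capacity back into Theorem 2 should recover lam = 1.
r, lam = shadow_cost(d, kappa_i.sum())
print(r, lam, kappa_i)
```

Solving the fixed marginal cost problem first and then passing the implied capacity to `shadow_cost` is a convenient consistency check, since the two formulations should then describe the same solution.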

The last formulation suggests a straightforward intuition describing the allocation of capacity: first, each element is given an equal amount of attention (the $\kappa/r$ term), and then attention is added (subtracted) if the square root of the canonical loss weight for that element is higher (lower) than the geometric mean across all elements that are considered. Note that this result is in terms of the canonical synthetic target, and this intuition does not extend to the original (fundamental) target. Given this definition, we can also write $\kappa = \sum_{i=1}^{r} \kappa_i$, where we could again use either $r$ or $n$ as the upper limit of summation.

Unfortunately, there is generally no straightforward measure of the information capacity allocated to processing an individual element of the fundamental target $\alpha$. This is because it is not straightforward to decompose mutual information for random vectors exhibiting correlation. However, we can introduce an approximate measure.

Definition 6: An approximate measure of the information capacity allocated to the $i$-th element of the fundamental target $\alpha$, measured in base $b$, is the following component-wise mutual information:

\[ k_i \equiv I(\alpha_i, a_+ \mid \mathcal{I}_-) = \frac{1}{2} \log_b \left( \frac{P_{ii,-}}{P_{ii,+}} \right) \tag{7} \]

where, for example, $P_{ii,-}$ is the $(i,i)$-th element of the matrix $P_-$. This quantity computes the information about the $i$-th element of the target that is contained in the full action $a_+$, and it ignores the effect of correlation in the prior and the posterior. Note that generally $\sum_{i=1}^{n} k_i \ne \kappa$ and, moreover, the sum does not provide either an upper or lower bound for $\kappa$.

Lemma 5: If both $W$ and $P_-$ are diagonal matrices, then component-wise mutual information $k_i$ is equal to both the information capacity allocated to processing the $i$-th element of the fundamental target $\alpha$ and the $i$-th element of the canonical synthetic target $\beta_c$, so that $k_i = \kappa_i$ and $\sum_{i=1}^{n} k_i = \kappa$.
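To see Definition 6 and Lemma 5 at work, the following sketch compares the sum of the component-wise measures $k_i$ with the total capacity $\kappa$ in one correlated case and one diagonal case. The matrices, the values of $\lambda$, and the function names are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def posterior_covariance(W, P_prior, lam):
    """Theorem 1 (fixed marginal cost): return P_+ and the entries n_i^+."""
    L = np.linalg.cholesky(P_prior)              # P_prior = L L'
    d, Q = np.linalg.eigh(L.T @ W @ L)           # canonical loss weights (any order)
    n_plus = 1.0 / np.maximum(d / lam, 1.0)
    R = L @ Q
    return R @ np.diag(n_plus) @ R.T, n_plus

def total_capacity(n_plus, base=2.0):
    """Equation (5): kappa = (1/2) sum_i log_b(1 / n_i^+)."""
    return 0.5 * np.sum(np.log(1.0 / n_plus)) / np.log(base)

def componentwise_information(P_prior, P_post, base=2.0):
    """Equation (7): k_i = (1/2) log_b(P_{ii,-} / P_{ii,+})."""
    return 0.5 * np.log(np.diag(P_prior) / np.diag(P_post)) / np.log(base)

# Correlated case (hypothetical values): the k_i need not sum to kappa.
W_corr = np.array([[2.0, 1.0], [1.0, 1.0]])
P_corr = np.array([[1.0, 0.5], [0.5, 2.0]])
P_post_corr, n_corr = posterior_covariance(W_corr, P_corr, lam=1.0)
k_corr = componentwise_information(P_corr, P_post_corr)
kappa_corr = total_capacity(n_corr)

# Diagonal case (hypothetical values): Lemma 5 says the k_i sum to kappa.
W_diag = np.eye(3)
P_diag = np.diag([1.5, 1.4, 0.8])
P_post_diag, n_diag = posterior_covariance(W_diag, P_diag, lam=0.9)
k_diag = componentwise_information(P_diag, P_post_diag)
kappa_diag = total_capacity(n_diag)

print("correlated:", k_corr.sum(), kappa_corr)
print("diagonal:  ", k_diag.sum(), kappa_diag)
```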

4.1.4 Illustration: separable target

The solution to the static RI-LQG tracking problem is easiest to understand when the elements of the canonical target happen to be oriented in the same directions as the elements of the fundamental target. In practice, this situation primarily occurs when $W$ and $P_-$ are both diagonal, because in this case the fundamental target vector is already separable with respect to prior uncertainty, the loss function, and mutual information. For this reason, we describe a target associated with diagonal $W$ and $P_-$ as separable. We will first demonstrate the relatively simple solution in the separable case, and then emphasize that this same logic also applies to the general case, except in terms of the canonical target rather than the fundamental target.

To fix notation, we will assume $P_-$ is a positive definite diagonal matrix with elements $\sigma_{i,-}^2$ and that $W$ is a positive semidefinite diagonal matrix with elements $w_i^2$. For convenience, we will assume that $w_1^2 \sigma_{1,-}^2 \ge \cdots \ge w_n^2 \sigma_{n,-}^2$ (we can always re-order the elements of $\alpha$ to make this true). Application of Lemma 2 is trivial in this case since $V = L'WL$ is already diagonal, so that the generalized eigenvalues are simply $d_i = w_i^2 \sigma_{i,-}^2$. This formula implies that the canonical loss weights $d_i$ can be interpreted as “loss-weighted volatility”.[12] The associated right generalized eigenvectors are $r_i = \sigma_{i,-} e_i$ where $e_i$ is the $i$-th element of the standard basis.

[12] This interpretation as “loss-weighted volatility” is still broadly true in the more general case, but the relationships are more complex due to interaction effects.

We will examine the solution in the fixed marginal cost case and note that these results apply also to the fixed capacity formulation of the problem when the shadow cost $\lambda$ is computed as described in Theorem 2. We suppose that the rank of the solution is $r$, so that $\lambda$ is a fixed parameter satisfying $d_r > \lambda \ge d_{r+1}$. From Theorem 1, it is easy to see that $P_+$ will also be a diagonal matrix, and we denote its $i$-th diagonal element as $\sigma_{i,+}^2$. Then the full solution is:

\[ \sigma_{i,+}^2 = \begin{cases} \lambda / w_i^2 & i = 1, \dots, r \\ \sigma_{i,-}^2 & i = r+1, \dots, n \end{cases} \]

The first order condition would have set $\sigma_{i,+}^2 = \lambda / w_i^2$ for $i = 1, \dots, n$. This is infeasible, since we defined $r$ such that $\lambda / w_i^2 \ge \sigma_{i,-}^2$ for $i = r+1, \dots, n$, and so this would suggest more posterior uncertainty for elements $r+1, \dots, n$ than there existed prior uncertainty - the agent would have “forgotten” information they previously knew. In this case, it is straightforward to impose the constraint, setting $\sigma_{i,+}^2 = \sigma_{i,-}^2$ for $i = r+1, \dots, n$.

This case admits a simple formula for the information capacity allocated to each element:

\[ k_i = \kappa_i = \begin{cases} \frac{1}{2} \left( \log_b w_i^2 + \log_b \sigma_{i,-}^2 - \log_b \lambda \right) & i = 1, \dots, r \\ 0 & i = r+1, \dots, n \end{cases} \]

More attention is paid to elements of the target that are more important (in terms of loss weight) or that are associated with more prior uncertainty, and as the marginal cost of attention falls, (weakly) more attention will be paid to every element. For those elements that receive no attention from the agent according to this result, it is easy to see in the previous result that, as one would expect, posterior uncertainty is equal to prior uncertainty. If the no-forgetting constraint were not enforced, these elements would be associated with negative capacity allocations.

This section applies directly to cases in which the fundamental target itself is separable so that the loss and prior covariance matrices are diagonal. This will generally not be the case, but from Lemma 3 we know that these conditions will always be satisfied for the canonical target. This means that the above analysis, which is easy to understand, can still be applied in the general case, so long as it is cast in terms of the canonical target.
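The separable solution above reduces to element-wise arithmetic, which the following sketch makes explicit. The function name and the numerical inputs are assumptions chosen for illustration; the formulas are the two piecewise expressions given in this section, written through $\delta_i^+ = \max\{d_i/\lambda, 1\}$ so that zero loss weights need no special handling.

```python
import numpy as np

def separable_solution(w2, sigma2_prior, lam, base=2.0):
    """Posterior variances and capacity allocations for diagonal W and P_-.

    w2           : diagonal of W (the loss weights w_i^2)
    sigma2_prior : diagonal of P_- (prior variances)
    lam          : marginal (or shadow) cost of attention
    """
    w2 = np.asarray(w2, dtype=float)
    sigma2_prior = np.asarray(sigma2_prior, dtype=float)
    d = w2 * sigma2_prior                      # canonical loss weights d_i
    delta = np.maximum(d / lam, 1.0)           # equals 1 where no-forgetting binds
    sigma2_post = sigma2_prior / delta         # lam / w_i^2 where d_i > lam
    k = 0.5 * np.log(delta) / np.log(base)     # zero where no attention is paid
    return sigma2_post, k

# Hypothetical inputs: equal loss weights, unequal prior variances.
sigma2_post, k = separable_solution([1.0, 1.0, 1.0], [1.5, 1.4, 0.8], lam=0.9)
print(sigma2_post)
print(k)
```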

4.1.5 Comparative statics

We now consider how the solution changes as individual parameters vary, holding everything else constant. Mathematically, these exercises can be relatively straightforward given the explicit formulas we derived for posterior uncertainty and attention allocations, but the intuition can be obscured due to the presence of binding constraints and the somewhat opaque character of the generalized eigendecomposition. For this reason, in this section we will only briefly describe the general effects on posterior uncertainty of a change in each type of parameter and will then focus on illustrating important behavior using two specific examples.

There are three types of parameters in the model: (1) the parameter associated with the information constraint, $\lambda$ or $\kappa$, (2) the elements of $W$ describing the loss function, and (3) the elements of $P_-$ describing prior uncertainty. The effect of a change in the first type can be understood by focusing only on the marginal, or shadow, cost parameter $\lambda$, as a consequence of the Corollary to Theorem 2. It is easy to see from Theorem 1 that an increase (decrease) in the marginal cost of attention always weakly increases (decreases) posterior uncertainty for every element of the target.

For the second and third types of parameters, it is difficult to achieve a simple presentation of the wide variety of effects possible, as these parameters affect both the generalized eigenvalues and the generalized eigenvectors, and so affect the definition of the canonical target. Rather than attempt it, we instead consider the effect of a change in one of the canonical loss weights $d_i$, with the justification that this captures all possible effects for a given canonical target.

For the first time, here the formulation of the information constraint has a material effect on results. If the problem is formulated with a fixed marginal cost of attention, then an increase in the canonical loss weight associated with the $i$-th element of the canonical target,
$d_i$, weakly decreases posterior uncertainty associated with that element, but leaves posterior uncertainty associated with the other elements unchanged. If the problem is instead formulated with a fixed capacity, then an increase in $d_i$ still weakly decreases posterior uncertainty for that element, but now weakly increases posterior uncertainty for all other elements. In the latter case, the increase in $d_i$ makes it optimal to pay more attention to the $i$-th component, but attention must be reallocated from elsewhere to achieve that. In the former case, the agent simply pays to allocate additional attention, and the end result is an increase in the total quantity of information processed.

Illustration

We now illustrate these results using two specific examples. The baseline parameterizations are as follows:

\[ \text{Example (a)} \quad W^{(a)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad P_-^{(a)} = \begin{bmatrix} 1.5 & 0 & 0 \\ 0 & 1.4 & 0 \\ 0 & 0 & 0.8 \end{bmatrix} \]

\[ \text{Example (b)} \quad W^{(b)} = \begin{bmatrix} 1.5 & 0 & 0 \\ 0 & 1.4 & 0 \\ 0 & 0 & 0.8 \end{bmatrix}, \quad P_-^{(b)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

These examples are relatively easy to understand because they are separable, and they are relatively easy to contrast because they share the same canonical loss weights. This allows us to highlight those differences caused by different loss matrices separately from those differences caused by different levels of prior uncertainty. While example (a) might initially appear more plausible than example (b) - since it may seem particularly unrealistic that the prior covariance matrix be the identity - it is example (b) that will be more useful in understanding more complex models. This is because any static RI-LQG tracking problem will be in the form of example (b) when it is cast in terms of its canonical target.
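The relationship between the two examples can be checked directly. The sketch below, whose function name and layout are illustrative assumptions, computes the canonical loss weights of both examples and then re-expresses example (a) in terms of its canonical target $\beta_c = S\alpha$, for which the loss matrix becomes $R'WR$ and the prior covariance becomes $SP_-S'$.

```python
import numpy as np

def canonical_form(W, P_prior):
    """Lemma 2: canonical loss weights d (nonincreasing) and change of basis S."""
    L = np.linalg.cholesky(P_prior)
    d, Q = np.linalg.eigh(L.T @ W @ L)
    order = np.argsort(d)[::-1]
    d, Q = d[order], Q[:, order]
    S = Q.T @ np.linalg.inv(L)                 # S = Q' M with M = L^{-1}
    return d, S

# Baseline parameterizations of examples (a) and (b).
W_a, P_a = np.eye(3), np.diag([1.5, 1.4, 0.8])
W_b, P_b = np.diag([1.5, 1.4, 0.8]), np.eye(3)

d_a, S_a = canonical_form(W_a, P_a)
d_b, S_b = canonical_form(W_b, P_b)

# Example (a) recast in canonical coordinates.
R_a = np.linalg.inv(S_a)
W_canonical = R_a.T @ W_a @ R_a                # loss matrix for beta_c
P_canonical = S_a @ P_a @ S_a.T                # prior covariance of beta_c

print(d_a, d_b)
print(np.round(W_canonical, 12))
print(np.round(P_canonical, 12))
```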

[Figure 1: Comparative statics exercises for example (a). Panels, left to right: Baseline; Increase in $\lambda$ or Decrease in $\kappa$; Decrease in $P_{11,-}$, Fixed $\lambda$; Decrease in $P_{11,-}$, Fixed $\kappa$. Bars: $i = 1, 2, 3$. Legend: Reduced uncertainty; No-forgetting constraint; Remaining uncertainty.]

[Figure 2: Comparative statics exercises for example (b). Panels, left to right: Baseline; Increase in $\lambda$ or Decrease in $\kappa$; Decrease in $w_1^2$, Fixed $\lambda$; Decrease in $w_1^2$, Fixed $\kappa$. Bars: $i = 1, 2, 3$. Legend: Reduced uncertainty; No-forgetting constraint; Remaining uncertainty.]

Fig. 1 and Fig. 2, corresponding respectively to examples (a) and (b), each contain four panels depicting prior uncertainty and optimal posterior uncertainty.[13] In both figures, the panel at the far left depicts the baseline case, while the three other panels depict specific deviations from that baseline case. In both figures, the second panel from the left depicts the effect of an increase in the marginal cost of attention (or equivalently a decrease in available capacity). The third and fourth panels depict a decrease in the canonical loss weight associated with the first element of the target, under the fixed marginal cost formulation in the third panel and under the fixed capacity formulation in the fourth panel. The two examples differ in how this decrease in the canonical loss weight is achieved - in example (a) we consider a decrease in prior uncertainty associated with the first element of the target, while in example (b) we consider a decrease in the loss weight associated with the first element.

[13] The solution process visualized in Fig. 1 is commonly known as “reverse water filling”.

In each panel, each bar outlined in black represents uncertainty associated with one element of the target. The height of the bar represents prior uncertainty, the dashed lines represent the level of posterior uncertainty suggested by the first order condition, the unshaded portion represents the optimal level of posterior uncertainty, and the shaded portion represents the reduction in uncertainty due to information processing. For some elements, there is a hatched region in place of the shaded region; in these cases, the first order condition suggested too high a level of posterior uncertainty, and the no-forgetting constraint became binding so that no information was processed. The hatched region represents the infeasible proposed enlargement of uncertainty.

In example (a), since the loss weight for each element of the target is equal to one, the proposed posterior uncertainty for each is simply equal to $\lambda$, which is set to be about 0.9 in the baseline case for this illustration. As shown in the first panel, this is feasible for the first two elements, which have relatively high prior uncertainty, but is not feasible for the third element, for which prior uncertainty is already lower than the given value of $\lambda$. In the second panel, we consider increasing $\lambda$, and this has straightforward effects: posterior uncertainty for the first two elements rises, while posterior uncertainty for the third element cannot rise any further. In the third panel, we consider, relative again to the baseline case, the effect of decreasing prior uncertainty associated with the first element while assuming that the model is formulated with a fixed marginal cost of attention. Because this change does not affect the loss weight, it does not affect the proposed level of uncertainty, which is still equal to $\lambda$. In fact, if we had only slightly reduced the prior uncertainty, it would not have changed the solution at all. However, in this case the reduction in prior uncertainty is so great that the no-forgetting constraint begins to bind. As described above, this has no effect on the solution for the second or third elements. In the last panel, we again consider, relative to the baseline case, the same decrease in prior uncertainty, but this time assuming that the model is formulated with a fixed capacity; the results clearly differ from those in the previous panel. Because the reduction in prior uncertainty makes it easier for the agent to achieve any desired level of posterior uncertainty, this has the effect of reducing the shadow cost of attention. While the no-forgetting constraint still begins to bind for the first element, in this case posterior uncertainty falls for both of the other elements, and in fact the no-forgetting constraint ceases to bind for the third element.

Even though the specifics of the solutions differ in example (b), there are qualitatively similar results from the comparative statics exercises. The two main differences are, first, since the loss weights differ, the first order condition will propose different levels of posterior uncertainty for each element of the target and, second, since prior uncertainty is the same, the no-forgetting constraint will bind at the same point for each element. The qualitative similarities are apparent in the second, third, and fourth panels: in response to an increase in $\lambda$, posterior uncertainty rises for each element; in response to a decrease in the canonical loss weight for the first element under a fixed marginal cost of attention, the no-forgetting constraint binds for the first element while the solutions for the other two elements remain unchanged; and in response to the same decrease under a fixed capacity, posterior uncertainty falls for the other two elements.
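The four panels described for example (a) can be reproduced numerically. In the sketch below, the baseline $\lambda = 0.9$ follows the text, while the higher cost ($\lambda = 1.2$) and the reduced prior variance for the first element (0.6) are hypothetical values chosen only so that the qualitative pattern described above appears; the function names are likewise assumptions.

```python
import numpy as np

def posterior_variances(w2, prior, lam):
    """Separable solution: sigma^2_{i,+} = sigma^2_{i,-} / max(d_i / lam, 1)."""
    return prior / np.maximum(w2 * prior / lam, 1.0)

def capacity(w2, prior, lam, base=2.0):
    """Total capacity used: (1/2) sum_i log_b max(d_i / lam, 1)."""
    return 0.5 * np.sum(np.log(np.maximum(w2 * prior / lam, 1.0))) / np.log(base)

def shadow_cost(d, kappa, base=2.0):
    """Theorem 2 procedure; d must be sorted in nonincreasing order."""
    for r in range(len(d), 0, -1):
        lam = (base ** (-2.0 * kappa) * np.prod(d[:r])) ** (1.0 / r)
        if np.all(d[:r] > lam):
            return r, lam
    raise ValueError("requires kappa > 0 and a nonzero loss matrix")

w2 = np.ones(3)                                   # example (a): W is the identity
prior = np.array([1.5, 1.4, 0.8])
lam0 = 0.9                                        # baseline marginal cost

baseline = posterior_variances(w2, prior, lam0)
higher_cost = posterior_variances(w2, prior, 1.2)  # hypothetical increase in lam

prior_low = np.array([0.6, 1.4, 0.8])             # hypothetical decrease in P_{11,-}
fixed_lambda = posterior_variances(w2, prior_low, lam0)

kappa0 = capacity(w2, prior, lam0)                # capacity used at baseline
r, lam1 = shadow_cost(np.sort(w2 * prior_low)[::-1], kappa0)
fixed_kappa = posterior_variances(w2, prior_low, lam1)

for name, values in [("baseline", baseline), ("higher cost", higher_cost),
                     ("fixed lambda", fixed_lambda), ("fixed kappa", fixed_kappa)]:
    print(name, values)
```

Under these assumed values the shadow cost in the fixed capacity case falls below the prior variance of the third element, so that element begins to receive attention, in line with the description of the fourth panel.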

It is not an accident that these two examples share qualitative results; they were designed so that example (b) is simply example (a) recast in terms of its canonical target. In general, it may be quite difficult to interpret the solution in terms of the fundamental target, while it will always be easy to do so in terms of the canonical target. Simplifications achieved by considering the problem in terms of the canonical target will arise in every subsequent section of this paper.

4.2 Geometric interpretation of the static RI-LQG tracking problem and solution

[Figure 3: Geometrization of Theorem 1 using ellipsoids. Panels: (1) First-order condition; (2) Whitened; (3) Canonical; (4) Canonical, constrained; (5) Whitened, constrained; (6) Posterior. Legend: Prior; Posterior.]

In this section, we use a geometrical approach to interpret the problem and the nature of the solution given in Theorems 1 and 2. This is especially helpful in understanding the solution when the loss and prior covariance matrices are not diagonal. The general idea is
to take advantage of the geometrization of positive definite matrices, specifically covariance matrices, as ellipsoids.

The iso-density loci of the prior and posterior conditional distributions of the fundamental target form ellipsoids defined by the prior and posterior covariance matrices, and these ellipsoids can be interpreted as regions of uncertainty about the target, conditional on the prior or posterior information set. The volume of the ellipsoid defined by a positive definite matrix $P$ is $V_P = |P| \times V_s$ where $V_s$ defines the volume of an $n$-dimensional unit sphere. Iso-density ellipsoids with greater volume are associated with larger covariance matrices and increased uncertainty. The ratio of prior volume to posterior volume is given by $V_- / V_+ = (|P_-| \times V_s) / (|P_+| \times V_s)$. Taking logs and dividing by two, we see that

\[ \frac{1}{2} \log_b (V_- / V_+) = \frac{1}{2} \left( \log_b |P_-| - \log_b |P_+| \right) = I(\alpha, a_+ \mid \mathcal{I}_-) \]

Thus the information constraint can be understood in terms of the relative volumes of the prior and posterior ellipsoids. Under the fixed capacity formulation, the information constraint limits the volume of the ellipsoid describing posterior uncertainty in terms of the prior volume: if $I(\alpha, a_+ \mid \mathcal{I}_-) \le \kappa$, then $V_- \ge V_+ \ge \frac{1}{2^{2\kappa}} V_-$ (with $\kappa$ measured in bits). Similarly, under a fixed marginal cost, the total cost equals the marginal cost times a function of the ratio of volumes.

The two positive semidefiniteness constraints in Definition 1 can also be understood in terms of the prior and posterior ellipsoids. The constraint $P_+ \ge 0$ simply requires that the posterior ellipsoid be well-defined. The no-forgetting constraint $P_- - P_+ \ge 0$ requires that the posterior ellipsoid be weakly contained within the prior ellipsoid. If the posterior ellipsoid extended beyond the prior ellipsoid in any direction, that would correspond to “forgetting” information previously known.

Formally, an ellipsoid defined by a positive definite covariance matrix $P$ can be fully described in terms of its eigendecomposition. Its eigenvectors determine the directions of the ellipsoid’s principal axes, and its eigenvalues are proportional to the squares of the
semi-axis lengths. Because the determinant of a matrix is the product of its eigenvalues, the volume of an ellipsoid is invariant to its rotation, and for this reason, the information constraint depends only on the eigenvalues of the prior and posterior. The no-forgetting constraint, however, depends also on the eigenvectors.

A preliminary step in the proof of Theorem 1 was to establish that in the static RI-LQG tracking problem it will always be optimal for the eigenvectors of $P_+$ to coincide with the eigenvectors of a particular transformation of the loss matrix $W$. This result fixes the rotation of the posterior ellipsoid; what remains is to select its eigenvalues. The first-order condition proposes setting the eigenvalues equal to the eigenvalues of the inverse loss matrix scaled by $\lambda$. If the no-forgetting constraint is not binding then this fixes the semi-axis lengths and completes the solution. If the latter constraint does bind, however, the problem is more difficult because the posterior ellipsoid is usually not concentric with the prior ellipsoid; that is, they usually do not share eigenvectors. If the ellipsoids were concentric, then imposing the no-forgetting constraint would be straightforward: simply “pull in” the ends of each posterior principal axis that extend beyond the prior. This straightforward case is actually the situation when both $W$ and $P_-$ are diagonal, as we showed above. In the general case, however, it is not obvious which axes to “pull in”, or by how much.

This problem is solved by simultaneous diagonalization, which generates new coordinates under which the prior and posterior ellipsoids are not only concentric but are aligned with the standard axes. In fact, in the new coordinate space the prior is an $n$-dimensional unit sphere. The matrix $S$ from Lemma 2 is the change of basis matrix implementing the transformation to the new coordinates, and the ellipsoids of uncertainty in the transformed space correspond to covariance matrices associated with the canonical synthetic target. The simple “pulling-in” approach can be implemented in the new coordinate space, and the solution in the original space can be found simply by reversing the transformation. This is possible because relative volumes are preserved by this transformation and the no-forgetting
constraint is satisfied in the original space if and only if it is satisfied in the transformed space.

We visualize the geometrical interpretation of the problem and solution in the six panels of Fig. 3. Panels (1) and (6), (2) and (5), and (3) and (4) represent prior and proposed posterior ellipsoids in three different coordinate spaces. In all panels, the dotted ellipsoids represent the prior in the given space and the solid ellipsoids represent a candidate posterior. The three upper panels represent the infeasible posterior proposed by the first order condition, and the three lower panels represent the feasible, constrained, posterior.

Panel (1) displays the prior ellipsoid and the proposed posterior ellipsoid satisfying the first order condition, $P_+ = \lambda W^{-1}$, in the standard basis. It is clear that the no-forgetting constraint is not satisfied, since the proposed posterior extends beyond the prior. Panel (2) represents the same prior and proposed posterior ellipsoids as in Panel (1), but after an intermediate transformation has been applied. This transformation will be called the “whitening” transformation and will be described in more detail later. The underlying coordinate space is called the “whitened” space.

Panel (3) again represents the same prior and proposed posterior ellipsoids, but now after the transformation to the canonical synthetic target has been applied. Accordingly, we call the underlying coordinate space the “canonical” coordinate space for this problem. In this panel, the problem has not been solved (since the posterior still extends beyond the prior), but the ellipsoids are concentric and are aligned with the standard axes, making the imposition of the constraints straightforward.

Panel (4) remains in the canonical coordinate space, but now displays the constrained posterior resulting from the pulling-in operation applied to the $x$ semi-axis, as described above. The no-forgetting constraint is now satisfied. Panels (5) and (6) simply reverse the transformation to return to the original coordinate space. In Panel (6), the solid ellipsoid now represents the posterior covariance matrix that solves the static RI-LQG tracking problem,
with the no-forgetting constraint now satisfied.

4.3 The action solving the static RI-LQG tracking problem

In the previous sections, we noted that the optimal action is a conditional expectation, $a_+ = E[\alpha \mid \mathcal{I}_+]$, but presented the solution in terms of $P_+$. In this section we provide a few important results concerning the structure and interpretation of the action $a_+$ itself, although two preliminary steps in this section will be asserted for the time being and will only be proved in later sections. First, we will write $\hat{\alpha}$ to denote the agent’s understanding of the target based solely on incoming data; this will only be fully formalized later. Second, we present the result, derived later, that the agent’s optimal action can be written as a weighted average of their prior and their understanding of the incoming data:

\[ a_+ = (I - K) a_- + K \hat{\alpha} \tag{8} \]

where the weight matrix is $K = I - P_+ P_-^{-1}$. This equation shows that, as usual in the LQG imperfect information setting, our agent is a Bayesian updater, but now because the agent is rationally inattentive, the weight matrix is not given but is selected. One important insight from this equation is that there are two channels through which the agent’s action is driven away from the target. The first is that since the agent incompletely processes the incoming data, their understanding of the target is less than perfect, and so part of their action will be based on contaminating noise. The second is that even after receiving updated information, the rationally inattentive agent still places weight on their prior because they take into account their limited understanding of the incoming data.

Two limiting cases provide some intuition. First, as information becomes perfect, we have both $K \to I$ and $\hat{\alpha} \to \alpha$, so that the agent puts all weight on their understanding of the incoming data, and moreover their understanding is correct. When no information is
collected, $K \to 0$ and $\hat{\alpha}$ becomes diffuse, so that no weight is placed on incoming data and the action is equal to the prior. More general results are difficult in terms of the fundamental target $\alpha$, because in general $K$ will not be diagonal. As usual, however, things are more straightforward in terms of the canonical synthetic target $\beta_c$.

To motivate the use of the canonical coordinate space in interpreting the action, notice that we can rewrite $K = R(I - N^+)S$. The rows of the matrix $S$, $s_i'$, are the left generalized eigenvectors of $(P_+, P_-)$ associated with generalized eigenvalues $n_i^+$, and it is not hard to see that those rows are also the left eigenvectors of $K$ associated with eigenvalues $1 - n_i^+$. The elements of the canonical target $\beta_c$ are the linear combinations of $\alpha$ defined by these left eigenvectors $s_i'$. Taken together, the elements of the canonical target are exactly those random variables for which Bayesian updating by a rationally inattentive agent occurs independently. This is formalized in Lemma 6.

Lemma 6: The components of $b_{c,+} = E[\beta_c \mid \mathcal{I}_+]$, which we call the canonical synthetic action (briefly the canonical action), are:

\[ b_{i,c,+} = n_i^+ b_{i,c,-} + (1 - n_i^+) \hat{\beta}_{i,c} \tag{9} \]

where $n_i^+ \in [0, 1]$ and $b_{c,-} = E[\beta_c \mid \mathcal{I}_-]$, and where the agent’s understanding of the canonical target, $\hat{\beta}_{i,c}$, is defined as

\[ \hat{\beta}_{i,c} \equiv y_{i,c} = \beta_{i,c} + \varepsilon_{i,c}, \qquad \varepsilon_{i,c} \sim N\left(0, (1/n_i^+ - 1)^{-1}\right) \]

if $n_i^+ \in [0, 1)$ and is diffuse if $n_i^+ = 1$. The noise term $\varepsilon_{i,c}$ is a mechanism to formalize the effects of inattention. The alternative notation used here, $y_{i,c}$, will connect this lemma with our definition of representations, introduced later. Importantly, while it will turn out that $\hat{\beta}_{i,c}$ will always correspond to what we term a feasible representation, so that we will always be justified in writing it as $y_{i,c}$, the same is not true of $\hat{\alpha}$.
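The algebra connecting the weight matrix $K$ to the canonical coordinates can be verified numerically. The sketch below uses hypothetical 2x2 matrices and $\lambda = 1$ (assumptions for illustration only) to check that $I - P_+P_-^{-1}$ equals $R(I - N^+)S$, that the rows of $S$ are left eigenvectors of $K$, and that the noise variance in Lemma 6 implies a canonical tracking-error variance of $n_i^+$, matching the posterior variance in Lemma 3(a).

```python
import numpy as np

# Hypothetical inputs chosen only for illustration.
W = np.array([[2.0, 1.0], [1.0, 1.0]])
P_prior = np.array([[1.0, 0.5], [0.5, 2.0]])
lam = 1.0

# Lemma 2 and Theorem 1.
L = np.linalg.cholesky(P_prior)
d, Q = np.linalg.eigh(L.T @ W @ L)
order = np.argsort(d)[::-1]
d, Q = d[order], Q[:, order]
R = L @ Q
S = np.linalg.inv(R)
n_plus = 1.0 / np.maximum(d / lam, 1.0)
P_post = R @ np.diag(n_plus) @ R.T

# Weight matrix in the original and in the canonical form.
K = np.eye(len(d)) - P_post @ np.linalg.inv(P_prior)
K_canonical = R @ np.diag(1.0 - n_plus) @ S

# Lemma 6 for the elements that receive attention (n_i^+ < 1).
attended = n_plus < 1.0
n_att = n_plus[attended]
noise_var = 1.0 / (1.0 / n_att - 1.0)
# Tracking error: b_+ - beta = n (b_- - beta) + (1 - n) eps, with prior variance 1.
error_var = n_att ** 2 * 1.0 + (1.0 - n_att) ** 2 * noise_var

print(np.round(K, 6))
print(noise_var, error_var, n_att)
```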

In the canonical coordinate space, therefore, the Bayesian updating is straightforward: the action is a simple weighted average of the prior for that component and a noise-contaminated version of the canonical target. We now relate these results to the action associated with the fundamental target.

Theorem 3: The (fundamental) action that solves the static RI-LQG tracking problem for either the fixed marginal cost or fixed capacity case is:

\[ a_+ = R b_{c,+} \tag{10} \]

where $b_{c,+}$ is the canonical action and $R$ is the matrix of right generalized eigenvectors defined in Lemma 2.

Although this Theorem is in a sense trivial - a straightforward application of the definition of the canonical target - it is important because it formalizes the construction of $a_+$.

4.3.1 Bias, variance, and responsiveness

In general, we know that rationally inattentive individuals will not respond perfectly to incoming data, but we can use the updating equation in the canonical space, given in Lemma 6, to provide a sharper comparison with the perfect information situation. Above, we described two channels driving the action away from the target; formally, these are, first, that a rationally inattentive agent introduces contaminating noise, since $\varepsilon_{i,c} \ne 0$, and, second, that a rationally inattentive agent chooses to be partially unresponsive, since $n_i^+ \ne 0$. By contrast, a perfectly informed agent has both of these equal to zero. By viewing the action $b_{i,c,+}$ as the rational inattention estimator of $\beta_{i,c}$, we can say that the variance of the estimator is due to the former channel, while the bias of the estimator is due to the latter. To justify this terminology, we define the bias and variance of the rational inattention action
as:

$$E[b_{i,c,+} - \beta_{i,c} \mid \mathcal{I}_-, \beta_{i,c}] = n_i^+ (b_{i,c,-} - \beta_{i,c}) \qquad \text{(Bias)}$$

$$Var(b_{i,c,+} \mid \mathcal{I}_-, \beta_{i,c}) = (1 - n_i^+)^2 \, Var(\varepsilon_{i,c}) \qquad \text{(Variance)}$$

The bias is generally nonzero unless the target is degenerate ($\beta_{i,c} \equiv b_{i,c,-}$) or information is perfect ($n_i^+ = 0$). The variance is nonzero unless information is perfect or the agent collects no information at all ($n_i^+ = 1$). The bias describes the extent to which the rational inattention action will differ from the target on average. The quantity $n_i^+$ is the proportion of the unexpected part of the incoming data to which the agent is unresponsive, and so the quantity $1 - n_i^+$ can be interpreted as the responsiveness of the agent. It is difficult to meaningfully extend these results to the fundamental target in a general way, other than by mechanically referencing Theorem 3.

4.3.2 Linear combinations of the target

It can be useful to explore arbitrary linear combinations of the action, $w'\alpha$ where $w$ is an $n \times 1$ vector of weights, and it is easy to do so. Applying Theorem 3, we can compute any linear combination as $w'a_+ = \gamma' b_{c,+}$, where $\gamma' = w'R$ are the weights in the canonical space. One reason that this is interesting is that the loss function is often constructed exactly to minimize the weighted mean square error of one or more such linear combinations. Supposing that we are interested in $n$ linear combinations defined by $w_1, \ldots, w_n$ with weights $\xi_1, \ldots, \xi_n$, then the loss function is:

$$\sum_{i=1}^{n} \xi_i \, E[(w_i'\alpha - w_i'a_+)'(w_i'\alpha - w_i'a_+) \mid \mathcal{I}_-]$$

and this can be rewritten in the standard form $E[(\alpha - a_+)'W(\alpha - a_+) \mid \mathcal{I}_-]$ by setting $W = \sum_{i=1}^{n} \xi_i w_i w_i'$.
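The equivalence between the weighted sum of squared tracking errors and the standard quadratic form can be verified directly. The sketch below is illustrative only: the weight vectors, the weights $\xi_i$, the error vector, and the posterior covariance are hypothetical numbers, and the expected loss is computed under the assumption that the tracking error $\alpha - a_+$ has covariance $P_+$.

```python
import numpy as np

def loss_matrix(weight_vectors, xis):
    """Build W = sum_i xi_i * w_i w_i' from weight vectors and scalar weights."""
    n = len(weight_vectors[0])
    W = np.zeros((n, n))
    for w, xi in zip(weight_vectors, xis):
        W += xi * np.outer(w, w)
    return W

# Two hypothetical linear combinations of a three-dimensional target.
w1 = np.array([1.0, 1.0, 0.0])
w2 = np.array([0.0, 1.0, -1.0])
xis = [2.0, 0.5]
W = loss_matrix([w1, w2], xis)

# Pointwise check for one hypothetical tracking error e = alpha - a_plus.
e = np.array([0.2, -0.4, 0.1])
loss_sum = sum(xi * (w @ e) ** 2 for w, xi in zip([w1, w2], xis))
loss_quadratic = e @ W @ e

# Expected-loss check, assuming the tracking error has covariance P_plus.
P_plus = np.array([[1.0, 0.2, 0.0], [0.2, 0.5, 0.1], [0.0, 0.1, 0.3]])
expected_sum = sum(xi * (w @ P_plus @ w) for w, xi in zip([w1, w2], xis))
expected_trace = np.trace(W @ P_plus)

print(np.isclose(loss_sum, loss_quadratic), np.isclose(expected_sum, expected_trace))
```

The second check is the form in which the loss enters the tracking problem, $tr(WP_+)$, which is why any collection of weighted linear combinations can be handled with a single loss matrix.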

A special case that is often of interest occurs when an agent is only interested in tracking one specific linear combination $p = w'\alpha$, so that their loss function is $E[(p - p_+)^2 \mid \mathcal{I}_-]$. This can be written in the standard form using the rank one loss matrix $W = ww'$. Although the action solving the static RI-LQG tracking problem is $a_+$, the agent is only interested in the synthetic posterior $p_+ = w'a_+$. We can of course compute this using Theorem 3, but in this case we can actually derive a more explicit solution. Using the first Corollary to Theorem 1 it is easy to show that $w'$ is a left eigenvector of $K$ and therefore the target of interest $p$ is simply a scalar multiple of the canonical target. This result is very intuitive: the agent chooses to track exactly the object of interest. Finally, it is straightforward to show that the posterior collapses to $p_+ = n_1^+ p_- + (1 - n_1^+)\hat{p}$, so that the ultimate form of the solution is a simple Bayesian update in terms of the object of interest.

In this rank one case, we can simply characterize the sense in which uncertainty is reduced between prior and posterior. Since $w'$ is proportional to the only generalized eigenvector associated with a nonzero eigenvalue, it follows that any vector orthogonal to $w'$ is in the null space of $K$. Writing $w^{\perp}$ as a vector orthogonal to $w$, it is not hard to show that $w'P_+w < w'P_-w$ and that $w^{\perp\prime} P_+ w^{\perp} = w^{\perp\prime} P_- w^{\perp}$. The general version of this result for the rank $n$ case is that uncertainty is only reduced for the space spanned by the canonical targets $\beta_{i,c}$ to which attention is actually allocated, i.e. for which $n_i^+ < 1$.

4.3.3 Illustration: rank one case

To illustrate the rank one case, we consider the example in section 3.2.3 of Sims (2010) in which an agent is supposed to be tracking a variable $y_t = \sum_{i=1}^{n} z_{it}$ subject to a fixed marginal cost of attention $\lambda$, where $z_{it} \sim N(0, \omega^2)$, independent across $i$ and $t$. Since this problem is identical at each time period $t$, we can sequentially apply the static solution described here, and we assume that the agent’s prior is just the unconditional distribution, so that $z_t \mid \mathcal{I}_{t-1} \sim N(0, \omega^2 I)$ for all $t$. While Sims (2010) gives the general form of the

solution to this problem, as a consequence of Theorem 1 we can easily derive the exact formula.

To set up the problem in terms of our Definition 1, the fundamental target is the vector $z_t$ and the loss matrix is $W = \iota\iota' = 1_{n \times n}$ (an $n \times n$ matrix of ones), where $\iota = (1, 1, \ldots, 1)'$ is a vector of weights defining $y_t$ as a linear combination of $z_t$. The prior covariance matrix is $P_- = \omega^2 I$. The canonical loss weights are $d_1 = n\omega^2$ and $d_i = 0$ for $i = 2, \ldots, n$. This implies that $n_1^+ = \min(\lambda / n\omega^2, 1)$ and $n_i^+ = 1$ for $i = 2, \ldots, n$. Applying the first Corollary to Theorem 1, we conclude that:

$$P_+ = \omega^2 \left( I - (1 - n_1^+)(1/n) \, 1_{n \times n} \right)$$

This agrees with the solution in Sims (2010), except that we are able to be more explicit about the term $(1 - n_1^+)$. As described above, we have also formalized Sims’ remark that the variance of any linear combination $w'z_t$ that is uncorrelated with $\iota'z_t$ will not be reduced, regardless of the cost $\lambda$. This is easy to see here, because $\iota'$ is the only generalized eigenvector $s_i'$ associated with a generalized eigenvalue for which it is possible that $n_i^+ < 1$.

4.4 Transformations of the static RI-LQG tracking problem

In previous sections, we have extensively used a specific transformation to construct what we call the canonical synthetic target. This transformation is particularly useful because it simplifies the problem while preserving important relationships, especially the information and no-forgetting constraints. However, this is not the only possible transformation of the problem, and so we provide a more general result here.

Definition 7: Consider a static RI-LQG tracking problem defined by the tuple $(W, a_-, P_-)$, referred to as the reference problem. Let $B$ be a nonsingular $n \times n$ matrix. We define the $B$-transformed static RI-LQG tracking problem, corresponding to the $B$-synthetic target

$\beta = B\alpha$, as:

$$\min_{O_+} \; tr(V O_+) + \lambda (\ln|O_-| - \ln|O_+|) \tag{11}$$

$$\text{s.t.} \quad \beta \mid \mathcal{I}_- \sim N(b_-, O_-)$$

$$O_+ \geq 0$$

$$O_- - O_+ \geq 0$$

where $V = B^{-1\prime} W B^{-1}$, $O_- = B P_- B'$ and $b_- = B a_-$. We represent the $B$-transformed problem by the tuple $(B, W, a_-, P_-)$, and note that this definition encompasses the standard formulation given by Definition 1, which can be included here by setting $B$ to the identity matrix. Note also that any $B$-transformed problem can be written as an independent problem $(I, V, b_-, O_-)$, although this eliminates connection to the reference problem.

Theorem 4: If a matrix $O_+$ solves the $B$-transformed static RI-LQG tracking problem $(B, W, a_-, P_-)$, then the matrix $P_+ = B^{-1} O_+ B^{-1\prime}$ solves the reference static RI-LQG tracking problem $(W, a_-, P_-)$.

We can use this result to redefine the canonical target.

Definition 8: Let $S$ be the matrix defined in Lemma 2. Then the $S$-transformed problem $(S, W, a_-, P_-)$ is called the canonical form of the reference problem and the $S$-synthetic target is exactly the canonical synthetic target given in Definition 2, $\beta_c \equiv S\alpha$.

There are two other synthetic targets that it will be useful to formally define.

Definition 9: Let $M'M = P_-^{-1}$. Then the $M$-transformed problem $(M, W, a_-, P_-)$ is called the whitened form of the reference problem and the $M$-synthetic target is called the whitened synthetic target.

Definition 10: Let $ZXZ' = W$ be the eigendecomposition of $W$. Then the $Z$-transformed problem $(Z, W, a_-, P_-)$ is called the eigendecomposition form of the reference problem

and the $Z$-synthetic target is called the eigendecomposition synthetic target.

Since the product of two nonsingular matrices is again nonsingular, we can chain transformations together, and still apply Theorem 4 to the product of the transformation matrices.

Lemma 7: If $B$ and $C$ are nonsingular $n \times n$ matrices, then the $CB$-transformed problem $(CB, W, a_-, P_-)$ is equal to the $C$-transformation of the $B$-transformed problem.

This allows us to give further insight into the canonical form of the static RI-LQG tracking problem.

Lemma 8: The canonical form of the reference problem is equivalent to the transformed problem achieved by first applying the whitening transformation to the reference problem and then subsequently applying the eigendecomposition transformation to the resultant whitened problem.

Lemma 8 is a formalization of the geometrical steps visualized in Fig. 3.

5 Representations

Although we have continually described $a_+$ as a conditional expectation, we have so far left the posterior information set $\mathcal{I}_+$ vague and focused instead on the posterior covariance matrix $P_+$, and we have also purposely presented both the problem and solution with no mention of the “observation” or “signal” vectors that are commonly used in the rational inattention literature. In this section, we finally consider the posterior information set and discuss what we call “representations” of the information processed by agents. We first pursue these issues qualitatively and then formalize them using an algebraic approach.

Definition 11: For a static RI-LQG tracking problem $(W, a_-, P_-)$ with solution $P_+$ and the corresponding action $a_+$, we define a representation as any random vector $y_+$ that generates the solution, i.e. for which $E[\alpha \mid \mathcal{I}_-, y_+] = a_+$. An innovation representation, denoted $v_+$,

is any representation that additionally satisfies $E[v_+ \mid \mathcal{I}_-] = 0$.

We think that “representation” is a natural term to capture the essence of these vectors, particularly because they are not fundamental to the static RI-LQG tracking problem and because there are many vectors that satisfy the definition. When we provide a formal derivation, we will show that the most useful subset of representations correspond to a noise-contaminated version of some synthetic target. The synthetic targets express the fundamental target in different coordinate systems, and this is also the role representations play, except that representations express the agent’s imperfect understanding after processing new data.

Lemma 9: The action $a_+$ is a representation, since $a_+ = E[\alpha \mid \mathcal{I}_-, a_+]$. Thus we can refer to the action $a_+$ as the agent’s “perception” of the target.

The term “perception” seems natural to use when discussing $a_+$ as it relates to the agent’s understanding of the target $\alpha$ whereas the term “action” seems natural when discussing how the agent uses the solution of the rational inattention problem in the context of a larger economic problem. However, both terms refer to the same object, the conditional expectation $a_+$.

In the rational inattention literature, what we refer to here as representations are often instead referred to as “observations” or as “signals”, and the rational inattention problem is often formulated in terms of selecting the noise covariance matrix corresponding to a specific form of a signal vector, rather than in terms of selecting the posterior covariance as we have done. This approach can be valid, and in fact we will later show how to reformulate the static RI-LQG tracking problem in similar terms. However, we argue that using the terms “observation” or “signal” can create ambiguities because they conjure up certain connotations that may not be natural in the rational inattention context. As we develop formal definitions of representations and the posterior information set, we will make concrete these concerns.

5.1 Posterior information set and feasible representations

In this section, we use an algebraic approach to describe the posterior information set and the space of representations available to the agent.

Since the optimal action is a conditional expectation and since all variables are jointly Gaussian, $a_+$ is the linear projection of $\alpha$ onto a vector space. We take $\mathcal{V}$ to be the vector space of Gaussian random vectors of dimension $n$ equipped with the inner product $\langle X, Y \rangle = E[XY']$ [14], and identify $\mathcal{W}_-$ and $\mathcal{W}_+$ to be the subspaces of $\mathcal{V}$ defined by the information sets $\mathcal{I}_-$ and $\mathcal{I}_+$. This implies that $\mathcal{W}_- \subseteq \mathcal{W}_+$. Now, recalling that we can write $\alpha = a_+ + \eta$, we have $\alpha \in \mathcal{V}$, $a_+ \in \mathcal{W}_+$, and $\eta \in \mathcal{W}_+^{\perp}$, where $\mathcal{W}_+^{\perp}$ is the orthogonal complement to $\mathcal{W}_+$ in $\mathcal{V}$. Thus $\alpha = a_+ + \eta$ is a decomposition into orthogonal subspaces.

Our goal is to isolate only the new information collected by the agent, and formally we want to construct a subspace $\mathcal{W}_*$ that is the orthogonal complement of $\mathcal{W}_-$ in $\mathcal{W}_+$. The first step is to pick a subspace $\mathcal{W}_y$ such that $\mathcal{W}_+ = \mathcal{W}_- \oplus \mathcal{W}_y$. If we let $\{1, v_-\}$ be an orthogonal basis for $\mathcal{W}_-$ and take as given a basis $\{y_+\}$ for $\mathcal{W}_y$, then $\{1, v_-, y_+\}$ will be a basis for $\mathcal{W}_+$. [15] However, because we did not require $\mathcal{W}_- \perp \mathcal{W}_y$, this latter basis will generally not be orthogonal, and thus $\mathcal{W}_y$, and so also the basis vector $y_+$, contains a component of information already known by the agent. However, we can construct an orthogonal basis $\{1, v_-, v_+\}$ by applying the Gram-Schmidt process, so that $v_+ = y_+ - \operatorname{proj}_{\mathcal{W}_-} y_+$. This $v_+$ is now orthogonal to $\mathcal{W}_-$ and so only contains new information; thus we have defined the space we want as $\mathcal{W}_* = \operatorname{span}(v_+)$. This allows us to write $a_+$ as an orthogonal decomposition:

$$a_+ = \underbrace{\operatorname{proj}_{\mathcal{W}_-} \alpha}_{a_-} + \underbrace{\operatorname{proj}_{\mathcal{W}_*} \alpha}_{a_* \equiv K_v v_+}$$

where $a_-$ is the prior mean, now interpreted as the projection of $\alpha$ on prior information, $a_*$ is the projection of $\alpha$ on new information, and the projection matrix is $K_v = \langle \alpha, v_+ \rangle [\langle v_+, v_+ \rangle]^{-1}$. In this way, we have decomposed posterior information, defined by $\mathcal{W}_+$, into purely prior information, in $\mathcal{W}_-$, and purely new information, in $\mathcal{W}_*$.

[14] More precisely, this $\langle X, Y \rangle$ is the Gram matrix consisting of the component-wise inner products of the random vectors $X$, $Y$.

[15] The dimensions of $\mathcal{W}_-$ and $\mathcal{W}_*$ are not essential to this section, and could be made to be any number greater than zero.

As suggested by the notation, the vectors $y_+$ will correspond to the representations introduced in the previous section and the vectors $v_+$ will correspond to innovation representations. However, we still do not have an operational definition of $y_+$, $\mathcal{W}_*$, or $\mathcal{W}_+$. To remedy this, we consider an arbitrary random vector $y \in \mathcal{V}$. Denoting the space spanned by $\alpha$ as $\mathcal{V}_\alpha$ and its orthogonal complement in $\mathcal{V}$ as $\mathcal{V}_\alpha^{\perp}$, we can perform an orthogonal decomposition $y = \operatorname{proj}_{\mathcal{V}_\alpha} y + \operatorname{proj}_{\mathcal{V}_\alpha^{\perp}} y$. Since $\alpha$ is a basis element of $\mathcal{V}_\alpha$ we can write $\operatorname{proj}_{\mathcal{V}_\alpha} y \equiv Z\alpha$ where $Z$ is some conformable matrix, and we will denote $\zeta \equiv \operatorname{proj}_{\mathcal{V}_\alpha^{\perp}} y$. We can then construct $v_+ = y - \operatorname{proj}_{\mathcal{W}_-} y$, and note that $\operatorname{proj}_{\mathcal{W}_-} y = Z a_- + \operatorname{proj}_{\mathcal{W}_-} \zeta$. We define $\varepsilon \equiv \zeta - \operatorname{proj}_{\mathcal{W}_-} \zeta$ and $\Lambda \equiv \langle \varepsilon, \varepsilon \rangle$, and we note that both $\varepsilon \perp \alpha$ and $\varepsilon \perp \mathcal{W}_-$. This allows us to operationalize an innovation representation as $v_+ = Z\alpha + \varepsilon - Z a_-$.

We can now explicitly compute $\langle \alpha, v_+ \rangle = P_- Z'$ and $\langle v_+, v_+ \rangle = Z P_- Z' + \Lambda$ so that $K_v = P_- Z'(Z P_- Z' + \Lambda)^{-1}$. Notice that, given the prior, the innovation representation $v_+$ and the space $\mathcal{W}_*$ are completely defined by the pair $(Z, \Lambda)$, as is $K_v$. Furthermore, from any such pair we can define a representation $y_+ = Z\alpha + \varepsilon$ for which $y_+ - \operatorname{proj}_{\mathcal{W}_-} y_+ = v_+$.

The last step is to specify the matrices $Z$ and $\Lambda$ that correspond to valid representations. To do so, we note that $P_+ = \langle \alpha, \alpha - a_+ \rangle$ and it is not hard to show that this yields $P_+ = P_- - P_- Z'(Z P_- Z' + \Lambda)^{-1} Z P_-$. Applying the matrix inversion lemma to this equation,

we arrive at:

$$Z'\Lambda^{-1}Z = P_+^{-1} - P_-^{-1}$$

Any pair $(Z, \Lambda)$ that satisfies this equation, along with $\Lambda$ positive semidefinite (since it results from an inner product), describes what we will call a feasible representation. It is in this way that the choice of $P_+$ in the static RI-LQG tracking problem defines $\mathcal{W}_+$ and thereby defines $\mathcal{I}_+$. We now define a slightly more general concept of representation and give a formal definition of a feasible representation.

Definition 12: For a static RI-LQG tracking problem with solution $P_+$, a representation of dimension $m$ is defined as a tuple $(d, Z, \Lambda^{-1})$ such that:

a. $d$ is an $m \times 1$ vector that is constant with respect to the prior information set
b. $Z$ is an $m \times n$ matrix with full row rank [16]
c. $\Lambda^{-1}$ is an $m \times m$ positive semidefinite matrix
d. The equation $Z'\Lambda^{-1}Z = P_+^{-1} - P_-^{-1}$ is satisfied

[16] It is not too difficult to expand this definition to include rank deficient $Z$, but these cases are not important for our purposes and including them would complicate the exposition that follows.

Because we only require $\Lambda^{-1}$ positive semidefinite, such a representation cannot always be meaningfully written in terms of some target contaminated by a well-defined noise term. We therefore introduce an additional condition:

e. For some $0 < \ell \leq m$, we can write $E\Lambda^{-1}E' = \Lambda^{-1}_{(\ell)} \oplus 0_{(m-\ell, m-\ell)}$, where $\Lambda^{-1}_{(\ell)}$ is an $\ell \times \ell$ positive definite matrix, $0_{(m-\ell, m-\ell)}$ is an $(m - \ell) \times (m - \ell)$ matrix of zeros, and $E$ is the product of elementary matrices that potentially implement row-swapping transformations.

A feasible representation is a representation that additionally satisfies condition (e). We can then define $E\Lambda E' = \Lambda_{(\ell)} \oplus \infty I_{(m-\ell)}$ and so any feasible representation can be written

as a vector $y_+$ in the following form:

$$y_+ = d + Z\alpha + \varepsilon, \qquad \varepsilon \sim N(0, \Lambda) \tag{12}$$

This definition is still somewhat loose, but it is understood that the agent simply does not process any updated data regarding the components of $y_+$ with infinite noise variance. [17] We refer to a feasible representation as “proper” if $\ell = m$, so that $\Lambda^{-1}$ is positive definite, and as improper if $\ell < m$, so that $\Lambda^{-1}$ is only positive semidefinite. Since the block of $y_+$ with infinite noise variance corresponds to variables for which no data is processed by the agent, every improper representation can be made proper simply by eliminating the improper block and considering a reduced representation of dimension $\ell$.

[17] For example, even though the row and column interchange operations are well defined, constructing $\Lambda$ as in the formula requires interpreting the product of zero and infinity as equal to zero.

Definition 13: Given a feasible representation $(d, Z, \Lambda^{-1})$, the reduced form of that representation is denoted $(d_{(\ell)}, Z_{(\ell)}, \Lambda^{-1}_{(\ell)})$, where $\ell$ and $\Lambda^{-1}_{(\ell)}$ are as defined in Definition 12 part (e), and $d_{(\ell)}$ and $Z_{(\ell)}$ contain the first $\ell$ rows of $Ed$ and $EZ$, respectively. If the feasible representation is denoted $y_+$, then its reduced form is simply the first $\ell$ rows of $Ey_+$.

Since $\Lambda^{-1}_{(\ell)}$ is positive definite by construction, the reduced form of a feasible representation is proper. As a consequence, we can now give results for proper representations that automatically extend to the larger class of feasible representations through the reduced form of the latter. A lower bound for the dimension of any representation is given in Lemma 10.

Lemma 10:

a. The minimum dimension of any representation is the rank of the solution, so that $m \geq r$.
b. The dimension of any proper feasible representation is equal to the rank of the solution, so that $m = r$.

Finally, we note that every feasible representation has a corresponding innovation representation that can be written as $v_+ = y_+ - E[y_+ \mid \mathcal{I}_-]$, and that every innovation representation is a feasible representation in its own right denoted by $(-Z a_-, Z, \Lambda^{-1})$.

Implicit in the definition of a representation is the requirement that the no-forgetting constraint be satisfied, since $\Lambda^{-1}$ will be positive semidefinite if and only if $P_- - P_+$ is positive semidefinite. If we allowed the no-forgetting constraint to be violated, then we would have to admit representations containing noise terms with negative variances associated with one or more linear combinations. If the no-forgetting constraint is just satisfied, then according to our definition a representation does exist, but $\Lambda^{-1}$ will be singular. This implies that any representation must include a noise term with infinite variance associated with one or more linear combinations. This is not invalid, since it simply implies that the agent processes no new information about those combinations, but it does compel us to make a distinction between feasible and infeasible representations. This is because if those linear combinations associated with infinite variance are not separable in the representation from those combinations associated with finite variance, a meaningful noise term cannot be constructed. For this reason, the feasible representations exactly formalize the ways in which one could meaningfully understand the processing of incoming data by a rationally inattentive agent. The infeasible representations are mathematically valid objects, but they do not provide insight into the mechanism of information processing by an agent.

We can now use the class of feasible representations to understand the solution to the static RI-LQG tracking problem as well as the corresponding action.

Theorem 5: Given a proper feasible representation $(d, Z, \Lambda^{-1})$ and associated innovation representation denoted $v_+$, the solution to the static RI-LQG tracking problem can be written as:

$$a_+ = a_- + K_v v_+ \tag{13}$$

$$P_+ = (I - K_v Z) P_-$$

where $K_v = P_- Z'(Z P_- Z' + \Lambda)^{-1}$.

These formulas will be familiar as the updating step of the Kalman filter, and, accordingly, as similar to the solution to the LQG signal extraction problem discussed above. Crucially, though, note that the signal extraction problem computes the optimal unknown action $a_+$ for a given observation $y_+$. In our case, the solution to the static RI-LQG tracking problem yields a given action $a_+$ and we had to derive the corresponding set of representations that could be considered as generating it. This point is important because the fundamental object for a rationally inattentive agent is the action itself, derived as a solution to the tracking problem, and it is unnecessary to posit an “observation” vector. While it is often useful to consider the problem as if the agent has processed the data as a particular representation, it must be remembered that there are many such representations that would be equally valid.

5.1.1 Illustration: simplified vector space

[Figure 4, panels (a), (b), (c): Visualization of the static RI-LQG tracking problem, solution, action, and representations in a simplified vector space]

We can illustrate the algebraic approach using simplified vectors and vector spaces that

admit a graphical representation. In analogy with a univariate random variable, we consider a target $\alpha$ in the encompassing space $\mathcal{V} = \mathbb{R}^2$. The problem is then to find an action $a_+ \in \mathbb{R}^2$ that minimizes the (squared) Euclidean distance between target and action, $d(\alpha, a_+) = \langle \alpha - a_+, \alpha - a_+ \rangle$. We will make two simplifications. First, since this is analogous to a univariate problem, the loss matrix $W$ is $1 \times 1$, and we will normalize it to unity. Second, we will ignore prior information so that $a_- = 0$ and $\mathcal{W}_- = \{0\}$; this will imply that $\mathcal{W}_* = \mathcal{W}_+$. [18]

[18] We could extend the example to include a nontrivial prior, but it would require more complicated graphics that would obscure our primary goal.

Our first step is as before: the form of any optimal action will be a projection on a subspace $\mathcal{W}_+ \subseteq \mathbb{R}^2$. We can then write the orthogonal decomposition $\alpha = a_+ + \eta$, where $\eta \in \mathcal{W}_+^{\perp} \subseteq \mathbb{R}^2$. The vector $\eta$ represents tracking error, and the loss function can be interpreted as minimizing the length of the tracking error vector: $d(\alpha, a_+) = \langle \eta, \eta \rangle$, so this is the familiar sum of squared errors loss function. Now, the inner product concept in this simplified space is analogous to the concept of covariance in the full problem, and so we have $\langle \alpha, \alpha \rangle = P_-$, $\langle a_+, a_+ \rangle = P_- - P_+$, and $\langle \eta, \eta \rangle = P_+$. Thus, as before, our tracking objective is to minimize $P_+$. The positive semidefiniteness constraints from the full problem are easily understood in this context as requiring valid action and error vectors (i.e. that these vectors must have nonnegative lengths).

For this illustration we will set $\alpha = [0 \;\; 1]'$, and in Fig. 4 (a) we show an example of vectors $a_+^{(1)}$ and $\eta^{(1)}$ that satisfy the definition of $a_+$ as a linear projection for some value $P_+^{(1)}$. We have also shown the corresponding subspaces $\mathcal{W}_+^{(1)}$ and $\mathcal{W}_+^{\perp(1)}$, and it is easy to see that $a_+$ is the projection of $\alpha$ onto $\mathcal{W}_+$, while $\eta$ is the residual. In Fig. 4 (b), we show a different set of action and error vectors that satisfy the above definition, but for a different value $P_+^{(2)}$. Because the length of $\eta^{(2)}$ is smaller, these new vectors must correspond to decreased posterior uncertainty: $P_+^{(2)} < P_+^{(1)}$. Since $P_+$ defines the length of $\eta$, it is easy to visualize how it is that $P_+$ specifies the vector space $\mathcal{W}_+$ and so ties down the posterior

information set $\mathcal{I}_+$.

The remaining problem, analogous to Definition 1, is to select the optimal length of the error vector, $P_+$, subject to either a constraint $\frac{1}{2}\log_b(P_-/P_+) \leq \kappa$ or a fixed cost $\lambda$ of length reduction. It is interesting to note that in the case of univariate Gaussian random variables, the mutual information defining the analogous constraint can be written as $\frac{1}{2}\log_b\left(1/(1 - \rho^2)\right)$ where $\rho$ denotes correlation. Here, correlation is analogous to the cosine of the angle between the target and action, defined by $\cos(\theta) = \langle \alpha, a_+ \rangle / (\|\alpha\| \|a_+\|)$. Thus, another way to write the constraint for this example would be in terms of the angle between action and target; to illustrate this, we have indicated the corresponding angles in Fig. 4 (a) and (b).

The solution to this problem is mechanically the same as in Theorems 1 and 2. Since we set $W = 1$, we have $P_+ = \min\{\lambda, P_-\}$. Then, given the form of $a_+$ and a solution $P_+$, we can construct the vector space $\mathcal{W}_+$ and define the class of representations. To do so, consider an arbitrary $y \in \mathbb{R}^2$. We have $\mathcal{V}_\alpha = \operatorname{span}([0 \;\; 1]') = \{Z[0 \;\; 1]' \mid Z \in \mathbb{R}\}$, so that we can write $y = Z\alpha + \zeta$, where $\zeta \in \operatorname{span}([1 \;\; 0]') = \{c[1 \;\; 0]' \mid c \in \mathbb{R}\}$. Because we set $\mathcal{W}_- = \{0\}$, we must have $\operatorname{proj}_{\mathcal{W}_-} y = 0$ so that $v_+ = y$ and $\varepsilon = \zeta$, with $\langle \varepsilon, \varepsilon \rangle = c^2 \equiv \Lambda$. Now, for a pair $(Z, \Lambda)$ to be valid, it must satisfy $Z^2/\Lambda = 1/P_+ - 1/P_-$. For any solution $P_+$, the right hand side is fixed, so that larger elements $Z$ require larger $\Lambda$. Finally, for any valid pair $(Z, \Lambda)$, the associated innovation representation can be taken as a basis vector defining the subspace $\mathcal{W}_*$ as $\mathcal{W}_* = \operatorname{span}(v_+)$. In Fig. 4 (c), we plot representations $y_+^{(3)} = v_+^{(3)}$ and $y_+^{(4)} = v_+^{(4)}$ arising from two valid pairs $(Z^{(3)}, \Lambda^{(3)})$ and $(Z^{(4)}, \Lambda^{(4)})$. It is easy to see that any valid representation must lie in the subspace $\mathcal{W}_+$ and, conversely, the action $a_+$ will always be a projection onto the subspace spanned by a valid representation (and, more generally, also any prior information).
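The scalar illustration can be reproduced in a few lines. This is a sketch under the simplifications stated in the text ($W = 1$, a zero prior mean), with hypothetical values for $P_-$ and $\lambda$ and with base 2 chosen arbitrarily for the logarithm. It builds several valid pairs $(Z, \Lambda)$ from $Z^2/\Lambda = 1/P_+ - 1/P_-$ and applies the scalar version of the update $K_v = P_- Z (Z P_- Z + \Lambda)^{-1}$, $P_+ = (1 - K_v Z) P_-$.

```python
import math

def scalar_posterior(lam, P_minus):
    """Posterior variance P_+ = min{lam, P_-} for the normalized loss W = 1."""
    return min(lam, P_minus)

def noise_variance(Z, P_plus, P_minus):
    """Noise variance Lambda implied by Z^2 / Lambda = 1/P_+ - 1/P_-."""
    gap = 1.0 / P_plus - 1.0 / P_minus
    if gap <= 0.0:
        raise ValueError("no information is processed, so the noise is diffuse")
    return Z ** 2 / gap

def update(Z, Lam, P_minus):
    """Scalar update: returns the gain K_v and the implied posterior variance."""
    K_v = P_minus * Z / (Z * P_minus * Z + Lam)
    return K_v, (1.0 - K_v * Z) * P_minus

P_minus, lam = 1.0, 0.4            # hypothetical prior variance and marginal cost
P_plus = scalar_posterior(lam, P_minus)

results = []
for Z in (0.5, 1.0, 3.0):          # three different but equally valid representations
    Lam = noise_variance(Z, P_plus, P_minus)
    K_v, P_implied = update(Z, Lam, P_minus)
    results.append((Z, Lam, K_v * Z, P_implied))
    print(f"Z={Z}: Lambda={Lam:.4f}, K_v*Z={K_v * Z:.4f}, implied P_+={P_implied:.4f}")

# Mutual information from the variance ratio and from the squared cosine of the angle.
rho_sq = (P_minus - P_plus) / P_minus
info_from_variances = 0.5 * math.log2(P_minus / P_plus)
info_from_angle = 0.5 * math.log2(1.0 / (1.0 - rho_sq))
```

Every pair yields the same posterior variance and the same product $K_v Z$, while $\Lambda$ grows with $Z$, which is the sense in which many representations generate one action.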

5.2 The fundamental and canonical representations

In this section, we present several important representations.

Definition 14: The fundamental representation is defined by $d = 0$, $Z = I$, and $\Lambda_f^{-1} = P_+^{-1} - P_-^{-1}$ and is denoted $(0, I, \Lambda_f^{-1})$. If the fundamental representation is feasible, we write it as:

$$y_f = \alpha + \varepsilon_f, \qquad \varepsilon_f \sim N(0, \Lambda_f) \tag{14}$$

If the solution to the static RI-LQG tracking problem is full rank then the fundamental representation will be feasible, and also proper, but more generally it will usually be infeasible except in cases that exhibit a separation across the prior covariance and loss matrices that extends also to the posterior.

It is tempting to view the fundamental representation as the most straightforward representation, because it corresponds to the “true (fundamental) target plus white noise” concept often considered in the rational inattention literature. From the perspective of the agent, however, it is more natural to consider a representation based on the canonical synthetic target, because this latter target encapsulates the information of importance. Not only that, but the fundamental representation is often infeasible, whereas it will turn out that such a “canonical representation” will always be feasible.

Definition 15: The canonical representation is defined by $d_c = 0$, $Z_c = S$, and $\Lambda_c^{-1} = (N^+)^{-1} - I$ and is denoted $(0, Z_c, \Lambda_c^{-1})$. We write it as:

$$y_c = \beta_c + \varepsilon_c, \qquad \varepsilon_c \sim N(0, \Lambda_c) \tag{15}$$

where $\beta_c = S\alpha$ is the canonical synthetic target. Because $\Lambda_c^{-1}$ is diagonal, the canonical representation is always feasible.
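The contrast between the two representations can be made concrete with the rank one example of section 4.3.3. The sketch below is an illustration rather than the paper's implementation: it takes the closed-form posterior $P_+ = \omega^2(I - (1 - n_1^+)(1/n)1_{n \times n})$ with hypothetical values $n = 3$, $\omega = 1$, $\lambda = 1.2$, assumes $S$ is normalized so that $S P_- S' = I$, and uses a helper `is_feasible` (a name introduced here) that checks condition (e) of Definition 12 for a symmetric positive semidefinite precision matrix.

```python
import numpy as np

def is_feasible(Lam_inv, tol=1e-9):
    """Condition (e): up to reordering, a positive definite block plus a zero block.

    Assumes Lam_inv is symmetric positive semidefinite.
    """
    zero_rows = np.all(np.abs(Lam_inv) < tol, axis=1)
    keep = ~zero_rows
    if not keep.any():
        return False
    block = Lam_inv[np.ix_(keep, keep)]
    return bool(np.linalg.eigvalsh(block).min() > tol)

# Rank one example with hypothetical parameter values.
n, omega, lam = 3, 1.0, 1.2
P_minus = omega ** 2 * np.eye(n)
n1 = min(lam / (n * omega ** 2), 1.0)
P_plus = omega ** 2 * (np.eye(n) - (1.0 - n1) * np.ones((n, n)) / n)

# Fundamental representation: Z = I and noise precision P_+^{-1} - P_-^{-1}.
Lam_f_inv = np.linalg.inv(P_plus) - np.linalg.inv(P_minus)

# Canonical representation: Z_c = S and noise precision (N^+)^{-1} - I.
M = np.eye(n) / omega                              # whitening matrix, M'M = inv(P_minus)
n_plus, Z_eig = np.linalg.eigh(M @ P_plus @ M.T)
n_plus = np.clip(n_plus, 0.0, 1.0)                 # guard against rounding above one
S = Z_eig.T @ M
Lam_c_inv = np.diag(1.0 / n_plus - 1.0)

# Reduced canonical representation: keep only components that receive attention.
attended = (1.0 - n_plus) > 1e-9
Z_r = S[attended]
Lam_r = np.diag(1.0 / (1.0 / n_plus[attended] - 1.0))
K_v = P_minus @ Z_r.T @ np.linalg.inv(Z_r @ P_minus @ Z_r.T + Lam_r)
P_check = (np.eye(n) - K_v @ Z_r) @ P_minus

print("fundamental feasible:", is_feasible(Lam_f_inv))
print("canonical feasible:  ", is_feasible(Lam_c_inv))
print("reduced form reproduces P_+:", np.allclose(P_check, P_plus))
```

In this example the fundamental noise precision is a singular matrix with no zero rows, so no reordering separates the attended and unattended directions, whereas the canonical precision is diagonal and its one-dimensional reduced form reproduces the posterior covariance through the Kalman-style update.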

The canonical representation corresponds to “true (canonical synthetic) target plus white noise”. While the fundamental target describes the shocks as they appear in the economy, the canonical target describes synthetic shocks as they matter - separately - to the agent. Because of this, it is conceivable how the agent could operationalize the solution to their problem in terms of this representation, by considering each component separately and choosing whether and how much to pay attention to each by adjusting the variance of the information processing noise.

Although the canonical representation is always feasible it is not always proper, because the agent may process no information about some components. However, by applying Definition 13, we can always construct a reduced canonical representation that is proper.

Definition 16: We write the reduced form of the canonical representation as $(0, Z_r, \Lambda_r^{-1})$ and denote it by $y_r$.

This reduced canonical representation is perhaps the most useful representation, since it corresponds to the canonical target, contains a noise term with a finite diagonal covariance matrix, and can always be used to construct the action, by application of Lemma 6 and Theorem 3.

5.3 Representation form of the static RI-LQG tracking problem

We can now state an alternative form of the static RI-LQG tracking problem, which is in terms of selecting a representation rather than the posterior covariance.

Definition 17: The representation form of the static RI-LQG tracking problem is:

$$\min_{Z, \Lambda^{-1}} \; tr(W P_+) + \lambda (\ln|P_-| - \ln|P_+|) \tag{16}$$

$$\text{s.t.} \quad \alpha \mid \mathcal{I}_- \sim N(a_-, P_-)$$

$$\Lambda^{-1} \geq 0$$

$$P_+ = (Z'\Lambda^{-1}Z + P_-^{-1})^{-1}$$

This formulation requires joint solution in $Z$ and $\Lambda^{-1}$, and it is primarily of interest because many examples in the rational inattention literature use a similar form. One difficulty with this formulation is that the solution is not unique. For example, if $(Z, \Lambda^{-1})$ is a solution then so is $(XZ, (X\Lambda X')^{-1})$ for every nonsingular conformable matrix $X$. Possibly for this reason, this formulation of the problem is often split into two parts, and an optimal $Z$ is solved for first. With some optimal $Z$ fixed, an associated optimal $\Lambda^{-1}$ can be solved for.

5.4 Representation form of the action

We can also now characterize the action in terms of specific representations, and derive the results we simply asserted when previously describing the action. From Theorem 5, for any feasible innovation representation $(-Z a_-, Z, \Lambda^{-1})$ we have $a_+ = a_- + K_v v_+$, and we can extend this to any feasible representation by writing $v_+ = y_+ - Z a_-$ so that $a_+ = (I - K_v Z)a_- + K_v y_+$. If $Z$ has full row rank, then we can further write $a_+ = (I - K_v Z)a_- + K_v Z(\alpha + Z^- \varepsilon)$ where $Z^-$ denotes the Moore-Penrose pseudo inverse. From this it is not hard to show that $K = I - P_+ P_-^{-1} = K_v Z$, and so by defining $\hat{\alpha} = \alpha + Z^- \varepsilon$, we have the formulation presented originally. We emphasize that even though $\hat{\alpha}$ is generally not a feasible representation, the action $a_+$ is always valid and the weight matrix $K$ is always well-defined. This underscores once more the result that the representation of the data in an observation-like form is inessential to the solution of the static RI-LQG

tracking problem. If it happens that $y_f$ is a feasible representation, then $\hat{\alpha} = y_f$, and we have $a_+ = (I - K)a_- + K y_f$. However, we can always construct the action using some feasible representation. In particular, if we consider the canonical representation, we have $a_+ = (I - K_c Z_c)a_- + K_c y_c$ and it is easy to show that $K_c = R(I - N^+)$, and a little algebra brings us to either Lemma 6 or Theorem 3. Finally, these results could be easily rewritten in terms of the proper feasible representation $y_r$, since $1 - n_i^+ = 0$ for $i = r+1, \ldots, n$.

6 Application: rationally inattentive price-setting

In this section, we consider the model of rationally inattentive price-setters introduced by Maćkowiak and Wiederholt (2009), which we will refer to as MW. While our analytic results only extend to the static version of the problem, this has become a useful benchmark case. First, we apply the method derived in this paper to solve the problem formulated by MW in which a fixed capacity of attention is employed along with a restriction that amounts to requiring a diagonal posterior covariance. With the more general solution method now available to us, we can also consider three alternative formulations, and we discuss how the results change in each variant.

The basic setup considers a unit mass of monopolistically competitive firms indexed by $i$, each with identical profit function $\pi(P_{it}, P_t, Y_t, Z_{it})$ where $P_{it}$ is the price of firm $i$’s differentiated good, $P_t$ is the aggregate price level, $Y_t$ is real aggregate demand, and $Z_{it}$ is a firm-specific productivity shock. We assume an exogenous process for nominal aggregate demand, $Q_t = P_t Y_t$. Denote a second-order approximation to this profit function by $\tilde{\pi}(p_{it}, p_t, y_t, z_{it})$ where the lowercase variables denote log-deviation from nonstochastic steady-state. The aggregate price is log-approximated as $p_t = \int_0^1 p_{it} \, di$. The optimal price

under perfect information is:

$$p_{it}^{\diamond} = \frac{\hat{\pi}_{14}}{|\hat{\pi}_{11}|} z_{it} + \frac{\hat{\pi}_{13}}{|\hat{\pi}_{11}|} q_t + \left(1 - \frac{\hat{\pi}_{13}}{|\hat{\pi}_{11}|}\right) p_t$$

where $\hat{\pi}_{ij}$ denotes a second partial derivative of the profit function evaluated at the nonstochastic steady-state. It is not hard to see that under perfect information, equilibrium yields $p_t = q_t$. To extend this to incorporate imperfect information, we follow MW in assuming that firms set prices to track the perfect information price but are rationally inattentive. Because the approximate profit function is quadratic, this is in the form of an RI-LQG tracking problem. Here we focus on the static case, and so we assume that $z_{it}$ and $q_t$ are Gaussian white noise with variances $\sigma_z^2$ and $\sigma_q^2$, and that $z_{it} \perp q_t$. [19] As described above, we know that the form of the action will be the conditional expectation [20] $p_{it}^* = E[p_{it}^{\diamond} \mid \mathcal{I}_+]$ and, given this form of the solution, the expected loss in profits due to setting a suboptimal price will be:

$$E[\tilde{\pi}(p_{it}^{\diamond}, p_t, y_t, z_{it}) - \tilde{\pi}(p_{it}^*, p_t, y_t, z_{it}) \mid \mathcal{I}_-] = \frac{|\hat{\pi}_{11}|}{2} E[(p_{it}^{\diamond} - p_{it}^*)^2 \mid \mathcal{I}_-]$$

Based on the form of the perfect information equilibrium aggregate price, we follow a guess-and-verify approach to solve for the imperfect information equilibrium, guessing that $p_t = \gamma q_t$. [21] Then firm $i$ will set their price according to:

$$p_{it}^* = E\left[\frac{\hat{\pi}_{14}}{|\hat{\pi}_{11}|} z_{it} + \frac{\hat{\pi}_{13}}{|\hat{\pi}_{11}|} q_t + \left(1 - \frac{\hat{\pi}_{13}}{|\hat{\pi}_{11}|}\right) \gamma q_t \;\Big|\; \mathcal{I}_+\right]$$

[19] Given the generality of the solution method derived in this paper, it is no longer essential that $z_{it}$ and $q_t$ be independent, but we maintain this assumption for comparison with Maćkowiak and Wiederholt (2009).

[20] In the equation for the imperfect information case, Maćkowiak and Wiederholt (2009) condition on a vector of signals $s_i^t$. For the reasons described earlier in this paper, we use the more general posterior information set $\mathcal{I}_+$.

[21] Maćkowiak and Wiederholt (2009) write $p_t = \alpha q_t$, but we use $\gamma$ in place of $\alpha$ to avoid confusion with the fundamental target.

This is a best response function given a particular $\gamma$, and the equilibrium solution represents a fixed point. For a given $\gamma$, the rational inattention problem can be written in the form of Definition 1, where the target vector is $\alpha_{it} = [z_{it} \;\; q_t]'$, and we can define a weight vector

$$w = (w_z, w_q)' = \left(\frac{\hat{\pi}_{14}}{|\hat{\pi}_{11}|},\; \gamma + (1 - \gamma)\frac{\hat{\pi}_{13}}{|\hat{\pi}_{11}|}\right)'$$

so that the loss function is defined by the positive semidefinite matrix $W = \frac{|\hat{\pi}_{11}|}{2} w w'$; note that $\mathrm{rank}(W) = 1$. We assume that agents have no special prior knowledge, so that $\alpha_{it} = \alpha_{it} \mid \mathcal{I}_- \sim N(0, \Omega)$, where $P_- = \Omega = \mathrm{diag}\{\sigma_z^2, \sigma_q^2\}$.

At this point, MW make the additional assumption that firms must pay attention to $z_{it}$ and $q_t$ separately; following them, we refer to this as the independence assumption. We will expand on their results by considering four cases: with or without the independence assumption, and employing either fixed capacity or fixed marginal cost of attention. Since our framework immediately handles either a fixed capacity or a fixed marginal cost formulation, we need only now describe how to modify the problem and solution to impose the independence assumption.

It is most straightforward to introduce the independence assumption using the representation form of the problem given in Definition 17, because this assumption is most naturally interpreted as a limit on the form that representations (or “signals”, in their terminology) may take. The formalization of the independence assumption of MW requires that any representation form of the solution has both $Z$ and $\Lambda^{-1}$ as diagonal matrices. The implications for the posterior covariance matrix are easy to see by considering the equation $P_+^{-1} = Z'\Lambda^{-1}Z + P_-^{-1}$. Combined with the assumption that $z_{it} \perp q_t$, this requires that in any solution the posterior covariance matrix must be diagonal. However, it is also clear that the independence assumption does not put restrictions on the diagonal elements of $P_+^{-1}$. This can be stated as follows: the independence assumption restricts the eigenvectors of $P_+$ but not the eigenvalues, and, in this example, this amounts to requiring that $Q = I$, where $Q$ is as defined in Lemma 2.
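As a quick numerical check of this setup, the following Python sketch builds the weight vector and loss matrix and confirms two properties: $W$ has rank one, and when $P_+$ is diagonal only the diagonal of $W$ (the squared weights, up to the common scale factor $|\hat{\pi}_{11}|/2$) enters $tr(WP_+)$. All numbers (`pi11_abs`, `xi14`, `xi13`, `gamma`, the variances, and the posterior-to-prior ratios `n_z`, `n_q`) are illustrative assumptions rather than the calibration of MW, and the helper names are hypothetical.

```python
# Sketch: loss matrix of the static price-setting problem (illustrative numbers only).

def weight_vector(gamma, xi14, xi13):
    """w = (w_z, w_q)' with xi14 = pi_14/|pi_11| and xi13 = pi_13/|pi_11|."""
    return [xi14, gamma + (1.0 - gamma) * xi13]

def loss_matrix(w, pi11_abs):
    """W = (|pi_11| / 2) * w w', a scaled 2x2 outer product."""
    scale = pi11_abs / 2.0
    return [[scale * wi * wj for wj in w] for wi in w]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace_product(a, b):
    """tr(A B) for 2x2 matrices."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))

# Assumed values, chosen only for illustration.
pi11_abs, xi14, xi13, gamma = 15.0, 1.0, 0.15, 0.5
sigma2_z, sigma2_q = 4.0, 1.0
n_z, n_q = 0.2, 0.7  # posterior-to-prior variance ratios in [0, 1]

w = weight_vector(gamma, xi14, xi13)
W = loss_matrix(w, pi11_abs)

# Diagonal posterior covariance, as the independence assumption requires.
P_plus = [[sigma2_z * n_z, 0.0], [0.0, sigma2_q * n_q]]

# Diagonal loss matrix that keeps only the squared weights.
W_diag = [[W[0][0], 0.0], [0.0, W[1][1]]]

print("det(W) =", det2(W))  # zero up to rounding, so rank(W) = 1
print("tr(W P+) =", trace_product(W, P_plus))
print("tr(diag(W) P+) =", trace_product(W_diag, P_plus))
```

The two traces coincide because the off-diagonal elements of $W$ only ever multiply the off-diagonal elements of $P_+$, which are zero here.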

[Figure 5: Illustrations of possible behavior in the rational inattention price-setting model. Upper left panel: canonical targets in the general (Gen) and independence (Ind) cases, plotted in the $(z_{it}, q_t)$ plane. Upper right panel: responsiveness differential $(1 - n_z^+) - \gamma$ against $\kappa$, for the general and independence cases. Lower left panel: multiple equilibria, showing $\gamma$ and $Var(q_t \mid \mathcal{I}_+)$ against $\lambda$. Lower right panel: social cost of attention, showing $\gamma$ and $Var(p_{it} \mid \mathcal{I}_+)$ against $\kappa$.]

It might seem at first that our solution method cannot be applied here, because $Q$ is the matrix of eigenvectors of $V = L'WL$, and it is clear that this matrix will not be diagonal, given the loss matrix derived above. However, the portion of the objective function that is responsible for the eigenvectors of the posterior covariance matrix is $tr(WP_+)$ and it is not hard to show that the independence assumption requires that this term be equal to $w_z^2 \sigma_z^2 n_1^+ + w_q^2 \sigma_q^2 n_2^+$. This suggests that we can simultaneously impose $Q = I$ while still applying the basic structure developed in this paper by employing a different loss matrix, $W_I = \mathrm{diag}(w^2)$. For this particular example, this is a way of imposing the independence assumption while still allowing us to employ Theorems 1 and 2 to achieve the solution.

We now consider four cases. The first two consider the fixed marginal cost and fixed capacity formulations in the general case, while the second two proceed under the independence assumption. To conserve space, we relegate the details of the solutions to Appendix B, but broadly the solution involves two steps. We first take $\gamma$ as given and solve the static RI-LQG tracking problem. Then, since this yields attention allocations that themselves depend on $\gamma$, the second step is to solve the fixed point problem and find equilibrium values of $\gamma$.

Although there are many interesting differences between these models, due to space constraints we focus here on only three: (1) the definition of the canonical synthetic target, (2) the responsiveness of agents to the two types of shocks, and (3) the values of $\gamma$ in equilibrium. We are most interested in highlighting the difference between the general case and the case under the independence assumption; a few selected illustrations appear in Fig. 5 and will be discussed below.

The first, and most obvious, difference between these cases is the resultant canonical synthetic target. This is an important difference, because the components of this target define the objects of attention for rationally inattentive agents. In the general case, there is a single canonical target that consists of the optimal price, while under the independence assumption there are two canonical targets: the idiosyncratic and aggregate shocks (this latter fact was exactly the goal of the assumption).22 The canonical targets for each case are visualized in the upper left panel of Fig. 5. Both choices may appear reasonable, but while MW advocate for the independence assumption, we argue that the general case should be preferred; this issue is considered in detail in the next section. For now, we focus on an important practical effect, that the two cases lead to a qualitative difference in the form of the posterior uncertainty chosen by agents.

We showed above that uncertainty is only reduced for the space spanned by the canonical targets to which the agent actually pays attention. Here, under the independence assumption, this space can include both the idiosyncratic and aggregate shocks, but in the general case the space can only be a hyperplane corresponding to a single particular linear combination of the shocks.
This difference in the dimensions of the posterior vector spaces is also apparent in the upper left panel of Fig. 5. The implication of this is that as the cost of attention falls to zero (or the capacity rises to infinity), under the independence assumption the agent will become fully informed about the idiosyncratic and aggregate shocks separately, but in the general case the agent will only become fully informed about the specific linear combination that is relevant to their economic problem, so that some uncertainty will remain about the shocks themselves. Thus the independence assumption yields suboptimal behavior for the agent, since they are acquiring costly information that they do not use. Specifically, for any given parameterization, posterior uncertainty about the optimal price and the objective function itself (both of which the agent wishes to reduce as much as possible) will be higher given the independence assumption, although posterior uncertainty about either the idiosyncratic or aggregate shocks individually will be higher in the general case.

22 Technically even in the general case there are two components to the canonical synthetic target, but the agent never pays attention to the second component and it is defined only by its orthogonality to the first component.

The second difference between these cases that we consider is the implied responsiveness of rationally inattentive agents to shocks. To compute the responsiveness to a shock to the optimal price, we use the result (derived in the appendix) that for all the cases considered above, we can write the optimal posterior in the form

$$p^{*}_{it} = (1 - n_z^+) w_z z_{it} + \gamma q_t + \varepsilon$$

where $\gamma = (1 - n_q^+) w_q$ and $\varepsilon$ is a mean zero noise term whose variance may differ under the various cases. For the general case, the solution imposes $n_z^+ = n_q^+$, whereas under the independence assumption they may differ. Recalling that the perfect information optimal price is $p^{\diamond}_{it} = w_z z_{it} + q_t$, we can measure the responsiveness of firms to idiosyncratic shocks as $(1 - n_z^+)$ and to aggregate shocks as $\gamma$. These values are between zero and one, and, since under perfect information both of these are equal to one, they describe the fraction of a shock reflected in the action of a rationally inattentive agent.

One feature of particular interest in MW and the related literature is whether it is possible that firms exhibit conditional responsiveness; that is, high responsiveness to idiosyncratic shocks and low responsiveness to aggregate shocks. The key result of MW is that under the independence assumption, this can be achieved if firms pay close attention to idiosyncratic shocks (so that $n_z^+$ is close to zero) but they do not pay close attention to aggregate shocks (so that $n_q^+$ is close to one and hence $\gamma$ is close to zero). In the general case, since $n_z^+ = n_q^+$ is imposed, it is more difficult to achieve this conditional response. Using a calibration based on that of MW, we compute the difference in responsiveness, $(1 - n_z^+) - \gamma$, across a range of values for the marginal cost and capacity parameters. We plot these values in the upper right panel of Fig. 5 for the fixed capacity formulations. Under the independence assumption, we find that an arbitrary difference can be achieved for some value of the marginal cost or fixed capacity parameters, and moreover that one can achieve any difference while also requiring that firms respond nearly perfectly to idiosyncratic conditions. This confirms the result of MW. In the general case, for this calibration, we find that the maximum difference is about 45 percentage points, and that this difference occurs when firms respond to about 75 percent of idiosyncratic shocks. This indicates that the independence assumption is not crucial to achieving a conditional response to shocks. However, since the contrast between attention paid to idiosyncratic and aggregate shocks is not as stark in the general case, this suggests that a richer price-setting model may be required to match empirical data on prices.23

The final issue that we will consider is how the equilibrium values of $\gamma$ vary over the cases. The term $\gamma$ controls the strength of “coordination” across firms to aggregate shocks: if $\gamma$ is high, then aggregate shocks have a high pass-through to all individual price-setting decisions, whereas if $\gamma$ is low, aggregate shocks have a smaller impact. The primary result here is that for a given parameterization, $\gamma$ will generally be lower under the independence assumption than it would be in the general case.
This is because under the independence assumption firms must pay attention to these shocks separately, and so part of the information collected is unused. The end result is that it is more costly for firms to pay attention to aggregate conditions. The two cases also display markedly different equilibrium behavior for $\gamma$: in the general case $\gamma$ is monotonic nonincreasing in the marginal cost of attention and, at least for reasonable calibrations, there is a unique equilibrium; under the independence assumption, there are regions in which decreases in the marginal cost of attention actually decrease $\gamma$, and there are regions admitting multiple equilibria. This richer equilibrium behavior appears in the latter case because there are two components of the canonical target that end up receiving attention from the agent and so more complex interactions can arise.

23 This is unsurprising, since the seminal model of Maćkowiak and Wiederholt (2009) was deliberately left relatively simple to expose the key mechanism. For example, Fulton (2015) demonstrates that even with the independence assumption, a more complex model is required to mitigate calibration issues that imply an implausible differential between the volatility of aggregate and idiosyncratic shocks.

Multiple equilibria can arise in this model due to the combination of strategic complementarities and endogenous information choice. Here, if most agents are paying attention to aggregate shocks and set their prices accordingly, then the aggregate shock is actually relevant for every individual agent, whereas if few agents pay attention to aggregate shocks then the cost of ignoring them for any individual agent can be small. For the calibration we consider, there is a region of parameterizations for $\lambda$ in the fixed marginal cost case under the independence assumption that implies three equilibria: a high equilibrium, a low equilibrium, and one in which $\gamma$ is zero. We illustrate the equilibrium values of $\gamma$ in this case along with the corresponding posterior uncertainty about the aggregate shock in the lower left panel of Fig. 5.

A social cost of increased attention can arise in this model for a similar reason. If no firm pays attention to aggregate shocks (so that in equilibrium $\gamma = 0$), then these shocks do not enter the optimal price-setting equation, and all the posterior uncertainty faced by an imperfectly informed agent is driven by the idiosyncratic shocks.
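As an aside on the multiple equilibria discussed above, the mechanism can be reproduced in a deliberately stripped-down Python sketch. It is not the solution of Appendix B. It assumes (i) the independence case with a fixed marginal cost; (ii) that the posterior-to-prior variance ratio for the aggregate shock takes the form $n_q^+ = \min\{1, c / w_q^2\}$, which is what a diagonal version of the first order condition (18) with $T = 0$, capped by the no-forgetting constraint, would give, where $c$ is a normalized cost proportional to $\lambda$; and (iii) an illustrative value $\hat{\pi}_{13}/|\hat{\pi}_{11}| = 0.15$. The names `xi`, `c`, `best_response`, and `equilibria` are hypothetical.

```python
import math

def w_q(gamma, xi):
    """Weight on the aggregate shock: w_q = gamma + (1 - gamma) * xi."""
    return gamma + (1.0 - gamma) * xi

def best_response(gamma, xi, c):
    """Implied pass-through (1 - n_q) * w_q when other firms use `gamma`.

    Assumption: n_q = min(1, c / w_q**2), so no attention is paid to the
    aggregate shock once the normalized cost c reaches w_q**2.
    """
    w = w_q(gamma, xi)
    n_q = min(1.0, c / (w * w))
    return (1.0 - n_q) * w

def equilibria(xi, c, tol=1e-12):
    """All fixed points gamma = best_response(gamma) in [0, 1]."""
    found = []
    # Corner: nobody attends to the aggregate shock, so gamma = 0.
    if abs(best_response(0.0, xi, c)) < tol:
        found.append(0.0)
    # Interior: gamma = w - c / w reduces to a quadratic in gamma,
    #   (1 - xi) * gamma**2 + (2 * xi - 1) * gamma + (c / xi - xi) = 0.
    a, b, k = 1.0 - xi, 2.0 * xi - 1.0, c / xi - xi
    disc = b * b - 4.0 * a * k
    if disc >= 0.0:
        for sign in (-1.0, 1.0):
            g = (-b + sign * math.sqrt(disc)) / (2.0 * a)
            # Keep roots in (0, 1] that are genuine fixed points.
            if tol < g <= 1.0 and abs(best_response(g, xi, c) - g) < 1e-9:
                if all(abs(g - f) > 1e-9 for f in found):
                    found.append(g)
    return sorted(found)

for c in (0.01, 0.03, 0.05):
    print(c, [round(g, 4) for g in equilibria(0.15, c)])
```

Under these assumptions an intermediate cost (`c = 0.03`) produces three fixed points (zero, a low value, and a high value), while a lower cost (`c = 0.01`) leaves only a high equilibrium and a higher cost (`c = 0.05`) leaves only $\gamma = 0$.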
As available attention rises, if firms start to pay attention to aggregate shocks (so that $\gamma$ becomes nonzero), these shocks become relevant and this can result in an overall increase in posterior uncertainty. Because the expected loss in profits increases with posterior uncertainty, this makes all firms worse off. This is illustrated in the lower right panel of Fig. 5.

As far as we are aware, rational inattention price-setting models incorporating multiple equilibria or a social cost of increased attention have so far not been considered in the literature, although there is of course a vast body of work dedicated to these issues in imperfect information contexts generally. We hope that the solution method derived in this paper may facilitate access to these interesting questions.

7 Modeling rational inattention problems

We have emphasized throughout this paper that caution must be used when making modeling decisions in rational inattention models on the basis of intuition derived from signal extraction models. In this section, we consider the extent to which the independence assumption, and other assumptions with similar implications, can be justified without relying on an inappropriate analogy.

The independence assumption, introduced in Maćkowiak and Wiederholt (2009) (MW), has seen increased use in the rational inattention literature in recent years. A portion of its appeal is surely because it made the rational inattention problem more tractable (and indeed many authors do not provide a justification for its use), although this concern is less relevant now that we have derived an exact solution in the static case. However, it was introduced by MW not only for convenience but because they argued that the general case of the model was implausible. As we showed above, the linear combination that defines the relevant canonical target in the general case is exactly the linear combination that generates the optimal price from the idiosyncratic and aggregate shocks; this implies that the canonical representation is exactly of the form “profit-maximizing price plus noise”. MW write that this “amounts to assuming that the decision maker can attend directly to the profit-maximizing price” and suggest “we think that, in most economic contexts, decision makers cannot attend directly to the optimal decision ... The independence assumption is the simplest way of modeling the idea that decision making is about first paying attention to a variety of variables, and then combining these different pieces of information in a single decision”. In this section, we take the opposite position and argue that the use of the general case is justified, and that the appeal of the independence assumption comes exactly from intuition derived from signal extraction models that does not apply to rationally inattentive agents.

To consider this issue we will examine two related claims along the lines of those from MW: (1) a rationally inattentive agent should not have access to a representation of the form optimal action plus noise, and (2) there may be restrictions that prevent a rationally inattentive agent from processing information in an arbitrary way. The first claim addresses whether we should require restrictions on representations (for example the independence assumption), while the second claim addresses whether we should allow restrictions.

There is no doubt that the first claim is plausible in the context of a signal extraction model, in which case the word “representation” would be replaced with “observation”. It is certainly the case that datasets observed by agents are not usually in the form of their optimal decision. In a rational inattention context, however, the data observed by the agent is the fundamental target, and this target is indeed unrelated to the optimal action. The canonical target, which is generally related to the optimal action, is not given to the agent but is constructed by them as they solve their problem and process the incoming data as efficiently as possible. Thus the existence of a representation in the form optimal action plus noise is not suspicious, because it is exactly constructed by the agent to capture the most important aspects of available information.
We therefore reject the first claim and suggest that the default position for rational inattention models should not include assumptions restricting the form of representations available to the agent.

The second claim is more complex, and in some ways it is an obviously true statement: it would be foolish to argue that real economic agents have no restrictions to data processing other than cognitive capacity, particularly when the economic agent in question is a firm, composed of many individuals. However, assessment of this claim must take into account the context of the relevant abstraction proposed by the model. For the formulation laid down in Sims (2003) and Sims (2010), rational inattention models “do not subsume or claim to replace all previous economic models of costly information”; instead, the abstraction supposes the free availability of all underlying information so that any incompleteness of information is entirely due to inattention on the part of the agent. This is only straightforward in stylized examples and so some license must clearly be extended if the rational inattention approach is to be used for more complex situations and agents.

In designating a person as rationally inattentive, we model the person as a finite capacity channel through which information flows. The abstraction of the model ignores purely psychological quirks related to information processing so that the agent's behavior can be considered through the lens of optimization.24 Since all relevant data is assumed to be freely available to the agent, imposition of the independence assumption would amount to the imposition of such a psychological quirk. Justification of such an assumption would presumably need to be done on a case-by-case basis, rather than as a general rule for rational inattention models. Ultimately, other frameworks, for example that of Woodford (2014), may be more natural for problems in which these issues represent serious concerns.

Firms, on the other hand, are composed of many individuals and decision making processes are often complex. It seems plausible that in designating a firm as rationally inattentive, what we mean is that the firm's operations generally are conceived as a finite capacity channel through which information flows, as individual managers consult a variety of information sources to make a myriad of decisions.
For the price-setting example, the in-depth study of firm behavior in Zbaracki et al. (2004) suggests something like this. In this case, would it not be reasonable to assume that firms have one group dedicated to understanding idiosyncratic shocks and a second group dedicated to understanding aggregate shocks, so that the independence assumption would be justified? We argue not. The key consideration for us is that these restrictions result in suboptimal outcomes as firms process costly but irrelevant information, and yet there is no particular barrier that prevents firms from structuring their decision-making in any way they please. While of course the actual structure of firms is influenced by many considerations (for example economies of scale), it does not seem clear that there is any general justification for the independence assumption. Despite this, it is undeniable that complex agents surely face some restrictions on the way that they process information, and in specific cases there may be evidence indicating some particular deviation from the baseline model. Therefore, while we advocate for applying the rational inattention model without ad hoc restrictions, we do not reject the second claim altogether.

24 This point is made clear in footnote 1 of Sims (2003).

8 Extension: dynamic RI-LQG tracking problems

The signal extraction and tracking problems can usually be extended as dynamic problems in a straightforward manner, especially in the LQG case where the target follows the linear transition law $\alpha_t = T\alpha_{t-1} + \eta_t$ with $\eta_t \sim N(0, \Omega)$. This nests the static case when $T = 0$. The dynamic signal extraction problem can be solved recursively by the Kalman filter, and a key feature is that at each stage the solution is given by a conditional expectation, which we will denote $a_{t|t} = E[\alpha_t \mid y^t]$, where $y^t$ collects the (exogenously given) observations $\{y_\tau\}_{\tau \le t}$. To construct a dynamic tracking problem, we again shed the exogenously imposed observation vector $y_t$, and we also now assume that the agent discounts the future at rate $\beta$, so that the problem is:

$$\min_{\{a_{t|t}\}_{t \ge 0}} E\left[\sum_{t=0}^{\infty} \beta^t d(\alpha_t, a_{t|t}) \;\middle|\; \mathcal{I}_0\right]$$

along with the transition equation. By introducing an information constraint at each time period we can construct a rational inattention problem and proceed in a similar fashion to the static case. Sims (2003) and Sims (2010) show that at each stage it will be optimal to set $a_{t|t} = E[\alpha_t \mid \mathcal{I}_t]$, and we have $I(\alpha_t, a_{t|t} \mid \mathcal{I}_{t-1}) = \frac{1}{2}\left(\log_b |P_t| - \log_b |P_{t|t}|\right)$ where $P_t = Var(\alpha_t \mid \mathcal{I}_{t-1})$ and $P_{t|t} = Var(\alpha_t \mid \mathcal{I}_t)$. By using the transition law to derive the predicted covariance matrix $P_{t+1}$, we can recursively define the dynamic RI-LQG tracking problem.

Definition 18: the dynamic RI-LQG tracking problem, denoted $(W, a_-, P_-, T, \Omega)$, is:

$$\min_{P_{t|t}} \; tr(WP_{t|t}) + \lambda\left(\ln|P_-| - \ln|P_+|\right) + \beta\lambda \ln|P_{t+1}| \qquad (17)$$

$$\text{s.t.} \quad \alpha_t \mid \mathcal{I}_- \sim N(a_t, P_t)$$

$$P_{t+1} = TP_{t|t}T' + \Omega$$

$$P_{t|t} \ge 0$$

$$P_t - P_{t|t} \ge 0$$

Here $P_-$ and $P_+$ in the objective denote the time-$t$ prior and posterior covariance matrices, $P_t$ and $P_{t|t}$.

Because the prior covariance matrix for time $t+1$, $P_{t+1}$, depends on the posterior at time $t$, the dynamic problem features linkages across time that do not appear in the static problem: decreasing uncertainty today makes it less costly to achieve a given level of uncertainty tomorrow. It is immediate that for any given marginal cost of attention, more attention will be allocated in the dynamic problem than would be allocated for the same problem with $T = 0$. Another feature of the dynamic problem that does not appear in the static problem is that achieving equilibrium, if one exists, may take several periods. This is because in the static problem, the prior was generally equal to the unconditional distribution so repetitions were generally identical, whereas in the dynamic problem the prior evolves from period to period and equilibrium is only reached when the prior $P_t$ is equal to the predicted covariance $P_{t+1}$ constructed using the optimal posterior $P_{t|t}$.
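The linkage across time can be made concrete with a scalar Python sketch. It assumes a one-dimensional target with transition coefficient `phi` and innovation variance `omega` (illustrative values, not taken from the paper), applies the scalar version of $P_{t+1} = TP_{t|t}T' + \Omega$, and measures information flow in bits, that is, with base $b = 2$ in $\frac{1}{2}(\log_b |P_t| - \log_b |P_{t|t}|)$. The function names are hypothetical.

```python
import math

def predicted_variance(p_post, phi, omega):
    """Scalar version of P_{t+1} = T P_{t|t} T' + Omega."""
    return phi * phi * p_post + omega

def information_bits(p_prior, p_post):
    """Scalar information flow (1/2) * (log2 p_prior - log2 p_post)."""
    if not 0.0 < p_post <= p_prior:
        raise ValueError("need 0 < p_post <= p_prior (no forgetting)")
    return 0.5 * (math.log2(p_prior) - math.log2(p_post))

phi, omega = 0.9, 1.0   # assumed AR(1) parameters
target_post = 0.5       # posterior variance the agent wants to reach tomorrow

for p_post_today in (1.0, 0.25):
    p_prior_tomorrow = predicted_variance(p_post_today, phi, omega)
    bits = information_bits(p_prior_tomorrow, target_post)
    print(f"posterior today {p_post_today:.2f} -> "
          f"prior tomorrow {p_prior_tomorrow:.4f}, bits needed {bits:.3f}")
```

A lower posterior variance today produces a lower prior variance tomorrow, so fewer bits are needed tomorrow to reach the same target posterior.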

It is easy to check that the first order condition for the time $t$ iteration of the dynamic problem is:

$$P_{t|t}^{-1} = W/\lambda + \beta T' P_{t+1}^{-1} T \qquad (18)$$

Although the matrix $P_{t|t}$ that solves this equation generally cannot be given explicitly, it is not too hard to compute it numerically. However, as before, the first order condition only solves the problem if the constraints are not binding. Similar to the static case, while the first positive semidefiniteness constraint will always be satisfied, the no-forgetting constraint will usually be binding in the dynamic case. To understand why, first recall that Lemma 4, associated with the static problem, suggests that the rank of the solution will always be no greater than the rank of the loss matrix; this result is not strictly true in the dynamic case, although the intuition is still usually valid. Now, in order to map dynamic targets into the form required by Definition 18, an augmented target usually has to be constructed in order to satisfy the requirement of a linear first order transition equation. Thus although the loss function is defined in terms of the original target, the loss matrix $W$ is defined in terms of the augmented target; this generally introduces rows and columns of zeros, and the result is that the loss matrix for most problems is not full rank.

For example, consider tracking an AR(2) target $\alpha^o_t = \phi_1 \alpha^o_{t-1} + \phi_2 \alpha^o_{t-2} + \eta^o_t$. In order to put this into a form amenable to Definition 18, we write:

$$\alpha_t = \begin{bmatrix} \alpha^o_t \\ \alpha^o_{t-1} \end{bmatrix}, \quad T = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix}, \quad \eta_t = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \eta^o_t, \quad W = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$

Thus the no-forgetting constraint will almost always bind for dynamic problems and so the first order condition will not provide the solution. Unfortunately, the method of Theorem 1 does not immediately help, because in the dynamic case the eigenvectors of $P_{t|t}$ cannot be completely decoupled from the eigenvalues. Finding a fully general analytic solution for the dynamic RI-LQG tracking problem remains an open problem. Despite this, if a solution is found, for example numerically, many of the results derived in this paper can be applied, since they only depend on the conditional Gaussianity of prior and posterior. Analysis can still proceed based on the generalized eigenvalue problem associated with the matrix pencil $(P_+, P_-)$, where the generalized eigenvalues and left eigenvectors can be found by applying simultaneous diagonalization. Thus Proposition 6 is still valid and the canonical synthetic target is still well-defined, as is the rank of the solution and the associated definitions of information capacity allocations. The construction of the action as a projection on an appropriately defined vector space also continues to hold, as does the concept of feasible and proper representations. Of course, results that depend on the specific relation of the generalized eigenvectors and eigenvalues to the loss matrix, especially Theorems 1 and 2, do not extend to the dynamic case.

In the special case of a one-dimensional target, it is sometimes possible to construct the solution analytically, and for the important class of one-dimensional targets following an ARMA(p, q) process under a fixed capacity constraint, an analytic solution has been derived by Matejka et al. (2017). Although their setup is nominally different, their results can be stated in the terms introduced here; we describe only a few. First, the rank of the solution will always be $r = 1$, and so the no-forgetting constraint will always bind except possibly in the AR(1) case. Second, except for the AR(1) case, no solution admits a representation of the form $y^o_t = Z^o \alpha^o_t + \varepsilon^o_t$ where $\alpha^o_t$ is the one-dimensional ARMA(p, q) target, although in each case there still exists some $1 \times n$ matrix $Z$ such that there exists a feasible representation $y_t = Z\alpha_t + \varepsilon_t$, where $\alpha_t$ is an augmented target constructed to satisfy the transition equation. Matejka et al. (2017) term this second result the “dynamic attention principle”.
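Returning to the matrix pencil $(P_+, P_-)$, the Python sketch below shows how a numerically obtained prior and posterior pair can be analyzed in the two-dimensional case. It assumes the generalized eigenvalues $n_i$ are defined by $P_+ v = n_i P_- v$ (the paper's own normalization may differ), counts the rank of the solution as the number of $n_i$ strictly below one, and uses the identity $\frac{1}{2}(\log_2 |P_-| - \log_2 |P_+|) = -\frac{1}{2}\sum_i \log_2 n_i$ for total information in bits. The example matrices and function names are hypothetical; the posterior was generated by a single rank-one update of the prior.

```python
import math

def generalized_eigenvalues_2x2(p_plus, p_minus):
    """Roots n of det(P_plus - n * P_minus) = 0 for symmetric 2x2 inputs."""
    a = p_minus[0][0] * p_minus[1][1] - p_minus[0][1] ** 2   # det(P_minus)
    c = p_plus[0][0] * p_plus[1][1] - p_plus[0][1] ** 2      # det(P_plus)
    b = -(p_plus[0][0] * p_minus[1][1] + p_plus[1][1] * p_minus[0][0]
          - 2.0 * p_plus[0][1] * p_minus[0][1])
    disc = max(b * b - 4.0 * a * c, 0.0)  # guard against tiny negative rounding
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])

def solution_rank(eigs, tol=1e-9):
    """Number of directions in which uncertainty is actually reduced."""
    return sum(1 for n in eigs if n < 1.0 - tol)

def total_information_bits(eigs):
    """-(1/2) * sum(log2 n_i), equal to (1/2) * log2(|P_minus| / |P_plus|)."""
    return -0.5 * sum(math.log2(n) for n in eigs)

# Hypothetical prior and posterior covariance matrices.
P_minus = [[2.0, 1.0], [1.0, 2.0]]
P_plus = [[0.875, -0.125], [-0.125, 0.875]]

eigs = generalized_eigenvalues_2x2(P_plus, P_minus)
print("generalized eigenvalues:", eigs)
print("rank of solution:", solution_rank(eigs))
print("total information (bits):", total_information_bits(eigs))
```

For larger systems a library routine for the symmetric generalized eigenproblem would be the natural replacement for the closed-form quadratic used here.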
The solution of Matejka et al. (2017) is very promising, but it does not apply to the general problem of Definition 18 in which agents must trade off between many target processes, and there is no way to expand their method to multivariate series without imposing an ad hoc restriction like the independence assumption. One option is to solve the general problem numerically as was done in examples given in Sims (2003) and Sims (2010). Unfortunately, numerical optimization can prove difficult for even moderate sized systems. Instead, we advocate an approximation suggested by the first order conditions. It is not hard to see that a first order Taylor approximation in $\lambda$ to the dynamic first order condition, around the point of perfect information ($\lambda = 0$), is equal to the static first order condition.25 We suggest, then, that when the marginal cost of information is sufficiently low (or capacity is sufficiently high), iterated application of the static solution given in Theorems 1 and 2 along with the transition equation, starting from an arbitrary prior, will yield a good approximation of the full dynamic solution. Although we do not prove the result, it appears that such iterations always converge to an equilibrium. This can be a particularly attractive method because, in practice, most applications of RI-LQG problems have been associated with $\lambda$ close to zero.26 Finally, we note that for ARMA(p, q) targets, this approximation could also be justified by appealing to Proposition 7 of Matejka et al. (2017).

Of course this approximation only imperfectly captures the full solution to the general problem. Its key strength is that it takes into account intratemporal tradeoffs between target processes, while its key weakness is that it fails to take into account intertemporal tradeoffs: it ignores future benefits from reducing uncertainty today.27 One outcome of this is it will tend to select higher levels of uncertainty than the analytic solution. For the same reason, it will also run into the no-forgetting constraint sooner (at a higher marginal cost or lower capacity) than would the analytic solution. In many cases of practical interest, however, this static approximation will be quite good, as we illustrate below.
25 This approximation could alternatively be derived from an approximation around $\beta = 0$; it thus effectively imposes that individuals fully discount future uncertainty.

26 Some care must be taken when considering the scale of $\lambda$, since it is actually the scale of $\lambda$ relative to the eigenvalues of the loss matrix $W$ (and, in some problems, potentially those of $T$ and $\Omega$) that matters.

27 The approximation is not completely divorced from intertemporal issues since the prior, which influences the posterior in the static solution, evolves over iterations. It is also possible that the no-forgetting constraint may be imposed or lifted as the prior changes.
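The iterated static approximation can be compared with the dynamic first order condition in the simplest possible setting. The Python sketch below assumes a scalar AR(1) target with fixed marginal cost, takes the scalar static step to be $p_{t|t} = \min\{p_t, \lambda / w\}$ (equation (18) with $T = 0$, capped by no-forgetting), and solves the scalar version of equation (18) in closed form for the stationary exact solution. The parameter values ($w = 1$, $\phi = 0.5$, $\Omega = 1$) and all function names are assumptions made for illustration; only $\beta = 0.99$ and the two values of $\lambda$ are taken from the text.

```python
import math

def kappa_bits(p_prior, p_post):
    """Information processed per period, (1/2) * log2(p_prior / p_post)."""
    return 0.5 * math.log2(p_prior / p_post)

def static_iteration(w, lam, phi, omega, p_prior, n_iter=200):
    """Iterate the scalar static solution together with the transition equation."""
    for _ in range(n_iter):
        p_post = min(p_prior, lam / w)          # static step with no-forgetting cap
        p_prior = phi * phi * p_post + omega    # transition step
    return p_prior, min(p_prior, lam / w)

def exact_stationary(w, lam, phi, omega, beta):
    """Stationary scalar solution of 1/p = w/lam + beta*phi^2 / (phi^2*p + omega)."""
    a, f2 = w / lam, phi * phi
    if f2 == 0.0:
        p_post = 1.0 / a
    else:
        # a*f2*p^2 + (a*omega + beta*f2 - f2)*p - omega = 0; take the positive root.
        qa, qb, qc = a * f2, a * omega + beta * f2 - f2, -omega
        p_post = (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)
    p_prior = f2 * p_post + omega
    if p_post > p_prior:
        raise ValueError("no-forgetting constraint binds; not handled in this sketch")
    return p_prior, p_post

w, phi, omega, beta = 1.0, 0.5, 1.0, 0.99   # w, phi, omega are assumed values
for lam in (1e-4, 1.0):
    start = omega / (1.0 - phi * phi)       # unconditional variance as initial prior
    prior_s, post_s = static_iteration(w, lam, phi, omega, start)
    prior_e, post_e = exact_stationary(w, lam, phi, omega, beta)
    print(f"lambda={lam}: static kappa={kappa_bits(prior_s, post_s):.2f}, "
          f"exact kappa={kappa_bits(prior_e, post_e):.2f}")
```

With these assumed values the two allocations are nearly identical at $\lambda = 10^{-4}$ (about 6.64 bits each), while at $\lambda = 1$ the static iteration processes about 0.16 bits against about 0.27 bits for the exact solution, so the static approximation selects a higher posterior variance. These figures are in line with the AR(1) column of Table 1, although the parameters behind that table are not stated in this section and the match should be read as suggestive only.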

[Figure 6: Dynamic unresponsiveness with $\lambda = 10^{-4}$. Panels: AR(1), AR(2), Parallel AR(1), VAR(1). Series: Exact, Static, Independent. Horizontal axis: $t$. Vertical axis: $\alpha_t - a_{t|t}$ for AR(1) and AR(2), and $w'\alpha_t - w'a_{t|t}$ for Parallel AR(1) and VAR(1).]

[Figure 7: Dynamic unresponsiveness with $\lambda = 1$. Same panels, series, and axes as Figure 6.]

                      AR(1)   AR(2)                 Parallel AR(1)        VAR(1)
                      κ       k_1    k_2    κ       k_1    k_2    κ       k_1    k_2    κ
λ = 10^-4   Exact     6.64    6.64   0.00   6.64    0.21   0.46   7.20    0.83   0.65   7.58
            Static    6.64    6.64   0.00   6.64    0.21   0.46   7.20    0.83   0.65   7.58
            Ind.      –       –      –      –       6.64   6.64   13.29   6.64   6.64   13.29
λ = 1       Exact     0.27    0.50   0.06   0.51    0.12   0.38   0.83    0.66   0.33   1.03
            Static    0.16    0.36   0.09   0.36    0.13   0.31   0.72    0.56   0.43   1.01
            Ind.      –       –      –      –       0.27   0.61   0.88    0.76   0.22   0.96

Table 1: Attention allocations in dynamic examples for the exact solution, the static approximation, and an approximation based on the independence assumption.

8.1 Illustration: goodness of approximation

In Table 1 we present a few examples demonstrating the accuracy of the static approximation (“Static”) compared to the exact solution of the problem (“Exact”) or the solution under the independence assumption (“Ind.”), and we consider four examples: an AR(1) process, an AR(2) process, a model with two separate (“parallel”) AR(1) processes, and a bivariate VAR(1) process. In the former two cases, there is only a single variable to track, while in the latter two cases, there are two variables to track and we assume that the agent wishes to track the sum of the two variables.28 We apply the fixed marginal cost of attention formulation of the problem with $\beta = 0.99$, and consider two different costs: $\lambda = 10^{-4}$ and $\lambda = 1$. We report the approximate information capacity allocated to processing the $i$-th element of the fundamental target, $k_i$, as well as the total information processed, $\kappa$.29

As could be expected, when the cost of information is low, the static approximation is practically identical to the full dynamic solution while when the cost is high, its performance degrades. Imposition of the independence assumption drives the solution away from the exact dynamic solution regardless of the information cost, but there is an especially marked difference when the information cost is low. This is because, as described above in the prices example, a solution to the general problem will often reduce uncertainty only about certain relevant linear combinations of the target, whereas a solution under the independence assumption will reduce uncertainty about each target element separately.

28 Since there is only a single variable in the AR(1) and AR(2) cases, the independence assumption does not change the problem there.

29 In the AR(1) case, there is only a single variable of interest and $k_1 = \kappa$. In the AR(2) case, $k_1$ refers to the approximate information capacity allocated to processing the contemporaneous value of the target while $k_2$ refers to the approximate information capacity allocated to processing the lagged value.

The information capacity allocations in Table 1 provide one way of assessing the goodness of the static approximation, but an alternative method is to directly examine the final effect on the agent's action. We will consider how a rationally inattentive agent responds to a one unit innovation in each of the four example models. In particular, in Fig. 6 and Fig. 7, we plot the difference between the true impulse response function of the model and the action taken by the agent.30 We label this difference the “unresponsiveness” of the agent, because it captures the portion of the impulse that the agent does not respond to. These figures provide more evidence that the static approximation is very good when the marginal cost of information is close to zero, and is often still quite good when the marginal cost of information is relatively large.

One final interesting characteristic of these results can be found in the “Exact” solutions to the AR(2) problem. For this problem, the agent is only concerned with tracking the contemporaneous variable, and the lagged variable is associated with zero weight in the loss matrix. When the cost of information is low ($\lambda = 10^{-4}$), the approximate capacity allocated to processing the lagged variable is $k_2 \approx 0$. It might seem counterintuitive that as attention becomes more costly there is an increase in the approximate capacity allocated to processing the lagged variable, with $k_2 \approx 0.06$ when $\lambda = 1$. The reason for this can be found in the first order condition to the dynamic problem: when $\lambda$ is small, the effects of transitional dynamics are dwarfed by the effect of $W/\lambda$, whereas when $\lambda$ is larger they can become important.
Ultimately, the effect of transitional dynamics can induce the agent to pay attention to variables that receive zero weight in the loss matrix as long as they help predict the variables that are of interest. This is why the agent pays more attention to the second component as attention becomes more costly, and it is exactly an illustration of what Matejka et al. (2017) refer to as the "dynamic attention principle".

$^{30}$ In all cases, we show the unresponsiveness to the object of interest; this does not necessarily correspond to one of the fundamental targets. For the AR(1) and AR(2) models we show the unresponsiveness to the contemporaneous value of process $\alpha_{1t}$, while for the multivariate series we show the unresponsiveness $w'\alpha_t$.

9 Conclusion

In this paper, we describe the optimal allocation of attention by agents interested in tracking multiple economic shocks, each of which provides valuable information, subject to a limited ability to process incoming data. The key insight is that by constructing a transformation of the economic shocks, we can simplify the problem, facilitate the solution, and ease the interpretation of a wide variety of results. The transformed "canonical" shocks introduce a decoupling that captures the independent aspects of the economic shocks as they matter to the agent. Even in a complex multivariate setting with correlation between economic shocks, for each of the canonical shocks the agent acts as a simple Bayesian updater, giving some weight to the imperfectly processed incoming data while retaining some weight on their prior. We show how these canonical shocks define a representation of the incoming data that provides insight into how a rationally inattentive agent processes information. Throughout, we carefully examine the similarities and differences between the rational inattention problem and the classical signal extraction problem.

We apply our solution method to solve the static version of the rational inattention price-setting problem, and find a richer set of equilibrium behavior than previously known, including multiple equilibria and a social cost of increased attention by agents. We show how our framework can be used to help inform rational inattention modeling decisions, and this leads us to argue that the "independence assumption", often employed in rational inattention models to make the model tractable, imposes unjustifiable restrictions on agents. At the same time, the solution method developed in this paper all but eliminates the need for such an assumption in the static case. Finally, we describe how our solution to the static problem can be used to approximate the solution to the dynamic problem, and moreover show that this approximation is quite good in many cases of practical interest.

10 Appendices

10.1 Appendix A: Proofs

10.1.1 Proof of Property 6

Simultaneously diagonalize $P_- = S'IS$ and $P_+ = S'NS$ as described in Lemma 1. Then:

$$
\begin{aligned}
I(X, Y \mid \mathcal{I}_-) &= \tfrac{1}{2}\left(\log_b |P_-| - \log_b |P_+|\right) \\
&= \tfrac{1}{2}\left(\log_b |S'IS| - \log_b |S'NS|\right) \\
&= \tfrac{1}{2}\left(\log_b |I| - \log_b |N|\right) \\
&= \tfrac{1}{2}\log_b |N^{-1}| \\
&= \tfrac{1}{2}\sum_{i=1}^{n} \log_b \frac{1}{n_i}
\end{aligned}
$$

10.1.2 Proof of Lemma 1

See Theorem 7.6.4 of Horn and Johnson (2012).

10.1.3 Proof of Lemma 2

This is a straightforward application of Lemma 1.

10.1.4 Proof of Theorem 1

Throughout this proof, the matrices $L$, $M$, $V$, $D$, and $Q$ are as defined in Lemma 2. We note at the outset that we can assume without loss of generality that $P_+$ is positive definite, since if it were not the objective function would grow without bound.

Ignoring the no-forgetting constraint, simultaneously diagonalize $P_+^{-1}$ and $P_-^{-1}$ as:

$$
P_+^{-1} = X'\Delta X, \qquad P_-^{-1} = X'IX
$$

where $X = Z'M$ with $Z\Delta Z' = L'P_+^{-1}L$, and denote $\Delta = \mathrm{diag}(\{\delta_i\}_{i=1}^n)$. Because $P_+$ is full rank, $\Delta$ is nonsingular and we can define $N = \Delta^{-1} = \mathrm{diag}(\{n_i\}_{i=1}^n)$ where $n_i = 1/\delta_i$.

Denoting the objective function as $\mathcal{O}$, we can rewrite it using the above decomposition and applying Property 6 as:

$$
\begin{aligned}
\mathcal{O} &= \mathrm{tr}(WP_+) + \lambda \sum_{i=1}^{n} \ln \frac{1}{n_i} \\
&= \mathrm{tr}(WLZNZ'L') - \lambda \sum_{i=1}^{n} \ln n_i \\
&= \mathrm{tr}(Z'VZN) - \lambda \sum_{i=1}^{n} \ln n_i
\end{aligned}
$$

Notice that the matrix of eigenvectors, $Z$, appears only in the first term. A standard result is that minimizing the first term over unitary matrices $Z$ yields $Z = Q$ (recall that $QDQ' = V$), for any matrix $N$. Thus the optimal $Z$ contains the eigenvectors of $V = L'WL$. This also implies that $X = S = Q'M$. This allows us to further simplify the objective function:

$$
\begin{aligned}
\mathcal{O} &= \mathrm{tr}(Q'VQN) - \lambda \sum_{i=1}^{n} \ln n_i \\
&= \mathrm{tr}(Q'(QDQ')QN) - \lambda \sum_{i=1}^{n} \ln n_i \\
&= \mathrm{tr}(DN) - \lambda \sum_{i=1}^{n} \ln n_i \\
&= \sum_{i=1}^{n} d_i n_i - \lambda \sum_{i=1}^{n} \ln n_i
\end{aligned}
$$

We can also use the simultaneous diagonalization to simplify the no-forgetting positive semidefiniteness constraint. First, note that $P_- - P_+ \geq 0$ if and only if $P_+^{-1} - P_-^{-1} \geq 0$. Then from above, $P_+^{-1} - P_-^{-1} = S'(\Delta - I)S$, and this is positive semidefinite if and only if $\Delta - I \geq 0$. Since $\Delta$ is diagonal and $N = \Delta^{-1}$, this condition is satisfied if and only if $\delta_i \geq 1$ or $n_i \leq 1$ for $i = 1, \dots, n$.

With this, the objective and the constraint can be separated into $n$ isolated problems, each of which is of the form:

$$
\min_{n_i} \; d_i n_i - \lambda \ln n_i \quad \text{s.t.} \quad n_i \leq 1
$$

If $d_i > 0$, then this is a convex objective function with a linear inequality constraint, so the solution, denoted by $n_i^+$, is characterized by the Kuhn-Tucker conditions. The first order condition yields $n_i = \lambda / d_i$, and the full solution is:

$$
n_i^+ = \begin{cases} \lambda / d_i & \lambda \leq d_i \\ 1 & \text{otherwise} \end{cases}
$$

If $d_i = 0$, then the problem is $\min_{n_i} -\lambda \ln n_i$, and the solution sends $n_i \to \infty$, so that the constraint is binding and $n_i^+ = 1$.

Defining $\delta_i^+ = 1/n_i^+$ and $\Delta^+ = \mathrm{diag}(\{\delta_i^+\}_{i=1}^n)$, we have solved for the optimal $S$ and $\Delta$ that define $P_+^{-1}$, and in particular:

$$
P_+^{-1} = S'\Delta^+ S, \qquad P_+ = RN^+R'
$$
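The steps of this proof translate fairly directly into a numerical routine. The following is a minimal sketch, assuming NumPy is available and that $P_-$ is positive definite and $W$ is symmetric positive semidefinite; the function name `solve_fixed_marginal_cost` and the example matrices are illustrative choices, not part of the paper. It uses $P_- = LL'$, $M = L^{-1}$, $V = L'WL = QDQ'$, $S = Q'M$ and $R = S^{-1} = LQ$.

```python
import numpy as np

def solve_fixed_marginal_cost(P_prior, W, lam):
    """Sketch of the fixed marginal cost solution: returns (P_post, n_plus, S)."""
    L = np.linalg.cholesky(P_prior)        # lower triangular, P_prior = L @ L.T
    M = np.linalg.inv(L)                   # M = L^{-1}
    V = L.T @ W @ L                        # V = L' W L
    d, Q = np.linalg.eigh(V)               # eigenvalues returned in ascending order
    order = np.argsort(d)[::-1]            # reorder so that d_1 >= d_2 >= ...
    d, Q = np.clip(d[order], 0.0, None), Q[:, order]
    n_plus = np.ones_like(d)               # n_i^+ = 1 when d_i <= lam (no attention)
    attended = d > lam
    n_plus[attended] = lam / d[attended]   # n_i^+ = lam / d_i otherwise
    S = Q.T @ M                            # rows are the vectors s_i'
    R = L @ Q                              # R = S^{-1}
    P_post = R @ np.diag(n_plus) @ R.T     # P_+ = R N^+ R'
    return P_post, n_plus, S

# Hypothetical example inputs.
P_prior = np.array([[2.0, 0.5], [0.5, 1.0]])
W = np.array([[1.0, 0.2], [0.2, 0.5]])
P_post, n_plus, S = solve_fixed_marginal_cost(P_prior, W, lam=0.5)
```

By construction $S P_- S' = I$ and $S P_+ S' = N^+$, which gives a quick way to check an implementation; when $\lambda$ is small enough that every $d_i \geq \lambda$, the posterior precision should also reduce to $W/\lambda$.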

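Finally, if $d_i \geq \lambda \; \forall i$, then $\Delta^+ = D/\lambda$ and:

$$
\begin{aligned}
P_+^{-1} &= M'Q\Delta^+Q'M \\
&= M'Q(D/\lambda)Q'M \\
&= M'L'(W/\lambda)LM \\
&= W/\lambda
\end{aligned}
$$

10.1.5 Proof of first Corollary to Theorem 1

Let $W = ww'$ and define $q_1 = \frac{1}{\|L'w\|} L'w$. Then:

$$
\begin{aligned}
P_+^{-1} &= S'\Delta^+ S \\
&= P_-^{-1} + S'(\Delta^+ - I)S \\
&= P_-^{-1} + (\delta_1^+ - 1) M'q_1 q_1'M \\
&= P_-^{-1} + (\delta_1^+ - 1) \frac{1}{\|L'w\|^2} W
\end{aligned}
$$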

From above, we have:

$$
\begin{aligned}
P_+^{-1} &= P_-^{-1} + (\delta_1^+ - 1) M'q_1 q_1'M \\
&= M'\left[I + (\delta_1^+ - 1) q_1 q_1'\right]M \\
P_+ &= L\left[I + (\delta_1^+ - 1) q_1 q_1'\right]^{-1}L' \\
&= L\left[I^{-1} - I^{-1}q_1\left((\delta_1^+ - 1)^{-1} + q_1'I^{-1}q_1\right)^{-1}q_1'I^{-1}\right]L' \\
&= L\left[I - \left((\delta_1^+ - 1)^{-1} + 1\right)^{-1} q_1 q_1'\right]L' \\
&= L\left[I - \left(\frac{\delta_1^+}{\delta_1^+ - 1}\right)^{-1} q_1 q_1'\right]L' \\
&= P_- - \frac{\delta_1^+ - 1}{\delta_1^+}\,\frac{1}{\|L'w\|^2}\, P_- W P_- \\
&= P_- - (1 - n_1^+)\,\frac{1}{\|L'w\|^2}\, P_- W P_-
\end{aligned}
$$

10.1.6 Proof of second Corollary to Theorem 1

We want to show that $s_i'(P_+ - n_i^+ P_-) = 0$ for each pair $(s_i', n_i^+)$.

From Lemma 2 we have $P_- = RIR'$, and from Theorem 1 we have $P_+ = RN^+R'$. Since $R = S^{-1}$, then $s_i'R$ is equal to a row vector with each element equal to zero except for the $i$-th element which is equal to 1, and so $s_i'RN^+ = n_i^+ s_i'R$.

$$
\begin{aligned}
s_i'(P_+ - n_i^+ P_-) &= s_i'(RN^+R' - n_i^+ RIR') \\
&= (n_i^+ s_i'RR' - n_i^+ s_i'RR') \\
&= 0
\end{aligned}
$$
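For the rank-one case $W = ww'$, both corollaries are easy to check numerically. The sketch below is illustrative only, assuming NumPy; the function name `rank_one_posterior` and the example inputs are hypothetical. It relies on the facts that the single nonzero eigenvalue of $L'WL$ is $d_1 = \|L'w\|^2 = w'P_-w$ and that $s_1 = w / \|L'w\|$.

```python
import numpy as np

def rank_one_posterior(P_prior, w, lam):
    """Posterior covariance when W = w w' (first Corollary to Theorem 1)."""
    d1 = float(w @ P_prior @ w)                 # d_1 = ||L'w||^2 = w' P_- w
    if d1 <= lam:                               # no attention is paid: n_1^+ = 1
        return P_prior.copy(), 1.0
    n1 = lam / d1                               # n_1^+ = lam / d_1
    W = np.outer(w, w)
    P_post = P_prior - (1.0 - n1) * (P_prior @ W @ P_prior) / d1
    return P_post, n1

# Hypothetical example inputs.
P_prior = np.array([[2.0, 0.5], [0.5, 1.0]])
w = np.array([1.0, 0.5])
P_post, n1 = rank_one_posterior(P_prior, w, lam=0.5)

# Second corollary: w'(P_+ - n_1^+ P_-) = 0, since s_1 is proportional to w.
residual = w @ (P_post - n1 * P_prior)
```

The covariance form can be cross-checked against the precision form $P_+^{-1} = P_-^{-1} + (\delta_1^+ - 1) W / \|L'w\|^2$ by inverting `P_post`.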

10.1.7 Proof of Theorem 2

Since Definition 1 is valid for the fixed capacity problem, except with $\lambda^* = 2\ln(b)\lambda$ interpreted as a Lagrange multiplier, the solution in Theorem 1 is valid in this case, but we must also derive the value of the Lagrange multiplier at the solution. To do so, note that the associated constraint is $\tfrac{1}{2}(\log_b |P_-| - \log_b |P_+|) \leq \kappa$ and, as in the proof of Theorem 1, we can rewrite it as:

$$
\frac{1}{2}\sum_{i=1}^{n} \log_b \delta_i^+ \leq \kappa
$$

In any solution, all processing capacity will be used, so that this constraint will hold with equality. Define $r$ such that $d_i > \lambda$ for $i = 1, \dots, r$ and $d_i \leq \lambda$ for $i = r+1, \dots, n$. Recall from Theorem 1 that $\delta_i^+ = 1$ for $i > r$, and so the constraint is:

$$
\begin{aligned}
\sum_{i=1}^{r} \log_b \delta_i^+ &= 2\kappa \\
\log_b \prod_{i=1}^{r} \frac{d_i}{\lambda} &= 2\kappa \\
\lambda^r &= b^{-2\kappa} \prod_{i=1}^{r} d_i \\
\lambda &= \left[b^{-2\kappa} \prod_{i=1}^{r} d_i\right]^{1/r}
\end{aligned}
$$

Since the choice of $r$ depends on $\lambda$, we can compute $r$ in the following way. Initialize $r = n$. First, compute the $\lambda$ associated with $r$. If $d_i > \lambda$ for $i = 1, \dots, r$, then this is the solution. If $\exists\, d_i \leq \lambda$ with $i \leq r$, then set $r = r - 1$ and repeat these steps.

Notice that if $r = 1$, then $\lambda = 2^{-2\kappa} d_1$. As long as $\kappa > 0$ and $d_1 > 0$ (and recall that $d_1$ is the largest eigenvalue, so only in completely degenerate problems will $\kappa = 0$ or $d_1 = 0$), we will have $d_1 > \lambda$. Thus, except for degenerate problems, it will always be optimal to have $r \geq 1$.
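The iterative procedure for finding $r$ and the shadow cost $\lambda$ can be sketched in a few lines. This is an illustrative implementation only; the function name `shadow_cost` and its signature are assumptions made for the example, and the eigenvalues are sorted internally so that $d_1 \geq \dots \geq d_n$.

```python
import math

def shadow_cost(d, kappa, b=2.0):
    """Return (lam, r) for eigenvalues d, capacity kappa and log base b."""
    d = sorted(d, reverse=True)                 # d_1 >= d_2 >= ... >= d_n
    r = len(d)                                  # initialize r = n
    while r >= 1:
        # lambda associated with the current r
        lam = (b ** (-2.0 * kappa) * math.prod(d[:r])) ** (1.0 / r)
        if all(d_i > lam for d_i in d[:r]):     # every d_i, i <= r, exceeds lambda
            return lam, r
        r -= 1                                  # otherwise drop the smallest and retry
    raise ValueError("degenerate problem: kappa = 0 or d_1 = 0")

# Hypothetical example: two components, two units of capacity (base 2).
lam, r = shadow_cost([4.0, 1.0], kappa=2.0)

# The capacity constraint should hold with equality at the solution.
used = 0.5 * sum(math.log(d_i / lam, 2.0) for d_i in [4.0, 1.0][:r])
```

With a smaller second eigenvalue, for instance `shadow_cost([4.0, 0.1], kappa=1.0)`, the first pass proposes a $\lambda$ that exceeds $d_2$, so the loop falls back to $r = 1$.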

Finally, for $i \leq r$, we have:

$$
\begin{aligned}
\delta_i^+ &= d_i \lambda^{-1} \\
&= b^{2\kappa/r}\, d_i \left[\prod_{j=1}^{r} d_j\right]^{-1/r}
\end{aligned}
$$

Taking logs, we define:

$$
\kappa_i \equiv \frac{1}{2}\log_b \delta_i^+ = \frac{\kappa}{r} + \log_b\left[\frac{\sqrt{d_i}}{\prod_{j=1}^{r} \sqrt{d_j}^{\,1/r}}\right]
$$

For $i > r$, we have $\delta_i^+ = 1$, so $\kappa_i = \tfrac{1}{2}\log_b 1 = 0$.

10.1.8 Proof of Corollary to Theorem 2

Part (a):

We want to show that $\frac{\partial \lambda}{\partial \kappa} < 0$. The only difficulty is accounting for the fact that $r$ as a function of $\kappa$ acts like a step function.

Our first step is to notice that if the change in $\kappa$ does not change $r$, then we have:

$$
\frac{\partial \lambda}{\partial \kappa} = -\frac{2\ln b}{r}\, b^{-2\kappa/r} \left[\prod_{i=1}^{r} d_i\right]^{1/r} < 0
$$

Our second step is to show that if $r$ is nondecreasing in $\kappa$ (i.e. $r$ and $\kappa$ move (weakly) together), the result still holds. To see this, consider the two terms of $\lambda$ separately.

a. It is easy to see that $\frac{\partial 2^{-2\kappa/r}}{\partial \kappa} < 0$ and $\frac{\partial 2^{-2\kappa/r}}{\partial r} < 0$.

b. The second term is the geometric mean of $(d_1, \dots, d_r)$, and by assumption we have $d_1 \geq d_2 \geq \cdots \geq d_r \geq \cdots \geq d_n$. An increase in $r$ will therefore introduce into the geometric mean terms that are no larger than any of the existing terms; similarly, a decrease in $r$ will remove only the smallest existing terms. Thus, the term as a whole is nonincreasing in $r$. Since this term is independent of $\kappa$, we have our result.

Our final step is to show that $r$ is nondecreasing in $\kappa$. This follows directly from the first step, above, and the algorithm for computing $r$. Consider an increase in $\kappa$. At any iteration of the algorithm, the proposed value for $\lambda$ will be smaller than it was under the original value of $\kappa$, and so while the algorithm may terminate earlier, it certainly will not terminate later. The reverse is true for a decrease in $\kappa$. This yields the result.

Part (b): See the last paragraph of the proof to part (a).

10.1.9 Proof of Lemma 3

Part (a): This follows directly from $\beta_c = S\alpha$ and $\alpha \mid \mathcal{I}_+ \sim N(a_+, P_+)$.

Part (b): This follows directly from $\beta_c = S\alpha$ and $\alpha \mid \mathcal{I}_- \sim N(a_-, P_-)$.

Part (c):

$$
\begin{aligned}
E[(\beta_c - b_{c,+})'D(\beta_c - b_{c,+}) \mid \mathcal{I}_-] &= E[(\alpha - a_+)'S'DS(\alpha - a_+) \mid \mathcal{I}_-] \\
&= E[(\alpha - a_+)'M'QDQ'M(\alpha - a_+) \mid \mathcal{I}_-] \\
&= E[(\alpha - a_+)'M'VM(\alpha - a_+) \mid \mathcal{I}_-] \\
&= E[(\alpha - a_+)'W(\alpha - a_+) \mid \mathcal{I}_-]
\end{aligned}
$$

Part (d): This follows from parts (a) and (b) along with the fact that $(P_+, P_-)$ and $(N^+, I)$ share generalized eigenvalues. Alternatively, this follows from Property 3.

Part (e): This follows because $\mathrm{Var}(\beta_c \mid \mathcal{I}_+) = N^+$ is a diagonal matrix.

10.1.10 Proof of Lemma 4

Part (a): The quantity $r$ is the integer such that $n_r = \lambda/d_r < 1$ but $n_{r+1} = \lambda/d_{r+1} \geq 1$. Thus we have $n_i^+ < 1$ for $i = 1, \dots, r$ and $n_i^+ = 1$ for $i = r+1, \dots, n$, so that $\mathrm{rk}(I - N^+) = r$. Then, since $S$ is nonsingular, we have $\mathrm{rk}(P_- - P_+) = \mathrm{rk}(SP_-S' - SP_+S') = \mathrm{rk}(I - N^+) = r$.

Part (b): If $\mathrm{rk}(W) = \ell$, then each of $d_n, d_{n-1}, \dots, d_{\ell+1}$ must equal zero, and for any $i$ such that $d_i = 0$, it must also be that $n_i^+ = 1$. Then $r = \mathrm{rk}(I - N^+) \leq \ell = \mathrm{rk}(W)$.

Part (c): This follows directly from the definition of $\delta_i^+$ in Theorem 1.

Part (d): This follows from Lemma 3 part (e), since for each $i$ such that $n_i^+ = 1$, we have $I(\beta_{i,c}, b_{i,c,+} \mid \mathcal{I}_-) = 0$.

Part (e):

In the fixed $\kappa$ formulation, suppose that $r = 1$. Then $\lambda = b^{-2\kappa} d_1$, so that $d_1 > \lambda$. Thus the algorithm of Theorem 2 will always terminate at $r = 1$ if it did not terminate earlier.

In the fixed $\lambda$ formulation, set $\lambda = d_1 + 1$. Then $r = 0$.

10.1.11 Proof of Lemma 5

If $P_-$ is diagonal, then the Cholesky factor $L$ is also diagonal. Along with $W$ diagonal, this implies that $V = L'WL$ is diagonal, so that the matrix of eigenvectors $Q$ is equal to the identity. Then $S = Q'M = M$ and $P_+ = RN^+R' = LN^+L' = N^+P_-$. Rearranging, we get $(N^+)^{-1} = P_- P_+^{-1}$ or, element by element, $\frac{1}{n_i^+} = \frac{P_{ii,-}}{P_{ii,+}}$.

10.1.12 Proof of Lemma 6

Let $\hat\beta_{i,c} = \beta_{i,c} + \varepsilon_{i,c}$ with $\varepsilon_{i,c} \sim N(0, (1/n_i^+ - 1)^{-1})$, as in the Lemma. Recall from Lemma 3 that $E[\beta_{i,c} \mid \mathcal{I}_-] = b_{i,c,-}$ and $\mathrm{Var}(\beta_{i,c} \mid \mathcal{I}_-) = 1$. Then standard signal extraction formulas imply $b_{i,c,+} = b_{i,c,-} + K_c(\hat\beta_{i,c} - b_{i,c,-})$ where:

$$
K_c = \left(1 + (1/n_i^+ - 1)^{-1}\right)^{-1} = (1 - n_i^+)
$$

Plugging this in yields the result.

10.1.13 Proof of Theorem 3

This follows directly from Definition 2 and Lemma 6.

10.1.14 Proof of Theorem 4

Let $O_+$ solve the $B$-transformed problem, and recall that we have $B$ nonsingular. Now consider the objective function of the reference problem:

$$
\begin{aligned}
\mathcal{O} &= \mathrm{tr}(WP_+) + \lambda\left(\log |P_-| - \log |P_+|\right) \\
&= \mathrm{tr}(B'(B')^{-1}WB^{-1}BP_+) + \lambda\left(\log |BP_-B'| - \log |BP_+B'|\right) \\
&= \mathrm{tr}(VBP_+B') + \lambda\left(\log |O_-| - \log |BP_+B'|\right)
\end{aligned}
$$

By considering $P_+ = B^{-1}O_+B'^{-1}$, it is clear that if $O_+$ is optimal for the $B$-transformed objective function, $P_+$ will be optimal for the reference problem, as long as the constraints are the same. To see that they are the same, notice that since $B$ is nonsingular, $O_+ \geq 0 \iff P_+ \geq 0$ and $P_- - P_+ \geq 0 \iff B(P_- - P_+)B' = O_- - O_+ \geq 0$.
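The invariance in Theorem 4 lends itself to a numerical check: solve the reference problem and the $B$-transformed problem (with prior $O_- = BP_-B'$ and weight matrix $V = (B')^{-1}WB^{-1}$) and compare $P_+$ with $B^{-1}O_+B'^{-1}$. The sketch below assumes NumPy and uses a compact version of the Theorem 1 solution; the helper name `solve` and all example matrices are hypothetical.

```python
import numpy as np

def solve(P_prior, W, lam):
    """Compact fixed marginal cost solver: P_+ = R N^+ R' with R = L Q."""
    L = np.linalg.cholesky(P_prior)                  # P_prior = L @ L.T
    d, Q = np.linalg.eigh(L.T @ W @ L)               # eigenpairs of V = L' W L
    # n_i^+ = lam / d_i if d_i > lam, else 1 (np.maximum avoids division by zero)
    n_plus = np.where(d > lam, lam / np.maximum(d, lam), 1.0)
    R = L @ Q
    return R @ np.diag(n_plus) @ R.T

# Hypothetical reference problem and nonsingular transformation B.
P_prior = np.array([[2.0, 0.5], [0.5, 1.0]])
W = np.array([[1.0, 0.2], [0.2, 0.5]])
B = np.array([[1.0, 2.0], [0.0, 3.0]])
lam = 0.5

B_inv = np.linalg.inv(B)
O_prior = B @ P_prior @ B.T                          # transformed prior O_-
W_transformed = B_inv.T @ W @ B_inv                  # (B')^{-1} W B^{-1}

P_post = solve(P_prior, W, lam)                      # reference solution
O_post = solve(O_prior, W_transformed, lam)          # B-transformed solution
P_from_O = B_inv @ O_post @ B_inv.T                  # map back: B^{-1} O_+ B'^{-1}
```

In this example one canonical component is attended to and the other is not, so the check exercises both branches of the solution.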

10.1.15 Proof of Lemma 7

This follows directly from the fact that the product of nonsingular matrices is nonsingular.

10.1.16 Proof of Lemma 8

This follows directly from Lemma 7 along with Definitions 8, 9, and 10.

10.1.17 Proof of Lemma 9

This tautology follows directly from the definition of $a_+$ as a conditional expectation.

10.1.18 Proof of Lemma 10

Part (a): Since $\mathrm{rk}(Z) = m$, we have $r = \mathrm{rk}(P_- - P_+) \leq \mathrm{rk}(\Lambda^{-1}) \leq m$.

Part (b): Since $\mathrm{rk}(Z) = m$ and $\mathrm{rk}(\Lambda^{-1}) = m$, we also have $\mathrm{rk}(Z'\Lambda^{-1}Z) = m$, but $\mathrm{rk}(Z'\Lambda^{-1}Z) = \mathrm{rk}(P_- - P_+) = r$.

10.1.19 Proof of Theorem 5

Given the innovation representation $v = Z\alpha + \varepsilon - Za_-$ where $\varepsilon \sim N(0, \Lambda)$, we have that the posterior information set is $\mathcal{I}_+ = \mathcal{I}_- \cup \{v\}$, that $\alpha \mid \mathcal{I}_- \sim N(a_-, P_-)$, and that $\alpha$ and $v$ are jointly Gaussian. Theorem 5 is then simply a statement of the form of the conditional distribution of jointly Gaussian random vectors.

10.2 Appendix B: rationally inattentive price-setting

The rationally inattentive price-setting problem supposes that monopolistically competitive firms cannot pay perfect attention to the shocks that determine the optimal price for their differentiated good. In order to minimize the expected loss in (log quadratically approximated) prices, they choose how to allocate attention.

The basic rational inattention result of MW is that more attention is paid to shocks that are more important or more volatile. The former characteristic is captured in the loss function and the latter is captured by the shock's variance. In this paper, we have refined this result and shown that it should be applied to the "canonical synthetic shocks" rather than the original, or "fundamental", shocks. In MW, these two types of shocks are required to be identical, but in general this requirement imposes suboptimal behavior.

To understand why these shocks are treated differently, it is important to recall that a rationally inattentive agent has access to the complete data but, in optimally processing the information, they may choose only certain components of the data to pay attention to, with any remainder ignored. The agent's problem can be thought of as (1) selecting the components that matter to them, and (2) selecting the amount of attention to pay to each component. The canonical synthetic shocks provide exactly the decomposition that solves the former problem. It can be instructive to consider the inattentive agent as receiving a particular noisy signal of the data, but this is only appropriate in the context of the optimally chosen, canonical, shocks.

In the price-setting problem, even though the space of fundamental shocks is two-dimensional, the space of canonical synthetic shocks is only one-dimensional, because as the agent processes information, there is only one variable that is of interest to them: the optimal price. The agent, assumed to have access to the complete data, only processes information about that relevant combination.

10.2.1 Setup

The firm's profit function is $\pi(P_{it}, P_t, Y_t, Z_{it})$, and the log quadratic approximation is $\tilde\pi(p_{it}, p_t, y_t, z_{it})$.

Aggregate demand is given by $Q_t = P_t Y_t$ or $q_t = p_t + y_t$.

Under perfect information, optimal price-setting is:

$$
p_{it}^{\diamond} = \frac{\hat\pi_{14}}{|\hat\pi_{11}|} z_{it} + \frac{\hat\pi_{13}}{|\hat\pi_{11}|} q_t + \left(1 - \frac{\hat\pi_{13}}{|\hat\pi_{11}|}\right) p_t
$$

In equilibrium $p_t = q_t$.

Under rational inattention, $p_{it}^* = E[p_{it}^{\diamond} \mid \mathcal{I}_t]$, and the objective is to minimize the loss in profits due to inattention. This can be written as:

$$
\min \; \tilde\pi(p_{it}^{\diamond}, p_t, y_t, z_{it}) - \tilde\pi(p_{it}^*, p_t, y_t, z_{it})
$$

and this can be simplified to:

$$
\min \; \frac{|\hat\pi_{11}|}{2} E[(p_{it}^{\diamond} - p_{it}^*)^2 \mid \mathcal{I}_t]
$$

To ease notation, define $\zeta_0 = \frac{|\hat\pi_{11}|}{2}$, $\zeta_z = \frac{\hat\pi_{14}}{|\hat\pi_{11}|}$, and $\zeta_q = \frac{\hat\pi_{13}}{|\hat\pi_{11}|}$.

Guess and verify approach: guess that $p_t = \gamma q_t$. Then:

$$
p_{it}^* = E[\zeta_z z_{it} + (\gamma + (1 - \gamma)\zeta_q) q_t \mid \mathcal{I}_t]
$$

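With this guess, the RI-LQG tracking problem can be defined by:

$$
\alpha = \begin{bmatrix} z_{it} \\ q_t \end{bmatrix}, \qquad W = \frac{|\hat\pi_{11}|}{2} ww', \qquad w = \begin{bmatrix} w_z \\ w_q \end{bmatrix}
$$

where $w_z = \zeta_z$ and $w_q = \gamma + (1 - \gamma)\zeta_q$.

10.2.2 Solutions

General solution

In the general case, we proceed as usual. First we solve the fixed marginal cost problem, according to Theorem 1, and then we solve the fixed capacity problem, according to Theorem 2.

Fixed marginal cost

$$
L'WL = \begin{bmatrix} \sigma_z & 0 \\ 0 & \sigma_q \end{bmatrix} \zeta_0 ww' \begin{bmatrix} \sigma_z & 0 \\ 0 & \sigma_q \end{bmatrix} = \zeta_0 \begin{bmatrix} \sigma_z w_z \\ \sigma_q w_q \end{bmatrix} \begin{bmatrix} \sigma_z w_z & \sigma_q w_q \end{bmatrix}
$$

Let $q = \frac{L'w}{\|L'w\|} = \frac{1}{\sqrt{\sigma_z^2 w_z^2 + \sigma_q^2 w_q^2}} \begin{bmatrix} \sigma_z w_z \\ \sigma_q w_q \end{bmatrix}$ so that $\|q\| = \frac{\sqrt{\sigma_z^2 w_z^2 + \sigma_q^2 w_q^2}}{\sqrt{\sigma_z^2 w_z^2 + \sigma_q^2 w_q^2}} = 1$. Then we have:

$$
L'WL = q \underbrace{\zeta_0 (\sigma_z^2 w_z^2 + \sigma_q^2 w_q^2)}_{\equiv\, d_1} q' = \underbrace{\begin{bmatrix} q & q^{\perp} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} d_1 & 0 \\ 0 & 0 \end{bmatrix}}_{D} \underbrace{\begin{bmatrix} q' \\ q^{\perp\prime} \end{bmatrix}}_{Q'}
$$

Now, $\delta_1^+ = \max\{d_1/\lambda, 1\}$, $n_1^+ = \min\{\lambda/d_1, 1\}$ and $\delta_2^+ = n_2^+ = 1$. The latter result implies the agent will never pay attention to a second component. This means that the rank of the solution will be at most $r = 1$, although it is possible that the agent will choose to not pay any attention at all ($r = 0$).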
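As a concrete illustration of this step, the sketch below builds $L'WL$ for given parameter values and confirms that it has the single nonzero eigenvalue $d_1$. It assumes NumPy; the function name `general_case` and the parameter values are hypothetical and chosen only for the example, and $\gamma$ is treated as given rather than solved for.

```python
import numpy as np

def general_case(gamma, zeta0, zeta_z, zeta_q, sigma_z, sigma_q, lam):
    """Return (V, d1, n1, r) for the price-setting problem at a given guess gamma."""
    w = np.array([zeta_z, gamma + (1.0 - gamma) * zeta_q])   # w = (w_z, w_q)'
    L = np.diag([sigma_z, sigma_q])                          # Cholesky factor of diagonal P_-
    V = L.T @ (zeta0 * np.outer(w, w)) @ L                   # V = L' W L with W = zeta0 w w'
    d1 = zeta0 * (sigma_z**2 * w[0]**2 + sigma_q**2 * w[1]**2)
    n1 = min(lam / d1, 1.0)                                  # n_1^+ = min{lam / d_1, 1}
    r = 1 if d1 > lam else 0                                 # rank of the solution
    return V, d1, n1, r

# Hypothetical parameter values.
V, d1, n1, r = general_case(gamma=0.5, zeta0=1.0, zeta_z=1.0, zeta_q=0.5,
                            sigma_z=1.0, sigma_q=2.0, lam=1.0)
eigenvalues = np.linalg.eigvalsh(V)   # ascending order: approximately (0, d1)
```

Raising `lam` above `d1` in this sketch gives `n1 = 1` and `r = 0`, the case in which the agent pays no attention at all.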

Next,

$$
S = Q'M = \begin{bmatrix} q' \\ q^{\perp\prime} \end{bmatrix} \begin{bmatrix} 1/\sigma_z & 0 \\ 0 & 1/\sigma_q \end{bmatrix} = \begin{bmatrix} w'/\|L'w\| \\ q^{\perp\prime}M \end{bmatrix} = \begin{bmatrix} s_1' \\ s_2' \end{bmatrix}.
$$

Then we can compute the optimal posterior:

$$
P_+^{-1} = S'\Delta^+ S = \begin{bmatrix} s_1 & s_2 \end{bmatrix} \begin{bmatrix} \delta_1^+ & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} s_1' \\ s_2' \end{bmatrix}
$$

The weight matrix is $K = I - P_+P_-^{-1} = R(I - N^+)S$.

Of the left generalized eigenvectors $s_i'$ of the matrix pencil $(P_+, P_-)$, only $s_1' = w'/\|L'w\|$ is associated with a nonzero generalized eigenvalue. As described above, this vector is also a left eigenvector of the weight matrix $K$ associated with the eigenvalue $1 - n_i^+$. Of course any scalar multiple of an eigenvector is also an eigenvector, so that we can write:

$$
w'K = (1 - n_1^+)w'
$$

The fundamental representation is:

$$
y_f = \alpha_t + \varepsilon_f, \qquad \varepsilon_f \sim N(0, \Lambda_f) \text{ with } \Lambda_f^{-1} = S'(\Delta^+ - I)S
$$

This is not a feasible representation, because $S'(\Delta^+ - I)S$ is neither full rank nor can it be written in the form required by Definition 12. Instead, we can construct the canonical representation:

$$
y_c = S\alpha_t + \varepsilon_c, \qquad \varepsilon_c \sim N(0, \Lambda_c) \text{ with } \Lambda_c^{-1} = \Delta^+ - I
$$

This is (as always) a feasible representation, but it is not proper since $\delta_2^+ - 1 = 0$ so that the error due to inattention has infinite variance for the second component. Thus we instead use the reduced canonical representation:

$$
y_r = s_1'\alpha_t + \varepsilon_r, \qquad \varepsilon_r = \varepsilon_{1,c} \sim N(0, (\delta_1^+ - 1)^{-1})
$$

The associated innovation representation is:

$$
v_r = y_r - s_1'a_-
$$

and, using $s_1'P_-s_1 = 1$, the associated weight matrix is

$$
K_r = P_-s_1\left(1 + (\delta_1^+ - 1)^{-1}\right)^{-1} = (1 - n_1^+)P_-s_1
$$

where $s_1'K_r = (1 - n_1^+)$ and $w'K_r = (1 - n_1^+)\|L'w\|$.

Now we can construct the posterior:

$$
\begin{aligned}
a_+ &= a_- + K_r v_r \\
&= a_- + (1 - n_1^+)P_-s_1(s_1'\alpha_t + \varepsilon_r - s_1'a_-) \\
&= (I - (1 - n_1^+)P_-s_1s_1')a_- + (1 - n_1^+)P_-s_1s_1'\alpha_t + (1 - n_1^+)P_-s_1\varepsilon_r
\end{aligned}
$$

We can then construct the posterior of interest, using $w'P_-s_1 = \|L'w\|$ and hence $w'P_-s_1s_1' = w'$:

$$
\begin{aligned}
p_{it}^* &= w'a_+ \\
&= w'(a_- + K_r v_r) \\
&= w'\left[(I - (1 - n_1^+)P_-s_1s_1')a_- + (1 - n_1^+)P_-s_1s_1'\alpha_t + (1 - n_1^+)P_-s_1\varepsilon_r\right] \\
&= (w' - (1 - n_1^+)w'P_-s_1s_1')a_- + (1 - n_1^+)w'P_-s_1s_1'\alpha_t + (1 - n_1^+)w'P_-s_1\varepsilon_r \\
&= (w' - (1 - n_1^+)w')a_- + (1 - n_1^+)w'\alpha_t + (1 - n_1^+)\|L'w\|\varepsilon_r \\
&= (1 - (1 - n_1^+))w'a_- + (1 - n_1^+)w'\alpha_t + (1 - n_1^+)\|L'w\|\varepsilon_r \\
&= n_1^+ p_- + (1 - n_1^+)p_{it}^{\diamond} + (1 - n_1^+)\|L'w\|\varepsilon_r
\end{aligned}
$$

In this case, the prior is $a_- = 0$, and $p_{it}^{\diamond} = w'\alpha_t = w_z z_{it} + w_q q_t$. To find the aggregate price level, integrate over firms:

$$
\begin{aligned}
p_t &= \int_I p_{it}^* \, di \\
&= \int_I (1 - n_1^+)w_z z_{it} \, di + \int_I (1 - n_1^+)w_q q_t \, di + \int_I (1 - n_1^+)\|L'w\|\varepsilon_r \, di \\
&= (1 - n_1^+)w_q q_t
\end{aligned}
$$

Recall that our guess was $p_t = \gamma q_t$; this result confirms our guess, with $\gamma = (1 - n_1^+)w_q$. However, $w_q$ is a function of $\gamma$, so the full solution still requires solving for $\gamma$.

First, there is an equilibrium with $\gamma = 0$ if $\zeta_0(\sigma_z^2\zeta_z^2 + \sigma_q^2\zeta_q^2) \leq \lambda$. Since $w_q > 0$ regardless of $\gamma$, then $\gamma = 0$ requires $n_1^+ = 1$, i.e. it requires all agents to collect no information whatsoever. For this to be an equilibrium, it requires that $d_1 = \zeta_0(\sigma_z^2 w_z^2 + \sigma_q^2 w_q^2) \leq \lambda$. Since $w_z = \zeta_z$ and $w_q = \gamma + (1 - \gamma)\zeta_q$, requiring that $d_1 \leq \lambda$ when $\gamma = 0$ is equivalent to the condition given above.

We can find nonzero equilibria by solving for $\gamma$; this is difficult to do analytically, but symbolic math software indicates that in the domain of interest, there is a unique real solution along with a conjugate pair of complex solutions. Numerical solution methods find agreement with the unique real solution.

Fixed capacity

Given the solution to the fixed marginal cost formulation above, we can find the solution to the fixed capacity formulation of the problem by applying Theorem 2 to find the shadow marginal cost associated with the capacity constraint.

From above, we know that $r \leq 1$, but because the problem is not degenerate we also know that $r \geq 1$. Thus it must be that $r = 1$ and so we can easily apply Theorem 2 to yield:

$$
\lambda = 2^{-2\kappa} d_1
$$

Therefore, $\delta_1^+ = \max\{d_1/\lambda, 1\} = \max\{2^{2\kappa}, 1\} = 2^{2\kappa}$, and so $n_1^+ = 2^{-2\kappa}$. Then:

$$
\begin{aligned}
\gamma &= (1 - n_1^+)w_q \\
&= (1 - 2^{-2\kappa})(\gamma + (1 - \gamma)\zeta_q) \\
&= \frac{(1 - 2^{-2\kappa})\zeta_q}{(1 - 2^{-2\kappa})\zeta_q + 2^{-2\kappa}} \\
&= \frac{\zeta_q}{(2^{2\kappa} - 1)^{-1} + \zeta_q}
\end{aligned}
$$

The fixed capacity version is often easy to solve in the case $r = 1$ because it can tie down the posterior covariance matrix based only on the parameter $\kappa$. This was the case here, where we were able to substitute $n_1^+ = 2^{-2\kappa}$ whereas in the fixed marginal cost case (and assuming an interior solution) we had $n_1^+ = \lambda / [\zeta_0(\sigma_z^2 w_z^2 + \sigma_q^2 w_q^2)]$.

Independence assumption

We can also proceed here as usual, but using the alternate weight matrix $W_I = \mathrm{diag}\{w_z^2, w_q^2\}$. We first apply Theorem 1 to solve the fixed marginal cost case and then apply Theorem 2 to solve the fixed capacity case.

Fixed marginal cost

$$
L'WL = \begin{bmatrix} \sigma_z & 0 \\ 0 & \sigma_q \end{bmatrix} \zeta_0 \begin{bmatrix} w_z^2 & 0 \\ 0 & w_q^2 \end{bmatrix} \begin{bmatrix} \sigma_z & 0 \\ 0 & \sigma_q \end{bmatrix} = \zeta_0 \begin{bmatrix} \sigma_z^2 w_z^2 & 0 \\ 0 & \sigma_q^2 w_q^2 \end{bmatrix}
$$

Then $Q = I$ and $d_i = \zeta_0\sigma_i^2 w_i^2$ (where $i \in \{z, q\}$). As usual, we have:

$$
\delta_i^+ = \max\{d_i/\lambda, 1\}, \qquad n_i^+ = \min\{\lambda/d_i, 1\}
$$

The rank of the solution will be $r \in \{0, 1, 2\}$ because the agent may choose to pay attention to either, both, or neither of the components.

Now $S = Q'M = \begin{bmatrix} 1/\sigma_z & 0 \\ 0 & 1/\sigma_q \end{bmatrix} = \begin{bmatrix} s_1' \\ s_2' \end{bmatrix}$ with $s_i = \frac{1}{\sigma_i}e_i$ (and $e_i$ is the $i$-th standard basis element). This implies that the canonical synthetic target is nothing more than a scaled version of the fundamental target; in fact, this was essentially the goal of the independence assumption.

We will abuse notation somewhat to now interpret the index as $i = 1, 2$ where $d_1 = \max\{d_z, d_q\}$ and $d_2 = \min\{d_z, d_q\}$; this accords with the usual practice of listing these generalized eigenvalues in nonincreasing order.

Then we can compute the optimal posterior:

$$
P_+^{-1} = \begin{bmatrix} \delta_1^+/\sigma_1^2 & 0 \\ 0 & \delta_2^+/\sigma_2^2 \end{bmatrix}
$$

or

$$
P_+ = \begin{bmatrix} n_1^+\sigma_1^2 & 0 \\ 0 & n_2^+\sigma_2^2 \end{bmatrix}
$$

The weight matrix is $K = I - P_+P_-^{-1} = R(I - N^+)S$. Because $R$, $S$, and $N^+$ are diagonal, they commute, so that we have simply $K = (I - N^+)$. We can then write:

$$
w'K = \begin{bmatrix} w_1(1 - n_1^+) & w_2(1 - n_2^+) \end{bmatrix}
$$

The fundamental representation is:

$$
y_f = \alpha_t + \varepsilon_f, \qquad \varepsilon_f \sim N(0, \Lambda_f) \text{ with } \Lambda_f^{-1} = S'(\Delta^+ - I)S
$$

This representation is feasible, because $S$ and $\Delta^+$ are diagonal and so the matrix will either be full rank or can be written as required by Definition 12. The associated innovation representation is:

$$
v_f = y_f - a_-
$$

We could still construct the canonical or reduced canonical representations in this case, although it is unnecessary for computing the action.

The action is:

$$
\begin{aligned}
a_+ &= a_- + Kv_f \\
&= (I - K)a_- + Ky_f \\
&= (I - K)a_- + K\alpha_t + K\varepsilon_f \\
&= \begin{bmatrix} n_1^+a_{1,-} + (1 - n_1^+)(\alpha_{1,t} + \varepsilon_{1,f}) \\ n_2^+a_{2,-} + (1 - n_2^+)(\alpha_{2,t} + \varepsilon_{2,f}) \end{bmatrix}
\end{aligned}
$$

and the posterior of interest is:

$$
p_{it}^* = w'a_+ = \sum_{j \in \{z, q\}} \left( w_j n_j^+ a_{j,-} + w_j(1 - n_j^+)(\alpha_{j,t} + \varepsilon_{j,f}) \right)
$$

As before, the prior is $a_- = 0$; aggregating over firms yields:

$$
p_t = \int_I p_{it}^* \, di = (1 - n_q^+)w_q q_t
$$

Note that this is almost identical to the result in the general case, except that here the generalized eigenvalue $n_q^+$ is specific to the aggregate demand shock, whereas in the general case it corresponded to the synthetic shock that combined both the idiosyncratic and aggregate shocks.

Recall that $n_q^+ = \min\{\lambda/d_q, 1\}$ with $d_q = \zeta_0\sigma_q^2 w_q^2$, and $w_q = \gamma + (1 - \gamma)\zeta_q$. Now, we can combine these results to solve for the equilibrium value of $\gamma$.

First, there is an equilibrium with $\gamma = 0$ if $\zeta_0\sigma_q^2\zeta_q^2 \leq \lambda$. This always corresponds to an agent paying no attention to aggregate conditions; however, the agent may still pay some attention to idiosyncratic conditions.
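The equilibrium condition $\gamma = (1 - n_q^+)w_q$ can also be explored numerically. The sketch below is an illustration under stated assumptions rather than the paper's own procedure: it assumes $0 < \zeta_q < 1$, and it uses the observation that, at an interior solution ($n_q^+ = \lambda/d_q < 1$), substituting $\gamma = (w_q - \zeta_q)/(1 - \zeta_q)$ into the condition gives the quadratic $w_q^2 - w_q + (1 - \zeta_q)\lambda/(\zeta_q\zeta_0\sigma_q^2) = 0$. The function names and parameter values are hypothetical.

```python
import math

def gamma_map(gamma, zeta0, zeta_q, sigma_q, lam):
    """Right-hand side of the fixed point gamma = (1 - n_q^+) w_q."""
    w_q = gamma + (1.0 - gamma) * zeta_q
    d_q = zeta0 * sigma_q**2 * w_q**2
    n_q = 1.0 if d_q <= lam else lam / d_q        # n_q^+ = min{lam / d_q, 1}
    return (1.0 - n_q) * w_q

def equilibria(zeta0, zeta_q, sigma_q, lam):
    """List the equilibrium values of gamma (assumes 0 < zeta_q < 1)."""
    found = []
    c = zeta0 * sigma_q**2
    if c * zeta_q**2 <= lam:                      # corner equilibrium gamma = 0
        found.append(0.0)
    disc = 0.25 - (1.0 - zeta_q) * lam / (zeta_q * c)
    if disc >= 0.0:
        for sign in (-1.0, 1.0):
            w_q = 0.5 + sign * math.sqrt(disc)    # root of the quadratic in w_q
            gamma = (w_q - zeta_q) / (1.0 - zeta_q)
            interior = c * w_q**2 > lam           # requires n_q^+ < 1
            if interior and abs(gamma_map(gamma, zeta0, zeta_q, sigma_q, lam) - gamma) < 1e-9:
                found.append(gamma)
    return sorted(found)

# Hypothetical parameter values for which three equilibria coexist.
gammas = equilibria(zeta0=1.0, zeta_q=0.1, sigma_q=1.0, lam=0.02)
```

Each candidate is verified against the original fixed-point map before it is accepted, so the quadratic is used only to locate candidates.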

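Otherwise, we can solve for $\gamma$:

$$
\begin{aligned}
\gamma &= (1 - n_q^+)w_q = \left(1 - \frac{\lambda}{\zeta_0\sigma_q^2 w_q^2}\right)w_q \\
&= \dots \\
&= \frac{\pm\sqrt{\zeta_0\sigma_q^2/4 - \lambda(1 - \zeta_q)/\zeta_q} + (0.5 - \zeta_q)\sqrt{\zeta_0\sigma_q^2}}{(1 - \zeta_q)\sqrt{\zeta_0\sigma_q^2}}
\end{aligned}
$$

Both of these roots may be valid and may, moreover, coexist with the $\gamma = 0$ equilibrium, so that there may be as many as three equilibria.

Fixed capacity

As before, we know that in the fixed capacity case $r \geq 1$. This means that we have:

$$
\lambda = \begin{cases} (2^{-2\kappa} d_1 d_2)^{1/2} & r = 2 \\ 2^{-2\kappa} d_1 & r = 1 \end{cases}
$$

and recall that we have stipulated $d_1 \geq d_2$, where $d_i = \sigma_i^2 w_i^2$. We have $r = 1$ if $d_2 \leq \lambda = 2^{-2\kappa} d_1$ and $r = 2$ otherwise; i.e. the agent will pay attention to both idiosyncratic and aggregate conditions as long as the canonical loss weights are relatively close together, and will pay attention to only one component if they are far enough apart.

Now we can compute $\delta_i^+$:

$$
\delta_1^+ = \max\left\{\frac{d_1}{\lambda}, 1\right\} = \begin{cases} 2^{2\kappa} & d_2 \leq \lambda \\ 2^{\kappa}\left(\frac{d_1}{d_2}\right)^{1/2} & d_2 > \lambda \end{cases} = \begin{cases} 2^{2\kappa} & d_2 \leq 2^{-2\kappa} d_1 \\ 2^{\kappa}\frac{\sigma_1 w_1}{\sigma_2 w_2} & d_2 > 2^{-2\kappa} d_1 \end{cases}
$$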

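$$
\delta_2^+ = \max\left\{\frac{d_2}{\lambda}, 1\right\} = \begin{cases} 1 & d_2 \leq \lambda \\ 2^{\kappa}\left(\frac{d_2}{d_1}\right)^{1/2} & d_2 > \lambda \end{cases} = \begin{cases} 1 & d_2 \leq 2^{-2\kappa} d_1 \\ 2^{\kappa}\frac{\sigma_2 w_2}{\sigma_1 w_1} & d_2 > 2^{-2\kappa} d_1 \end{cases}
$$

To determine the equilibrium value of $\gamma$, note that we still have $p_t = (1 - n_q^+)w_q q_t$, but now there are three cases:

$$
\delta_q^+ = \begin{cases} 2^{2\kappa} & d_z \leq 2^{-2\kappa} d_q \\ 2^{\kappa}\left(\frac{d_q}{d_z}\right)^{1/2} & d_q > d_z > 2^{-2\kappa} d_q \text{ or } d_z > d_q > 2^{-2\kappa} d_z \\ 1 & d_q \leq 2^{-2\kappa} d_z \end{cases}
$$

We can restate the interior (middle) condition as $\min\{d_z, d_q\} > 2^{-2\kappa}\max\{d_z, d_q\}$ or as $2^{-2\kappa} d_z < d_q < 2^{2\kappa} d_z$, and then we have:

$$
\delta_q^+ = \begin{cases} 2^{2\kappa} & d_q \geq 2^{2\kappa} d_z \\ 2^{\kappa}\left(\frac{d_q}{d_z}\right)^{1/2} & 2^{-2\kappa} d_z < d_q < 2^{2\kappa} d_z \\ 1 & d_q \leq 2^{-2\kappa} d_z \end{cases}
$$

This gives bounds for $d_q$ based on $\kappa$ and $d_z$ that determine whether aggregate conditions are paid attention to (the top two options) and, if so, whether idiosyncratic conditions are then also paid attention to (the middle option).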

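For an interior solution, we compute $\gamma$ as:

$$
\begin{aligned}
\gamma &= (1 - n_q^+)w_q \\
&= \left(1 - 2^{-\kappa}\frac{\sigma_z w_z}{\sigma_q w_q}\right)w_q \\
&= w_q - 2^{-\kappa}w_z\frac{\sigma_z}{\sigma_q} \\
&= \gamma + (1 - \gamma)\zeta_q - 2^{-\kappa}\zeta_z\frac{\sigma_z}{\sigma_q} \\
&= 1 - 2^{-\kappa}\frac{\sigma_z\zeta_z}{\sigma_q\zeta_q}
\end{aligned}
$$

We must of course check that this $\gamma$ is consistent with an interior solution.

Note: Our formulation is notationally different from MW, but we can rewrite it in their terms. Given an interior solution, we can compute the loss weight as:

$$
\begin{aligned}
w_q &= \gamma + (1 - \gamma)\zeta_q \\
&= \left(1 - 2^{-\kappa}\frac{\sigma_z\zeta_z}{\sigma_q\zeta_q}\right) + \left(2^{-\kappa}\frac{\sigma_z\zeta_z}{\sigma_q\zeta_q}\right)\zeta_q \\
&= 1 - (1 - \zeta_q)2^{-\kappa}\frac{\sigma_z\zeta_z}{\sigma_q\zeta_q}
\end{aligned}
$$

And we have an interior solution if:

$$
\begin{aligned}
2^{-2\kappa} d_z &< d_q < 2^{2\kappa} d_z \\
2^{-2\kappa}\sigma_z^2\zeta_z^2 &< \sigma_q^2 w_q^2 < 2^{2\kappa}\sigma_z^2\zeta_z^2 \\
2^{-\kappa} &< \frac{\sigma_q w_q}{\sigma_z\zeta_z} < 2^{\kappa}
\end{aligned}
$$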

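Now:

$$
\begin{aligned}
\frac{\sigma_q w_q}{\sigma_z\zeta_z} &= \frac{\sigma_q}{\sigma_z\zeta_z} - (1 - \zeta_q)2^{-\kappa}\frac{1}{\zeta_q} \\
&= \frac{\sigma_q}{\sigma_z\zeta_z} - \left(\frac{1}{\zeta_q} - 1\right)2^{-\kappa} \\
&= \frac{1}{\zeta_q}\frac{\sigma_q\zeta_q}{\sigma_z\zeta_z} - \left(\frac{1 - \zeta_q}{\zeta_q}\right)2^{-\kappa} \\
&= \frac{1}{\zeta_q}\left(\frac{\sigma_q\zeta_q}{\sigma_z\zeta_z} - 2^{-\kappa}(1 - \zeta_q)\right)
\end{aligned}
$$

Then we can write the condition as:

$$
\begin{aligned}
2^{-\kappa} &< \frac{1}{\zeta_q}\left(\frac{\sigma_q\zeta_q}{\sigma_z\zeta_z} - 2^{-\kappa}(1 - \zeta_q)\right) < 2^{\kappa} \\
2^{-\kappa}\zeta_q + 2^{-\kappa}(1 - \zeta_q) &< \frac{\sigma_q\zeta_q}{\sigma_z\zeta_z} < 2^{\kappa}\zeta_q + 2^{-\kappa}(1 - \zeta_q)
\end{aligned}
$$

or finally as:

$$
2^{-\kappa} < \frac{\sigma_q\zeta_q}{\sigma_z\zeta_z} < 2^{-\kappa} + (2^{\kappa} - 2^{-\kappa})\zeta_q
$$

This is identical to MW's condition for an interior solution, which is:

$$
\frac{\sigma_q\zeta_q}{\sigma_z\zeta_z} \in \left(2^{-\kappa},\; 2^{-\kappa} + (2^{\kappa} - 2^{-\kappa})\zeta_q\right)
$$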
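The equivalence between the condition stated in terms of the loss weight $w_q$ and MW's condition can be illustrated with a short check. This is a sketch under the assumption that $\zeta_q > 0$ and that all standard deviations are positive; the function names and parameter values are hypothetical.

```python
def interior_gamma(kappa, zeta_z, zeta_q, sigma_z, sigma_q):
    """Interior equilibrium gamma = 1 - 2^{-kappa} sigma_z zeta_z / (sigma_q zeta_q)."""
    return 1.0 - 2.0 ** (-kappa) * (sigma_z * zeta_z) / (sigma_q * zeta_q)

def interior_by_loss_weights(kappa, zeta_z, zeta_q, sigma_z, sigma_q):
    """Condition 2^{-kappa} < sigma_q w_q / (sigma_z zeta_z) < 2^{kappa}."""
    g = interior_gamma(kappa, zeta_z, zeta_q, sigma_z, sigma_q)
    w_q = g + (1.0 - g) * zeta_q
    ratio = sigma_q * w_q / (sigma_z * zeta_z)
    return 2.0 ** (-kappa) < ratio < 2.0 ** kappa

def interior_by_mw(kappa, zeta_z, zeta_q, sigma_z, sigma_q):
    """MW's condition on sigma_q zeta_q / (sigma_z zeta_z)."""
    x = sigma_q * zeta_q / (sigma_z * zeta_z)
    lower = 2.0 ** (-kappa)
    upper = lower + (2.0 ** kappa - lower) * zeta_q
    return lower < x < upper

# Hypothetical parameter sets: (kappa, zeta_z, zeta_q, sigma_z, sigma_q).
cases = [(1.0, 1.0, 0.8, 1.0, 1.0),   # interior
         (1.0, 1.0, 0.8, 1.0, 0.5),   # aggregate shock too quiet
         (1.0, 1.0, 0.8, 1.0, 3.0)]   # aggregate shock dominates
agreement = [interior_by_loss_weights(*c) == interior_by_mw(*c) for c in cases]
```

In the first case the implied $\gamma$ can also be checked against the fixed point $\gamma = (1 - n_q^+)w_q$ with $n_q^+ = 2^{-\kappa}\sigma_z w_z/(\sigma_q w_q)$.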

References

Cover, T. M. and J. A. Thomas (2006). Elements of Information Theory. John Wiley & Sons.

Fulton, C. (2015). Optimal Prices in a Multisector Model under Rational Inattention.

Horn, R. A. and C. R. Johnson (2012). Matrix Analysis. Cambridge University Press.

Jung, J., J.-h. Kim, F. Matejka, and C. A. Sims (2015). Discrete actions in information-constrained decision problems.

Maćkowiak, B. and M. Wiederholt (2009). Optimal Sticky Prices under Rational Inattention. The American Economic Review 99(3), 769–803.

Matějka, F. and A. McKay (2015). Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model. American Economic Review 105(1), 272–298.

Matejka, F., M. Wiederholt, and B. Maćkowiak (2017). The rational inattention filter. Working Paper Series 2007, European Central Bank.

Morris, S. and H. S. Shin (2002). Social Value of Public Information. The American Economic Review 92(5), 1521–1534.

Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics 50(3), 665–690.

Sims, C. A. (2010). Rational Inattention and Monetary Economics. Handbook of Monetary Economics, Elsevier.

Steiner, J., C. Stewart, and F. Matějka (2017). Rational Inattention Dynamics: Inertia and Delay in Decision-Making. Econometrica 85(2), 521–553.

Woodford, M. (2014). Stochastic Choice: An Optimizing Neuroeconomic Model. The American Economic Review 104(5), 495–500.

Zbaracki, M. J., M. Ritson, D. Levy, S. Dutta, and M. Bergen (2004). Managerial and Customer Costs of Price Adjustment: Direct Evidence from Industrial Markets. The Review of Economics and Statistics 86(2), 514–533.
