conditional expectation
Idea
In probability theory a conditional expectation value, or conditional expectation for short, is like an expectation value of some random variable/observable, but conditioned on the assumption that a certain event has occurred.
More technically: if $(\Omega,\mathfrak{A},P)$ is a probability space, the conditional expectation $E[X|\Sigma]$ of an integrable random variable $X$ with respect to a sub-$\sigma$-algebra $\Sigma\subseteq\mathfrak{A}$ is a $\Sigma$-measurable random variable which is a "coarsened" version of $X$. We can think of $E[X|\Sigma]$ as a random variable on the same sample space that is measurable only with respect to the smaller $\sigma$-algebra $\Sigma$, and hence retains only restricted information about the outcome: given the information in $\Sigma$, every event in $\Sigma$ is known either to have occurred or not, i.e. it is assigned conditional probability $1$ or $0$ in a consistent way.
Conditional expectation relative to a random variable
Let $(\Omega,\mathfrak{A},P)$ be a probability space, let $Y$ be a measurable function into a measurable space $(U,\Sigma)$ equipped with the pushforward measure $P^Y$ induced by $Y$, and let $X\colon(\Omega,\mathfrak{A},P)\to(\mathbb{R},\mathcal{B}(\mathbb{R}),\lambda)$ be an integrable real-valued random variable.
Then for $X$ and $Y$ there exists an essentially unique (two functions are identified if they differ only on a set of measure $0$) integrable function $g=:E[X|Y]$ such that the following diagram commutes:
$$\begin{array}{ccc} (\Omega,\mathfrak{A},P) & \stackrel{Y}{\to} & (U,\Sigma,P^Y) \\ \downarrow^{X} & & \swarrow_{g=:E[X|Y]} \\ (\mathbb{R},\mathcal{B}(\mathbb{R}),\lambda) & & \end{array}$$
where $g\colon y\mapsto E[X|Y=y]$. Here "commutes" shall mean that
(1) $g$ is $\Sigma$-measurable;
(2) the integrals of $X$ and of $g\circ Y$ over every set of the form $Y^{-1}(B)$, $B\in\Sigma$, are equal.
In this case $g=E[X|Y]$ is called a version of the conditional expectation of $X$ given $Y$.
Explicitly, (2) says that for all $B\in\Sigma$ we have
$$\int_{Y^{-1}(B)}X(\omega)\,dP(\omega)=\int_B g(u)\,dP^Y(u)$$
or, equivalently,
$$\int_{Y^{-1}(B)}X(\omega)\,dP(\omega)=\int_{Y^{-1}(B)}(g\circ Y)(\omega)\,dP(\omega)$$
(The last two formulas are equivalent since we always have $\int_B g(u)\,dP^Y(u)=\int_{Y^{-1}(B)}(g\circ Y)(\omega)\,dP(\omega)$ by the change-of-variables formula for pushforward measures.)
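To make the defining property concrete, here is a minimal Python sketch on a finite probability space, where $E[X|Y=y]$ is the $P$-weighted average of $X$ over $\{Y=y\}$. The dict-based representation, the helper names `conditional_expectation_given_Y` and `defining_identity_holds`, and the die example are hypothetical illustrations, not part of the text above.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_expectation_given_Y(P, X, Y):
    """Return g with g(y) = E[X | Y = y] on a finite probability space.

    P, X, Y are dicts keyed by outcomes omega: P[omega] is the probability of
    omega, X[omega] and Y[omega] are the values of the random variables.
    Values y with P(Y = y) = 0 get no entry (g is arbitrary there).
    """
    mass = defaultdict(Fraction)      # P^Y({y}) = P(Y = y)
    weighted = defaultdict(Fraction)  # integral of X over Y^{-1}({y})
    for omega, p in P.items():
        mass[Y[omega]] += p
        weighted[Y[omega]] += p * X[omega]
    return {y: weighted[y] / mass[y] for y in mass if mass[y] > 0}


def defining_identity_holds(P, X, Y, g, B):
    """Compare the integrals of X and of g o Y over Y^{-1}(B), i.e. condition (2)."""
    lhs = sum(p * X[w] for w, p in P.items() if Y[w] in B and p > 0)
    rhs = sum(p * g[Y[w]] for w, p in P.items() if Y[w] in B and p > 0)
    return lhs == rhs


# Hypothetical example: a fair die, X = number shown, Y = parity of X.
P = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: w for w in P}
Y = {w: w % 2 for w in P}
g = conditional_expectation_given_Y(P, X, Y)
```

Exact `Fraction` arithmetic is used so that the two sides of the identity can be compared with `==` rather than up to rounding.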
Note that the preceding definition does not by itself guarantee that the conditional expectation exists; existence is a consequence of the Radon-Nikodym theorem, as will be shown in the following section. (The argument there applies to conditional expectation with respect to a random variable if we replace the pushforward measure by the sub-$\sigma$-algebra $\sigma(Y)=\{Y^{-1}(B)\mid B\in\Sigma\}\subseteq\mathfrak{A}$ generated by $Y$: by the factorization (Doob-Dynkin) lemma, every $\sigma(Y)$-measurable real function is of the form $g\circ Y$ with $g$ $\Sigma$-measurable. In this sense $E[X|Y]$ is a "coarsened version" of $X$, filtered through the information (i.e. the $\sigma$-algebra) given by $Y$.)
Conditional expectation relative to a sub-$\sigma$-algebra
Note that by construction of the pushforward measure it suffices to define the conditional expectation only for the case where $\Sigma:=\mathfrak{S}\subseteq\mathfrak{A}$ is a sub-$\sigma$-algebra.
(Note that the notation $P^Y$ loses information because it does not record the $\sigma$-algebra; e.g. $P^{id}_{\mathfrak{A}}$ is different from $P^{id}_{\mathfrak{S}}$.)
The diagram
$$\begin{array}{ccc} (\Omega,\mathfrak{A},P) & \stackrel{id}{\to} & (\Omega,\mathfrak{S},P^{id}) \\ \downarrow^{X} & & \swarrow^{Z=:E[X|\mathfrak{S}]} \\ (\mathbb{R},\mathcal{B}(\mathbb{R}),\lambda) & & \end{array}$$
is commutative (in the above sense) if and only if
(a) $Z$ is $\mathfrak{S}$-measurable;
(b) $\int_A Z\,dP=\int_A X\,dP$ for all $A\in\mathfrak{S}$.
We can hence write the conditional expectation as the equivalence class
$$E[X|\mathfrak{S}]=\left\{Z\in L^1(\Omega,\mathfrak{S},P)\;\middle|\;\int_A Z\,dP=\int_A X\,dP\;\;\forall A\in\mathfrak{S}\right\}$$
An element of this class is also called a version.
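For a sub-$\sigma$-algebra generated by a finite partition, every $A\in\mathfrak{S}$ is a union of blocks, and a version of $E[X|\mathfrak{S}]$ is obtained by averaging $X$ over each block. The following Python sketch (hypothetical helper names and data) builds such a version and checks (a) and (b) by brute force over all unions of blocks.

```python
from fractions import Fraction
from itertools import combinations


def cond_exp_partition(P, X, blocks):
    """A version Z of E[X | S], where S is generated by the partition `blocks`.

    Z is constant on each block B (so Z is S-measurable); on blocks with
    P(B) > 0 its value is the P-weighted average of X over B. On null blocks
    any value gives a version; this sketch uses 0.
    """
    Z = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        value = sum(P[w] * X[w] for w in B) / pB if pB > 0 else Fraction(0)
        for w in B:
            Z[w] = value
    return Z


def satisfies_b(P, X, Z, blocks):
    """Check condition (b) for every A in S, i.e. every union of blocks."""
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            A = set().union(*chosen)
            if sum(P[w] * Z[w] for w in A) != sum(P[w] * X[w] for w in A):
                return False
    return True


# Hypothetical example on four outcomes with a two-block partition.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
X = {"a": 4, "b": 0, "c": 8, "d": 0}
blocks = [{"a", "b"}, {"c", "d"}]
Z = cond_exp_partition(P, X, blocks)
```

Note that (b) alone does not single out the class: $X$ itself satisfies (b), but it is not constant on the blocks and therefore fails (a).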
Theorem
For $X\in L^1(\Omega,\mathfrak{A},P)$, $E[X|\mathfrak{S}]$ exists and is unique almost surely.
Proof
Existence: By
$$Q(A):=\int_A X(\omega)\,P(d\omega),\qquad A\in\mathfrak{S},$$
a finite measure $Q$ on $(\Omega,\mathfrak{S})$ is defined (if $X\ge 0$; if not, consider the positive part $X^+$ and the negative part $X^-$ of $X=X^+-X^-$ separately and use linearity of the integral). Let $P|_{\mathfrak{S}}$ be the restriction of $P$ to $\mathfrak{S}$. Then
$$Q\ll P|_{\mathfrak{S}}$$
meaning: $P|_{\mathfrak{S}}(M)=0\Rightarrow Q(M)=0$ for all $M\in\mathfrak{S}$. This is the hypothesis of the Radon-Nikodym theorem (its other hypothesis, that $P|_{\mathfrak{S}}$ is $\sigma$-finite, is satisfied since $P$ is a probability measure). The theorem yields an $\mathfrak{S}$-measurable density $Z$ of $Q$ with respect to $P|_{\mathfrak{S}}$, i.e. $Q(A)=\int_A Z\,dP$ for all $A\in\mathfrak{S}$; this is exactly condition (b), so $Z$ is a version of $E[X|\mathfrak{S}]$.
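Uniqueness: If $g$ and $g'$ are two versions, then by linearity $\int_A(g-g')\,dP=0$ for all $A\in\mathfrak{S}$. Since $g$ and $g'$ are $\mathfrak{S}$-measurable, the set $A=\{g>g'\}$ belongs to $\mathfrak{S}$, and the integral over $A$ of the function $g-g'$, which is strictly positive on $A$, vanishes only if $P(g>g')=0$. By symmetry also $P(g<g')=0$, so $g=g'$ almost surely.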
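In the finite setting the Radon-Nikodym density of $Q$ with respect to $P|_{\mathfrak{S}}$ can be written down directly: on an atom $B$ of $\mathfrak{S}$ with $P(B)>0$ it equals $Q(B)/P(B)$, while on a null atom it is arbitrary. This Python sketch (hypothetical data; $X$ takes a negative value, so $Q$ is treated as a signed measure, in line with the $X^+-X^-$ reduction in the proof) builds two versions that differ on a null atom and checks that both satisfy (b) and agree almost surely.

```python
from fractions import Fraction

# Hypothetical finite example: outcomes 0..3, S generated by the partition
# `blocks`; the block {3} has probability 0.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0)}
X = {0: 2, 1: 6, 2: -1, 3: 100}
blocks = [{0, 1}, {2}, {3}]


def Q(A):
    """Signed measure Q(A) = integral of X over A with respect to P."""
    return sum(P[w] * X[w] for w in A)


def P_S(A):
    """Restriction of P to S (A is assumed to be a union of blocks)."""
    return sum(P[w] for w in A)


def density_version(null_value):
    """Density dQ/dP|_S, constant on each block.

    Forced to be Q(B)/P(B) on blocks of positive probability; on null
    blocks it is chosen freely (here: `null_value`).
    """
    Z = {}
    for B in blocks:
        value = Q(B) / P_S(B) if P_S(B) > 0 else Fraction(null_value)
        for w in B:
            Z[w] = value
    return Z


def integral(Z, A):
    return sum(P[w] * Z[w] for w in A)


Z1 = density_version(0)
Z2 = density_version(42)

# Both versions satisfy (b) on every block, hence on every union of blocks.
for B in blocks:
    assert integral(Z1, B) == Q(B) == integral(Z2, B)

# The set where they differ has probability 0: they agree almost surely.
differ = {w for w in P if Z1[w] != Z2[w]}
assert P_S(differ) == 0
```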
Conditional probability
From elementary probability theory we know that $P(A)=E[1_A]$.
For $A\in\mathfrak{A}$ we call $P(A|\mathfrak{S}):=E[1_A|\mathfrak{S}]$ the conditional probability of $A$ given $\mathfrak{S}$.
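A standard illustration, sketched here in Python with hypothetical helper names: for two fair dice, let $\mathfrak{S}$ be generated by the first die and let $A$ be the event that the sum is at least $10$. Then $P(A|\mathfrak{S})$ is constant on each block $\{\text{first die}=k\}$.

```python
from fractions import Fraction
from itertools import product

# Two fair dice; S is generated by the blocks {first die = k}, k = 1..6.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def in_A(w):
    """Indicator of A = {sum of the two dice is at least 10}."""
    return w[0] + w[1] >= 10


def cond_prob_on_block(k):
    """Value of P(A | S) = E[1_A | S] on the block {first die = k}."""
    block = [w for w in omega if w[0] == k]
    return sum(P[w] for w in block if in_A(w)) / sum(P[w] for w in block)
```

Averaging the six block values with weights $1/6$ recovers $P(A)=E[P(A|\mathfrak{S})]=1/6$.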
Conditional distribution, Conditional density
Integral kernel, Stochastic kernel
In probability theory and statistics, a stochastic kernel is the transition function of a stochastic process. In a discrete time process with continuous probability distributions, it is the same thing as the kernel of the integral operator that advances the probability density function.
Integral kernel
An integral transform $T$ is an assignment of the form
$$(Tf)(u)=\int K(t,u)\,f(t)\,dt$$
where the function of two variables $K(-,-)$ is called the integral kernel of the transform $T$.
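As a numerical illustration, the Python sketch below approximates $(Tf)(u)$ by the midpoint rule on a truncated interval. The Laplace kernel $K(t,u)=e^{-tu}$ on $[0,\infty)$, the truncation point and the step count are assumptions chosen for the example; for $f\equiv 1$ the exact value is $(Tf)(u)=1/u$.

```python
import math


def integral_transform(K, f, u, a, b, n=100_000):
    """Midpoint-rule approximation of (Tf)(u) = integral_a^b K(t, u) f(t) dt."""
    h = (b - a) / n
    return h * sum(K(a + (i + 0.5) * h, u) * f(a + (i + 0.5) * h) for i in range(n))


def laplace_kernel(t, u):
    """Assumed example kernel K(t, u) = exp(-t u) (Laplace transform)."""
    return math.exp(-t * u)
```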
Stochastic kernel
Let $(\Omega_1,\mathfrak{A}_1)$ and $(\Omega_2,\mathfrak{A}_2)$ be measurable spaces.
A map $Q\colon\Omega_1\times\mathfrak{A}_2\to[0,1]$ satisfying
(1) $Q(-,A)\colon\Omega_1\to[0,1]$ is $\mathfrak{A}_1$-measurable for all $A\in\mathfrak{A}_2$,
(2) $Q(\omega,-)\colon\mathfrak{A}_2\to[0,1]$ is a probability measure on $(\Omega_2,\mathfrak{A}_2)$ for all $\omega\in\Omega_1$,
is called a stochastic kernel or transition kernel (or Markov kernel, a term which we avoid since it is confusing) from $(\Omega_1,\mathfrak{A}_1)$ to $(\Omega_2,\mathfrak{A}_2)$.
Then $Q$ induces a map from measures on $(\Omega_1,\mathfrak{A}_1)$ to measures on $(\Omega_2,\mathfrak{A}_2)$:
$$\overline{Q}\colon\begin{cases} M(\Omega_1,\mathfrak{A}_1) & \to M(\Omega_2,\mathfrak{A}_2) \\ \mu & \mapsto \left(A\mapsto\int_{\Omega_1}Q(-,A)\,d\mu\right)\end{cases}$$
If $\mu$ is a probability measure, then so is $\overline{Q}(\mu)$. The value $Q(\omega,A)$ is sometimes written as $Q(A|\omega)$, by analogy with the notation for a conditional probability.
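The following Python sketch evaluates $\overline{Q}(\mu)$ for a hypothetical kernel from a finite set $\Omega_1\subset\mathbb{R}$ to $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ with $Q(\omega,-)=N(\omega,1)$, the normal distribution with mean $\omega$ and variance $1$. For a finitely supported $\mu$ the integral $\int_{\Omega_1}Q(-,A)\,d\mu$ is a finite weighted sum, and $\overline{Q}(\mu)$ is a Gaussian mixture. Only intervals $A=[a,b]$ are handled, which is enough to determine the measure.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Distribution function of the normal distribution N(mean, sd**2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def Q(omega, interval):
    """Hypothetical kernel: Q(omega, -) = N(omega, 1), evaluated on [a, b]."""
    a, b = interval
    return normal_cdf(b, mean=omega) - normal_cdf(a, mean=omega)


def Q_bar(mu, interval):
    """(Q_bar mu)([a, b]) = integral of Q(-, [a, b]) d mu for finitely supported mu."""
    return sum(weight * Q(omega, interval) for omega, weight in mu.items())


# A probability measure on Omega_1 = {-1, 0, 2}, given by its point masses.
mu = {-1.0: 0.25, 0.0: 0.5, 2.0: 0.25}
```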
The stochastic kernel is hence in particular an integral kernel.
In a discrete stochastic process (see below) the transition function is a stochastic kernel (more precisely, it is the map $\overline{Q}$ induced by a kernel $Q$).
Coupling (Koppelung)
Let $(\Omega_1,\mathfrak{A}_1,P_1)$ be a probability space, let $(\Omega_2,\mathfrak{A}_2)$ be a measurable space, and let $Q\colon\Omega_1\times\mathfrak{A}_2\to[0,1]$ be a stochastic kernel from $(\Omega_1,\mathfrak{A}_1)$ to $(\Omega_2,\mathfrak{A}_2)$.
Then
$$P(A):=\int_{\Omega_1}\left(\int_{\Omega_2}1_A(\omega_1,\omega_2)\,Q(\omega_1,d\omega_2)\right)P_1(d\omega_1),\qquad A\in\mathfrak{A}_1\otimes\mathfrak{A}_2,$$
defines a probability measure on $\mathfrak{A}_1\otimes\mathfrak{A}_2$, which is called the coupling of $P_1$ and $Q$. $P=:P_1\otimes Q$ is the unique probability measure with the property
$$P(A_1\times A_2)=\int_{A_1}Q(\omega_1,A_2)\,P_1(d\omega_1)\quad\text{for all } A_1\in\mathfrak{A}_1,\ A_2\in\mathfrak{A}_2.$$
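On finite sets the coupling is determined by its point masses $P(\{(\omega_1,\omega_2)\})=P_1(\{\omega_1\})\,Q(\omega_1,\{\omega_2\})$. A minimal Python sketch with hypothetical data and helper names, checking total mass $1$ and the rectangle property:

```python
from fractions import Fraction

# Hypothetical finite data: Omega_1 = {0, 1}, Omega_2 = {'a', 'b', 'c'}.
P1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
Q = {  # Q[w1][w2] = Q(w1, {w2}); each row is a probability measure on Omega_2
    0: {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)},
    1: {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)},
}


def coupling(P1, Q):
    """Point masses of the coupling P1 (x) Q on Omega_1 x Omega_2."""
    return {(w1, w2): P1[w1] * Q[w1][w2] for w1 in P1 for w2 in Q[w1]}


def measure_of_rectangle(PQ, A1, A2):
    """(P1 (x) Q)(A1 x A2), summed from the point masses."""
    return sum(p for (w1, w2), p in PQ.items() if w1 in A1 and w2 in A2)


def rectangle_formula(P1, Q, A1, A2):
    """Integral over A1 of Q(w1, A2) P1(d w1)."""
    return sum(P1[w1] * sum(Q[w1][w2] for w2 in A2) for w1 in A1)


PQ = coupling(P1, Q)
```

Taking $A_1=\Omega_1$ in the rectangle property shows that the second marginal of $P_1\otimes Q$ is $\overline{Q}(P_1)$.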
Theorem
Let (with the above settings) $Y\colon\Omega_1\to\Omega_2$ be $(\mathfrak{A}_1,\mathfrak{A}_2)$-measurable, and let $X$ be a $d$-dimensional random vector on $(\Omega_1,\mathfrak{A}_1,P_1)$.
Then there exists a stochastic kernel $Q$ from $(\Omega_2,\mathfrak{A}_2)$ to $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ such that
$$P^{(Y,X)}=P^Y\otimes Q$$
and $Q$ is (a version of) the conditional distribution of $X$ given $Y$, i.e.
$$Q(y,-)=P^X(-|Y=y)$$
This theorem says that $Q$ (more precisely $y\mapsto Q(y,-)$) fits into the diagram
$$\begin{array}{ccc} (\Omega_1,\mathfrak{A}_1,P_1) & \stackrel{Y}{\to} & (\Omega_2,\mathfrak{A}_2,P^Y) \\ \downarrow^{X} & & \swarrow^{Q} \\ (\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d)) & & \end{array}$$
and, for $d=1$ and integrable $X$, the conditional expectation is recovered from $Q$ as $E[X|Y=y]=\int_{\mathbb{R}}x\,Q(y,dx)$.
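For finitely many values the kernel $Q$ can be computed by elementary conditioning, $Q(y,\{x\})=P(X=x,Y=y)/P(Y=y)$ whenever $P(Y=y)>0$. The Python sketch below (hypothetical joint law and helper names) recovers $P^Y$ and $Q$ from the joint law of $(Y,X)$, checks on points that this joint law factors as $P^Y\otimes Q$, and computes $E[X|Y=y]$ as the mean of $Q(y,-)$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint law of (Y, X) with Y in {'low', 'high'} and X real-valued.
joint = {
    ("low", 0): Fraction(1, 4), ("low", 1): Fraction(1, 4),
    ("high", 1): Fraction(1, 8), ("high", 3): Fraction(3, 8),
}


def disintegrate(joint):
    """Return (P^Y, Q) with Q[y][x] = Q(y, {x}) = P(X = x | Y = y).

    Every y occurring in `joint` is assumed to have P(Y = y) > 0.
    """
    PY = defaultdict(Fraction)
    for (y, _), p in joint.items():
        PY[y] += p
    Q = defaultdict(dict)
    for (y, x), p in joint.items():
        Q[y][x] = p / PY[y]
    return dict(PY), dict(Q)


def cond_expectation(Q, y):
    """E[X | Y = y] as the mean of the conditional distribution Q(y, -)."""
    return sum(x * q for x, q in Q[y].items())


PY, Q = disintegrate(joint)
```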
Discrete case
In the discrete case, i.e. if $\Omega_1$ and $\Omega_2$ are finite or countable sets, it is possible to reconstruct $Q$ by considering only the one-element sets in $\mathfrak{A}_2$ and the related probabilities
$$p_{ij}:=Q(i,\{j\})$$
These numbers, called transition probabilities, encode $Q$ and assemble into a (possibly countably infinite) matrix $M=(p_{ij})$, called the transition matrix of $Q$ (resp. of $\overline{Q}$). Note that $p_{ij}$ is the probability of the transition from the state (aka elementary event or one-element event) $i$ to the event $\{j\}$ (which in this case happens to have only one element, too). We have $\sum_j p_{ij}=1$ for all $i\in\Omega_1$.
If $\rho:=(p_i)_{i\in\Omega_1}$ is a counting density on $\Omega_1$, then
$$\rho M=\Big(\sum_{i\in\Omega_1}p_i\,p_{ij}\Big)_{j\in\Omega_2}$$
is a counting density on $\Omega_2$.
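A minimal Python sketch of the map $\rho\mapsto\rho M$ for a hypothetical two-state transition matrix (the state names and numbers are illustrative only); the helper `push_forward_density` rejects matrices whose rows do not sum to $1$.

```python
from fractions import Fraction


def push_forward_density(rho, M):
    """Return rho M: the counting density j -> sum_i rho[i] * M[i][j].

    rho maps states i to p_i; M[i][j] = p_ij is a transition matrix stored
    as nested dicts, each row being a counting density (summing to 1).
    """
    if any(sum(row.values()) != 1 for row in M.values()):
        raise ValueError("each row of a transition matrix must sum to 1")
    out = {}
    for i, p_i in rho.items():
        for j, p_ij in M[i].items():
            out[j] = out.get(j, 0) + p_i * p_ij
    return out


# Hypothetical two-state weather chain.
M = {
    "sun": {"sun": Fraction(3, 4), "rain": Fraction(1, 4)},
    "rain": {"sun": Fraction(1, 2), "rain": Fraction(1, 2)},
}
rho = {"sun": Fraction(2, 3), "rain": Fraction(1, 3)}
```

For these particular numbers $\rho=(2/3,1/3)$ happens to satisfy $\rho M=\rho$, i.e. it is a stationary density.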
The conditional expectation plays a defining role in the theory of martingales, which are stochastic processes such that the conditional expectation of the next value, given the previous values, equals the present value.
Stochastic processes
The terminology of stochastic processes is a special interpretation of some aspects of infinitary combinatorics in terms of probability theory.
Let II be a total order (i.e. transitive, antisymmetric, and total).
A stochastic process is a diagram $X_I\colon I\to\mathcal{R}$, where $\mathcal{R}$ is the class of random variables, such that $X_I(i)=:X_i\colon(\Omega_i,\mathfrak{F}_i,P_i)\to(S_i,\mathfrak{S}_i)$ is a random variable. Often one considers the case where all $(S_i,\mathfrak{S}_i)=(S,\mathfrak{S})$ are equal; in this case $S$ is called the state space of the process $X_I$.
If all $\Omega_i=\Omega$ are equal and the family of $\sigma$-algebras $(\mathfrak{F}_i)_{i\in I}$ is a filtration, i.e.
$$\mathfrak{F}_i\subseteq\mathfrak{F}_j\quad\text{whenever } i\le j,$$
and every $X_i$ is $\mathfrak{F}_i$-measurable, the process is called an adapted process.
For example, every process is adapted to its natural filtration $\mathfrak{F}_i=\sigma(\{X^{-1}_l(A)\mid l\le i,\ A\in\mathfrak{S}\})$.
In terms of a diagram we have for $i\le j$
$$\begin{array}{ccc} (\Omega_j,\mathfrak{A}_j,P_j) & \stackrel{f}{\to} & (\Omega_i,\mathfrak{A}_i,P_i) \\ \downarrow^{X_j} & & \swarrow^{\omega_i\mapsto Q(\omega_i,-)} \\ (\mathbb{R},\mathcal{B}(\mathbb{R}),\lambda) & & \end{array}$$
and $\overline{Q}\colon(\Omega_i,\mathfrak{A}_i,P_i)\to(\Omega_j,\mathfrak{A}_j,P_j)$, where $Q\colon\Omega_i\times\mathfrak{A}_j\to[0,1]$ is the transition probability for the passage from time $i$ to time $j$.
Martingale
An adapted stochastic process with the natural filtration in discrete time is called a martingale if $E[|X_i|]<\infty$ for all $i$ and $E[X_j|\mathfrak{A}_i]=X_i$ almost surely for all $i\le j$.
$$\begin{array}{ccc} (\Omega_j,\mathfrak{A}_j,P_j) & \stackrel{f}{\to} & (\Omega_i,\mathfrak{A}_i,P_i) \\ \downarrow^{X_j} & & \swarrow^{E[X_j|\mathfrak{A}_i]=X_i} \\ (\mathbb{R},\mathcal{B}(\mathbb{R}),\lambda) & & \end{array}$$
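The martingale property can be checked by brute force for the random walk $X_n=\xi_1+\dots+\xi_n$ with independent steps $\xi_k=\pm 1$. The atoms of the natural filtration at time $n$ are the $2^n$ possible prefixes of steps, so $E[X_{n+1}|\mathfrak{A}_n]$ is the average of $X_{n+1}$ over the two continuations of a prefix; by the tower property it suffices to check $j=i+1$. The Python sketch below (hypothetical helper names) does this for a symmetric and a biased walk.

```python
from fractions import Fraction
from itertools import product


def walk_value(steps):
    """X_n = sum of the first n steps (each step is +1 or -1)."""
    return sum(steps)


def is_martingale(n_max, p_up=Fraction(1, 2)):
    """Check E[X_{n+1} | A_n] = X_n for n < n_max on every atom of A_n.

    A_n is the natural filtration; its atoms are the 2**n step prefixes.
    Each step is +1 with probability p_up and -1 otherwise (0 < p_up < 1).
    """
    step_probs = {1: p_up, -1: 1 - p_up}
    for n in range(n_max):
        for prefix in product((-1, 1), repeat=n):
            expected_next = sum(q * walk_value(prefix + (step,))
                                for step, q in step_probs.items())
            if expected_next != walk_value(prefix):
                return False
    return True
```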
(β¦)
Markov Process
An adapted stochastic process satisfying
$$P(X_t\in B|\mathfrak{A}_s)=P(X_t\in B|X_s)\quad\text{for all } s\le t \text{ and all measurable } B$$
is called a Markov process.
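For a discrete-time chain with finitely many states, the Markov property can be verified by enumerating paths. This Python sketch uses a hypothetical two-state chain with initial distribution `init` and transition matrix `M`, and compares $P(X_2=k|X_0=i,X_1=j)$ with $P(X_2=k|X_1=j)$; conditioning on $(X_0,X_1)$ is conditioning on the atoms of the natural $\sigma$-algebra at time $1$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-state chain: initial law and transition probabilities.
init = {0: Fraction(1, 3), 1: Fraction(2, 3)}
M = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
     1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}


def path_prob(path):
    """Probability of the path (x_0, x_1, x_2) under the chain."""
    p = init[path[0]]
    for a, b in zip(path, path[1:]):
        p *= M[a][b]
    return p


paths = {path: path_prob(path) for path in product((0, 1), repeat=3)}


def cond(event, given):
    """Elementary conditional probability P(event | given) on the path space."""
    num = sum(p for path, p in paths.items() if event(path) and given(path))
    den = sum(p for path, p in paths.items() if given(path))
    return num / den


def markov_property_holds():
    for i, j, k in product((0, 1), repeat=3):
        full_past = cond(lambda w: w[2] == k, lambda w: w[0] == i and w[1] == j)
        present_only = cond(lambda w: w[2] == k, lambda w: w[1] == j)
        if full_past != present_only:
            return False
    return True
```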
Chapman-Kolmogorov Equation
For a Markov process the Chapman-Kolmogorov equation encodes the statement that the transition probabilities of the process form a semigroup.
If, in the notation from above, $(P_t\colon\Omega\times\mathfrak{A}\to[0,1])_t$ is a family of stochastic kernels $(\Omega,\mathfrak{A})\to(\Omega,\mathfrak{A})$ (so that all $P_t(\omega,-)\colon\mathfrak{A}\to[0,1]$ are probability measures), then $(P_t)_t$ is called a transition semigroup if
$$\overline{P}_t\big(P_s(\omega,-)\big)(A)=P_{s+t}(\omega,A)\quad\text{for all } \omega\in\Omega,\ A\in\mathfrak{A},$$
where
$$\overline{P}_t\colon P_s(\omega,-)\mapsto\left(A\mapsto\int_\Omega P_t(y,A)\,P_s(\omega,dy)\right)$$
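As an example not taken from the text above, the Gaussian kernels $P_t(x,-)=N(x,t)$ of Brownian motion on $\mathbb{R}$ form a transition semigroup. The Python sketch below checks the semigroup identity numerically for an interval $A=[a,b]$, approximating $\int P_t(y,A)\,P_s(x,dy)$ by the midpoint rule; the truncation width and step count are assumptions chosen for accuracy, not part of the definition.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def P(t, x, interval):
    """Assumed example kernel P_t(x, [a, b]) with P_t(x, -) = N(x, t)."""
    a, b = interval
    sd = math.sqrt(t)
    return normal_cdf((b - x) / sd) - normal_cdf((a - x) / sd)


def P_bar_t_of_P_s(t, s, x, interval, n=20_000, width=12.0):
    """Approximate (P_bar_t(P_s(x, -)))(A) = integral of P_t(y, A) P_s(x, dy).

    Midpoint rule against the density of N(x, s) on x +- width * sqrt(s);
    the probability mass outside this range is negligible.
    """
    sd = math.sqrt(s)
    lo = x - width * sd
    h = 2.0 * width * sd / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        density = math.exp(-((y - x) ** 2) / (2.0 * s)) / (sd * math.sqrt(2.0 * math.pi))
        total += P(t, y, interval) * density * h
    return total
```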
In quantum probability theory
In the dual algebraic formulation of probability theory known as noncommutative probability theory or quantum probability theory, where the concept of expectation value is primitive (while that of the corresponding probability space, if it exists, is a derived concept), the concept of conditional expectation appears as follows (e.g. Redei-Summers 06, section 7.3):
Let $(\mathcal{A},\langle-\rangle)$ be a quantum probability space, hence a complex star-algebra $\mathcal{A}$ of quantum observables together with a state on a star-algebra $\langle-\rangle\colon\mathcal{A}\to\mathbb{C}$.
This means that for any observable $A\in\mathcal{A}$, its expectation value in the given state is
$$\mathbb{E}(A)\coloneqq\langle A\rangle\in\mathbb{C}.$$
More generally, if $P\in\mathcal{A}$ is a real (i.e. self-adjoint) idempotent, i.e. a projector,
$$P^\ast=P,\qquad PP=P \tag{1}$$
thought of as an event, then for any observable $A\in\mathcal{A}$ the conditional expectation value of $A$, conditioned on the observation of $P$, is (provided $\langle P\rangle\neq 0$)
$$\mathbb{E}(A\vert P)\coloneqq\frac{\langle PAP\rangle}{\langle P\rangle}. \tag{2}$$
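A finite-dimensional sketch in Python with NumPy, assuming (hypothetically) that $\mathcal{A}$ is the algebra of complex $n\times n$ matrices and that the state is given by a density matrix $\rho$ via $\langle A\rangle=\operatorname{tr}(\rho A)$. For commuting diagonal matrices the formula reduces to classical conditioning on an event; the second example uses a qubit in the state $|+\rangle$, where conditioning on $P=|0\rangle\langle 0|$ gives $\mathbb{E}(\sigma_x|P)=0$ even though $\langle\sigma_x\rangle=1$.

```python
import numpy as np


def expectation(rho, A):
    """<A> = tr(rho A) for a state given by the density matrix rho."""
    return np.trace(rho @ A)


def conditional_expectation(rho, A, P):
    """E(A | P) = <P A P> / <P> for a projector P with <P> != 0."""
    if not (np.allclose(P, P.conj().T) and np.allclose(P @ P, P)):
        raise ValueError("P must satisfy P* = P and P P = P")
    norm = expectation(rho, P)
    if np.isclose(norm, 0.0):
        raise ValueError("<P> must be nonzero")
    return expectation(rho, P @ A @ P) / norm


# Commutative example: diagonal matrices behave like functions on 3 points.
rho_c = np.diag([0.2, 0.3, 0.5])
A_c = np.diag([1.0, 2.0, 3.0])
P_c = np.diag([1.0, 1.0, 0.0])

# Qubit example: state |+><+|, observable sigma_x, event P = |0><0|.
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho_q = np.outer(plus, plus.conj())
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])
```

In the commutative example the value is $(0.2\cdot 1+0.3\cdot 2)/0.5=1.6$, the classical conditional expectation of $A$ on the event $\{1,2\}$.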
References
See also
- Wikipedia, Conditional expectation
Discussion from the point of view of quantum probability is in
- Miklos Redei, Stephen Summers, section 7.3 of Quantum Probability Theory (arXiv:quant-ph/0601158)
Last revised on July 21, 2024 at 16:22:26.