3.3 Conditional Expectation

The standard approach in intermediate and advanced Probability textbooks is to introduce the conditional expectation of \(Y\) given \(X\) (this is defined for integrable \(Y\)) as any random variable \(W\) that satisfies the following two conditions:

  1. There exists a measurable function \(\varphi\colon\mathbb{R}^{\mathrm{D}_X}\to\mathbb{R}\) such that \(W = \varphi(X).\)
  2. It holds that \(\mathbf{E}(Y\,\mathbb{I}[X\in A]) = \mathbf{E}(W\,\mathbb{I}[X\in A])\), for all Borel subsets \(A\subseteq\mathbb{R}^{\mathrm{D}_X}.\)
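
For instance, if \(X\) takes only finitely many values \(x_1,\dots,x_k\in\mathbb{R}^{\mathrm{D}_X},\) each with positive probability, then the elementary choice \[ \varphi(x_j) := \frac{\mathbf{E}(Y\,\mathbb{I}[X = x_j])}{\mathbf{P}[X = x_j]},\qquad j = 1,\dots,k, \] (with \(\varphi := 0\) elsewhere) yields a \(W = \varphi(X)\) satisfying both conditions; the measure-theoretic machinery below is only needed once \(X\) is allowed to have an arbitrary distribution.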

The proof that one can always find such a \(W\) relies on the famous Radon-Nikodym Theorem from Measure Theory, combined with the Doob-Dynkin Lemma, which provides the function \(\varphi.\) In fact, one can obtain the regular conditional distribution in 3.1 as a corollary of this result (a direct proof also relies on the Radon-Nikodym Theorem; see Billingsley (1995), Section 33, especially Theorem 33.3).

Since we introduced the idea of conditioning through the viewpoint of regular conditional distributions, we can take a shortcut and define, for any integrable random variable \(Y\) and any random vector9 \(X,\) \[ \mathbf{E}(Y\,|\,X = x) := \int_0^1 Q_{Y|X}(\tau|x)\,\mathrm{d}\tau,\qquad x\in\mathbb{R}^{\mathrm{D}_X}, \] called the conditional expectation of \(Y\) given \(X=x\), and, writing \(\varphi(x) := \mathbf{E}(Y\,|\,X=x),\) let \[ \mathbf{E}(Y\,|\,X) := \varphi(X), \] called the conditional expectation of \(Y\) given \(X\).
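
To make this recipe concrete, here is a minimal numerical sketch (the Gaussian model, the parameter values, and the helper `cond_quantile` below are illustrative assumptions, not fixed by the text): for \(Y\,|\,X = x \sim \mathrm{N}(\beta x, \sigma^2)\) the conditional quantile function is \(Q_{Y|X}(\tau|x) = \beta x + \sigma\Phi^{-1}(\tau),\) and integrating it over \((0,1)\) recovers the conditional mean \(\beta x,\) since \(\int_0^1\Phi^{-1}(\tau)\,\mathrm{d}\tau = 0.\)

```python
# Illustrative sketch: approximate E(Y | X = x) = \int_0^1 Q_{Y|X}(tau | x) dtau
# for an assumed Gaussian conditional model Y | X = x ~ N(beta * x, sigma^2).
import numpy as np
from scipy.stats import norm

beta, sigma, x = 2.0, 1.5, 0.7              # hypothetical parameter values

def cond_quantile(tau, x):
    """Conditional quantile function Q_{Y|X}(tau | x) of the assumed model."""
    return beta * x + sigma * norm.ppf(tau)

# Midpoint rule on a fine grid of quantile levels in (0, 1).
taus = (np.arange(10_000) + 0.5) / 10_000
print(np.mean(cond_quantile(taus, x)))      # ~ beta * x = 1.4
```

The printed value is, up to discretization error, the conditional mean \(\beta x = 1.4\) of the assumed model.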

Exercise 3.6 Show that

  1. \(\mathbf{E}(\mathbb{I}[Y\in B]\,|\,X=x) = \mathbf{P}[Y\in B\,|\,X=x]\) holds for every \(x\in\mathbb{R}^{\mathrm{D}_X}\) and all Borel sets \(B\subseteq \mathbb{R}.\)
  2. \(\mathbf{E}\{\mathbf{E}(Y|X)\,\mathbb{I}[X\in A]\} = \mathbf{E}(Y\,\mathbb{I}[X\in A])\) holds for all Borel sets \(A\subseteq\mathbb{R}^{\mathrm{D}_X}\) (a numerical sanity check of this identity, and of item 4, is sketched after this exercise).
  3. \(\mathbf{E}(aY + Z\,|\,X) = a\,\mathbf{E}(Y\,|\,X) + \mathbf{E}(Z\,|\,X)\) for any \(a\in\mathbb{R}\) and any integrable random variable \(Z.\)
  4. (Substitution principle) \(\mathbf{E}(\psi(X,Z)\,|\,X=x) = \mathbf{E}(\psi(x,Z)\,|\,X=x).\) In particular, \(\mathbf{E}(\psi(X)Y\,|\,X=x) = \psi(x)\mathbf{E}(Y\,|\,X=x)\) and \(\mathbf{E}(\psi(X)Y\,|\,X) = \psi(X)\mathbf{E}(Y\,|\,X).\)
  5. If \(Y\) and \(X\) are independent, then \(\mathbf{E}(Y\,|\,X) = \mathbf{E}(Y).\) This holds, in particular, if \(X\) is constant.
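
Here is a minimal Monte Carlo sketch of items 2 and 4 (the simulated model \(Y = 1 + X + Z,\) the set \(A = [0,\infty),\) and \(\psi(x) = x^2\) are illustrative choices, not part of the exercise): in this model \(\mathbf{E}(Y\,|\,X) = 1 + X,\) so both identities can be checked by comparing sample averages.

```python
# Illustrative Monte Carlo check (assumed model, not part of the exercise):
# Y = 1 + X + Z with X, Z independent standard normals, so E(Y | X) = 1 + X.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = 1.0 + X + Z

# Item 2: E{ E(Y|X) 1[X in A] } = E{ Y 1[X in A] } with A = [0, infinity).
in_A = X >= 0.0
print(np.mean((1.0 + X) * in_A), np.mean(Y * in_A))    # both ~ 0.90

# Item 4: E(psi(X) Y | X) = psi(X) E(Y | X) with psi(x) = x**2; compare the
# unconditional means of both sides (a consequence of the identity).
print(np.mean(X**2 * Y), np.mean(X**2 * (1.0 + X)))    # both ~ 1.0
```

The printed pairs agree only up to Monte Carlo error, of course; the exercise asks for the exact identities.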

  9. In fact, the definition is still valid even if \(X\) is an infinite sequence of random variables. This fact will be useful in the context of time series quantile regression models, where we typically condition on the entire “past” \(\{(Y_{s},X_{s}):s<t\}.\)