
STAT 870 Lecture 9

Strong Markov property:

A stopping time for the Markov chain is a random variable T taking values in $\{0, 1,\ldots\}\cup\{\infty\}$ such that for each finite k there is a function $f_k$ such that

\begin{displaymath}1(T=k) = f_k(X_0,\ldots,X_k)
\end{displaymath}

Notice that $T_k$ in the theorem is a stopping time.
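A concrete example: the first time the chain hits a fixed state is a stopping time, since $1(T=k)$ depends only on $X_0,\ldots,X_k$. A small Python sketch (the path and target state are made up for illustration):

```python
# Sketch: the first hitting time of a target state is a stopping time,
# because the event {T = k} can be decided from X_0, ..., X_k alone.

def hitting_time_indicator(path_prefix, target):
    """f_k(X_0, ..., X_k): 1 if T = k for T = min{n : X_n = target}."""
    k = len(path_prefix) - 1
    # T = k means X_k is the target and no earlier X_j was.
    return int(path_prefix[k] == target and
               all(x != target for x in path_prefix[:k]))

# Made-up example path: the chain first hits state 5 at time 2.
path = [1, 3, 5, 2]
print([hitting_time_indicator(path[:k + 1], target=5) for k in range(4)])
# → [0, 0, 1, 0]
```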

Standard shorthand notation: by

\begin{displaymath}P^x(A)
\end{displaymath}

we mean

\begin{displaymath}P(A\vert X_0=x) \, .
\end{displaymath}

Similarly we define

\begin{displaymath}{\rm E}^x(Y)= {\rm E}(Y\vert X_0=x) \, .
\end{displaymath}

Goal: explain and prove

\begin{displaymath}{\rm E}(f(X_{T},\ldots)\vert X_T,\ldots,X_0) = {\rm E}^{X_T}(f(X_0,\ldots))
\end{displaymath}

Simpler claim:

\begin{displaymath}P(X_{T+1}=j\vert X_{T}=i) =
{\bf P}_{ij} = P^i(X_1=j) \, .
\end{displaymath}

Notation: $A_k=\{X_{k}=i,T=k\}$

Notice: $A_k=\{X_T=i,T=k\}$:
\begin{align*}
P(X_{T+1}=j\vert X_{T}=i) & = \frac{ P(X_{T+1}=j,X_{T}=i)}{P(X_T=i)}
\\
& = \frac{\sum_k P(X_{k+1}=j, A_k)}{\sum_k P(A_k)}
\\
& = \frac{\sum_k P(X_{k+1}=j\vert A_k)P(A_k)}{\sum_k P(A_k)}
\\
& = \frac{\sum_k P(X_{1}=j\vert X_{0}=i)P(A_k)}{\sum_k P(A_k)}
\\
& ={\bf P}_{ij}
\end{align*}
Notice the use of the fact that $\{T=k\}$ is an event defined in terms of $X_0,\ldots,X_k$.
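The simpler claim can be checked by simulation: run a small chain to a hitting time and compare the empirical frequency of $X_{T+1}=j$ given $X_T=i$ with ${\bf P}_{ij}$. A sketch, with a made-up two-state chain:

```python
import random

random.seed(0)

# Made-up two-state chain with transition matrix P.
P = [[0.7, 0.3],
     [0.4, 0.6]]

def step(i):
    return 0 if random.random() < P[i][0] else 1

# Stopping time T = first visit to state 1 (a hitting time), so X_T = 1.
hits = 0   # number of runs (here T < infinity almost surely)
moves = 0  # of those, runs with X_{T+1} = 0
for _ in range(200_000):
    x = 0
    while x != 1:          # run until time T
        x = step(x)
    hits += 1
    if step(x) == 0:       # one more step gives X_{T+1}
        moves += 1

# Empirical P(X_{T+1} = 0 | X_T = 1): close to P[1][0] = 0.4.
print(round(moves / hits, 2))
```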

Technical problems with proof:

$\bullet$ It might be that $P(T=\infty)>0$. What are $X_T$ and $X_{T+1}$ on the event $\{T=\infty\}$?

Answer: condition also on $T<\infty$.

$\bullet$ The proof works only for stopping times where $\{T < \infty\}\cap \{X_T=i\}$ has positive probability.

We will now fix up these technical details.

Conditional distributions, expectations

When X and Y are discrete we have

\begin{displaymath}{\rm E}(Y\vert X=x) = \sum_y y \, P(Y=y\vert X=x)
\end{displaymath}

for any x for which P(X=x) is positive.

This defines a function of x.

This function, evaluated at X, gives a random variable which is a function of X, denoted

\begin{displaymath}{\rm E}(Y\vert X) \, .
\end{displaymath}
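For discrete X and Y this construction can be carried out mechanically from the joint pmf. A small sketch (the joint distribution is a made-up example):

```python
from collections import defaultdict

# Made-up joint pmf p(x, y) for a pair of discrete random variables.
joint = {(0, 1): 0.1, (0, 3): 0.3, (1, 2): 0.2, (1, 6): 0.4}

def cond_exp(joint):
    """Return the function x -> E(Y | X = x) as a dict."""
    px = defaultdict(float)   # marginal P(X = x)
    exy = defaultdict(float)  # sum_y y P(Y = y, X = x)
    for (x, y), p in joint.items():
        px[x] += p
        exy[x] += y * p
    return {x: exy[x] / px[x] for x in px}

h = cond_exp(joint)   # h(x) = E(Y | X = x); h(X) is the rv E(Y | X)
print(h)              # E(Y | X = 0) = 2.5, E(Y | X = 1) ≈ 4.667
```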

Here are some properties of that function.

1)
Suppose A is a function defined on the range of X. Then

\begin{displaymath}{\rm E}(A(X)Y\vert X=x) = A(x){\rm E}(Y\vert X=x)
\end{displaymath}

and so

\begin{displaymath}{\rm E}(A(X)Y\vert X) = A(X){\rm E}(Y\vert X)
\end{displaymath}

The second assertion follows from the first. Note that if Z=A(X)Y then Z is discrete and

\begin{displaymath}P(Z=z) = \sum_{x,y} P(Y=y,X=x) 1( z = A(x) y)
\end{displaymath}

Also
\begin{align*}
P(Z=z\vert X=x) & = \frac{\sum_y P(Y=y, X=x) 1(z=A(x)y)}{P(X=x)}
\\
& = \sum_y P(Y=y\vert X=x)1(z=A(x)y)
\end{align*}
Thus
\begin{align*}
{\rm E}(Z\vert X=x) & = \sum_z z P(Z=z\vert X=x)
\\
& = \sum_z z \sum_y P(Y=y\vert X=x) 1(z=A(x)y)
\\
& = \sum_y P(Y=y\vert X=x) \sum_z z \, 1(z=A(x)y)
\\
& = A(x) \sum_y y P(Y=y\vert X=x)
\end{align*}
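Property 1) is easy to verify numerically on a made-up discrete joint distribution:

```python
# Check E(A(X) Y | X = x) = A(x) E(Y | X = x) on a made-up joint pmf.
joint = {(0, 1): 0.2, (0, 2): 0.2, (1, 1): 0.3, (1, 4): 0.3}
A = lambda x: 10 * x + 1   # arbitrary function of x

for x0 in {x for x, _ in joint}:
    px = sum(p for (x, _), p in joint.items() if x == x0)
    # left side: E(A(X) Y | X = x0)
    lhs = sum(A(x) * y * p for (x, y), p in joint.items() if x == x0) / px
    # right side: A(x0) E(Y | X = x0)
    rhs = A(x0) * sum(y * p for (x, y), p in joint.items() if x == x0) / px
    assert abs(lhs - rhs) < 1e-12
print("ok")
```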

2)
Repeated conditioning: if X, Y and Z discrete then
\begin{align*}{\rm E}\left\{{\rm E}(Z\vert X,Y)\vert X\right\} & = {\rm E}(Z\vert X)
\\
{\rm E}\left\{{\rm E}(Y\vert X)\right\} & = {\rm E}(Y)
\end{align*}
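The repeated conditioning identity can likewise be checked by direct computation on a small made-up joint pmf: averaging ${\rm E}(Z\vert X,Y)$ over the conditional distribution of Y given X recovers ${\rm E}(Z\vert X)$.

```python
from itertools import product

# Made-up joint pmf over (X, Y, Z), each variable in {0, 1}.
p = {t: w for t, w in zip(
    product([0, 1], repeat=3),
    [0.05, 0.10, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10])}

def E(f, cond=lambda t: True):
    """E(f(X, Y, Z) | cond) under the joint pmf p."""
    num = sum(f(t) * w for t, w in p.items() if cond(t))
    den = sum(w for t, w in p.items() if cond(t))
    return num / den

# E{E(Z | X, Y) | X = 0}: average E(Z | X = 0, Y = y) over P(Y = y | X = 0) ...
inner = sum(E(lambda t: t[2], lambda t: t[:2] == (0, y)) *
            E(lambda t: t[1] == y, lambda t: t[0] == 0)
            for y in [0, 1])
# ... equals E(Z | X = 0) computed directly.
direct = E(lambda t: t[2], lambda t: t[0] == 0)
print(abs(inner - direct) < 1e-12)
```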

3)
Additivity

\begin{displaymath}{\rm E}(Y+Z\vert X)={\rm E}(Y\vert X)+{\rm E}(Z\vert X)
\end{displaymath}

4)
Putting the first two items together gives

 \begin{displaymath}
{\rm E}\left\{{\rm E}(A(X)Y\vert X) \right\} = {\rm E}(A(X)Y)
\end{displaymath} (1)

Definition of ${\rm E}(Y\vert X)$ when X and Y are not assumed to be discrete:

${\rm E}(Y\vert X)$ is a rv which is a measurable function of X satisfying (1).

Existence is a measure theory problem.

Suppose $f(x_0,x_1,\ldots)$ is a (measurable) function on $S^{\Bbb N}$. Put

\begin{displaymath}Y_n = f(X_n,X_{n+1},\ldots)\, .
\end{displaymath}

Assume ${\rm E}(\, \vert Y_0\vert \, \vert X_0=x)<\infty$ for all x. Claim:

 \begin{displaymath}
{\rm E}(Y_n\vert X_n,A) = {\rm E}^{X_n}(Y_0)
\end{displaymath} (2)

whenever A is any event defined in terms of $X_0,\ldots,X_n$.

Proof:

Aside on ``measurable'': what sorts of events can be defined in terms of a family $\{Y_i: i \in I\}$?

Natural: any event of the form $(Y_{i_1}, \ldots, Y_{i_k}) \in C$ is ``defined in terms of the family'' for any finite set $i_1,\ldots,i_k$ and any (Borel) set C in $S^k$.

For countable S: each singleton $(s_1,\ldots,s_k)\in S^k$ is Borel. So every subset of $S^k$ is Borel.

Natural: if you can define each of a sequence of events An in terms of the Ys then the definition ``there exists an n such that (definition of An) ...'' defines $\cup A_n$.

Natural: if A is definable in terms of the Ys then Ac can be defined from the Ys by just inserting the phrase ``It is not true that'' in front of the definition of A.

So family of events definable in terms of the family $\{Y_i: i \in I\}$ is a $\sigma$-field which includes every event of the form $(Y_{i_1}, \ldots, Y_{i_k}) \in C$. We call the smallest such $\sigma$-field, ${\cal F}(\{Y_i: i \in I\})$, the $\sigma$-field generated by the family $\{Y_i: i \in I\}$.

This discussion permits some shorthand. We define

\begin{displaymath}{\cal F}_n = {\cal F}(\{X_0,\ldots,X_n\})
\end{displaymath}

and

\begin{displaymath}{\cal F}_\infty = {\cal F}(\{X_0,X_1,\ldots\})
\end{displaymath}

Conditioning on $X_0,\ldots,X_T$ where T is a stopping time?

Method 1: regard vector $X_0,\ldots,X_T$ as taking values in exotic space, namely:

\begin{displaymath}\left(\bigcup_{n=1}^\infty S^n \right) \bigcup S^{\Bbb N}
\end{displaymath}

We define

\begin{displaymath}{\cal F}_T = {\cal F}(X_0,\ldots,X_T)
\end{displaymath}

The random vector $X_0,\ldots,X_T$ is discrete on $\{T<\infty\}$ but not discrete on $\{T=\infty\}$.

Suppose X is discrete and X*=g(X) is a one to one transformation of X. Since X=x is the same event as X*=g(x) we find

\begin{displaymath}{\rm E}(Y\vert X=x) = {\rm E}(Y\vert X^*=g(x))
\end{displaymath}

Let $h^*(u)$ denote the function ${\rm E}(Y\vert X^*=u)$ and $h(u)={\rm E}(Y\vert X=u)$. Then

\begin{displaymath}h(x)=h^*(g(x))
\end{displaymath}

Thus

\begin{displaymath}h(X) = h^*(g(X)) = h^*(X^*)
\end{displaymath}

This just means

\begin{displaymath}{\rm E}(Y\vert X)= {\rm E}(Y\vert X^*)
\end{displaymath}

Interpretation.

Formula is ``obvious''.

Example: Toss a coin n=20 times. Y is the indicator that the first toss is a head. X is the number of heads and X* the number of tails. The formula says:

\begin{displaymath}{\rm E}(Y\vert X=17) = {\rm E}(Y\vert X^*=3)
\end{displaymath}

In fact, for general k and n, symmetry of the tosses gives

\begin{displaymath}{\rm E}(Y\vert X=k) = \frac{k}{n}
\end{displaymath}

so

\begin{displaymath}{\rm E}(Y\vert X) = \frac{X}{n}
\end{displaymath}

At the same time

\begin{displaymath}{\rm E}(Y\vert X^*=j) = \frac{n-j}{n}
\end{displaymath}

so

\begin{displaymath}{\rm E}(Y\vert X^*) = \frac{n-X^*}{n}
\end{displaymath}

But of course X=n-X* so these are just two ways of describing the same random variable.
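The coin example can be verified by counting: among the $\binom{n}{k}$ equally likely sequences with k heads, $\binom{n-1}{k-1}$ begin with a head. A short check:

```python
from math import comb

n = 20

def cond_exp_first_head(k):
    """E(Y | X = k): among sequences with k heads, the fraction whose
    first toss is a head is C(n-1, k-1) / C(n, k) = k / n."""
    return comb(n - 1, k - 1) / comb(n, k)

# E(Y | X = 17) = 17/20, which equals E(Y | X* = 3) since X = n - X*.
print(cond_exp_first_head(17))  # 0.85
```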

Another interpretation: the rv X partitions $\Omega$ into a countable set of events of the form $\{X=x\}$.

The other rv X* partitions $\Omega$ into the same events.

Then the values of ${\rm E}(Y\vert X^*=x^*)$ are the same as the values of ${\rm E}(Y\vert X=x)$ but labelled differently.

To form ${\rm E}(Y\vert X)$: take a point $\omega$, compute $X(\omega)$ to determine the member A of the partition being conditioned on, then write down the corresponding ${\rm E}(Y\vert A)$.

Hence conditional expectation depends only on partition of $\Omega$.

When X is not discrete: replace the partition with a $\sigma$-field. Suppose X and X* are two rvs such that $ {\cal F}(X) = {\cal F}(X^*) $. Then ${\rm E}(Y\vert X)={\rm E}(Y\vert X^*)$.

In other words ${\rm E}(Y\vert X)$ depends only on the $\sigma$-field generated by X. We write

\begin{displaymath}{\rm E}(Y\vert{\cal F}(X))={\rm E}(Y\vert X)
\end{displaymath}

Definition: Suppose ${\cal G}$ is sub-$\sigma$-field of ${\cal F}$. X is ${\cal G}$ measurable if, for every Borel B

\begin{displaymath}\{\omega: X(\omega) \in B\} \in {\cal G} \, .
\end{displaymath}

Definition: ${\rm E}(Y\vert{\cal G})$ is any ${\cal G}$ measurable rv such that for every ${\cal G}$ measurable rv A we have

\begin{displaymath}{\rm E}(AY) = {\rm E}\left\{A{\rm E}(Y\vert{\cal G})\right\} \, .
\end{displaymath}

Again existence is measure theory problem.

Now we define ${\cal F}_T$ to be the collection of all events B such that

\begin{displaymath}B\cap \{T \le n\} \in {\cal F}_n
\end{displaymath}

for each n. It is easy to prove that ${\cal F}_T$ is a $\sigma$-field and that T and $X_T$ are ${\cal F}_T$ measurable.

Rewrite of the Strong Markov Property: if T is a stopping time and ${\rm E}^x(\vert f(X_0,X_1,\ldots)\vert) < \infty$ for all x, then
\begin{multline*}1(T<\infty){\rm E}(f(X_{T},X_{T+1},\ldots)\vert{\cal F}_T)
\\
= 1(T<\infty){\rm E}^{X_T}(f(X_0,X_1,\ldots))
\end{multline*}

Proof of Strong Markov Property.

Since T is ${\cal F}_T$ measurable we have
\begin{multline*}1(T<\infty){\rm E}\left\{f(X_{T},X_{T+1},\ldots)\vert{\cal F}_T\right\}
\\
= {\rm E}\left\{
1(T<\infty)f(X_{T},X_{T+1},\ldots)\vert{\cal F}_T\right\}
\end{multline*}
Suppose A is ${\cal F}_T$ measurable; need only prove
\begin{multline*}{\rm E}\left[A1(T<\infty){\rm E}^{X_T}\left\{f(X_0,X_1,\ldots)\right\}\right]
\\
= {\rm E}\left\{A1(T<\infty)f(X_{T},X_{T+1},\ldots)\right\}
\end{multline*}

Break up the events according to value of T:
 \begin{align*}
A1(T<\infty) {\rm E}^{X_T}\left\{f(X_0,X_1,\ldots)\right\}
& = \sum_k A1(T=k){\rm E}^{X_T}\left\{f(X_0,X_1,\ldots)\right\}
\\
& = \sum_k A1(T=k){\rm E}^{X_k}\left\{f(X_0,X_1,\ldots)\right\}
\end{align*}

On the other hand
\begin{align*}
A1(T<\infty)f(X_{T},X_{T+1},\ldots)
& = \sum_k A1(T=k)f(X_{T},X_{T+1},\ldots)
\\
& = \sum_k A1(T=k)f(X_k,X_{k+1},\ldots)
\end{align*}
But by our extended Markov property
\begin{align*}
{\rm E}\left\{ A1(T=k)f(X_k,X_{k+1},\ldots)\right\}
& = {\rm E}\left[A1(T=k){\rm E}^{X_k}\left\{f(X_0,X_{1},\ldots)\right\}\right]
\end{align*}
This is exactly the expectation of the kth term in the sum above. This proves the Strong Markov Property.





Richard Lockhart
2000-10-17