
STAT 380 Week 3

Markov Chains

The last names example has the following structure: if, at generation $ n$, there are $ m$ individuals, then the number of sons in the next generation is distributed as the sum of $ m$ independent copies of the random variable $ X$, the number of sons a single individual has. This distribution does not depend on $ n$, only on the value $ m$ of $ Z_n$. We call $ Z_n$ a Markov Chain.

Ingredients of a Markov Chain:

The stochastic process $ X_0,X_1,\ldots$ is called a Markov chain if

$\displaystyle P\left(X_{k+1} =j\vert X_k=i,A\right)
=
P_{i,j}
$

for any $ A$ defined in terms of $ X_0,\ldots,X_{k-1}$ and for all $ i,j,k$. Usually used with

$\displaystyle A=\{X_{k-1}=i_{k-1},\ldots,X_0=i_0\}
$

for some $ i_0,\ldots,i_{k-1}$.

The matrix $ {\bf P}$ is called a transition matrix.

Example: If $ X$ in the last names example has a Poisson$ (\lambda)$ distribution then, given $ Z_n=k$, $ Z_{n+1}$ is distributed as the sum of $ k$ independent Poisson$ (\lambda)$ rvs, which is Poisson($ k\lambda$). So

$\displaystyle {\bf P}= \left[\begin{array}{llll}
1 & 0 & 0 & \cdots
\\
e^{-\lambda} & \lambda e^{-\lambda} & \frac{\lambda^2}{2}e^{-\lambda} &
\cdots
\\
e^{-2\lambda} & 2\lambda e^{-2\lambda} & \frac{(2\lambda)^2}{2}e^{-2\lambda} &
\cdots
\\
\vdots &
\vdots &
\vdots &
\ddots
\end{array}\right]
$
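The entries are $ P_{k,j}=e^{-k\lambda}(k\lambda)^j/j!$. A minimal sketch of this matrix in Python (Python rather than the Maple used later in these notes, purely as a hand-checkable illustration), building a truncated corner of the infinite matrix:

```python
from math import exp, factorial

def poisson_pmf(mean, j):
    """P(Poisson(mean) = j); Poisson(0) is a point mass at 0."""
    if mean == 0:
        return 1.0 if j == 0 else 0.0
    return exp(-mean) * mean**j / factorial(j)

def branching_transition_matrix(lam, size):
    """Top-left size x size corner of the (infinite) transition matrix:
    row k is the Poisson(k*lam) distribution of the next generation."""
    return [[poisson_pmf(k * lam, j) for j in range(size)]
            for k in range(size)]

P = branching_transition_matrix(lam=1.0, size=10)
print(P[0][:3])  # [1.0, 0.0, 0.0] -- extinction (state 0) is absorbing
```

Row sums are only approximately 1 here because each infinite row has been truncated at `size` columns.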

Example: Weather: each day is dry (D) or wet (W).

$ X_n$ is the weather on day $ n$.

Suppose dry days tend to be followed by dry days, say 3 times in 5, and wet days by wet days 4 times in 5.

Markov assumption: yesterday's weather is irrelevant to predicting tomorrow's, given today's.

Transition Matrix:

$\displaystyle {\bf P}= \left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
$

Now suppose it's wet today. What is the probability of wet in 2 days?

$\displaystyle P(X_2=W\vert X_0=W)
$

$\displaystyle =$ $\displaystyle P(X_2=W,X_1=D \vert X_0=W)$    
  $\displaystyle \quad + P(X_2=W,X_1=W \vert X_0=W)$    
$\displaystyle =$ $\displaystyle P(X_2=W\vert X_1=D, X_0=W)$    
  $\displaystyle \qquad \times P(X_1=D\vert X_0=W)$    
  $\displaystyle \quad + P(X_2=W\vert X_1=W ,X_0=W)$    
  $\displaystyle \qquad \times P(X_1=W\vert X_0=W)$    
$\displaystyle =$ $\displaystyle P(X_2=W \vert X_1=D)$    
  $\displaystyle \qquad \times P(X_1=D\vert X_0=W)$    
  $\displaystyle \quad + P(X_2=W\vert X_1=W)$    
  $\displaystyle \qquad \times P(X_1=W\vert X_0=W)$    
$\displaystyle =$ $\displaystyle P_{W,D} P_{D,W} +P_{W,W} P_{W,W}$    
$\displaystyle =$ $\displaystyle \left(\frac{1}{5}\right)\left(\frac{2}{5}\right) + \left(\frac{4}{5}\right)\left(\frac{4}{5}\right) = \frac{18}{25}$    

Notice that all the entries in the last line are items in $ {\bf P}$. Look at the matrix product $ {\bf P}{\bf P}$:

$\displaystyle \left[\begin{array}{ll} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
\left[\begin{array}{ll} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
= \left[\begin{array}{ll} \frac{11}{25} & \frac{14}{25}
\\
\frac{7}{25} & \frac{18}{25}
\end{array} \right]
$

Notice that $ P(X_2=W\vert X_0=W)$ is exactly the $ W,W$ entry in $ {\bf P}{\bf P}$.
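This observation is easy to verify numerically; a minimal sketch in Python (the notes use Maple, but the check is the same), with states ordered (D, W):

```python
# Weather chain transition matrix, rows/columns ordered (D, W).
P = [[3/5, 2/5],
     [1/5, 4/5]]

def matmul(A, B):
    """Plain n x n matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)
print(P2[1][1])  # (W, W) entry: 18/25 = 0.72, up to float rounding
```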

General case. Define

$\displaystyle P^{(n)}_{i,j} =P(X_n=j\vert X_0=i)
$

Then

$\displaystyle P(X_{m+n}=j$ $\displaystyle \vert X_m=i, X_{m-1}=i_{m-1},\ldots)$    
  $\displaystyle = P(X_{m+n}=j\vert X_m=i)$    
  $\displaystyle =P(X_n=j\vert X_0=i)$    
  $\displaystyle = P^{(n)}_{i,j}$    
  $\displaystyle = \left({\bf P}^n\right)_{ij}$    

Proof of these assertions by induction on $ m,n$.

Example for $ n=2$. Two bits to do:

First suppose $ U,V,X,Y$ are discrete variables. Assume

$\displaystyle P(Y=y\vert X=x,$ $\displaystyle U=u,V=v)$    
  $\displaystyle = P(Y=y\vert X=x)$    

for any $ x,y,u,v$. Then I claim

$\displaystyle P(Y=y$ $\displaystyle \vert X=x,U=u)$    
  $\displaystyle = P(Y=y\vert X=x)$    

In words, if knowing both $ U$ and $ V$ doesn't change the conditional probability then knowing $ U$ alone doesn't change the conditional probability.

Proof of claim:

$\displaystyle A=\{X=x,U=u\}
$

Then

$\displaystyle P(Y=y$ $\displaystyle \vert X=x,U=u)$    
  $\displaystyle = \frac{P(Y=y,A)}{P(A)}$    
  $\displaystyle = \frac{\sum_v P(Y=y,A,V=v)}{P(A)}$    
  $\displaystyle = \frac{\sum_v P(Y=y\vert A,V=v)P(A,V=v)}{P(A)}$    
  $\displaystyle = \frac{\sum_v P(Y=y\vert X=x)P(A,V=v)}{P(A)}$    
  $\displaystyle = \frac{ P(Y=y\vert X=x)\sum_vP(A,V=v)}{P(A)}$    
  $\displaystyle = \frac{ P(Y=y\vert X=x)P(A)}{P(A)}$    
  $\displaystyle = P(Y=y\vert X=x)$    
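The claim can also be sanity-checked numerically: build a small joint distribution in which $ Y$ depends on $ (X,U,V)$ only through $ X$, so the hypothesis holds by construction, and verify the conclusion. A Python sketch (the weights and the conditional law of $ Y$ are arbitrary choices, not from the notes):

```python
import itertools
import random

random.seed(0)

# Random positive joint weights for (X, U, V), normalized to a pmf.
vals = [0, 1]
p_xuv = {xuv: random.random() for xuv in itertools.product(vals, repeat=3)}
total = sum(p_xuv.values())
p_xuv = {k: w / total for k, w in p_xuv.items()}

# Y depends on (X, U, V) only through X: P(Y=1 | X=x) below.
p_y_given_x = {0: 0.3, 1: 0.8}

def p(y, x, u=None, v=None):
    """Joint probability P(Y=y, X=x[, U=u[, V=v]]), marginalizing
    over any of U, V left unspecified."""
    py = p_y_given_x[x] if y == 1 else 1 - p_y_given_x[x]
    keys = [(x, uu, vv)
            for uu in ([u] if u is not None else vals)
            for vv in ([v] if v is not None else vals)]
    return py * sum(p_xuv[k] for k in keys)

# Claim: P(Y=1 | X=x, U=u) = P(Y=1 | X=x) for all x, u.
for x, u in itertools.product(vals, repeat=2):
    lhs = p(1, x, u) / (p(1, x, u) + p(0, x, u))
    assert abs(lhs - p_y_given_x[x]) < 1e-12
print("claim verified")
```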

Second step: consider

$\displaystyle P(X_{n+2}$ $\displaystyle =k \vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j P(X_{n+2}=k,X_{n+1}=j\vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j P(X_{n+2}=k\vert X_{n+1}=j,X_n=i)$    
  $\displaystyle \qquad \times P(X_{n+1}=j\vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j P(X_{n+2}=k\vert X_{n+1}=j)$    
  $\displaystyle \qquad \times P(X_{n+1}=j\vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j {\bf P}_{i,j}{\bf P}_{j,k}$    

This shows that

$\displaystyle P(X_{n+2}=k\vert X_n=i) = ({\bf P}^2)_{i,k}
$

where $ {\bf P}^2$ means the matrix product $ {\bf P}{\bf P}$. Notice both that the quantity does not depend on $ n$ and that we can compute it by taking a power of $ {\bf P}$.
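A quick simulation check of this identity for the weather chain, coding D as 0 and W as 1 (a sketch; the Monte Carlo estimate should land near $ ({\bf P}^2)_{W,W}=18/25$):

```python
import random

random.seed(2)

# Weather chain: state 0 = D, 1 = W; rows of P are one-step distributions.
P = [[3/5, 2/5],
     [1/5, 4/5]]

def step(state):
    """Draw the next state from row `state` of P."""
    return 0 if random.random() < P[state][0] else 1

def estimate_two_step(i, k, trials=200_000):
    """Monte Carlo estimate of P(X_{n+2} = k | X_n = i)."""
    return sum(step(step(i)) == k for _ in range(trials)) / trials

# Should be close to (P^2)_{W,W} = 18/25 = 0.72.
print(estimate_two_step(1, 1))
```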

More general version:

$\displaystyle P(X_{n+m}=k\vert X_n=j) = ({\bf P}^m)_{j,k}
$

Since $ {\bf P}^n{\bf P}^m={\bf P}^{n+m}$ we get the Chapman-Kolmogorov equations:

\begin{multline*}
P(X_{n+m}=k\vert X_0=i) =
\\
\sum_j P(X_{n+m}=k\vert X_n=j)P(X_n=j\vert X_0=i)
\end{multline*}
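A numerical check of the Chapman-Kolmogorov equations for the weather chain, taking $ n=m=2$ so that $ {\bf P}^2{\bf P}^2$ must agree entrywise with $ {\bf P}^4$ (a sketch in Python):

```python
P = [[3/5, 2/5],
     [1/5, 4/5]]

def matmul(A, B):
    """Plain n x n matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)
P4_direct = matmul(matmul(P2, P), P)   # P^4 built one step at a time
P4_ck = matmul(P2, P2)                 # Chapman-Kolmogorov: P^2 P^2

# (i, k) entry of P2 @ P2 is sum_j P^(2)_{i,j} P^(2)_{j,k}.
assert all(abs(P4_direct[i][k] - P4_ck[i][k]) < 1e-12
           for i in range(2) for k in range(2))
print(P4_ck[0][0])  # matches Maple's p4 entry .3504, up to rounding
```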

Summary: A Markov Chain has stationary $ n$-step transition probabilities, given by the $ n$th power of the one-step transition matrix.

Here is Maple output for the 1,2,4,8 and 16 step transition matrices for our rainfall example:


> p:= matrix(2,2,[[3/5,2/5],[1/5,4/5]]);
                      [3/5    2/5]
                 p := [          ]
                      [1/5    4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):
This computes the powers (evalm understands matrix algebra).

Fact:

\begin{displaymath}
\lim_{n\to\infty} {\bf P}^n = \left[
\begin{array}{cc} \frac{1}{3} & \frac{2}{3}
\\
\frac{1}{3} & \frac{2}{3}
\end{array}\right]
\end{displaymath}


> evalf(evalm(p));
            [.6000000000    .4000000000]
            [                          ]
            [.2000000000    .8000000000]
> evalf(evalm(p2));
            [.4400000000    .5600000000]
            [                          ]
            [.2800000000    .7200000000]
> evalf(evalm(p4));
            [.3504000000    .6496000000]
            [                          ]
            [.3248000000    .6752000000]
> evalf(evalm(p8));
            [.3337702400    .6662297600]
            [                          ]
            [.3331148800    .6668851200]
> evalf(evalm(p16));
            [.3333336197    .6666663803]
            [                          ]
            [.3333331902    .6666668098]
Where did $ 1/3$ and $ 2/3$ come from?

Suppose we toss a coin with $ P(H)=\alpha_D$ and start the chain with Dry if we get heads and Wet if we get tails.

Then

$\displaystyle P(X_0=x) = \begin{cases}\alpha_D & x=\text{Dry}
\\
\alpha_W=1-\alpha_D & x=\text{Wet}
\end{cases}$

and

$\displaystyle P(X_1=x)$ $\displaystyle = \sum_y P(X_1=x\vert X_0=y)P(X_0=y)$    
  $\displaystyle = \sum_y \alpha_y P_{y,x}$    

Notice the last line is the matrix product of the row vector $ \alpha$ with the matrix $ {\bf P}$. A special $ \alpha$: if we put $ \alpha_D=1/3$ and $ \alpha_W=2/3$ then

$\displaystyle \left[ \frac{1}{3} \quad \frac{2}{3} \right]
\left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right] = \left[ \frac{1}{3} \quad \frac{2}{3} \right]
$

In other words if we start off $ P(X_0=D)=1/3$ then $ P(X_1=D)=1/3$ and analogously for $ W$. This means that $ X_0$ and $ X_1$ have the same distribution.
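The fixed-point property of $ \alpha=(1/3,\,2/3)$ can be verified exactly with rational arithmetic; a minimal Python sketch:

```python
from fractions import Fraction as F

# Exact arithmetic, so the fixed-point check is exact, not approximate.
alpha = [F(1, 3), F(2, 3)]
P = [[F(3, 5), F(2, 5)],
     [F(1, 5), F(4, 5)]]

# Row vector times matrix: (alpha P)_x = sum_y alpha_y P_{y,x}.
alpha_P = [sum(alpha[y] * P[y][x] for y in range(2)) for x in range(2)]
print(alpha_P)  # [Fraction(1, 3), Fraction(2, 3)] -- same as alpha
```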

A probability vector $ \alpha$ is called an initial distribution for the chain if

$\displaystyle P(X_0=i) = \alpha_i
$

A Markov Chain is stationary if

$\displaystyle P(X_1=i) = P(X_0=i)
$

for all $ i$; by the Markov property and induction this gives $ P(X_n=i)=P(X_0=i)$ for all $ n$.

An initial distribution is called stationary if the chain is stationary. We find that $ \alpha$ is a stationary initial distribution if and only if

$\displaystyle \alpha {\bf P}=\alpha
$

Suppose $ {\bf P}^n$ converges to some matrix $ {\bf P}^\infty$. Notice that

$\displaystyle \lim_{n\to\infty} {\bf P}^{n-1} = {\bf P}^\infty
$

and

$\displaystyle {\bf P}^\infty$ $\displaystyle = \lim {\bf P}^n$    
  $\displaystyle = \left[\lim {\bf P}^{n-1}\right] {\bf P}$    
  $\displaystyle = {\bf P}^\infty {\bf P}$    

This proves that each row $ \alpha$ of $ {\bf P}^\infty$ satisfies

$\displaystyle \alpha = \alpha {\bf P}
$

Def'n: A row vector $ x$ is a left eigenvector of $ A$ with eigenvalue $ \lambda$ if

$\displaystyle xA=\lambda x
$

So each row of $ {\bf P}^\infty$ is a left eigenvector of $ {\bf P}$ with eigenvalue $ 1$.
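For a two-state chain with $ P_{D,W}=p$ and $ P_{W,D}=q$, solving $ \alpha{\bf P}=\alpha$ with components summing to $ 1$ gives the eigenvalue-$ 1$ left eigenvector in closed form, $ \alpha=(q/(p+q),\; p/(p+q))$. A Python sketch checking this for the weather example (the closed form is an added derivation, not from the notes):

```python
# Two-state chain: P = [[1-p, p], [q, 1-q]], p = P(D->W), q = P(W->D).
# Solving alpha P = alpha with alpha_D + alpha_W = 1 gives
# alpha = (q/(p+q), p/(p+q)): the left eigenvector for eigenvalue 1.
p, q = 2/5, 1/5   # weather example

alpha = (q / (p + q), p / (p + q))
print(alpha)  # close to (1/3, 2/3)

# Verify the fixed-point property alpha P = alpha.
alpha_P = ((1 - p) * alpha[0] + q * alpha[1],
           p * alpha[0] + (1 - q) * alpha[1])
assert all(abs(a - b) < 1e-12 for a, b in zip(alpha, alpha_P))
```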





Richard Lockhart
2002-02-07