STAT 380: Introduction to Stochastic Processes

Course outline:

Today's lecture: summary
Some Basic Examples

Example 1: Three cards: one red on both sides, one black on both sides, one black on one side and red on the other. Shuffle, pick a card at random, and put it down. The side facing up is black. What is the probability that the side facing down is also black?

Solution: To do this carefully, enumerate the sample space, $ \Omega$, of all possible outcomes. There are six sides to the three cards. Label the three red sides 1, 2, 3, with sides 1 and 2 on the all-red card (card #1). Label the three black sides 4, 5, 6, with sides 3 and 4 on opposite sides of the mixed card (card #2). Define some events:

$\displaystyle A_i$ $\displaystyle = \{$side $ i$ shows face up$\displaystyle \}$    
$\displaystyle B$ $\displaystyle = \{$side showing is black$\displaystyle \}$    
$\displaystyle C_j$ $\displaystyle = \{$card $ j$ is chosen$\displaystyle \}$    

One representation $ \Omega=\{1,2,3,4,5,6\}$. Then $ A_i = \{i\}$, $ B=\{4,5,6\}$, $ C_1=\{1,2\}$ and so on.

Modelling: make assumptions about some probabilities; deduce the probabilities of other events. In this example the simplest model is:

All of the $ A_i$ are equally likely.

Apply two rules:

$\displaystyle P(\cup_1^6 A_i) = \sum_1^6 P(A_i)$   and$\displaystyle \quad
P(\Omega) = 1
$

to get, for $ i=1,\ldots,6$,

$\displaystyle P(A_i) = \frac{1}{6}
$

The question was about the down side of the card. We have been told $ B$ has happened. The event that a black side is down is $ D=\{3,5,6\}$. (Of course, knowing $ B$ has happened rules out outcome 3.)

Definition of conditional probability:

$\displaystyle P(D\vert B) = \frac{P(D\cap B)}{P(B)} = \frac{P(\{5,6\})}{P(\{4,5,6\})} =
\frac{2}{3}
$
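As a quick check, here is a short Python sketch (not part of the original notes) that enumerates the six equally likely sides and computes the conditional probability directly:

from fractions import Fraction

# Sides labelled as above: opposite[i] is the side on the back of side i.
# Card 1 (red/red): sides 1, 2; card 2 (mixed): sides 3 (red), 4 (black);
# card 3 (black/black): sides 5, 6.
opposite = {1: 2, 2: 1, 3: 4, 4: 3, 5: 6, 6: 5}
black = {4, 5, 6}

B = [s for s in opposite if s in black]            # side showing is black
D_and_B = [s for s in B if opposite[s] in black]   # down side is black too

print(Fraction(len(D_and_B), len(B)))              # 2/3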

Example 2: Monte Hall, Let's Make a Deal. Monte shows you 3 doors. A prize is hidden behind one. You pick a door. Monte opens a door you didn't pick, shows you no prize, and offers to let you switch to the third door. Do you switch?

Sample space: a typical element is $ (a,b,c)$ where $ a$ is the number of the door with the prize, $ b$ is the number of your first pick and $ c$ is the door Monte opens to show no prize. The 12 possible outcomes are:

(1,1,2) (1,1,3) (1,2,3) (1,3,2)
(2,1,3) (2,2,1) (2,2,3) (2,3,1)
(3,1,2) (3,2,1) (3,3,1) (3,3,2)

Model? Traditionally we define events like

$\displaystyle A_i$ $\displaystyle = \{$Prize behind door $ i$$\displaystyle \}$    
$\displaystyle B_j$ $\displaystyle = \{$You choose door $ j$$\displaystyle \}$    

and assume that each $ A_i$ has chance $ 1/3$. We are assuming we have no prior reason to suppose one door is favoured for hiding the prize. But these and all other probabilities depend on the behaviour of people, so they are open to discussion.

The event $ LS$, that you lose if you switch, is

$\displaystyle (A_1\cap B_1) \cup (A_2\cap B_2) \cup (A_3\cap B_3)
$

The natural modelling assumption, which captures the idea that you have no idea where the prize is hidden, is that each $ A_i$ is independent of each $ B_j$, that is,

$\displaystyle P(A_i \cap B_j) = P(A_i)P(B_j)
$

Usually we would phrase this assumption in terms of two random variables, $ M$, the door with the prize, and $ C$ the door you choose. We are assuming that $ M$ and $ C$ are independent. Then

$\displaystyle P(LS) =$ $\displaystyle P(A_1\cap B_1) + P(A_2\cap B_2)$    
  $\displaystyle \qquad + P(A_3\cap B_3)$    
$\displaystyle =$ $\displaystyle P(A_1)P(B_1) + P(A_2)P(B_2)$    
  $\displaystyle \quad +P(A_3)P(B_3)$    
$\displaystyle =$ $\displaystyle \frac{1}{3}\left\{P(B_1) + P(B_2) +P(B_3)\right\}$    
$\displaystyle =$ $\displaystyle \frac{1}{3}$    

So the event you win by switching has probability 2/3 and you should switch.

Usual phrasing of the problem: you pick door 1, Monte shows you door 3. Should you take door 2? Let $ S$ be the random variable $ S =$ door Monte shows you. The question is:

$\displaystyle P(M=1\vert C=1, S=3)
$

The modelling assumptions so far do not determine this; it depends on Monte's method for choosing the door to show when he has a choice. Two simple cases:

  1. Monte picks at random so

    $\displaystyle P(S=3\vert M=1,C=1) = 1/2
$

  2. Monte chooses the door with the largest possible number:

    $\displaystyle P(S=3\vert M=1,C=1) = 1
$

Use Bayes' rule:

\begin{multline*}
P(M=1\vert C=1, S=3)
\\
= \frac{P(M=1,C=1, S=3)}{P(C=1,S=3)}
\end{multline*}

Numerator is

\begin{multline*}
P(S=3\vert M=1,C=1) P(M=1,C=1)
\\
= P(S=3\vert M=1,C=1)P(C=1)/3
\end{multline*}

Denominator is

\begin{multline*}
P(S=3\vert M=1,C=1) P(M=1,C=1)
\\
+
P(S=3\vert M=2,C=1) P(M=2,C=1)
\\
+P(S=3\vert M=3,C=1) P(M=3,C=1)
\end{multline*}

which simplifies to

\begin{multline*}
P(S=3\vert M=1,C=1)P(M=1)P(C=1)
\\
+1 \cdot P(M=2)P(C=1)
\\
+0 \cdot P(M=3)P(C=1)
\end{multline*}

which in turn is

$\displaystyle \left\{P(S=3\vert M=1,C=1)+1\right\}P(C=1)/3
$

For case 1 we get

$\displaystyle P(M=1\vert C=1, S=3) = \frac{1/2}{1/2 + 1} = \frac{1}{3}
$

while for case 2 we get

$\displaystyle P(M=1\vert C=1, S=3) = \frac{1}{1+1} =\frac{1}{2}
$

Notice that in case 2, if we pick door 1 and Monte shows us door 2, we should definitely switch. Notice also that it would be normal to assume that Monte uses the case 1 algorithm to pick the door to show when he has a choice; otherwise he is giving the contestant information. If the contestant knows Monte is using algorithm 2 then, by switching if door 2 is shown and not switching if door 3 is shown, he wins 2/3 of the time, which is as good as the always-switch strategy.
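A simulation sketch (Python, not from the notes; trial counts are arbitrary) comparing Monte's two algorithms. You always pick door 1; monte_rule breaks ties when Monte has a choice:

import random

def play(monte_rule, trials=100_000):
    switch_wins = shown3 = prize1_and_shown3 = 0
    for _ in range(trials):
        prize = random.randint(1, 3)
        pick = 1
        # doors Monte may open: not your pick, not the prize
        options = [d for d in (1, 2, 3) if d != pick and d != prize]
        shown = monte_rule(options)
        other = ({1, 2, 3} - {pick, shown}).pop()
        switch_wins += (other == prize)
        if shown == 3:
            shown3 += 1
            prize1_and_shown3 += (prize == 1)
    # P(win by switching) and P(M=1 | C=1, S=3)
    print(switch_wins / trials, prize1_and_shown3 / shown3)

play(random.choice)   # case 1: prints roughly 0.667 and 1/3
play(max)             # case 2: prints roughly 0.667 and 1/2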

Example 3: Survival of family names. Traditionally the family name follows sons. Given a man at the end of the 20th century: what is the probability that a (male) descendant with the same last name is alive at the end of the 21st century? Or at the end of the 30th century?

Simplified model: count generations, not years. Compute the probability of survival of the name for $ n$ generations.

It is technically easier to compute $ q_n$, the probability of extinction by generation $ n$.

Useful rvs:

$\displaystyle X=$   # of male children of the first man

$\displaystyle Z_k =$   # of male children in generation $ k$

Event of interest:

$\displaystyle E_n = \{ Z_n=0\}
$

Break up $ E_n$:

$\displaystyle q_n=P(E_n) = \sum_{k=0}^\infty P(E_n\cap \{ X=k\})
$

Now look at the event $ E_n\cap \{ X=k\}$. Let

$\displaystyle B_{j,n-1} = \{ X=k\}\cap \{$child $ j$'s line is extinct in $ n-1$ generations$\}$

Then

$\displaystyle E_n\cap \{ X=k\} =\cap_{j=1}^k B_{j,n-1}
$

Now add modelling assumptions:
  1. Given (conditional on) $ X=k$, the events $ B_{j,n-1}$ are independent. In other words: one son's descendants don't affect the other sons' descendants.

  2. Given $ X=k$ the probability of $ B_{j,n-1}$ is $ q_{n-1}$. In other words: sons are just like the parent.

Now add notation $ P(X=k) = p_k$.

$\displaystyle q_n$ $\displaystyle = \sum_{k=0}^\infty P(E_n\cap \{ X=k\})$    
  $\displaystyle = \sum_{k=0}^\infty P( \cap_{j=1}^k B_{j,n-1}\vert X=k) p_k$    
  $\displaystyle = \sum_{k=0}^\infty \prod_1^k P(B_{j,n-1}\vert X=k) p_k$    
  $\displaystyle = \sum_{k=0}^\infty (q_{n-1})^k p_k$    

Probability generating function:

$\displaystyle \phi(s) = \sum_{k=0}^\infty s^k p_k = {\rm E}(s^X)
$

We have found

$\displaystyle q_1 = p_0
$

and

$\displaystyle q_n = \phi(q_{n-1})
$

Notice that $ q_1 \le q_2 \le \cdots$ (extinction by generation $ n$ implies extinction by generation $ n+1$, so $ E_n \subset E_{n+1}$), so that the limit of the $ q_n$, say $ q_\infty$, must exist and (because $ \phi$ is continuous) solve

$\displaystyle q_\infty = \phi(q_\infty)
$
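The recursion $ q_n = \phi(q_{n-1})$ is easy to iterate numerically. A Python sketch (mine, not the notes'; the Poisson offspring law is just an illustration, and its pgf appears in the special cases below):

import math

def extinction_probs(phi, n_max):
    q = [phi(0.0)]            # q_1 = p_0 = phi(0)
    for _ in range(n_max - 1):
        q.append(phi(q[-1]))  # q_n = phi(q_{n-1})
    return q

lam = 1.5                     # Poisson(lambda) offspring: phi(s) = e^{lambda(s-1)}
phi = lambda s: math.exp(lam * (s - 1))
qs = extinction_probs(phi, 50)
print(qs[0], qs[-1])          # q_1 = e^{-1.5} ~ 0.223, q_inf ~ 0.417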

Special cases

Geometric Distribution: Assume

$\displaystyle P(X=k) = (1-\theta)^k \theta \qquad k=0,1,2,\ldots
$

($ X$ is number of failures before first success. Trials are Bernoulli; $ \theta$ is probability of success.)

Then

$\displaystyle \phi(s)$ $\displaystyle = \sum_0^\infty s^k (1-\theta)^k \theta$    
  $\displaystyle = \theta \sum_0^\infty \left[s(1-\theta)\right]^k$    
  $\displaystyle = \frac{\theta}{1-s(1-\theta)}$    

Set $ \phi(s) = s$ to get

$\displaystyle s[1-s(1-\theta)]=\theta
$

Two roots are

$\displaystyle \frac{1 \pm \sqrt{1-4\theta(1-\theta)}}{2(1-\theta)} =
\frac{1 \pm (1-2\theta)}{2(1-\theta)}
$

One of the roots is 1; the other is

$\displaystyle \frac{\theta}{1-\theta}
$

If $ \theta \ge 1/2$ the only root which is a probability is 1 and $ q_\infty=1$. If $ \theta < 1/2$ then in fact $ q_n \to q_\infty = \theta/(1-\theta)$.

Binomial($ m,\theta$): If

$\displaystyle P(X=k) = \binom{m}{k} \theta^k(1-\theta)^{m-k} \quad k=0,\ldots, m
$

then

$\displaystyle \phi(s)$ $\displaystyle = \sum_0^m \binom{m}{k} (s\theta)^k(1-\theta)^{m-k}$    
  $\displaystyle = (1-\theta+s\theta)^m$    

The equation $ \phi(s) = s$ has two roots. One is 1. The other is less than 1 if and only if $ m\theta={\rm E}(X) > 1$.

Poisson($ \lambda$): Now

$\displaystyle P(X=k) = e^{-\lambda} \lambda^k/k! \quad k=0,1,\ldots
$

and

$\displaystyle \phi(s) = e^{\lambda(s-1)}
$

The equation $ \phi(s) = s$ has two roots. One is 1. The other is less than 1 if and only if $ \lambda = {\rm E}(X) > 1$.


Example 3: Mean values

$ Z_n$ = total number of sons in generation $ n$.

$ Z_0=1$ for convenience.

Compute $ {\rm E}(Z_n)$.

Recall definition of expected value:

If $ X$ is discrete then

$\displaystyle {\rm E}(X) = \sum_x x P(X=x)
$

If $ X$ is absolutely continuous then

$\displaystyle {\rm E}(X) = \int_{-\infty}^\infty x f(x) dx
$

Theorem: If $ Y=g(X)$, $ X$ has density $ f$ then

$\displaystyle {\rm E}(Y) = {\rm E}(g(X)) =\int g(x) f(x) dx
$

Key properties of $ {\rm E}$:

1: If $ X\ge 0$ then $ {\rm E}(X) \ge 0$, with equality iff $ P(X=0)=1$.

2: $ {\rm E}(aX+bY) = a{\rm E}(X) +b{\rm E}(Y)$.

3: If $ 0 \le X_1 \le X_2 \le \cdots$ then

$\displaystyle {\rm E}(\lim X_n) = \lim {\rm E}(X_n)
$

4: $ {\rm E}(1) = 1$.

Conditional Expectations

If $ X$, $ Y$, two discrete random variables then

$\displaystyle {\rm E}(Y\vert X=x) = \sum_y y P(Y=y\vert X=x)
$

Extension to absolutely continuous case:

Joint pmf of $ X$ and $ Y$ is defined as

$\displaystyle p(x,y) = P(X=x,Y=y)
$

Notice: The pmf of $ X$ is

$\displaystyle p_X(x) = \sum_y p(x,y)
$

Analogue for densities: joint density of $ X,Y$ is

$\displaystyle f(x,y) dx dy \approx P(x \le X \le x+dx, y \le Y \le y+dy)
$

Interpretation is that

$\displaystyle P(X \in A, Y \in B) = \int_A \int_B f(x,y) dy dx
$

Property: if $ X,Y$ have joint density $ f(x,y)$ then $ X$ has density

$\displaystyle f_X(x) = \int_y f(x,y) dy
$

Sums for discrete rvs are replaced by integrals.

Example:

$\displaystyle f(x,y) = \begin{cases}x+y & 0 \le x,y \le 1
\\
0 & \text{otherwise}
\end{cases}$

is a density because

$\displaystyle \iint f(x,y)dx dy$ $\displaystyle = \int_0^1\int_0^1 (x+y) dy dx$    
  $\displaystyle = \int_0^1 x dx + \int_0^1 y dy$    
  $\displaystyle = \frac{1}{2} + \frac{1}{2} = 1$    

The marginal density of $ X$ is, for $ 0 \le x \le 1$,

$\displaystyle f_X(x)$ $\displaystyle = \int_{-\infty}^\infty f(x,y)dy$    
  $\displaystyle = \int_0^1 (x+y) dy$    
  $\displaystyle = \left.(xy+y^2/2)\right\vert _0^1 = x+\frac{1}{2}$    

For $ x$ not in $ [0,1]$ the integral is 0 so

$\displaystyle f_X(x) = \begin{cases}x+\frac{1}{2} & 0 \le x \le 1
\\
0 & \text{otherwise}
\end{cases}$

Conditional Densities

If $ X$ and $ Y$ have joint density $ f_{X,Y}(x,y)$ then we define the conditional density of $ Y$ given $ X=x$ by analogy with our interpretation of densities. We take limits:

\begin{multline*}
f_{Y\vert X}(y\vert x)dy \\
\approx \frac{ P(x \le X \le x+dx, y \le Y \le y+dy)}{P(x
\le X \le x+dx)}
\end{multline*}

in the sense that if we divide through by $ dy$ and let $ dx$ and $ dy$ tend to 0 the conditional density is the limit

$\displaystyle \frac{\lim_{dx, dy \to 0} \frac{ P(x \le X \le
x+dx, y \le Y \le y+dy)}{(dx\,dy)}}{
\lim_{dx\to 0} \frac{P(x \le X \le x+dx)}{dx}}
$

Going back to our interpretation of joint densities and ordinary densities we see that our definition is just

$\displaystyle f_{Y\vert X}(y\vert x) = \frac{f_{X,Y}(x,y)}{f_X(x)}
$

When talking about a pair $ X$ and $ Y$ of random variables we refer to $ f_{X,Y}$ as the joint density and to $ f_X$ as the marginal density of $ X$.

Example: For the $ f$ of the previous example the conditional density of $ Y$ given $ X=x$ is defined only for $ 0 \le x \le 1$:

$\displaystyle f_{Y\vert X}(y\vert x) = \begin{cases}
\frac{x+y}{x+\frac{1}{2}} & 0 \le x \le 1, 0 \le y \le 1
\\
0 & 0 \le x \le 1, y > 1
\\
0 & 0 \le x \le 1, y < 0
\\
\text{undefined} & \text{otherwise}
\end{cases}$

Example: $ X$ is a Poisson$ (\lambda)$ random variable. Observe $ X$, then toss a coin $ X$ times. $ Y$ is the number of heads; $ P(H) = p$.

$\displaystyle f_Y(y)$ $\displaystyle = \sum_x f_{X,Y}(x,y)$    
  $\displaystyle = \sum_x f_{Y\vert X}(y\vert x) f_X(x)$    
  $\displaystyle = \sum _{x=0}^\infty \dbinom{x}{y} p^y(1-p)^{x-y} \times \frac{\lambda^x}{x!} e^{-\lambda}$    

WARNING: the sum requires $ 0 \le y \le x$ with $ x$, $ y$ integers, so it really runs from $ x=y$ to $ \infty$:

$\displaystyle f_Y(y)$ $\displaystyle = \frac{(p\lambda)^ye^{-\lambda}}{y!} \sum_{x=y}^\infty \frac{\left[(1-p)\lambda\right]^{x-y}}{(x-y)!}$    
  $\displaystyle = \frac{(p\lambda)^ye^{-\lambda}}{y!}\sum_0^\infty \frac{\left[(1-p)\lambda\right]^{k}}{k!}$    
  $\displaystyle = \frac{(p\lambda)^ye^{-\lambda}}{y!}e^{(1-p)\lambda}$    
  $\displaystyle = e^{-p\lambda} (p\lambda)^y/y!$    

which is a Poisson($ p\lambda$) distribution.
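A simulation sketch of this thinning computation (Python, not from the notes; the parameter values are arbitrary): sample $ X$ from Poisson$ (\lambda)$, toss $ X$ coins, and check that $ Y$ has the Poisson$ (p\lambda)$ mean and variance.

import math, random

def poisson(lam):
    # Multiplicative (Knuth) sampler for a Poisson(lam) variate.
    L, k, prod = math.exp(-lam), 0, random.random()
    while prod > L:
        k += 1
        prod *= random.random()
    return k

lam, p, trials = 2.0, 0.3, 200_000
ys = []
for _ in range(trials):
    x = poisson(lam)
    y = sum(random.random() < p for _ in range(x))   # heads in x tosses
    ys.append(y)

mean = sum(ys) / trials
var = sum((y - mean) ** 2 for y in ys) / trials
print(mean, var)   # both should be near p*lam = 0.6, as for Poisson(p*lam)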

Conditional Expectations

If $ X$ and $ Y$ are continuous random variables with joint density $ f_{X,Y}$ we define:

$\displaystyle E(Y\vert X=x) = \int y f_{Y\vert X}(y\vert x) dy
$

Key properties of conditional expectation

1: If $ Y\ge 0$ then $ {\rm E}(Y\vert X=x) \ge 0$, with equality iff $ P(Y=0\vert X=x)=1$.

2: $ {\rm E}(A(X)Y+B(X)Z\vert X=x) = A(x){\rm E}(Y\vert X=x) +B(x){\rm E}(Z\vert X=x)$.

3: If $ Y$ and $ X$ are independent then

$\displaystyle {\rm E}(Y\vert X=x) = {\rm E}(Y)
$

4: $ {\rm E}(1\vert X=x) = 1$.

Example:

$\displaystyle f(x,y) = \begin{cases}x+y & 0 \le x,y \le 1
\\
0 & \text{otherwise}
\end{cases}$

has conditional density of $ Y$ given $ X$:

$\displaystyle f_{Y\vert X}(y\vert x) = \begin{cases}
\frac{x+y}{x+\frac{1}{2}} & 0 \le x \le 1, 0 \le y \le 1
\\
0 & 0 \le x \le 1, y > 1
\\
0 & 0 \le x \le 1, y < 0
\\
\text{undefined} & \text{otherwise}
\end{cases}$

so, for $ 0 \le x \le 1$,

$\displaystyle {\rm E}(Y\vert X=x)$ $\displaystyle = \int_0^1 y \frac{x+y}{x+\frac{1}{2}} dy$    
  $\displaystyle = \frac{x/2 +1/3}{x+1/2}$    

Computing expectations by conditioning:

Notation: $ {\rm E}(Y\vert X)$ is the function of $ X$ you get by working out $ {\rm E}(Y\vert X=x)$, getting a formula in $ x$ and replacing $ x$ by $ X$. This makes $ {\rm E}(Y\vert X)$ a random variable.

Properties:

1: $ {\rm E}(A(X)Y+B(X)Z\vert X) = A(X){\rm E}(Y\vert X) +B(X){\rm E}(Z\vert X)$.

2: If $ Y$ and $ X$ are independent then

$\displaystyle {\rm E}(Y\vert X) = {\rm E}(Y)
$

3: $ {\rm E}(1\vert X) = 1$.

4: $ {\rm E}\left[{\rm E}(Y\vert X)\right] = {\rm E}(Y) $ (compute average holding $ X$ fixed first, then average over $ X$).

In example:

$\displaystyle {\rm E}(Y\vert X) = \frac{X+2/3}{2X+1}
$

Application to the last names problem: put $ m={\rm E}(X)$. Given $ X$, each son's line is an independent copy of the whole process, one generation behind, so $ {\rm E}(Z_n\vert X) = X\,{\rm E}(Z_{n-1})$. Then

$\displaystyle {\rm E}(Z_n)$ $\displaystyle = {\rm E}\left[{\rm E}(Z_n\vert X)\right]$    
  $\displaystyle = {\rm E}\left[ X{\rm E}(Z_{n-1})\right]$    
  $\displaystyle = {\rm E}(X){\rm E}(Z_{n-1})$    
  $\displaystyle = m {\rm E}(Z_{n-1})$    
  $\displaystyle = m^2 {\rm E}(Z_{n-2})$    
  $\displaystyle \quad \vdots$    
  $\displaystyle = m^{n-1}{\rm E}(Z_1)$    
  $\displaystyle = m^n$    

For $ m < 1$ we expect exponential decay. For $ m>1$, exponential growth (if the name does not go extinct).
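A simulation sketch of $ {\rm E}(Z_n)=m^n$ (Python, not from the notes; the Binomial offspring law is an arbitrary illustration with $ m=1.2$):

import random

def mean_Z(n, offspring, trials=20_000):
    # Average generation-n size for the branching process with Z_0 = 1.
    total = 0
    for _ in range(trials):
        z = 1
        for _ in range(n):
            z = sum(offspring() for _ in range(z))
            if z == 0:
                break
        total += z
    return total / trials

offspring = lambda: sum(random.random() < 0.6 for _ in range(2))  # Binomial(2, 0.6)
for n in (1, 2, 5):
    print(n, mean_Z(n, offspring), 1.2 ** n)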

Summary of Probability Review

We have reviewed the following definitions: sample space, event, conditional probability, joint, marginal and conditional densities, expected value and conditional expectation, and the probability generating function.

Tactics for expected values: condition on a useful random variable and use $ {\rm E}(Y) = {\rm E}\left[{\rm E}(Y\vert X)\right]$.

Markov Chains

The last names example has the following structure: if at generation $ n$ there are $ m$ individuals, then the number of sons in the next generation has the distribution of a sum of $ m$ independent copies of the random variable $ X$, the number of sons of a single man. This distribution does not depend on $ n$, only on the value $ m$ of $ Z_n$. We call $ Z_n$ a Markov Chain.

Ingredients of a Markov Chain: a state space, an initial distribution, and a matrix $ {\bf P}=(P_{i,j})$ of transition probabilities.

The stochastic process $ X_0,X_1,\ldots$ is called a Markov chain if

$\displaystyle P\left(X_{k+1} =j\vert X_k=i,A\right)
=
P_{i,j}
$

for any $ A$ defined in terms of $ X_0,\ldots,X_{k-1}$ and for all $ i,j,k$. Usually used with

$\displaystyle A=\{X_{k-1}=i_{k-1},\ldots,X_0=i_0\}
$

for some $ i_0,\ldots,i_{k-1}$.

The matrix $ {\bf P}$ is called a transition matrix.

Example: If $ X$ in the last names example has a Poisson$ (\lambda)$ distribution then, given $ Z_n=k$, $ Z_{n+1}$ is a sum of $ k$ independent Poisson$ (\lambda)$ rvs, which has a Poisson($ k\lambda$) distribution; that is, $ {\bf P}_{k,j} = e^{-k\lambda}(k\lambda)^j/j!$. So

$\displaystyle {\bf P}= \left[\begin{array}{llll}
1 & 0 & 0 & \cdots
\\
e^{-\lambda} & \lambda e^{-\lambda} & \frac{\lambda^2}{2!}e^{-\lambda} &
\cdots
\\
e^{-2\lambda} & 2\lambda e^{-2\lambda} & \frac{(2\lambda)^2}{2!}e^{-2\lambda} &
\cdots
\\
\vdots &
\vdots &
\vdots &
\ddots
\end{array}\right]
$

Example: Weather: each day is dry (D) or wet (W).

$ X_n$ is weather on day n.

Suppose dry days tend to be followed by dry days, say 3 times in 5, and wet days by wet days, say 4 times in 5.

Markov assumption: yesterday's weather is irrelevant to the prediction of tomorrow's, given today's.

Transition Matrix:

$\displaystyle {\bf P}= \left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
$

Now suppose it's wet today. P(wet in 2 days)?

$\displaystyle P(X_2=W\vert X_0=W)
$

$\displaystyle =$ $\displaystyle P(X_2=W,X_1=D \vert X_0=W)$    
  $\displaystyle \quad + P(X_2=W,X_1=W \vert X_0=W)$    
$\displaystyle =$ $\displaystyle P(X_2=W\vert X_1=D, X_0=W)$    
  $\displaystyle \qquad \times P(X_1=D\vert X_0=W)$    
  $\displaystyle \quad + P(X_2=W\vert X_1=W ,X_0=W)$    
  $\displaystyle \qquad \times P(X_1=W\vert X_0=W)$    
$\displaystyle =$ $\displaystyle P(X_2=W \vert X_1=D)$    
  $\displaystyle \qquad \times P(X_1=D\vert X_0=W)$    
  $\displaystyle \quad + P(X_2=W\vert X_1=W)$    
  $\displaystyle \qquad \times P(X_1=W\vert X_0=W)$    
$\displaystyle =$ $\displaystyle P_{W,D} P_{D,W} +P_{W,W} P_{W,W}$    
$\displaystyle =$ $\displaystyle \left(\frac{1}{5}\right)\left(\frac{2}{5}\right) + \left(\frac{4}{5}\right)\left(\frac{4}{5}\right)$    

Notice that all the entries in the last line are items in $ {\bf P}$. Look at the matrix product $ {\bf P}{\bf P}$:

$\displaystyle \left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
\left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
=
\left[\begin{array}{cc} \frac{11}{25} & \frac{14}{25}
\\
\frac{7}{25} & \frac{18}{25}
\end{array} \right]
$

Notice that $ P(X_2=W\vert X_0=W)$ is exactly the $ W,W$ entry in $ {\bf P}{\bf P}$.

General case. Define

$\displaystyle P^{(n)}_{i,j} =P(X_n=j\vert X_0=i)
$

Then

$\displaystyle P(X_{m+n}=j$ $\displaystyle \vert X_m=i, X_{m-1}=i_{m-1},\ldots)$    
  $\displaystyle = P(X_{m+n}=j\vert X_m=i)$    
  $\displaystyle =P(X_n=j\vert X_0=i)$    
  $\displaystyle = P^{(n)}_{i,j}$    
  $\displaystyle = \left({\bf P}^n\right)_{i,j}.$    

Proof of these assertions by induction on $ m,n$.

Example for $ n=2$. Two bits to do:

First suppose $ U,V,X,Y$ are discrete variables. Assume

$\displaystyle P(Y=y\vert X=x,$ $\displaystyle U=u,V=v)$    
  $\displaystyle = P(Y=y\vert X=x)$    

for any $ x,y,u,v$. Then I claim

$\displaystyle P(Y=y$ $\displaystyle \vert X=x,U=u)$    
  $\displaystyle = P(Y=y\vert X=x)$    

In words, if knowing both $ U$ and $ V$ doesn't change the conditional probability then knowing $ U$ alone doesn't change the conditional probability.

Proof of claim:

$\displaystyle A=\{X=x,U=u\}
$

Then

$\displaystyle P(Y=y$ $\displaystyle \vert X=x,U=u)$    
  $\displaystyle = \frac{P(Y=y,A)}{P(A)}$    
  $\displaystyle = \frac{\sum_v P(Y=y,A,V=v)}{P(A)}$    
  $\displaystyle = \frac{\sum_v P(Y=y\vert A,V=v)P(A,V=v)}{P(A)}$    
  $\displaystyle = \frac{\sum_v P(Y=y\vert X=x)P(A,V=v)}{P(A)}$    
  $\displaystyle = \frac{ P(Y=y\vert X=x)\sum_vP(A,V=v)}{P(A)}$    
  $\displaystyle = \frac{ P(Y=y\vert X=x)P(A)}{P(A)}$    
  $\displaystyle = P(Y=y\vert X=x)$    

Second step: consider

$\displaystyle P(X_{n+2}$ $\displaystyle =k \vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j P(X_{n+2}=k,X_{n+1}=j\vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j P(X_{n+2}=k\vert X_{n+1}=j,X_n=i)$    
  $\displaystyle \qquad \times P(X_{n+1}=j\vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j P(X_{n+2}=k\vert X_{n+1}=j)$    
  $\displaystyle \qquad \times P(X_{n+1}=j\vert X_n=i)$    
$\displaystyle =$ $\displaystyle \sum_j {\bf P}_{i,j}{\bf P}_{j,k}$    

This shows that

$\displaystyle P(X_{n+2}=k\vert X_n=i) = ({\bf P}^2)_{i,k}
$

where $ {\bf P}^2$ means the matrix product $ {\bf P}{\bf P}$. Notice both that the quantity does not depend on $ n$ and that we can compute it by taking a power of $ {\bf P}$.

More general version

$\displaystyle P(X_{n+m}=k\vert X_n=j) = ({\bf P}^m)_{j,k}
$

Since $ {\bf P}^n{\bf P}^m={\bf P}^{n+m}$ we get the Chapman-Kolmogorov equations:

\begin{multline*}
P(X_{n+m}=k\vert X_0=i) =
\\
\sum_j P(X_{n+m}=k\vert X_n=j)P(X_n=j\vert X_0=i)
\end{multline*}

Summary: A Markov Chain has stationary $ n$ step transition probabilities, given by the $ n$th power of the 1 step transition matrix.

Here is Maple output for the 1,2,4,8 and 16 step transition matrices for our rainfall example:


> p:= matrix(2,2,[[3/5,2/5],[1/5,4/5]]);
                      [3/5    2/5]
                 p := [          ]
                      [1/5    4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):
This computes the powers (evalm understands matrix algebra).

Fact:

\begin{displaymath}
\lim_{n\to\infty} {\bf P}^n = \left[
\begin{array}{cc} \frac{1}{3} & \frac{2}{3}
\\
\\
\frac{1}{3} & \frac{2}{3}
\end{array}\right]
\end{displaymath}


> evalf(evalm(p));
            [.6000000000    .4000000000]
            [                          ]
            [.2000000000    .8000000000]
> evalf(evalm(p2));
            [.4400000000    .5600000000]
            [                          ]
            [.2800000000    .7200000000]
> evalf(evalm(p4));
            [.3504000000    .6496000000]
            [                          ]
            [.3248000000    .6752000000]
> evalf(evalm(p8));
            [.3337702400    .6662297600]
            [                          ]
            [.3331148800    .6668851200]
> evalf(evalm(p16));
            [.3333336197    .6666663803]
            [                          ]
            [.3333331902    .6666668098]
Where did $ 1/3$ and $ 2/3$ come from?

Suppose we toss a coin $ P(H)=\alpha_D$ and start the chain with Dry if we get heads and Wet if we get tails.

Then

$\displaystyle P(X_0=x) = \begin{cases}\alpha_D & x=\text{Dry}
\\
\alpha_W=1-\alpha_D & x=\text{Wet}
\end{cases}$

and

$\displaystyle P(X_1=x)$ $\displaystyle = \sum_y P(X_1=x\vert X_0=y)P(X_0=y)$    
  $\displaystyle = \sum_y \alpha_y P_{y,x}$    

Notice the last line is a matrix multiplication: the row vector $ \alpha$ times the matrix $ {\bf P}$. A special $ \alpha$: if we put $ \alpha_D=1/3$ and $ \alpha_W=2/3$ then

$\displaystyle \left[ \frac{1}{3} \quad \frac{2}{3} \right]
\left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right] = \left[ \frac{1}{3} \quad \frac{2}{3} \right]
$

In other words, if we start off with $ P(X_0=D)=1/3$ then $ P(X_1=D)=1/3$, and analogously for $ W$. This means that $ X_0$ and $ X_1$ have the same distribution.

A probability vector $ \alpha$ is called an initial distribution for the chain if

$\displaystyle P(X_0=i) = \alpha_i
$

A Markov Chain is stationary if

$\displaystyle P(X_1=i) = P(X_0=i)
$

for all $ i$

An initial distribution is called stationary if the chain is stationary. We find that $ \alpha$ is a stationary initial distribution if

$\displaystyle \alpha {\bf P}=\alpha
$

Suppose $ {\bf P}^n$ converges to some matrix $ {\bf P}^\infty$. Notice that

$\displaystyle \lim_{n\to\infty} {\bf P}^{n-1} = {\bf P}^\infty
$

and

$\displaystyle {\bf P}^\infty$ $\displaystyle = \lim {\bf P}^n$    
  $\displaystyle = \left[\lim {\bf P}^{n-1}\right] {\bf P}$    
  $\displaystyle = {\bf P}^\infty {\bf P}$    

This proves that each row $ \alpha$ of $ {\bf P}^\infty$ satisfies

$\displaystyle \alpha = \alpha {\bf P}
$

Def'n: A row vector $ x$ is a left eigenvector of $ A$ with eigenvalue $ \lambda$ if

$\displaystyle xA=\lambda x
$

So each row of $ {\bf P}^\infty$ is a left eigenvector of $ {\bf P}$ with eigenvalue $ 1$.

Finding stationary initial distributions. Consider the $ {\bf P}$ for the weather example. The equation

$\displaystyle \alpha {\bf P}=\alpha
$

is really

$\displaystyle \alpha_D$ $\displaystyle = 3\alpha_D/5 + \alpha_W/5$    
$\displaystyle \alpha_W$ $\displaystyle = 2\alpha_D/5 + 4\alpha_W/5$    

The first can be rearranged to

$\displaystyle \alpha_W =2\alpha_D
$

and so can the second. If $ \alpha$ is to be a probability vector then

$\displaystyle \alpha_W+\alpha_D=1
$

so we get

$\displaystyle 1-\alpha_D=2\alpha_D
$

leading to

$\displaystyle \alpha_D=1/3
$
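The same computation can be done numerically as a left-eigenvector problem; here is a NumPy sketch (mine, not part of the Maple worksheets):

import numpy as np

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])

# Left eigenvectors of P are eigenvectors of P transpose.
vals, vecs = np.linalg.eig(P.T)
alpha = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
alpha = alpha / alpha.sum()              # normalize to a probability vector
print(alpha)                             # [1/3, 2/3]
print(np.linalg.matrix_power(P, 16))     # rows approach alpha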

Some more examples:

$\displaystyle {\bf P}= \left[\begin{array}{cccc}
0 & 1/3 & 0 & 2/3
\\
1/3&0 & 2/3 & 0
\\
0 & 2/3 & 0 & 1/3
\\
2/3 & 0 & 1/3 & 0
\end{array}\right]
$

Set $ \alpha{\bf P}=\alpha$ and get

$\displaystyle \alpha_1$ $\displaystyle = \alpha_2/3+2\alpha_4/3$    
$\displaystyle \alpha_2$ $\displaystyle = \alpha_1/3+2\alpha_3/3$    
$\displaystyle \alpha_3$ $\displaystyle = 2\alpha_2/3+\alpha_4/3$    
$\displaystyle \alpha_4$ $\displaystyle = 2\alpha_1/3+\alpha_3/3$    
$\displaystyle 1$ $\displaystyle = \alpha_1+\alpha_2+\alpha_3+\alpha_4$    

First plus third gives

$\displaystyle \alpha_1+\alpha_3=\alpha_2+\alpha_4$

so both sums equal 1/2. Continuing the algebra gives

$\displaystyle (1/4,1/4,1/4,1/4) \, .
$


p:=matrix([[0,1/3,0,2/3],[1/3,0,2/3,0],
          [0,2/3,0,1/3],[2/3,0,1/3,0]]);

               [ 0     1/3     0     2/3]
               [                        ]
               [1/3     0     2/3     0 ]
          p := [                        ]
               [ 0     2/3     0     1/3]
               [                        ]
               [2/3     0     1/3     0 ]
> p2:=evalm(p*p);
             [5/9     0     4/9     0 ]
             [                        ]
             [ 0     5/9     0     4/9]
        p2:= [                        ]
             [4/9     0     5/9     0 ]
             [                        ]
             [ 0     4/9     0     5/9]
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):
> p17:=evalm(p8*p8*p):



> evalf(evalm(p16));
    [.5000000116 , 0 , .4999999884 , 0]
    [                                 ]
    [0 , .5000000116 , 0 , .4999999884]
    [                                 ]
    [.4999999884 , 0 , .5000000116 , 0]
    [                                 ]
    [0 , .4999999884 , 0 , .5000000116]
> evalf(evalm(p17));
    [0 , .4999999961 , 0 , .5000000039]
    [                                 ]
    [.4999999961 , 0 , .5000000039 , 0]
    [                                 ]
    [0 , .5000000039 , 0 , .4999999961]
    [                                 ]
    [.5000000039 , 0 , .4999999961 , 0]
> evalf(evalm((p16+p17)/2));
  [.2500, .2500, .2500, .2500]
  [                          ]
  [.2500, .2500, .2500, .2500]
  [                          ]
  [.2500, .2500, .2500, .2500]
  [                          ]
  [.2500, .2500, .2500, .2500]
$ {\bf P}^n$ doesn't converge but $ ({\bf P}^n+{\bf P}^{n+1})/2$ does. Next example:

$\displaystyle {\bf P} = \left[\begin{array}{cccc}
\frac{2}{5} & \frac{3}{5} & 0 & 0
\\
\frac{1}{5} & \frac{4}{5} & 0 & 0
\\
0 & 0 & \frac{2}{5} & \frac{3}{5}
\\
0 & 0 &
\frac{1}{5} & \frac{4}{5}\end{array}\right]
$

Solve $ \alpha{\bf P}=\alpha$:

$\displaystyle \alpha_1$ $\displaystyle = \frac{2}{5}\alpha_1+ \frac{1}{5} \alpha_2$    
$\displaystyle \alpha_2$ $\displaystyle = \frac{3}{5}\alpha_1 + \frac{4}{5}\alpha_2$    
$\displaystyle \alpha_3$ $\displaystyle = \frac{2}{5}\alpha_3+ \frac{1}{5} \alpha_4$    
$\displaystyle \alpha_4$ $\displaystyle = \frac{3}{5}\alpha_3 + \frac{4}{5}\alpha_4$    
$\displaystyle 1$ $\displaystyle = \alpha_1+\alpha_2+\alpha_3+\alpha_4$    

The second and fourth equations are redundant. We get

$\displaystyle \alpha_2$ $\displaystyle =3\alpha_1$    
$\displaystyle 3\alpha_3$ $\displaystyle =\alpha_4$    
$\displaystyle 1$ $\displaystyle = 4\alpha_1+4\alpha_3$    

Pick $ \alpha_1$ in $ [0,1/4]$; put $ \alpha_3=1/4-\alpha_1$.

$\displaystyle \alpha = (\alpha_1,3\alpha_1,1/4-\alpha_1,3(1/4-\alpha_1))
$

solves $ \alpha{\bf P}=\alpha$. So solution is not unique.


> p:=matrix([[2/5,3/5,0,0],[1/5,4/5,0,0],
            [0,0,2/5,3/5],[0,0,1/5,4/5]]);

               [2/5    3/5     0      0 ]
               [                        ]
               [1/5    4/5     0      0 ]
          p := [                        ]
               [ 0      0     2/5    3/5]
               [                        ]
               [ 0      0     1/5    4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):

> evalf(evalm(p8*p8));
        [.2500000000 , .7500000000 , 0 , 0]
        [                                 ]
        [.2500000000 , .7500000000 , 0 , 0]
        [                                 ]
        [0 , 0 , .2500000000 , .7500000000]
        [                                 ]
        [0 , 0 , .2500000000 , .7500000000]
Notice that rows converge but to two different vectors:

$\displaystyle \alpha^{(1)} = (1/4,3/4,0,0)
$

and

$\displaystyle \alpha^{(2)} = (0,0,1/4,3/4)
$

Solutions of $ \alpha{\bf P}=\alpha$ revisited: check that

$\displaystyle \alpha^{(1)}{\bf P}=\alpha^{(1)}
$

and

$\displaystyle \alpha^{(2)}{\bf P}=\alpha^{(2)}
$

If $ \alpha=\lambda\alpha^{(1)}+(1-\lambda)\alpha^{(2)}$ ( $ 0 \le \lambda \le 1$) then

$\displaystyle \alpha {\bf P}=\alpha
$

so again solution is not unique. Last example:

> p:=matrix([[2/5,3/5,0],[1/5,4/5,0],
             [1/2,0,1/2]]);

                  [2/5    3/5     0 ]
                  [                 ] 
             p := [1/5    4/5     0 ]
                  [                 ]
                  [1/2     0     1/2]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> evalf(evalm(p8*p8));
  [.2500000000 .7500000000        0       ]
  [                                       ]
  [.2500000000 .7500000000        0       ]
  [                                       ]
  [.2500152588 .7499694824 .00001525878906]

Interpretation of examples

Basic distinguishing feature: the pattern of 0s in the matrix $ {\bf P}$.

Classification of States

State $ i$ leads to state $ j$ if $ {\bf P}^n_{ij} > 0$ for some $ n$. It is convenient to agree that $ {\bf P}^0={\bf I}$ the identity matrix; thus $ i$ leads to $ i$.

Note $ i$ leads to $ j$ and $ j$ leads to $ k$ implies $ i$ leads to $ k$ (Chapman-Kolmogorov).

States $ i$ and $ j$ communicate if $ i$ leads to $ j$ and $ j$ leads to $ i$.

The relation of communication is an equivalence relation; it is reflexive, symmetric and transitive: if $ i$ and $ j$ communicate and $ j$ and $ k$ communicate then $ i$ and $ k$ communicate.

Example ($ +$ signs indicate non-zero entries):

$\displaystyle {\bf P}= \left[\begin{array}{ccccc}
0& 1 & 0 & 0 &0
\\
0 & 0 & + & 0 & 0
\\
+ & + & + & 0 & 0
\\
+ & 0 & 0 & + & 0
\\
0 & + & 0 & + & +
\end{array}\right]
$

For this example:

$ 1 \leadsto 2$, $ 2 \leadsto 3$, $ 3 \leadsto 1$ so $ 1,2,3$ are all in the same communicating class.

$ 4 \leadsto 1, 2, 3$ but not vice versa.

$ 5 \leadsto 1,2,3,4$ but not vice versa.

So the communicating classes are

$\displaystyle \{1,2,3\} \quad \{4\} \quad \{5\}
$

A Markov Chain is irreducible if there is only one communicating class.
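Finding the communicating classes is a reachability computation; here is a Python sketch (not from the notes) using Warshall's transitive closure, applied to the five-state example above with the $ +$ entries replaced by arbitrary positive numbers:

def communicating_classes(P):
    n = len(P)
    # reach[i][j]: i leads to j (P^0 = I, so every i leads to itself)
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    classes = []
    for i in range(n):
        cls = {j for j in range(n) if reach[i][j] and reach[j][i]}
        if cls not in classes:
            classes.append(cls)
    return classes

P = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0.2, 0.4, 0.4, 0, 0],
     [0.5, 0, 0, 0.5, 0],
     [0, 0.3, 0, 0.3, 0.4]]
print(communicating_classes(P))   # [{0, 1, 2}, {3}, {4}], i.e. {1,2,3} {4} {5}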

Notation:

$\displaystyle f_i=P(\exists n>0: X_n=i\vert X_0=i)
$

State $ i$ is recurrent if $ f_i=1$, otherwise transient.

If $ f_i=1$ then the Markov property (the chain starts over when it gets back to $ i$) means the probability of returning infinitely many times (given the chain starts in $ i$, or given it ever reaches $ i$) is 1.

Consider the chain started from a transient state $ i$. Let $ N$ be the number of visits to state $ i$ (including the visit at time 0). To have $ N=m$ the chain must return $ m-1$ times, each time starting over, and then never return. So:

$\displaystyle P(N=m\vert X_0=i) = f_i^{m-1}(1-f_i)
$

for $ m=1,2,\ldots$.

$ N$ has a Geometric distribution and $ {\rm E}(N\vert X_0=i) = 1/(1-f_i)$.

Another calculation:

$\displaystyle N=\sum_{k=0}^\infty 1(X_k=i)
$

so

$\displaystyle {\rm E}(N\vert X_0=i) = \sum_{k=0}^\infty P(X_k=i\vert X_0=i)
$

In terms of the transition matrix this is

$\displaystyle {\rm E}(N\vert X_0=i) = \sum_{k=0}^\infty {\bf P}^k_{ii}
$

and $ i$ is transient if and only if

$\displaystyle \sum_{k=0}^\infty {\bf P}^k_{ii} < \infty \, .
$

For the five-state example above: states 4 and 5 are transient. Claim: states 1, 2 and 3 are recurrent.

Proof: the argument above shows each transient state is visited only finitely many times. Since there are only finitely many states and the chain must be somewhere at every time, the states cannot all be transient, so there is a recurrent state. (Note the use of the finite number of states.) It must be one of 1, 2 and 3. Proposition: If one state in a communicating class is recurrent then all states in the communicating class are recurrent.

Proof: Let $ i$ be the known recurrent state so

$\displaystyle \sum_n {\bf P}^n_{ii} = \infty
$

Assume $ i$ and $ j$ communicate. Find integers $ m$ and $ k$ such that

$\displaystyle {\bf P}^m_{ij} > 0
$

and

$\displaystyle {\bf P}^k_{ji} > 0
$

Then

$\displaystyle {\bf P}^{m+n+k}_{jj} \ge {\bf P}^k_{ji} {\bf P}^n_{ii}{\bf P}^m_{ij}
$

Summing the right hand side over $ n$ gives $ \infty$, so

$\displaystyle \sum_n {\bf P}^n_{jj} = \infty
$

The Proposition also means that if one state in a class is transient then so are all.

State $ i$ has period $ d$ if $ d$ is greatest common divisor of

$\displaystyle \{n: {\bf P}^n_{ii} > 0\}
$

If $ i$ and $ j$ are in the same class then $ i$ and $ j$ have same period. If $ d=1$ then state $ i$ is called aperiodic; if $ d>1$ then $ i$ is periodic.

$\displaystyle {\bf P}= \left[\begin{array}{ccccc}
0& 1 & 0 & 0 &0
\\
0 & 0 & 1 & 0 & 0
\\
1 & 0 & 0 & 0 & 0
\\
0 & 0 & 0 & 0 & 1
\\
0 & 0 & 0 & 1 & 0
\end{array}\right]
$

For this example $ \{1,2,3\}$ is a class of period 3 states and $ \{4,5\}$ a class of period 2 states.

$\displaystyle {\bf P}= \left[\begin{array}{ccc}
0& 1/2 & 1/2
\\
1 & 0 & 0
\\
1 & 0 & 0
\end{array}\right]
$

has a single communicating class of period 2.

A chain is aperiodic if all its states are aperiodic.

Infinite State Spaces

Example: consider a sequence of independent coin tosses with probability $ p$ of Heads on a single toss. Let $ X_n$ be number of heads minus number of tails after $ n$ tosses. Put $ X_0=0$.

$ X_n$ is a Markov Chain. The state space is $ \mathbb{Z}$, the integers, and

$\displaystyle {\bf P}_{ij} = \begin{cases}
p & j=i+1
\\
1-p & j=i-1
\\
0 & \text{otherwise}
\end{cases}$

This chain has one communicating class (for $ p\neq 0,1$) and all states have period 2. According to the strong law of large numbers, $ X_n/n$ converges to $ 2p-1$. If $ p\neq 1/2$ this guarantees that $ X_n \neq 0$ for all large enough $ n$; that is, the number of returns to 0 is finite. The state 0 is then transient and so all states must be transient.

For $ p = 1/2$ the situation is different. It is a fact that

$\displaystyle {\bf P}^n_{00} = P(\text{\# H} = \text{\# T at time } n)
$

For $ n$ even this is the probability of exactly $ n/2$ heads in $ n$ tosses.

The Local Central Limit Theorem (normal approximation to $ P(-1/2 < X_n < 1/2)$), or Stirling's approximation, shows

$\displaystyle \sqrt{2m}P($Binomial$\displaystyle (2m,1/2) = m) \to (2/\pi)^{1/2}
$

so:

$\displaystyle \sum_n {\bf P}^n_{00} = \infty
$

That is: 0 is a recurrent state.
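A numerical sketch of this recurrence criterion (Python, mine; the checkpoints are arbitrary). The term $ {\bf P}^{2m}_{00} = \binom{2m}{m}[p(1-p)]^m$ is updated multiplicatively to avoid huge binomial coefficients:

def partial_sums(p, checkpoints):
    x = p * (1 - p)
    term, total, out = 1.0, 0.0, []
    for m in range(max(checkpoints) + 1):
        if m > 0:
            term *= (2 * m) * (2 * m - 1) / (m * m) * x   # C(2m,m) x^m
        total += term
        if m in checkpoints:
            out.append(round(total, 3))
    return out

for p in (0.5, 0.6):
    print(p, partial_sums(p, {10, 100, 1000}))
# p = 0.5: the partial sums keep growing (0 is recurrent);
# p = 0.6: they settle near 1/sqrt(1 - 4p(1-p)) = 5 (0 is transient)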

Hitting Times

Start an irreducible recurrent chain $ X_n$ in state $ i$. Let $ T_j$ be the first $ n>0$ such that $ X_n=j$. Define

$\displaystyle m_{ij} = {\rm E}(T_j\vert X_0=i)
$

First step analysis:

$\displaystyle m_{ij}$ $\displaystyle = 1\cdot P(X_1=j\vert X_0=i)$    
  $\displaystyle \qquad + \sum_{k\neq j} (1+{\rm E}(T_j\vert X_0=k))P_{ik}$    
  $\displaystyle = \sum_k P_{ik} + \sum_{k\neq j} P_{ik} m_{kj}$    
  $\displaystyle = 1 + \sum_{k\neq j} P_{ik} m_{kj}$    

Example

$\displaystyle {\bf P}= \left[\begin{array}{cc} \frac{3}{5} & \frac{2}{5}
\\
\\
\frac{1}{5} & \frac{4}{5}
\end{array} \right]
$

The equations are

$\displaystyle m_{11}$ $\displaystyle = 1 +\frac{2}{5} m_{21}$    
$\displaystyle m_{12}$ $\displaystyle = 1 +\frac{3}{5} m_{12}$    
$\displaystyle m_{21}$ $\displaystyle = 1 +\frac{4}{5} m_{21}$    
$\displaystyle m_{22}$ $\displaystyle = 1 +\frac{1}{5} m_{12}$    

The second and third equations give immediately

$\displaystyle m_{12}$ $\displaystyle = \frac{5}{2}$    
$\displaystyle m_{21}$ $\displaystyle = 5$    

Then plug in to the others to get

$\displaystyle m_{11}$ $\displaystyle = 3$    
$\displaystyle m_{22}$ $\displaystyle = \frac{3}{2}$    

Notice that the stationary initial distribution is

$\displaystyle \left(\frac{1}{m_{11}},\frac{1}{m_{22}}\right)
$

Consider fraction of time spent in state $ j$:

$\displaystyle \frac{1(X_0=j)+\cdots+1(X_n=j)}{n+1}
$

Imagine the chain starts in state $ i$; take the expected value to get

$\displaystyle \frac{\sum_{r=1}^n {\bf P}^r_{ij} +1(i=j)}{n+1}
=
\frac{\sum_{r=0}^n {\bf P}^r_{ij}}{n+1}
$

If rows of $ {\bf P}^n$ converge to $ \pi$ then fraction converges to $ \pi_j$; i.e. limiting fraction of time in state $ j$ is $ \pi_j$.

Heuristic: start the chain in $ i$. We expect to return to $ i$ every $ m_{ii}$ time units, so we are in state $ i$ about once every $ m_{ii}$ time units; i.e. the limiting fraction of time in state $ i$ is $ 1/m_{ii}$.

Conclusion: for an irreducible recurrent finite state space Markov chain

$\displaystyle \pi_i = \frac{1}{m_{ii}} \, .
$
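The first step equations are linear, so for a finite chain they can be solved mechanically: zeroing column $ j$ of $ {\bf P}$ turns $ m_{ij} = 1 + \sum_{k\neq j} P_{ik} m_{kj}$ into a standard linear system. A NumPy sketch (mine, not part of the Maple worksheets), for the weather chain:

import numpy as np

P = np.array([[3/5, 2/5],
              [1/5, 4/5]])
n = len(P)

M = np.zeros((n, n))
for j in range(n):
    Pt = P.copy()
    Pt[:, j] = 0.0                                    # drop the k = j terms
    M[:, j] = np.linalg.solve(np.eye(n) - Pt, np.ones(n))

print(M)               # [[3, 2.5], [5, 1.5]]
print(1 / np.diag(M))  # [1/3, 2/3]: the stationary distribution again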

Infinite State Spaces

Previous conclusion is still right if there is a stationary initial distribution.

Example: $ X_n =$ Heads $ -$ Tails after $ n$ tosses of a fair coin. The equations are

$\displaystyle m_{0,0}$ $\displaystyle = 1 + \frac{1}{2}m_{1,0}+\frac{1}{2} m_{-1,0}$    
$\displaystyle m_{1,0}$ $\displaystyle = 1 + \frac{1}{2} m_{2,0}$    

and many more.

Some observations:

You have to go through 1 to get to 0 from 2, so

$\displaystyle m_{2,0}=m_{2,1}+m_{1,0}
$

Symmetry (switching H and T):

$\displaystyle m_{1,0}=m_{-1,0}
$

The transition probabilities are homogeneous:

$\displaystyle m_{2,1}=m_{1,0}
$

Conclusion:

$\displaystyle m_{0,0}$ $\displaystyle = 1 + m_{1,0}$    
  $\displaystyle = 1+ 1 + \frac{1}{2} m_{2,0}$    
  $\displaystyle = 2 + \frac{1}{2}\left(m_{2,1}+m_{1,0}\right) = 2 + m_{1,0}$    

The first and last lines give $ 1 + m_{1,0} = 2 + m_{1,0}$: there are no finite solutions!

Summary of the situation:

Every state is recurrent.

All the expected hitting times $ m_{ij}$ are infinite.

All entries $ {\bf P}^n_{ij}$ converge to 0.

Jargon: The states in this chain are null recurrent.

One Example

Page 229, question 21: a runner leaves by the front or back door, with probability 1/2 each, and returns by the front or back door, with probability 1/2 each. He owns $ k$ pairs of shoes, wears a pair if there are any at his departure door, and leaves them at his return door. No shoes at the departure door? He runs barefoot. What is the long run fraction of time he runs barefoot?

Solution: Let $ X_n$ be number of shoes at front door on day $ n$. Then $ X_n$ is a Markov Chain.

Transition probabilities:

$ k$ pairs at the front door on day $ n$: $ X_{n+1}$ is $ k$ if he goes out the back door (prob 1/2) or out the front door and back in the front door (prob 1/4). Otherwise $ X_{n+1}$ is $ k-1$.

$ 0 < j < k$ pairs at the front door on day $ n$: $ X_{n+1}$ is $ j+1$ if he goes out the back and in the front (prob 1/4), and $ j-1$ if he goes out the front and in the back (prob 1/4). Otherwise $ X_{n+1}$ is $ j$.

0 pairs at the front door on day $ n$: $ X_{n+1}$ is 0 if he goes out the front door (prob 1/2) or out the back door and in the back door (prob 1/4). Otherwise $ X_{n+1}$ is $ 1$.

Transition matrix $ {\bf P}$:

$\displaystyle \left[\begin{array}{cccccc}
\frac{3}{4} & \frac{1}{4} & 0 & 0 &\cdots & 0
\\
\frac{1}{4} & \frac{1}{2} & \frac{1}{4} & 0 & \cdots & 0
\\
0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} & \cdots & 0
\\
\vdots & & \ddots & \ddots & \ddots & \vdots
\\
0 & 0 & 0 & \cdots & \frac{1}{4} & \frac{3}{4}
\end{array}\right]
$

Doubly stochastic: row sums and column sums are 1.

So $ \pi_i=1/(k+1)$ for all $ i$ is stationary initial distribution.

Solution to the problem: 1 day in $ k+1$ there are no shoes at the front door, and he goes barefoot on half of those days. Also 1 day in $ k+1$ all the shoes are at the front door, and he goes barefoot on half of those days too. Overall he goes barefoot $ 1/(k+1)$ of the time.
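A simulation sketch of the runner (Python, not from the notes; the day counts are arbitrary):

import random

def barefoot_fraction(k, days=200_000):
    front = k            # pairs at the front door; k pairs in all
    barefoot = 0
    for _ in range(days):
        out_front = random.random() < 0.5
        in_front = random.random() < 0.5
        has_shoes = front > 0 if out_front else front < k
        if not has_shoes:
            barefoot += 1
        else:
            # a pair leaves the departure door and lands at the return door
            front += (1 if in_front else 0) - (1 if out_front else 0)
    return barefoot / days

for k in (1, 2, 5):
    print(k, barefoot_fraction(k), 1 / (k + 1))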

Gambler's Ruin

An insurance company's reserves fluctuate: sometimes up, sometimes down. Ruin is the event that they hit 0 (the company goes bankrupt). General problem: for a given model of the fluctuations, compute the probability of ruin, either eventually or within the next $ k$ time units.

Simplest model: gambling on Red at Casino. Bet $1 at a time. Win $1 with probability $ p$, lose $1 with probability $ 1-p$. Start with $ k$ dollars. Quit playing when down to $0 or up to $ N$. Compute

$\displaystyle P_k = P($reach $N$ before $0$$\displaystyle \vert X_0=k)
$

$ X_n$ = fortune after $ n$ plays. $ X_0=k$.

Transition matrix:

$\displaystyle {\bf P}=
\left[\begin{array}{cccccc}
1 & 0 & 0 & 0 &\cdots & 0
\\
1-p & 0 & p & 0 & \cdots & 0
\\
0 & 1-p & 0 & p & \cdots & 0
\\
\vdots & & \ddots & \ddots & \ddots & \vdots
\\
0 & 0 & 0 & \cdots & 0 & 1
\end{array}\right]
$

First step analysis:

$\displaystyle P_0$ $\displaystyle =0$    
$\displaystyle P_i$ $\displaystyle = (1-p) P_{i-1}+pP_{i+1}$    
$\displaystyle P_N$ $\displaystyle = 1$    

Middle equation is

$\displaystyle pP_i +(1-p)P_i = (1-p) P_{i-1}+pP_{i+1}
$

or

$\displaystyle P_{i+1}-P_{i}$ $\displaystyle = \frac{1-p}{p} (P_{i}-P_{i-1})$    
  $\displaystyle = \left( \frac{1-p}{p}\right)^2 (P_{i-1}-P_{i-2})$    
  $\displaystyle \vdots$    
  $\displaystyle = \left( \frac{1-p}{p}\right)^{i} (P_1-P_{0})$    
  $\displaystyle = \left( \frac{1-p}{p}\right)^{i}P_1$    

Sum from $ i=0$ to $ i=k-1$ to get

$\displaystyle P_k = \sum_{i=0}^{k-1} \left( \frac{1-p}{p}\right)^i P_1
$

or

$\displaystyle P_k = \frac{1-\{(1-p)/p\}^k}{1-\{(1-p)/p\}} P_1
$

For $ k=N$ we get

$\displaystyle 1=\frac{1-\{(1-p)/p\}^N}{1-\{(1-p)/p\}} P_1
$

so that

$\displaystyle P_k = \frac{1-\{(1-p)/p\}^k}{1-\{(1-p)/p\}^N}
$

Notice that if $ p = 1/2$ our formulas for the sum of the geometric series are wrong. But for $ p = 1/2$ we get

$\displaystyle P_k=kP_1
$

so

$\displaystyle P_k=\frac{k}{N} \, .
$
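A sketch comparing the formula with simulation (Python, mine; $ k=10$, $ N=20$ are arbitrary):

import random

def ruin_formula(k, N, p):
    if p == 0.5:
        return k / N
    r = (1 - p) / p
    return (1 - r**k) / (1 - r**N)

def ruin_sim(k, N, p, trials=50_000):
    wins = 0
    for _ in range(trials):
        x = k
        while 0 < x < N:
            x += 1 if random.random() < p else -1
        wins += (x == N)
    return wins / trials

for p in (0.5, 0.49):
    print(p, ruin_formula(10, 20, p), ruin_sim(10, 20, p))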

Mean time in transient states

$\displaystyle {\bf P}= \left[\begin{array}{cccc}
\frac{1}{2} & \frac{1}{2} & 0 & 0
\\
\frac{1}{2} & \frac{1}{2} & 0 & 0
\\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4}
\\
\frac{1}{4} & \frac{1}{4} & \frac{3}{8} & \frac{1}{8}
\end{array}\right]
$
$

States 3 and 4 are transient. Let $ m_{i,j}$ be the expected total number of visits to state $ j$ for the chain started in $ i$.

For $ i=1$ or $ i=2$ and $ j=3$ or 4:

$\displaystyle m_{ij} = 0
$

For $ j=1$ or $ j=2$

$\displaystyle m_{ij} = \infty
$

For $ i,j \in \{3,4\}$ first step analysis:

$\displaystyle m_{3,3}$ $\displaystyle = 1+ \frac{1}{4} m_{3,3} + \frac{1}{4} m_{4,3}$    
$\displaystyle m_{3,4}$ $\displaystyle = 0 + \frac{1}{4} m_{3,4} + \frac{1}{4} m_{4,4}$    
$\displaystyle m_{4,3}$ $\displaystyle = 0 + \frac{3}{8} m_{3,3} + \frac{1}{8} m_{4,3}$    
$\displaystyle m_{4,4}$ $\displaystyle = 1 + \frac{3}{8} m_{3,4} + \frac{1}{8} m_{4,4}$    

In matrix form

$\displaystyle \left[\begin{array}{cc} m_{3,3} &m_{3,4} \\ \\ m_{4,3} &m_{4,4}\end{array}\right]$ $\displaystyle = \left[\begin{array}{cc} 1 & 0 \\ \\ 0 & 1 \end{array}\right] +$    
  $\displaystyle \qquad \left[\begin{array}{cc} \frac{1}{4} & \frac{1}{4} \\ \\ \frac{3}{8} & \frac{1}{8} \end{array}\right] \left[\begin{array}{cc} m_{3,3} &m_{3,4} \\ \\ m_{4,3} &m_{4,4}\end{array}\right]$    

Translate to matrix notation:

$\displaystyle {\bf M} = {\bf I} + {\bf P}_T {\bf M}
$

where $ \bf I$ is the identity, $ \bf M$ is the matrix of means and $ {\bf P}_T$ the part of the transition matrix corresponding to transient states.

Solution is

$\displaystyle {\bf M} = ({\bf I} - {\bf P}_T)^{-1}
$

In our case

$\displaystyle {\bf I} - {\bf P}_T =
\left[\begin{array}{rr} \frac{3}{4} & -\frac{1}{4} \\  \\
-\frac{3}{8} & \frac{7}{8}
\end{array}\right]
$

so that

$\displaystyle {\bf M} = \left[\begin{array}{rr} \frac{14}{9} & \frac{4}{9} \\  \\
\frac{2}{3} & \frac{4}{3}
\end{array}\right]
$
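The inversion is immediate in NumPy (a sketch, mine, not part of the Maple worksheets):

import numpy as np

PT = np.array([[1/4, 1/4],       # transient part of P: states 3 and 4
               [3/8, 1/8]])
M = np.linalg.inv(np.eye(2) - PT)
print(M)                         # [[14/9, 4/9], [2/3, 4/3]]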

Poisson Processes

Particles arrive over time at a particle detector. There are several ways to describe the most common model.

Approach 1: the number of particles arriving in an interval has a Poisson distribution, with mean proportional to the length of the interval, and the numbers in non-overlapping intervals are independent.

For $ s<t$, denote number of arrivals in $ (s,t]$ by $ N(s,t)$. Model is

  1. $ N(s,t)$ has a Poisson $ (\lambda(t-s))$ distribution.

  2. For $ 0 \le s_1 < t_1 \le s_2 < t_2 \cdots \le s_k < t_k$ the variables $ N(s_i,t_i);i=1,\ldots,k$ are independent.

Approach 2:

Let $ 0 < S_1 <S_2 < \cdots $ be the times at which the particles arrive.

Let $ T_i = S_i-S_{i-1}$ with $ S_0=0$ by convention.

Then $ T_1,T_2,\ldots$ are independent Exponential random variables with mean $ 1/\lambda$.

Note $ P(T_i > x) =e^{-\lambda x}$ is called survival function of $ T_i$.

The two approaches are equivalent. Both can be deduced from a model based on the local behaviour of the process.
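The equivalence is easy to see numerically. A Python sketch (mine; the rate and horizon are arbitrary) builds the process from exponential interarrivals (approach 2) and checks that the count in a unit interval behaves like Poisson$ (\lambda)$ (approach 1):

import random

def poisson_process(lam, T):
    # Arrival times in (0, T]: cumulative sums of Exponential(lam) gaps.
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam)
        if t > T:
            return times
        times.append(t)

lam, trials = 3.0, 100_000
counts = [len(poisson_process(lam, 1.0)) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(mean, var)   # both close to lam = 3, as for a Poisson(lam) count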

Approach 3: Assume:

  1. given all the points in $ [0,t]$ the probability of 1 point in the interval $ (t,t+h]$ is of the form

    $\displaystyle \lambda h +o(h)
$

  2. given all the points in $ [0,t]$ the probability of 2 or more points in interval $ (t,t+h]$ is of the form

    $\displaystyle o(h)
$

All 3 approaches are equivalent. I show: 3 implies 1, 1 implies 2 and 2 implies 3. First I explain the $ o$, $ O$ notation.

Notation: given functions $ f$ and $ g$ we write

$\displaystyle f(h) = g(h) +o(h)
$

provided

$\displaystyle \lim_{h \to 0} \frac{f(h)-g(h)}{h} = 0
$

[Aside: if there is a constant $ M$ such that

$\displaystyle \limsup_{h \to 0} \left\vert\frac{f(h)-g(h)}{h}\right\vert \le M
$

we say

$\displaystyle f(h) = g(h)+O(h)
$

Notation due to Landau. Another form is

$\displaystyle f(h) = g(h)+O(h)
$

means there is $ \delta>0$ and $ M$ s.t. for all $ \vert h\vert<\delta$

$\displaystyle \vert f(h)-g(h) \vert\le M \vert h \vert
$

Idea: $ o(h)$ is tiny compared to $ h$ while $ O(h)$ is (very) roughly the same size as $ h$.]

Model 3 implies 1: Fix $ t$ and define $ f_t(s)$ to be the conditional probability of 0 points in $ (t,t+s]$ given the value of the process on $ [0,t]$.

Derive differential equation for $ f$. Given process on $ [0,t]$ and 0 points in $ (t,t+s]$ probability of no points in $ (t,t+s+h]$ is

$\displaystyle f_{t+s}(h) = 1-\lambda h+o(h)
$

Given the process on $ [0,t]$ the probability of no points in $ (t,t+s]$ is $ f_t(s)$. Using $ P(AB\vert C)=P(A\vert BC)P(B\vert C)$ gives

$\displaystyle f_t(s+h)$ $\displaystyle = f_t(s)f_{t+s}(h)$    
  $\displaystyle = f_t(s)(1-\lambda h +o(h))$    

Now rearrange, divide by $ h$ to get

$\displaystyle \frac{f_t(s+h) - f_t(s)}{h} = -\lambda f_t(s) +\frac{o(h)}{h}
$

Let $ h \to 0$ and find

$\displaystyle \frac{\partial f_t(s)}{\partial s} = -\lambda f_t(s)
$

Differential equation has solution

$\displaystyle f_t(s) = f_t(0) \exp(-\lambda s) = \exp(-\lambda s)\, .
$

Notice: this is the survival function of an exponential rv.

General case. Notation: $ N(t) =N(0,t)$.

$ N(t)$ is a non-decreasing function of $ t$. Let

$\displaystyle P_k(t) = P(N(t)=k)
$

Evaluate $ P_k(t+h)$ by conditioning on $ N(s);0 \le s < t$ and $ N(t)=j$.

Given $ N(t)=j$ probability that $ N(t+h) = k$ is conditional probability of $ k-j$ points in $ (t,t+h]$.

So, for $ j\le k-2$:

\begin{multline*}
P(N(t+h)=k\vert N(t) = j, N(s), 0 \le s < t)
\\ =o(h)
\end{multline*}

For $ j=k-1$ we have

\begin{multline*}
P(N(t+h)= k\vert N(t) = k-1, N(s), 0 \le s < t)
\\
= \lambda h + o(h)
\end{multline*}

For $ j=k$ we have

\begin{multline*}
P(N(t+h)= k\vert N(t) = k, N(s), 0 \le s < t)
\\
= 1-\lambda h + o(h)
\end{multline*}

$ N$ is increasing so only consider $ j\le k$.

$\displaystyle P_k(t+h)$ $\displaystyle = \sum_{j=0}^k P(N(t+h)=k\vert N(t) = j) P_j(t)$    
  $\displaystyle = P_k(t) (1-\lambda h) +\lambda h P_{k-1}(t) + o(h)$    

Rearrange, divide by $ h$ and let $ h \to 0$ to get

$\displaystyle P_k^\prime(t) = -\lambda P_k(t) + \lambda P_{k-1}(t)
$

For $ k=0$ the term $ P_{k-1}$ is dropped and

$\displaystyle P^\prime_0(t) = -\lambda P_0(t)
$

Using $ P_0(0)=1$ we get

$\displaystyle P_0(t) = e^{-\lambda t}
$

Put this into the equation for $ k=1$ to get

$\displaystyle P_1^\prime(t) = -\lambda P_1(t) +\lambda e^{-\lambda t}
$

Multiply by $ e^{\lambda t}$ to see

$\displaystyle \left(e^{\lambda t}P_1(t)\right)^\prime = \lambda
$

With $ P_1(0)=0$ we get

$\displaystyle P_1(t) = \lambda t e^{-\lambda t}
$

For general $ k$ we have $ P_k(0)=0$ and

$\displaystyle \left(e^{\lambda t}P_k(t)\right)^\prime = \lambda e^{\lambda t}P_{k-1}(t)
$

Check by induction that

$\displaystyle e^{\lambda t}P_k(t) = (\lambda t)^k/k!
$

Hence: $ N(t)$ has Poisson $ (\lambda t)$ distribution.

Similar ideas permit proof of

\begin{multline*}
P(N(s,t)=k\vert N(u); 0 \le u \le s)
\\
= \frac{\left\{\lambda(t-s)\right\}^k e^{-\lambda(t-s) }}{k!}
\end{multline*}

From which (by induction) we can prove that $ N$ has independent Poisson increments.

Exponential Interarrival Times

If $ N$ is a Poisson Process we define $ T_1,T_2,\ldots$ to be the times between 0 and the first point, the first point and the second and so on.

Fact: $ T_1,T_2,\ldots$ are iid exponential rvs with mean $ 1/\lambda$.

We already did $ T_1$ rigorously. The event $ T_1>t$ is exactly the event $ N(t)=0$. So

$\displaystyle P(T>t) = \exp(-\lambda t)
$

which is the survival function of an exponential rv.

I do case of $ T_1,T_2$. Let $ t_1,t_2$ be two positive numbers and $ s_1=t_1$, $ s_2=t_1+t_2$. Consider event

$\displaystyle \{t_1 < T_1 \le t_1+\delta_1\} \cap \{t_2 < T_2 \le t_2+\delta_2\} \, .
$

This is almost the same as the intersection of the four events:

$\displaystyle N(0,t_1]$ $\displaystyle =0$    
$\displaystyle N(t_1,t_1+\delta_1]$ $\displaystyle = 1$    
$\displaystyle N(t_1+\delta_1,t_1+\delta_1+t_2]$ $\displaystyle =0$    
$\displaystyle N(s_2+\delta_1,s_2+\delta_1+\delta_2]$ $\displaystyle = 1$    

which has probability

$\displaystyle e^{-\lambda t_1} \times \lambda\delta_1 e^{-\lambda \delta_1}
\times e^{-\lambda t_2} \times \lambda\delta_2 e^{-\lambda \delta_2}
$

Divide by $ \delta_1\delta_2$ and let $ \delta_1$ and $ \delta_2$ go to 0 to find that the joint density of $ T_1,T_2$ is

$\displaystyle \lambda^2e^{-\lambda t_1}e^{-\lambda t_2}
$

which is the joint density of two independent exponential variates.

More rigor:

First step: Compute

$\displaystyle P( 0 < S_1 \le s_1 < S_2 \le s_2 \cdots < S_k \le s_k)
$

This is just the event of exactly 1 point in each interval $ (s_{i-1},s_i]$ for $ i=1,\ldots,k-1$ ($ s_0=0$) and at least one point in $ (s_{k-1},s_k]$ which has probability

$\displaystyle \prod_1^{k-1} \left\{\lambda(s_i-s_{i-1})e^{-\lambda(s_i-s_{i-1})}\right\}
\left(1-e^{-\lambda(s_k-s_{k-1})}\right)
$

Second step: write this in terms of joint cdf of $ S_1,\ldots,S_k$. I do $ k=2$:

\begin{multline*}
P( 0 < S_1 \le s_1 < S_2 \le s_2) \\ =
F_{S_1,S_2}(s_1,s_2)-F_{S_1,S_2}(s_1,s_1)
\end{multline*}

Notice tacit assumption $ s_1 < s_2$.

Differentiate twice, that is, take

$\displaystyle \frac{\partial^2}{\partial s_1\partial s_2}
$

to get

\begin{multline*}
f_{S_1,S_2}(s_1,s_2) = \frac{\partial^2}{\partial s_1\partial s_2}
P( 0 < S_1 \le s_1 < S_2 \le s_2)
\\
= \frac{\partial^2}{\partial s_1\partial s_2}
\lambda s_1 e^{-\lambda s_1} \left(1-e^{-\lambda (s_2-s_1)}\right)
\end{multline*}

Simplify to

$\displaystyle \lambda^2 e^{-\lambda s_2}
$

Recall tacit assumption to get

$\displaystyle f_{S_1,S_2}(s_1,s_2) =\lambda^2 e^{-\lambda s_2} 1(0 < s_1 < s_2)
$

That completes the first part.

Now compute the joint cdf of $ T_1,T_2$ by

$\displaystyle F_{T_1,T_2}(t_1,t_2) = P(S_1 < t_1, S_2-S_1 <t_2)
$

This is

$\displaystyle P(S_1 < t_1,$ $\displaystyle S_2-S_1 <t_2)$    
  $\displaystyle = \int_0^{t_1} \int_{s_1}^{s_1+t_2} \lambda^2 e^{-\lambda s_2} ds_2ds_1$    
  $\displaystyle = \lambda \int_0^{t_1} \left(e^{-\lambda s_1} - e^{-\lambda(s_1+t_2)}\right) ds_1$    
  $\displaystyle = 1-e^{-\lambda t_1} - e^{-\lambda t_2} + e^{-\lambda(t_1+t_2)}$    

Differentiate twice to get

$\displaystyle f_{T_1,T_2}(t_1,t_2) = \lambda e^{-\lambda t_1} \lambda e^{-\lambda t_2}
$

which is the joint density of two independent exponential random variables.

Summary so far:

Have shown:

Instantaneous rates model implies independent Poisson increments model implies independent exponential interarrivals.

Next: show independent exponential interarrivals implies the instantaneous rates model.

Suppose $ T_1,T_2,\ldots$ are iid exponential rvs with mean $ 1/\lambda$. Define $ N_t$ by $ N_t=k$ if and only if

$\displaystyle T_1+\cdots+T_k \le t < T_1+\cdots +T_{k+1}
$

Let $ A$ be the event $ N(s)=n(s) ;0 < s \le t$. We are to show

$\displaystyle P(N(t,t+h] = 1\vert N(t)=k,A) = \lambda h + o(h)
$

and

$\displaystyle P(N(t,t+h] \ge 2\vert N(t)=k,A) = o(h)
$

If $ n(s)$ is a possible trajectory consistent with $ N(t) = k$ then $ n$ has jumps at points

$\displaystyle s_1\equiv t_1,s_2\equiv t_1+t_2, \ldots,s_k\equiv t_1+\cdots+t_k < t
$

and at no other points in $ (0,t]$.

So given $ N(s)=n(s) ;0 < s \le t$ with $ n(t)=k$ we are essentially being given

$\displaystyle T_1=t_1,\ldots,T_k=t_k, T_{k+1}> t-s_k
$

and asked, in the first case, for the conditional probability of the event $ B$ given by

$\displaystyle t-s_k <T_{k+1}\le t-s_k+h < T_{k+2}+T_{k+1}\, .
$

Conditioning on $ T_1,\ldots,T_k$ is irrelevant, by independence.

$\displaystyle P(N(t,t+h] = 1\vert$ $\displaystyle N(t)=k,A) /h$    
  $\displaystyle = P(B\vert T_{k+1}> t-s_k)/h$    
  $\displaystyle = \frac{P(B) }{he^{-\lambda(t-s_k)}}$    

The numerator may be evaluated by integration:

$\displaystyle P(B) = \int_{t-s_k}^{t-s_k+h} \int_{t-s_k+h - s_1}^\infty \lambda^2 e^{-\lambda(s_1+s_2)} ds_2ds_1
$

Let $ h \to 0$ to get the limit

$\displaystyle P
(N(t,t+h] = 1\vert N(t)=k,A) /h \to \lambda
$

as required.

The computation of

$\displaystyle \lim_{h\to 0} P(N(t,t+h] \ge 2\vert N(t)=k,A) /h
$

is similar.

Properties of exponential rvs

Convolution: If $ X$ and $ Y$ are independent rvs with densities $ f$ and $ g$ respectively, and $ Z=X+Y$, then

$\displaystyle P(Z \le z) = \int_{-\infty}^\infty \int_{-\infty}^{z-x} f(x) g(y) dy dx
$

Differentiating wrt $ z$ we get

$\displaystyle f_Z(z) = \int_{-\infty}^\infty f(x) g(z-x) dx
$

This integral is called the convolution of densities $ f$ and $ g$.

If $ T_1,\ldots,T_n$ are iid Exponential$ (\lambda)$ then $ S_n=T_1+\cdots+T_n$ has a Gamma$ (n,\lambda)$ distribution. The density of $ S_n$ is

$\displaystyle f_{S_n}(s) = \lambda (\lambda s)^{n-1} e^{-\lambda s} / (n-1)!
$

for $ s>0$.

Proof:

$\displaystyle P(S_n>s)$ $\displaystyle = P(N(0,s] < n)$    
  $\displaystyle = \sum_{j=0}^{n-1} (\lambda s)^j e^{-\lambda s}/j!$    

Then

$\displaystyle f_{S_n}(s)$ $\displaystyle = \frac{d}{ds} P(S_n \le s)$    
  $\displaystyle = \frac{d}{ds}\left\{1-P(S_n>s)\right\}$    
  $\displaystyle = -\lambda \sum_{j=1}^{n-1}\left\{ j(\lambda s)^{j-1} -(\lambda s)^j\right\} \frac{e^{-\lambda s}}{j!}$    
  $\displaystyle \qquad + \lambda e^{-\lambda s}$    
  $\displaystyle = \lambda e^{-\lambda s } \sum_{j=1}^{n-1}\left\{ \frac{(\lambda s)^j}{j!} - \frac{(\lambda s)^{j-1}}{(j-1)!}\right\}$    
  $\displaystyle \qquad + \lambda e^{-\lambda s}$    

This telescopes to

$\displaystyle f_{S_n}(s) = \lambda (\lambda s)^{n-1} e^{-\lambda s} / (n-1)!
$

Extreme Values: If $ X_1,\ldots,X_n$ are independent exponential rvs with means $ 1/\lambda_1,\ldots,1/\lambda_n$ then $ Y = \min\{X_1,\ldots,X_n\}$ has an exponential distribution with mean

$\displaystyle \frac{1}{\lambda_1+\cdots+\lambda_n}
$

Proof:

$\displaystyle P(Y > y)$ $\displaystyle = P(\forall k X_k>y)$    
  $\displaystyle = \prod e^{-\lambda_k y}$    
  $\displaystyle = e^{-\sum\lambda_k y}$    

Memoryless Property: conditional distribution of $ X-x$ given $ X \ge x$ is exponential if $ X$ has an exponential distribution.

Proof:

$\displaystyle P(X-x>y\vert X\ge x)$ $\displaystyle = \frac{P(X>x+y,X\ge x)}{P(X\ge x)}$    
  $\displaystyle = \frac{P(X>x+y)}{P(X\ge x)}$    
  $\displaystyle = \frac{e^{-\lambda(x+y)}}{e^{-\lambda x}}$    
  $\displaystyle = e^{-\lambda y}$    

Hazard Rates

The hazard rate, or instantaneous failure rate for a positive random variable $ T$ with density $ f$ and cdf $ F$ is

$\displaystyle r(t) = \lim_{\delta\to 0} \frac{P(t < T\le t+\delta \vert T \ge t)}{\delta}
$

This is just

$\displaystyle r(t) = \frac{f(t)}{1-F(t)}
$

For an exponential random variable with mean $ 1/\lambda$ this is

$\displaystyle r(t) = \frac{ \lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda
$

The exponential distribution has constant failure rate.

Weibull random variables have density

$\displaystyle f(t\vert\lambda,\alpha) = \lambda(\lambda t)^{\alpha-1}e^{- (\lambda t)^\alpha}
$

for $ t>0$. The corresponding survival function is

$\displaystyle 1-F(t) = e^{- (\lambda t)^\alpha}
$

and the hazard rate is

$\displaystyle r(t) = \lambda(\lambda t)^{\alpha-1}
$

which is increasing for $ \alpha>1$, decreasing for $ \alpha<1$. For $ \alpha=1$ this is the exponential distribution.

Since

$\displaystyle r(t) = \frac{dF(t)/dt}{1-F(t)} =-\frac{d\log(1-F(t))}{dt}
$

we can integrate to find

$\displaystyle 1-F(t) = \exp\{ -\int_0^t r(s) ds\}
$

so that $ r$ determines $ F$ and $ f$.

Properties of Poisson Processes

1)
If $ N_1$ and $ N_2$ are independent Poisson processes with rates $ \lambda_1$ and $ \lambda_2$, respectively, then $ N=N_1+N_2$ is a Poisson process with rate $ \lambda_1+\lambda_2$.

2)
Suppose $ N$ is a Poisson process with rate $ \lambda$. Suppose each point is marked with a label, say one of $ L_1,\ldots,L_r$, independently of all other occurrences. Suppose $ p_i$ is the probability that a given point receives label $ L_i$. Let $ N_i$ count the points with label $ i$ (so that $ N=N_1+\cdots+N_r$). Then $ N_1,\ldots,N_r$ are independent Poisson processes with rates $ p_i\lambda$.

3)
Suppose $ U_1, U_2,\ldots$ independent rvs, each uniformly distributed on $ [0,T]$. Suppose $ M$ is a Poisson $ (\lambda T)$ random variable independent of the $ U$'s. Let

$\displaystyle N(t) = \sum_1^M 1(U_i \le t)
$

Then $ N$ is a Poisson process on $ [0,T]$ with rate $ \lambda$. (A simulation sketch based on this property follows the list.)

4)
Suppose $ N$ is a Poisson process with rate $ \lambda$. Let $ S_1 < S_2 < \cdots$ be the times at which points arrive. Given $ N(T)=n$, $ S_1,\ldots,S_n$ have the same distribution as the order statistics of a sample of size $ n$ from the uniform distribution on $ [0,T]$.

5)
Given $ S_{n+1}=T$, $ S_1,\ldots,S_n$ have the same distribution as the order statistics of a sample of size $ n$ from the uniform distribution on $ [0,T]$.
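Property 3 doubles as a recipe for simulating a Poisson process on a window $ [0,T]$: generate the Poisson count, then drop that many uniform points. A minimal Python sketch (the rate, window and seed are arbitrary illustrative choices; numpy assumed available):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T = 2.0, 10.0

M = rng.poisson(lam * T)                        # total number of points
arrival_times = np.sort(rng.uniform(0.0, T, size=M))

# N(t) is the number of arrival times <= t:
t = 4.0
N_t = np.searchsorted(arrival_times, t, side="right")
print(M, N_t)    # over many runs N(4) averages lam * 4 = 8
```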

Indications of some proofs:

1) $ N_1,\ldots,N_r$ independent Poisson processes with rates $ \lambda_i$, $ N=\sum N_i$. Let $ A_h$ be the event of 2 or more points in $ N$ in the time interval $ (t,t+h]$ and $ B_h$ the event of exactly one point in $ N$ in the time interval $ (t,t+h]$.

Let $ A_{ih}$ and $ B_{ih}$ be the corresponding events for $ N_i$.

Let $ H_t$ denote the history of the processes up to time $ t$; we condition on $ H_t$.

We are given:

$\displaystyle P(A_{ih}\vert H_t) =o(h)
$

and

$\displaystyle P(B_{ih}\vert H_t) = \lambda_i h+o(h)\, .$

Note that

$\displaystyle A_h \subset \bigcup_{i=1}^r A_{ih} \cup \bigcup_{i \neq j} \left(B_{ih}\cap
B_{jh}\right)
$

Since

$\displaystyle P(B_{ih}\cap B_{jh}\vert H_t)$ $\displaystyle = P(B_{ih}\vert H_t) P(B_{jh}\vert H_t)$    
  $\displaystyle = (\lambda_i h +o(h))(\lambda_j h+o(h))$    
  $\displaystyle = O(h^2)$    
  $\displaystyle = o(h)$    

we have checked one of the two infinitesimal conditions for a Poisson process.

Next let $ C_h$ be the event of no points in $ N$ in the time interval $ (t,t+h]$ and $ C_{ih}$ the same for $ N_i$. Then

$\displaystyle P(C_h\vert H_t)$ $\displaystyle = P(\cap C_{ih}\vert H_t)$    
  $\displaystyle = \prod P(C_{ih}\vert H_t)$    
  $\displaystyle = \prod (1-\lambda_ih +o(h))$    
  $\displaystyle = 1-(\sum \lambda_i)h + o(h)$    

shows

$\displaystyle P(B_h\vert H_t)$ $\displaystyle = 1-P(C_h\vert H_t) -P(A_h\vert H_t)$    
  $\displaystyle = (\sum \lambda_i) h +o(h)$    

Hence $ N$ is a Poisson process with rate $ \sum \lambda_i$.

2) The infinitesimal approach used for 1) can do part of this; see the text for the rest. Events are defined as in 1). The event $ B_{ih}$, that there is exactly one point of $ N_i$ in $ (t,t+h]$, is the union of two pieces: the part of $ B_h$ (exactly one point of $ N$ in $ (t,t+h]$) in which that point is labeled $ i$, and a subset of $ A_h$ (two or more points of $ N$ in $ (t,t+h]$) in which exactly one point is labeled $ i$. Since $ P(A_h\vert H_t)=o(h)$

$\displaystyle P(B_{ih}\vert H_t)$ $\displaystyle = p_iP(B_{h}\vert H_t)+o(h)$    
  $\displaystyle = p_i (\lambda h+o(h)) + o(h)$    
  $\displaystyle = p_i \lambda h + o(h)$    

Similarly, $ A_{ih}$ is a subset of $ A_h$ so

$\displaystyle P(A_{ih}\vert H_t) =o(h)
$

This shows each $ N_i$ is Poisson with rate $ \lambda p_i$. To get independence requires more work; see the text for the algebraic method which is easier.

3) Fix $ s<t$. Let $ N(s,t)$ be the number of points in $ (s,t]$. Given $ M=n$ the conditional distribution of $ N(s,t)$ is Binomial$ (n,p)$ with $ p=(t-s)/T$. So

$\displaystyle P(N(s$ $\displaystyle ,t)=k)$    
  $\displaystyle = \sum_{n=k}^\infty P(N(s,t)=k,M=n)$    
  $\displaystyle = \sum_{n=k}^\infty P(N(s,t)=k\vert M=n) P(M=n)$    
  $\displaystyle = \sum_{n=k}^\infty \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k} \frac{(\lambda T)^n}{n!} e^{-\lambda T}$    
  $\displaystyle = \frac{e^{-\lambda T}}{k!} (\lambda T p)^k\sum_{n=k}^\infty \frac{(1-p)^{n-k} (\lambda T)^{n-k}}{(n-k)!}$    
  $\displaystyle = \frac{e^{-\lambda T}}{k!} (\lambda T p)^k \sum_{m=0}^\infty(1-p)^m(\lambda T)^m/m!$    
  $\displaystyle = \frac{e^{-\lambda T}}{k!} (\lambda T p)^ke^{\lambda T(1-p)}$    
  $\displaystyle = \frac{e^{-\lambda(t-s)}(\lambda(t-s))^k}{k!}$    

4): Fix $ s_i,h_i$ for $ i=1,\ldots, n$ such that

$\displaystyle 0 < s_1 < s_1+h_1 < s_2 < \cdots < s_n<s_n+h_n< T
$

Given $ N(T)=n$ we compute the probability of the event

$\displaystyle A=\bigcap_{i=1}^n \{s_i < S_i < s_i+h_i\}
$

The intersection of $ A$ with $ \{N(T)=n\}$ is (taking $ s_0=h_0=0$):

\begin{multline*}
B \equiv \\
\bigcap_{i=1}^n \{N(s_{i-1}+h_{i-1},s_i]=0,N(s_i,s_i+h_i]=1\}
\\
\cap \{N(s_n+h_n,T]=0\}
\end{multline*}

whose probability is

$\displaystyle \left(\prod \lambda h_i\right) e^{-\lambda T}
$

So

$\displaystyle P(A\vert N(T)=n)$ $\displaystyle = \frac{P(A,N(T)=n)}{P(N(T)=n)}$    
  $\displaystyle = \frac{ \lambda^n e^{-\lambda T} \prod h_i}{(\lambda T)^n e^{-\lambda T} / n!}$    
  $\displaystyle = \frac{n!\prod h_i}{T^n}$    

Divide by $ \prod h_i$ and let all $ h_i$ go to 0 to get joint density of $ S_1,\ldots,S_n$ is

$\displaystyle \frac{n!}{T^n} 1(0 < s_1 < \cdots < s_n <T)
$

which is the density of order statistics from a Uniform$ [0,T]$ sample of size $ n$.

5) Replace the event $ S_{n+1}=T$ with $ T < S_{n+1}< T+h$. With $ A$ as before we want

\begin{multline*}
P(A\vert T < S_{n+1}< T+h)
\\
= \frac{ P(B,N(T,T+h] \ge 1)}{P(T < S_{n+1}< T+h)}
\end{multline*}

Note that $ B$ is independent of $ \{N(T,T+h] \ge 1\}$ and that we have already found the limit

$\displaystyle \frac{P(B)}{ \prod h_i} \to \lambda^n e^{-\lambda T}
$

We are left to compute the limit of

$\displaystyle \frac{P(N(T,T+h] \ge 1)}{P(T < S_{n+1}< T+h)}
$

The denominator is

\begin{multline*}
\sum_{k=0}^n P(N(0,T]=k,N(T,T+h]=n+1-k)
\\
+o(h) = P(N(0,T]=n)\lambda h+o(h)
\end{multline*}

Thus

$\displaystyle \frac{P(N(T,T+h] \ge 1)}{P(T < S_{n+1}< T+h)}
$ $\displaystyle = \frac{\lambda h+o(h)}{\frac{(\lambda T)^n}{n!} e^{-\lambda T} \lambda h+o(h)}$    
  $\displaystyle \to \frac{n!}{(\lambda T)^ne^{-\lambda T}}$    

This gives the conditional density of $ S_1,\ldots,S_n$ given $ S_{n+1}=T$ as in 4).

Inhomogeneous Poisson Processes

The idea of hazard rate can be used to extend the notion of Poisson Process. Suppose $ \lambda(t) \ge 0$ is a function of $ t$. Suppose $ N$ is a counting process such that

$\displaystyle P(N(t+h)=k+1\vert N(t)=k,H_t)= \lambda(t) h +o(h)
$

and

$\displaystyle P(N(t+h) \ge k+2\vert N(t)=k,H_t)= o(h)
$

Then $ N$ has independent increments and $ N(t+s)-N(t)$ has a Poisson distribution with mean

$\displaystyle \int_t^{t+s} \lambda (u) du
$

If we put

$\displaystyle \Lambda(t) = \int_0^t
\lambda(u) du
$

then mean of $ N(t+s)-N(t)$ is $ \Lambda(t+s)-\Lambda(t)$.

Jargon: $ \lambda$ is the intensity or instantaneous intensity and $ \Lambda$ the cumulative intensity.

Can use the model with $ \Lambda$ any non-decreasing right continuous function, possibly without a derivative. This allows ties.
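A standard way to simulate from a bounded intensity $ \lambda(t)$ is thinning, a technique not derived in these notes: simulate a homogeneous process at a dominating rate $ \lambda_{\max} \ge \lambda(t)$ and keep a point at time $ t$ with probability $ \lambda(t)/\lambda_{\max}$. A sketch with an arbitrary illustrative intensity:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10.0
intensity = lambda t: 1.0 + np.sin(t) ** 2   # lambda(t), bounded by 2
lam_max = 2.0

M = rng.poisson(lam_max * T)                 # homogeneous candidate points
candidates = np.sort(rng.uniform(0.0, T, size=M))
keep = rng.uniform(size=M) < intensity(candidates) / lam_max
points = candidates[keep]
print(len(points))   # averages Lambda(T), the integral of lambda over [0,T]
```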

Compound Poisson Processes

Imagine insurance claims arise at times of a Poisson process, $ N(t)$, (more likely for an inhomogeneous process).

Let $ Y_i$ be the value of the $ i$th claim associated with the point whose time is $ S_i$.

Assume that the $ Y$'s are independent of each other and of $ N$.

Let

$\displaystyle \mu = {\rm E}(Y_i)$    and $\displaystyle \sigma^2 = {\rm var}(Y_i)
$

Let

$\displaystyle X(t) = \sum_{i=1}^{N(t)} Y_i
$

be the total claim up to time $ t$. We call $ X$ a compound Poisson Process.

Useful properties:

$\displaystyle {\rm E}\left\{X(t)\vert N(t)\right\}$ $\displaystyle = N(t) \mu$    
$\displaystyle {\rm Var}\left\{X(t)\vert N(t)\right\}$ $\displaystyle = N(t)\sigma^2$    
$\displaystyle {\rm E}\left\{X(t)\right\}$ $\displaystyle = \mu {\rm E}\left\{N(t)\right\}$    
  $\displaystyle = \mu \lambda t$    
$\displaystyle {\rm Var}\left\{X(t)\right\}$ $\displaystyle = {\rm Var}\left[{\rm E}\left\{X(t)\vert N(t)\right\}\right]$    
  $\displaystyle \qquad +{\rm E}\left[{\rm Var}\left\{X(t)\vert N(t)\right\}\right]$    
  $\displaystyle = \lambda t \mu^2 + \lambda t \sigma^2$    

(Does it all look familiar? See homework.)
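A quick Monte Carlo check of these moment formulas (a sketch; the normal claim sizes and all parameter values are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, mu, sigma = 3.0, 5.0, 2.0, 0.5

def draw_X():
    n = rng.poisson(lam * t)                    # N(t) claims by time t
    return rng.normal(mu, sigma, size=n).sum()  # X(t)

sample = np.array([draw_X() for _ in range(20000)])
print(sample.mean(), mu * lam * t)                   # both near 30
print(sample.var(), lam * t * (mu**2 + sigma**2))    # both near 63.75
```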

Continuous Time Markov Chains

Consider a population of single celled organisms in a stable environment.

Fix short time interval, length $ h$.

Each organism has some probability of dividing to produce two organisms and some other probability of dying.

We might suppose: in a short time interval of length $ h$ each organism divides with probability $ \lambda h+o(h)$ and dies with probability $ \mu h+o(h)$, independently across organisms.

Tacit assumptions:

Constants of proportionality do not depend on time: ``stable environment''.

Constants do not depend on organism: organisms are all similar and live in similar environments.

$ Y(t)$: total population at time $ t$.

$ {\cal H}_t$: history of process up to time $ t$.

Condition on event $ Y(t) =n $.

Probability of two or more divisions (more than one division by a single organism or two or more organisms dividing) is $ o(h)$.

Probability of both a division and a death or of two or more deaths is $ o(h)$.

So probability of exactly 1 division by any one of the $ n$ organisms is $ n\lambda h+o(h)$.

Similarly probability of 1 death is $ n\mu h+o(h)$.

We deduce:

$\displaystyle P(Y(t+h)$ $\displaystyle =n+1\vert Y(t)=n, {\cal H}_t)$    
  $\displaystyle = n \lambda h+o(h)$    
$\displaystyle P(Y(t+h)$ $\displaystyle =n-1\vert Y(t)=n, {\cal H}_t)$    
  $\displaystyle = n \mu h+o(h)$    
$\displaystyle P(Y(t+h)$ $\displaystyle =n\vert Y(t)=n, {\cal H}_t)$    
  $\displaystyle = 1-n( \lambda+\mu) h+o(h)$    
$\displaystyle P(Y(t+h)$ $\displaystyle \not\in \{n-1,n,n+1\}\vert Y(t)=n, {\cal H}_t)$    
  $\displaystyle = o(h)$    

These equations lead to:

$\displaystyle P(Y(t+s) = j\vert$ $\displaystyle Y(s)=i,{\cal H}_s)$    
  $\displaystyle = P(Y(t+s) = j\vert Y(s)=i)$    
  $\displaystyle = P(Y(t)=j\vert Y(0)=i)$    

This is the Markov Property.

Definition: A process $ \{Y(t); t \ge 0\}$ taking values in $ S$, a finite or countable state space, is a Markov Chain if

$\displaystyle P(Y(t+s) = j\vert$ $\displaystyle Y(s)=i,{\cal H}_s)$    
  $\displaystyle = P(Y(t+s) = j\vert Y(s)=i)$    
  $\displaystyle \equiv {\bf P}_{ij}(s,s+t)$    

Definition: A Markov chain $ Y$ has stationary transitions if

$\displaystyle {\bf P}_{ij}(s,s+t) = {\bf P}_{ij}(0,t) \equiv {\bf P}_{ij}(t)
$

From now on: our chains have stationary transitions.

Summary of Markov Process Results

Detailed development

Suppose $ X$ is a Markov Chain with stationary transitions. Then

$\displaystyle P($ $\displaystyle X(t+s)=k \vert X(0)=i)$    
  $\displaystyle = \sum_j P(X(t+s)=k,X(t)=j\vert X(0)=i)$    
  $\displaystyle = \sum_j P(X(t+s)=k\vert X(t)=j,X(0)=i)$    
  $\displaystyle \qquad \times P(X(t)=j\vert X(0)=i)$    
  $\displaystyle = \sum_j P(X(t+s)=k\vert X(t)=j)$    
  $\displaystyle \qquad \times P(X(t)=j\vert X(0)=i)$    
  $\displaystyle = \sum_j P(X(s)=k\vert X(0)=j)$    
  $\displaystyle \qquad \times P(X(t)=j\vert X(0)=i)$    

This shows

$\displaystyle {\bf P}(t+s) = {\bf P}(t){\bf P}(s)
$

which is the Chapman-Kolmogorov equation.

Now consider the chain starting from $ i$ and let $ T_i$ be the first $ t$ for which $ X(t) \neq i$. Then $ T_i$ is a stopping time.

[Technically:

$\displaystyle \{T_i \le t\} \in {\cal H}_t
$

for each $ t$.] Then

$\displaystyle P(T_i$ $\displaystyle >t+s\vert T_i>s,X(0)=i)$    
  $\displaystyle = P(T_i>t+s\vert X(u)=i; 0 \le u \le s)$    
  $\displaystyle = P(T_i > t\vert X(0)=i)$    

by the Markov property.

Conclusion: given $ X(0)=i$, $ T_i$ has the memoryless property, so $ T_i$ has an exponential distribution. Let $ v_i$ be the rate parameter.

Embedded Chain: Skeleton

Let $ T_1 < T_2 < \cdots$ be the stopping times at which transitions occur. Then $ X_n = X(T_n)$. The sequence $ X_n$ is a Markov chain by the Markov property. That $ {\bf P}_{ii}=0$ reflects the fact that $ P(X(T_{n+1})=X(T_n))=0$ by design.

As before we say $ i \leadsto j$ if $ {\bf P}_{ij}(t)>0$ for some $ t$. It is fairly clear that $ i \leadsto j$ for the $ X(t)$ if and only if $ i \leadsto j$ for the embedded chain $ X_n$.

We say $ i{\leftrightarrow}j$ ($ i$ and $ j$ communicate) if $ i \leadsto j$ and $ j\leadsto i$.

Now consider

$\displaystyle P(X(t+h)=j\vert X(t)=i,{\cal H}_t)
$

Suppose the chain has made $ n$ transitions so far, so that $ T_n < t < T_{n+1}$. Then, except for possibilities of probability $ o(h)$, the event $ X(t+h)=j$ is the event that

$\displaystyle t < T_{n+1} \le t+h$    and $\displaystyle X_{n+1} = j
$

The probability of this is

$\displaystyle (v_i h +o(h)) {\bf P}_{ij} = v_i {\bf P}_{ij} h +o(h)
$

Kolmogorov's Equations

The Chapman-Kolmogorov equations are

$\displaystyle {\bf P}(t+h) = {\bf P}(t){\bf P}(h)
$

Subtract $ {\bf P}(t)$ from both sides, divide by $ h$ and let $ h \to 0$. Remember that $ {\bf P}(0)$ is the identity. We find

$\displaystyle \frac{{\bf P}(t+h)-{\bf P}(t)}{h} = \frac{{\bf P}(t)({\bf P}(h) - {\bf P}(0))}{h}$

which gives

$\displaystyle {\bf P}^\prime(t) = {\bf P}(t) {\bf P}^\prime(0)
$

The Chapman-Kolmogorov equations can also be written

$\displaystyle {\bf P}(t+h) = {\bf P}(h) {\bf P}(t)
$

Now subtracting $ {\bf P}(t)$ from both sides, dividing by $ h$ and letting $ h \to 0$ gives

$\displaystyle {\bf P}^\prime(t) ={\bf P}^\prime(0) {\bf P}(t)
$

Look at these equations in component form:

$\displaystyle {\bf P}^\prime(t) ={\bf P}^\prime(0) {\bf P}(t)
$

becomes

$\displaystyle {\bf P}^\prime_{ij}(t) = \sum_k {\bf P}^\prime_{ik}(0) {\bf P}_{kj}(t)
$

For $ i \neq k$ our calculation of instantaneous transition rates gives

$\displaystyle {\bf P}^\prime_{ik}(0) = v_i {\bf P}_{ik}
$

For $ i = k$ we have

$\displaystyle P(X(h)=i\vert X(0)=i) = e^{-v_i h} +o(h)
$

($ X(h)=i$ either means $ T_i > h$ which has probability $ e^{-v_i h}$ or there have been two or more transitions in $ [0,h]$, a possibility of probability $ o(h)$.) Thus

$\displaystyle {\bf P}^\prime_{ii}(0) = -v_i
$

Let $ {\bf R}$ be the matrix with entries

$\displaystyle {\bf R}_{ij} = \begin{cases}
q_{ij} \equiv v_i {\bf P}_{ij} & i \neq j
\\
-v_i & i = j
\end{cases}$

That is

$\displaystyle {\bf R}={\bf P}^\prime(0).$

$ {\bf R}$ is the infinitesimal generator of the chain.

Thus

$\displaystyle {\bf P}^\prime(t) ={\bf P}^\prime(0) {\bf P}(t)
$

becomes

$\displaystyle {\bf P}^\prime_{ij}(t)$ $\displaystyle = \sum_k {\bf R}_{ik}{\bf P}_{kj}(t)$    
  $\displaystyle = \sum_{k \neq i } q_{ik} {\bf P}_{kj}(t) -v_i {\bf P}_{ij}(t)$    

Called Kolmogorov's backward equations.

On the other hand

$\displaystyle {\bf P}^\prime(t) = {\bf P}(t) {\bf P}^\prime(0)
$

becomes

$\displaystyle {\bf P}^\prime_{ij}(t)$ $\displaystyle = \sum_k {\bf P}_{ik}(t) {\bf R}_{kj}$    
  $\displaystyle = \sum_{k \neq j } q_{kj} {\bf P}_{ik}(t) -v_j {\bf P}_{ij}(t)$    

These are Kolmogorov's forward equations.

Remark: When the state space is infinite the forward equations may not be justified. In deriving them we interchanged a limit with an infinite sum; the interchange is always justified for the backward equations but not for the forward equations.

Example: $ S=\{0,1\}$. Then

$\displaystyle {\bf P}= \left[\begin{array}{cc} 0 & 1 \\  1 & 0 \end{array}\right]
$

and the chain is otherwise specified by $ v_0$ and $ v_1$. The matrix $ {\bf R}$ is

$\displaystyle {\bf R}= \left[\begin{array}{cc}-v_0 & v_0 \\  v_1 & -v_1 \end{array}\right]
$

The backward equations become

$\displaystyle {\bf P}_{00}^\prime(t)$ $\displaystyle = v_0 {\bf P}_{10}(t) - v_0 {\bf P}_{00}(t)$    
$\displaystyle {\bf P}_{01}^\prime(t)$ $\displaystyle = v_0 {\bf P}_{11}(t) - v_0 {\bf P}_{01}(t)$    
$\displaystyle {\bf P}_{10}^\prime(t)$ $\displaystyle = v_1 {\bf P}_{00}(t) - v_1 {\bf P}_{10}(t)$    
$\displaystyle {\bf P}_{11}^\prime(t)$ $\displaystyle = v_1 {\bf P}_{01}(t) - v_1 {\bf P}_{11}(t)$    

while the forward equations are

$\displaystyle {\bf P}_{00}^\prime(t)$ $\displaystyle = v_1 {\bf P}_{01}(t) - v_0 {\bf P}_{00}(t)$    
$\displaystyle {\bf P}_{01}^\prime(t)$ $\displaystyle = v_0 {\bf P}_{00}(t) - v_1 {\bf P}_{01}(t)$    
$\displaystyle {\bf P}_{10}^\prime(t)$ $\displaystyle = v_1 {\bf P}_{11}(t) - v_0 {\bf P}_{10}(t)$    
$\displaystyle {\bf P}_{11}^\prime(t)$ $\displaystyle = v_0 {\bf P}_{10}(t) - v_1 {\bf P}_{11}(t)$    

Add $ v_1\times$(the first backward equation) to $ v_0\times$(the third) to get

$\displaystyle v_1{\bf P}_{00}^\prime(t) + v_0{\bf P}_{10}^\prime(t) = 0
$

so

$\displaystyle v_1{\bf P}_{00}(t) + v_0{\bf P}_{10}(t) = c
$

Put $ t=0$ to get $ c=v_1$. This gives

$\displaystyle {\bf P}_{10}(t) = \frac{v_1}{v_0}\left\{1-{\bf P}_{00}(t)\right\}
$

Plug this back into the first equation to get

$\displaystyle {\bf P}_{00}^\prime(t) = v_1 -(v_1+ v_0) {\bf P}_{00}(t)
$

Multiply by $ e^{(v_1+v_0)t}$ and get

$\displaystyle \left\{e^{(v_1+v_0)t}{\bf P}_{00}(t)\right\}^\prime = v_1e^{(v_1+v_0)t}
$

which can be integrated to get

$\displaystyle {\bf P}_{00}(t)=\frac{v_1}{v_0+v_1} + \frac{v_0}{v_0+v_1}e^{-(v_1+v_0)t}
$
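A quick numerical check of this closed form against the matrix exponential $ {\bf P}(t)=e^{{\bf R}t}$ developed below (a sketch; $ v_0, v_1, t$ are arbitrary test values and scipy is assumed available):

```python
import numpy as np
from scipy.linalg import expm

v0, v1, t = 1.5, 0.7, 2.0
R = np.array([[-v0, v0], [v1, -v1]])    # infinitesimal generator

P_t = expm(R * t)                       # P(t) = exp(Rt)
closed_form = v1 / (v0 + v1) + v0 / (v0 + v1) * np.exp(-(v0 + v1) * t)
print(P_t[0, 0], closed_form)           # the two values agree
```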

Alternative calculation:

$\displaystyle {\bf R}= \left[\begin{array}{cc}-v_0 & v_0 \\  v_1 & -v_1 \end{array}\right]
$

can be written as

$\displaystyle {\bf M} {\bf\Lambda} {\bf M}^{-1}
$

where

$\displaystyle {\bf M} = \left[\begin{array}{cc} 1 & v_0\\  \\  1 & -v_1\end{array}\right]
$

$\displaystyle {\bf M}^{-1} = \left[\begin{array}{cc} \frac{v_1}{v_0+v_1} & \frac{v_0}{v_0+v_1}
\\  \\  \frac{1}{v_0+v_1} & \frac{-1}{v_0+v_1}\end{array}\right]
$

and

$\displaystyle {\bf\Lambda} = \left[\begin{array}{cc} 0 & 0\\  \\  0 & -(v_0+v_1)\end{array}\right]
$

Then

$\displaystyle e^{{\bf R}t}$ $\displaystyle = \sum_0^\infty {\bf R}^n t^n/n!$    
  $\displaystyle = \sum_0^\infty \left({\bf M} {\bf\Lambda} {\bf M}^{-1}\right)^n \frac{t^n}{n!}$    
  $\displaystyle = {\bf M} \left(\sum_0^\infty{\bf\Lambda}^n\frac{t^n}{n!}\right){\bf M}^{-1}$    

Now

$\displaystyle \sum_0^\infty{\bf\Lambda}^n\frac{t^n}{n!} = \left[\begin{array}{cc}1 & 0 \\  0&
e^{-(v_0+v_1)t}\end{array}\right]
$

so we get

$\displaystyle {\bf P}(t)$ $\displaystyle = e^{{\bf R}t}$    
  $\displaystyle = {\bf M} \left[\begin{array}{cc}1 & 0 \\ 0& e^{-(v_0+v_1)t}\end{array}\right]{\bf M}^{-1}$    
  $\displaystyle = {\bf P}^\infty - \frac{e^{-(v_0+v_1)t}}{v_0+v_1} {\bf R}$    

where

$\displaystyle {\bf P}^\infty=\left[\begin{array}{cc} \frac{v_1}{v_0+v_1} & \frac{v_0}{v_0+v_1}
\\
\\
\frac{v_1}{v_0+v_1} & \frac{v_0}{v_0+v_1}
\end{array}\right]
$

Notice: rows of $ {\bf P}^\infty$ are a stationary initial distribution. If rows are $ \pi$ then

$\displaystyle {\bf P}^\infty = \left[\begin{array}{c}1 \\  1 \end{array}\right] \pi \equiv {\bf 1}\pi
$

so

$\displaystyle \pi {\bf P}^\infty = (\pi {\bf 1})\pi = \pi
$

Moreover

$\displaystyle \pi {\bf R}= {\bf0}
$

Fact: $ \pi_0=v_1/(v_0+v_1)$ is long run fraction of time in state 0.

Fact:

$\displaystyle \frac{1}{T}\int_0^T f(X(t))dt
\to \sum_j \pi_j f(j)
$

This is the Ergodic Theorem in continuous time.

Birth and Death Processes

Consider a population of $ X(t)$ individuals. Suppose that, given $ X(t)=i$, the probability in the next time interval $ (t,t+h]$ of a population increase of 1 (called a birth) is $ \lambda_i h+o(h)$ and the probability of a decrease of 1 (a death) is $ \mu_i h +o(h)$.

Jargon: $ X$ is a birth and death process.

Special cases:

All $ \mu_i=0$; called a pure birth process.

All $ \lambda_i=0$ (0 is absorbing): pure death process.

$ \lambda_n = n\lambda$ and $ \mu_n=n\mu$ is a linear birth and death process.

$ \lambda_n \equiv \lambda$, $ \mu_n \equiv 0$: Poisson process with rate $ \lambda$.

$ \lambda_n = n\lambda + \theta$ and $ \mu_n=n\mu$ is a linear birth and death process with immigration.

Queuing Theory

Ingredients of Queuing Problem:

1: Queue input process.

2: Number of servers

3: Queue discipline: first come first serve? last in first out? pre-emptive priorities?

4: Service time distribution.

Example: Imagine customers arriving at a facility at times of a Poisson Process $ N$ with rate $ \lambda$. This is the input process, denoted $ M$ (for Markov) in queuing literature.

Single server case:

Service distribution: exponential service times, rate $ \mu$.

Queue discipline: first come first serve.

$ X(t)$ = number of customers in line at time $ t$.

$ X$ is a Markov process called the $ M/M/1$ queue:

$\displaystyle v_0 = \lambda, \qquad v_i= \lambda+\mu \quad (i \ge 1)
$

with $ {\bf P}_{01} = 1$ and, for $ i \ge 1$,

$\displaystyle {\bf P}_{ij} = \begin{cases}\frac{\mu}{\mu+\lambda} & j=i-1
\\
\frac{\lambda}{\mu+\lambda} & j=i+1
\\
0 & \text{otherwise}
\end{cases}$

Example: $ M/M/\infty$ queue:

Customers arrive according to PP rate $ \lambda$. Each customer begins service immediately. $ X(t)$ is number being served at time $ t$. $ X$ is a birth and death process with

$\displaystyle v_n= \lambda+ n \mu
$

and

$\displaystyle {\bf P}_{ij} = \begin{cases}\frac{i\mu}{i\mu+\lambda} & j=i-1 \ge 0
\\
\frac{\lambda}{i \mu+\lambda} & j=i+1
\\
0 & \text{otherwise}
\end{cases}$

Stationary Initial Distributions

We have seen that a stationary initial distribution $ \pi$ is a probability vector solving

$\displaystyle \pi {\bf R}= {\bf0}
$

Rewrite this as

$\displaystyle v_j \pi_j = \sum_{i \neq j} q_{ij} \pi_i
$

Interpretation: the LHS is the rate at which the process leaves state $ j$: the process is in state $ j$ a fraction $ \pi_j$ of the time and then makes transitions at rate $ v_j$. The RHS is the total rate of arrival in state $ j$: for each state $ i \neq j$, $ \pi_i$ is the fraction of time spent in state $ i$ and $ q_{ij}$ is the instantaneous rate of transitions from $ i$ to $ j$.

So equation says:

Rate of departure from $ j$ balances rate of arrival to $ j$. This is called balance.

Application to birth and death processes:

Equation is

$\displaystyle (\lambda_j+\mu_j) \pi_j = \lambda_{j-1} \pi_{j-1} +\mu_{j+1} \pi_{j+1}
$

for $ j \ge 1$ and

$\displaystyle \lambda_0 \pi_0 = \mu_1 \pi_1
$

Notice that this permits the recursion

$\displaystyle \pi_1$ $\displaystyle = \frac{\lambda_0}{\mu_1} \pi_0$    
$\displaystyle \pi_2$ $\displaystyle = \frac{\lambda_1+\mu_1}{\mu_2}\pi_1 -\frac{\lambda_0}{\mu_2} \pi_0$    
  $\displaystyle = \frac{\lambda_0\lambda_1}{\mu_1\mu_2} \pi_0$    

which extends by induction to

$\displaystyle \pi_k = \frac{\lambda_0\cdots \lambda_{k-1}}{\mu_1\cdots \mu_k}\pi_0
$

Apply $ \sum \pi_k = 1$ to get

$\displaystyle \pi_0 \left(1+\sum_{n=1}^\infty
\frac{\lambda_0\cdots\lambda_{n-1}}{\mu_1\cdots\mu_n}\right) = 1
$

This gives the formula announced:

$\displaystyle \pi_n = \frac{\dfrac{ \lambda_0 \cdots \lambda_{n-1}}{\mu_1\cdots\mu_n}}{\left(1+\sum_{m=1}^\infty \dfrac{ \lambda_0 \cdots \lambda_{m-1}}{\mu_1\cdots\mu_{m}}\right)}
$

If

$\displaystyle \sum_{n=1}^\infty \frac{ \lambda_0 \cdots
\lambda_{n-1}}{
\mu_1\cdots\mu_{n}}<\infty
$

then we have defined a probability vector which solves

$\displaystyle \pi {\bf R}= {\bf0}
$

Since

$\displaystyle {\bf P}^\prime = {\bf R}{\bf P}
$

we see that

$\displaystyle \left\{\pi {\bf P}(t)\right\}^\prime = 0
$

so that $ \pi{\bf P}(t)$ is constant. Put $ t=0$ to discover that the constant is $ \pi$.
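For the $ M/M/1$ queue ($ \lambda_n \equiv \lambda$, $ \mu_n \equiv \mu$) the products reduce to $ \rho^n$ with $ \rho=\lambda/\mu$, and when $ \rho<1$ the formula gives $ \pi_n = (1-\rho)\rho^n$. A numerical sketch using a truncated state space (parameter values and truncation point are arbitrary choices):

```python
import numpy as np

lam, mu, K = 1.0, 1.5, 200              # truncate the state space at K
rho = lam / mu
weights = rho ** np.arange(K + 1)       # lambda_0...lambda_{n-1} / mu_1...mu_n
pi = weights / weights.sum()

print(pi[:3])                           # matches (1 - rho) * rho**n
print((1 - rho) * rho ** np.arange(3))
```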

Brownian Motion

For fair random walk $ Y_n$ = number of heads minus number of tails,

$\displaystyle Y_n = U_1+ \cdots+U_n
$

where the $ U_i$ are independent and

$\displaystyle P(U_i=1) = P(U_i=-1)=\frac{1}{2}
$

Notice:

$\displaystyle {\rm E}(U_i)$ $\displaystyle =0$    
$\displaystyle {\rm Var}(U_i)$ $\displaystyle = 1$    

Recall central limit theorem:

$\displaystyle \frac{ U_1+ \cdots+U_n}{\sqrt{n}}\Rightarrow N(0,1)
$

Now: rescale the time axis so that $ n$ steps take 1 time unit and the vertical axis so the step size is $ 1/\sqrt{n}$.

We turn this rescaled walk into a stochastic process:

For $ \frac{k}{n} \le t < \frac{k+1}{n}$ we define

$\displaystyle X_n(t) = \frac{U_1+\cdots+U_k}{\sqrt{n}}
$

Notice:

$\displaystyle {\rm E}(X_n(t)) = 0
$

and

$\displaystyle {\rm Var}(X_n(t)) = \frac{k}{n}
$

As $ n \to \infty$ with $ t$ fixed we see $ k/n \to t$. Moreover:

$\displaystyle \frac{U_1+\cdots+U_k}{\sqrt{k}} = \sqrt{\frac{n}{k}} X_n(t)
$

converges to $ N(0,1)$ by the central limit theorem. Thus

$\displaystyle X_n(t) \Rightarrow N(0,t)
$

Another observation: $ X_n(t+s) -X_n(t)$ is independent of $ X_n(t)$ because the two rvs involve sums of different $ U_i$.

Conclusions.

As $ n \to \infty$ the processes $ X_n$ converge to a process $ X$ with the properties:

  1. $ X(t)$ has a $ N(0,t)$ distribution.

  2. $ X$ has independent increments: if

    $\displaystyle 0 = t_0 < t_1 < t_2 < \cdots < t_k
$

    then

    $\displaystyle X(t_1)-X(t_0), \ldots , X(t_k) - X(t_{k-1})
$

    are independent.

  3. The increments are stationary:

    $\displaystyle X(t+s)-X(s) \sim N(0,t)
$

    regardless of $ s$.

  4. $ X(0)=0$.

Definition: Any process satisfying 1-4 above is a Brownian motion.
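The construction above gives a direct way to draw an approximate Brownian path: simulate one rescaled random walk with $ n$ large. A minimal sketch (step count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10000
U = rng.choice([-1.0, 1.0], size=n)                     # the +/-1 steps
X = np.concatenate([[0.0], np.cumsum(U) / np.sqrt(n)])  # X_n(k/n), k=0..n

# X[k] approximates X(k/n); in particular X(1) is roughly N(0,1):
print(X[-1])
```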

Properties of Brownian motion

$\displaystyle {\rm E}(X(t)\vert X(s))$ $\displaystyle = {\rm E}\left\{X(t)-X(s)+X(s)\vert X(s)\right\}$    
  $\displaystyle = {\rm E}\left\{X(t)-X(s)\vert X(s)\right\}$    
  $\displaystyle \qquad + {\rm E}\left\{X(s)\vert X(s)\right\}$    
  $\displaystyle = 0 + X(s) =X(s)$    

Notice the use of independent increments and of $ {\rm E}(Y\vert Y)=Y$.

Suppose $ t<s$. Then $ X(s)=X(t)+\{X(s)-X(t)\}$ is a sum of two independent normal variables. Do the following calculation:

$ X\sim N(0,\sigma^2)$, and $ Y\sim N(0,\tau^2)$ independent. $ Z=X+Y$.

Compute conditional distribution of $ X$ given $ Z$:

$\displaystyle f_{X\vert Z}(x\vert z)$ $\displaystyle = \frac{f_{X,Z}(x,z)}{f_Z(z)}$    
  $\displaystyle = \frac{ f_{X,Y}(x,z-x)}{f_Z(z)}$    
  $\displaystyle = \frac{ f_X(x)f_Y(z-x)}{f_Z(z)}$    

Now $ Z$ is $ N(0,\gamma^2)$ where $ \gamma^2 = \sigma^2+\tau^2$ so

$\displaystyle f_{X\vert Z}(x\vert z)$ $\displaystyle = \frac{ \frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/(2\sigma^2)} \frac{1}{\tau\sqrt{2\pi}}e^{-(z-x)^2/(2\tau^2)}}{ \frac{1}{\gamma\sqrt{2\pi}}e^{-z^2/(2\gamma^2)}}$    
  $\displaystyle = \frac{\gamma}{\tau\sigma\sqrt{2\pi}} \exp\{-(x-a)^2/(2b^2)\}$    

for suitable choices of $ a$ and $ b$. To find them compare coefficients of $ x^2$, $ x$ and $ 1$.

Coefficient of $ x^2$:

$\displaystyle \frac{1}{b^2} = \frac{1}{\sigma^2}+\frac{1}{\tau^2}
$

so $ b= \tau \sigma/\gamma$.

Coefficient of $ x$:

$\displaystyle \frac{a}{b^2} = \frac{z}{\tau^2}
$

so that

$\displaystyle a=b^2z/\tau^2 = \frac{\sigma^2}{\sigma^2+\tau^2} z
$

Finally you should check that

$\displaystyle \frac{a^2}{b^2} = \frac{z^2}{\tau^2} -\frac{z^2}{\gamma^2}
$

to make sure the coefficients of $ 1$ work out as well.

Conclusion: given $ Z=z$ the conditional distribution of $ X$ is $ N(a,b^2) $ with $ a$ and $ b$ as above.

Application to Brownian motion: take $ \sigma^2=t$ and $ \tau^2=s-t$, so $ Z=X(s)$. Given $ X(s)=z$ the conditional distribution of $ X(t)$ is $ N\left(\frac{t}{s}z, \frac{t(s-t)}{s}\right)$.

The Reflection Principle

Tossing a fair coin:

HTHHHTHTHHTHHHTTHTH 5 more heads than tails
   
THTTTHTHTTHTTTHHTHT 5 more tails than heads

Both sequences have the same probability.

So, for a random walk restarted at a stopping time:

Any sequence with $ k$ more heads than tails in the next $ m$ tosses is matched to a sequence with $ k$ more tails than heads. Both sequences have the same probability.

Suppose $ Y_n$ is a fair ($ p = 1/2$) random walk. Define

$\displaystyle M_n = \max\{Y_k, 0 \le k \le n\}
$

Compute $ P(M_n \ge x) $? Trick: Compute

$\displaystyle P(M_n \ge x, Y_n = y)
$

First: if $ y\ge x$ then

$\displaystyle \{M_n \ge x, Y_n = y\} = \{Y_n = y\}
$

Second: if $ M_n \ge x$ then

$\displaystyle T \equiv \min\{k: Y_k=x\} \le n
$

Fix $ y < x$. Consider a sequence of H's and T's which leads to, say, $ T=k$ and $ Y_n=y$.

Switch the results of tosses $ k+1$ to $ n$ to get a sequence of H's and T's which has $ T=k$ and $ Y_n= x+(x-y)=2x-y>x$. This proves

$\displaystyle P(T=k, Y_n=y) = P(T=k,Y_n=2x-y)
$

This is true for each $ k$ so

$\displaystyle P(M_n \ge x, Y_n = y)
$ $\displaystyle = P(M_n \ge x, Y_n =2x-y)$    
  $\displaystyle = P(Y_n=2x-y)$    

Finally, sum over all $ y$ to get

$\displaystyle P(M_n \ge x)$ $\displaystyle = \sum_{y\ge x} P(Y_n=y)$    
  $\displaystyle \qquad + \sum_{y<x} P(Y_n = 2x-y)$    

Make the substitution $ k=2x-y$ in the second sum to get

$\displaystyle P(M_n \ge x)$ $\displaystyle = \sum_{y\ge x} P(Y_n=y)$    
  $\displaystyle \qquad + \sum_{k>x} P(Y_n=k)$    
  $\displaystyle = 2\sum_{k>x} P(Y_n=k) + P(Y_n=x)$    
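A Monte Carlo check of this identity (a sketch; $ n$, $ x$ and the number of replications are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(7)
n, x, reps = 20, 4, 100000
steps = rng.choice([-1, 1], size=(reps, n))
paths = steps.cumsum(axis=1)              # Y_1, ..., Y_n for each replicate

lhs = (paths.max(axis=1) >= x).mean()     # P(M_n >= x), since x > 0
Y_n = paths[:, -1]
rhs = 2 * (Y_n > x).mean() + (Y_n == x).mean()
print(lhs, rhs)                           # agree up to simulation error
```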

Brownian motion version:

$\displaystyle M_t = \max\{X(s) ; 0 \le s \le t\}
$

$\displaystyle T_x = \min\{s: X(s)=x\}
$

(called hitting time for level $ x$). Then

$\displaystyle \{T_x \le t\} = \{M_t \ge x\}
$

Any path with $ T_x=s <t$ and $ X(t)= y<x$ is matched to an equally likely path with $ T_x=s <t$ and $ X(t)=2x-y>x$.

So for $ y>x$

$\displaystyle P(M_t \ge x, X(t)>y) =
P(X(t) > y)
$

while for $ y < x$

$\displaystyle P(M_t \ge x, X(t) < y) = P(X(t) > 2x-y)
$

Let $ y\to x$ to get

$\displaystyle P(M_t \ge x, X(t)>x)$ $\displaystyle = P(M_t \ge x, X(t)<x)$    
  $\displaystyle = P(X(t)>x)$    

Adding these together gives

$\displaystyle P(M_t > x)$ $\displaystyle = 2P(X(t)>x)$    
  $\displaystyle = 2P(N(0,1)>x/\sqrt{t})$    

Hence $ M_t$ has the distribution of $ \vert N(0,t)\vert$.

On the other hand in view of

$\displaystyle \{T_x \le t\} = \{M_t \ge x\}
$

the density of $ T_x$ is

$\displaystyle \frac{d}{dt} 2P(N(0,1)>x/\sqrt{t})
$

Use the chain rule to compute this. First

$\displaystyle \frac{d}{dy} P(N(0,1)> y) = -\phi(y)
$

where $ \phi$ is the standard normal density

$\displaystyle \phi(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}}
$

because $ P(N(0,1)>y)$ is 1 minus the standard normal cdf.

So

$\displaystyle \frac{d}{dt} 2P(N(0,1)$ $\displaystyle >x/\sqrt{t})$    
  $\displaystyle = -2 \phi(x/\sqrt{t}) \frac{d}{dt} (x/\sqrt{t})$    
  $\displaystyle = \frac{x}{\sqrt{2\pi}t^{3/2}} \exp\{-x^2/(2t)\}$    

This density is called the Inverse Gaussian density. $ T_x$ is called a first passage time.

NOTE: the preceding is a density when viewed as a function of the variable $ t$.
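A simulation check that $ M_t$ has the law of $ \vert N(0,t)\vert$, i.e. $ P(M_t \ge x) = 2P(N(0,1)>x/\sqrt{t})$ (a sketch; all values are arbitrary choices, and the discrete grid biases the simulated maximum slightly downward):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(8)
t, steps, reps, x = 1.0, 1000, 4000, 1.0
dB = rng.normal(0.0, sqrt(t / steps), size=(reps, steps))  # BM increments
M = np.maximum(dB.cumsum(axis=1).max(axis=1), 0.0)         # running maximum

upper_tail = 0.5 * (1 - erf(x / sqrt(2 * t)))              # P(N(0,t) > x)
print((M >= x).mean(), 2 * upper_tail)                     # roughly equal
```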

Martingales

A stochastic process $ M(t)$ indexed by either a discrete or continuous time parameter $ t$ is a martingale if:

$\displaystyle {\rm E}\{M(t)\vert M(u); 0 \le u \le s\} = M(s)
$

whenever $ s<t$.

Examples

Note: Brownian motion with drift is a process of the form

$\displaystyle X(t) = \sigma B(t) + \mu t
$

where $ B$ is standard Brownian motion, introduced earlier. $ X$ is a martingale if $ \mu=0$. We call $ \mu$ the drift.

Some evidence for some of the above:

Random walk: $ U_1, U_2,\ldots$ iid with

$\displaystyle P(U_i=1)=P(U_i=-1) = 1/2
$

and $ Y_k=U_1+\cdots+U_k$ with $ Y_0=0$. Then

$\displaystyle {\rm E}(Y_n\vert$ $\displaystyle Y_0,\ldots,Y_k)$    
  $\displaystyle = {\rm E}(Y_n-Y_k+Y_k\vert Y_0,\ldots,Y_k)$    
  $\displaystyle = {\rm E}(Y_n-Y_k\vert Y_0,\ldots,Y_k) +Y_k$    
  $\displaystyle = \sum_{k+1}^n {\rm E}(U_j\vert U_1,\ldots,U_k) + Y_k$    
  $\displaystyle = \sum_{k+1}^n {\rm E}(U_j) +Y_k$    
  $\displaystyle = Y_k$    

Things to notice:

$ Y_k$ treated as constant given $ Y_1,\ldots,Y_k$.

Knowing $ Y_1,\ldots,Y_k$ is equivalent to knowing $ U_1,\ldots,U_k$.

For $ j>k$ we have $ U_j$ independent of $ U_1,\ldots,U_k$ so conditional expectation is unconditional expectation.

Since Standard Brownian Motion is limit of such random walks we get martingale property for standard Brownian motion.

Poisson Process: $ X(t) = N(t) -\lambda t$. Fix $ t>s$.

$\displaystyle {\rm E}(X(t)$ $\displaystyle \vert X(u); 0 \le u \le s)$    
  $\displaystyle = {\rm E}(X(t) - X(s)+X(s) \vert {\cal H}_s)$    
  $\displaystyle = {\rm E}(X(t) - X(s)\vert {\cal H}_s) +X(s)$    
  $\displaystyle = {\rm E}(N(t) - N(s) -\lambda(t-s)\vert {\cal H}_s) +X(s)$    
  $\displaystyle = {\rm E}(N(t) - N(s)) -\lambda(t-s) +X(s)$    
  $\displaystyle = \lambda(t-s) -\lambda(t-s)+X(s)$    
  $\displaystyle = X(s)$    

Things to notice:

I used independent increments.

$ {\cal H}_s$ is shorthand for the conditioning event.

Similar to random walk calculation.

Black Scholes

We model the price of a stock as

$\displaystyle X(t) = x_0 e^{Y(t)}
$

where

$\displaystyle Y(t) = \sigma B(t) + \mu t
$

is a Brownian motion with drift ($ B$ is standard Brownian motion).

If annual interest rates are $ e^\alpha-1$ we call $ \alpha$ the instantaneous interest rate; if we invest $1 at time 0 then at time $ t$ we would have $ e^{\alpha t}$. In this sense an amount of money $ x(t)$ to be paid at time $ t$ is worth only $ e^{-\alpha t} x(t)$ at time 0 (because that much money at time 0 will grow to $ x(t)$ by time $ t$).

Present Value: If the stock price at time $ t$ is $ X(t)$ per share then the present value of 1 share to be delivered at time $ t$ is

$\displaystyle Z(t) = e^{-\alpha t} X(t)
$

With $ X$ as above we see

$\displaystyle Z(t) = x_0 e^{\sigma B(t) +(\mu-\alpha)t}
$

Now we compute

\begin{multline*}
{\rm E}\left\{ Z(t) \vert Z(u);0 \le u \le s\right\}
\\
=
{\rm E}\left\{ Z(t) \vert B(u);0 \le u \le s\right\}
\end{multline*}

for $ s<t$. Write

$\displaystyle Z(t) = x_0 e^{\sigma B(s) +(\mu-\alpha)t} \times e^{\sigma(B(t)-B(s))}
$

Since $ B$ has independent increments we find

\begin{multline*}
{\rm E}\left\{ Z(t) \vert B(u);0 \le u \le s\right\}
\\
= x_0 e^{\sigma B(s) +(\mu-\alpha)t} \times
{\rm E}\left[e^{\sigma\{B(t)-B(s)\}}\right]
\end{multline*}

Note: $ B(t)-B(s)$ is $ N(0,t-s)$; the expected value needed is the moment generating function of this variable at $ \sigma$.

Suppose $ U\sim N(0,1)$. The Moment Generating Function of $ U$ is

$\displaystyle M_U(r) = {\rm E}(e^{rU}) = e^{r^2/2}
$

Rewrite

$\displaystyle \sigma\{B(t)-B(s)\} = \sigma\sqrt{t-s} U
$

where $ U\sim N(0,1)$ to see

$\displaystyle {\rm E}\left[e^{\sigma\{B(t)-B(s)\}}\right]= e^{\sigma^2(t-s)/2}
$

Finally we get

$\displaystyle {\rm E}\{ Z(t)$ $\displaystyle \vert Z(u);0 \le u \le s\}$    
  $\displaystyle = x_0 e^{\sigma B(s) +(\mu-\alpha)s} e^{(\mu-\alpha)(t-s)+\sigma^2(t-s)/2}$    
  $\displaystyle = Z(s)$    

provided

$\displaystyle \mu+\sigma^2/2=\alpha \, .
$

If this identity is satisfied then the present value of the stock price is a martingale.

Option Pricing

Suppose you can pay $ c$ dollars today for the right to pay $ K$ for a share of this stock at time $ t$ (regardless of the actual price at time $ t$).

If, at time $ t$, $ X(t) >K$ you will exercise your option and buy the share making $ X(t)-K$ dollars.

If $ X(t) \le K$ you will not exercise your option; it becomes worthless.

The present value of this option is

$\displaystyle e^{-\alpha t}(X(t) - K)_+ - c
$

where

$\displaystyle z_+ = \begin{cases}z& z>0 \\  0 & z \le 0 \end{cases}$

(Called positive part of $ z$.)

In a fair market the expected present value of holding the option is zero. So:

$\displaystyle c = {\rm E}\left[ e^{-\alpha t}\left\{X(t) - K\right\}_+\right]
$

Since

$\displaystyle X(t) = x_0e^{N(\mu t, \sigma^2 t)}
$

we are to compute

$\displaystyle {\rm E}\left\{\left(x_0e^{\sigma t^{1/2} U +\mu t}-K\right)_+\right\}
$

This is

$\displaystyle \int_a^\infty \left(x_0e^{bu+d}-K \right) e^{-u^2/2} du/\sqrt{2\pi}
$

where

$\displaystyle a$ $\displaystyle = (\log(K/x_0)-\mu t)/(\sigma t^{1/2})$    
$\displaystyle b$ $\displaystyle = \sigma t^{1/2}$    
$\displaystyle d$ $\displaystyle = \mu t$    

Evidently

$\displaystyle K \int_a^\infty e^{-u^2/2} du/\sqrt{2\pi} = KP(N(0,1) > a)
$

The other integral needed is

$\displaystyle \int_a^\infty e^{ -u^2/2+bu}$ $\displaystyle du/\sqrt{2\pi}$    
  $\displaystyle = \int_a^\infty \frac{e^{-(u-b)^2/2}e^{b^2/2}}{\sqrt{2\pi}} du$    
  $\displaystyle = \int_{a-b}^\infty \frac{e^{-v^2/2}e^{b^2/2}}{\sqrt{2\pi}} dv$    
  $\displaystyle = e^{b^2/2} P(N(0,1) > a-b)$    

Introduce the notation

$\displaystyle \Phi(v) = P(N(0,1) \le v) = P(N(0,1) > -v)
$

and do all the algebra to get

$\displaystyle c$ $\displaystyle = \left\{x_0e^{-\alpha t}e^{b^2/2+d} \Phi(b-a) -Ke^{-\alpha t}\Phi(-a)\right\}$    
  $\displaystyle = \left\{x_0e^{(\mu+\sigma^2/2-\alpha)t}\Phi(b-a) - Ke^{-\alpha t}\Phi(-a)\right\}$    
  $\displaystyle = \left\{x_0\Phi(b-a) - Ke^{-\alpha t}\Phi(-a)\right\}$    

This is the Black-Scholes option pricing formula.
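In code, with $ \mu$ eliminated via the martingale condition $ \mu = \alpha - \sigma^2/2$ (a sketch; the parameter values are purely illustrative):

```python
from math import log, sqrt, exp, erf

def Phi(v):
    # standard normal cdf
    return 0.5 * (1 + erf(v / sqrt(2)))

def option_price(x0, K, alpha, sigma, t):
    mu = alpha - sigma**2 / 2           # fair-market (martingale) condition
    b = sigma * sqrt(t)
    a = (log(K / x0) - mu * t) / b
    return x0 * Phi(b - a) - K * exp(-alpha * t) * Phi(-a)

print(option_price(x0=100.0, K=105.0, alpha=0.05, sigma=0.2, t=1.0))
```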

Simulation

Monte Carlo computation of expected value:

To compute $ {\rm E}(X)$: do the experiment $ n$ times to generate $ n$ independent observations, all with the same distribution as $ X$.

Get $ X_1,\ldots,X_n$.

Use

$\displaystyle \bar{X} = \frac{1}{n} \sum_1^n X_i
$

as an estimate of $ {\rm E}(X)$.

To estimate $ {\rm E}(g(X))$: use same $ X_i$ and compute

$\displaystyle \frac{1}{n} \sum_1^n g(X_i) \, .
$

Random Number Generation

In practice: random quantities are not used.

Use: pseudo-random uniform random numbers.

They ``behave like'' iid Uniform(0,1) variables.

Many, many generators. One standard kind: linear congruential generators.

Start with $ x_0$ an integer in range $ 0 \le x_0 < m$. Compute

$\displaystyle x_{n+1} = a x_n+b \mod{m}
$

Here the mod means compute the remainder after division by $ m$.

Integers $ a$ and $ b$ are chosen so that the sequence has a long period (ideally $ m$) and the resulting $ U_n$ below behave like iid uniforms.

Use

$\displaystyle U_n = \frac{x_n}{m}
$

as Uniform(0,1) random numbers.
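A minimal linear congruential generator (a sketch; the constants are the classic ``minimal standard'' choices $ m=2^{31}-1$, $ a=16807$, $ b=0$, not something prescribed in these notes):

```python
def lcg(x0, n, a=16807, b=0, m=2**31 - 1):
    us, x = [], x0
    for _ in range(n):
        x = (a * x + b) % m      # x_{n+1} = a x_n + b mod m
        us.append(x / m)         # U_n = x_n / m
    return us

print(lcg(x0=12345, n=5))
```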

General Random Variables

We now pretend we can generate $ U_1, U_2,\ldots$ which are iid Uniform(0,1).

Monte Carlo methods for generating a sample:

Inverse (Probability Integral) Transform

Fact: if $ F$ is a continuous CDF and $ X,U$ satisfy

$\displaystyle U=F(X)
$

then $ U\sim$Uniform(0,1) if and only if $ X\sim F$.

Proof: For simplicity: assume $ F$ strictly increasing on an interval $ (a,b)$ with $ F(b)=1$ and $ F(a)=0$.

If $ U\sim$Uniform(0,1) then

$\displaystyle P(X \le x)$ $\displaystyle = P(F(X) \le F(x))$    
  $\displaystyle = P(U \le F(x))$    
  $\displaystyle = F(x)$    

Conversely: if $ X\sim F$ and $ 0 < u < 1$ then there is a unique $ x$ such that $ F(x) = u$. Then

$\displaystyle P(U \le u)$ $\displaystyle = P(F(X) \le F(x))$    
  $\displaystyle = P(X \le x)$    
  $\displaystyle = F(x)$    
  $\displaystyle = u$    

Application: generate $ U$ and solve $ F(X)=U$ for $ X$ to get $ X=F^{-1}(U)$ which has cdf $ F$.

Example: For the exponential distribution

$\displaystyle F(x) = 1-e^{-\lambda x}
$

Set $ F(X)=U$ and solve to get

$\displaystyle X=-\log(1-U)/\lambda
$

Observation: $ U \sim 1-U$ so $ X=-\log(U)/\lambda$ also has an Exponential($ \lambda$) distribution.
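The exponential case in code (a sketch; the rate and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
lam = 2.0
U = rng.uniform(size=100000)
X = -np.log(U) / lam          # inverse transform, using U ~ 1 - U
print(X.mean(), 1 / lam)      # both close to 0.5
```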

Example: For $ F$ the standard normal cdf, solving

$\displaystyle F(X) = U
$

requires numerical approximation but such solutions are built in to many languages.

Special purpose transformations:

Example: If $ X$ and $ Y$ are iid $ N(0,1)$ define $ R\ge 0$ and $ \Theta\in[0,2\pi)$ by

$\displaystyle X=R\cos(\Theta) \qquad Y=R\sin(\Theta)
$

NOTE: Book says

$\displaystyle \Theta = \tan^{-1}(Y/X)
$

but this takes values in $ (-\pi/2,\pi/2)$; notation

$\displaystyle \tan^{-1}(x,y)
$

means angle $ \theta$ in $ [0,2\pi)$ so that

$\displaystyle \frac{x}{\sqrt{x^2+y^2}} = \cos(\theta) \qquad \frac{y}{\sqrt{x^2+y^2}} =
\sin(\theta) \, .
$

Then

$\displaystyle \{(r,\theta) : r \le r^*, \theta \le \theta^*\}
$

is a sector of the disc of radius $ r^*$. (Start at $ (r^*,0)$ and sweep counterclockwise through angle $ \theta^*$.)

Compute joint cdf of $ R,\Theta$ at $ r^*,\theta^*$.

Define

\begin{multline*}
C = \{(x,y) : x^2+y^2 \le (r^*)^2,\\ 0 \le \tan^{-1}(x,y) \le \theta^*\}
\end{multline*}

Then

$\displaystyle P(R\le r,$ $\displaystyle \Theta \le\theta)$    
  $\displaystyle =\iint_C \frac{e^{-(x^2+y^2)/2}}{2\pi} dx dy$    

Do integral in polar co-ordinates to get

$\displaystyle \int_0^{\theta^*}\int_0^{r^*} \frac{e^{-r^2/2}}{2\pi} r\, dr\, d\theta
$

The $ \theta$ integral just gives

$\displaystyle \theta^*/(2\pi)
$

The $ r$ integral then gives

$\displaystyle 1-e^{-(r^*)^2/2}
$

So

$\displaystyle P(R\le r^*,\Theta \le\theta^*)
= \left(1-e^{-(r^*)^2/2}\right) \times \frac{\theta^*}{2\pi}
$

Product of two cdfs so: $ R $ and $ \Theta$ are independent.

Moreover $ R^2$ has cdf

$\displaystyle P(R^2 \le x) = 1-e^{-x/2}
$

which is Exponential with rate $ 1/2$.

$ \Theta$ is Uniform$ (0,2\pi)$.

So: generate $ U_1, U_2$ iid Uniform(0,1).

Define

$\displaystyle \Theta = 2\pi U_2
$

and

$\displaystyle R=\sqrt{-2\log(U_1)}
$

Put

$\displaystyle X=R\cos(\Theta) \qquad Y=R\sin(\Theta)
$

You have generated two independent $ N(0,1)$ variables.
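The recipe in code (a sketch; seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
U1, U2 = rng.uniform(size=100000), rng.uniform(size=100000)
R = np.sqrt(-2 * np.log(U1))        # R^2 ~ Exponential(1/2)
Theta = 2 * np.pi * U2              # Theta ~ Uniform(0, 2*pi)
X, Y = R * np.cos(Theta), R * np.sin(Theta)
print(X.mean(), X.std(), np.corrcoef(X, Y)[0, 1])   # ~0, ~1, ~0
```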

Acceptance Rejection

If $ F(X)=U$ is difficult to solve (e.g. $ F$ is hard to compute) we sometimes use acceptance rejection.

Goal: generate $ X\sim F$ where $ F^\prime=f$.

Tactic: find density $ g$ and constant $ c$ such that

$\displaystyle f(x) \le c g(x)
$

for all $ x$, and such that we can generate $ Y$ from the density $ g$.

Algorithm:

  1. Generate $ Y$ which has density $ g$.

  2. Generate $ U\sim$Uniform(0,1) independent of $ Y$.

  3. If $ U \le f(Y)/\{cg(Y)\}$ let $ X=Y$.

  4. If not reject $ Y$ and go back to step 1; repeat, generating new $ Y$ and $ U$ (independently) till you accept a $ Y$.

Facts: each iteration accepts with probability $ 1/c$ (computed below), so the number of iterations needed to produce $ X$ is Geometric with mean $ c$.

Most important fact: $ X\sim f$:

Proof: this is like the old craps example:

We compute

$\displaystyle P( X \le x) = P[ Y \le x\vert U \le f(Y)/\{cg(Y)\}]
$

(condition on the first iteration where the condition is met).

Condition on $ Y$ to get

$\displaystyle P($ $\displaystyle X \le x)$    
  $\displaystyle = \frac{\int_{-\infty}^x P[U \le f(Y)/\{cg(Y)\}\vert Y=y]g(y)dy}{ P[U \le f(Y)/\{cg(Y)\}]}$    
  $\displaystyle = \frac{\int_{-\infty}^x \frac{ f(y)}{cg(y)} g(y)dy}{ 1/c}$    
  $\displaystyle = \int_{-\infty}^x f(y) dy$    
  $\displaystyle = F(x)$    

as desired.

Example: Half normal distribution: $ X$ has density

$\displaystyle f(x) = \frac{2e^{-x^2/2}}{\sqrt{2\pi}}1(x>0)
$

(Density of absolute value of $ N(0,1)$.)

Use $ g(x) = ae^{-ax}1(x>0)$. To find $ c$ maximize

$\displaystyle \frac{f(x)}{g(x)} = \frac{2e^{-x^2/2+ax}}{a\sqrt{2\pi}}
$

Rewrite as

$\displaystyle \frac{2e^{a^2/2} e^{-(x-a)^2/2}}{a\sqrt{2\pi}}
$

Maximum is at $ x=a$ giving

$\displaystyle c=\frac{2e^{a^2/2}}{a\sqrt{2\pi}}
$

Choose $ a$ to minimize $ c$ (or $ \log(c)$); get $ a=1$ so

$\displaystyle c=\frac{2e^{1/2}}{\sqrt{2\pi}} = \sqrt{\frac{2e}{\pi}}
$

Algorithm is then: generate $ Y=-\log(U_1)$ and then compute

$\displaystyle e^{-(Y-1)^2/2}
$

Generate $ U_2$ and if

$\displaystyle U_2 \le e^{-(Y-1)^2/2}
$

accept $ Y$; otherwise try again.

To generate $ N(0,1)$: use a third uniform $ U_3$ to pick a sign at random: negative if $ U_3 < 1/2$, positive otherwise.
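The whole algorithm in code (a sketch following the steps above; seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(13)

def half_normal():
    while True:
        Y = -np.log(rng.uniform())                  # Y ~ Exponential(1)
        if rng.uniform() <= np.exp(-(Y - 1) ** 2 / 2):
            return Y                                # accepted: Y ~ |N(0,1)|

def std_normal():
    Y = half_normal()
    return -Y if rng.uniform() < 0.5 else Y         # attach a random sign

sample = np.array([std_normal() for _ in range(20000)])
print(sample.mean(), sample.std())                  # near 0 and 1
```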

Discrete Distributions

The inverse transformation method works for discrete distributions.

If $ X$ has possible values $ \{x_1,x_2,\ldots\}$ with probabilities $ \{p_1,p_2,\ldots\}$, compute the cumulative probabilities

$\displaystyle P_k = p_1+\cdots+p_k
$

with $ P_0=0$.

Generate $ U\sim$Uniform(0,1).

Find $ k$ such that

$\displaystyle P_{k-1} < U \le P_k
$

Put $ X=x_k$.
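In code (a sketch; the values and probabilities are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(14)
values = np.array([0, 1, 2, 5])
probs = np.array([0.1, 0.4, 0.3, 0.2])
P = np.cumsum(probs)                       # P_1, ..., P_K

U = rng.uniform(size=100000)
X = values[np.searchsorted(P, U)]          # k with P_{k-1} < U <= P_k
print([np.mean(X == v) for v in values])   # close to probs
```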



Richard Lockhart
2002-02-07