STAT 380: Introduction to Stochastic Processes

Course outline:

• Review of basic concepts of probability:

• Distributions

• Expectation and moments

• Moment generating functions

• Independence, conditioning

• Markov Chains

• Poisson Processes

• Birth and Death Processes

• Brownian Motion

• Monte Carlo Simulation techniques

• Random number generators

• Generating random variables

Today's lecture Summary
• Three examples to review

• basic probability: sample space, events, random variables, expected value

• standard distributions: Binomial, Geometric, Poisson

• conditional probability, independence

• Bayes' rule

• Introduction to modelling

Some Basic Examples

Example 1: Three cards: one red on both sides, one black on both sides, one black on one side, red on the other. Shuffle, pick card at random. Side up is Black. What is the probability the side down is Black?

Solution: To do this carefully, enumerate the sample space, $\Omega$, of all possible outcomes. There are six sides to the three cards. Label the three red sides 1, 2, 3 with sides 1, 2 on the all red card (card #1). Label the three black sides 4, 5, 6 with 3, 4 on opposite sides of the mixed card (card #2). Define some events:

$A_i$ = side $i$ shows face up; $B$ = side showing is black; $C_j$ = card $j$ is chosen.

One representation is $\Omega=\{1,2,3,4,5,6\}$ (the label of the side facing up). Then $A_i=\{i\}$, $B=\{4,5,6\}$, $C_1=\{1,2\}$, and so on.

Modelling: make assumptions about some probabilities; deduce probabilities of other events. In this example the simplest model is

All of the $A_i$ are equally likely.

Apply two rules:

$P(\Omega)=1$

and

$P(A_1\cup\cdots\cup A_6)=P(A_1)+\cdots+P(A_6)$ (the $A_i$ are disjoint)

to get, for $i=1,\ldots,6$,

$P(A_i)=\frac16.$

The question was about the down side of the card. We have been told $B$ has happened. The event that a black side is down is $D=\{3,5,6\}$: if side 3, 5 or 6 is up, the side down is black. (Of course, that $B$ has happened rules out 3.)

Definition of conditional probability:

$P(D\mid B)=\frac{P(D\cap B)}{P(B)}=\frac{P(\{5,6\})}{P(\{4,5,6\})}=\frac{2/6}{3/6}=\frac23.$
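A quick simulation (Python here, as a supplement to the Maple used later in these notes) confirms the answer 2/3 rather than the tempting 1/2:

```python
import random

def three_card_trial(rng):
    # Card 1 both red, card 2 mixed, card 3 both black; pick a card,
    # then pick which side faces up, each uniformly at random.
    cards = [("R", "R"), ("R", "B"), ("B", "B")]
    card = rng.choice(cards)
    up, down = card if rng.random() < 0.5 else (card[1], card[0])
    return up, down

rng = random.Random(1)
shown_black = down_black = 0
for _ in range(100_000):
    up, down = three_card_trial(rng)
    if up == "B":            # condition on the event B: up side is black
        shown_black += 1
        if down == "B":
            down_black += 1
print(down_black / shown_black)  # close to 2/3
```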

Example 2: Monte Hall, Let's Make a Deal. Monte shows you 3 doors. Prize hidden behind one. You pick a door. Monte opens a door you didn't pick; shows you no prize; offers to let you switch to the third door. Do you switch?

Sample space: a typical element is $(p,c,m)$ where $p$ is the number of the door with the prize, $c$ is the number of your first pick and $m$ is the door Monte opens with no prize.

$\Omega$ = {(1,1,2), (1,1,3), (1,2,3), (1,3,2), (2,1,3), (2,2,1), (2,2,3), (2,3,1), (3,1,2), (3,2,1), (3,3,1), (3,3,2)}

Model? Traditionally we define events like

$D_i$ = {Prize behind door $i$}, $C_j$ = {You choose door $j$}

and assume that each has chance $1/3$. We are assuming we have no prior reason to suppose Monte favours one door. But these and all other probabilities depend on the behaviour of people so are open to discussion.

The event $L$, that you lose if you switch, is

$L=\{(1,1,2),(1,1,3),(2,2,1),(2,2,3),(3,3,1),(3,3,2)\}.$

The natural modelling assumption, which captures the idea that you have no idea where the prize is hidden, is that each $D_i$ is independent of each $C_j$, that is,

$P(D_i\cap C_j)=P(D_i)P(C_j)=\frac19.$

Usually we would phrase this assumption in terms of two random variables, $D$, the door with the prize, and $C$, the door you choose. We are assuming that $D$ and $C$ are independent. Then

$P(L)=P(D_1\cap C_1)+P(D_2\cap C_2)+P(D_3\cap C_3)=\frac39=\frac13.$

So the event you win by switching has probability 2/3 and you should switch.
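The 2/3 is easy to check by simulation (a Python sketch, assuming Monte always opens a non-prize door you did not pick):

```python
import random

def switch_wins(rng):
    prize = rng.randrange(3)
    pick = rng.randrange(3)
    # Monte opens a door that is neither your pick nor the prize.
    opened = rng.choice([d for d in range(3) if d != pick and d != prize])
    # Switch to the remaining door.
    switched = next(d for d in range(3) if d != pick and d != opened)
    return switched == prize

rng = random.Random(2)
trials = 100_000
wins = sum(switch_wins(rng) for _ in range(trials))
print(wins / trials)  # near 2/3
```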

Usual phrasing of the problem: You pick door 1, Monte shows you door 3. Should you take door 2? Let $M$ be the rv giving the door Monte shows you (with $D_i$ = prize behind door $i$, $C_1$ = you chose door 1). Question:

$P(D_2\mid C_1\cap\{M=3\})=?$

The modelling assumptions do not determine this; it depends on Monte's method for choosing the door to show when he has a choice. Two simple cases:

1. Monte picks at random so

$P(M=3\mid D_1\cap C_1)=\frac12.$

2. Monte chooses the door with the largest possible number:

$P(M=3\mid D_1\cap C_1)=1.$

Use Bayes' rule:

$P(D_2\mid C_1\cap\{M=3\})=\frac{P(M=3\mid D_2\cap C_1)P(D_2\cap C_1)}{P(\{M=3\}\cap C_1)}.$

Numerator is

$1\times\frac19=\frac19$

(when the prize is behind door 2 and you picked door 1, Monte has no choice but to show door 3). Denominator is

$\sum_{i=1}^3P(M=3\mid D_i\cap C_1)P(D_i\cap C_1)$

which simplifies to

$\frac19\left\{P(M=3\mid D_1\cap C_1)+1+0\right\}$

which in turn is

$\frac19\left\{P(M=3\mid D_1\cap C_1)+1\right\}.$

For case 1 we get

$P(D_2\mid C_1\cap\{M=3\})=\frac{1/9}{(1/9)(1/2+1)}=\frac23,$

while for case 2 we get

$P(D_2\mid C_1\cap\{M=3\})=\frac{1/9}{(1/9)(1+1)}=\frac12.$
Notice that in case 2 if we pick door 1 and Monte shows us door 2 we should definitely switch. Notice also that it would be normal to assume that Monte used the case 1 algorithm to pick the door to show when he has a choice; otherwise he is giving the contestant information. If the contestant knows Monte is using algorithm 2 then by switching if door 2 is shown and not if door 3 is shown he wins 2/3 of the time which is as good as the always switch strategy.

Example 3: Survival of family names. Traditionally: family name follows sons. Given man at end of 20th century. Probability descendant (male) with same last name alive at end of 21st century? or end of 30th century?

Simplified model: count generations not years. Compute the probability, $P_n$, of survival of the name for $n$ generations.

Technically it is easier to compute $x_n=1-P_n$, the probability of extinction by generation $n$.

Useful rvs:

$N$ = # of male children of first man

$N_k$ = # of male children in generation $k$

Event of interest:

$\{\text{name extinct by generation }n\}=\{N_n=0\}.$

Break up $\{N_n=0\}$ according to the value of $N$:

$x_n=P(N_n=0)=\sum_kP(N=k)P(N_n=0\mid N=k).$

Now look at the event $\{N_n=0\}\cap\{N=k\}$. Let

$E_i$ = event that child $i$'s line is extinct in $n-1$ generations.

Then we make two modelling assumptions:

1. Given ($=$ conditional on) $N=k$ the events $E_1,\ldots,E_k$ are independent. In other words: one son's descendants don't affect other sons' descendants.

2. Given $N=k$ the probability of each $E_i$ is $x_{n-1}$. In other words: sons are just like the parent.

Probability generating function:

$\phi(s)=E(s^N)=\sum_kP(N=k)s^k.$

We have found

$P(N_n=0\mid N=k)=x_{n-1}^k$

and

$x_n=\sum_kP(N=k)x_{n-1}^k=\phi(x_{n-1}).$

Notice that $x_n$ is non-decreasing so that the limit of the $x_n$, say $x$, must exist and (because $\phi$ is continuous) solve

$x=\phi(x).$

Special cases

Geometric Distribution: Assume

$P(N=k)=pq^k,\quad k=0,1,2,\ldots$

($N$ is the number of failures before the first success. Trials are Bernoulli; $p$ is the probability of success and $q=1-p$.)

Then

$\phi(s)=\sum_{k=0}^\infty pq^ks^k=\frac{p}{1-qs}.$

Set $s=\phi(s)$ to get

$qs^2-s+p=0.$

Two roots are

$s=\frac{1\pm\sqrt{1-4pq}}{2q}=\frac{1\pm|q-p|}{2q}.$

One of the roots is 1; the other is

$s=\frac{p}{q}.$

If $p\ge q$ the only root which is a probability is 1 and $x=1$: extinction is certain. If $p<q$ then in fact $x=p/q$ (the limit of the $x_n$ is the smallest root in $[0,1]$).
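The recursion $x_n=\phi(x_{n-1})$ can be iterated numerically; a short Python sketch (the parameter values $p=0.4$, $q=0.6$ are illustrative choices, not from the notes) converges to the smaller root $p/q$:

```python
def extinction_prob(phi, n_iter=200):
    # Iterate x_n = phi(x_{n-1}) starting from x_0 = 0.
    x = 0.0
    for _ in range(n_iter):
        x = phi(x)
    return x

p, q = 0.4, 0.6                       # success prob p < q, so extinction prob is p/q
phi = lambda s: p / (1 - q * s)       # geometric offspring pgf
print(extinction_prob(phi))           # ≈ p/q = 2/3
```

With $p\ge q$ the same iteration climbs to 1, matching the "extinction is certain" case.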

Binomial($n$, $p$): If

$P(N=k)=\binom{n}{k}p^kq^{n-k},\quad k=0,\ldots,n$

then

$\phi(s)=(q+ps)^n.$

The equation $s=\phi(s)$ has two roots in $[0,1]$. One is 1. The other is less than 1 if and only if the mean $\phi'(1)=np$ is greater than 1.

Poisson($\lambda$): Now

$P(N=k)=\frac{e^{-\lambda}\lambda^k}{k!}$

and

$\phi(s)=e^{\lambda(s-1)}.$

The equation $s=\phi(s)$ has two roots. One is 1. The other is less than 1 if and only if the mean $\phi'(1)=\lambda$ is greater than 1.

Important Points:

• Use of conditioning.

• Approximate nature of modelling assumptions

• Key assumptions of conditional independence, homogeneity

Example 3: Mean values

$N_n$ = total number of sons in generation $n$.

$N_0=1$ for convenience.

Compute $E(N_n)$.

Recall the definition of expected value:

If $X$ is discrete then

$E(X)=\sum_x xP(X=x).$

If $X$ is absolutely continuous then

$E(X)=\int_{-\infty}^\infty xf(x)\,dx.$

Theorem: If $Y=g(X)$, $X$ has density $f$ then

$E(Y)=\int g(x)f(x)\,dx.$

Key properties of $E$:

1: If $X\ge 0$ then $E(X)\ge 0$. Equals 0 iff $P(X=0)=1$.

2: $E(aX+bY)=aE(X)+bE(Y)$.

3: If $X$ and $Y$ are independent then

$E(XY)=E(X)E(Y).$

4: $E(c)=c$ for any constant $c$.

Conditional Expectations

If , , two discrete random variables then

Extension to absolutely continuous case:

Joint pmf of and is defined as

Notice: The pmf of is

Analogue for densities: joint density of is

Interpretation is that

Property: if have joint density then has density

Sums for discrete rvs are replaced by integrals.

Example:

is a density because

The marginal density of is, for .

For not in the integral is 0 so

Conditional Densities

If and have joint density then we define the conditional density of given by analogy with our interpretation of densities. We take limits:

in the sense that if we divide through by and let and tend to 0 the conditional density is the limit

Going back to our interpretation of joint densities and ordinary densities we see that our definition is just

When talking about a pair and of random variables we refer to as the joint density and to as the marginal density of .

Example: For of previous example conditional density of given defined only for :

Example: $N$ a Poisson($\lambda$) random variable. Observe $N=n$, then toss a coin (heads probability $p$) $n$ times. $X$ is the number of heads. Then

$P(X=x)=\sum_nP(X=x\mid N=n)P(N=n)=\sum_n\binom{n}{x}p^x(1-p)^{n-x}\,\frac{e^{-\lambda}\lambda^n}{n!}.$

WARNING: in the sum $n\ge x$ is required and $n$, $x$ are integers, so the sum really runs from $n=x$ to $\infty$:

$P(X=x)=\frac{e^{-\lambda}(\lambda p)^x}{x!}\sum_{n=x}^\infty\frac{\{\lambda(1-p)\}^{n-x}}{(n-x)!}=\frac{e^{-\lambda p}(\lambda p)^x}{x!}$

which is a Poisson($\lambda p$) distribution.

Conditional Expectations

If and are continuous random variables with joint density we define:

Key properties of conditional expectation

1: If then . Equals iff .

2: .

3: If and are independent then

4: .

Example:

has conditional of :

so, for ,

Computing expectations by conditioning:

Notation: $E(Y\mid X)$ is the function of $X$ you get by working out $E(Y\mid X=x)$, getting a formula in $x$ and replacing $x$ by $X$. This makes $E(Y\mid X)$ a random variable.

Properties:

1: .

2: If and are independent then

3: .

4: $E\{E(Y\mid X)\}=E(Y)$ (compute average holding $X$ fixed first, then average over $X$).

In example:

Application to last names problem. Put

For expect exponential decay. For exponential growth (if not extinction).

Summary of Probability Review

We have reviewed the following definitions:

• Probability Space (or Sample Space): set of possible outcomes, $\Omega$.

• Events: subsets $A$ of $\Omega$.

• Probability: function $P$ defined for events, values in $[0,1]$, satisfying:

1. $P(\Omega)=1$ and $P(\emptyset)=0$.

2. Countable additivity: if $A_1,A_2,\ldots$ are pairwise disjoint ($A_i\cap A_j=\emptyset$ for $i\ne j$) then

$P\left(\bigcup_iA_i\right)=\sum_iP(A_i).$

• Joint, marginal, conditional pmfs: discrete:

• Joint pmf

• Marginal pmf

• Conditional pmf

• Joint, marginal, conditional densities: continuous:

• Joint density ; probabilities are double integrals

• Marginal density

• Conditional density

• Expected value:

or

• Conditional expectation

or

• is previous with replaced by after getting formula.

Tactics:

• Convert English expressions to set notation:

• "Or" means union (remember inclusive or).

• "And" means intersection.

• "Not" means complement.

• Compute prob something happens as 1-prob it doesn't.

• Break up event into disjoint pieces and add up probabilities.

• Find independent events and write event as intersection.

• Find , for which is known.

• Use Bayes rule: if is a partition then

• Use first step analysis. (See family names example, craps game, alternating dice tossing on asst 1.)

Tactics for expected values:

• Use linearity:

• Condition on another variable and use

• Use first step analysis. (Special case of previous.)

Markov Chains

The last names example has the following structure: if, at generation $n$, there are $k$ individuals then the number of sons in the next generation has the distribution of the sum of $k$ independent copies of the random variable $N$ which is the number of sons I have. This distribution does not depend on the sizes of earlier generations, only on the value of $k$. We call such a process a Markov Chain.

Ingredients of a Markov Chain:

• A state space . will be finite or countable in this course.

• A sequence of random variables whose values are all in .

• A matrix $P$ with entries $P_{ij}$ for $i,j\in S$. $P$ is required to be stochastic: for all $i,j$:

$P_{ij}\ge0$

and

$\sum_{j\in S}P_{ij}=1.$

The stochastic process $X_0,X_1,X_2,\ldots$ is called a Markov chain if

$P(X_{n+1}=j\mid X_n=i,A)=P(X_{n+1}=j\mid X_n=i)=P_{ij}$

for any event $A$ defined in terms of $X_0,\ldots,X_{n-1}$ and for all $n$, $i$, $j$. Usually used with

$A=\{X_0=i_0,\ldots,X_{n-1}=i_{n-1}\}$

for some $i_0,\ldots,i_{n-1}$.

The matrix is called a transition matrix.

Example: If in the last names example has a Poisson distribution then given , is like sum of independent Poisson rvs which has a Poisson() distribution. So

Example: Weather: each day is dry (D) or wet (W).

is weather on day n.

Suppose dry days tend to be followed by dry days, say 3 times in 5 and wet days by wet 4 times in 5.

Markov assumption: yesterday's weather irrelevant to prediction of tomorrow's given today's.

Transition Matrix:

Now suppose it's wet today. P(wet in 2 days)?

Notice that all the entries in the last line are items in . Look at the matrix product :

Notice that is exactly the entry in .

General case. Define

$P^n_{ij}=P(X_{m+n}=j\mid X_m=i).$

Then

$P^n_{ij}$ is exactly the $(i,j)$ entry of the matrix power $P^n$.

Proof of these assertions by induction on $n$.

Example for . Two bits to do:

First suppose are discrete variables. Assume

for any . Then I claim

In words, if knowing both and doesn't change the conditional probability then knowing alone doesn't change the conditional probability.

Proof of claim:

Then

Second step: consider

This shows that

where means the matrix product . Notice both that the quantity does not depend on and that we can compute it by taking a power of .

More general version

Since we get the Chapman-Kolmogorov equations:

Summary: A Markov Chain has stationary $n$ step transition probabilities which are the $n$th power of the 1 step transition probabilities.

Here is Maple output for the 1,2,4,8 and 16 step transition matrices for our rainfall example:


> p:= matrix(2,2,[[3/5,2/5],[1/5,4/5]]);
[3/5    2/5]
p := [          ]
[1/5    4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):

This computes the powers ( evalm understands matrix algebra).

Fact:


> evalf(evalm(p));
[.6000000000    .4000000000]
[                          ]
[.2000000000    .8000000000]
> evalf(evalm(p2));
[.4400000000    .5600000000]
[                          ]
[.2800000000    .7200000000]
> evalf(evalm(p4));
[.3504000000    .6496000000]
[                          ]
[.3248000000    .6752000000]
> evalf(evalm(p8));
[.3337702400    .6662297600]
[                          ]
[.3331148800    .6668851200]
> evalf(evalm(p16));
[.3333336197    .6666663803]
[                          ]
[.3333331902    .6666668098]

Where did the limits 1/3 and 2/3 come from?

Suppose we toss a coin and start the chain with Dry if we get heads and Wet if we get tails.

Then

and

Notice last line is a matrix multiplication of row vector by matrix . A special : if we put and then

In other words if we start off then and analogously for . This means that and have the same distribution.

A probability vector is called an initial distribution for the chain if

A Markov Chain is stationary if

for all

An initial distribution is called stationary if the chain is stationary. We find that is a stationary initial distribution if

Suppose converges to some matrix . Notice that

and

This proves that each row of satisfies

Def'n: A row vector is a left eigenvector of with eigenvalue if

So each row of is a left eigenvector of with eigenvalue .

Finding stationary initial distributions. Consider the $2\times2$ matrix $P$ for the weather example. The equation

$\pi P=\pi$

is really

$\pi_D=\frac35\pi_D+\frac15\pi_W,\qquad \pi_W=\frac25\pi_D+\frac45\pi_W.$

The first can be rearranged to

$\pi_W=2\pi_D$

and so can the second. If $\pi$ is to be a probability vector then

$\pi_D+\pi_W=1$

so we get

$\pi_D=\frac13,\qquad \pi_W=\frac23.$
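A quick numerical check (Python as a stand-in for the Maple used elsewhere in these notes) confirms that $\pi=(1/3,2/3)$ satisfies $\pi P=\pi$ and that the powers of $P$ approach it:

```python
def matmul(A, B):
    # multiply two small matrices stored as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[3/5, 2/5],
     [1/5, 4/5]]          # weather chain from the notes
Pn = P
for _ in range(4):        # repeated squaring: P^2, P^4, P^8, P^16
    Pn = matmul(Pn, Pn)
print(Pn)                 # both rows close to [1/3, 2/3]

pi = [1/3, 2/3]
print([pi[0] * P[0][j] + pi[1] * P[1][j] for j in range(2)])  # equals pi
```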

Some more examples:

Set and get

First plus third gives

so both sums 1/2. Continue algebra to get


p:=matrix([[0,1/3,0,2/3],[1/3,0,2/3,0],
[0,2/3,0,1/3],[2/3,0,1/3,0]]);

[ 0     1/3     0     2/3]
[                        ]
[1/3     0     2/3     0 ]
p := [                        ]
[ 0     2/3     0     1/3]
[                        ]
[2/3     0     1/3     0 ]
> p2:=evalm(p*p);
[5/9     0     4/9     0 ]
[                        ]
[ 0     5/9     0     4/9]
p2:= [                        ]
[4/9     0     5/9     0 ]
[                        ]
[ 0     4/9     0     5/9]
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> p16:=evalm(p8*p8):
> p17:=evalm(p8*p8*p):



> evalf(evalm(p16));
[.5000000116 , 0 , .4999999884 , 0]
[                                 ]
[0 , .5000000116 , 0 , .4999999884]
[                                 ]
[.4999999884 , 0 , .5000000116 , 0]
[                                 ]
[0 , .4999999884 , 0 , .5000000116]
> evalf(evalm(p17));
[0 , .4999999961 , 0 , .5000000039]
[                                 ]
[.4999999961 , 0 , .5000000039 , 0]
[                                 ]
[0 , .5000000039 , 0 , .4999999961]
[                                 ]
[.5000000039 , 0 , .4999999961 , 0]
> evalf(evalm((p16+p17)/2));
[.2500, .2500, .2500, .2500]
[                          ]
[.2500, .2500, .2500, .2500]
[                          ]
[.2500, .2500, .2500, .2500]
[                          ]
[.2500, .2500, .2500, .2500]

$P^n$ doesn't converge but the average $(P^n+P^{n+1})/2$ does. Next example:

Solve :

Second and fourth equations redundant. Get

Pick in ; put .

solves . So solution is not unique.


> p:=matrix([[2/5,3/5,0,0],[1/5,4/5,0,0],
[0,0,2/5,3/5],[0,0,1/5,4/5]]);

[2/5    3/5     0      0 ]
[                        ]
[1/5    4/5     0      0 ]
p := [                        ]
[ 0      0     2/5    3/5]
[                        ]
[ 0      0     1/5    4/5]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):


> evalf(evalm(p8*p8));
[.2500000000 , .7500000000 , 0 , 0]
[                                 ]
[.2500000000 , .7500000000 , 0 , 0]
[                                 ]
[0 , 0 , .2500000000 , .7500000000]
[                                 ]
[0 , 0 , .2500000000 , .7500000000]

Notice that rows converge but to two different vectors:

and

Solutions of revisited? Check that

and

If ( ) then

so again solution is not unique. Last example:

> p:=matrix([[2/5,3/5,0],[1/5,4/5,0],
[1/2,0,1/2]]);

[2/5    3/5     0 ]
[                 ]
p := [1/5    4/5     0 ]
[                 ]
[1/2     0     1/2]
> p2:=evalm(p*p):
> p4:=evalm(p2*p2):
> p8:=evalm(p4*p4):
> evalf(evalm(p8*p8));
[.2500000000 .7500000000        0       ]
[                                       ]
[.2500000000 .7500000000        0       ]
[                                       ]
[.2500152588 .7499694824 .00001525878906]


Interpretation of examples

• For some all rows converge to some . In this case this is a stationary initial distribution.

• For some the locations of zeros flip flop. does not converge. Observation: average

does converge.

• For some some rows converge to one and some to another. In this case the solution of is not unique.

Basic distinguishing features: pattern of 0s in matrix .

Classification of States

State $i$ leads to state $j$ if $P^n_{ij}>0$ for some $n\ge0$. It is convenient to agree that $P^0$ is the identity matrix; thus $i$ leads to $i$.

The relation of communication ($i$ and $j$ lead to each other) is an equivalence relation; it is reflexive, symmetric and transitive: if $i$ and $j$ communicate and $j$ and $k$ communicate then $i$ and $k$ communicate.

Example ( signs indicate non-zero entries):

For this example:

, , so are all in the same communicating class.

but not vice versa.

but not vice versa.

So the communicating classes are

A Markov Chain is irreducible if there is only one communicating class.

Notation:

State is recurrent if , otherwise transient.

If then Markov property (chain starts over when it gets back to ) means prob return infinitely many times (given started in or given ever get to ) is 1.

Consider the chain started from a transient state $i$ and let $f=P(\text{ever return to }i\mid X_0=i)<1$. Let $M$ be the number of visits to state $i$ (including the visit at time 0). To return $k-1$ times the chain must return once then, starting over, return $k-2$ more times, then never return. So:

$P(M=k)=f^{k-1}(1-f)$

for $k=1,2,\ldots$.

$M$ has a Geometric distribution and $E(M)=1/(1-f)<\infty$.

Another calculation:

$M=\sum_{n=0}^\infty 1(X_n=i)$

so

$E(M)=\sum_{n=0}^\infty P(X_n=i).$

If we start the chain in state $i$ then this is

$E(M)=\sum_{n=0}^\infty P^n_{ii}$

and $i$ is transient if and only if

$\sum_{n=0}^\infty P^n_{ii}<\infty.$

For last example: and are transient. Claim: states 1, 2 and 3 are recurrent.

Proof: argument above shows each transient state is visited only finitely many times. So: there is a recurrent state. (Note use of finite number of states.) It must be one of 1, 2 and 3. Proposition: If one state in a communicating class is recurrent then all states in the communicating class are recurrent.

Proof: Let be the known recurrent state so

Assume and communicate. Find integers and such that

and

Then

Sum RHS over get so

Proposition also means that if 1 state in a class is transient so are all.

State $i$ has period $d$ if $d$ is the greatest common divisor of

$\{n:P^n_{ii}>0\}.$

If $i$ and $j$ are in the same class then $i$ and $j$ have the same period. If $d=1$ then the state is called aperiodic; if $d>1$ then it is periodic.

For this example is a class of period 3 states and a class of period 2 states.

has a single communicating class of period 2.

A chain is aperiodic if all its states are aperiodic.

Infinite State Spaces

Example: consider a sequence of independent coin tosses with probability $p$ of Heads on a single toss. Let $S_n$ be the number of heads minus the number of tails after $n$ tosses. Put $S_0=0$.

$S_n$ is a Markov Chain. The state space is $\mathbb{Z}$, the integers, and

$P_{i,i+1}=p,\qquad P_{i,i-1}=1-p.$

This chain has one communicating class (for $0<p<1$) and all states have period 2. According to the strong law of large numbers $S_n/n$ converges to $2p-1$. If $p\ne1/2$ this guarantees that $S_n\ne0$ for all large enough $n$; that is, the number of returns to 0 is finite. The state 0 is then transient and so all states must be transient.

For $p=1/2$ the situation is different. It is a fact that

$P^n_{00}=P(\text{\# H = \# T at time }n).$

For $n=2m$ even this is the probability of exactly $m$ heads in $2m$ tosses.

The Local Central Limit Theorem (normal approximation to the Binomial) or Stirling's approximation shows

$P^{2m}_{00}=\binom{2m}{m}2^{-2m}\approx\frac{1}{\sqrt{\pi m}}$

so:

$\sum_nP^n_{00}=\infty.$
That is: 0 is a recurrent state.
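The divergence of $\sum_m P^{2m}_{00}$ is easy to see numerically; a short sketch computes the exact return probabilities and compares one of them with $1/\sqrt{\pi m}$:

```python
from math import comb, pi, sqrt

def p_return(m):
    # P(S_{2m} = 0) for the symmetric walk: C(2m, m) / 4^m
    return comb(2 * m, m) / 4 ** m

partial = sum(p_return(m) for m in range(1, 2001))
print(partial)                               # partial sums grow like 2*sqrt(m/pi)
print(p_return(1000), 1 / sqrt(pi * 1000))   # the two agree closely
```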

Hitting Times

Start irreducible recurrent chain in state . Let be first such that . Define

First step analysis:

Example

The equations are

The second and third equations give immediately

Then plug in to the others to get

Notice stationary initial distribution is

Consider fraction of time spent in state :

Imagine chain starts in state ; take expected value.

If rows of converge to then fraction converges to ; i.e. limiting fraction of time in state is .

Heuristic: start chain in . Expect to return to every time units. So are in state about once every time units; i.e. limiting fraction of time in state is .

Conclusion: for an irreducible recurrent finite state space Markov chain

Infinite State Spaces

Previous conclusion is still right if there is a stationary initial distribution.

Example:    HeadsTails after tosses of fair coin. Equations are

and many more.

Some observations:

You have to go through 1 to get to 0 from 2 so

Symmetry (switching H and T):

The transition probabilities are homogeneous:

Conclusion:

Notice that there are no finite solutions!

Summary of the situation:

Every state is recurrent.

All the expected hitting times are infinite.

All entries converge to 0.

Jargon: The states in this chain are null recurrent.

One Example

Page 229, question 21: a runner goes out the front or back door, prob 1/2 each. Returns by the front or back door, prob 1/2 each. Has $k$ pairs of shoes, wears a pair if any are at the departure door, leaves them at the return door. No shoes? Barefoot. Long run fraction of time barefoot?

Solution: Let $X_n$ be the number of pairs of shoes at the front door on day $n$. Then $X_n$ is a Markov Chain.

Transition probabilities:

$k$ pairs at the front door on day $n$ (none at the back): $X_{n+1}$ is $k$ if he goes out the back door (prob 1/2) or out the front door and back in the front door (prob 1/4). Otherwise $X_{n+1}$ is $k-1$.

$0<i<k$ pairs at the front door on day $n$: $X_{n+1}$ is $i+1$ if out back, in front (prob 1/4). $X_{n+1}$ is $i-1$ if out front, in back (prob 1/4). Otherwise $X_{n+1}$ is $i$.

0 pairs at the front door on day $n$: $X_{n+1}$ is 0 if out the front door (prob 1/2) or out the back door and in the back door (prob 1/4); otherwise $X_{n+1}$ is 1.

Transition matrix :

Doubly stochastic: row sums and column sums are 1.

So $\pi_i=1/(k+1)$ for all $i$ is a stationary initial distribution.

Solution to the problem: 1 day in $k+1$ there are no shoes at the front door. Half of those days he goes barefoot. Also 1 day in $k+1$ all the shoes are at the front door; he goes barefoot half of these days too. Overall he goes barefoot $1/(k+1)$ of the time.

Gambler's Ruin

Insurance company's reserves fluctuate: sometimes up, sometimes down. Ruin is the event they hit 0 (company goes bankrupt). General problem: for a given model of fluctuation, compute the probability of ruin either eventually or in the next $t$ time units.

Simplest model: gambling on Red at a Casino. Bet $1 at a time. Win $1 with probability $p$, lose $1 with probability $q=1-p$. Start with $i$ dollars. Quit playing when down to $0 or up to $N$. Compute

$P_i=P(\text{reach }N\text{ before }0).$

$X_n$ = fortune after $n$ plays; $X_0=i$.

Transition matrix:

First step analysis:

$P_0=0,\qquad P_i=pP_{i+1}+qP_{i-1}\ (0<i<N),\qquad P_N=1.$

Middle equation is

$(p+q)P_i=pP_{i+1}+qP_{i-1}$

or

$P_{i+1}-P_i=\frac{q}{p}(P_i-P_{i-1})=\left(\frac{q}{p}\right)^iP_1.$

Sum from $i=0$ to $j-1$ to get

$P_j=P_1\sum_{i=0}^{j-1}\left(\frac{q}{p}\right)^i$

or

$P_j=P_1\,\frac{1-(q/p)^j}{1-(q/p)}.$

For $j=N$ we get

$1=P_1\,\frac{1-(q/p)^N}{1-(q/p)}$

so that

$P_j=\frac{1-(q/p)^j}{1-(q/p)^N}.$

Notice that if $p=q=1/2$ our formulas for the sum of the geometric series are wrong. But for $p=1/2$ we get

$P_j=jP_1\qquad\text{and}\qquad 1=NP_1$

so

$P_j=\frac{j}{N}.$

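The closed form can be checked against a direct Monte Carlo estimate (the parameter values below are illustrative choices):

```python
import random

def p_win(i, N, p):
    # P(reach N before 0 from i): (1-(q/p)^i)/(1-(q/p)^N), or i/N when p = 1/2
    q = 1 - p
    if abs(p - 0.5) < 1e-12:
        return i / N
    r = q / p
    return (1 - r ** i) / (1 - r ** N)

def simulate(i, N, p, trials, rng):
    # Monte Carlo estimate of the same probability
    wins = 0
    for _ in range(trials):
        x = i
        while 0 < x < N:
            x += 1 if rng.random() < p else -1
        wins += (x == N)
    return wins / trials

rng = random.Random(3)
print(p_win(5, 10, 0.47), simulate(5, 10, 0.47, 20_000, rng))  # close together
```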
Mean time in transient states

States 3 and 4 are transient. Let be the expected total number of visits to state for chain started in .

For or and or 4:

For or

For first step analysis:

In matrix form

Translate to matrix notation:

where is the identity, is the matrix of means and the part of the transition matrix corresponding to transient states.

Solution is

In our case

so that
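The notes' own example matrix did not survive, so as an illustration take the symmetric gambler's ruin on $\{0,\ldots,4\}$, whose transient states are 1, 2, 3. There $(I-P_T)^{-1}$ works out (by hand) to rows $(1.5,1,0.5)$, $(1,2,1)$, $(0.5,1,1.5)$, and a simulation of expected visit counts starting from state 2 agrees with the middle row:

```python
import random

# Symmetric gambler's ruin on {0,...,4}; 0 and 4 absorbing (illustrative example).
# Estimate expected visits to each transient state starting from state 2.
rng = random.Random(8)
trials = 40_000
visits = {1: 0, 2: 0, 3: 0}
for _ in range(trials):
    x = 2
    while x not in (0, 4):
        visits[x] += 1
        x = x + 1 if rng.random() < 0.5 else x - 1
est = {j: visits[j] / trials for j in visits}
print(est)  # theory: row of (I - P_T)^{-1} for state 2 is (1, 2, 1)
```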

Poisson Processes

Particles arriving over time at a particle detector. Several ways to describe most common model.

Approach 1: numbers of particles arriving in an interval has Poisson distribution, mean proportional to length of interval, numbers in several non-overlapping intervals independent.

For , denote number of arrivals in by . Model is

1. has a Poisson distribution.

2. For the variables are independent.

Approach 2:

Let be the times at which the particles arrive.

Let with by convention.

Then are independent Exponential random variables with mean .

Note is called survival function of .

Approaches are equivalent. Both are deductions of a model based on local behaviour of process.

Approach 3: Assume:

1. given all the points in the probability of 1 point in the interval is of the form

2. given all the points in the probability of 2 or more points in interval is of the form

All 3 approaches are equivalent. I show: 3 implies 1, 1 implies 2 and 2 implies 3. First explain $o$ and $O$.

Notation: given functions $f$ and $g$ we write

$f(h)=o(g(h))$

provided

$\lim_{h\to0}\frac{f(h)}{g(h)}=0.$

[Aside: if there is a constant $M$ such that

$|f(h)|\le M|g(h)|$

we say

$f(h)=O(g(h)).$

Notation due to Landau. Another form is

$f(h)\asymp g(h),$

which means there are $M$ and $\epsilon>0$ s.t. for all $0<|h|<\epsilon$

$M^{-1}|g(h)|\le|f(h)|\le M|g(h)|.$

Idea: $f=o(g)$ means $f$ is tiny compared to $g$ while $f\asymp g$ means $f$ is (very) roughly the same size as $g$.]

Model 3 implies 1: Fix $t$, define $P_0(s)$ to be the conditional probability of 0 points in $(t,t+s]$ given the value of the process on $[0,t]$.

Derive a differential equation for $P_0$. Given the process on $[0,t+s]$, and in particular given 0 points in $(t,t+s]$, the probability of no points in $(t+s,t+s+h]$ is $1-\lambda h+o(h)$. This gives

$P_0(s+h)=P_0(s)\{1-\lambda h+o(h)\}.$

Now rearrange, divide by $h$ to get

$\frac{P_0(s+h)-P_0(s)}{h}=-\lambda P_0(s)+\frac{o(h)}{h}.$

Let $h\to0$ and find

$P_0'(s)=-\lambda P_0(s).$

The differential equation has solution

$P_0(s)=P_0(0)e^{-\lambda s}=e^{-\lambda s}.$

Notice: this is the survival function of an exponential rv.

General case. Notation: .

is a non-decreasing function of . Let

Evaluate by conditioning on and .

Given probability that is conditional probability of points in .

So, for :

For we have

For we have

is increasing so only consider .

Rearrange, divide by and let t get

For the term is dropped and

Using we get

Put this into the equation for to get

Multiply by to see

With we get

For general we have and

Check by induction that

Hence: has Poisson distribution.

Similar ideas permit proof of

From which (by induction) we can prove that has independent Poisson increments.

Exponential Interarrival Times

If is a Poisson Process we define to be the times between 0 and the first point, the first point and the second and so on.

Fact: are iid exponential rvs with mean .

We already did rigorously. The event is exactly the event . So

which is the survival function of an exponential rv.

I do case of . Let be two positive numbers and , . Consider event

This is almost the same as the intersection of the four events:

which has probability

Divide by and let and go to 0 to get joint density of is

which is the joint density of two independent exponential variates.
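The equivalence can be exercised in the other direction: building the counting process from iid exponential gaps reproduces the Poisson counts. A Python sketch (rate and horizon are illustrative choices):

```python
import random
from math import exp

def poisson_count(rate, t, rng):
    # Count arrivals in [0, t] by summing iid Exponential(rate) interarrival gaps.
    total, n = 0.0, 0
    while True:
        total += rng.expovariate(rate)
        if total > t:
            return n
        n += 1

rng = random.Random(4)
rate, t, trials = 2.0, 3.0, 50_000
counts = [poisson_count(rate, t, rng) for _ in range(trials)]
mean = sum(counts) / trials
p0 = sum(c == 0 for c in counts) / trials
print(mean, p0)  # mean ≈ rate*t = 6; P(N(t)=0) ≈ exp(-6)
```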

More rigor:

• Find joint density of .

• Use change of variables to find joint density of .

First step: Compute

This is just the event of exactly 1 point in each interval for () and at least one point in which has probability

Second step: write this in terms of joint cdf of . I do :

Notice tacit assumption .

Differentiate twice, that is, take

to get

Simplify to

Recall tacit assumption to get

That completes the first part.

Now compute the joint cdf of by

This is

Differentiate twice to get

which is the joint density of two independent exponential random variables.

Summary so far:

Have shown:

Instantaneous rates model implies independent Poisson increments model implies independent exponential interarrivals.

Next: show independent exponential interarrivals implies the instantaneous rates model.

Suppose iid exponential rvs with means . Define by if and only if

Let be the event . We are to show

and

If is a possible trajectory consistent with then has jumps at points

and at no other points in .

So given with we are essentially being given

and asked the conditional probability in the first case of the event given by

Conditioning on irrelevant (independence).

The numerator may be evaluated by integration:

Let to get the limit

as required.

The computation of

is similar.

Properties of exponential rvs

Convolution: If $X$ and $Y$ are independent rvs with densities $f$ and $g$ respectively and $Z=X+Y$ then

$P(Z\le z)=\int\!\!\int_{x+y\le z}f(x)g(y)\,dy\,dx.$

Differentiating wrt $z$ we get

$f_Z(z)=\int f(x)g(z-x)\,dx.$

This integral is called the convolution of the densities $f$ and $g$.

If $X_1,\ldots,X_n$ are iid Exponential($\lambda$) then $S_n=X_1+\cdots+X_n$ has a Gamma distribution. The density of $S_n$ is

$f_{S_n}(s)=\frac{\lambda(\lambda s)^{n-1}e^{-\lambda s}}{(n-1)!}$

for $s>0$.

Proof:

Then

This telescopes to

Extreme Values: If $X_1,\ldots,X_n$ are independent exponential rvs with means $1/\lambda_1,\ldots,1/\lambda_n$ then $Y=\min(X_1,\ldots,X_n)$ has an exponential distribution with mean

$\frac{1}{\lambda_1+\cdots+\lambda_n}.$

Proof:

$P(Y>y)=\prod_iP(X_i>y)=\prod_ie^{-\lambda_iy}=e^{-(\lambda_1+\cdots+\lambda_n)y}.$

Memoryless Property: the conditional distribution of $X-x$ given $X\ge x$ is exponential if $X$ has an exponential distribution.

Proof:

$P(X-x>y\mid X\ge x)=\frac{P(X>x+y)}{P(X\ge x)}=\frac{e^{-\lambda(x+y)}}{e^{-\lambda x}}=e^{-\lambda y}.$

Hazard Rates

The hazard rate, or instantaneous failure rate, for a positive random variable $X$ with density $f$ and cdf $F$ is

$r(t)=\lim_{\delta\to0}\frac{P(t\le X<t+\delta\mid X\ge t)}{\delta}.$

This is just

$r(t)=\frac{f(t)}{1-F(t)}.$

For an exponential random variable with mean $1/\lambda$ this is

$r(t)=\frac{\lambda e^{-\lambda t}}{e^{-\lambda t}}=\lambda.$

The exponential distribution has constant failure rate.

Weibull random variables have density

$f(t)=\alpha\lambda(\lambda t)^{\alpha-1}e^{-(\lambda t)^\alpha}$

for $t>0$. The corresponding survival function is

$1-F(t)=e^{-(\lambda t)^\alpha}$

and the hazard rate is

$r(t)=\alpha\lambda(\lambda t)^{\alpha-1}$

which is increasing for $\alpha>1$, decreasing for $\alpha<1$. For $\alpha=1$ this is the exponential distribution.

Since

$r(t)=\frac{f(t)}{1-F(t)}=-\frac{d}{dt}\log\{1-F(t)\}$

we can integrate to find

$1-F(t)=\exp\left(-\int_0^tr(s)\,ds\right)$

so that $r$ determines $F$ and $f$.

Properties of Poisson Processes

1)
If and are independent Poisson processes with rates and , respectively, then is a Poisson process with rate .

2)
Suppose $N(t)$ is a Poisson process with rate $\lambda$. Suppose each point is marked with a label, say one of $L_1,\ldots,L_r$, independently of all other occurrences. Suppose $p_k$ is the probability that a given point receives label $L_k$. Let $N_k(t)$ count the points with label $L_k$ (so that $N(t)=N_1(t)+\cdots+N_r(t)$). Then $N_1,\ldots,N_r$ are independent Poisson processes with rates $\lambda p_1,\ldots,\lambda p_r$.

3)
Suppose independent rvs, each uniformly distributed on . Suppose is a Poisson random variable independent of the 's. Let

Then is a Poisson process on with rate .

4)
Suppose is a Poisson process with rate . Let be the times at which points arrive Given have the same distribution as the order statistics of a sample of size from the uniform distribution on .

5)
Given , have the same distribution as the order statistics of a sample of size from the uniform distribution on .
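Property 2 (thinning) is simple to verify numerically: label each point of a rate-$\lambda$ process with probability $p$ and the labeled counts should be Poisson with mean $\lambda pt$, so sample mean and variance agree. The rates and label probability below are illustrative choices:

```python
import random

rng = random.Random(5)
lam, t, p_label, trials = 3.0, 10.0, 0.25, 20_000
labeled_counts = []
for _ in range(trials):
    n, total = 0, 0.0
    # generate the rate-lam process on [0, t] from exponential gaps,
    # keeping each point independently with probability p_label
    while True:
        total += rng.expovariate(lam)
        if total > t:
            break
        if rng.random() < p_label:
            n += 1
    labeled_counts.append(n)
mean = sum(labeled_counts) / trials
var = sum((c - mean) ** 2 for c in labeled_counts) / trials
print(mean, var)  # both ≈ lam * p_label * t = 7.5 (Poisson: mean = variance)
```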

Indications of some proofs:

1) independent Poisson processes rates , . Let be the event of 2 or more points in in the time interval , , the event of exactly one point in in the time interval .

Let and be the corresponding events for .

Let denote the history of the processes up to time ; we condition on .

We are given:

and

Note that

Since

we have checked one of the two infinitesimal conditions for a Poisson process.

Next let be the event of no points in in the time interval and the same for . Then

shows

Hence is a Poisson process with rate .

2) The infinitesimal approach used for 1 can do part of this. See text for rest. Events defined as in 1): The event that there is one point in in is the event, that there is exactly one point in any of the processes together with a subset of where there are two or more points in in but exactly one is labeled . Since

Similarly, is a subset of so

This shows each is Poisson with rate . To get independence requires more work; see the text for the algebraic method which is easier.

3) Fix . Let be the number of points in . Given the conditional distribution of is Binomial with . So

4): Fix for such that

Given we compute the probability of the event

Intersection of , is ():

whose probability is

So

Divide by and let all go to 0 to get joint density of is

which is the density of order statistics from a Uniform sample of size .

5) Replace the event with . With as before we want

Note that is independent of and that we have already found the limit

We are left to compute the limit of

The denominator is

Thus

This gives the conditional density of given as in 4).

Inhomogeneous Poisson Processes

The idea of hazard rate can be used to extend the notion of Poisson Process. Suppose is a function of . Suppose is a counting process such that

and

Then $N$ has independent increments and $N(t+s)-N(t)$ has a Poisson distribution with mean

$\int_t^{t+s}\lambda(u)\,du.$

If we put

$\Lambda(t)=\int_0^t\lambda(u)\,du$

then the mean of $N(t)$ is $\Lambda(t)$.

Jargon: $\lambda(t)$ is the intensity or instantaneous intensity and $\Lambda$ the cumulative intensity.

Can use the model with any non-decreasing right continuous function, possibly without a derivative. This allows ties.
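One standard way (not developed in these notes) to simulate an inhomogeneous process with bounded intensity is thinning: run a homogeneous process at a dominating rate and keep each point $s$ with probability $\lambda(s)/\lambda_{\max}$. A sketch with an illustrative intensity:

```python
import random
from math import sin, pi

def inhomogeneous_arrivals(lam_fn, lam_max, t_end, rng):
    # Thinning: homogeneous rate-lam_max points, each kept w.p. lam_fn(s)/lam_max.
    times, s = [], 0.0
    while True:
        s += rng.expovariate(lam_max)
        if s > t_end:
            return times
        if rng.random() < lam_fn(s) / lam_max:
            times.append(s)

rng = random.Random(6)
lam = lambda s: 2 + sin(2 * pi * s)   # illustrative intensity, bounded by 3
counts = [len(inhomogeneous_arrivals(lam, 3.0, 5.0, rng)) for _ in range(20_000)]
mean = sum(counts) / len(counts)
print(mean)  # ≈ Lambda(5) = integral of lam over [0,5] = 10
```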

Compound Poisson Processes

Imagine insurance claims arise at times of a Poisson process, , (more likely for an inhomogeneous process).

Let be the value of the th claim associated with the point whose time is .

Assume that the 's are independent of each other and of .

Let

and

Let

be the total claim up to time . We call a compound Poisson Process.

Useful properties:

(Look at all familiar? See homework.)
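The standard moment formulas $E\{X(t)\}=\lambda tE(Y)$ and $\mathrm{Var}\{X(t)\}=\lambda tE(Y^2)$ can be checked by simulation; the exponential claim-size distribution below is purely an illustrative choice:

```python
import random

rng = random.Random(7)
lam, t, trials = 2.0, 5.0, 40_000
mean_y = 3.0                 # E(Y): claims Exponential with mean 3 (illustrative)
ey2 = 2 * mean_y ** 2        # E(Y^2) for that exponential
totals = []
for _ in range(trials):
    # number of claims in [0, t] via exponential interarrival gaps
    n, s = 0, 0.0
    while True:
        s += rng.expovariate(lam)
        if s > t:
            break
        n += 1
    totals.append(sum(rng.expovariate(1 / mean_y) for _ in range(n)))
m = sum(totals) / trials
v = sum((x - m) ** 2 for x in totals) / trials
print(m, v)  # E X(t) = lam*t*E(Y) = 30;  Var X(t) = lam*t*E(Y^2) = 180
```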

Continuous Time Markov Chains

Consider a population of single celled organisms in a stable environment.

Fix short time interval, length .

Each organism has some probability of dividing to produce two organisms and some other probability of dying.

We might suppose:

• Different organisms behave independently.

• Probability of division is plus .

• Probability of death is plus .

• Probability that an organism divides twice (or divides once and dies) in the interval of length is .

Tacit assumptions:

Constants of proportionality do not depend on time: "stable environment".

Constants do not depend on organism: organisms are all similar and live in similar environments.

: total population at time .

: history of process up to time .

Condition on event .

Probability of two or more divisions (more than one division by a single organism or two or more organisms dividing) is .

Probability of both a division and a death or of two or more deaths is .

So probability of exactly 1 division by any one of the organisms is .

Similarly probability of 1 death is .

We deduce:

This is the Markov Property.

Definition: A process {X(t); t ≥ 0} taking values in S, a finite or countable state space, is a Markov Chain if

P(X(t+s) = j | X(s) = i and X(u), u < s) = P(X(t+s) = j | X(s) = i).

Definition: A Markov chain has stationary transitions if

P(X(t+s) = j | X(s) = i) = P_ij(t), not depending on s.

From now on: our chains have stationary transitions.

Summary of Markov Process Results

• Chapman-Kolmogorov equations:

P_ij(t+s) = Σ_k P_ik(t) P_kj(s), that is, P(t+s) = P(t)P(s).

• Exponential holding times: starting from state i the time, T, until the process leaves i has an exponential distribution, rate denoted v_i.

• Sequence of states visited, Y_0, Y_1, Y_2, ..., is a Markov chain - its transition matrix has p_ii = 0. Y is sometimes called the skeleton.

• Communicating classes defined for skeleton chain. Usually assume chain has 1 communicating class.

• Periodicity irrelevant because of continuity of the exponential distribution.

• Instantaneous transition rates from i to j ≠ i:

q_ij = v_i p_ij.

• Kolmogorov backward equations: P′(t) = QP(t).

• Kolmogorov forward equations: P′(t) = P(t)Q.

• For strongly recurrent chains with a single communicating class:

P_ij(t) → π_j as t → ∞.

• Stationary initial probabilities satisfy:

πQ = 0 with Σ_j π_j = 1 (equivalently πP(t) = π for all t).

• Transition probabilities given by

P(t) = exp(Qt)

where Q has entries q_ij = v_i p_ij for j ≠ i and q_ii = −v_i.

• Process is a Birth and Death process if

q_ij = 0 if |i − j| > 1.

In this case we write λ_n for the instantaneous "birth" rate:

λ_n = q_{n,n+1}

and μ_n for the instantaneous "death" rate:

μ_n = q_{n,n−1}.

We have

v_n = λ_n + μ_n.

• If all μ_n = 0 then the process is a pure birth process. If all λ_n = 0, a pure death process.

• Birth and Death processes have stationary distribution

π_n = π_0 Π from k=1 to n of (λ_{k−1}/μ_k).

Necessary condition for existence of π is

Σ_n Π from k=1 to n of (λ_{k−1}/μ_k) < ∞.

• Linear birth and death processes have

λ_n = nλ + θ, μ_n = nμ.

In this case (note θ > 0 permits immigration)

π_n = π_0 Π from k=1 to n of ((k−1)λ + θ)/(kμ)

provided λ < μ.

Detailed development

Suppose X is a Markov Chain with stationary transitions. Then

P(X(t+s) = k | X(0) = i) = Σ_j P(X(t+s) = k, X(s) = j | X(0) = i) = Σ_j P_jk(t) P_ij(s).

This shows

P(t+s) = P(s)P(t),

which is the Chapman-Kolmogorov equation.

Now consider the chain starting from i, that is X(0) = i, and let T = inf{t > 0 : X(t) ≠ i} be the first t for which X(t) ≠ i. Then T is a stopping time.

[Technically:

{T ≤ t} is determined by X(s), s ≤ t,

for each t.] Then

P(T > t + s | T > s, X(0) = i) = P(T > t | X(0) = i)

by the Markov property.

Conclusion: given X(0) = i, T has the memoryless property so T has an exponential distribution. Let v_i be the rate parameter.

Embedded Chain: Skeleton

Let T_1 < T_2 < ... be the stopping times at which transitions occur. Then Y_n = X(T_n). The sequence Y_0, Y_1, Y_2, ... is a Markov chain by the Markov property. That each transition changes the state reflects the fact that p_ii = 0 by design.

As before we say i → j if p_ij^(n) > 0 for some n. It is fairly clear that P_ij(t) > 0 for some t for the process X if and only if i → j for the embedded chain Y_n.

We say i ↔ j (i and j communicate) if i → j and j → i.

Now consider

P(X(t+h) = j | X(t) = i).

Suppose the chain has made n transitions so far, so that X(t) = Y_n = i. Then the event X(t+h) = j is, except for possibilities of probability o(h), the event that

T_{n+1} ∈ (t, t+h]

and

Y_{n+1} = j.

The probability of this is

v_i h p_ij + o(h).

Kolmogorov's Equations

The Chapman-Kolmogorov equations are

P(t+h) = P(h)P(t).

Subtract P(t) from both sides, divide by h and let h → 0. Remember that P(0) is the identity. We find

P′(t) = lim as h → 0 of [(P(h) − P(0))/h] P(t),

which gives

P′(t) = P′(0)P(t).

The Chapman-Kolmogorov equations can also be written

P(t+h) = P(t)P(h).

Now subtracting P(t) from both sides, dividing by h and letting h → 0 gives

P′(t) = P(t)P′(0).

Look at these equations in component form:

P′(t) = P′(0)P(t)

becomes

P′_ij(t) = Σ_k P′_ik(0) P_kj(t).

For i ≠ j our calculations of instantaneous transition rates give

P′_ij(0) = v_i p_ij.

For i = j we have

P_ii(h) = 1 − v_i h + o(h).

(X(h) = i either means T > h, which has probability e^{−v_i h} = 1 − v_i h + o(h), or there have been two or more transitions in [0, h], a possibility of probability o(h).) Thus

P′_ii(0) = −v_i.

Let Q be the matrix with entries

q_ij = v_i p_ij (i ≠ j), q_ii = −v_i.

That is

Q = P′(0).

Q is the infinitesimal generator of the chain.

Thus

P′(t) = P′(0)P(t)

becomes

P′(t) = QP(t).

Called Kolmogorov's backward equations.

On the other hand

P′(t) = P(t)P′(0)

becomes

P′(t) = P(t)Q.

These are Kolmogorov's forward equations.

Remark: When the state space is infinite the forward equations may not be justified. In deriving them we interchanged a limit with an infinite sum; the interchange is always justified for the backward equations but not for forward.

Example: S = {0, 1}. Then

p_01 = p_10 = 1

and the chain is otherwise specified by v_0 = λ and v_1 = μ. The matrix Q is

Q = [ −λ   λ ]
    [  μ  −μ ]

The backward equations become

P′_00(t) = −λP_00(t) + λP_10(t)
P′_01(t) = −λP_01(t) + λP_11(t)
P′_10(t) = μP_00(t) − μP_10(t)
P′_11(t) = μP_01(t) − μP_11(t)

while the forward equations are

P′_00(t) = −λP_00(t) + μP_01(t)
P′_01(t) = λP_00(t) − μP_01(t)
P′_10(t) = −λP_10(t) + μP_11(t)
P′_11(t) = λP_10(t) − μP_11(t)

Add μ times the first backward equation to λ times the third to get

μP′_00(t) + λP′_10(t) = 0,

so μP_00(t) + λP_10(t) is constant. Put t = 0 to get μP_00(0) + λP_10(0) = μ. This gives

P_10(t) = μ(1 − P_00(t))/λ.

Plug this back in to the first equation and get

P′_00(t) = μ − (λ + μ)P_00(t).

Multiply by e^{(λ+μ)t} and get

[e^{(λ+μ)t} P_00(t)]′ = μ e^{(λ+μ)t},

which can be integrated to get

P_00(t) = μ/(λ+μ) + [λ/(λ+μ)] e^{−(λ+μ)t}.
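The closed form for P_00(t) can be checked numerically against the matrix exponential e^{Qt}, here computed with a truncated power series. The rates λ = 2, μ = 3 and time t = 0.7 are arbitrary test values:

```python
import math

lam, mu, t = 2.0, 3.0, 0.7   # arbitrary test rates and time

# Generator of the two-state chain.
Q = [[-lam, lam], [mu, -mu]]

def mat_mult(A, B):
    """Multiply two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(Q, t, terms=60):
    """e^{Qt} via the power series sum over n of (Qt)^n / n!."""
    P = [[1.0, 0.0], [0.0, 1.0]]      # running sum, starts at the identity
    term = [[1.0, 0.0], [0.0, 1.0]]   # current term (Qt)^n / n!
    Qt = [[q * t for q in row] for row in Q]
    for n in range(1, terms):
        term = mat_mult(term, Qt)
        term = [[x / n for x in row] for row in term]
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return P

P = expm(Q, t)
# Closed form from the lecture: P_00(t) = mu/(lam+mu) + lam/(lam+mu) e^{-(lam+mu)t}.
p00 = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
```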

Alternative calculation:

The matrix Q

can be written as

Q = −(λ+μ)(I − Π)

where I is the 2×2 identity

and

Π = [ μ/(λ+μ)   λ/(λ+μ) ]
    [ μ/(λ+μ)   λ/(λ+μ) ]

Then

(I − Π)² = I − Π and Π(I − Π) = (I − Π)Π = 0.

Now

P(t) = exp(Qt) = Σ over n ≥ 0 of tⁿQⁿ/n!

so, using Qⁿ = (−(λ+μ))ⁿ (I − Π) for n ≥ 1, we get

P(t) = Π + e^{−(λ+μ)t}(I − Π)

where the (0,0) entry reproduces the formula for P_00(t) above.

Notice: the rows of Π are a stationary initial distribution. If the rows are π then

πP(t) = π

so

πQ = 0.

Moreover

P(t) → Π as t → ∞.

Fact: π_0 is long run fraction of time in state 0.

Fact:

(1/t) ∫ from 0 to t of f(X(s)) ds → Σ_j π_j f(j):

the Ergodic Theorem in continuous time.

Birth and Death Processes

Consider a population of N(t) individuals. Suppose that, given N(t) = n, in the next time interval of length h the probability of a population increase of 1 (called a birth) is λ_n h + o(h) and the probability of a decrease of 1 (death) is μ_n h + o(h).

Jargon: N(t) is a birth and death process.

Special cases:

All μ_n = 0; called a pure birth process.

All λ_n = 0 (0 is absorbing): pure death process.

λ_n = nλ and μ_n = nμ: a linear birth and death process.

λ_n = λ, μ_n = 0: Poisson Process.

λ_n = nλ + θ and μ_n = nμ: a linear birth and death process with immigration.

Queuing Theory

Ingredients of Queuing Problem:

1: Queue input process.

2: Number of servers

3: Queue discipline: first come first serve? last in first out? pre-emptive priorities?

4: Service time distribution.

Example: Imagine customers arriving at a facility at the times of a Poisson Process with rate λ. This is the input process, denoted M (for Markov) in the queuing literature.

Single server case:

Service distribution: exponential service times, rate μ.

Queue discipline: first come first serve.

X(t) = number of customers in line at time t.

X is a Markov process called an M/M/1 queue:

λ_n = λ for n ≥ 0 and μ_n = μ for n ≥ 1.

Example: M/M/∞ queue:

Customers arrive according to a PP with rate λ. Each customer begins service immediately. X(t) is the number being served at time t. X is a birth and death process with

λ_n = λ for n ≥ 0

and

μ_n = nμ.
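A birth and death process can be simulated directly from its rates by drawing exponential holding times (a Gillespie-style sketch; the rates λ = 3, μ = 1 are illustrative). For the M/M/∞ queue the equilibrium occupancy is Poisson(λ/μ), so the long-run mean should be near λ/μ = 3:

```python
import math
import random

random.seed(2)

lam, mu = 3.0, 1.0   # illustrative arrival rate and per-customer service rate
t_end = 5000.0

t, n = 0.0, 0
time_weighted = 0.0   # accumulates the integral of n dt, for the long-run mean
while t < t_end:
    rate = lam + n * mu                        # total event rate v_n
    dt = -math.log(random.random()) / rate     # exponential holding time
    time_weighted += n * min(dt, t_end - t)
    t += dt
    if random.random() < lam / rate:
        n += 1                                 # birth: an arrival
    else:
        n -= 1                                 # death: a service completion

mean_occupancy = time_weighted / t_end         # should be near lam/mu = 3
```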

Stationary Initial Distributions

We have seen that a stationary initial distribution is a probability vector π solving

πQ = 0.

Rewrite this as

π_j v_j = Σ over k ≠ j of π_k q_kj.

Interpretation: the LHS is the rate at which the process leaves state j; the process is in state j a fraction π_j of the time and then makes transitions at rate v_j. The RHS is the total rate of arrival into state j: for each state k, π_k is the fraction of time spent in state k and q_kj is then the instantaneous rate of transition from k to j.

So the equation says:

Rate of departure from j balances rate of arrival to j. This is called balance.

Application to birth and death processes:

The equation is

(λ_n + μ_n)π_n = λ_{n−1}π_{n−1} + μ_{n+1}π_{n+1}

for n ≥ 1 and

λ_0 π_0 = μ_1 π_1.

Notice that this permits the recursion

μ_{n+1} π_{n+1} = λ_n π_n,

which extends by induction to

π_n = π_0 Π from k=1 to n of (λ_{k−1}/μ_k).

Apply Σ_n π_n = 1 to get

π_0 = [1 + Σ over n ≥ 1 of Π from k=1 to n of (λ_{k−1}/μ_k)]^{−1}.

This gives the formula announced.

If

Σ over n ≥ 1 of Π from k=1 to n of (λ_{k−1}/μ_k) < ∞

then we have defined a probability vector π which solves

πQ = 0.

Since

(d/dt)(πP(t)) = πP′(t) = πQP(t) = 0

we see that

πP(t) does not depend on t,

so that πP(t) is constant. Put t = 0 to discover that the constant is π.
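For the M/M/1 queue (λ_n = λ, μ_n = μ) the product formula collapses to π_n = (1 − ρ)ρⁿ with ρ = λ/μ < 1. A short sketch computing π from the general recursion and checking it against the closed form (λ = 1, μ = 2 are illustrative):

```python
lam, mu = 1.0, 2.0   # illustrative rates with lam < mu, so pi exists
N = 200              # truncation point for the infinite sum

# Unnormalized weights: product over k=1..n of lambda_{k-1}/mu_k = (lam/mu)^n here.
w = [1.0]
for n in range(1, N):
    w.append(w[-1] * lam / mu)

total = sum(w)
pi = [x / total for x in w]

rho = lam / mu
# Compare with the closed form (1 - rho) * rho^n.
max_err = max(abs(pi[n] - (1 - rho) * rho ** n) for n in range(N))
```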

Brownian Motion

For a fair random walk, S_n = X_1 + ··· + X_n = number of heads minus number of tails,

where the X_i are independent and

P(X_i = 1) = P(X_i = −1) = 1/2.

Notice:

E(S_n) = 0 and Var(S_n) = n.

Recall the central limit theorem:

S_n/√n converges in distribution to N(0, 1).

Now: rescale the time axis so that steps take 1/n time units and the vertical axis so the step size is 1/√n.

We now turn these pictures into a stochastic process:

For t ≥ 0 we define

B_n(t) = S_⌊nt⌋/√n.

Notice:

E(B_n(t)) = 0

and

Var(B_n(t)) = ⌊nt⌋/n.

As n → ∞ with t fixed we see ⌊nt⌋/n → t. Moreover:

B_n(t) = (S_⌊nt⌋/√⌊nt⌋)(√⌊nt⌋/√n)

converges to N(0, t) by the central limit theorem. Thus

B_n(t) converges in distribution to N(0, t).

Another observation: B_n(t+s) − B_n(s) is independent of B_n(s) because the two rvs involve sums of different X_i.

Conclusions.

As n → ∞ the processes B_n converge to a process B with the properties:

1. B(t) has a N(0, t) distribution.

2. B has independent increments: if

0 = t_0 < t_1 < ··· < t_k

then

B(t_1) − B(t_0), ..., B(t_k) − B(t_{k−1})

are independent.

3. The increments are stationary:

B(t+s) − B(s) ~ N(0, t)

regardless of s.

4. B(0) = 0.

Definition: Any process satisfying 1-4 above is a Brownian motion.
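Property 1 can be seen empirically: the rescaled walk B_n(t) = S_⌊nt⌋/√n has sample mean near 0 and sample variance near t. A quick sketch with n = 400 steps per unit of time and t = 1:

```python
import random

random.seed(3)

n, t, reps = 400, 1.0, 10000
steps = int(n * t)

vals = []
for _ in range(reps):
    # One path of the fair random walk, stopped at time t.
    s = sum(1 if random.random() < 0.5 else -1 for _ in range(steps))
    vals.append(s / n ** 0.5)   # B_n(t) = S_{nt} / sqrt(n)

mean = sum(vals) / reps                            # should be near 0
var = sum((v - mean) ** 2 for v in vals) / reps    # should be near t = 1
```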

Properties of Brownian motion

• Suppose t > s. Then

Cov(B(t), B(s)) = Cov(B(t) − B(s), B(s)) + Var(B(s)) = 0 + s = s.

Notice the use of independent increments and of B(s) ~ N(0, s).

• Again if t > s:

E(B(t) | B(s)) = E(B(t) − B(s) | B(s)) + B(s) = B(s).

Suppose X ~ N(0, σ_1²), Y ~ N(0, σ_2²), with X and Y independent. Then Z = X + Y is a sum of two independent normal variables. Do the following calculation:

Compute the conditional distribution of X given Z = z:

f(x | z) is proportional (in x) to exp[−x²/(2σ_1²) − (z − x)²/(2σ_2²)],

which must be of the form

exp[−(x − az)²/(2b²)]

for suitable choices of a and b. To find them compare coefficients of x², xz, and z².

Coefficient of x²:

1/b² = 1/σ_1² + 1/σ_2²,

so b² = σ_1²σ_2²/(σ_1² + σ_2²).

Coefficient of xz:

a/b² = 1/σ_2²,

so that

a = σ_1²/(σ_1² + σ_2²).

Finally you should check that

1/σ_2² − a²/b² = 1/(σ_1² + σ_2²)

to make sure the coefficients of z² work out as well (the leftover term in z² is the exponent of the marginal density of Z).

Conclusion: given Z = z, the conditional distribution of X is N(az, b²) with a and b as above.

Application to Brownian motion:

• For s < t let X be B(s) and Y be B(t) − B(s), so Z = X + Y = B(t). Then σ_1² = s, σ_2² = t − s, and a = s/t, b² = s(t − s)/t. Thus

E(B(s) | B(t) = z) = sz/t

and

Var(B(s) | B(t) = z) = s(t − s)/t.

SO:

E(B(s) | B(t)) = sB(t)/t

and

Var(B(s) | B(t)) = s(t − s)/t.
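The conditional mean E(B(s) | B(t)) = sB(t)/t says the regression slope of B(s) on B(t) is s/t. A Monte Carlo sketch (s = 1, t = 2, so the slope should be near 1/2), building the pair from independent normal increments:

```python
import random

random.seed(4)

s, t, reps = 1.0, 2.0, 50000
pairs = []
for _ in range(reps):
    bs = random.gauss(0, s ** 0.5)               # B(s) ~ N(0, s)
    bt = bs + random.gauss(0, (t - s) ** 0.5)    # add the independent increment
    pairs.append((bs, bt))

# Least squares slope of B(s) on B(t): Cov/Var, should be near s/t = 0.5.
sxy = sum(a * b for a, b in pairs) / reps
sxx = sum(b * b for _, b in pairs) / reps
slope = sxy / sxx
```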

The Reflection Principle

Tossing a fair coin:

HTHHHTHTHHTHHHTTHTH — 5 more heads than tails
THTTTHTHTTHTTTHHTHT — 5 more tails than heads

Both sequences have the same probability.

So: for a random walk restarted at a stopping time:

Any sequence with k more heads than tails in the next m tosses is matched to a sequence with k more tails than heads. Both sequences have the same probability.

Suppose S_n is a fair (p = 1/2) random walk. Define

M_n = max{S_k : 0 ≤ k ≤ n} and T_a = min{n : S_n = a}.

Compute P(M_n ≥ a)? Trick: Compute

P(T_a ≤ n, S_n = b).

First: if b ≥ a then

{T_a ≤ n, S_n = b} = {S_n = b}.

Second: if b < a then reflect:

Fix j > 0. Consider a sequence of H's and T's which leads to, say, T_a = k ≤ n and S_n = a − j.

Switch the results of tosses k+1 to n to get a sequence of H's and T's which has T_a = k and S_n = a + j. This proves

P(T_a = k, S_n = a − j) = P(T_a = k, S_n = a + j) = P(S_n = a + j).

This is true for each k ≤ n, so

P(T_a ≤ n, S_n = a − j) = P(S_n = a + j).

Finally, sum over all j to get

P(T_a ≤ n) = Σ over j > 0 of P(S_n = a + j) + P(S_n = a) + Σ over j > 0 of P(T_a ≤ n, S_n = a − j).

Make the substitution just proved in the second sum to get

P(T_a ≤ n) = 2P(S_n > a) + P(S_n = a).

Brownian motion version:

Define T_a = min{t : B(t) = a} (called the hitting time for level a). Then

P(T_a ≤ t, B(t) < a) = P(T_a ≤ t, B(t) > a).

Any path with T_a ≤ t and B(t) = a − y is matched to an equally likely path with T_a ≤ t and B(t) = a + y.

So for a > 0

P(T_a ≤ t) = 2P(B(t) > a),

while for a < 0

P(T_a ≤ t) = 2P(B(t) < a).

Let M_t = max{B(s) : 0 ≤ s ≤ t} to get

P(M_t ≥ a) = P(T_a ≤ t) = 2P(B(t) > a) = P(|B(t)| > a).

Hence M_t has the distribution of |B(t)|.

On the other hand, in view of

P(T_a ≤ t) = 2P(B(t) > a) = 2[1 − Φ(a/√t)],

the density of T_a is

f(t) = (d/dt) 2[1 − Φ(a/√t)].

Use the chain rule to compute this. First

(d/dt)[1 − Φ(a/√t)] = φ(a/√t) · a/(2t^{3/2}),

where φ is the standard normal density

φ(x) = e^{−x²/2}/√(2π),

because 1 − Φ is 1 minus the standard normal cdf (so its derivative is −φ).

So

f(t) = a t^{−3/2} φ(a/√t) = [a/√(2π)] t^{−3/2} e^{−a²/(2t)}.

This density is called the Inverse Gaussian density. T_a is called a first passage time.

NOTE: the preceding is a density when viewed as a function of the variable t.
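The identity "M_t has the distribution of |B(t)|" can be checked on a fine random-walk approximation to Brownian motion. In the sketch below (1000 steps approximating t = 1, sample sizes illustrative) both sample means estimate E|B(1)| = √(2/π) ≈ 0.798:

```python
import random

random.seed(5)

steps, reps = 1000, 4000
max_vals, abs_vals = [], []
for _ in range(reps):
    s, m = 0, 0
    for _ in range(steps):
        s += 1 if random.random() < 0.5 else -1
        if s > m:
            m = s                       # running maximum of the walk
    max_vals.append(m / steps ** 0.5)   # approximates max of B on [0, 1]
    abs_vals.append(abs(s) / steps ** 0.5)   # approximates |B(1)|

mean_max = sum(max_vals) / reps
mean_abs = sum(abs_vals) / reps
# Both should be close to E|B(1)| = sqrt(2/pi) ~ 0.798 (the walk's maximum
# is biased slightly low by discretization).
```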

Martingales

A stochastic process X(t), indexed by either a discrete or continuous time parameter t, is a martingale if:

E(X(t) | X(u), u ≤ s) = X(s)

whenever s < t.

Examples

• A fair random walk S_n is a martingale.

• If N(t) is a Poisson Process with rate λ then N(t) − λt is a martingale.

• Standard Brownian motion (defined above) is a martingale.

Note: Brownian motion with drift is a process of the form

X(t) = σB(t) + μt,

where B is standard Brownian motion, introduced earlier. X is a martingale if μ = 0. We call μ the drift.

• If X(t) = σB(t) + μt is a Brownian motion with drift then

Y(t) = exp(X(t))

is a geometric Brownian motion. For suitable μ and σ we can make Y a martingale.

• If a gambler makes a sequence of fair bets and X_n is the amount of money s/he has after n bets then X_n is a martingale - even if the bets made depend on the outcomes of previous bets, that is, even if the gambler plays a strategy.

Some evidence for some of the above:

Random walk: X_1, X_2, ... iid with

P(X_i = 1) = P(X_i = −1) = 1/2

and S_n = X_1 + ··· + X_n with S_0 = 0. Then

E(S_{n+1} | S_1, ..., S_n) = E(S_n + X_{n+1} | S_1, ..., S_n) = S_n + E(X_{n+1}) = S_n.

Things to notice:

S_n is treated as constant given S_1, ..., S_n.

Knowing S_1, ..., S_n is equivalent to knowing X_1, ..., X_n.

For X_{n+1} we have X_{n+1} independent of S_1, ..., S_n, so the conditional expectation is the unconditional expectation.

Since Standard Brownian Motion is the limit of such random walks we get the martingale property for standard Brownian motion.

Poisson Process: M(t) = N(t) − λt. Fix t > s.

E(M(t) | N(u), u ≤ s) = E(N(t) − N(s) | N(u), u ≤ s) + N(s) − λt
                      = λ(t − s) + N(s) − λt
                      = N(s) − λs = M(s).

Things to notice:

I used independent increments.

"N(u), u ≤ s" is shorthand for the conditioning event.

Similar to the random walk calculation.

Black Scholes

We model the price of a stock as

S(t) = S(0) exp(X(t)),

where

X(t) = σB(t) + μt

is a Brownian motion with drift (B is standard Brownian motion).

We call r the instantaneous interest rate: if we invest $1 at time 0 then at time t we would have $e^{rt}. In this sense an amount of money $y to be paid at time t is worth only $y e^{−rt} at time 0 (because that much money at time 0 will grow to $y by time t).

Present Value: If the stock price at time t is S(t) per share then the present value of 1 share to be delivered at time t is

Y(t) = e^{−rt} S(t).

With S(t) as above we see

Y(t) = S(0) exp[σB(t) + (μ − r)t].

Now we compute E(Y(t) | Y(u), u ≤ s) for t > s. Write

Y(t) = Y(s) exp[σ(B(t) − B(s)) + (μ − r)(t − s)].

Since B has independent increments we find

E(Y(t) | Y(u), u ≤ s) = Y(s) e^{(μ−r)(t−s)} E(exp[σ(B(t) − B(s))]).

Note: B(t) − B(s) is N(0, t − s); the expected value needed is the moment generating function of this variable at σ.

Suppose Z ~ N(0, 1). The Moment Generating Function of Z is

M(u) = E(e^{uZ}) = e^{u²/2}.

Rewrite

B(t) − B(s) = √(t − s) Z

where Z ~ N(0, 1) to see

E(exp[σ(B(t) − B(s))]) = e^{σ²(t−s)/2}.

Finally we get

E(Y(t) | Y(u), u ≤ s) = Y(s) exp[(μ − r + σ²/2)(t − s)] = Y(s)

provided

r = μ + σ²/2.

If this identity is satisfied then the present value of the stock price is a martingale.

Option Pricing

Suppose you can pay $c today for the right to pay $K for a share of this stock at time t (regardless of the actual price S(t) at time t).

If S(t) > K, at time t you will exercise your option and buy the share, making S(t) − K dollars.

If S(t) ≤ K you will not exercise your option; it becomes worthless.

The present value of this option is

e^{−rt} (S(t) − K)⁺,

where

x⁺ = max(x, 0).

(Called the positive part of x.)

In a fair market:

• The discounted share price Y(t) is a martingale.

• The expected present value of the net gain on the option (payoff minus the price c paid) is 0.

So:

c = E[e^{−rt}(S(t) − K)⁺].

Since

S(t) = S(0) exp(σ√t Z + μt) with Z ~ N(0, 1),

we are to compute

e^{−rt} E[(S(0) exp(σ√t Z + μt) − K)⁺].

This is

e^{−rt} ∫ from z_0 to ∞ of [S(0) exp(σ√t z + μt) − K] φ(z) dz,

where

z_0 = [log(K/S(0)) − μt]/(σ√t).

Evidently

e^{−rt} ∫ from z_0 to ∞ of K φ(z) dz = K e^{−rt} [1 − Φ(z_0)] = K e^{−rt} Φ(−z_0).

The other integral needed is

e^{−rt} ∫ from z_0 to ∞ of S(0) exp(σ√t z + μt) φ(z) dz = S(0) Φ(σ√t − z_0),

completing the square and using r = μ + σ²/2.

Introduce the notation

d_1 = [log(S(0)/K) + (r + σ²/2)t]/(σ√t) and d_2 = d_1 − σ√t = −z_0,

and do all the algebra to get

c = S(0) Φ(d_1) − K e^{−rt} Φ(d_2).

This is the Black-Scholes option pricing formula.
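The formula is a few lines of code; Φ can be built from math.erf. The inputs S(0) = 100, K = 100, r = 0.05, σ = 0.2, t = 1 below are a standard textbook test case, not from the lecture:

```python
import math

def norm_cdf(x):
    """Standard normal cdf Phi via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, k, r, sigma, t):
    """Black-Scholes price c = S(0) Phi(d1) - K e^{-rt} Phi(d2)."""
    d1 = (math.log(s0 / k) + (r + sigma ** 2 / 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

c = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
```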

Simulation

Monte Carlo computation of an expected value:

To compute θ = E(X): do the experiment n times to generate independent observations all with the same distribution as X.

Get X_1, ..., X_n.

Use

X̄ = (X_1 + ··· + X_n)/n

as an estimate of

θ = E(X).

To estimate p = P(X ∈ A): use the same X_i and compute

p̂ = #{i : X_i ∈ A}/n.
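A tiny sketch of both estimates, using targets whose true values are known: θ = E(U²) = 1/3 and p = P(U > 1/2) = 1/2 for U ~ Uniform(0,1):

```python
import random

random.seed(6)

n = 100000
u = [random.random() for _ in range(n)]

theta_hat = sum(x * x for x in u) / n        # estimates E(U^2) = 1/3
p_hat = sum(1 for x in u if x > 0.5) / n     # estimates P(U > 1/2) = 1/2
```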

Random Number Generation

In practice: truly random quantities are not used.

Use: pseudo-random uniform random numbers.

They "behave like" iid Uniform(0,1) variables.

Many, many generators. One standard kind: linear congruential generators:

X_{n+1} = (aX_n + c) mod m.

Here the mod means compute the remainder after division by m.

Integers a, c and m are chosen so that

• the sequence X_0, X_1, ... runs through the set {0, 1, ..., m−1} (or as much of it as possible, sometimes).

• the sequence of pairs

(X_0, X_1), (X_1, X_2), ...

fills up as much of the square {0, ..., m−1} × {0, ..., m−1} as possible.

• similarly for triples (X_n, X_{n+1}, X_{n+2}).

Use

U_n = X_n/m

as Uniform(0,1) random numbers.
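A linear congruential generator in a few lines. The toy parameters m = 16, a = 5, c = 1 (chosen here only for illustration) satisfy the classical full-period conditions, so the state visits all of {0, ..., 15}; real generators use a much larger m:

```python
def lcg(seed, a=5, c=1, m=16):
    """Yield the linear congruential sequence X_{n+1} = (a X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=7)
states = [next(gen) for _ in range(16)]   # one full period for these parameters
uniforms = [x / 16 for x in states]       # U_n = X_n / m in [0, 1)
```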

General Random Variables

We now pretend we can generate U_1, U_2, ... which are iid Uniform(0,1).

Monte Carlo methods for generating a sample:

• Transformation:

• Inverse Probability Integral Transform

• Multivariate (e.g. Box Müller)

• Acceptance Rejection

• Markov Chain Monte Carlo

Inverse (Probability Integral) Transform

Fact: if a continuous CDF F and a random variable X satisfy

U = F(X),

then U ~ Uniform(0,1) if and only if X has cdf F.

Proof: For simplicity: assume F is strictly increasing on an interval (a, b) with F(a) = 0 and F(b) = 1.

If X has cdf F then for 0 < u < 1

P(U ≤ u) = P(F(X) ≤ u) = P(X ≤ F^{−1}(u)) = F(F^{−1}(u)) = u,

so U ~ Uniform(0,1).

Conversely: if U ~ Uniform(0,1) and 0 < u < 1 then there is a unique x such that

F(x) = u.

Application: generate U ~ Uniform(0,1) and solve U = F(X) for X to get X = F^{−1}(U), which has cdf F.

Example: For the exponential distribution

F(x) = 1 − e^{−λx}.

Set U = 1 − e^{−λX} and solve to get

X = −log(1 − U)/λ.

Observation: 1 − U ~ Uniform(0,1), so X = −log(U)/λ also has an Exponential(λ) distribution.

Example: for the standard normal cdf Φ, solving

Φ(X) = U

requires numerical approximation, but such solutions are built in to many languages.
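The exponential example in code (λ = 2 is an illustrative choice): the sample mean should be near 1/λ = 0.5 and the fraction of draws below 1 near F(1) = 1 − e^{−2}:

```python
import math
import random

random.seed(8)

lam, n = 2.0, 100000
# Inverse transform: X = -log(U)/lambda has cdf 1 - exp(-lambda x).
xs = [-math.log(random.random()) / lam for _ in range(n)]

mean = sum(xs) / n                                    # near 1/lambda = 0.5
frac_below_1 = sum(1 for x in xs if x <= 1.0) / n     # near 1 - e^{-2}
```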

Special purpose transformations:

Example: If N_1 and N_2 are iid N(0,1), define R ≥ 0 and θ by

N_1 = R cos θ and N_2 = R sin θ.

NOTE: Book says

θ = arctan(N_2/N_1),

but arctan takes values in (−π/2, π/2); the notation here

means the angle θ in [0, 2π) so that

(N_1, N_2) = (R cos θ, R sin θ).

Then

{(r cos t, r sin t) : 0 ≤ t ≤ θ}

is part of a circle of radius r. (Start at (r, 0) and go around the circle to angle θ.)

Compute the joint cdf of (R², θ) at (x, y).

Define

F(x, y) = P(R² ≤ x, θ ≤ y).

Then

F(x, y) = double integral over {u² + v² ≤ x, angle ≤ y} of (1/2π) e^{−(u²+v²)/2} du dv.

Do the integral in polar co-ordinates to get

F(x, y) = ∫ from 0 to y ∫ from 0 to √x of (1/2π) e^{−r²/2} r dr dt.

The r integral just gives

1 − e^{−x/2}.

The t integral then gives

y/(2π).

So

F(x, y) = (1 − e^{−x/2}) y/(2π), x ≥ 0, 0 ≤ y ≤ 2π.

Product of two cdfs so: R² and θ are independent.

Moreover R² has cdf

1 − e^{−x/2},

which is Exponential with rate 1/2.

θ is Uniform[0, 2π].

So: generate U_1, U_2 iid Uniform(0,1).

Define

θ = 2πU_1

and

R = √(−2 log U_2).

Put

N_1 = R cos θ and N_2 = R sin θ.

You have generated two independent N(0,1) variables.
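The Box-Müller recipe above in code, with a quick check that the output has mean near 0 and variance near 1:

```python
import math
import random

random.seed(9)

def box_muller():
    """Return two independent N(0,1) draws built from two uniforms."""
    u1, u2 = random.random(), random.random()
    theta = 2 * math.pi * u1          # Uniform angle on [0, 2*pi)
    r = math.sqrt(-2 * math.log(u2))  # R: sqrt of an Exponential(1/2) draw
    return r * math.cos(theta), r * math.sin(theta)

n = 50000
draws = []
for _ in range(n // 2):
    z1, z2 = box_muller()
    draws.extend([z1, z2])

mean = sum(draws) / n
var = sum((z - mean) ** 2 for z in draws) / n
```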

Acceptance Rejection

If U = F(X) is difficult to solve (eg F^{−1} is hard to compute) we sometimes use acceptance rejection.

Goal: generate X with density f.

Tactic: find a density g and a constant c such that

f(x) ≤ c g(x) for all x,

and such that we can generate Y from the density g.

Algorithm:

1. Generate Y which has density g.

2. Generate U ~ Uniform(0,1) independent of Y.

3. If U ≤ f(Y)/[c g(Y)] let X = Y.

4. If not, reject Y and go back to step 1; repeat, generating new Y and U (independently) till you accept a Y.

Facts:

• Since you repeat the same experiment each trial: the number of trials, N, until success is Geometric (and sure to be finite).

• The probability of success on an individual trial is

p = P(U ≤ f(Y)/[c g(Y)]) = ∫ [f(y)/(c g(y))] g(y) dy = 1/c,

so

E(N) = c.

Most important fact: X has density f.

Proof: this is like the old craps example:

We compute

P(X ≤ x) = P(Y ≤ x | U ≤ f(Y)/[c g(Y)])

(condition on the first iteration where the condition is met).

Condition on Y to get

P(X ≤ x) = [∫ from −∞ to x of (f(y)/(c g(y))) g(y) dy] / (1/c) = ∫ from −∞ to x of f(y) dy,

as desired.

Example: Half normal distribution: X = |Z| for Z ~ N(0,1) has density

f(x) = √(2/π) e^{−x²/2}, x > 0.

(Density of the absolute value of a N(0,1) variable.)

Use g(x) = λe^{−λx}, x > 0. To find c maximize

f(x)/g(x) = [√(2/π)/λ] e^{λx − x²/2}.

Rewrite as

[√(2/π)/λ] e^{λ²/2} e^{−(x−λ)²/2}.

Maximum is at x = λ, giving

c = c(λ) = [√(2/π)/λ] e^{λ²/2}.

Choose λ to minimize c (or log c); get λ = 1, so

c = √(2e/π) ≈ 1.3155.

The algorithm is then: generate U_1 and then compute

Y = −log(U_1),

which is Exponential(1). Generate U_2 and if

U_2 ≤ e^{−(Y−1)²/2}

accept X = Y; otherwise try again.

To generate N(0,1): use a third U_3 to pick a sign at random: negative if U_3 ≤ 1/2, otherwise positive.
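The half-normal sampler in code, tracking the acceptance rate: it should be close to 1/c = 1/√(2e/π) ≈ 0.760, and the sample mean close to E|Z| = √(2/π) ≈ 0.798:

```python
import math
import random

random.seed(10)

def half_normal():
    """Acceptance rejection with an Exponential(1) envelope, c = sqrt(2e/pi)."""
    trials = 0
    while True:
        trials += 1
        y = -math.log(random.random())               # Exponential(1) proposal
        if random.random() <= math.exp(-(y - 1) ** 2 / 2):
            return y, trials                         # accept X = Y

n = 50000
total_trials = 0
xs = []
for _ in range(n):
    x, k = half_normal()
    xs.append(x)
    total_trials += k

accept_rate = n / total_trials    # near 1/c ~ 0.760
mean = sum(xs) / n                # near sqrt(2/pi) ~ 0.798
```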

Discrete Distributions

The inverse transformation method works for discrete distributions.

Suppose X has possible values x_1, ..., x_k with probabilities p_1, ..., p_k. Compute the cumulative probabilities

F_j = p_1 + ··· + p_j,

with F_0 = 0.

Generate U ~ Uniform(0,1).

Find j such that

F_{j−1} < U ≤ F_j.

Put X = x_j.
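A discrete sampler using bisect to find the index j with F_{j−1} < U ≤ F_j; the three-point distribution below is an arbitrary example:

```python
import bisect
import random

random.seed(11)

values = [1, 2, 5]          # example support
probs = [0.2, 0.5, 0.3]     # example probabilities

# Cumulative probabilities F_1, ..., F_k.
cum = []
total = 0.0
for p in probs:
    total += p
    cum.append(total)

def draw():
    """Inverse transform: return x_j for the smallest j with U <= F_j."""
    u = random.random()
    j = bisect.bisect_left(cum, u)
    j = min(j, len(values) - 1)   # guard against float round-off in F_k
    return values[j]

n = 100000
counts = {v: 0 for v in values}
for _ in range(n):
    counts[draw()] += 1
# Observed frequencies should be close to probs.
```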

Richard Lockhart
2002-02-07