
STAT 450 Lecture 4

Reading for Today's Lecture: Sections 1, 2 and 3 of Chapter 2. Sections 1 and 2 of Chapter 4.

Goals of Today's Lecture:

Last time: We introduced distribution theory

We derived the change of variables method for Y = g(X).

Note: Last time I did only the case of g increasing, where $g^\prime$ is positive, so my derivation misses the absolute value sign needed when g is decreasing. The general formula is

\begin{displaymath}f_Y(y) = f_X(g^{-1}(y)) \left\vert \frac{d}{dy} g^{-1}(y) \right\vert \, .
\end{displaymath}

Reading for Today's Lecture: Chapter 4 sections 1, 2 and 3. Chapter 1 section 3.6.

Goals of Today's Lecture:


Long term plan:

The general multivariate problem has

\begin{displaymath}Y=(Y_1,\ldots,Y_q) = ( g_1(X_1,\ldots,X_p), \ldots, g_q(X_1,\ldots,X_p))
\end{displaymath}

and our goal is to compute $f_Y$ from $f_X$.

Case 1: If q>p then Y will not have a density for ``smooth'' g. Y will have a singular or discrete distribution. This sort of problem is rarely of real interest. (However, variables of interest often have a singular distribution - this is almost always true of the set of residuals in a regression problem.)

Case 2: If q=p then we will be able to use a change of variables formula.

Case 3: If q < p we will try a two step process, first applying a change of variables formula and then a technique called marginalization.

Before we do any of this we develop mathematical tools to manipulate joint densities: independence, marginal densities, conditional densities.

Independence, conditional distributions

In the examples so far the density for X has been specified explicitly. In many situations, however, the process of modelling the data leads to a specification in terms of marginal and conditional distributions.

Definition: Events A and B are independent if

\begin{displaymath}P(AB) = P(A)P(B) \, .
\end{displaymath}

(Note the notation: AB is the event that both A and B happen. It is also written $A\cap B$.)
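Example: Roll a fair die; let A be the event that the result is even and B the event that the result is at most 4. Then

\begin{displaymath}P(AB) = P(\{2,4\}) = \frac{1}{3} = \frac{1}{2} \cdot \frac{2}{3} = P(A)P(B)
\end{displaymath}

so A and B are independent.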

Definition: Events Ai, $i=1,\ldots,p$ are independent if

\begin{displaymath}P(A_{i_1} \cdots A_{i_r}) = \prod_{j=1}^r P(A_{i_j})
\end{displaymath}

for any set of distinct indices $i_1,\ldots,i_r$ between 1 and p.

Example: p=3

\begin{eqnarray*}P(A_1A_2A_3) & = & P(A_1)P(A_2)P(A_3)
\\
P(A_1A_2) & = & P(A_1)P(A_2)
\\
P(A_1A_3) & = & P(A_1)P(A_3)
\\
P(A_2A_3) & = & P(A_2)P(A_3)
\end{eqnarray*}


You need all these equations to be true for independence!
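To see why pairwise independence is not enough, toss a fair coin twice and let $A_1$ be the event that the first toss is a head, $A_2$ that the second toss is a head, and $A_3$ that exactly one toss is a head. Each pair satisfies the product rule:

\begin{displaymath}P(A_1A_2) = P(A_1A_3) = P(A_2A_3) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2}
\end{displaymath}

but

\begin{displaymath}P(A_1A_2A_3) = 0 \neq \frac{1}{8} = P(A_1)P(A_2)P(A_3)
\end{displaymath}

so $A_1, A_2, A_3$ are not independent.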

Definition: Random variables X and Y are independent if

\begin{displaymath}P(X \in A, Y \in B) = P(X\in A)P(Y\in B)
\end{displaymath}

for all A and B.

Definition: Random variables $X_1,\ldots,X_p$ are independent if

\begin{displaymath}P(X_1 \in A_1, \cdots , X_p \in A_p ) = \prod P(X_i \in A_i)
\end{displaymath}

for any choice of $A_1,\ldots,A_p$.

Theorem 1  

1.
If X and Y are independent then

\begin{displaymath}F_{X,Y}(x,y) = F_X(x)F_Y(y)
\end{displaymath}

for all x,y

2.
If X and Y are independent and have densities $f_X$ and $f_Y$ then (X,Y) has density

\begin{displaymath}f_{X,Y}(x,y) = f_X(x) f_Y(y) \, .
\end{displaymath}

3.
If X and Y are independent and (X,Y) has density f(x,y) then X has a density, say $f_X$, and Y has a density, say $f_Y$, such that for all x and y

\begin{displaymath}f_{X,Y}(x,y) = f_X(x) f_Y(y) \, .
\end{displaymath}

4.
If

\begin{displaymath}F_{X,Y}(x,y) = F_X(x)F_Y(y)
\end{displaymath}

for all x,y then X and Y are independent.

5.
If (X,Y) has density f(x,y) and there are functions g(x) and h(y) such that

\begin{displaymath}f(x,y) = g(x) h(y)
\end{displaymath}

for all (well, technically almost all) (x,y), then X and Y are independent and they each have a density, given by

\begin{displaymath}f_X(x) = g(x)/\int_{-\infty}^\infty g(u) du
\end{displaymath}

and

\begin{displaymath}f_Y(y) = h(y)/\int_{-\infty}^\infty h(u) du \, .
\end{displaymath}

Proof:

1.
Since X and Y are independent so are the events $X \le x$ and $Y \le y$; hence

\begin{displaymath}P(X \le x, Y \le y) = P(X \le x)P(Y \le y)
\end{displaymath}

2.
For any A and B we have
\begin{align*}P(X \in A, Y \in B) & = P(X\in A)P(Y \in B)
\\
&= \int_Af_X(x)dx \int_B f_Y(y) dy
\\
&= \int_A\int_B f_X(x)f_Y(y) dydx
\end{align*}
If we define $g(x,y) = f_X(x)f_Y(y)$ then we have proved that for $C=A \times B$

\begin{displaymath}P( (X,Y) \in C) = \int_C g(x,y)dy dx
\end{displaymath}

Our definition of density is that g is the density of (X,Y) if this formula holds for all (Borel) C. I will not discuss this proof in class but here is the key idea: to prove that g is the joint density of (X,Y) we need only prove that this integral formula is valid for an arbitrary Borel set C, not just a rectangle $A \times B$. This is proved via a monotone class argument: you prove that the collection of sets C for which the identity holds has closure properties which guarantee that this collection includes the Borel sets.

3.
For clarity suppose X and Y are real valued. In assignment 2 I have asked you to prove that the existence of $f_{X,Y}$ implies that $f_X$ and $f_Y$ exist (and are given by the marginal density formula). Then for any sets A and B
\begin{align*}P(X \in A, Y\in B) &= \int_A\int_B f_{X,Y}(x,y) dydx
\\
P(X\in A)P(Y\in B) &= \int_A f_X(x)dx \int_B f_Y(y) dy
\\
&= \int_A\int_B f_X(x)f_Y(y) dydx
\end{align*}
Since $P(X \in A, Y\in B) =P(X\in A)P(Y\in B)$ we see that for any sets A and B

\begin{displaymath}\int_A\int_B [ f_{X,Y}(x,y) - f_X(x)f_Y(y) ]dydx = 0
\end{displaymath}

It follows (via measure theory) that the quantity in brackets is 0 (for almost every pair (x,y)).

4.
This is proved via another monotone class argument.

5.

\begin{align*}P(X \in A, Y \in B) & = \int_A \int_B g(x) h(y) dy dx
\\
& = \int_A g(x) dx \int_B h(y) dy
\end{align*}
Take $B=R^1$ to see that

\begin{displaymath}P(X \in A ) = c_1 \int_A g(x) dx
\end{displaymath}

where $c_1 = \int h(y) dy$. From the definition of density we see that $c_1 g$ is the density of X. Since $\int\int f_{X,Y}(x,y)\,dx\,dy = 1$ we see that $\int g(x) dx \int h(y) dy = 1$ so that $ c_1 = 1/\int g(x) dx$. A similar argument works for Y.
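Example: To illustrate part 5, suppose (X,Y) has density

\begin{displaymath}f(x,y) = 6 e^{-2x-3y} 1(x>0) 1(y >0) \, .
\end{displaymath}

This factors as g(x)h(y) with $g(x) = e^{-2x}1(x>0)$ and $h(y) = 6e^{-3y}1(y>0)$, so X and Y are independent with

\begin{displaymath}f_X(x) = \frac{e^{-2x}}{\int_0^\infty e^{-2u}\, du} = 2 e^{-2x} 1(x>0)
\quad \mbox{and} \quad
f_Y(y) = \frac{6e^{-3y}}{\int_0^\infty 6 e^{-3u}\, du} = 3 e^{-3y} 1(y>0) \, .
\end{displaymath}

Notice that the factorization into g and h is not unique but the normalized densities are.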

Theorem 2   If $X_1,\ldots,X_p$ are independent and $Y_i = g_i(X_i)$ then $Y_1,\ldots,Y_p$ are independent. Moreover, $(X_1,\ldots,X_q)$ and $(X_{q+1},\ldots,X_{p})$ are independent.
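A sketch of the first claim: for any sets $A_1,\ldots,A_p$, writing $g_i^{-1}(A_i) = \{x : g_i(x) \in A_i\}$,

\begin{align*}P(Y_1 \in A_1, \ldots, Y_p \in A_p) &= P(X_1 \in g_1^{-1}(A_1), \ldots, X_p \in g_p^{-1}(A_p))
\\
&= \prod_{i=1}^p P(X_i \in g_i^{-1}(A_i)) = \prod_{i=1}^p P(Y_i \in A_i) \, .
\end{align*}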

Conditional probability

Def'n: P(A|B) = P(AB)/P(B) provided $P(B) \neq 0$.

Def'n: For discrete random variables X and Y the conditional probability mass function of Y given X is
\begin{align*}f_{Y\vert X}(y\vert x) &= P(Y=y\vert X=x)
\\
&= f_{X,Y}(x,y)/f_X(x)
\\
&= f_{X,Y}(x,y)/\sum_t f_{X,Y}(x,t)
\end{align*}
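Example: Toss a fair coin twice; let X be the number of heads on the first toss and Y the total number of heads. Then $f_X(1) = 1/2$, $f_{X,Y}(1,1) = f_{X,Y}(1,2) = 1/4$ and

\begin{displaymath}f_{Y\vert X}(1\vert 1) = f_{Y\vert X}(2\vert 1) = \frac{1/4}{1/2} = \frac{1}{2} \, ,
\end{displaymath}

while $f_{Y\vert X}(0\vert 1) = 0$.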

For absolutely continuous X the problem is that P(X=x) = 0 for all x, so how can we define $P(A\vert X=x)$ or $f_{Y\vert X}(y\vert x)$? The solution is to take a limit

\begin{displaymath}P(A\vert X=x) = \lim_{\delta x \to 0} P(A\vert x \le X \le x+\delta x)
\end{displaymath}

If, for instance, X,Y have joint density $f_{X,Y}$ then with $A=\{ Y \le y\}$ we have
\begin{align*}P(A\vert x \le X \le x+\delta x) & = \frac{P(A \cap \{x \le X \le x+\delta x\})}{P(x \le X \le x+\delta x)}
\\
& = \frac{
\int_{-\infty}^y \int_x^{x+\delta x} f_{X,Y}(u,v)dudv
}{
\int_x^{x+\delta x} f_X(u) du
}
\end{align*}
Divide the top and bottom by $\delta x$ and let $\delta x$ tend to 0. The denominator converges to $f_X(x)$ while the numerator converges to

\begin{displaymath}\int_{-\infty}^y f_{X,Y}(x,v) dv
\end{displaymath}

So we define the conditional cdf of Y given X=x to be

\begin{displaymath}P(Y \le y \vert X=x) = \frac{
\int_{-\infty}^y f_{X,Y}(x,v) dv
}{
f_X(x)
}
\end{displaymath}

Differentiate with respect to y to get the definition of the conditional density of Y given X=x, namely

\begin{displaymath}f_{Y\vert X}(y\vert x) = f_{X,Y}(x,y)/f_X(x)
\end{displaymath}

or in words ``conditional = joint/marginal''.
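For instance, if (X,Y) has joint density $f_{X,Y}(x,y) = (x+y) 1(0<x<1) 1(0<y<1)$ then $f_X(x) = \int_0^1 (x+v)\, dv = x + 1/2$ for $0 < x < 1$, so

\begin{displaymath}f_{Y\vert X}(y\vert x) = \frac{x+y}{x+1/2} \, , \quad 0 < y < 1 \, .
\end{displaymath}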

Marginalization

Now we turn to multivariate problems. The simplest version has $X=(X_1,\ldots,X_p)$ and $Y=X_1$ (or in general any $X_j$).

Theorem 3   If X has (joint) density $f(x_1,\ldots,x_p)$ then $Y=(X_1,\ldots,X_q)$ (with q < p) has a density $f_Y$ given by

\begin{displaymath}f_{X_1,\ldots,X_q}(x_1,\ldots,x_q) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f(x_1,x_2,\ldots,x_p) \, dx_{q+1} \ldots dx_p
\end{displaymath}

We call $f_{X_1,\ldots,X_q}$ the marginal density of $X_1,\ldots,X_q$ and use the expression joint density for $f_X$, but $f_{X_1,\ldots,X_q}$ is exactly the usual density of $(X_1,\ldots,X_q)$. The adjective ``marginal'' is just there to distinguish it from the joint density of X.

Example: The function

\begin{displaymath}f(x_1,x_2) = K x_1 x_2 1(x_1 > 0) 1(x_2 > 0) 1(x_1+x_2 < 1)
\end{displaymath}

is a density for a suitable choice of K, namely the value of K making

\begin{displaymath}P(X\in R^2) = \int_{-\infty}^\infty \int_{-\infty}^\infty f(x_1,x_2)\, dx_1\, dx_2 = 1 \, .
\end{displaymath}

The integral is

\begin{eqnarray*}K \int_0^1 \int_0^{1-x_1} x_1 x_2 \, dx_2\, dx_1 & = & K \int_0^1 x_1(1-x_1)^2 \, dx_1 /2
\\
& = & K(1/2 -2/3+1/4)/2
\\
& = & K/24
\end{eqnarray*}


so that K=24. The marginal density of $X_1$ is

\begin{displaymath}f_{X_1}(x_1) = \int_{-\infty}^\infty 24 x_1 x_2 1(x_1> 0) 1(x_2 >0) 1(x_1+x_2 < 1)\, dx_2
\end{displaymath}

which is the same as

\begin{displaymath}f_{X_1}(x_1) = 24 \int_0^{1-x_1} x_1 x_2 1(x_1> 0) 1(x_1 < 1) \, dx_2 = 12 x_1(1-x_1)^2
1(0 < x_1 < 1)
\end{displaymath}

This is a $\mbox{Beta}(2,3)$ density.
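Continuing this example, ``conditional = joint/marginal'' gives the conditional density of $X_2$ given $X_1=x_1$: for $0 < x_1 < 1$,

\begin{displaymath}f_{X_2\vert X_1}(x_2\vert x_1) = \frac{24 x_1 x_2}{12 x_1 (1-x_1)^2} = \frac{2 x_2}{(1-x_1)^2} \, ,
\quad 0 < x_2 < 1-x_1 \, .
\end{displaymath}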

The general multivariate problem has

\begin{displaymath}Y=(Y_1,\ldots,Y_q) = ( g_1(X_1,\ldots,X_p), \ldots, g_q(X_1,\ldots,X_p))
\end{displaymath}

Case 1: If q>p then Y will not have a density for ``smooth'' g. Y will have a singular or discrete distribution. This sort of problem is rarely of real interest. (However, variables of interest often have a singular distribution - this is almost always true of the set of residuals in a regression problem.)

Case 2: If q=p then we will be able to use a change of variables formula which generalizes the one derived above for the case p=q=1. (See below.)

Case 3: If q < p we will try a two step process. In the first step we pad out Y by adding on p-q more variables (carefully chosen) and calling them $Y_{q+1},\ldots,Y_p$. Formally we find functions $g_{q+1}, \ldots,g_p$ and define

\begin{displaymath}Z=(Y_1,\ldots,Y_q,g_{q+1}(X_1,\ldots,X_p),\ldots,g_p(X_1,\ldots,X_p))
\end{displaymath}

If we have chosen the functions carefully we will find that $g=(g_1,\ldots,g_p)$ satisfies the conditions for applying the change of variables formula from the previous case. Then we apply that case to compute fZ. Finally we marginalize the density of Z to find that of Y:

\begin{displaymath}f_Y(y_1,\ldots,y_q) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f_Z(y_1,\ldots,y_q,z_{q+1},\ldots,z_p) \, dz_{q+1} \ldots dz_p
\end{displaymath}
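Example: To preview the two step process, take p=2, q=1 and $Y_1 = X_1+X_2$. Pad out with $Y_2 = g_2(X_1,X_2) = X_2$. The map $(x_1,x_2) \mapsto (x_1+x_2,x_2)$ is invertible with $x_1 = y_1-y_2$, $x_2 = y_2$, and (once we have the multivariate change of variables formula; the Jacobian factor here equals 1) $f_Z(y_1,y_2) = f_X(y_1-y_2,y_2)$. Marginalizing out $y_2$ then gives the convolution formula

\begin{displaymath}f_{Y_1}(y_1) = \int_{-\infty}^\infty f_X(y_1-y_2,y_2)\, dy_2 \, .
\end{displaymath}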





Richard Lockhart
1999-09-14