
STAT 450 Lecture 4

Reading for Today's Lecture: Sections 1, 2 and 3 of Chapter 2. Sections 1 and 2 of Chapter 4.

Goals of Today's Lecture:

Last time: We introduced distribution theory

We derived the change of variables method for Y = g(X).

Note: Last time I did only the case of g increasing, where $g^\prime$ is positive, so my derivation misses the absolute value sign needed when g is decreasing. The general formula is

\begin{displaymath}f_Y(y) = f_X(g^{-1}(y)) \left\vert \frac{d}{dy} g^{-1}(y) \right\vert \, .
\end{displaymath}

Reading for Today's Lecture: Chapter 4 sections 1, 2 and 3. Chapter 1 section 3.6.

Goals of Today's Lecture:


Long term plan:

The general multivariate problem has

\begin{displaymath}Y=(Y_1,\ldots,Y_q) = ( g_1(X_1,\ldots,X_p), \ldots, g_q(X_1,\ldots,X_p))
\end{displaymath}

and our goal is to compute $f_Y$ from $f_X$.

Case 1: If q>p then Y will not have a density for ``smooth'' g. Y will have a singular or discrete distribution. This sort of problem is rarely of real interest. (However, variables of interest often have a singular distribution - this is almost always true of the set of residuals in a regression problem.)

Case 2: If q=p then we will be able to use a change of variables formula.

Case 3: If q < p we will try a two step process, first applying a change of variables formula and then a technique called marginalization.

Before we do any of this we develop mathematical tools to manipulate joint densities: independence, marginal densities, conditional densities.

Independence, conditional distributions

In the examples so far the density for X has been specified explicitly. In many situations, however, the process of modelling the data leads to a specification in terms of marginal and conditional distributions.

Definition: Events A and B are independent if

\begin{displaymath}P(AB) = P(A)P(B) \, .
\end{displaymath}

(Note the notation: AB is the event that both A and B happen. It is also written $A\cap B$.)
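Example: Roll a fair die; let A be the event that the result is even and B the event that the result is at most 4. Then

\begin{displaymath}P(AB) = P(\{2,4\}) = \frac{1}{3} = \frac{1}{2} \cdot \frac{2}{3} = P(A)P(B)
\end{displaymath}

so A and B are independent.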

Definition: Events Ai, $i=1,\ldots,p$ are independent if

\begin{displaymath}P(A_{i_1} \cdots A_{i_r}) = \prod_{j=1}^r P(A_{i_j})
\end{displaymath}

for any set of distinct indices $i_1,\ldots,i_r$ between 1 and p.

Example: p=3

\begin{eqnarray*}P(A_1A_2A_3) & = & P(A_1)P(A_2)P(A_3)
\\
P(A_1A_2) & = & P(A_1)P(A_2)
\\
P(A_1A_3) & = & P(A_1)P(A_3)
\\
P(A_2A_3) & = & P(A_2)P(A_3)
\end{eqnarray*}


You need all these equations to be true for independence!
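To see why pairwise independence is not enough, toss a fair coin twice and let $A_1$ be the event that the first toss is a head, $A_2$ that the second toss is a head, and $A_3$ that exactly one toss is a head. Each pair satisfies the product rule:

\begin{displaymath}P(A_1A_2) = P(A_1A_3) = P(A_2A_3) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2}
\end{displaymath}

but

\begin{displaymath}P(A_1A_2A_3) = 0 \neq \frac{1}{8} = P(A_1)P(A_2)P(A_3)
\end{displaymath}

so $A_1, A_2, A_3$ are not independent.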

Definition: Random variables X and Y are independent if

\begin{displaymath}P(X \in A, Y \in B) = P(X\in A)P(Y\in B)
\end{displaymath}

for all A and B.

Definition: Random variables $X_1,\ldots,X_p$ are independent if

\begin{displaymath}P(X_1 \in A_1, \cdots , X_p \in A_p ) = \prod P(X_i \in A_i)
\end{displaymath}

for any choice of $A_1,\ldots,A_p$.

Theorem 1  

1.
If X and Y are independent then

\begin{displaymath}F_{X,Y}(x,y) = F_X(x)F_Y(y)
\end{displaymath}

for all x,y

2.
If X and Y are independent and have densities $f_X$ and $f_Y$ then (X,Y) has density

\begin{displaymath}f_{X,Y}(x,y) = f_X(x) f_Y(y) \, .
\end{displaymath}

3.
If X and Y are independent and (X,Y) has density f(x,y) then X has a density, say $f_X$, and Y has a density, say $f_Y$, such that for all x and y

\begin{displaymath}f_{X,Y}(x,y) = f_X(x) f_Y(y) \, .
\end{displaymath}

4.
If

\begin{displaymath}F_{X,Y}(x,y) = F_X(x)F_Y(y)
\end{displaymath}

for all x,y then X and Y are independent.

5.
If (X,Y) has density f(x,y) and there are functions g(x) and h(y) such that

\begin{displaymath}f(x,y) = g(x) h(y)
\end{displaymath}

for all (well, technically almost all) (x,y), then X and Y are independent and they each have a density, given by

\begin{displaymath}f_X(x) = g(x)/\int_{-\infty}^\infty g(u) du
\end{displaymath}

and

\begin{displaymath}f_Y(y) = h(y)/\int_{-\infty}^\infty h(u) du \, .
\end{displaymath}

Proof:

1.
Since X and Y are independent so are the events $X \le x$ and $Y \le y$; hence

\begin{displaymath}P(X \le x, Y \le y) = P(X \le x)P(Y \le y)
\end{displaymath}

2.
For any A and B we have
\begin{align*}P(X \in A, Y \in B) & = P(X\in A)P(Y \in B)
\\
&= \int_Af_X(x)dx \int_B f_Y(y) dy
\\
&= \int_A\int_B f_X(x)f_Y(y) dydx
\end{align*}
If we define $g(x,y) = f_X(x)f_Y(y)$ then we have proved that for $C=A \times B$

\begin{displaymath}P( (X,Y) \in C) = \int_C g(x,y)dy dx
\end{displaymath}

Our definition of density is that g is the density of (X,Y) if this formula holds for all (Borel) C. I will not discuss this proof in class but here is the key idea: to prove that g is the joint density of (X,Y) we need only prove that this integral formula is valid for an arbitrary Borel set C, not just a rectangle $A \times B$. This is proved via a monotone class argument: you prove that the collection of sets C for which the identity holds has closure properties which guarantee that this collection includes the Borel sets.

3.
For clarity suppose X and Y are real valued. In assignment 2 I have asked you to prove that the existence of $f_{X,Y}$ implies that $f_X$ and $f_Y$ exist (and are given by the marginal density formula). Then for any sets A and B
\begin{align*}P(X \in A, Y\in B) &= \int_A\int_B f_{X,Y}(x,y) dydx
\\
P(X\in A)P(Y\in B) &= \int_A f_X(x)dx \int_B f_Y(y) dy
\\
&= \int_A\int_B f_X(x)f_Y(y) dydx
\end{align*}
Since $P(X \in A, Y\in B) =P(X\in A)P(Y\in B)$ we see that for any sets A and B

\begin{displaymath}\int_A\int_B [ f_{X,Y}(x,y) - f_X(x)f_Y(y) ]dydx = 0
\end{displaymath}

It follows (via measure theory) that the quantity in brackets is 0 (for almost every pair (x,y)).

4.
This is proved via another monotone class argument.

5.

\begin{align*}P(X \in A, Y \in B) & = \int_A \int_B g(x) h(y) dy dx
\\
& = \int_A g(x) dx \int_B h(y) dy
\end{align*}
Take $B=R^1$ to see that

\begin{displaymath}P(X \in A ) = c_1 \int_A g(x) dx
\end{displaymath}

where $c_1 = \int h(y) dy$. From the definition of density we see that $c_1 g$ is the density of X. Since $\int\int f_{X,Y}(x,y)\,dx\,dy = 1$ we see that $\int g(x) dx \int h(y) dy = 1$ so that $ c_1 = 1/\int g(x) dx$. A similar argument works for Y.
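Example: To illustrate part 5, suppose (X,Y) has density

\begin{displaymath}f(x,y) = 6 e^{-2x-3y} 1(x>0) 1(y >0) \, .
\end{displaymath}

This factors as g(x)h(y) with $g(x) = e^{-2x}1(x>0)$ and $h(y) = 6e^{-3y}1(y>0)$, so X and Y are independent with

\begin{displaymath}f_X(x) = \frac{e^{-2x}}{\int_0^\infty e^{-2u}\, du} = 2 e^{-2x} 1(x>0)
\quad \mbox{and} \quad
f_Y(y) = \frac{6e^{-3y}}{\int_0^\infty 6 e^{-3u}\, du} = 3 e^{-3y} 1(y>0) \, .
\end{displaymath}

Notice that the factorization into g and h is not unique but the normalized densities are.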

Theorem 2   If $X_1,\ldots,X_p$ are independent and $Y_i = g_i(X_i)$ then $Y_1,\ldots,Y_p$ are independent. Moreover, $(X_1,\ldots,X_q)$ and $(X_{q+1},\ldots,X_{p})$ are independent.
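A sketch of the first claim: for any sets $A_1,\ldots,A_p$, writing $g_i^{-1}(A_i) = \{x : g_i(x) \in A_i\}$,

\begin{align*}P(Y_1 \in A_1, \ldots, Y_p \in A_p) &= P(X_1 \in g_1^{-1}(A_1), \ldots, X_p \in g_p^{-1}(A_p))
\\
&= \prod_{i=1}^p P(X_i \in g_i^{-1}(A_i)) = \prod_{i=1}^p P(Y_i \in A_i) \, .
\end{align*}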

Conditional probability

Def'n: P(A|B) = P(AB)/P(B) provided $P(B) \neq 0$.

Def'n: For discrete random variables X and Y the conditional probability mass function of Y given X is
\begin{align*}f_{Y\vert X}(y\vert x) &= P(Y=y\vert X=x)
\\
&= f_{X,Y}(x,y)/f_X(x)
\\
&= f_{X,Y}(x,y)/\sum_t f_{X,Y}(x,t)
\end{align*}
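Example: Toss a fair coin twice; let X be the number of heads on the first toss and Y the total number of heads. Then $f_X(1) = 1/2$, $f_{X,Y}(1,1) = f_{X,Y}(1,2) = 1/4$ and

\begin{displaymath}f_{Y\vert X}(1\vert 1) = f_{Y\vert X}(2\vert 1) = \frac{1/4}{1/2} = \frac{1}{2} \, ,
\end{displaymath}

while $f_{Y\vert X}(0\vert 1) = 0$.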

For absolutely continuous X the problem is that P(X=x) = 0 for all x, so how can we define $P(A\vert X=x)$ or $f_{Y\vert X}(y\vert x)$? The solution is to take a limit

\begin{displaymath}P(A\vert X=x) = \lim_{\delta x \to 0} P(A\vert x \le X \le x+\delta x)
\end{displaymath}

If, for instance, X,Y have joint density $f_{X,Y}$ then with $A=\{ Y \le y\}$ we have
\begin{align*}P(A\vert x \le X \le x+\delta x) & = \frac{P(A \cap \{x \le X \le x+\delta x\})}{P(x \le X \le x+\delta x)}
\\
& = \frac{
\int_{-\infty}^y \int_x^{x+\delta x} f_{X,Y}(u,v)dudv
}{
\int_x^{x+\delta x} f_X(u) du
}
\end{align*}
Divide the top and bottom by $\delta x$ and let $\delta x$ tend to 0. The denominator converges to $f_X(x)$ while the numerator converges to

\begin{displaymath}\int_{-\infty}^y f_{X,Y}(x,v) dv
\end{displaymath}

So we define the conditional cdf of Y given X=x to be

\begin{displaymath}P(Y \le y \vert X=x) = \frac{
\int_{-\infty}^y f_{X,Y}(x,v) dv
}{
f_X(x)
}
\end{displaymath}

Differentiate with respect to y to get the definition of the conditional density of Y given X=x, namely

\begin{displaymath}f_{Y\vert X}(y\vert x) = f_{X,Y}(x,y)/f_X(x)
\end{displaymath}

or in words ``conditional = joint/marginal''.
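For instance, if (X,Y) has joint density $f_{X,Y}(x,y) = (x+y) 1(0<x<1) 1(0<y<1)$ then $f_X(x) = \int_0^1 (x+v)\, dv = x + 1/2$ for $0 < x < 1$, so

\begin{displaymath}f_{Y\vert X}(y\vert x) = \frac{x+y}{x+1/2} \, , \quad 0 < y < 1 \, .
\end{displaymath}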

Marginalization

Now we turn to multivariate problems. The simplest version has $X=(X_1,\ldots,X_p)$ and $Y=X_1$ (or in general any $X_j$).

Theorem 3   If X has (joint) density $f(x_1,\ldots,x_p)$ then $Y=(X_1,\ldots,X_q)$ (with q < p) has a density $f_Y$ given by

\begin{displaymath}f_{X_1,\ldots,X_q}(x_1,\ldots,x_q) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f(x_1,x_2,\ldots,x_p) \, dx_{q+1} \ldots dx_p
\end{displaymath}

We call $f_{X_1,\ldots,X_q}$ the marginal density of $X_1,\ldots,X_q$ and use the expression joint density for $f_X$, but $f_{X_1,\ldots,X_q}$ is exactly the usual density of $(X_1,\ldots,X_q)$. The adjective ``marginal'' is just there to distinguish it from the joint density of X.

Example: The function

\begin{displaymath}f(x_1,x_2) = K x_1 x_2 1(x_1 > 0) 1(x_2 > 0) 1(x_1+x_2 < 1)
\end{displaymath}

is a density for a suitable choice of K, namely the value of K making

\begin{displaymath}P(X\in R^2) = \int_{-\infty}^\infty \int_{-\infty}^\infty f(x_1,x_2)\, dx_1\, dx_2 = 1 \, .
\end{displaymath}

The integral is

\begin{eqnarray*}K \int_0^1 \int_0^{1-x_1} x_1 x_2 \, dx_2\, dx_1 & = & K \int_0^1 x_1(1-x_1)^2 \, dx_1 /2
\\
& = & K(1/2 -2/3+1/4)/2
\\
& = & K/24
\end{eqnarray*}


so that K=24. The marginal density of $X_1$ is

\begin{displaymath}f_{X_1}(x_1) = \int_{-\infty}^\infty 24 x_1 x_2 1(x_1> 0) 1(x_2 >0) 1(x_1+x_2 < 1)\, dx_2
\end{displaymath}

which is the same as

\begin{displaymath}f_{X_1}(x_1) = 24 \int_0^{1-x_1} x_1 x_2 1(x_1> 0) 1(x_1 < 1) \, dx_2 = 12 x_1(1-x_1)^2
1(0 < x_1 < 1)
\end{displaymath}

This is a $\mbox{Beta}(2,3)$ density.
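Continuing this example, ``conditional = joint/marginal'' gives the conditional density of $X_2$ given $X_1=x_1$: for $0 < x_1 < 1$,

\begin{displaymath}f_{X_2\vert X_1}(x_2\vert x_1) = \frac{24 x_1 x_2}{12 x_1 (1-x_1)^2} = \frac{2 x_2}{(1-x_1)^2} \, ,
\quad 0 < x_2 < 1-x_1 \, .
\end{displaymath}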

The general multivariate problem has

\begin{displaymath}Y=(Y_1,\ldots,Y_q) = ( g_1(X_1,\ldots,X_p), \ldots, g_q(X_1,\ldots,X_p))
\end{displaymath}

Case 1: If q>p then Y will not have a density for ``smooth'' g. Y will have a singular or discrete distribution. This sort of problem is rarely of real interest. (However, variables of interest often have a singular distribution - this is almost always true of the set of residuals in a regression problem.)

Case 2: If q=p then we will be able to use a change of variables formula which generalizes the one derived above for the case p=q=1. (See below.)

Case 3: If q < p we will try a two step process. In the first step we pad out Y by adding on p-q more variables (carefully chosen) and calling them $Y_{q+1},\ldots,Y_p$. Formally we find functions $g_{q+1}, \ldots,g_p$ and define

\begin{displaymath}Z=(Y_1,\ldots,Y_q,g_{q+1}(X_1,\ldots,X_p),\ldots,g_p(X_1,\ldots,X_p))
\end{displaymath}

If we have chosen the functions carefully we will find that $g=(g_1,\ldots,g_p)$ satisfies the conditions for applying the change of variables formula from the previous case. Then we apply that case to compute fZ. Finally we marginalize the density of Z to find that of Y:

\begin{displaymath}f_Y(y_1,\ldots,y_q) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f_Z(y_1,\ldots,y_q,z_{q+1},\ldots,z_p) \, dz_{q+1} \ldots dz_p
\end{displaymath}
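Example: To preview the two step process, take p=2, q=1 and $Y_1 = X_1+X_2$. Pad out with $Y_2 = g_2(X_1,X_2) = X_2$. The map $(x_1,x_2) \mapsto (x_1+x_2,x_2)$ is invertible with $x_1 = y_1-y_2$, $x_2 = y_2$, and (once we have the multivariate change of variables formula; the Jacobian factor here equals 1) $f_Z(y_1,y_2) = f_X(y_1-y_2,y_2)$. Marginalizing out $y_2$ then gives the convolution formula

\begin{displaymath}f_{Y_1}(y_1) = \int_{-\infty}^\infty f_X(y_1-y_2,y_2)\, dy_2 \, .
\end{displaymath}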





Richard Lockhart
1999-09-14