
STAT 801 Lecture 2

Change of variables

Reading for Today's Lecture: Chapters 1, 2 and 4 of Casella and Berger.

Goals of Today's Lecture:


So far we have defined probability spaces, real and vector valued random variables, cumulative distribution functions in $R^1$ and $R^p$, discrete densities, and densities for random variables with absolutely continuous distributions.

We also started distribution theory. So far: for Y=g(X) with X and Y each real valued

\begin{displaymath}P(Y \le y) = P(g(X) \le y) = P(X \in g^{-1}((-\infty,y]))
\end{displaymath}

Take the derivative with respect to y to compute the density

\begin{displaymath}f_Y(y) = \frac{d}{dy}\int_{\{x:g(x) \le y\}} f(x) \, dx
\end{displaymath}

Often we can differentiate this integral without doing the integral.
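For instance, if $Y=X^2$ then $\{x: g(x) \le y\} = [-\sqrt{y},\sqrt{y}]$ for $y>0$, so

\begin{displaymath}f_Y(y) = \frac{d}{dy}\left[ F_X(\sqrt{y}) - F_X(-\sqrt{y})\right]
= \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}} 1(y > 0)
\end{displaymath}

without ever evaluating the integral itself.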

Method 2: Change of variables.

Now assume g is one to one. I will do the case where g is increasing and I will be assuming that g is differentiable. The density has the following interpretation (mathematically what follows is just the expression of the fact that the density is the derivative of the cdf):

\begin{displaymath}f_Y(y) = \lim_{\delta y \to 0} \frac{P(y \le Y \le y+\delta y)}{\delta y}
= \lim_{\delta y \to 0} \frac{F_Y(y+\delta y)-F_Y(y)}{\delta y}
\end{displaymath}

and

\begin{displaymath}f_X(x) = \lim_{\delta x \to 0} \frac{P(x \le X \le x+\delta x)}{\delta x}
\end{displaymath}

Now assume that y=g(x). Then

\begin{displaymath}P( y \le Y \le g(x+\delta x) ) = P( x \le X \le x+\delta x)
\end{displaymath}

Each of these probabilities is the integral of a density. The first is the integral of the density of Y over the small interval from y=g(x) to $y=g(x+\delta x)$. Since the interval is narrow the function fY is nearly constant over this interval and we get

\begin{displaymath}P( y \le Y \le g(x+\delta x) ) \approx f_Y(y)(g(x+\delta x) - g(x))
\end{displaymath}

Since g has a derivative the difference

\begin{displaymath}g(x+\delta x) - g(x) \approx \delta x g^\prime(x)
\end{displaymath}

and we get

\begin{displaymath}P( y \le Y \le g(x+\delta x) ) \approx f_Y(y) g^\prime(x) \delta x
\end{displaymath}

On the other hand the same idea applied to the probability expressed in terms of X gives

\begin{displaymath}P( x \le X \le x+\delta x) \approx f_X(x) \delta x
\end{displaymath}

which gives

\begin{displaymath}f_Y(y) g^\prime(x) \delta x \approx f_X(x) \delta x
\end{displaymath}

or, cancelling the $\delta x$ in the limit

\begin{displaymath}f_Y(y) g^\prime(x) = f_X(x)
\end{displaymath}

If you remember y=g(x) then you get

\begin{displaymath}f_X(x) = f_Y(g(x)) g^\prime(x)
\end{displaymath}

or if you solve the equation y=g(x) to get x in terms of y, that is, $x=g^{-1}(y)$, then you get the usual formula

\begin{displaymath}f_Y(y) = f_X(g^{-1}(y)) / g^\prime(g^{-1}(y))
\end{displaymath}

I find it easier to remember the first of these formulas. This is just the change of variables formula for doing integrals.

Remark: If g had been decreasing the derivative $g^\prime$ would have been negative but in the argument above the interval $(g(x), g(x+\delta x))$ would have to have been written in the other order. This would have meant that our formula had $g(x) - g(x+\delta x) \approx -g^\prime(x) \delta x$. In both cases this amounts to the formula

\begin{displaymath}f_X(x) = f_Y(g(x))\vert g^\prime(x)\vert \, .
\end{displaymath}
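As a quick check, take $Y = aX + b$ with $a \ne 0$ so that $g(x) = ax+b$ and $g^{-1}(y) = (y-b)/a$. Then $g^\prime(x) = a$ and the formula gives

\begin{displaymath}f_Y(y) = f_X\left(\frac{y-b}{a}\right) \frac{1}{\vert a \vert}
\end{displaymath}

which is the usual density for a location-scale change.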

Example: $X\sim\mbox{Weibull(shape $\alpha$ , scale $\beta$ )}$ or

\begin{displaymath}f_X(x)= \frac{\alpha}{\beta} \left(\frac{x}{\beta}\right)^{\alpha-1}
\exp\left\{ -(x/\beta)^\alpha\right\} 1(x>0)
\end{displaymath}

Let $Y=\log X$ so that $g(x) = \log(x)$. Setting $y=\log x$ and solving gives $x=\exp(y)$ so that $g^{-1}(y) = e^y$. Then $g^\prime(x) = 1/x$ and $1/g^\prime(g^{-1}(y)) = 1/(1/e^y) =e^y$. Hence

\begin{displaymath}f_Y(y) = \frac{\alpha}{\beta} \left(\frac{e^y}{\beta}\right)^{\alpha-1}
\exp\left\{ -(e^y/\beta)^\alpha\right\} 1(e^y>0) \, e^y \, .
\end{displaymath}

The indicator is always equal to 1 since $e^y$ is always positive. Simplifying we get

\begin{displaymath}f_Y(y) = \frac{\alpha}{\beta^\alpha}
\exp\left\{\alpha y -e^{\alpha y}/\beta^\alpha\right\} \, .
\end{displaymath}

If we define $\phi = \log\beta$ and $\theta = 1/\alpha$ then the density can be written as

\begin{displaymath}f_Y(y) = \frac{1}{\theta}
\exp\left\{\frac{y-\phi}{\theta} -\exp\left\{\frac{y-\phi}{\theta}\right\}\right\}
\end{displaymath}

which is called an Extreme Value density with location parameter $\phi$ and scale parameter $\theta$. (Note: there are several distributions going under the name Extreme Value.)
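To check the algebra behind this substitution: $\alpha = 1/\theta$ and $\beta^\alpha = e^{\alpha\phi} = e^{\phi/\theta}$, so

\begin{displaymath}\frac{\alpha}{\beta^\alpha}
\exp\left\{\alpha y - e^{\alpha y}/\beta^\alpha\right\}
= \frac{1}{\theta} e^{-\phi/\theta}
\exp\left\{\frac{y}{\theta} - \exp\left\{\frac{y-\phi}{\theta}\right\}\right\}
= \frac{1}{\theta}
\exp\left\{\frac{y-\phi}{\theta} - \exp\left\{\frac{y-\phi}{\theta}\right\}\right\} \, .
\end{displaymath}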

Marginalization

Now we turn to multivariate problems. The simplest version has $X=(X_1,\ldots,X_p)$ and $Y=X_1$ (or in general any $X_j$).

Theorem 1   If X has (joint) density $f(x_1,\ldots,x_p)$ then $Y=(X_1,\ldots,X_q)$ (with q < p) has a density $f_Y$ given by

\begin{displaymath}f_{X_1,\ldots,X_q}(x_1,\ldots,x_q) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f(x_1,x_2,\ldots,x_p) \, dx_{q+1} \cdots dx_p
\end{displaymath}

We call $f_{X_1,\ldots,X_q}$ the marginal density of $X_1,\ldots,X_q$ and use the expression joint density for $f_X$, but $f_{X_1,\ldots,X_q}$ is exactly the usual density of $(X_1,\ldots,X_q)$. The adjective ``marginal'' is just there to distinguish the object from the joint density of X.
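The theorem is just Fubini's theorem in disguise: for any (Borel) set $A \subseteq R^q$

\begin{displaymath}P(Y \in A) = \int_A \left[ \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f(x_1,\ldots,x_p) \, dx_{q+1} \cdots dx_p \right] dx_1 \cdots dx_q
\end{displaymath}

so the bracketed inner integral is a density for $Y$.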

Example: The function

\begin{displaymath}f(x_1,x_2) = K x_1 x_2 1(x_1> 0) 1(x_2 >0) 1(x_1+x_2 < 1)
\end{displaymath}

is a density for a suitable choice of K, namely the value of K making

\begin{displaymath}P(X\in R^2) = \int_{-\infty}^\infty \int_{-\infty}^\infty f(x_1,x_2)\, dx_1\, dx_2 = 1 \, .
\end{displaymath}

The integral is

\begin{eqnarray*}K \int_0^1 \int_0^{1-x_1} x_1 x_2 \, dx_2\, dx_1 & = & K \int_0^1 x_1(1-x_1)^2 \, dx_1 /2
\\
& = & K(1/2 -2/3+1/4)/2
\\
& = & K/24
\end{eqnarray*}


so that K=24. The marginal density of $X_1$ is

\begin{displaymath}f_{X_1}(x_1) = \int_{-\infty}^\infty 24 x_1 x_2 1(x_1> 0) 1(x_2 >0) 1(x_1+x_2 < 1)\, dx_2
\end{displaymath}

which is the same as

\begin{displaymath}f_{X_1}(x_1) = 24 \int_0^{1-x_1} x_1 x_2 1(x_1> 0) 1(x_1 < 1) \, dx_2 = 12 x_1(1-x_1)^2
1(0 < x_1 < 1)
\end{displaymath}

This is a $\mbox{Beta}(2,3)$ density.
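To check the claim, recall that the $\mbox{Beta}(a,b)$ density is $x^{a-1}(1-x)^{b-1}/B(a,b)$ on (0,1) and

\begin{displaymath}B(2,3) = \frac{\Gamma(2)\Gamma(3)}{\Gamma(5)} = \frac{1 \cdot 2}{24} = \frac{1}{12}
\end{displaymath}

so the $\mbox{Beta}(2,3)$ density is $12 x(1-x)^2 1(0 < x < 1)$, agreeing with the formula above.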

The general multivariate problem has

\begin{displaymath}Y=(Y_1,\ldots,Y_q) = ( g_1(X_1,\ldots,X_p), \ldots, g_q(X_1,\ldots,X_p))
\end{displaymath}

Case 1: If q>p then Y will not have a density for ``smooth'' g. Y will have a singular or discrete distribution. This sort of problem is rarely of real interest. (However, variables of interest often have a singular distribution - this is almost always true of the set of residuals in a regression problem.)

Case 2: If q=p then we will be able to use a change of variables formula which generalizes the one derived above for the case p=q=1. (See below.)

Case 3: If q < p we will try a two step process. In the first step we pad out Y by adding on p-q more variables (carefully chosen) and calling them $Y_{q+1},\ldots,Y_p$. Formally we find functions $g_{q+1}, \ldots,g_p$ and define

\begin{displaymath}Z=(Y_1,\ldots,Y_q,g_{q+1}(X_1,\ldots,X_p),\ldots,g_p(X_1,\ldots,X_p))
\end{displaymath}

If we have chosen the functions carefully we will find that $g=(g_1,\ldots,g_p)$ satisfies the conditions for applying the change of variables formula from the previous case. Then we apply that case to compute fZ. Finally we marginalize the density of Z to find that of Y:

\begin{displaymath}f_Y(y_1,\ldots,y_q) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty
f_Z(y_1,\ldots,y_q,z_{q+1},\ldots,z_p) \, dz_{q+1} \cdots dz_p
\end{displaymath}
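A typical example: to find the density of $Y_1 = X_1+X_2$ when $(X_1,X_2)$ has joint density $f_X$, pad out with $Y_2 = X_2$. The map $(x_1,x_2) \mapsto (x_1+x_2,x_2)$ is one to one with inverse $(y_1-y_2,y_2)$ and Jacobian 1, so the change of variables formula below gives $f_Z(y_1,y_2) = f_X(y_1-y_2,y_2)$, and marginalizing out $y_2$ produces the convolution formula

\begin{displaymath}f_{Y_1}(y_1) = \int_{-\infty}^\infty f_X(y_1-y_2,y_2) \, dy_2 \, .
\end{displaymath}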

Change of Variables

Suppose $Y=g(X) \in R^p$ with $X\in R^p$ having density $f_X$. Assume that g is a one to one (``injective'') map, that is, $g(x_1) = g(x_2)$ if and only if $x_1 = x_2$. Then we find $f_Y$ as follows:

Step 1: Solve for x in terms of y: $x=g^{-1}(y)$.

Step 2: Remember the following basic equation

\begin{displaymath}f_Y(y) \, dy = f_X(x) \, dx
\end{displaymath}

and rewrite it in the form

\begin{displaymath}f_Y(y) = f_X(g^{-1}(y)) \frac{dx}{dy}
\end{displaymath}

It is now a matter of interpreting this derivative $\frac{dx}{dy}$ when p>1. The interpretation is simply

\begin{displaymath}\frac{dx}{dy} = \left\vert \mbox{det}\left(\frac{\partial x_i}{\partial y_j}\right)\right\vert
\end{displaymath}

which is the so-called Jacobian of the transformation. An equivalent formula inverts the matrix and writes

\begin{displaymath}f_Y(y) = \frac{f_X(g^{-1}(y))}{ \left\vert\frac{dy}{dx}\right\vert}
\end{displaymath}

This notation means

\begin{displaymath}\left\vert\frac{dy}{dx}\right\vert =
\left\vert \mbox{det} \left[ \begin{array}{ccc}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_p}
\\
\vdots & \ddots & \vdots
\\
\frac{\partial y_p}{\partial x_1} & \cdots & \frac{\partial y_p}{\partial x_p}
\end{array} \right]\right\vert
\end{displaymath}

but with x replaced by the corresponding value of y, that is, replace x by $g^{-1}(y)$.
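The simplest special case is a linear map: if $Y = AX$ for an invertible $p\times p$ matrix $A$ then $x = A^{-1}y$, the matrix of partial derivatives $\partial y_i/\partial x_j$ is just $A$, and

\begin{displaymath}f_Y(y) = \frac{f_X(A^{-1}y)}{\vert \mbox{det}(A) \vert} \, .
\end{displaymath}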

Example: The density

\begin{displaymath}f_X(x_1,x_2) = \frac{1}{2\pi} \exp\left\{ -\frac{x_1^2+x_2^2}{2}\right\}
\end{displaymath}

is called the standard bivariate normal density. Let $Y=(Y_1,Y_2)$ where $Y_1=\sqrt{X_1^2+X_2^2}$ and $Y_2$ is the angle (between 0 and $2\pi$) in the plane from the positive $x$ axis to the ray from the origin to the point $(X_1,X_2)$. In other words, Y is X in polar co-ordinates.

The first step is to solve for x in terms of y which gives

\begin{eqnarray*}X_1 & = & Y_1 \cos(Y_2)
\\
X_2 & = & Y_1 \sin(Y_2)
\end{eqnarray*}


so that in formulas

\begin{eqnarray*}g(x_1,x_2) & = & (g_1(x_1,x_2),g_2(x_1,x_2))
\\
& = & \left(\sqrt{x_1^2+x_2^2}, \, \mbox{angle from the positive $x$ axis to } (x_1,x_2)\right)
\end{eqnarray*}
and the Jacobian is
\begin{eqnarray*}\frac{dx}{dy} & = & \left\vert \mbox{det} \left( \begin{array}{cc}
\cos(y_2) & -y_1 \sin(y_2)
\\
\sin(y_2) & y_1 \cos(y_2)
\end{array}\right) \right\vert
\\
& = & y_1
\end{eqnarray*}


It follows that

\begin{displaymath}f_Y(y_1,y_2) = \frac{1}{2\pi}\exp\left\{-\frac{y_1^2}{2}\right\}y_1
1(0 \le y_1 < \infty)
1(0 \le y_2 < 2\pi )
\end{displaymath}

Next problem: what are the marginal densities of $Y_1$ and $Y_2$? Note that $f_Y$ can be factored into $f_Y(y_1,y_2) = h_1(y_1)h_2(y_2)$ where

\begin{displaymath}h_1(y_1) = y_1e^{-y_1^2/2} 1(0 \le y_1 < \infty)
\end{displaymath}

and

\begin{displaymath}h_2(y_2) = 1(0 \le y_2 < 2\pi )/ (2\pi)
\end{displaymath}

It is then easy to see that

\begin{displaymath}f_{Y_1}(y_1) = \int_{-\infty}^\infty h_1(y_1)h_2(y_2) \, dy_2
=
h_1(y_1) \int_{-\infty}^\infty h_2(y_2) \, dy_2
\end{displaymath}

which says that the marginal density of $Y_1$ must be a multiple of $h_1$. The multiplier needed will make the density integrate to 1, but in this case we can easily get

\begin{displaymath}\int_{-\infty}^\infty h_2(y_2) \, dy_2 = \int_0^{2\pi} (2\pi)^{-1} dy_2 = 1
\end{displaymath}

so that

\begin{displaymath}f_{Y_1}(y_1) = y_1e^{-y_1^2/2} 1(0 \le y_1 < \infty)
\end{displaymath}

which is a special Weibull density, also called a Rayleigh distribution. Similarly

\begin{displaymath}f_{Y_2}(y_2) = 1(0 \le y_2 < 2\pi )/ (2\pi)
\end{displaymath}

which is the Uniform$(0,2\pi)$ density. You should be able to check that $W=Y_1^2/2$ has a standard exponential distribution. You should also know that by definition $U=Y_1^2$ has a $\chi^2$ distribution on 2 degrees of freedom and be able to find the $\chi^2_2$ density.
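Here is a sketch of the first of these checks: with $W = Y_1^2/2$ we have $Y_1 = \sqrt{2W}$ and $dy_1/dw = 1/\sqrt{2w}$, so

\begin{displaymath}f_W(w) = f_{Y_1}(\sqrt{2w}) \frac{1}{\sqrt{2w}}
= \sqrt{2w} \, e^{-w} \frac{1}{\sqrt{2w}} = e^{-w} 1(w > 0)
\end{displaymath}

which is the standard exponential density.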

Note: When a joint density factors into a product you will always see the phenomenon above -- the factor involving the variable not being integrated out comes out of the integral, and so the marginal density is a multiple of the factor in question. We will see shortly that this happens when and only when the two parts of the random vector are independent.

Independence, conditional distributions

In the examples so far the density for X has been specified explicitly. In many situations, however, the process of modelling the data leads to a specification in terms of marginal and conditional distributions.

Definition: Events A and B are independent if

\begin{displaymath}P(AB) = P(A)P(B) \, .
\end{displaymath}

(Note the notation: AB is the event that both A and B happen. It is also written $A\cap B$.)

Definition: Events Ai, $i=1,\ldots,p$ are independent if

\begin{displaymath}P(A_{i_1} \cdots A_{i_r}) = \prod_{j=1}^r P(A_{i_j})
\end{displaymath}

for any set of distinct indices $i_1,\ldots,i_r$ between 1 and p.
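Counting subsets of size at least 2, this is

\begin{displaymath}\sum_{r=2}^p \binom{p}{r} = 2^p - p - 1
\end{displaymath}

separate equations; for p=3 that is 4 equations, as the next example shows.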

Example: p=3

\begin{eqnarray*}P(A_1A_2A_3) & = & P(A_1)P(A_2)P(A_3)
\\
P(A_1A_2) & = & P(A_1)P(A_2)
\\
P(A_1A_3) & = & P(A_1)P(A_3)
\\
P(A_2A_3) & = & P(A_2)P(A_3)
\end{eqnarray*}


You need all these equations to be true for independence!
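A standard example shows why: toss two fair coins and let $A_1$ be ``first toss is heads'', $A_2$ be ``second toss is heads'' and $A_3$ be ``the two tosses agree''. Each pair of these events is independent (each pairwise intersection has probability 1/4) but

\begin{displaymath}P(A_1A_2A_3) = \frac{1}{4} \ne \frac{1}{8} = P(A_1)P(A_2)P(A_3)
\end{displaymath}

so the three events are not independent.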

Definition: Random variables X and Y are independent if

\begin{displaymath}P(X \in A; Y \in B) = P(X\in A)P(Y\in B)
\end{displaymath}

for all A and B.

Definition: Random variables $X_1,\ldots,X_p$ are independent if

\begin{displaymath}P(X_1 \in A_1, \cdots , X_p \in A_p ) = \prod P(X_i \in A_i)
\end{displaymath}

for any choice of $A_1,\ldots,A_p$.

Theorem 2  

1.
If X and Y are independent then

\begin{displaymath}F_{X,Y}(x,y) = F_X(x)F_Y(y)
\end{displaymath}

for all x,y and if X and Y have densities $f_X$ and $f_Y$ then (X,Y) has density

\begin{displaymath}f_{X,Y}(x,y) = f_X(x) f_Y(y) \, .
\end{displaymath}

2.
If

\begin{displaymath}F_{X,Y}(x,y) = F_X(x)F_Y(y)
\end{displaymath}

for all x,y then X and Y are independent.

3.
If (X,Y) has density f(x,y) and there are functions g(x) and h(y) such that

\begin{displaymath}f(x,y) = g(x) h(y)
\end{displaymath}

for all (well technically almost all) (x,y) then X and Y are independent and they each have a density given by

\begin{displaymath}f_X(x) = g(x)/\int_{-\infty}^\infty g(u) du
\end{displaymath}

and

\begin{displaymath}f_Y(y) = h(y)/\int_{-\infty}^\infty h(u) du \, .
\end{displaymath}
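Part 3 is exactly what happened in the polar co-ordinates example above: the factorization $f_Y(y_1,y_2) = h_1(y_1)h_2(y_2)$ shows that $Y_1$ and $Y_2$ are independent with

\begin{displaymath}f_{Y_1}(y_1) = \frac{h_1(y_1)}{\int_{-\infty}^\infty h_1(u) \, du}
= y_1 e^{-y_1^2/2} 1(0 \le y_1 < \infty)
\quad \mbox{and} \quad
f_{Y_2}(y_2) = \frac{1(0 \le y_2 < 2\pi)}{2\pi} \, .
\end{displaymath}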



Richard Lockhart
1998-10-26