
STAT 801 Lecture 1


Reading for Today's Lecture: Chapter 1 of Casella and Berger.


Statistics versus Probability

Standard view of scientific inference has a set of theories which make predictions about the outcomes of an experiment:

Theory Prediction
A 1
B 2
C 3

If we conduct the experiment and see outcome 2 we infer that Theory B is correct (or at least that A and C are wrong).

Add Randomness

Theory Prediction
A Usually 1 sometimes 2 never 3
B Usually 2 sometimes 1 never 3
C Usually 3 sometimes 1 never 2

Now if we actually see outcome 2 we infer that Theory B is probably correct, that Theory A is probably not correct and that Theory C is wrong.

Probability Theory is concerned with constructing the table just given: computing the likely outcomes of experiments.

Statistics is concerned with the inverse process of using the table to draw inferences from the outcome of the experiment. How should we do it and how wrong are our inferences likely to be?

Probability Definitions

A Probability Space is an ordered triple $(\Omega, {\cal F}, P)$. The idea is that $\Omega$ (often called the Sample Space) is the set of possible outcomes of a random experiment, $\cal F$ is the collection of events, that is, of those subsets of $\Omega$ whose probabilities are defined, and P is the rule for computing probabilities. Formally:

1.
$\cal F$ is a $\sigma$-field: $\Omega\in{\cal F}$, $A\in{\cal F}$ implies $A^c\in{\cal F}$, and $A_1,A_2,\ldots\in{\cal F}$ implies $\cup_i A_i\in{\cal F}$.

2.
P is a probability measure on $\cal F$: $P(A)\ge 0$ for each $A\in{\cal F}$, $P(\Omega)=1$, and P is countably additive, that is, $P(\cup_i A_i)=\sum_i P(A_i)$ whenever $A_1,A_2,\ldots$ are pairwise disjoint events.
These axioms guarantee that as we compute probabilities by the usual rules, including approximation of an event by a sequence of others we don't get caught in any logical contradictions.

A vector valued random variable is a function X whose domain is $\Omega$ and whose range is in some p dimensional Euclidean space, $R^p$, with the property that the events whose probabilities we would like to calculate from their definition in terms of X are in $\cal F$. We will write $X=(X_1,\ldots,X_p)$. We will want to make sense of

\begin{displaymath}P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

for any constants $(x_1,\ldots,x_p)$. In our formal framework the notation

\begin{displaymath}X_1 \le x_1, \ldots , X_p \le x_p
\end{displaymath}

is just shorthand for an event, that is a subset of $\Omega$, defined as

\begin{displaymath}\left\{\omega\in\Omega: X_1(\omega) \le x_1, \ldots , X_p (\omega) \le x_p \right\}
\end{displaymath}

Remember that X is a function on $\Omega$ so that $X_1$ is also a function on $\Omega$. In almost all of probability and statistics the dependence of a random variable on a point in the probability space is hidden! You almost always see X not $X(\omega)$.

Now for formal definitions:

The Borel $\sigma$-field in $R^p$ is the smallest $\sigma$-field in $R^p$ containing every open ball.

Every set arising in ordinary practice is a Borel set, that is, belongs to the Borel $\sigma$-field.

An $R^p$ valued random variable is a map $X:\Omega\mapsto R^p$ such that when A is Borel then $\{\omega\in\Omega:X(\omega)\in A\} \in \cal F$.

Fact: this is equivalent to

\begin{displaymath}\left\{
\omega\in\Omega: X_1(\omega) \le x_1, \ldots , X_p (\omega) \le x_p
\right\}
\in \cal F
\end{displaymath}

for all $(x_1,\ldots,x_p)\in R^p$.

Jargon and notation: we write $P(X\in A)$ for $P(\{\omega\in\Omega:X(\omega)\in
A\})$ and define the distribution of X to be the map

\begin{displaymath}A\mapsto P(X\in A)
\end{displaymath}

which is a probability on $R^p$ equipped with the Borel $\sigma$-field, rather than on the original $\Omega$ and $\cal F$.

The Cumulative Distribution Function (or CDF) of X is the function $F_X$ on $R^p$ defined by

\begin{displaymath}F_X(x_1,\ldots, x_p) =
P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

Properties of $F_X$ (or just F when there's only one cdf under consideration):

1.
$0 \le F(x) \le 1$.

2.
$ x> y \Rightarrow F(x) \ge F(y)$ (F is monotone non-decreasing).

3.
$\lim_{x\to - \infty} F(x) = 0$

4.
$\lim_{x\to \infty} F(x) = 1$

5.
$\lim_{x\searrow y} F(x) = F(y)$ (F is right continuous).

6.
$\lim_{x\nearrow y} F(x) \equiv F(y-)$ exists.

7.
$F(x)-F(x-) = P(X=x)$.

8.
$F_X(t) = F_Y(t)$ for all t implies that X and Y have the same distribution, that is, $P(X\in A) = P(Y\in A)$ for any (Borel) set A.
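These properties can be spot-checked numerically for a concrete cdf. The following sketch (plain Python, not part of the original notes) uses the standard exponential cdf $F(x)=1-e^{-x}$ for x>0 as the example and checks boundedness, monotonicity, the limits at $\pm\infty$, and right continuity at 0:

```python
import math

def F(x):
    """CDF of the standard exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

xs = [-5.0, -1.0, 0.0, 0.5, 1.0, 10.0]

# Property 1: 0 <= F(x) <= 1
assert all(0.0 <= F(x) <= 1.0 for x in xs)

# Property 2: F is monotone non-decreasing
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))

# Properties 3 and 4: limits at -infinity and +infinity
assert F(-1e6) == 0.0 and abs(F(1e6) - 1.0) < 1e-12

# Property 5: right continuity at y = 0 (approach from above)
assert abs(F(1e-9) - F(0.0)) < 1e-8
```

Since this F is continuous everywhere, $F(x)-F(x-)=0$ for every x, consistent with property 7 and $P(X=x)=0$ for a continuous distribution.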

The distribution of a random variable X is discrete (we also call the random variable discrete) if there is a countable set $x_1,x_2,\cdots$ such that

\begin{displaymath}P(X \in \{ x_1,x_2,\cdots\}) =1 = \sum_i P(X=x_i)
\end{displaymath}
\end{displaymath}

In this case the discrete density or probability mass function of X is

\begin{displaymath}f_X(x) = P(X=x)
\end{displaymath}
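For a discrete distribution the masses must sum to 1 over the countable support. A minimal check in plain Python (not part of the original notes), using a Poisson(2) random variable as the concrete example:

```python
import math

# Discrete density (probability mass function) of a Poisson(2) variable
lam = 2.0
def f_X(k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# The masses over the support {0, 1, 2, ...} sum to (essentially) 1;
# the tail beyond k = 100 is numerically negligible.
total = sum(f_X(k) for k in range(100))
assert abs(total - 1.0) < 1e-12
```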

The distribution of a random variable X is absolutely continuous if there is a function f such that

\begin{displaymath}P(X\in A) = \int_A f(x) dx
\end{displaymath}

for any (Borel) set A. This is a p dimensional integral in general. This condition is equivalent to

\begin{displaymath}F(x) = \int_{-\infty}^x f(y) \, dy
\end{displaymath}

We call f the density of X. For most values of x, F is then differentiable at x and

\begin{displaymath}F^\prime(x) =f(x) \, .
\end{displaymath}

Example: X is exponential.

\begin{displaymath}F(x) = \left\{ \begin{array}{ll}
1- e^{-x} & x > 0
\\
0 & x \le 0
\end{array}\right.
\end{displaymath}


\begin{displaymath}f(x) = \left\{ \begin{array}{ll}
e^{-x} & x> 0
\\
\mbox{undefined} & x= 0
\\
0 & x < 0
\end{array}\right.
\end{displaymath}
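The relation $F^\prime=f$ can be illustrated numerically for this exponential example. A minimal sketch in plain Python (not part of the original notes), comparing a central difference quotient of F against f away from the bad point x=0:

```python
import math

def F(x):
    """CDF of the standard exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    """Density: exp(-x) for x > 0, 0 for x < 0, undefined at x = 0."""
    return math.exp(-x) if x > 0 else 0.0

h = 1e-6
for x in [0.25, 1.0, 3.0]:
    deriv = (F(x + h) - F(x - h)) / (2 * h)  # central difference
    assert abs(deriv - f(x)) < 1e-6
```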

Distribution Theory

General Problem: Start with assumptions about the density or cdf of a random vector $X=(X_1,\ldots,X_p)$. Define $Y=g(X_1,\ldots,X_p)$ to be some function of X (usually some statistic of interest). How can we compute the distribution or cdf or density of Y?

Univariate Techniques

Method 1: compute the cdf by integration and differentiate to find $f_Y$.

Example: $U \sim \mbox{Uniform}[0,1]$ and $Y=-\log U$. Then

\begin{eqnarray*}F_Y(y) & = & P(Y \le y)
= P(-\log U \le y)
\\
& = & P(\log U \ge -y)
= P(U \ge e^{-y})
\\
& = & \left\{ \begin{array}{ll}
1- e^{-y} & y > 0
\\
0 & y \le 0
\end{array}\right.
\end{eqnarray*}


so that Y has a standard exponential distribution.
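This fact is the basis of a standard simulation technique (inverse transformation). A quick Monte Carlo check in plain Python (not part of the original notes), comparing the empirical cdf of $-\log U$ to $1-e^{-y}$:

```python
import math
import random

random.seed(0)
n = 100_000
# 1 - random.random() lies in (0, 1], so the log is always defined
ys = [-math.log(1.0 - random.random()) for _ in range(n)]  # Y = -log U

# Empirical CDF of Y should match F(y) = 1 - exp(-y)
for y in [0.5, 1.0, 2.0]:
    empirical = sum(v <= y for v in ys) / n
    assert abs(empirical - (1.0 - math.exp(-y))) < 0.01
```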

Example: $Z \sim N(0,1)$, i.e.

\begin{displaymath}f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}
\end{displaymath}

and Y=Z2. Then

\begin{displaymath}F_Y(y) = P(Z^2 \le y) =
\left\{ \begin{array}{ll}
0 & y < 0
\\
P(-\sqrt{y} \le Z \le \sqrt{y}) & y \ge 0
\end{array}\right.
\end{displaymath}

Now

\begin{displaymath}P(-\sqrt{y} \le Z \le \sqrt{y}) = F_Z(\sqrt{y}) -F_Z(-\sqrt{y})
\end{displaymath}

can be differentiated to obtain

\begin{displaymath}f_Y(y) = \left\{ \begin{array}{ll}
0 & y < 0
\\
\frac{d}{dy}\left[ F_Z(\sqrt{y}) - F_Z(-\sqrt{y})\right] & y > 0
\\
\mbox{undefined} & y=0
\end{array}\right.
\end{displaymath}

Then

\begin{eqnarray*}\frac{d}{dy} F_Z(\sqrt{y}) & = & f_Z(\sqrt{y})\frac{d}{dy}\sqrt{y}
\\
& = & \frac{1}{\sqrt{2\pi}} e^{-y/2} \cdot \frac{1}{2} y^{-1/2}
\\
& = & \frac{1}{2\sqrt{2\pi y}} e^{-y/2}
\end{eqnarray*}


with a similar formula for the other derivative. Thus

\begin{displaymath}f_Y(y) = \left\{ \begin{array}{ll}
\frac{1}{\sqrt{2\pi y}} e^{-y/2} & y > 0
\\
0 & y < 0
\\
\mbox{undefined} & y=0
\end{array}\right.
\end{displaymath}

We will find indicator notation useful:


\begin{displaymath}1(y>0) = \left\{ \begin{array}{ll}
1 & y>0
\\
0 & y \le 0
\end{array}\right.
\end{displaymath}

which we use to write

\begin{displaymath}f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2} 1(y>0)
\end{displaymath}

(changing the definition unimportantly at y=0).
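The computation can be sanity-checked numerically. The sketch below (plain Python, not part of the original notes) uses the closed form $F_Y(y)=2\Phi(\sqrt{y})-1=\mbox{erf}(\sqrt{y/2})$, available via math.erf, to verify both the derivative computation and the distribution of simulated values of $Z^2$:

```python
import math
import random

def f_Y(y):
    """Density of Y = Z^2 derived above (chi-squared, 1 df)."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y) if y > 0 else 0.0

def F_Y(y):
    """F_Y(y) = P(-sqrt(y) <= Z <= sqrt(y)) = erf(sqrt(y/2)) for y >= 0."""
    return math.erf(math.sqrt(y / 2)) if y > 0 else 0.0

# Central difference quotient of F_Y matches the derived density
h = 1e-6
for y in [0.5, 1.0, 4.0]:
    deriv = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
    assert abs(deriv - f_Y(y)) < 1e-5

# Monte Carlo: empirical CDF of Z^2 agrees with F_Y
random.seed(1)
n = 100_000
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
assert abs(sum(v <= 1.0 for v in ys) / n - F_Y(1.0)) < 0.01
```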

Notice: I never evaluated $F_Y$ before differentiating it. In fact $F_Y$ and $F_Z$ are integrals I can't do in closed form, but I can differentiate them anyway. You should remember the fundamental theorem of calculus:

\begin{displaymath}\frac{d}{dx} \int_a^x f(y) \, dy = f(x)
\end{displaymath}

at any x where f is continuous.





Richard Lockhart
1998-10-26