
STAT 450 Lecture 1


Reading for Today's Lecture: Chapter 1 and Sections 1, 2 and 3 of Chapter 2 of Mood, Graybill and Boes.


Statistics versus Probability

In the standard view of scientific inference we have a set of theories, each of which makes a prediction about the outcome of an experiment:

Theory Prediction
A 1
B 2
C 3

If we conduct the experiment and see outcome 2 we infer that Theory B is correct (or at least that A and C are wrong).

Add Randomness

Theory Prediction
A Usually 1, sometimes 2, never 3
B Usually 2, sometimes 1, never 3
C Usually 3, sometimes 1, never 2

Now if we actually see outcome 2 we infer that Theory B is probably correct, that Theory A is probably not correct and that Theory C is wrong.

Probability Theory is concerned with constructing the table just given: computing the likely outcomes of experiments.

Statistics is concerned with the inverse process of using the table to draw inferences from the outcome of the experiment. How should we do it and how wrong are our inferences likely to be?

In this course we will be working on Probability for quite a while before we do Statistics.

Probability Definitions

I am going to begin the course with some very formal mathematical definitions. These definitions need not be memorized and I won't expect you to use them in the homework. I want you to see that the mathematics does not really define the idea of random; instead we just give computational rules which match, we think, our intuitive notion of what probability ought to mean.

Definition: A Probability Space is an ordered triple $(\Omega, {\cal F}, P)$. The idea is that $\Omega$ (called the Sample Space) is the set of possible outcomes of a random experiment, $\cal F$ is the set of those events, or subsets of $\Omega$, whose probability is defined, and P is the rule for computing probabilities. [The book uses the jargon Event Space for $\cal F$ but this is not standard and I won't be using the term.] Formally:

1.
$\cal F$ is a $\sigma$-field of subsets of $\Omega$: $\Omega \in {\cal F}$; if $A \in {\cal F}$ then $A^c \in {\cal F}$; and if $A_1, A_2, \ldots$ are all in $\cal F$ then $\cup_{n=1}^\infty A_n \in {\cal F}$.

2.
P is a probability on $\cal F$, that is, a function from $\cal F$ to [0,1] satisfying two axioms: first, $P(\Omega) = 1$; second, if $A_1, A_2, \ldots$ are pairwise disjoint events in $\cal F$ then

\begin{displaymath}P(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty P(A_n)
\end{displaymath}

(countable additivity).

These axioms guarantee that as we compute probabilities by the usual rules, including approximation of an event by a sequence of others, we don't get caught in any logical contradictions. The symbol $\sigma$ in the definition of a $\sigma$-field means we allow countably infinite unions and intersections. The book discusses the situation with only finite unions and intersections; those examples where we have finite additivity but not countable additivity are mathematical pathologies, in my view. I won't be discussing such things.

NOTE: This definition is included principally to make clear that the mathematics of probability theory is not mystical in any way. You may well feel that the definition completely fails to capture the spirit of ``randomness''. The definitions merely provide the rules for manipulating probabilities; they don't provide any intuition. There will be no problems in this course using the definitions just given.
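To make the definition concrete, here is a toy example (my own illustration, not from the lecture): a finite probability space for one roll of a fair die, with $\Omega = \{1,\ldots,6\}$, $\cal F$ the power set of $\Omega$, and P the uniform probability, checked against the axioms.

```python
from itertools import chain, combinations

# Sample space for one roll of a fair die (hypothetical illustration).
omega = {1, 2, 3, 4, 5, 6}

def power_set(s):
    """All subsets of s -- the largest possible sigma-field on a finite set."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

F = power_set(omega)

def P(event):
    """Uniform probability: each outcome has mass 1/6."""
    return len(event) / len(omega)

# Axiom checks on this finite space:
assert P(frozenset(omega)) == 1.0                       # P(Omega) = 1
A, C = frozenset({1, 2}), frozenset({5, 6})             # disjoint events
assert abs(P(A | C) - (P(A) + P(C))) < 1e-12            # additivity
assert all(frozenset(omega) - e in F for e in F)        # closed under complement
```

On a finite space every subset can be an event; the $\sigma$-field machinery only becomes restrictive on uncountable sample spaces.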

Definition: A real random variable is a function X whose domain is $\Omega$ and whose range is contained in the real line $R^1$, with the property that

\begin{displaymath}P(\{\omega\in \Omega; X(\omega) \le x\})
\end{displaymath}

is defined for each $x\in R$ (that is, $\{\omega\in \Omega; X(\omega) \le x\}$ is in $\cal F$).

Notation: we will write $P(X \le x)$ for $P(\{\omega\in \Omega; X(\omega) \le x\})$.

Idea: we define events in terms of numerical quantities determined by the outcome of a random experiment.

Definition: An $R^p$-valued random variable X is just $X=(X_1,\ldots,X_p)$ where each $X_i$ is a real valued random variable.

Definition: The Cumulative Distribution Function (or CDF) of a real random variable X is given by

\begin{displaymath}F_X(x) = P(X \le x)
\end{displaymath}

Definition: The Cumulative Distribution Function (or CDF) of an $R^p$-valued random variable X is the function $F_X$ on $R^p$ defined by

\begin{displaymath}F_X(x_1,\ldots, x_p) =
P(X_1 \le x_1, \ldots , X_p \le x_p)
\end{displaymath}

In this course we will usually study F only for p=1.

Properties of $F_X$ (or just F when there's only one CDF under consideration) in the case p=1:

1.
$0 \le F(x) \le 1$.

2.
$ x> y \Rightarrow F(x) \ge F(y)$ (F is monotone non-decreasing).

3.
$\lim_{x\to - \infty} F(x) = 0$

4.
$\lim_{x\to \infty} F(x) = 1$

5.
$\lim_{x\searrow y} F(x) = F(y)$ (F is right continuous).

6.
$\lim_{x\nearrow y} F(x) \equiv F(y-)$ exists (the jargon is that F has left limits).

7.
$F(x)-F(x-) = P(X=x)$.

8.
$F_X(t) = F_Y(t)$ for all t implies that X and Y have the same distribution, that is, $P(X\in A) = P(Y\in A)$ for any (Borel) set A.
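These properties can be spot-checked numerically for a concrete CDF. Here is a minimal sketch (my own check, not part of the notes) using the standard exponential CDF $F(x) = 1 - e^{-x}$, which appears as an example later in these notes.

```python
import math

# Standard exponential CDF: F(x) = 1 - exp(-x) for x > 0, else 0.
def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

xs = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0, 10.0]
assert all(0.0 <= F(x) <= 1.0 for x in xs)             # property 1
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))   # property 2 (monotone)
assert F(-100.0) == 0.0                                # property 3 (limit at -infinity)
assert abs(F(100.0) - 1.0) < 1e-12                     # property 4 (limit at +infinity)
# Property 7: this F is continuous, so F(x) - F(x-) = 0 = P(X = x).
assert abs(F(1.0) - F(1.0 - 1e-9)) < 1e-6
```

Of course a finite check is no substitute for the proofs below; it only illustrates what the properties say.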

As an illustration of the role of the axioms of probability I will prove the second and fourth of these assertions.

Proof of 2: Let $A=\{\omega\in \Omega: X(\omega) \le y\}$ and $B=\{\omega\in \Omega: X(\omega) \le x\}$. Let $C = \{\omega\in\Omega:y < X(\omega)\le x\}$. Then an individual $\omega$ in B must be in either A or C, and an $\omega$ in either A or C will be in B, so $B=A\cup C$. If $\omega$ belongs to C then it does not belong to A and vice-versa; that is, $A\cap C = \emptyset$. The second axiom of probability tells us that

\begin{displaymath}P(B)=P(A)+P(C) \, .
\end{displaymath}

Now you have to recognize from the definition that P(B)=F(x) and P(A) = F(y) so

\begin{displaymath}F(x)=F(y)+P(C) \, .
\end{displaymath}

Since $P(C) \ge 0$ we get $F(x) \ge F(y)$ which ends the proof. $\bullet$

Proof of 4: Here we are assuming that X has values in R; that is, we are assuming that $P(-\infty < X < \infty)=1$. Let $A_n=\{\omega\in \Omega: X(\omega) \le n\}$ for $n=0,1,\ldots$ and define

\begin{displaymath}B_n = A_n \setminus A_{n-1} = \{\omega\in \Omega: n-1 <
X(\omega) \le n\}
\end{displaymath}

for $n=1,2,\ldots$. Let $B_0=A_0$. Then we have

\begin{displaymath}\{-\infty < X < \infty\} = \cup_{n=0}^\infty B_n \, .
\end{displaymath}

Moreover, the $B_n$ are pairwise disjoint so
\begin{align*}1 & = P(-\infty < X < \infty) \\
& = P(\cup_{n=0}^\infty B_n) \\
& = \sum_{n=0}^\infty P(B_n) \\
& = \lim_{m\to\infty} \sum_{n=0}^m P(B_n) \\
& = \lim_{m\to\infty} P(\cup_{n=0}^m B_n) \\
& = \lim_{m\to \infty} P(A_m) \\
& = \lim_{m\to\infty} F(m)
\end{align*}
You should make sure you understand why each line is correct. The first is our assumption that X is real valued. The second rewrites the event $-\infty < X < \infty$ as I did above. The third line uses the second axiom in the definition of probability. The fourth line uses the fact that the definition of an infinite sum is that it is the limit of partial sums. The fifth line uses the second axiom for probabilities again but with a finite union. The sixth line recognizes that if $X(\omega) \le m$ then either $X(\omega) \le 0$ or $X(\omega)$ is between some pair of integers n-1 and n with $1 \le n \le m$. The last line is just the definition of F(m). $\bullet$

Definition: The distribution of a random variable X is discrete (we also call the random variable discrete) if there is a countable set $x_1,x_2,\cdots$ such that

\begin{displaymath}P(X \in \{ x_1,x_2, \cdots\}) =1 = \sum_i P(X=x_i)
\end{displaymath}

In this case the discrete density or probability mass function of X is

\begin{displaymath}f_X(x) = P(X=x)
\end{displaymath}
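As a quick illustration (my own example, not from the notes): if X is the number of tosses of a fair coin needed to get the first head, then X is discrete with mass function $f_X(x) = (1/2)^x$ on $x = 1, 2, \ldots$, and the masses sum to 1 as the definition requires.

```python
# Discrete density for the number of fair-coin tosses until the first
# head (a standard example; hypothetical illustration).
def f(x):
    return 0.5 ** x

# Partial sum of the series sum_{x=1}^infty (1/2)^x, which converges to 1.
total = sum(f(x) for x in range(1, 200))
assert abs(total - 1.0) < 1e-12
# Each mass is a probability:
assert all(0 <= f(x) <= 1 for x in range(1, 50))
```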

The distribution of a random variable X is absolutely continuous if there is a function f such that

\begin{displaymath}P(X\in A) = \int_A f(x) dx
\end{displaymath}

for any set A. This is a p-dimensional integral in general. This condition is equivalent (for p=1) to

\begin{displaymath}F(x) = \int_{-\infty}^x f(y) \, dy
\end{displaymath}

We call f the density of X. For most values of x, F is then differentiable at x and (for p=1)

\begin{displaymath}F^\prime(x) =f(x) \, .
\end{displaymath}

Notation: Some students will not be used to the notation $ \int_A f(x) dx$ for a multiple integral. For instance if p=2 and the set A is the disk of radius 1 centered at the origin then

\begin{displaymath}\int_A f(x) dx \equiv \int_{-1}^1
\int_{-\sqrt{1-x_1^2}}^{\sqrt{1-x_1^2}}
f(x_1,x_2) \, dx_2 \, dx_1
\end{displaymath}
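As a sanity check on this notation, such an iterated integral can be approximated numerically. A rough sketch (my own illustration, not part of the notes) with $f \equiv 1$, so the integral over the disk should be its area, $\pi$:

```python
import math

# Midpoint Riemann sum for the iterated integral over the unit disk
# with integrand 1: the outer variable x1 runs over (-1, 1), and the
# inner integral of 1 over x2 is just the width of the slice.
n = 2000
h = 2.0 / n
total = 0.0
for i in range(n):
    x1 = -1.0 + (i + 0.5) * h                        # midpoint of strip i
    half_width = math.sqrt(max(0.0, 1.0 - x1 * x1))  # limits +/- sqrt(1 - x1^2)
    total += 2.0 * half_width * h                    # inner integral times dx1
assert abs(total - math.pi) < 1e-3                   # area of the unit disk
```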

Example: Suppose X has a standard exponential distribution. Then

\begin{displaymath}F(x) = \left\{ \begin{array}{ll}
1- e^{-x} & x > 0
\\
0 & x \le 0
\end{array}\right.
\end{displaymath}


\begin{displaymath}f(x) = \left\{ \begin{array}{ll}
e^{-x} & x> 0
\\
\mbox{undefined} & x= 0
\\
0 & x < 0
\end{array}\right.
\end{displaymath}
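A quick numerical sketch (my own check, not part of the notes) that for this exponential distribution $F^\prime(x) = f(x)$ away from $x=0$, and that F is the integral of f:

```python
import math

def F(x):
    """Exponential CDF: 1 - exp(-x) for x > 0, else 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    """Exponential density: exp(-x) for x > 0, else 0."""
    return math.exp(-x) if x > 0 else 0.0

# F'(x) = f(x) at a few points, via central differences:
h = 1e-6
for x in (0.5, 1.0, 3.0):
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-6

# F(2) equals the integral of f from 0 to 2 (midpoint Riemann sum):
n = 20000
dx = 2.0 / n
riemann = sum(f((k + 0.5) * dx) * dx for k in range(n))
assert abs(riemann - F(2.0)) < 1e-6
```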

Example: The function

\begin{displaymath}f(u,v) = \begin{cases}
u+v & 0 < u,v < 1 \\
0 & \text{otherwise}
\end{cases}\end{displaymath}

is a density. The corresponding cdf is

\begin{displaymath}F(x,y) = \begin{cases}
0 & x\le 0 \text{ or } y\le 0 \\
xy(x+y)/2 & 0 < x,y < 1 \\
x(x+1)/2 & 0 < x < 1 \text{ and } y \ge 1 \\
y(y+1)/2 & x \ge 1 \text{ and } 0 < y < 1 \\
1 & x,y \ge 1
\end{cases}
\end{displaymath}
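Integrating the density gives the interior case directly: for $0 < x,y < 1$, $\int_0^x\int_0^y (u+v)\,dv\,du = xy(x+y)/2$. A numerical sketch of this (my own check, not from the notes):

```python
def f(u, v):
    """The density u + v on the open unit square, 0 elsewhere."""
    return u + v if 0 < u < 1 and 0 < v < 1 else 0.0

def F_numeric(x, y, n=400):
    """Midpoint Riemann sum of f over (0, x] x (0, y]."""
    du, dv = x / n, y / n
    return sum(f((i + 0.5) * du, (j + 0.5) * dv) * du * dv
               for i in range(n) for j in range(n))

assert abs(F_numeric(1.0, 1.0) - 1.0) < 1e-4               # total mass is 1
x, y = 0.3, 0.7
assert abs(F_numeric(x, y) - x * y * (x + y) / 2) < 1e-4   # matches xy(x+y)/2
```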



Richard Lockhart
1999-09-14