

STAT 450

Lecture 26

Rao Blackwell Theorem

Theorem: Suppose that S(X) is a sufficient statistic for some model $\{P_\theta,\theta\in\Theta\}$. If T is an estimate of some parameter $\phi(\theta)$ then:

1.
E(T|S) is a statistic.

2.
E(T|S) has the same bias as T; if T is unbiased so is E(T|S).

3.
${\rm Var}_\theta(E(T\vert S)) \le {\rm Var}_\theta(T)$ and the inequality is strict unless T is a function of S.

4.
The MSE of E(T|S) is no more than that of T.

Fact: If X, Y have joint density $f_{X,Y}(x,y)$ and the conditional density of Y given X=x is $f(y\vert x)$ then

\begin{displaymath}\text{E}(Y\vert X=x) = \int y f(y\vert x) dy
\end{displaymath}

or for discrete X,Y

\begin{displaymath}\text{E}(Y\vert X=x) = \sum_y y f(y\vert x) = \sum_y yP(Y=y\vert X=x)
\end{displaymath}
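As a concrete illustration (not part of the original notes), here is a short Python sketch that computes $E(Y\vert X=x)$ from a small, made-up discrete joint distribution, exactly as in the sum above.

\begin{verbatim}
# Sketch: E(Y | X = x) from a discrete joint pmf.  The joint table is
# made up purely for illustration.
import numpy as np

x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])
# joint[i, j] = P(X = x_vals[i], Y = y_vals[j])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])

for i, x in enumerate(x_vals):
    p_x = joint[i].sum()              # P(X = x)
    cond = joint[i] / p_x             # P(Y = y | X = x)
    print("E(Y | X =", x, ") =", np.dot(y_vals, cond))
\end{verbatim}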

How to use the theorem:

1.
Guess an unbiased estimate T of the parameter of interest. (Example: if $\mu$ is the population mean then $X_1X_2$ is unbiased for $\mu^2$; a simulation sketch of this example follows the list.)

2.
Find sufficient statistic S.

3.
Compute $\text{E}(T\vert S=s)$ by doing integral or sum. Get formula with s in it.

4.
Replace s by S to find new estimator.
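Here is a small Python sketch (an illustration, not part of the notes) of the four steps for the $N(\mu,1)$ example mentioned in step 1. Carrying out step 3 gives $E(X_1X_2\vert\bar{X}) = \bar{X}^2 - 1/n$ (that conditional expectation calculation is not done here); the simulation simply checks that both estimates are unbiased for $\mu^2$ and that the Rao-Blackwellized one has much smaller variance. The parameter values are arbitrary.

\begin{verbatim}
# Four-step recipe for the N(mu,1) example: T = X1*X2 is unbiased for mu^2,
# Xbar is sufficient, and E(T | Xbar) = Xbar^2 - 1/n.
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 10, 200_000

X = rng.normal(mu, 1.0, size=(reps, n))
T = X[:, 0] * X[:, 1]            # step 1: crude unbiased estimate of mu^2
Xbar = X.mean(axis=1)            # step 2: sufficient statistic
T_rb = Xbar ** 2 - 1.0 / n       # steps 3 and 4: E(T | Xbar) with s -> S

print("target mu^2     :", mu ** 2)
print("means of T, T_rb:", T.mean(), T_rb.mean())
print("vars  of T, T_rb:", T.var(), T_rb.var())
\end{verbatim}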

Examples:

Binomial(n,p) problem: $Y_1(1-Y_2)$ is an unbiased estimate of p(1-p), where $Y_1,\ldots,Y_n$ are the individual Bernoulli trials and $X=\sum Y_i$. Compute

\begin{displaymath}E(Y_1(1-Y_2)\vert X)
\end{displaymath}

Two steps. First compute

\begin{displaymath}E(Y_1(1-Y_2)\vert X=x)
\end{displaymath}

Notice that $Y_1(1-Y_2)$ is either 1 or 0 so:
\begin{align*}E(Y_1(1-Y_2) \vert X=x)
& = P(Y_1(1-Y_2) =1 \vert X=x)
\\
& = P(Y_1=1, Y_2=0 \vert X=x)
\\
& = \frac{P(Y_1=1, Y_2=0, \sum_{i=3}^n Y_i = x-1)}{P(X=x)}
\\
& = \frac{p(1-p)\dbinom{n-2}{x-1}p^{x-1}(1-p)^{n-x-1}}
         {\dbinom{n}{x}p^x(1-p)^{n-x}}
\\
& = \frac{\dbinom{n-2}{x-1}}{\dbinom{n}{x}}
\\
& = \frac{x(n-x)}{n(n-1)}
\end{align*}
This is simply $n\hat p(1-\hat p)/(n-1)$ (which can be larger than 1/4, the maximum value of p(1-p)).
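A Monte Carlo check of this calculation (an illustration, not part of the notes; n and p are arbitrary): both $Y_1(1-Y_2)$ and $X(n-X)/(n(n-1))$ average to p(1-p), but the Rao-Blackwellized version has far smaller variance.

\begin{verbatim}
# Monte Carlo check of the binomial Rao-Blackwell example.
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 20, 0.3, 200_000

Y = rng.binomial(1, p, size=(reps, n))   # individual Bernoulli trials
X = Y.sum(axis=1)                        # sufficient statistic
T = Y[:, 0] * (1 - Y[:, 1])              # crude unbiased estimate of p(1-p)
T_rb = X * (n - X) / (n * (n - 1))       # E(T | X) from the calculation above

print("p(1-p)          :", p * (1 - p))
print("means of T, T_rb:", T.mean(), T_rb.mean())
print("vars  of T, T_rb:", T.var(), T_rb.var())
\end{verbatim}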

Example: If $X_1,\ldots,X_n$ are iid $N(\mu,1)$ then $\bar{X}$ is sufficient and $X_1$ is an unbiased estimate of $\mu$. Now
\begin{align*}E(X_1\vert\bar{X})& = E[X_1-\bar{X}+\bar{X}\vert\bar{X}]
\\
& = E[X_1-\bar{X}\vert\bar{X}] + \bar{X}
\\
& = \bar{X}
\end{align*}
which is the UMVUE. (The middle step uses the fact that $X_1-\bar{X}$ has mean 0 and is independent of $\bar{X}$, since the two are jointly normal with covariance zero.)
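A quick simulation check of this example (illustration only; $\mu$ and n are arbitrary): $X_1$ and $\bar{X}$ are both unbiased for $\mu$, but $\bar{X}$ has variance 1/n rather than 1.

\begin{verbatim}
# X1 versus its Rao-Blackwellization Xbar in the N(mu,1) model.
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 1.5, 25, 200_000

X = rng.normal(mu, 1.0, size=(reps, n))
Xbar = X.mean(axis=1)
print("means of X1, Xbar:", X[:, 0].mean(), Xbar.mean())  # both near mu
print("vars  of X1, Xbar:", X[:, 0].var(), Xbar.var())    # near 1 and 1/n
\end{verbatim}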

Finding Sufficient Statistics

Binomial example: the log likelihood $\ell$ is a function of X alone (and not of the original data $Y_1, \ldots,Y_n$ as well). Normal example:

\begin{displaymath}\ell(\mu) = \mu \sum X_i - n\mu^2/2 = n\mu\bar{X} -n\mu^2/2 \, .
\end{displaymath}

These are examples of the Factorization Criterion:

Theorem: If the model for data X has density $f(x,\theta)$ then the statistic S(X) is sufficient if and only if the density can be factored as

\begin{displaymath}f(x,\theta) = g(s(x),\theta)h(x)
\end{displaymath}

The theorem is proved by finding a statistic T(x) such that X is a one to one function of the pair S,T and applying the change of variables formula to the joint density of S and T. If the density factors then you get

\begin{displaymath}f_{S,T}(s,t) =g(s,\theta) h(x(s,t)) J(s,t)
\end{displaymath}

where J is the Jacobian, which does not involve $\theta$. From this we see that the conditional density of T given S=s does not depend on $\theta$. Thus the conditional distribution of (S,T) given S does not depend on $\theta$ and finally the conditional distribution of X given S does not depend on $\theta$. Conversely if S is sufficient then the conditional density of T given S has no $\theta$ in it and the joint density of S,T is

\begin{displaymath}f_S(s,\theta) f_{T\vert S} (t\vert s)
\end{displaymath}

Apply the change of variables formula to get the density of X to be

\begin{displaymath}f_S(s(x),\theta) f_{T\vert S} (t(x)\vert s(x)) J(x)
\end{displaymath}

where J is the Jacobian. This factors.

Example: If $X_1,\ldots,X_n$ are iid $N(\mu,\sigma^2)$ then the joint density is

\begin{displaymath}(2\pi)^{-n/2} \sigma^{-n} \exp\{-\sum X_i^2/(2\sigma^2) +\mu\sum X_i/\sigma^2
-n\mu^2/(2\sigma^2)\}
\end{displaymath}

which is evidently a function of

\begin{displaymath}\sum X_i^2, \sum X_i
\end{displaymath}

This pair is a sufficient statistic. You can write it as a bijective function of $(\bar{X}, \sum (X_i-\bar{X})^2)$, so that pair is also sufficient.
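To connect this example with the factorization criterion explicitly, one possible choice of the factors (constants can be moved freely between g and h) is

\begin{displaymath}
f(x,\theta) = \underbrace{(2\pi)^{-n/2}}_{h(x)}\;
\underbrace{\sigma^{-n} \exp\left\{-\frac{\sum x_i^2}{2\sigma^2}
+\frac{\mu\sum x_i}{\sigma^2}
-\frac{n\mu^2}{2\sigma^2}\right\}}_{g\left(\sum x_i^2,\,\sum x_i,\,\theta\right)}
\end{displaymath}

with $\theta=(\mu,\sigma)$.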

Completeness
In the Binomial(n,p) example I showed that there is only one function of X which is unbiased. The Rao-Blackwell theorem shows that a UMVUE, if it exists, will be a function of any sufficient statistic. If I change T might I get a different E(T|S)? Generally the answer is yes, but for some models, like the binomial with S=X, the answer is no.

Definition: A statistic T is complete for a model $\{P_\theta;\theta\in\Theta\}$ if

\begin{displaymath}E_\theta(h(T)) = 0
\end{displaymath}

for all $\theta$ implies h(T)=0.

We have already seen that X is complete in the Binomial(n,p) model. In the $N(\mu,1)$ model suppose

\begin{displaymath}E_\mu(h(\bar{X})) \equiv 0
\end{displaymath}

Since $\bar{X}$ has a $N(\mu,1/n)$ distribution we find that

\begin{displaymath}\int_{-\infty}^\infty h(x) e^{-nx^2/2} e^{n\mu x} dx \equiv 0
\end{displaymath}

This is the so-called Laplace transform of the function $h(x)e^{-nx^2/2}$. It is a theorem that a Laplace transform is 0 if and only if the function is 0 (because you can invert the transform). Hence $h\equiv 0$.
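For comparison, here is a small numerical illustration (not from the notes) of the completeness of X in the Binomial(n,p) model mentioned above: $E_p h(X) = \sum_x h(x)\binom{n}{x}p^x(1-p)^{n-x}$ is a polynomial of degree n in p, so demanding that it vanish at n+1 distinct values of p already forces $h(0)=\cdots=h(n)=0$.

\begin{verbatim}
# Completeness of X in the Binomial(n,p) model, numerically: the linear
# system "E_p h(X) = 0 at n+1 distinct p's" has full rank, so h must be 0.
import numpy as np
from math import comb

n = 5
p_grid = np.linspace(0.1, 0.9, n + 1)          # n+1 distinct values of p
A = np.array([[comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
              for p in p_grid])                # A[i, x] = P_{p_i}(X = x)
print("rank of A:", np.linalg.matrix_rank(A))  # n+1, so A h = 0 only if h = 0
\end{verbatim}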

How to Prove Completeness

There is only one general tactic. Suppose X has density

\begin{displaymath}f(x,\theta) = h(x) \exp\{\sum_1^p a_i(\theta)S_i(x)+c(\theta)\}
\end{displaymath}

If the range of the function $(a_1(\theta),\ldots,a_p(\theta))$ (as $\theta$ varies over $\Theta$) contains a (hyper-)rectangle in $R^p$ then the statistic

\begin{displaymath}( S_1(X), \ldots, S_p(X))
\end{displaymath}

is complete and sufficient.

Example: In the $N(\mu,\sigma^2)$ model the density has the form

\begin{displaymath}\frac{1}{\sqrt{2\pi}} \exp\left\{
\left(-\frac{1}{2\sigma^2}\right)x^2
+\left(\frac{\mu}{\sigma^2} \right)x
-\frac{\mu^2}{2\sigma^2} - \log\sigma \right\}
\end{displaymath}

which is an exponential family with

\begin{displaymath}h(x) = \frac{1}{\sqrt{2\pi}}
\end{displaymath}


\begin{displaymath}a_1(\theta) = -\frac{1}{2\sigma^2}
\end{displaymath}


\begin{displaymath}S_1(x) = x^2
\end{displaymath}


\begin{displaymath}a_2(\theta) = \frac{\mu}{\sigma^2}
\end{displaymath}


\begin{displaymath}S_2(x) = x
\end{displaymath}

and

\begin{displaymath}c(\theta) = -\frac{\mu^2}{2\sigma^2} - \log\sigma
\end{displaymath}

It follows that

\begin{displaymath}(\sum X_i^2, \sum X_i)
\end{displaymath}

is a complete sufficient statistic.

Remark: The statistic $(s^2, \bar{X})$ is a one to one function of $(\sum X_i^2, \sum X_i)$ so it must be complete and sufficient, too. Any function of the latter statistic can be rewritten as a function of the former and vice versa.

The Lehmann-Scheffé Theorem

Theorem: If S is a complete sufficient statistic for some model and h(S) is an unbiased estimate of some parameter $\phi(\theta)$ then h(S) is the UMVUE of $\phi(\theta)$.

Proof: Suppose T is another unbiased estimate of $\phi$. According to the Rao-Blackwell theorem T is improved by E(T|S), so if h(S) is not the UMVUE there must exist another function $h^*(S)$ which is unbiased and whose variance is smaller than that of h(S) for some value of $\theta$. But

\begin{displaymath}E_\theta(h^*(S)-h(S)) \equiv 0
\end{displaymath}

so, by completeness, $h^*(S) = h(S)$, contradicting the assumption that $h^*(S)$ has smaller variance.

Example: In the $N(\mu,\sigma^2)$ example the random variable $(n-1)s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution. It follows that
\begin{displaymath}
E\left[\frac{\sqrt{n-1}\,s}{\sigma}\right]
= \int_0^\infty
x^{1/2} \left(\frac{x}{2}\right)^{(n-1)/2-1} e^{-x/2}
\frac{dx}{2\Gamma((n-1)/2)}
\end{displaymath}
Substitute y=x/2 and get

\begin{displaymath}E(s) = \frac{\sigma}{\sqrt{n-1}}\frac{\sqrt{2}}{\Gamma((n-1)/2)}
\int_0^\infty y^{n/2-1} e^{-y} dy
\end{displaymath}

Hence

\begin{displaymath}E(s) = \sigma\frac{\sqrt{2}\,\Gamma(n/2)}{\sqrt{n-1}\,\Gamma((n-1)/2)}
\end{displaymath}

The UMVUE of $\sigma$ is then

\begin{displaymath}s\frac{\sqrt{n-1}\,\Gamma((n-1)/2)}{\sqrt{2}\,\Gamma(n/2)}
\end{displaymath}

by the Lehmann-Scheffé theorem.
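A quick Monte Carlo check of this constant (illustration only; $\sigma$ and n are arbitrary): writing $c_n = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$ so that $E(s)=c_n\sigma$, the estimate $s/c_n$ should average to $\sigma$.

\begin{verbatim}
# Unbiased estimation of sigma: s is biased low, s / c_n is unbiased.
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 2.0, 8, 200_000

c_n = sqrt(2.0 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
X = rng.normal(mu, sigma, size=(reps, n))
s = X.std(axis=1, ddof=1)              # usual sample standard deviation

print("sigma            :", sigma)
print("mean of s        :", s.mean())            # a bit below sigma
print("mean of s / c_n  :", (s / c_n).mean())    # close to sigma
\end{verbatim}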

Criticism of Unbiasedness

1.
The UMVUE can be inadmissible for squared error loss, meaning that there is a (biased, of course) estimate whose MSE is never larger and is strictly smaller for some parameter values. An example is the UMVUE of $\phi=p(1-p)$, which is $\hat\phi =n\hat{p}(1-\hat{p})/(n-1)$. The MSE of

\begin{displaymath}\tilde{\phi} = \min(\hat\phi,1/4)
\end{displaymath}

is smaller than that of $\hat\phi$ (a simulation sketch follows this list).

2.
There are examples where unbiased estimation is impossible. The log odds in a Binomial model is $\phi=\log(p/(1-p))$. Since the expectation of any function of the data is a polynomial function of p, and since $\phi$ is not a polynomial function of p, there is no unbiased estimate of $\phi$.

3.
The UMVUE of $\sigma$ is not the square root of the UMVUE of $\sigma^2$. This method of estimation does not have the parameterization equivariance that maximum likelihood does.

4.
Unbiasedness is largely irrelevant unless you plan to average together many estimators. The property is an average over possible values of the estimate in which positive errors are allowed to cancel negative errors. The exception arises when you do average a number of estimators to get a single estimate: then it is a problem if all of the individual estimators have the same bias. In assignment 5 you have the one way layout example in which the mle of the residual variance averages together many biased estimates and so is very badly biased. That assignment shows that the solution is not really to insist on unbiasedness but to consider an alternative to averaging for putting the individual estimates together.
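The simulation sketch promised in criticism 1 (an illustration, not part of the notes; n and the values of p are arbitrary): truncating $\hat\phi$ at 1/4 can only move overestimates closer to the target, so its MSE is never larger.

\begin{verbatim}
# MSE of the UMVUE of p(1-p) versus the truncated estimator min(phi_hat, 1/4).
import numpy as np

rng = np.random.default_rng(4)
n, reps = 10, 200_000

for p in [0.1, 0.3, 0.5]:
    X = rng.binomial(n, p, size=reps)
    p_hat = X / n
    phi_hat = n * p_hat * (1 - p_hat) / (n - 1)   # UMVUE of p(1-p)
    phi_tilde = np.minimum(phi_hat, 0.25)         # truncated (biased) version
    target = p * (1 - p)
    print("p =", p,
          " MSE UMVUE:", np.mean((phi_hat - target) ** 2),
          " MSE truncated:", np.mean((phi_tilde - target) ** 2))
\end{verbatim}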

Minimal Sufficiency

In any model the statistic $S(X)\equiv X$ is sufficient. In any iid model the vector of order statistics $X_{(1)}, \ldots, X_{(n)}$ is sufficient. In the $N(\mu,1)$ model we have three possible sufficient statistics:

1.
$S_1 = (X_1,\ldots,X_n)$.

2.
$S_2 = (X_{(1)}, \ldots, X_{(n)})$.

3.
$S_3 = \bar{X}$.

Notice that I can calculate $S_3$ from the values of $S_1$ or $S_2$ but not vice versa, and that I can calculate $S_2$ from $S_1$ but not vice versa. It turns out that $\bar{X}$ is a minimal sufficient statistic, meaning that it is a function of any other sufficient statistic. (You can't collapse the data set any more without losing information about $\mu$.)

To recognize minimal sufficient statistics you look at the likelihood function:

Fact: If you fix some particular $\theta^*$ then the log likelihood ratio function

\begin{displaymath}\ell(\theta)-\ell(\theta^*)
\end{displaymath}

is minimal sufficient. WARNING: the statistic is the whole function of $\theta$, not its value at any single $\theta$.

The subtraction of $\ell(\theta^*)$ gets rid of those irrelevant constants in the log-likelihood. For instance in the $N(\mu,1)$ example we have

\begin{displaymath}\ell(\mu) = -n\log(2\pi)/2 - \sum X_i^2/2 + \mu\sum X_i -n\mu^2/2
\end{displaymath}

This depends on $\sum X_i^2$ which is not needed for the sufficient statistic. Take $\mu^*=0$ and get

\begin{displaymath}\ell(\mu) -\ell(\mu^*) = \mu\sum X_i -n\mu^2/2
\end{displaymath}

This function of $\mu$ is minimal sufficient. Notice that from $\sum X_i$ you can compute this minimal sufficient statistic and vice versa. Thus $\sum X_i$ is also minimal sufficient.

FACT: A complete sufficient statistic is also minimal sufficient.



Richard Lockhart
1999-11-15