

STAT 450

Lecture 27

Last time:

Finding Sufficient Statistics: Factorization Criterion

Theorem: If the model for data X has density $f(x,\theta)$, then the statistic S(X) is sufficient if and only if the density can be factored as

\begin{displaymath}f(x,\theta) = g(s(x),\theta)h(x)
\end{displaymath}

Completeness
In the Binomial(n,p) example I showed that there is only one function of X which is unbiased. The Rao-Blackwell theorem shows that a UMVUE, if it exists, will be a function of any sufficient statistic. If I change T might I get a different E(T|S)? Generally the answer is yes, but for some models, like the binomial with S=X, the answer is no.

Definition: A statistic T is complete for a model $\{P_\theta;\theta\in\Theta\}$ if

\begin{displaymath}E_\theta(h(T)) = 0
\end{displaymath}

for all $\theta$ implies $h(T)=0$ (with probability 1).

We have already seen that X is complete in the Binomial(n,p) model. In the $N(\mu,1)$ model suppose

\begin{displaymath}E_\mu(h(\bar{X})) \equiv 0
\end{displaymath}

Since $\bar{X}$ has a $N(\mu,1/n)$ distribution we find that

\begin{displaymath}\int_{-\infty}^\infty h(x) e^{-nx^2/2} e^{n\mu x} dx \equiv 0
\end{displaymath}
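
In more detail, $\bar{X}$ has density $\sqrt{n/(2\pi)}\,e^{-n(x-\mu)^2/2}$, and expanding the square gives

\begin{displaymath}E_\mu(h(\bar{X})) = \sqrt{\frac{n}{2\pi}}\, e^{-n\mu^2/2}
\int_{-\infty}^\infty h(x)\, e^{-nx^2/2}\, e^{n\mu x}\, dx
\end{displaymath}

so the nonzero factor in front of the integral can be dropped.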

This is the so-called Laplace transform of the function $h(x)e^{-nx^2/2}$. It is a theorem that a Laplace transform is 0 if and only if the function is 0 (because the transform can be inverted). Hence $h\equiv 0$.
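
For comparison, the binomial argument referred to above reduces to a polynomial identity: if

\begin{displaymath}E_p(h(X)) = \sum_{k=0}^n h(k)\binom{n}{k} p^k(1-p)^{n-k} \equiv 0
\end{displaymath}

for all $p\in(0,1)$, then dividing by $(1-p)^n$ and putting $t=p/(1-p)$ shows that a polynomial in t vanishes identically, so each coefficient $h(k)\binom{n}{k}$ is 0 and hence $h\equiv 0$.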

How to Prove Completeness

There is only one general tactic. Suppose X has density

\begin{displaymath}f(x,\theta) = h(x) \exp\left\{\sum_{i=1}^p a_i(\theta)S_i(x)+c(\theta)\right\}
\end{displaymath}

If the range of the function $(a_1(\theta),\ldots,a_p(\theta))$ (as $\theta$ varies over $\Theta$) contains a (hyper-)rectangle in $R^p$ then the statistic

\begin{displaymath}( S_1(X), \ldots, S_p(X))
\end{displaymath}

is complete and sufficient.
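
For instance, the $N(\mu,1)$ completeness result above can be recovered this way: the joint density of the sample is

\begin{displaymath}f(x_1,\ldots,x_n,\mu) = (2\pi)^{-n/2} e^{-\sum x_i^2/2}
\exp\left\{\mu \sum x_i - n\mu^2/2\right\}
\end{displaymath}

so p=1, $a_1(\mu)=\mu$, $S_1(x)=\sum x_i$, and the range of $a_1$ is all of R, which certainly contains an interval. Hence $\sum X_i$ (equivalently $\bar{X}$) is complete and sufficient, in agreement with the Laplace transform argument.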

Example: In the $N(\mu,\sigma^2)$ model the density has the form

\begin{displaymath}\frac{1}{\sqrt{2\pi}} \exp\left\{
\left(-\frac{1}{2\sigma^2}\right)x^2
+\left(\frac{\mu}{\sigma^2}\right)x
-\frac{\mu^2}{2\sigma^2} - \log\sigma \right\}
\end{displaymath}

which is an exponential family with

\begin{displaymath}h(x) = \frac{1}{\sqrt{2\pi}}
\end{displaymath}


\begin{displaymath}a_1(\theta) = -\frac{1}{2\sigma^2}
\end{displaymath}


$S_1(x) = x^2$


\begin{displaymath}a_2(\theta) = \frac{\mu}{\sigma^2}
\end{displaymath}


$S_2(x) = x$

and

\begin{displaymath}c(\theta) = -\frac{\mu^2}{2\sigma^2} - \log\sigma
\end{displaymath}

For a sample of size n the joint density has the same exponential family form with statistics $\sum X_i^2$ and $\sum X_i$. Since the range of $(a_1,a_2)$ is the open half plane $\{(a_1,a_2): a_1<0\}$, which contains a rectangle, it follows that

\begin{displaymath}(\sum X_i^2, \sum X_i)
\end{displaymath}

is a complete sufficient statistic.

Remark: The statistic $(s^2, \bar{X})$ is a one-to-one function of $(\sum X_i^2, \sum X_i)$, so it must be complete and sufficient too. Any function of the latter statistic can be rewritten as a function of the former and vice versa.
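
As a small illustration of this one-to-one correspondence, here is a minimal Python sketch (the sample size, seed and parameter values are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=50)   # a sample from N(mu, sigma^2)
n = len(x)

# the complete sufficient statistic: (sum of squares, sum)
t1, t2 = np.sum(x**2), np.sum(x)

# one direction: (xbar, s^2) computed from (t1, t2)
xbar = t2 / n
s2 = (t1 - t2**2 / n) / (n - 1)

# the other direction: (t1, t2) recovered from (xbar, s^2)
t1_back = (n - 1) * s2 + n * xbar**2
t2_back = n * xbar

assert np.allclose([t1_back, t2_back], [t1, t2])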

The Lehmann-Scheffé Theorem

Theorem: If S is a complete sufficient statistic for some model and h(S) is an unbiased estimate of some parameter $\phi(\theta)$ then h(S) is the UMVUE of $\phi(\theta)$.

Proof: Suppose T is another unbiased estimate of $\phi$. According to the Rao-Blackwell theorem T is improved by E(T|S), so if h(S) is not the UMVUE then there must exist another function h*(S) which is unbiased and whose variance is smaller than that of h(S) for some value of $\theta$. But

\begin{displaymath}E_\theta(h^*(S)-h(S)) \equiv 0
\end{displaymath}

so, since S is complete, in fact h*(S) = h(S).

Example: In the $N(\mu,\sigma^2)$ model the random variable $(n-1)s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution. It follows that
\begin{displaymath}E\left[\frac{\sqrt{n-1}\,s}{\sigma}\right]
= \int_0^\infty \sqrt{x}\,
\left(\frac{x}{2}\right)^{(n-1)/2-1} e^{-x/2}\,
\frac{dx}{2\Gamma((n-1)/2)}
\end{displaymath}
Substituting y=x/2 (so that $\sqrt{x}=\sqrt{2y}$ and $dx=2\,dy$) gives

\begin{displaymath}E(s) = \frac{\sigma}{\sqrt{n-1}}\frac{\sqrt{2}}{\Gamma((n-1)/2)}
\int_0^\infty y^{n/2-1} e^{-y} dy
\end{displaymath}

The integral is $\Gamma(n/2)$, so

\begin{displaymath}E(s) = \sigma\frac{\sqrt{2}\,\Gamma(n/2)}{\sqrt{n-1}\,\Gamma((n-1)/2)}
\end{displaymath}

The UMVUE of $\sigma$ is then

\begin{displaymath}s\,\frac{\sqrt{n-1}\,\Gamma((n-1)/2)}{\sqrt{2}\,\Gamma(n/2)}
\end{displaymath}

by the Lehmann-Scheffé theorem.
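
As a quick numerical check, the unbiasing constant can be computed and applied to simulated normal samples. This is a minimal sketch in Python (the choices n = 10, sigma = 2 and the number of replications are arbitrary):

import math
import numpy as np

def umvue_sigma_constant(n):
    # sqrt(n-1) * Gamma((n-1)/2) / (sqrt(2) * Gamma(n/2)), computed on the log scale
    return math.exp(0.5 * math.log(n - 1) + math.lgamma((n - 1) / 2)
                    - 0.5 * math.log(2.0) - math.lgamma(n / 2))

rng = np.random.default_rng(1)
n, sigma = 10, 2.0
samples = rng.normal(0.0, sigma, size=(200000, n))
s = np.std(samples, axis=1, ddof=1)           # usual sample standard deviation
print(np.mean(umvue_sigma_constant(n) * s))   # close to sigma = 2.0
print(np.mean(s))                             # slightly below sigma, since E(s) < sigma

The first printed average should be close to 2.0, while the plain sample standard deviation s is biased slightly downward.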

Criticism of Unbiasedness

1.
The UMVUE can be inadmissible for squared error loss, meaning that there is a (biased, of course) estimate whose MSE is smaller for every parameter value. An example is the UMVUE of $\phi=p(1-p)$, which is $\hat\phi =n\hat{p}(1-\hat{p})/(n-1)$. The MSE of

\begin{displaymath}\tilde{\phi} = \min(\hat\phi,1/4)
\end{displaymath}

is smaller than that of $\hat\phi$ for every p, because $\phi \le 1/4$ so truncation can only move the estimate closer to the target; a numerical check is sketched after this list.

2.
There are examples where unbiased estimation is impossible. The log odds in a Binomial model is $\phi=\log(p/(1-p))$. Since the expectation of any function of the data is a polynomial in p, and since $\phi$ is not a polynomial in p, there is no unbiased estimate of $\phi$.

3.
The UMVUE of $\sigma$ is not the square root of the UMVUE of $\sigma^2$. This method of estimation does not have the parameterization equivariance that maximum likelihood does.

4.
Unbiasedness is irrelevant (unless you plan to average together many estimators). The property is an average over possible values of the estimate, in which positive errors are allowed to cancel negative errors. An exception to this criticism is that if you plan to average a number of estimators to get a single estimator, then it is a problem if all the estimators have the same bias. In assignment 5 you have the one-way layout example, in which the MLE of the residual variance averages together many biased estimates and so is very badly biased. That assignment shows that the solution is not really to insist on unbiasedness but to consider an alternative to averaging for combining the individual estimates.
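
The check promised in item 1 above: a minimal sketch using only the Python standard library that computes both MSEs exactly (n = 10 and the grid of p values are arbitrary choices):

import math

def mse(estimator, n, p):
    # exact mean squared error, summing over the Binomial(n, p) distribution
    phi = p * (1 - p)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               * (estimator(k, n) - phi)**2
               for k in range(n + 1))

def phi_hat(k, n):
    # the UMVUE of p(1-p): n * phat * (1 - phat) / (n - 1)
    phat = k / n
    return n * phat * (1 - phat) / (n - 1)

def phi_tilde(k, n):
    # the truncated (biased) estimate, never larger than 1/4
    return min(phi_hat(k, n), 0.25)

n = 10
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, mse(phi_hat, n, p), mse(phi_tilde, n, p))

For each p the second MSE is smaller than the first, illustrating the inadmissibility claim.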

Minimal Sufficiency

In any model the statistic $S(X)\equiv X$ is sufficient. In any iid model the vector of order statistics $X_{(1)}, \ldots, X_{(n)}$ is sufficient. In the $N(\mu,1)$ model we then have three possible sufficient statistics:

1.
$S_1 = (X_1,\ldots,X_n)$.

2.
$S_2 = (X_{(1)}, \ldots, X_{(n)})$.

3.
$S_3 = \bar{X}$.

Notice that I can calculate $S_3$ from the values of $S_1$ or $S_2$ but not vice versa, and that I can calculate $S_2$ from $S_1$ but not vice versa. It turns out that $\bar{X}$ is a minimal sufficient statistic, meaning that it is a function of any other sufficient statistic. (You can't collapse the data set any more without losing information about $\mu$.)

To recognize minimal sufficient statistics you look at the likelihood function:

Fact: If you fix some particular $\theta^*$ then the log likelihood ratio function

\begin{displaymath}\ell(\theta)-\ell(\theta^*)
\end{displaymath}

is minimal sufficient. WARNING: the statistic here is the whole function of $\theta$, not its value at any single $\theta$.

The subtraction of $\ell(\theta^*)$ gets rid of those irrelevant constants in the log-likelihood. For instance in the $N(\mu,1)$ example we have

\begin{displaymath}\ell(\mu) = -n\log(2\pi)/2 - \sum X_i^2/2 + \mu\sum X_i -n\mu^2/2
\end{displaymath}

This depends on $\sum X_i^2$ which is not needed for the sufficient statistic. Take $\mu^*=0$ and get

\begin{displaymath}\ell(\mu) -\ell(\mu^*) = \mu\sum X_i -n\mu^2/2
\end{displaymath}

This function of $\mu$ is minimal sufficient. Notice that from $\sum X_i$ you can compute this minimal sufficient statistic, and vice versa (for instance, $\sum X_i = \ell(1)-\ell(0) + n/2$). Thus $\sum X_i$ is also minimal sufficient.

FACT: A complete sufficient statistic is also minimal sufficient.



Richard Lockhart
1999-11-15