Focus on the problem of estimating a one-dimensional parameter $\theta$.
Mean squared error corresponds to using the loss function $L(d,\theta) = (d-\theta)^2$.
The risk function of a procedure (estimator) $\hat\theta$ is
$$R_{\hat\theta}(\theta) = E_\theta\big[(\hat\theta(X)-\theta)^2\big].$$
The Bayes risk of $\hat\theta$ for a prior $\pi$ is
$$r_\pi(\hat\theta) = \int R_{\hat\theta}(\theta)\,\pi(\theta)\,d\theta = \int\!\!\int \big(\hat\theta(x)-\theta\big)^2 f(x|\theta)\,\pi(\theta)\,dx\,d\theta.$$
How should we choose $\hat\theta$ to minimize $r_\pi(\hat\theta)$?
Recognize that $f(x|\theta)\pi(\theta)$ is really a joint density for $(X,\theta)$; this justifies the conditional notation $f(x|\theta)$.
Compute $r_\pi(\hat\theta)$ a different way by factoring the joint density a different way:
$$f(x|\theta)\pi(\theta) = f(x)\,\pi(\theta|x),$$
where $f(x) = \int f(x|\theta)\pi(\theta)\,d\theta$ is the marginal density of the data.
Call $\pi(\theta|x)$ the posterior density. It is found via Bayes' theorem (which is why this is Bayesian statistics):
$$\pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int f(x|\phi)\pi(\phi)\,d\phi}.$$
With this notation we can write
$$r_\pi(\hat\theta) = \int \left[\int \big(\hat\theta(x)-\theta\big)^2\,\pi(\theta|x)\,d\theta\right] f(x)\,dx.$$
The quantity in square brackets is a quadratic function of $\hat\theta(x)$; for each fixed $x$ it is minimized by
$$\hat\theta(x) = E(\theta \mid X = x) = \int \theta\,\pi(\theta|x)\,d\theta,$$
the posterior mean of $\theta$.
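To spell out the minimization, one can complete the square: writing $m(x) = E(\theta\mid X=x)$,
$$\int \big(\hat\theta(x)-\theta\big)^2\,\pi(\theta|x)\,d\theta = \big(\hat\theta(x)-m(x)\big)^2 + \operatorname{Var}(\theta\mid X=x).$$
The second term does not involve $\hat\theta$, so the bracket is smallest at $\hat\theta(x) = m(x)$.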
Example: estimating a normal mean $\mu$.
Imagine, for example, that $\mu$ is the true speed of sound.
I think this is around 330 metres per second and am pretty sure that I am within 30 metres per second of the truth with that guess.
I might summarize my opinion by saying that I think $\mu$ has a normal distribution with mean $\mu_0 = 330$ and standard deviation $\tau = 30$. That is, I take the prior density $\pi$ for $\mu$ to be the $N(\mu_0, \tau^2)$ density
$$\pi(\mu) = \frac{1}{\tau\sqrt{2\pi}}\exp\left\{-\frac{(\mu-\mu_0)^2}{2\tau^2}\right\}.$$
Before I make any measurements, my best guess $\hat\mu$ of $\mu$ minimizes
$$\int (\hat\mu - \mu)^2\,\pi(\mu)\,d\mu;$$
by the argument above the minimizer is the prior mean, $\hat\mu = 330$.
Now collect 25 measurements of the speed of sound. Assume the relationship between the measurements and $\mu$ is that the measurements are unbiased and that the standard deviation of the measurement errors is $\sigma$, which I assume is known. So the model is: given $\mu$, $X_1,\ldots,X_{25}$ are iid $N(\mu,\sigma^2)$.
The joint density of the data and $\mu$ is then
$$f(x|\mu)\,\pi(\mu) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{\sum_{i=1}^n(x_i-\mu)^2}{2\sigma^2}\right\}\cdot\frac{1}{\tau\sqrt{2\pi}}\exp\left\{-\frac{(\mu-\mu_0)^2}{2\tau^2}\right\},$$
with $n = 25$.
Use standard MVN formulas to calculate conditional means and variances.
Alternatively: the exponent in the joint density, viewed as a function of $\mu$, has the form
$$-\frac{1}{2}\left(\frac{n}{\sigma^2}+\frac{1}{\tau^2}\right)\mu^2 + \left(\frac{n\bar x}{\sigma^2}+\frac{\mu_0}{\tau^2}\right)\mu + \text{(terms not involving }\mu\text{)},$$
which, after completing the square, is the exponent of a normal density with variance $\big(n/\sigma^2 + 1/\tau^2\big)^{-1}$ and mean equal to that variance times $n\bar x/\sigma^2 + \mu_0/\tau^2$.
In other words the posterior mean of $\mu$ is
$$\frac{n\bar x/\sigma^2 + \mu_0/\tau^2}{n/\sigma^2 + 1/\tau^2},$$
a weighted average of the sample mean $\bar x$ and the prior mean $\mu_0 = 330$.
Notice: the weight on the data is large when $n$ is large or $\sigma$ is small (precise measurements), and small when $\tau$ is small (precise prior opinion).
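A minimal numerical sketch of this update; the measurement SD $\sigma$ and the data themselves are assumed, illustrative values, not from the notes:

```python
import numpy as np

# Sketch of the normal-normal update above; sigma and the
# measurements are illustrative assumptions.
rng = np.random.default_rng(0)

mu0, tau = 330.0, 30.0   # prior mean and prior SD
sigma, n = 15.0, 25      # assumed known measurement SD; sample size
x = rng.normal(340.0, sigma, size=n)  # hypothetical measurements
xbar = x.mean()

# Posterior precision = data precision + prior precision;
# posterior mean = precision-weighted average of xbar and mu0.
prec = n / sigma**2 + 1 / tau**2
mu_post = (n * xbar / sigma**2 + mu0 / tau**2) / prec
sd_post = prec ** -0.5

print(f"posterior mean {mu_post:.2f}, posterior SD {sd_post:.2f}")
print(f"weight on data {(n / sigma**2) / prec:.4f}")
```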
Improper priors: When the prior density does not integrate to 1, we can still follow the machinery of Bayes' formula to derive a posterior.
Example: $X_1,\ldots,X_n$ iid $N(\mu,\sigma^2)$; consider the prior density $\pi(\mu) \equiv 1$. The machinery produces the posterior
$$\pi(\mu|x) = \frac{\exp\left\{-\sum_i (x_i-\mu)^2/(2\sigma^2)\right\}}{\int \exp\left\{-\sum_i (x_i-\phi)^2/(2\sigma^2)\right\}\,d\phi},$$
which is the $N(\bar x, \sigma^2/n)$ density. I.e., the Bayes estimate of $\mu$ for this improper prior is $\bar x$.
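Informally, this flat prior behaves like the $\tau \to \infty$ limit of the $N(\mu_0, \tau^2)$ priors used above:
$$\frac{n\bar x/\sigma^2 + \mu_0/\tau^2}{n/\sigma^2 + 1/\tau^2} \longrightarrow \bar x
\qquad\text{and}\qquad
\left(\frac{n}{\sigma^2}+\frac{1}{\tau^2}\right)^{-1} \longrightarrow \frac{\sigma^2}{n}
\qquad (\tau\to\infty),$$
matching the posterior just computed.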
Admissibility: Bayes procedures corresponding to proper priors are admissible. It follows that for each $\lambda \in (0,1)$ and each real $\mu_0$ the estimate
$$\lambda\bar x + (1-\lambda)\mu_0$$
is admissible (each such estimate is the posterior mean for a proper $N(\mu_0,\tau^2)$ prior with a suitable $\tau$).
Minimax estimation: The risk function of $\bar x$ is simply $\sigma^2/n$. That is, the risk function is constant, since it does not depend on $\mu$. Were $\bar x$ Bayes for a proper prior, this would prove that $\bar x$ is minimax. In fact this is true, but it is hard to prove.
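For reference, here is the standard argument behind that implication: if $\hat\theta$ is Bayes for a proper prior $\pi$ and has constant risk $c$, then for any competitor $\tilde\theta$,
$$\sup_\theta R_{\tilde\theta}(\theta) \ge \int R_{\tilde\theta}(\theta)\,\pi(\theta)\,d\theta \ge \int R_{\hat\theta}(\theta)\,\pi(\theta)\,d\theta = c = \sup_\theta R_{\hat\theta}(\theta),$$
so no competitor has smaller maximum risk.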
Example: Given $p$, $X$ has a Binomial$(n,p)$ distribution. Give $p$ a Beta$(\alpha,\beta)$ prior density
$$\pi(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{\alpha-1}(1-p)^{\beta-1}.$$
The posterior is then proportional to
$$p^{X+\alpha-1}(1-p)^{n-X+\beta-1};$$
this is a Beta$(X+\alpha,\,n-X+\beta)$ density.
The mean of a Beta$(\alpha,\beta)$ distribution is $\alpha/(\alpha+\beta)$. So the Bayes estimate of $p$ is
$$\hat p = \frac{X+\alpha}{n+\alpha+\beta}.$$
Notice: this is again a weighted average of the prior mean and the mle:
$$\frac{X+\alpha}{n+\alpha+\beta} = \frac{n}{n+\alpha+\beta}\cdot\frac{X}{n} + \frac{\alpha+\beta}{n+\alpha+\beta}\cdot\frac{\alpha}{\alpha+\beta}.$$
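Here is a small numerical sketch of this Beta-Binomial update and the weighted-average identity; the values of $n$, $X$, $\alpha$, $\beta$ are illustrative, not from the notes:

```python
# Sketch of the Beta-Binomial update; n, X, a, b are assumed values.
n, X = 40, 13        # assumed number of trials and successes
a, b = 2.0, 2.0      # Beta(alpha, beta) prior parameters

post_a, post_b = X + a, n - X + b      # Beta(X + alpha, n - X + beta)
bayes = post_a / (post_a + post_b)     # posterior mean (X + a)/(n + a + b)

# Same number written as a weighted average of the mle and the prior mean:
w = n / (n + a + b)
weighted = w * (X / n) + (1 - w) * (a / (a + b))
assert abs(bayes - weighted) < 1e-12

print(f"Bayes estimate {bayes:.4f}; mle {X / n:.4f}; prior mean {a / (a + b):.4f}")
```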
Notice: the prior is proper for $\alpha > 0$ and $\beta > 0$.
To get $\hat p = X/n$, take $\alpha = \beta = 0$; that is, use the improper prior
$$\pi(p) \propto \frac{1}{p(1-p)}.$$
Again: it is true that $X/n$ is admissible, but our theorem is not adequate to prove this fact.
The risk function of $\hat p = (X+\alpha)/(n+\alpha+\beta)$ is
$$R_{\hat p}(p) = E_p\left[\left(\frac{X+\alpha}{n+\alpha+\beta} - p\right)^2\right]
= \frac{np(1-p) + \big(\alpha - p(\alpha+\beta)\big)^2}{(n+\alpha+\beta)^2}.$$
In the numerator, the coefficient of $p^2$ is
$$(\alpha+\beta)^2 - n.$$
The coefficient of $p$ is then
$$n - 2\alpha(\alpha+\beta).$$
Working backwards: the risk is constant in $p$ exactly when both coefficients vanish, and to get these values for $\alpha$ and $\beta$ requires
$$\alpha + \beta = \sqrt{n}, \qquad \alpha = \beta = \frac{\sqrt{n}}{2}.$$
Moreover, the resulting estimate
$$\hat p = \frac{X + \sqrt{n}/2}{n + \sqrt{n}}$$
is Bayes for the proper Beta$(\sqrt{n}/2, \sqrt{n}/2)$ prior and has constant risk $1/\big(4(1+\sqrt{n})^2\big)$; by the minimax argument above it is therefore minimax.
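A quick numerical check of the constant-risk claim; the value of $n$ and the grid of $p$ values are illustrative:

```python
import math

# Check that (X + sqrt(n)/2)/(n + sqrt(n)) has a risk function that
# does not depend on p; n and the p grid are assumed values.
n = 30
alpha = beta_ = math.sqrt(n) / 2

def pmf(k: int, p: float) -> float:
    """Binomial(n, p) probability of k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def risk(p: float) -> float:
    """Exact mean squared error of the estimator at a given p."""
    est = lambda x: (x + alpha) / (n + alpha + beta_)
    return sum(pmf(x, p) * (est(x) - p) ** 2 for x in range(n + 1))

for p in (0.1, 0.3, 0.5, 0.9):
    print(f"p = {p:.1f}: risk = {risk(p):.6f}")
print(f"1 / (4 (1 + sqrt(n))^2) = {1 / (4 * (1 + math.sqrt(n)) ** 2):.6f}")
```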
Example: $X_1,\ldots,X_n$ iid $MVN(\mu, \Sigma)$ with $\Sigma$ known. Take the improper prior for $\mu$ which is constant. The posterior of $\mu$ given $X$ is then $MVN(\bar X, \Sigma/n)$.
Multivariate estimation: it is common to extend the notion of squared error loss by defining
$$L(d,\theta) = \|d-\theta\|^2 = \sum_{i=1}^p (d_i - \theta_i)^2.$$
The Bayes estimate is again the posterior mean.
Thus $\bar X$ is Bayes for an improper prior in this problem.
It turns out that $\bar X$ is minimax; its risk function is the constant $\operatorname{trace}(\Sigma)/n$.
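The constant value follows from a short computation, since each coordinate of $\bar X$ has variance given by the corresponding diagonal entry of $\Sigma/n$:
$$E\|\bar X - \mu\|^2 = \sum_{i=1}^p \operatorname{Var}(\bar X_i) = \sum_{i=1}^p \frac{\Sigma_{ii}}{n} = \frac{\operatorname{trace}(\Sigma)}{n}.$$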
If the dimension $p$ of $\theta$ is 1 or 2 then $\bar X$ is also admissible, but if $p \ge 3$ then it is inadmissible.
This fact was first demonstrated by James and Stein, who produced an estimate which is better, in terms of this risk function, for every $\mu$.
The so-called James-Stein estimator is essentially never used.
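To see the James-Stein effect concretely, here is a simulation sketch in the canonical single-observation form $X \sim N_p(\mu, I)$ with $p \ge 3$; the choices of $p$, $\mu$, and the replication count are illustrative:

```python
import numpy as np

# Simulation of the James-Stein phenomenon for one observation
# X ~ N_p(mu, I), p >= 3; p, mu, and reps are assumed values.
rng = np.random.default_rng(1)
p, reps = 10, 200_000
mu = np.full(p, 2.0)

X = rng.normal(mu, 1.0, size=(reps, p))
shrink = 1.0 - (p - 2) / np.sum(X**2, axis=1, keepdims=True)
js = shrink * X  # James-Stein estimate (1 - (p-2)/||X||^2) X

risk_X = np.mean(np.sum((X - mu) ** 2, axis=1))    # should be close to p
risk_js = np.mean(np.sum((js - mu) ** 2, axis=1))  # strictly smaller
print(f"risk of X: {risk_X:.3f}   risk of James-Stein: {risk_js:.3f}")
```

The dominance holds for every $\mu$, though the simulation checks only the one value of $\mu$ chosen above.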