Most statistical models used to draw conclusions from data rely on the
principle of maximum likelihood as their basis for statistical inference.
Though it is often not stated explicitly in textbooks, familiar
statistical models such as simple linear regression are formulated
using the principle of maximum likelihood.
The premise of maximum-likelihood estimation is that the parameter
estimates, and their uncertainties, obtained from data drawn from
a population facilitate inference about the 'true' point values
of those parameters in that population.
Conceptually, the goal of maximum-likelihood estimation is quite
simple: identify the deterministic model under which your observed
data are most likely to have been generated.
That is, the distribution of your observed data should depart from
that deterministic model in a manner
that mimics the error specification
for your model.
This is particularly intuitive for basic models like simple linear
regression where, in the absence of a statistical methodology, your
temptation would be to draw a straight line through your observed
data.
Maximum-likelihood techniques formalize this intuition by using
statistical theory to determine the location of that straight line.
Under the assumption that the observed data points are independent,
the principle of maximum likelihood is applied by simply multiplying
the probabilities (or probability densities) of the individual
data points, given the deterministic
model structure and the parameter estimates.
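In symbols (the notation here is illustrative, not taken from the text),
if the deterministic model and its parameters are collected in $\theta$,
and the $n$ independent observations are $y_1, \ldots, y_n$, each with
probability (or density) $f(y_i \mid \theta)$, then the likelihood is

\[
  L(\theta \mid y_1, \ldots, y_n) \;=\; \prod_{i=1}^{n} f(y_i \mid \theta).
\]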
However, this is a conditional probability, which means that an
explicit statement of probability would require that we know the probability
of the model we are using.
Since there are infinitely many potential models, we cannot
provide an explicit statement of the probability of our particular model.
Thus we can only rank candidate models, and so we refer to our statistical
goal as finding the most likely, rather than the most probable,
model.
Typically, to avoid dealing with the extremely small numbers that arise
from sequentially multiplying values less than unity (the probabilities),
we instead sum the negative natural logarithms of those probabilities.
This convenience also comes with satisfying statistical properties
for drawing inferences about the uncertainty of parameter estimates.
These properties help explain the almost universal use of the negative
ln-likelihood as a basis for statistical inference.
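As a minimal sketch of this recipe (the simulated data, variable names,
and use of numpy/scipy here are illustrative assumptions, not part of the
text), the negative ln-likelihood of a simple linear regression with
Gaussian errors can be written down and minimized numerically:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    # Illustrative data: a straight-line relationship with Gaussian noise.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

    def neg_log_likelihood(params, x, y):
        """Sum of the negative ln-likelihoods of the independent observations,
        given a straight-line deterministic model with Gaussian errors."""
        intercept, slope, log_sigma = params
        sigma = np.exp(log_sigma)      # error SD, kept positive via log scale
        mu = intercept + slope * x     # deterministic model prediction
        return -np.sum(norm.logpdf(y, loc=mu, scale=sigma))

    # Minimizing the negative ln-likelihood gives the maximum-likelihood estimates.
    fit = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0], args=(x, y))
    intercept_hat, slope_hat, log_sigma_hat = fit.x
    print(intercept_hat, slope_hat, np.exp(log_sigma_hat))

Parameterizing the error standard deviation on the log scale is simply a
convenience that keeps it positive during the optimization.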
For example, the familiar minimum sum-of-squares parameter estimates
associated with regression analysis and analysis of variance are
maximum-likelihood estimates because the sum-of-squares calculation is derived
from the negative ln-likelihood of a Gaussian (normal) distribution of errors.
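To sketch why (again using illustrative notation), for $n$ independent
observations $y_i$ with deterministic model predictions $\mu_i$ and
Gaussian errors of standard deviation $\sigma$, the negative ln-likelihood is

\[
  -\ln L \;=\; \frac{n}{2}\,\ln\!\left(2\pi\sigma^{2}\right)
  \;+\; \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left(y_i - \mu_i\right)^{2},
\]

so, for a given $\sigma$, minimizing the negative ln-likelihood over the
parameters of the deterministic model is equivalent to minimizing the sum
of squared deviations.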