Introduction to Regression Procedures
Linear Models
In matrix algebra notation, a linear model is written as

   Y = Xβ + ε

where X is the n × k design matrix (rows are observations and
columns are the regressors), β is the k × 1 vector of unknown
parameters, and ε is the n × 1 vector of unknown errors.
The first column of X is usually a vector
of 1s used in estimating the intercept term.
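As a concrete sketch, a model of this form can be fit with PROC REG; the data set MYDATA and the variables Y, X1, and X2 below are hypothetical stand-ins. The intercept term, which corresponds to the column of 1s, is included by default and can be omitted with the NOINT option.

   proc reg data=mydata;
      /* fits Y = b0 + b1*X1 + b2*X2 + error; the intercept (the */
      /* column of 1s in X) is included unless NOINT is specified */
      model y = x1 x2;
   run;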
The statistical theory of linear models
is based on strict classical assumptions.
Ideally, the response is measured with all the factors
controlled in an experimentally determined environment.
If you cannot control the factors experimentally,
some tests must be interpreted as being conditional
on the observed values of the regressors.
Other assumptions are that
- the form of the model is correct (all important
explanatory variables have been included)
- regressor variables are measured without error
- the expected value of the errors is zero
- the variance of the errors (and thus the dependent variable)
is a constant across observations (called homoscedasticity)
- the errors are uncorrelated across observations
When hypotheses are tested, the
additional assumption is made that
the errors are normally distributed.
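The error assumptions can be examined informally by looking at residuals from the fitted model. The following sketch, using the same hypothetical names, saves the residuals and predicted values, requests normality tests, and plots the residuals against the predicted values; a pronounced fan-shaped pattern in that plot suggests nonconstant variance.

   proc reg data=mydata;
      model y = x1 x2;
      output out=diag r=resid p=pred;   /* save residuals and predicted values */
   run;

   proc univariate data=diag normal;    /* tests of normality for the residuals */
      var resid;
   run;

   proc plot data=diag;
      plot resid*pred;                  /* residuals versus predicted values */
   run;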
Statistical Model
If the model satisfies all the necessary
assumptions, the least-squares estimates are
the best linear unbiased estimates (BLUE).
In other words, the estimates have minimum
variance among the class of estimators that are
unbiased and are linear functions of the responses.
If the additional assumption that the error term
is normally distributed is also satisfied, then
- the statistics that are computed have the proper
sampling distributions for hypothesis testing
- parameter estimates are normally distributed
- various sums of squares are distributed proportional
to chi-square, at least under proper hypotheses
- ratios of estimates to standard errors are
distributed as Student's t under certain hypotheses
- appropriate ratios of sums of squares are
distributed as F under certain hypotheses
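In PROC REG, for example, the ratio of each parameter estimate to its standard error is printed as a t value in the parameter estimates table, and the TEST statement produces an F test of a linear hypothesis. The sketch below, again with hypothetical variables, tests the joint hypothesis that both slope parameters are zero.

   proc reg data=mydata;
      model y = x1 x2;        /* prints a t test for each parameter and the overall F test */
      test x1 = 0, x2 = 0;    /* F test that both slopes are zero */
   run;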
When regression analysis is used to model data that
do not meet the assumptions, the results should
be interpreted in a cautious, exploratory fashion.
The significance probabilities under
these circumstances are unreliable.
Box (1966) and Mosteller and Tukey (1977, chaps. 12 and 13)
discuss the problems that are encountered with regression data,
especially when the data are not under experimental control.