The general linear model

Univariate case:

Data: $ Y_1,\ldots,Y_n$; covariate $ r$-vectors $ {\bf x}_1,
\ldots,{\bf x}_n$ (each written as row vector).

Model

$\displaystyle Y_i = {\bf x}_i \boldsymbol\beta + \epsilon_i
$

for iid $ N(0,\sigma^2)$ errors $ \epsilon_i$ and an unknown coefficient vector $ \boldsymbol\beta$, an $ r\times 1$ (column) vector.

Commonly the first entry in each $ {\bf x}_i$ is 1; the corresponding $ \beta_1$ is the "intercept". (The text writes $ r+1$ where we write $ r$, so that its $ r$ counts the coefficients other than the intercept.)

Model fitting: Maximum Likelihood = Least squares.

$\displaystyle \hat{\boldsymbol\beta} = \left({{\bf X}^T{\bf X}}\right)^{-1} {\bf X}^T {\bf Y}
$

where $ {\bf Y}$ is the $ n\times 1$ vector with entries $ Y_i$ and $ {\bf X}$ is the $ n\times r$ matrix whose $ i$th row is $ {\bf x}_i$.

Fitted response vector

$\displaystyle \hat{\bf Y}= {\bf X}\hat{\boldsymbol\beta}
$

Fitted residual vector

$\displaystyle \hat{\boldsymbol\epsilon} = {\bf Y}- {\bf X}\hat{\boldsymbol\beta}
= \left\{{\bf I}-{\bf X}\left({{\bf X}^T{\bf X}}\right)^{-1} {\bf X}^T\right\} {\bf Y}
$
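
For concreteness, here is a minimal SAS/IML sketch of these formulas (the data are hypothetical; only the matrix algebra matters):

proc iml;
    /* hypothetical data: n = 5 cases, r = 2 (intercept plus one covariate) */
    X = {1 1, 1 2, 1 3, 1 4, 1 5};
    Y = {2.1, 3.9, 6.2, 7.8, 10.1};
    betahat = solve(X`*X, X`*Y);   /* (X'X)^{-1} X'Y without forming the inverse */
    Yhat = X*betahat;              /* fitted response vector */
    ehat = Y - Yhat;               /* fitted residual vector */
    print betahat Yhat ehat;
quit;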

What sort of hypotheses can we test? Linear hypotheses of the form

$\displaystyle {\bf L}\boldsymbol\beta = {\bf0}
$

for a given $ q\times r$ matrix of constants $ {\bf L}$.

Example: test that all $ \beta_i$ except $ \beta_1$ are 0:

$\displaystyle {\bf L}= \left[ {\bf0}_{(r-1)\times 1} \,\vert\, {\bf I}_{(r-1) \times (r-1)}\right]
$

Example: one-way ANOVA with $ m$ groups: $ {\bf X}$ has $ m$ columns. Entry $ j$ for observation $ i$ is 1 or 0 according to whether case $ i$ belongs to group $ j$. For our data on 3 test scores ($ m=4$ groups):

$\displaystyle {\bf X}= \left[\begin{array}{cccc}
1 & 0 & 0 & 0
\\
\vdots & \vdots & \vdots & \vdots
\\
0 & 0 & 0 & 1
\end{array}\right]
$
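
In SAS/IML the DESIGN function builds exactly this indicator matrix from a vector of group labels (a sketch with hypothetical labels):

proc iml;
    group = {1, 1, 2, 3, 3, 4};    /* hypothetical group labels */
    X = design(group);             /* one indicator column per group */
    print X;
quit;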

To test $ \mu_1=\mu_2=\mu_3=\mu_4$ could use

$\displaystyle {\bf L}= \left[\begin{array}{rrrr}
1 & -1 & 0 & 0 \\  1 & 0 & -1 & 0 \\  1 & 0 & 0 &-1
\end{array}\right]
$

or

$\displaystyle {\bf L}= \left[\begin{array}{rrrr}
1 & -1 & 0 & 0 \\  0 & 1 & -1 & 0 \\  0 & 0 & 1 &-1
\end{array}\right]
$

or many others!
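
As a quick SAS/IML check (sketch only), both choices annihilate any vector of equal means, so they encode the same null hypothesis:

proc iml;
    L1 = {1 -1  0  0,  1  0 -1  0,  1  0  0 -1};
    L2 = {1 -1  0  0,  0  1 -1  0,  0  0  1 -1};
    beta = j(4, 1, 7);             /* any common value for the four means */
    print (L1*beta) (L2*beta);     /* both products are zero vectors */
quit;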

To test $ {\bf L}\boldsymbol\beta = {\bf0}$:

Fit model twice:

Full model: get the error sum of squares:

$\displaystyle ESS = \vert\vert\hat{\boldsymbol\epsilon}\vert\vert^2
= {\bf Y}^T \left\{ {\bf I}
-{\bf X}\left({{\bf X}^T{\bf X}}\right)^{-1} {\bf X}^T\right\} {\bf Y}
$

Restricted model: minimize

$\displaystyle \vert\vert{\bf Y}- {\bf X}\boldsymbol\beta\vert\vert^2
$

subject to $ {\bf L}\boldsymbol\beta=0$.

Method: assume the rank of $ {\bf L}$ is $ q<r$ (full rank). Find $ {\bf L}^*$, an $ (r-q)\times r$ matrix, so that

$\displaystyle {\bf P}=\left[\begin{array}{l} {\bf L}^* \\  {\bf L}\end{array}\right]
$

is invertible. Write

$\displaystyle \boldsymbol\beta^* = {\bf P}\boldsymbol\beta
$

and

$\displaystyle {\bf X}^* = {\bf X}{\bf P}^{-1}
$

Model is

$\displaystyle {\bf Y}= {\bf X}^*\boldsymbol\beta^* + \boldsymbol\epsilon
$

Partition $ \boldsymbol\beta^*$ into $ \boldsymbol\beta^*_1$ of length $ r-q$ and $ \boldsymbol\beta^*_2$ of length $ q$.

Then

$\displaystyle {\bf L}\boldsymbol\beta = {\bf0} \equiv \boldsymbol\beta^*_2={\bf0}
$

Setting $ \boldsymbol\beta^*_2={\bf0}$ leaves the model $ {\bf Y}={\bf X}^*_1\boldsymbol\beta^*_1+\boldsymbol\epsilon$, where $ {\bf X}^*_1$ consists of the first $ r-q$ columns of $ {\bf X}^*$. The restricted ESS is

$\displaystyle ESS_R \equiv \vert\vert({\bf I}- {\bf X}^*_1\left\{({\bf X}^*_1)^T({\bf X}^*_1)\right\}^{-1}({\bf X}^*_1)^T){\bf Y}\vert\vert^2
$
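
The whole procedure can be sketched in SAS/IML for a hypothetical one-way layout; the $ F$ statistic formed at the end is the one defined next:

proc iml;
    /* hypothetical data: 3 cases in each of m = 4 groups */
    group = {1,1,1, 2,2,2, 3,3,3, 4,4,4};
    Y = {5.1,4.8,5.3, 6.0,6.2,5.9, 5.0,5.2,4.9, 7.1,6.8,7.0};
    X = design(group);                        /* n x 4 indicator matrix */
    n = nrow(X);  r = ncol(X);  q = 3;  k = r - q;
    L  = {1 -1 0 0, 0 1 -1 0, 0 0 1 -1};      /* tests mu1 = mu2 = mu3 = mu4 */
    Ls = {1 0 0 0};                           /* one workable choice of L* */
    P  = Ls // L;                             /* invertible 4 x 4 matrix */
    Xs = X*inv(P);
    X1 = Xs[, 1:k];                           /* columns multiplying beta*_1 */
    ESS  = ssq(Y - X *solve(X`*X,   X`*Y));   /* full model */
    ESSR = ssq(Y - X1*solve(X1`*X1, X1`*Y));  /* restricted model */
    F = ((ESSR - ESS)/q) / (ESS/(n - r));
    print ESS ESSR F;
quit;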

The $ F$ test is based on

$\displaystyle \frac{(ESS_R-ESS)/q}{ESS/(n-r) }
$

which has an $ F_{q,n-r}$ distribution under the null hypothesis.

Multivariate case:

Now $ {\bf Y}_i$ is a vector of $ p$ responses. Assemble these, row by row, into the data matrix $ {\bf Y}$, which is $ n\times p$. Write the model

$\displaystyle {\bf Y}= {\bf X}\boldsymbol\beta + \boldsymbol\epsilon
$

where now $ \boldsymbol\beta$ is $ r \times p$.

Model for errors: rows are iid $ MVN_p({\bf0},{\boldsymbol\Sigma})$.

NOTE: this fits the same regression model to each response variable, but with errors correlated across responses.

Least squares estimates: as before

$\displaystyle \hat{\boldsymbol\beta} = \left({{\bf X}^T{\bf X}}\right)^{-1} {\bf X}^T {\bf Y}
$

Fitted response matrix

$\displaystyle \hat{\bf Y}= {\bf X}\hat{\boldsymbol\beta}
$

Fitted residual matrix

$\displaystyle \hat{\boldsymbol\epsilon} = {\bf Y}- {\bf X}\hat{\boldsymbol\beta}
= \left\{{\bf I}-{\bf X}\left({{\bf X}^T{\bf X}}\right)^{-1} {\bf X}^T\right\} {\bf Y}
$
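
In SAS/IML the multivariate fit is the same line of code, now with an $ n\times p$ data matrix; each column of $ \hat{\boldsymbol\beta}$ matches the corresponding univariate fit (hypothetical numbers):

proc iml;
    X = {1 1, 1 2, 1 3, 1 4};                 /* n = 4, r = 2 */
    Y = {2.0 1.1, 3.1 1.9, 3.9 3.2, 5.2 3.8}; /* p = 2 responses */
    betahat = solve(X`*X, X`*Y);              /* r x p coefficient matrix */
    b1 = solve(X`*X, X`*Y[, 1]);              /* univariate fit to response 1 */
    print betahat b1;                         /* column 1 of betahat equals b1 */
quit;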

What sort of hypotheses can we test? Hypotheses of the form

$\displaystyle {\bf L}\boldsymbol\beta = {\bf0}
$

and, more generally,

$\displaystyle {\bf L}\boldsymbol\beta{\bf M}= {\bf0}
$

Example: back to one-way MANOVA profile analysis. The design matrix $ {\bf X}$ has entries

$\displaystyle {\bf X}_{ij} = \begin{cases}1 & \text{Subject $i$ is in group $j$}
\\
0 & \text{otherwise}
\end{cases}$

(Number of rows = number of observations, number of columns = number of groups.)

Matrix $ \boldsymbol\beta$ has row $ j$ equal to $ {\boldsymbol\mu}_j^T$. Number of rows = number of groups, number of columns = number of response variables.

Hypotheses of interest:

1) Parallel profiles:

$\displaystyle \mu_{ij} -\mu_{i^\prime j} = \mu_{ij^\prime} - \mu_{i^\prime j^\prime}
$

for all $ i,i^\prime,j,j^\prime$, where $ \mu_{ij}$ is the $ j$th component of $ {\boldsymbol\mu}_i$. Consider the case of 4 groups and 3 variables, so $ \boldsymbol\beta$ is $ 4\times 3$:

$\displaystyle {\bf L}= \left[\begin{array}{rrrr}
1 & -1 & 0 & 0 \\  1 & 0 & -1 & 0 \\  1 & 0 & 0 &-1
\end{array}\right]
$

and

$\displaystyle {\bf M}= \left[\begin{array}{rr} 1 & 1 \\  -1 & 0 \\
0 & -1 \end{array}\right]
$

2) Coincident profiles: $ {\boldsymbol\mu}_i = {\boldsymbol\mu}_{i^\prime}$ for all $ i$ and $ i^\prime$. Now $ {\bf M}$ is the identity and $ {\bf L}$ is as above.

3) Constant profiles: $ {\bf L}$ is the identity, $ {\bf M}$ as above.
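
A small SAS/IML sketch (hypothetical shape and offsets) confirms the parallel-profiles algebra: shifted copies of a common profile satisfy $ {\bf L}\boldsymbol\beta{\bf M}= {\bf0}$.

proc iml;
    L = {1 -1 0 0, 1 0 -1 0, 1 0 0 -1};
    M = {1 1, -1 0, 0 -1};
    s = {1 2 4};                                    /* a common profile shape */
    beta = (s) // (s + 0.5) // (s + 1) // (s - 2);  /* parallel profiles */
    print (L*beta*M);                               /* the zero matrix */
quit;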

Testing $ {\bf L}\boldsymbol\beta{\bf M}= {\bf0}$.

1) Multiply the model equation by $ {\bf M}$ to get

$\displaystyle {\bf Y}^* = {\bf X}\boldsymbol\beta^* + \boldsymbol\epsilon^*
$

with

$\displaystyle {\bf Y}^* = {\bf Y}{\bf M}
$

$\displaystyle \boldsymbol\beta^* = \boldsymbol\beta {\bf M}
$

and

$\displaystyle \boldsymbol\epsilon^* = \boldsymbol\epsilon{\bf M}.
$

Reduces problem to testing $ {\bf L}\boldsymbol\beta = {\bf0}$.

2) So now: drop the $ *$s and discuss testing $ {\bf L}\boldsymbol\beta = {\bf0}$. Reparametrize again as in the univariate case: find $ {\bf L}^*$, an $ (r-q)\times r$ matrix, so that

$\displaystyle {\bf P}=\left[\begin{array}{l} {\bf L}^* \\  {\bf L}\end{array}\right]
$

is invertible. Write

$\displaystyle \boldsymbol\beta^* = {\bf P}\boldsymbol\beta
$

and

$\displaystyle {\bf X}^* = {\bf X}{\bf P}^{-1}
$

Model is

$\displaystyle {\bf Y}= {\bf X}^*\boldsymbol\beta^* + \boldsymbol\epsilon
$

Partition $ \boldsymbol\beta^*$ into $ \boldsymbol\beta^*_1$, of dimension $ (r-q) \times p$, and $ \boldsymbol\beta^*_2$, of dimension $ q \times p$.

Then

$\displaystyle {\bf L}\boldsymbol\beta = {\bf0} \equiv \boldsymbol\beta^*_2={\bf0}
$

3) Again drop the $ *$s and discuss testing $ \boldsymbol\beta_2={\bf0}$ in the model

$\displaystyle {\bf Y}= \left[{\bf X}_1 \vert {\bf X}_2\right]\left[\begin{array}{c} \boldsymbol\beta_1 \\  \boldsymbol\beta_2\end{array}\right]
+\boldsymbol\epsilon
$

Union-Intersection Principle: $ \boldsymbol\beta_2={\bf0}$ if and only if $ \boldsymbol\beta_2{\bf a} = {\bf0}$ for all $ p \times 1 $ vectors $ {\bf a}$. So multiplying the model equation by $ {\bf a}$ gives

$\displaystyle {\bf Y}{\bf a} = {\bf X}\boldsymbol\beta{\bf a} + \boldsymbol\epsilon{\bf a}
$

or

$\displaystyle {\bf Y}^* = {\bf X}\boldsymbol\beta^* + \boldsymbol\epsilon^*
$

which is now a univariate regression model.

Get corresponding $ F$ statistic:

$\displaystyle F = \frac{{\bf a}^T{\bf Y}^T {\bf Q}{\bf Y}{\bf a} / q}{
{\bf a}^T{\bf Y}^T ({\bf I}- {\bf C}){\bf Y}{\bf a} /(n-r)}
$

where

$\displaystyle {\bf C}= {\bf X}({\bf X}^T{\bf X})^{-1}{\bf X}^T
$

and

$\displaystyle {\bf Q}= {\bf C}- {\bf X}_1({\bf X}_1^T{\bf X}_1)^{-1} {\bf X}_1^T
$
$

To test the intersection of the null hypotheses, use a critical region which is the union of the corresponding rejection regions; i.e., maximize $ F$ over $ \bf a$.

Maximize

$\displaystyle \frac{{\bf a}^T {\bf A}{\bf a}}{{\bf a}^T {\bf B}{\bf a}}
$

over $ \bf a \neq 0$:

For $ {\bf B}$ nonsingular the maximum is the largest eigenvalue of

$\displaystyle {\bf B}^{-1}{\bf A}
$

as earlier in the course. This gives Roy's test.
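
In SAS/IML, Roy's greatest root is a few lines (hypothetical one-way MANOVA data; $ {\bf H}={\bf Y}^T{\bf Q}{\bf Y}$ and $ {\bf E}={\bf Y}^T({\bf I}-{\bf C}){\bf Y}$ are the hypothesis and error SSCP matrices, matching the ``E Inverse * H'' wording in the SAS output below):

proc iml;
    /* hypothetical data: 3 groups of 3 cases, p = 2 responses */
    group = {1,1,1, 2,2,2, 3,3,3};
    Y = {5.1 2.0, 4.8 2.2, 5.3 1.9,
         6.0 3.1, 6.2 2.9, 5.9 3.3,
         7.0 2.5, 6.8 2.7, 7.2 2.4};
    X  = design(group);                   /* full model: group indicators */
    X1 = j(nrow(Y), 1, 1);                /* restricted model: common mean */
    C  = X *solve(X`*X, X`);              /* projection onto columns of X  */
    C1 = X1*solve(X1`*X1, X1`);           /* projection onto columns of X1 */
    H = Y`*(C - C1)*Y;                    /* hypothesis SSCP: Q = C - C1 */
    E = Y`*(i(nrow(Y)) - C)*Y;            /* error SSCP */
    U = root(E);                          /* Cholesky factor: E = U`*U */
    roy = max(eigval(inv(U`)*H*inv(U)));  /* largest eigenvalue of E^{-1}H */
    print roy;
quit;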

Example: MANCOVA (multivariate analysis of covariance): 45 patients, 4 groups (defined by body type and obesity status).

Record 4 biochemical measurements on the urine of each subject: three response variables and one covariate, the specific gravity of the urine.

Compare groups adjusting for specific gravity.


data long;
    infile 'dq5.1';
    input group creatin chloride choline specgrav;
run;
proc print;
run;
proc glm;
    class group;
    model creatin chloride choline = group|specgrav;
    manova h= group*specgrav / printh printe;
run;
proc glm;
    class group;
    model creatin chloride choline = group specgrav;
    manova h= group / printh printe;
    means group / bon scheffe tukey;
run;
proc glm;
    class group;
    model creatin chloride choline = group;
    manova h= group / printh printe;
    means group / bon scheffe tukey;
run;
Begin by examining the group by specific gravity interaction (in the first PROC GLM call, group|specgrav expands to group specgrav group*specgrav):

Characteristic Roots and Vectors of: 
      E Inverse * H, where
H = Type III SS&CP Matrix for SPECGRAV*GROUP   
E = Error SS&CP Matrix
Manova Test Criteria and F Approximations for
the Hypothesis of no Overall SPECGRAV*GROUP Effect
S=3    M=-0.5    N=16.5
Statistic        Value     F NumDF DenDF Pr > F
Wilks' Lambda   0.6153 2.0939  9   85.33 0.0387
Pillai's Trace  0.3899 1.8426  9    111  0.0683
Hotg-Lly Trace  0.6167 2.3070  9    101  0.0211
Roy's Greatest  0.6026 7.4320  3     37  0.0005
NOTE: F Statistic for Roy's Greatest Root 
     is an upper bound.
The group by specific gravity interaction appears to be marginally significant, which makes comparisons adjusted for specific gravity difficult to interpret.

In what follows we ignore the possibility of any such interaction.

Look at model with no interaction term:


MANOVA Test Criteria and F Approximations for the
Hypothesis of No Overall group Effect
H = Type III SSCP Matrix for group
E = Error SSCP Matrix
S=3    M=-0.5    N=18
Statistic       Value  F  NumDF Den DF Pr > F
Wilks' Lambda  0.5972 2.43  9   92.633 0.0159
Pillai's Trace 0.4398 2.29  9      120 0.0208
Hotg-Lly Trace 0.6127 2.54  9   56.616 0.0159
Roy's Greatest 0.4877 6.50  3       40 0.0011
NOTE: F Statistic for Roy's Greatest Root is
an upper bound.
Conclusion: pretty clear evidence of differences between groups; simultaneous confidence intervals should be examined. In the same model: do we need the covariate?

MANOVA Test Criteria and Exact F Statistics for the
Hypothesis of No Overall specgrav Effect
H = Type III SSCP Matrix for specgrav
E = Error SSCP Matrix
S=1    M=0.5    N=18
Statistic      Value   F  NumDF DenDF Pr > F
Wilks' Lambda  0.420 17.51  3    38   <.0001
Pillai's Trace 0.580 17.51  3    38   <.0001
Hotg-Lly Trace 1.382 17.51  3    38   <.0001
Roy's Greatest 1.382 17.51  3    38   <.0001
So the covariate is significant. But what if we had not measured it?

MANOVA Test Criteria and F Approximations for the
Hypothesis of No Overall group Effect
H = Type III SSCP Matrix for group
E = Error SSCP Matrix
S=3    M=-0.5    N=18.5
Statistic     Value   F  NumDF DenDF  Pr > F
Wilks' Lambda  0.608 2.40  9   95.066 0.0171
Pillai's Trace 0.431 2.30  9  123     0.0203
Hotg-Lly Trace 0.580 2.47  9   58.185 0.0185
Roy's Greatest 0.434 5.93  3   41     0.0019
NOTE: F Statistic for Roy's Greatest Root is
an upper bound.
Same conclusion: differences in group means. Why is specgrav needed in the model yet irrelevant to the conclusions? Because there are no real differences between the groups in terms of specgrav:

data long;
    infile 'dq5.1';
    input group creatin chloride choline specgrav;
run;
proc glm;
    class group;
    model specgrav = group ;
run;
which gives

The GLM Procedure
Dependent Variable: specgrav

Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              3       120.30       40.100241    1.19   0.3269
Error             41      1386.28       33.811636
Corrected Total   44      1506.58
Adjustment for covariates:

Suppose $ x$ is a covariate and $ Y$ a response. To assess the effect of a treatment on $ Y$ we might have a treatment group and a control group.

Example: $ x$ is blood pressure before and $ Y$ is blood pressure after treatment (real or placebo depending on group).

Both $ x$ and $ Y$ must be regarded as random in this example.

Model: $ (X,Y)$ has a BVN$ ({\boldsymbol\mu},{\boldsymbol\Sigma})$ distribution, with $ {\boldsymbol\mu}$ and $ {\boldsymbol\Sigma}$ depending on whether the subject is in the treatment or the control group.

Control group:

$\displaystyle {\boldsymbol\mu}= (\mu_1,\mu_2)
$

$\displaystyle {\boldsymbol\Sigma}= \left[\begin{array}{cc}\sigma_1^2 & \rho\sigma_1\sigma_2 \\  \rho\sigma_1\sigma_2 & \sigma_2^2
\end{array}\right]
$

Notice the assumptions: the means of measurements 1 and 2 may differ (allowing for a placebo effect); no assumption of equal variability as yet.

Treatment group:

$\displaystyle {\boldsymbol\mu}= (\mu_1,\mu_2^*)
$

$\displaystyle {\boldsymbol\Sigma}= \left[\begin{array}{cc}\sigma_1^2 & \rho^*\sigma_1\sigma_2^* \\  \rho^*\sigma_1\sigma_2^* & (\sigma_2^*)^2
\end{array}\right]
$
$

Notice: the baseline mean and variance are assumed equal across groups (as should hold in a double-blind randomized trial).

If we knew the parameters, how would we summarize the treatment effect?

No easy way; several simplifying assumptions are possible.

Usual assumption: equal covariance matrices. The treatment effect is

$\displaystyle \delta \equiv \mu_2^* -\mu_2
$

If so, how can we estimate $ \delta$? Notice that

$\displaystyle \bar{X}_T - \bar{X}_C
$

has mean 0, so

$\displaystyle \hat\delta_a \equiv \bar{Y}_T-\bar{Y}_C +a(\bar{X}_T - \bar{X}_C)
$

is unbiased for $ \delta$ for any choice of $ a$.
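
Unbiasedness is immediate from the equal baseline means:

$\displaystyle E\left(\hat\delta_a\right) = (\mu_2^*-\mu_2) + a(\mu_1-\mu_1) = \delta.
$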

The variance of $ \hat\delta_a$ is

$\displaystyle \left(\frac{1}{n_T}+\frac{1}{n_C}\right)
\left[ \begin{array}{cc} a & 1 \end{array} \right]
{\boldsymbol\Sigma}
\left[ \begin{array}{c} a \\  1 \end{array} \right]
= \left(\frac{1}{n_T}+\frac{1}{n_C}\right)
\left( a^2\sigma_1^2 + 2a\rho\sigma_1\sigma_2 + \sigma_2^2 \right)
$

Minimized when $ a=-\rho\sigma_2/\sigma_1$; the variance becomes

$\displaystyle \left(\frac{1}{n_T}+\frac{1}{n_C}\right)\sigma_2^2(1-\rho^2)
$

This is smaller than the variance $ \left(1/n_T+1/n_C\right)\sigma_2^2$ of $ \bar{Y}_T-\bar{Y}_C$, which ignores the baseline covariate; the improvement is important if $ \rho^2$ is not too small.
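
The minimizing value of $ a$ comes from differentiating the quadratic above:

$\displaystyle \frac{d}{da}\left( a^2\sigma_1^2 + 2a\rho\sigma_1\sigma_2 + \sigma_2^2 \right)
= 2a\sigma_1^2 + 2\rho\sigma_1\sigma_2 = 0
\quad\Longrightarrow\quad a = -\frac{\rho\sigma_2}{\sigma_1},
$

and substituting back gives $ \rho^2\sigma_2^2 - 2\rho^2\sigma_2^2 + \sigma_2^2 = \sigma_2^2(1-\rho^2)$.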

Other treatment effect models?

Effect of treatment: the ``after'' measurement is

$\displaystyle Y+T
$

where $ T$ is a $ N(\delta,\sigma_T^2)$ effect.

Could assume $ T$ and $ (X,Y)$ are independent. (Not too realistic.)

Could ask: what would the joint distribution of $ X,Y,T$ have to be to reproduce the previous model?

Potential problem: the joint distribution of $ X,Y,T$ is not estimable from this experimental design.



Richard Lockhart
2002-10-29