Course outline:
Basic structure of a typical multivariate data set:
Cases by variables: the data form a matrix in which each row is a case and each column is a variable.
Example: Fisher's iris data, the first rows of a 150 by 5 matrix:
Case # | Variety | Sepal Length | Sepal Width | Petal Length | Petal Width |
1 | Setosa | 5.1 | 3.5 | 1.4 | 0.2 |
2 | Setosa | 4.9 | 3.0 | 1.4 | 0.2 |
... | ... | ... | ... | ... | ... |
51 | Versicolor | 7.0 | 3.2 | 4.7 | 1.4 |
... | ... | ... | ... | ... | ... |
Vector valued random variable: a function X from the sample space to R^p such that, writing X = (X_1, ..., X_p)^T, each component X_i is a real random variable.
Cumulative Distribution Function (CDF) of X: the function F_X on R^p defined by
F_X(x_1, ..., x_p) = P(X_1 <= x_1, ..., X_p <= x_p).
Defn: The distribution of a rv X is absolutely continuous if there is a function f such that
P(X in A) = integral over A of f(x) dx    (1)
for suitable sets A.
Defn: Any f satisfying (1) is a density of X.
For most x, F is differentiable at x and
d^p F(x) / (dx_1 ... dx_p) = f(x).
Basic tactic: specify the density of X.
Tools: marginal densities, conditional densities, independence, transformation.
Marginalization: the simplest multivariate problem. If (X, Y) has joint density f_{X,Y} then
f_X(x) = integral of f_{X,Y}(x, y) dy
is the marginal density of X, and f_{X,Y} the joint density of (X, Y), but they are both just densities. ``Marginal'' serves only to distinguish f_X from the joint density of (X, Y).
Def'n: Events A and B are independent if P(A and B) = P(A)P(B).
Def'n: Rvs X and Y are independent if P(X in A, Y in B) = P(X in A)P(Y in B) for all sets A and B.
Def'n: Rvs X_1, ..., X_p are independent if
P(X_1 in A_1, ..., X_p in A_p) = P(X_1 in A_1) ... P(X_p in A_p).
Theorem: X and Y with joint density f_{X,Y} are independent if and only if f_{X,Y}(x, y) = f_X(x) f_Y(y).
Theorem: If X_1, ..., X_p are independent and Y_i = g_i(X_i) then Y_1, ..., Y_p are independent. Moreover, (X_1, ..., X_q) and (X_{q+1}, ..., X_p) are independent.
Conditional density of Y given X = x:
f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x).
Suppose Y = g(X) with X having density f_X. Assume g is a one to one (``injective'') map, i.e., g(x_1) = g(x_2) if and only if x_1 = x_2. Find f_Y:
Step 1: Solve for x in terms of y: x = g^{-1}(y).
Step 2: Use the basic equation:
f_Y(y) = f_X(g^{-1}(y)) |det(d g^{-1}(y) / dy)|.
An equivalent formula inverts the matrix of partial derivatives of g:
f_Y(y) = f_X(x) / |det(d g(x) / dx)|, evaluated at x = g^{-1}(y).
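A quick numerical illustration of Step 2 (the choice g(x) = e^x and the function names are mine, not from the notes): if X ~ N(0,1) and Y = e^X then g^{-1}(y) = log y with derivative 1/y, so f_Y(y) = f_X(log y)/y. The transformed density should still integrate to 1:

```python
import numpy as np

def f_X(x):
    # standard normal density
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def f_Y(y):
    # change of variables for Y = exp(X): x = log(y), |dx/dy| = 1/y
    return f_X(np.log(y)) / y

# trapezoid rule over a grid wide enough to capture almost all the mass
y = np.linspace(1e-6, 60.0, 400_000)
fy = f_Y(y)
total = np.sum((fy[1:] + fy[:-1]) / 2 * np.diff(y))
```

The result is the lognormal density, and the numerical integral comes out within rounding of 1.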
Example: the method applied to a bivariate normal density: solve for x in terms of y, substitute into the basic equation, and simplify. [The displayed equations of this worked example were lost in conversion.]
Next: the marginal densities of the components? Factor the joint density as a product of a piece involving only one variable and a remaining piece, then integrate out the other variable. [Displayed factorization lost in conversion.]
Remark: it is easy to check that each factor integrates to 1. Thus: we have proved the original bivariate normal density integrates to 1. Substituting the factorization back gives the marginal densities, which are again normal. [Displayed equations lost in conversion.]
Notation:
Defn: The transpose, A^T, of an m x n matrix A is the n x m matrix whose entries are given by (A^T)_{ij} = A_{ji}.
Defn: The rank of a matrix A, rank(A), is the number of linearly independent columns of A.
We have rank(A) = rank(A^T): the number of linearly independent columns equals the number of linearly independent rows.
If A is m x n then rank(A) <= min(m, n).
For now: all matrices are square, n x n.
If there is a matrix B such that AB = BA = I then we call B the inverse of A. If B exists it is unique and we write B = A^{-1}. The matrix A has an inverse if and only if rank(A) = n.
Inverses have the following properties: (A^{-1})^{-1} = A, (AB)^{-1} = B^{-1}A^{-1}, (A^T)^{-1} = (A^{-1})^T.
Again A is n x n. The determinant is a function det on the set of n x n matrices such that: det is linear in each column separately; det changes sign when two columns are interchanged; and det(I) = 1.
Here are some properties of the determinant: det(AB) = det(A)det(B); det(A^T) = det(A); det(A^{-1}) = 1/det(A); and A is invertible if and only if det(A) != 0.
Defn: The inner product or dot product of x and y in R^n is x^T y = sum_i x_i y_i.
Defn: x and y are orthogonal if x^T y = 0.
Defn: The norm (or length) of x is ||x|| = (x^T x)^{1/2}.
A matrix Q is orthogonal if each column of Q has length 1 and is orthogonal to each other column of Q, i.e., Q^T Q = I.
Suppose A is an n x n matrix. The function x -> Ax is linear.
If A is n x n and there are lambda and x != 0 such that Ax = lambda x, then lambda is an eigenvalue of A with eigenvector x; in that case A - lambda I is singular. Therefore
det(A - lambda I) = 0.
Conversely: if A - lambda I is singular then there is an x != 0 such that (A - lambda I)x = 0.
Fact:
det(A - lambda I) is a polynomial in lambda of degree n.
Each root is an eigenvalue. In general the roots could be multiple roots or complex valued.
Matrix A is diagonalized by a non-singular matrix P if P^{-1}AP is diagonal. If so then AP = PD, so each column of P is an eigenvector of A, with the i-th column having eigenvalue D_{ii}. Thus to be diagonalizable A must have n linearly independent eigenvectors.
If A is symmetric then all its eigenvalues are real, eigenvectors belonging to distinct eigenvalues are orthogonal, and A = PDP^T for some orthogonal P and real diagonal D (the spectral decomposition).
Defn: A symmetric matrix A is non-negative definite if x^T A x >= 0 for all x. It is positive definite if in addition x^T A x = 0 implies x = 0.
A is non-negative definite iff all its eigenvalues are non-negative.
A is positive definite iff all its eigenvalues are positive.
A non-negative definite matrix has a symmetric non-negative definite square root: if A = PDP^T then A^{1/2} = PD^{1/2}P^T, where D^{1/2} takes the square root of each diagonal entry.
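A numpy sketch of this square-root construction (the matrix A here is an arbitrary example I chose, not one from the notes):

```python
import numpy as np

# an example symmetric positive definite matrix
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# spectral decomposition A = P diag(d) P^T; eigh is the routine for symmetric matrices
d, P = np.linalg.eigh(A)

# take square roots of the eigenvalues to build the symmetric square root
root = P @ np.diag(np.sqrt(d)) @ P.T
```

The resulting matrix is symmetric, non-negative definite, and squares back to A.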
Suppose V is a vector subspace of R^n, with basis x_1, ..., x_p. Given any y in R^n there is a unique yhat in V which is closest to y; yhat minimizes
||y - v||^2
over v in V. Write a candidate point v as v = Xa where X is the n x p matrix with columns x_1, ..., x_p. Note that
||y - Xa||^2 = y^T y - 2 a^T X^T y + a^T X^T X a.
[Some displayed algebra lost in conversion.] Choose a to minimize: completing the square shows it suffices to make
X^T (y - Xa) = 0.
Since the columns of X are a basis, X^T X is invertible, so we can take
a = (X^T X)^{-1} X^T y.
Summary: the closest point in V is
yhat = X (X^T X)^{-1} X^T y = H y, where H = X (X^T X)^{-1} X^T.
Notice that the matrix H is idempotent: HH = H.
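The closest-point formula and the idempotence of H are easy to verify numerically; a small sketch (the matrix X and vector y are made-up examples):

```python
import numpy as np

# columns of X are a basis for the subspace V
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# hat matrix H = X (X^T X)^{-1} X^T projects onto V
H = X @ np.linalg.inv(X.T @ X) @ X.T
yhat = H @ y                  # closest point in V

resid = y - yhat              # orthogonal to every basis vector
```

H applied twice is H (projecting a second time changes nothing), and the residual is orthogonal to each column of X.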
Suppose A_{11} is a p x p matrix, A_{12} is p x q, A_{21} is q x p and A_{22} is q x q. Make a (p+q) x (p+q) matrix A by putting the blocks in a 2 by 2 pattern:
A = [ A_{11}  A_{12} ; A_{21}  A_{22} ].
We can work with partitioned matrices just like ordinary matrices, always making sure that in products we never change the order of multiplication of things. [Displayed block-product examples lost in conversion.]
Note the partitioning of A and B must match.
Addition: the dimensions of A_{ij} and B_{ij} must be the same.
Multiplication: A_{ik} must have as many columns as B_{kj} has rows, etc. In general we need A_{ik} B_{kj} to make sense for each i, j, k.
This works with more than a 2 by 2 partitioning.
Defn: A block diagonal matrix is a partitioned matrix A for which A_{ij} = 0 if i != j. If A is block diagonal then det(A) is the product of the det(A_{ii}) and A^{-1} is block diagonal with blocks A_{ii}^{-1}.
Partitioned inverses. Suppose A_{11}, A_{22} are symmetric positive definite. Look for an inverse of
A = [ A_{11}  A_{12} ; A_{21}  A_{22} ]
of the same partitioned form, say B = [ B_{11}  B_{12} ; B_{21}  B_{22} ]. Multiplying out AB = I gives four block equations. Solve to get
B_{11} = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}
B_{12} = -B_{11} A_{12} A_{22}^{-1}
B_{21} = B_{12}^T
B_{22} = A_{22}^{-1} + A_{22}^{-1} A_{21} B_{11} A_{12} A_{22}^{-1}.
The matrix A_{11} - A_{12} A_{22}^{-1} A_{21} appearing here is called the Schur complement of A_{22}.
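These block formulas can be checked numerically; a sketch using an assumed positive definite example matrix, with B11 the inverse of the Schur complement:

```python
import numpy as np

A11 = np.array([[2.0, 0.5], [0.5, 1.0]])
A22 = np.array([[3.0, 1.0], [1.0, 2.0]])
A12 = np.array([[0.3, 0.1], [0.2, 0.4]])
A21 = A12.T

A = np.block([[A11, A12], [A21, A22]])

A22inv = np.linalg.inv(A22)

# Schur-complement form of the partitioned inverse
B11 = np.linalg.inv(A11 - A12 @ A22inv @ A21)
B12 = -B11 @ A12 @ A22inv
B21 = B12.T
B22 = A22inv + A22inv @ A21 @ B11 @ A12 @ A22inv

B = np.block([[B11, B12], [B21, B22]])
```

Multiplying A by the assembled B recovers the identity matrix.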
Defn: Z ~ N(0, 1) iff Z has density f(z) = exp(-z^2/2) / sqrt(2 pi).
Defn: U ~ chi-squared_n if and only if U = Z_1^2 + ... + Z_n^2 with the Z_i independent and each Z_i ~ N(0, 1).
In this case (as derived later) the density of U is
f_U(u) = u^{n/2 - 1} e^{-u/2} / (2^{n/2} Gamma(n/2)), for u > 0.
Defn: X in R^p has a multivariate normal distribution if it has the same distribution as mu + AZ for some mu in R^p, some p x q matrix of constants A, and Z a vector of q independent N(0, 1) variables.
Write Sigma = AA^T.
A singular: X does not have a density.
A invertible: derive the multivariate normal density by change of variables:
f_X(x) = (2 pi)^{-p/2} (det Sigma)^{-1/2} exp{ -(x - mu)^T Sigma^{-1} (x - mu) / 2 }.
For which mu, Sigma is this a density? Any mu, but if Sigma = AA^T with A invertible then Sigma is symmetric and positive definite.
Conversely, if Sigma is a positive definite symmetric matrix then there is a square invertible matrix A such that Sigma = AA^T, so that there is an MVN(mu, Sigma) distribution. (A can be found via the Cholesky decomposition, e.g.)
When Sigma is singular X will not have a density: there is an a != 0 such that Var(a^T X) = a^T Sigma a = 0; X is confined to a hyperplane.
Still true: the distribution of X depends only on mu and Sigma = AA^T: if AA^T = BB^T then mu + AZ and mu + BZ have the same distribution.
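A sketch of the Cholesky construction of MVN(mu, Sigma) samples (the values of mu and Sigma are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Cholesky factor: Sigma = A A^T with A lower triangular
A = np.linalg.cholesky(Sigma)

# X = mu + A Z with Z standard normal has the MVN(mu, Sigma) law
Z = rng.standard_normal((2, 100_000))
X = mu[:, None] + A @ Z

sample_cov = np.cov(X)
```

With this many draws the sample mean and sample covariance reproduce mu and Sigma to within simulation error.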
Defn: If X has density f then E(g(X)) = integral of g(x) f(x) dx.
FACT: if Y = g(X) for a smooth g (mapping R^p to R^q) then E(Y), computed from the density of Y, equals E(g(X)), computed from the density of X. [Displayed equations lost in conversion.]
Linearity: E(aX + bY) = aE(X) + bE(Y) for real a and b.
Defn: The r-th moment (about the origin) of a real rv X is mu_r' = E(X^r) (provided it exists). We generally use mu for mu_1' = E(X).
Defn: The r-th central moment is mu_r = E[(X - mu)^r].
Defn: For an R^p valued random vector X, E(X) is the vector with entries E(X_i).
Fact: the same idea is used for random matrices: E(M) has entries E(M_{ij}).
Defn: The (p x p) variance covariance matrix of X is
Var(X) = E[(X - mu)(X - mu)^T].
Example moments: If Z ~ N(0, 1) then E(Z) = 0 and E(Z^2) = 1; the odd moments vanish by symmetry and the even moments follow by integration by parts. [Displayed integrals lost in conversion.]
If now X = mu + sigma Z, that is, X ~ N(mu, sigma^2), then E(X) = mu and Var(X) = sigma^2.
Similarly for X = mu + AZ, with Z a vector of iid N(0, 1) variables, we have E(X) = mu and
Var(X) = E[(X - mu)(X - mu)^T] = A E(ZZ^T) A^T = AA^T.
Theorem: If X_1, ..., X_p are independent and each X_i is integrable then the product X_1 ... X_p is integrable and
E(X_1 ... X_p) = E(X_1) ... E(X_p).
Defn: The moment generating function of a real valued X is M_X(t) = E(e^{tX}).
Defn: The moment generating function of X in R^p is M_X(u) = E(e^{u^T X}).
Example: If Z ~ N(0, 1) then M_Z(t) = e^{t^2/2}. [Displayed derivation lost in conversion.]
Theorem: If M(t) = E(e^{tX}) is finite for all t in a neighbourhood of 0 then M is analytic there and the moments of X are the derivatives of M at 0.
Note: C-infinity means M has continuous derivatives of all orders. Analytic means M has a convergent power series expansion in a neighbourhood of each point.
The proof, and many other facts about mgfs, rely on techniques of complex variables.
Theorem: Suppose X and Y are R^p valued random vectors such that M_X(u) = M_Y(u) for all u in an open neighbourhood of 0. Then X and Y have the same distribution. The proof relies on techniques of complex variables.
If X_1, ..., X_p are independent and Y = X_1 + ... + X_p then the mgf of Y is the product of the mgfs of the individual X_i:
M_Y(t) = M_{X_1}(t) ... M_{X_p}(t).
Example: If Z_1, ..., Z_p are independent N(0, 1) then the mgf of any linear combination follows by multiplying the individual mgfs. [Displayed computation lost in conversion.]
Conclusion: If X ~ MVN_p(mu, Sigma) then
M_X(u) = exp(u^T mu + u^T Sigma u / 2).
Example: If X ~ N(mu, sigma^2) then M_X(t) = exp(mu t + sigma^2 t^2 / 2), the p = 1 case of the formula above.
Theorem: Suppose X = mu + AZ and Y = nu + BW where Z ~ MVN(0, I_q) and W ~ MVN(0, I_r). Then X and Y have the same distribution if and only if the following two conditions hold: (1) mu = nu and (2) AA^T = BB^T.
Alternatively: if X, Y are each MVN then E(X) = E(Y) and Var(X) = Var(Y) imply that X and Y have the same distribution.
Proof: If 1 and 2 hold the mgf of X is
M_X(u) = exp(u^T mu + u^T AA^T u / 2) = exp(u^T nu + u^T BB^T u / 2) = M_Y(u).
Thus the mgf is determined by mu and Sigma = AA^T, and equal mgfs give equal distributions.
Theorem: If X ~ MVN_p(mu, Sigma) then there is a p x p matrix A such that X has the same distribution as mu + AZ for Z ~ MVN_p(0, I).
We may assume that A is symmetric and non-negative definite, or that A is upper triangular, or that A is lower triangular.
Proof: Pick any A such that AA^T = Sigma, such as A = PD^{1/2}P^T from the spectral decomposition. Then mu + AZ has the same mean and variance as X, hence the same distribution.
From the symmetric square root A one can produce an upper triangular square root by the Gram Schmidt process: if A has rows a_1^T, ..., a_p^T then let v_p be a_p / ||a_p||. Choose v_{p-1} proportional to a_{p-1} - (a_{p-1}^T v_p) v_p, normalized so that v_{p-1} has unit length. Continue in this way; the construction automatically gives the required orthogonality relations. If V has columns v_1, ..., v_p then V is orthogonal and AV is an upper triangular square root of Sigma, since (AV)(AV)^T = A V V^T A^T = AA^T = Sigma.
Defn: The covariance between X and Y is
Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)^T].
Properties: Cov(X, X) = Var(X); Cov(Y, X) = Cov(X, Y)^T; and for constant matrices A, B and vectors a, b,
Cov(AX + a, BY + b) = A Cov(X, Y) B^T.
Properties of the MVN(mu, Sigma) distribution:
1: All margins are multivariate normal: if X is partitioned as X = (X_1, X_2), with mu and Sigma partitioned to match, then X_1 ~ MVN(mu_1, Sigma_{11}).
2: MX + nu ~ MVN(M mu + nu, M Sigma M^T): an affine transformation of MVN is normal.
3: If Sigma_{12} = Cov(X_1, X_2) = 0 then X_1 and X_2 are independent.
4: All conditionals are normal: the conditional distribution of X_1 given X_2 = x_2 is
MVN(mu_1 + Sigma_{12} Sigma_{22}^{-1} (x_2 - mu_2), Sigma_{11} - Sigma_{12} Sigma_{22}^{-1} Sigma_{21}).
Proof of (1): If X = mu + AZ then X_1 = mu_1 + A_1 Z where A_1 is the corresponding block of rows of A. So X_1 is an affine function of standard normals, hence MVN. Compute mean and variance to check the rest.
Proof of (2): If X = mu + AZ then MX + nu = (M mu + nu) + MAZ, again of the required form.
Proof of (3): If Sigma_{12} = 0 the joint mgf factors into the product of the mgfs of X_1 and X_2.
Proof of (4): first case: assume Sigma_{22} has an inverse. Define
W = X_1 - Sigma_{12} Sigma_{22}^{-1} X_2.
Then Cov(W, X_2) = 0, so W and X_2 are independent. Now the joint density of W and X_2 factors, which yields the conditional law above.
Specialization to the bivariate case: write
Sigma = [ sigma_1^2  rho sigma_1 sigma_2 ; rho sigma_1 sigma_2  sigma_2^2 ].
Then the conditional distribution of X_1 given X_2 = x_2 simplifies to a normal with mean mu_1 + rho (sigma_1/sigma_2)(x_2 - mu_2) and variance sigma_1^2 (1 - rho^2).
More generally: for any scalar Y and vector X we can ask which linear combination of X is most correlated with Y. [Displayed algebra lost in conversion.]
Defn: The multiple correlation between Y and X is the maximum correlation between Y and a linear combination a^T X.
Thus: maximize Corr^2(Y, a^T X) over a. Note the maximizing a is proportional to Sigma_{XX}^{-1} Sigma_{XY}.
Summary: the maximum squared correlation is
rho^2 = Sigma_{YX} Sigma_{XX}^{-1} Sigma_{XY} / Var(Y).
Notice: since rho^2 is a squared correlation between two scalars (Y and a^T X) we have 0 <= rho^2 <= 1.
Correlation matrices, partial correlations:
The correlation between two scalars X and Y is rho = Cov(X, Y) / (SD(X) SD(Y)).
If X has variance Sigma then the correlation matrix of X is the matrix R with entries
R_{ij} = Sigma_{ij} / sqrt(Sigma_{ii} Sigma_{jj}).
If (X_1, X_2) are MVN with the usual partitioned variance covariance matrix then the conditional variance of X_1 given X_2 is
Sigma_{11.2} = Sigma_{11} - Sigma_{12} Sigma_{22}^{-1} Sigma_{21}.
From this define the partial correlation matrix by rescaling Sigma_{11.2} to have unit diagonal.
Note: these are used even when X_1, X_2 are NOT MVN.
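The same Sigma_{11.2} calculation appears in the worked S-Plus session later; here is an equivalent numpy sketch with made-up numbers (two variables of interest, one conditioning variable):

```python
import numpy as np

# assumed 3x3 covariance: rows/cols 0-1 are the variables of interest,
# row/col 2 is the conditioning variable
Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 3.0, 1.5],
                  [2.0, 1.5, 2.0]])

S11 = Sigma[:2, :2]
S12 = Sigma[:2, 2:]
S21 = Sigma[2:, :2]
S22 = Sigma[2:, 2:]

# conditional (residual) covariance of the first block given the second
S11_2 = S11 - S12 @ np.linalg.inv(S22) @ S21

# rescale to unit diagonal to get partial correlations
d = np.sqrt(np.diag(S11_2))
R11_2 = S11_2 / np.outer(d, d)
```

With these numbers the marginal correlation between the first two variables is positive, but the partial correlation given the third variable turns out negative, illustrating how conditioning can change the sign.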
Given data X with model {f_theta : theta in Theta}:
Definition: The likelihood function is the map L with domain Theta and values given by L(theta) = f_theta(X).
Key Point: think about how the density depends on theta, not about how it depends on X.
Notice: X, the observed value of the data, has been plugged into the formula for the density.
We use likelihood for most inference problems:
Maximum Likelihood Estimation
To find the MLE, maximize L. Typical function maximization problem:
Set the gradient of L equal to 0.
Check the root is a maximum, not a minimum or saddle point.
Often L is a product of n terms (given n independent observations).
It is much easier to work with the logarithm of L: the log of a product is a sum and the logarithm is monotone increasing.
Definition: The log likelihood function is l(theta) = log L(theta).
Simplest problem: collect n replicate measurements X_1, ..., X_n from a single population.
Model: X_1, ..., X_n are iid MVN_p(mu, Sigma).
Parameters (theta): mu, Sigma.
Parameter space: mu in R^p and Sigma is some positive definite p x p matrix.
The log likelihood is
l(mu, Sigma) = -(np/2) log(2 pi) - (n/2) log det Sigma - (1/2) sum_i (X_i - mu)^T Sigma^{-1} (X_i - mu).
Fact: if the second derivative matrix is negative definite everywhere then the function is concave; there is no more than 1 critical point.
Summary: l is maximized over mu at muhat = Xbar.
More difficult: differentiate with respect to Sigma.
Somewhat simpler: set Q = Sigma^{-1} and differentiate with respect to Q.
The first derivative with respect to Q is the matrix of derivatives with respect to the entries Q_{ij}.
Need: derivatives of two functions: log det Q and the quadratic form x^T Q x.
Fact: the (i, j)-th entry of the derivative of x^T Q x with respect to Q is x_i x_j.
Fact: d log det Q / dQ_{ij} = (Q^{-1})_{ji}; proved by expansion by minors.
Conclusion: set the derivative = 0 and find the only critical point is
Sigmahat = (1/n) sum_i (X_i - Xbar)(X_i - Xbar)^T.
The usual sample covariance matrix is
S = (1/(n-1)) sum_i (X_i - Xbar)(X_i - Xbar)^T.
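The two estimates differ only in the divisor; a quick numpy sketch with simulated data (the data themselves are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))   # n = 20 cases, p = 3 variables
n = X.shape[0]

xbar = X.mean(axis=0)              # MLE of mu
centered = X - xbar

Sigma_hat = centered.T @ centered / n       # MLE of Sigma (divide by n)
S = centered.T @ centered / (n - 1)         # usual sample covariance
```

numpy's `np.cov` uses the n - 1 divisor by default, so it reproduces S rather than the MLE.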
Properties of MLEs:
1) E(muhat) = mu.
2) E(S) = Sigma, so Sigmahat = (n-1)S/n is biased.
Distribution of muhat? Joint distribution of Xbar and S?
Theorem:
Suppose X_1, ..., X_n are independent N(mu, sigma^2) random variables.
Then
1) Xbar ~ N(mu, sigma^2/n);
2) Xbar and s^2 are independent;
3) (n-1)s^2 / sigma^2 ~ chi-squared_{n-1};
4) sqrt(n)(Xbar - mu)/s ~ t_{n-1}.
Proof: Let Z_i = (X_i - mu)/sigma.
Then Z_1, ..., Z_n are independent N(0, 1).
So Z = (Z_1, ..., Z_n)^T is multivariate standard normal.
Note that Xbar = mu + sigma Zbar and s_X^2 = sigma^2 s_Z^2.
Thus the general case follows from the case mu = 0, sigma = 1.
So: reduced to mu = 0 and sigma = 1.
Step 1: Define an orthogonal matrix M whose first row is (1/sqrt(n), ..., 1/sqrt(n)). [The displayed matrix was lost in conversion.]
Put Y = MZ. Since M is orthogonal, Y is again multivariate standard normal.
Thus Y_1 = sqrt(n) Zbar is independent of (Y_2, ..., Y_n).
Since s^2 is a function of (Y_2, ..., Y_n) we see that Zbar and s^2 are independent.
Also, Y_1 = sqrt(n) Zbar ~ N(0, 1).
First 2 parts done.
Consider sum_i (Z_i - Zbar)^2 = |Z|^2 - n Zbar^2 = Y_2^2 + ... + Y_n^2.
Note that this is a sum of n - 1 independent squared standard normals.
Now: the distribution of quadratic forms:
Suppose Z ~ MVN_n(0, I) and A is symmetric.
Put A = P Lambda P^T for Lambda diagonal, P orthogonal.
Then Z^T A Z = (P^T Z)^T Lambda (P^T Z), and W = P^T Z ~ MVN_n(0, I).
So: Z^T A Z has the same distribution as sum_i lambda_i W_i^2.
Special case: if all the lambda_i are either 0 or 1 then Z^T A Z has a chi-squared distribution with df = number of lambda_i equal to 1.
When are the eigenvalues all 1 or 0?
Answer: if and only if A is idempotent (A^2 = A).
1) If A is idempotent and (lambda, x) is an eigenpair then lambda x = Ax = A^2 x = lambda^2 x, so lambda^2 = lambda and lambda is 0 or 1.
2) Conversely if all eigenvalues of A are 0 or 1 then Lambda has 1s and 0s on the diagonal, so Lambda^2 = Lambda and A^2 = P Lambda P^T P Lambda P^T = P Lambda^2 P^T = A.
More generally, if X = AZ with Sigma = AA^T then X^T B X = Z^T (A^T B A) Z, so the relevant eigenvalues are those of A^T B A.
So: Z^T A Z is chi-squared_nu iff A is idempotent and nu = rank(A).
Our case: sum_i (Z_i - Zbar)^2 = Z^T (I - 11^T/n) Z, where 1 is the column vector of n ones.
Check: (I - 11^T/n)^2 = I - 2(11^T)/n + 1(1^T 1)1^T / n^2 = I - 11^T/n, so the matrix is idempotent.
How many degrees of freedom? df = rank = trace.
Defn: The trace of a square matrix is tr(A) = sum_i A_{ii}.
Property: tr(AB) = tr(BA).
So:
tr(I - 11^T/n) = n - tr(1^T 1)/n = n - 1.
Conclusion: the df for sum_i (Z_i - Zbar)^2 is n - 1.
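Both the idempotence and the trace computation are one-liners to confirm numerically:

```python
import numpy as np

n = 7
ones = np.ones((n, n))
A = np.eye(n) - ones / n     # centering matrix: A z has entries z_i - zbar

idempotent = np.allclose(A @ A, A)
df = np.trace(A)             # degrees of freedom = n - 1
```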
Derivation of the chi-squared density:
Suppose Z_1, ..., Z_n are independent N(0, 1). Define the chi-squared_n distribution to be that of U = Z_1^2 + ... + Z_n^2.
Change to polar-type coordinates: write the Z_i in terms of a radius and angles theta_1, ..., theta_{n-1}. [The displayed transformation was lost in conversion.]
FACT: multiplying a column in a matrix by c multiplies the determinant by c.
SO: the Jacobian of the transformation factors into a power of the radius times a function of the angles alone.
Thus the joint density of the radius and angles factors; integrating out the angles leaves an answer of the form
f_U(u) = c u^{n/2 - 1} e^{-u/2}, u > 0.
Evaluate c by making the density integrate to 1:
c = 1 / (2^{n/2} Gamma(n/2)).
Fourth part: consequence of the first 3 parts and the def'n of the t distribution.
Defn: T ~ t_nu if T has the same distribution as Z / sqrt(U/nu), where Z ~ N(0, 1), U ~ chi-squared_nu and Z and U are independent.
Derive the density of T in this definition using the transformation and marginalization tools above:
f_T(t) = Gamma((nu+1)/2) / (sqrt(nu pi) Gamma(nu/2)) (1 + t^2/nu)^{-(nu+1)/2}.
Theorem:
Suppose X_1, ..., X_n are independent MVN_p(mu, Sigma) random variables.
Then
1) Xbar ~ MVN_p(mu, Sigma/n);
2) Xbar and S are independent;
3) (n-1)S has a Wishart distribution (defined below);
4) n(Xbar - mu)^T S^{-1} (Xbar - mu) (Hotelling's T^2) has a distribution free of mu and Sigma.
Proof: Let X_i = mu + A Z_i where the Z_i are independent MVN_p(0, I) and AA^T = Sigma.
So Xbar = mu + A Zbar, and S for the X sample is A times S for the Z sample times A^T. [Displayed identities lost in conversion.]
Consequences. In 1, 2 and 4 we can assume mu = 0 and Sigma = I. In 3 we can take mu = 0.
Step 1: Do general mu, Sigma. Define the stacked vector of all n observations and compute its variance covariance matrix. [Displayed matrix lost in conversion.]
Defn: If A is p x q and B is r x s then the Kronecker product A ⊗ B is the pr x qs matrix with the block pattern [a_{ij} B].
Conclusions so far:
1) Xbar and S are independent.
2) Xbar ~ MVN_p(mu, Sigma/n).
Next: the Wishart law.
Defn: The Wishart_p(n, Sigma) distribution is the distribution of sum_{i=1}^n Z_i Z_i^T, where Z_1, ..., Z_n are iid MVN_p(0, Sigma).
Properties of the Wishart:
1) If W ~ Wishart_p(n, Sigma) then E(W) = n Sigma.
2) If W_1 ~ Wishart_p(n_1, Sigma) and W_2 ~ Wishart_p(n_2, Sigma) are independent then W_1 + W_2 ~ Wishart_p(n_1 + n_2, Sigma).
Proof of part 3 of the theorem: rewrite (n-1)S as a sum of outer products of iid mean-0 normal vectors. This uses further properties of the Wishart distribution:
3: If W ~ Wishart_p(n, Sigma) and M is q x p then M W M^T ~ Wishart_q(n, M Sigma M^T).
4: If W ~ Wishart_p(n, Sigma) and a is a fixed vector then a^T W a / (a^T Sigma a) ~ chi-squared_n.
5: If W ~ Wishart_1(n, sigma^2) then W / sigma^2 ~ chi-squared_n.
6: If W is partitioned into components then the diagonal blocks are again Wishart.
Given data X_1, ..., X_n iid MVN_p(mu, Sigma), test H_0: mu = mu_0.
Example: no realistic ones. This hypothesis is not intrinsically useful. However: other tests can sometimes be reduced to it.
Example: Eleven water samples are each split in half, with one half of each going to each of two labs. Measure biological oxygen demand (BOD) and suspended solids (SS). For sample i let X_{i1} be BOD for lab A, X_{i2} be SS for lab A, X_{i3} be BOD for lab B and X_{i4} be SS for lab B. Question: are the labs measuring the same thing? Is there bias in one or the other?
Notation: X_i is the vector of 4 measurements on sample i.
Data:
Sample | BOD (A) | SS (A) | BOD (B) | SS (B) |
1 | 6 | 27 | 25 | 15 |
2 | 6 | 23 | 28 | 13 |
3 | 18 | 64 | 36 | 22 |
4 | 8 | 44 | 35 | 29 |
5 | 11 | 30 | 15 | 31 |
6 | 34 | 75 | 44 | 64 |
7 | 28 | 26 | 42 | 30 |
8 | 71 | 124 | 54 | 64 |
9 | 43 | 54 | 34 | 56 |
10 | 33 | 30 | 29 | 20 |
11 | 20 | 14 | 39 | 21 |
Model: X_1, ..., X_11 are iid MVN_4(mu, Sigma).
Multivariate problem because: we are not able to assume independence between any two measurements on the same sample.
Potential sub-model: each measurement is
true mmnt + lab bias + mmnt error.
The model for the measurement error vector is multivariate normal with mean 0 and a diagonal covariance matrix. Lab bias is an unknown vector. The true measurement should be the same for both labs, so mu has a structured form; this would give a structured model with a correspondingly structured variance covariance matrix. We skip this model and let Sigma be unrestricted.
Question of interest: do the two labs have the same mean vector?
Reduction: partition X_i as (X_{iA}, X_{iB}) and define W_i = X_{iA} - X_{iB}. Then our model makes W_1, ..., W_11 iid MVN_2(mu_W, Sigma_W). Our hypothesis is H_0: mu_W = 0.
Carrying out our test in SPlus:
Working on CSS unix workstation:
Start SPlus then read in, print out data:
[61]ehlehl% mkdir .Data [62]ehlehl% Splus S-PLUS : Copyright (c) 1988, 1996 MathSoft, Inc. S : Copyright AT&T. Version 3.4 Release 1 for Sun SPARC, SunOS 5.3 : 1996 Working data will be in .Data > # Read in and print out data > eff <- read.table("effluent.dat",header=T) > eff BODLabA SSLabA BODLabB SSLabB 1 6 27 25 15 2 6 23 28 13 3 18 64 36 22 4 8 44 35 29 5 11 30 15 31 6 34 75 44 64 7 28 26 42 30 8 71 124 54 64 9 43 54 34 56 10 33 30 29 20 11 20 14 39 21
Do some graphical preliminary analysis.
Look for non-normality, non-linearity, outliers.
Make plots on screen or saved in file.
> # Make pairwise scatterplots on screen using > # motif graphics device and then in a postscript > # file. > motif() > pairs(eff) > postscript("pairs.ps",horizontal=F, + height=6,width=6) > pairs(eff) > dev.off() Generated postscript file "pairs.ps". motif 2
> cor(eff) BODLabA SSLabA BODLabB SSLabB BODLabA 0.9999999 0.7807413 0.7228161 0.7886035 SSLabA 0.7807413 1.0000000 0.6771183 0.7896656 BODLabB 0.7228161 0.6771183 1.0000001 0.6038079 SSLabB 0.7886035 0.7896656 0.6038079 1.0000001
Notice high correlations.
Mostly caused by variation in true levels from sample to sample.
Get partial correlations.
Adjust for overall BOD and SS content of sample.
> aug <- cbind(eff,(eff[,1]+eff[,3])/2, + (eff[,2]+eff[,4])/2) > aug BODLabA SSLabA BODLabB SSLabB X2 X3 1 6 27 25 15 15.5 21.0 2 6 23 28 13 17.0 18.0 3 18 64 36 22 27.0 43.0 4 8 44 35 29 21.5 36.5 5 11 30 15 31 13.0 30.5 6 34 75 44 64 39.0 69.5 7 28 26 42 30 35.0 28.0 8 71 124 54 64 62.5 94.0 9 43 54 34 56 38.5 55.0 10 33 30 29 20 31.0 25.0 11 20 14 39 21 29.5 17.5 > bigS <- var(aug)
Now compute partial correlations for first four variables given means of BOD and SS:
> S11 <- bigS[1:4,1:4] > S12 <- bigS[1:4,5:6] > S21 <- bigS[5:6,1:4] > S22 <- bigS[5:6,5:6] > S11dot2 <- S11 - S12 %*% solve(S22,S21) > S11dot2 BODLabA SSLabA BODLabB SSLabB BODLabA 24.804665 -7.418491 -24.804665 7.418491 SSLabA -7.418491 59.142084 7.418491 -59.142084 BODLabB -24.804665 7.418491 24.804665 -7.418491 SSLabB 7.418491 -59.142084 -7.418491 59.142084 > S11dot2SD <- diag(sqrt(diag(S11dot2))) > S11dot2SD [,1] [,2] [,3] [,4] [1,] 4.980428 0.000000 0.000000 0.000000 [2,] 0.000000 7.690389 0.000000 0.000000 [3,] 0.000000 0.000000 4.980428 0.000000 [4,] 0.000000 0.000000 0.000000 7.690389 > R11dot2 <- solve(S11dot2SD)%*% + S11dot2%*%solve(S11dot2SD) > R11dot2 [,1] [,2] [,3] [,4] [1,] 1.000000 -0.193687 -1.000000 0.193687 [2,] -0.193687 1.000000 0.193687 -1.000000 [3,] -1.000000 0.193687 1.000000 -0.193687 [4,] 0.193687 -1.000000 -0.193687 1.000000
Notice little residual correlation. Carry out Hotelling's T^2 test:
> w <- eff[,1:2]-eff[3:4] > dimnames(w)<-list(NULL,c("BODdiff","SSdiff")) > w BODdiff SSdiff [1,] -19 12 [2,] -22 10 etc [8,] 17 60 etc > Sw <- var(w) > cor(w) BODdiff SSdiff BODdiff 1.0000001 0.3057682 SSdiff 0.3057682 1.0000000 > mw <- apply(w,2,mean) > mw BODdiff SSdiff -9.363636 13.27273 > Tsq <- 11*mw%*%solve(Sw,mw) > Tsq [,1] [1,] 13.63931 > FfromTsq <- (11-2)*Tsq/(2*(11-1)) > FfromTsq [,1] [1,] 6.13769 > 1-pf(FfromTsq,2,9) [1] 0.02082779
Conclusion: pretty clear evidence of a difference in mean level between the labs. Which measurement causes the difference?
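For readers without S-Plus, the same T^2 computation can be reproduced in numpy (the w values are keyed from the effluent data printed above):

```python
import numpy as np

# lab A minus lab B differences (BODdiff, SSdiff) for the 11 samples
w = np.array([[-19, 12], [-22, 10], [-18, 42], [-27, 15], [-4, -1],
              [-10, 11], [-14, -4], [17, 60], [9, -2], [4, 10], [-19, -7]],
             dtype=float)

n, p = w.shape
mw = w.mean(axis=0)
Sw = np.cov(w, rowvar=False)            # sample covariance (divisor n - 1)

Tsq = n * mw @ np.linalg.solve(Sw, mw)  # Hotelling's T^2
F = (n - p) * Tsq / (p * (n - 1))       # compare with the F_{p, n-p} distribution
```

This reproduces the S-Plus values T^2 = 13.64 and F = 6.14.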
> TBOD <- sqrt(11)*mw[1]/sqrt(Sw[1,1]) > TBOD BODdiff -2.200071 > 2*pt(TBOD,1) BODdiff 0.2715917 > 2*pt(TBOD,10) BODdiff 0.05243474 > TSS <- sqrt(11)*mw[2]/sqrt(Sw[2,2]) > TSS SSdiff 2.15153 > 2*pt(-TSS,10) SSdiff 0.05691733 > postscript("differences.ps", + horizontal=F,height=6,width=6) > plot(w) > abline(h=0) > abline(v=0) > dev.off()
Conclusion? Neither individual t test is quite significant at the 5% level. Not a problem: the T^2 statistic summarizes the joint evidence.
Problem: several tests at level 0.05 on the same data. This is the problem of simultaneous or multiple comparisons.
Confidence interval for a single component of mu: estimate plus or minus a t multiplier times the estimated standard error.
Give coverage intervals for 6 parameters of interest: the 4 entries of mu and the 2 entries of mu_W. [The displayed intervals were lost in conversion; the multiplier used was t_{10, 0.975} = 2.23.]
Problem: each confidence interval has a 5% error rate. Pick out the last interval (on the basis of looking most interesting) and ask about its error rate?
Solution: adjust the multiplier 2.23 upward to get simultaneous coverage.
One approach is based on the inequality: every squared t statistic for a linear combination a^T W is at most T^2.
Proof by Cauchy Schwarz: write the relevant quantity as an inner product of two vectors and bound it by the product of their norms; choosing the vectors suitably gives the bound.
In fact the simultaneous coverage probability using the T^2 multiplier is exactly the nominal level, because for each data set the supremum over a is attained.
Our case: use the T^2-based multiplier for all 6 intervals.
Coverage probability of a single interval using the T^2 multiplier? It can be computed from the F distribution, and is well above 95%.
Probability all 6 intervals would cover using the per-interval multiplier 2.23? Use the Bonferroni inequality: P(union of A_i) <= sum of P(A_i).
Usually we just use this bound.
General Bonferroni strategy: if we want intervals for k parameters, get the interval for each one at level 1 - alpha/k. The simultaneous coverage probability is at least 1 - alpha. Notice that Bonferroni is narrower in our example unless the number of intervals k is large.
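A sketch of the two multipliers (assuming scipy is available; alpha, k and df match the example: 6 intervals, 10 degrees of freedom):

```python
from scipy import stats

alpha, k, df = 0.05, 6, 10

# per-interval multiplier: the 2.23 used above
t_single = stats.t.ppf(1 - alpha / 2, df)

# Bonferroni: run each of the k intervals at level alpha / k
t_bonf = stats.t.ppf(1 - alpha / (2 * k), df)
```

The Bonferroni multiplier is larger, so each interval widens in exchange for simultaneous coverage of at least 1 - alpha.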
Motivations for T^2:
1: The hypothesis mu = mu_0 is true iff all of the univariate hypotheses a^T mu = a^T mu_0 are true.
The natural test for a single a rejects if the corresponding squared t statistic is large; testing every a and rejecting if any rejects leads to the maximized squared t statistic.
Fact: that maximum is T^2 (the union-intersection derivation).
2: The likelihood ratio method. Compute
lambda = sup over H_0 of L(theta) / sup over Theta of L(theta).
In our case, to test mu = mu_0, find the restricted and unrestricted MLEs of Sigma and write the ratio of maximized likelihoods in terms of determinants.
Again conclude: the likelihood ratio test rejects for large T^2, with the critical point chosen to make the level alpha.
3: Compare estimates of Sigma.
In univariate regression, tests comparing a restricted model with a full model have the form
F = [(ESS_restricted - ESS_full)/df_1] / [ESS_full/df_2].
Here: substitute matrices.
Analogue of ESS for the full model: E = sum_i (X_i - Xbar)(X_i - Xbar)^T.
Analogue of ESS for the reduced model: sum_i (X_i - mu_0)(X_i - mu_0)^T.
In the 1 sample example the difference of these matrices is H = n(Xbar - mu_0)(Xbar - mu_0)^T.
Test of H_0 based on comparing H to E. To make the comparison: if the null is true, H should be small relative to E.
Measures of size are based on the eigenvalues of E^{-1}H.
For our matrix E^{-1}H: the eigenvalues are all 0 except for one, because H has rank 1. The largest eigenvalue is T^2/(n-1).
But: see the two sample problem for precise tests based on these suggestions.
Test H_0: mu_1 = mu_2 given two independent samples.
Case 1: for motivation only. Sigma_1, Sigma_2 known.
Natural test statistic: based on Xbar_1 - Xbar_2, standardized by its known variance Sigma_1/n_1 + Sigma_2/n_2.
If the Sigma_i are not known we must estimate. No universally agreed best procedure (even for p = 1; called the Behrens-Fisher problem).
Usually: assume Sigma_1 = Sigma_2 = Sigma.
If so: the MLE of mu_i is Xbar_i and the MLE of Sigma is the pooled estimate built from the within-sample sums of squares and cross-products.
Possible test developments:
1) By analogy with 1 sample: the two sample Hotelling's T^2 based on Xbar_1 - Xbar_2 and the pooled covariance.
2) Union-intersection: a test of a^T mu_1 = a^T mu_2 based on the two sample t statistic for a^T X; maximizing the squared t over a again gives T^2.
3) Likelihood ratio: the MLE of Sigma for the unrestricted model is the pooled within-group covariance; under H_0 it is computed from the combined sample. The ratio of maximized likelihoods simplifies to a ratio of determinants. If lambda_i are the eigenvalues of E^{-1}H then the likelihood ratio statistic is a function of the product of the 1/(1 + lambda_i) (Wilks' Lambda).
Two sample analysis in SAS on css network
data long; infile 'tab57sh'; input group a b c; run; proc print; run; proc glm; class group; model a b c = group; manova h=group / printh printe; run;
Notes:
1) First 4 lines form DATA step:
a) creates data set named long by reading in 4 columns of data from file named tab57sh stored in same directory as I was in when I typed sas.
b) Calls variables group (=1 or 2 as label for the two groups) and a, b, c which are names for the 3 test scores for each subject.
2) Next two lines: print out data: result is (slightly edited)
Obs group a b c 1 1 19 20 18 2 1 20 21 19 3 1 19 22 22 etc till 11 2 15 17 15 12 2 13 14 14 13 2 14 16 13
3) Then use proc glm to do analysis:
a) class group declares that the variable group defines levels of a categorical variable.
b) model statement says to regress the variables a, b, c on variable group.
c) The manova statement says to do both the 3 univariate regressions and a multivariate regression, and to print out the H and E matrices, where H is the hypothesis matrix corresponding to the presence of the factor group in the model and E is the error matrix.
Output of MANOVA: First univariate results
The GLM Procedure Class Level Information Class Levels Values group 2 1 2 Number of observations 13 Dependent Variable: a Sum of Source DF Squares Mean Square F Value Pr > F Model 1 54.276923 54.276923 19.38 0.0011 Error 11 30.800000 2.800000 Corrd Tot 12 85.076923 R-Square Coeff Var Root MSE a Mean 0.637975 10.21275 1.673320 16.38462 Source DF Type ISS Mean Square F Value Pr > F group 1 54.276923 54.276923 19.38 0.0011 Source DF TypeIIISS Mean Square F Value Pr > F group 1 54.276923 54.276923 19.38 0.0011 Dependent Variable: b Sum of Source DF Squares Mean Square F Value Pr > F Model 1 70.892308 70.892308 34.20 0.0001 Error 11 22.800000 2.072727 Corrd Tot 12 93.692308 Dependent Variable: c Sum of Source DF Squares Mean Square F Value Pr > F Model 1 94.77692 94.77692 39.64 <.0001 Error 11 26.30000 2.39090 Corrd Tot 12 121.07692
The H and E matrices:
E = Error SSCP Matrix a b c a 30.8 12.2 10.2 b 12.2 22.8 3.8 c 10.2 3.8 26.3 Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r| DF = 11 a b c a 1.000000 0.460381 0.358383 0.1320 0.2527 b 0.460381 1.000000 0.155181 0.1320 0.6301 c 0.358383 0.155181 1.000000 0.2527 0.6301 H = Type III SSCP Matrix for group a b c a 54.276923077 62.030769231 71.723076923 b 62.030769231 70.892307692 81.969230769 c 71.723076923 81.969230769 94.776923077
The eigenvalues of E^{-1}H:
Characteristic Roots and Vectors of: E Inverse * H H = Type III SSCP Matrix for group E = Error SSCP Matrix Characteristic Characteristic Vector V'EV=1 Root Percent a b c 5.816159 100.00 0.00403434 0.12874606 0.13332232 0.000000 0.00 -0.09464169 -0.10311602 0.16080216 0.000000 0.00 -0.19278508 0.16868694 0.00000000 MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall group Effect H = Type III SSCP Matrix for group E = Error SSCP Matrix S=1 M=0.5 N=3.5 Statistic Value F NumDF DenDF Pr > F Wilks' Lambda 0.1467 17.45 3 9 0.0004 Pillai's Trace 0.8533 17.45 3 9 0.0004 Hotelling-Lawley Tr 5.8162 17.45 3 9 0.0004 Roy's Greatest Root 5.8162 17.45 3 9 0.0004
Things to notice:
Wilks' Lambda: Lambda = product of the 1/(1 + lambda_i) = 1/(1 + 5.8162) = 0.1467, matching the printed value.
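All four criteria in the SAS printout are functions of the eigenvalues of E^{-1}H; a numpy check using the E and H matrices printed above:

```python
import numpy as np

# E and H copied from the SAS printout
E = np.array([[30.8, 12.2, 10.2],
              [12.2, 22.8, 3.8],
              [10.2, 3.8, 26.3]])
H = np.array([[54.276923077, 62.030769231, 71.723076923],
              [62.030769231, 70.892307692, 81.969230769],
              [71.723076923, 81.969230769, 94.776923077]])

# eigenvalues of E^{-1} H drive all four MANOVA criteria
lam = np.linalg.eigvals(np.linalg.solve(E, H)).real

wilks = np.prod(1 / (1 + lam))
hotelling_lawley = lam.sum()
roy = lam.max()
```

Because H has rank 1 here, only one eigenvalue is nonzero, so Roy's root and the Hotelling-Lawley trace coincide at 5.8162 and Wilks' Lambda is 0.1467, as printed.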
Data: X_{ij}, j = 1, ..., n_i, for groups i = 1, ..., k.
Model: X_{ij} independent MVN_p(mu_i, Sigma).
First problem of interest: test H_0: mu_1 = ... = mu_k.
Based on the H and E matrices. The MLE of mu_i is Xbar_i.
1 19 20 18 1 20 21 19 1 19 22 22 1 18 19 21 1 16 18 20 1 17 22 19 1 20 19 20 1 15 19 19 2 12 14 12 2 15 15 17 2 15 17 15 2 13 14 14 2 14 16 13 3 15 14 17 3 13 14 15 3 12 15 15 3 12 13 13 4 8 9 10 4 10 10 12 4 11 10 10 4 11 7 12
Code:
data three; infile 'tab57for3sams'; input group a b c; run; proc print; run; proc glm; class group; model a b c = group; manova h=group / printh printe; run; data four; infile 'table5.7'; input group a b c; run; proc print; run; proc glm; class group; model a b c = group; manova h=group / printh printe; run;
Pieces of output: the first set of code does the first 3 groups.
So: with 3 groups, H has rank k - 1 = 2.
Characteristic Roots & Vectors of: E Inverse * H Characteristic Characteristic Vector V'EV=1 Root Percent a b c 6.90568180 96.94 0.01115 0.14375 0.08795 0.21795125 3.06 -0.07763 -0.09587 0.16926 0.00000000 0.00 -0.18231 0.13542 0.02083 S=2 M=0 N=5 Statistic Value F NumDF Den DF Pr > F Wilks' 0.1039 8.41 6 24 <.0001 Pillai's 1.0525 4.81 6 26 0.0020 Hotelling-Lawley 7.1236 13.79 6 14.353 <.0001 Roy's 6.9057 29.92 3 13 <.0001 NOTE: F Statistic for Roy's is an upper bound. NOTE: F Statistic for Wilks' is exact.
Notice two eigenvalues are not 0. Note that an exact distribution for Wilks' Lambda is available. Now the 4 groups:
Root Percent a b c 15.3752900 98.30 0.01128 0.13817 0.08126 0.2307260 1.48 -0.04456 -0.09323 0.15451 0.0356937 0.23 -0.17289 0.09020 0.04777 S=3 M=-0.5 N=6.5 Statistic Value F NumDF Den DF Pr > F Wilks' 0.04790913 10.12 9 36.657 <.0001 Pillai's 1.16086747 3.58 9 51 0.0016 Hot'ng-Lawley 15.64170973 25.02 9 20.608 <.0001 Roy's 15.37528995 87.13 3 17 <.0001 NOTE: F Statistic for Roy's is an upper bound.
Test H_0: mu_1 = ... = mu_k?
Define the between-group and within-group matrices
H = sum_i n_i (Xbar_i - Xbar)(Xbar_i - Xbar)^T
E = sum_i sum_j (X_{ij} - Xbar_i)(X_{ij} - Xbar_i)^T.
Then put the four MANOVA criteria (Wilks, Pillai, Hotelling-Lawley, Roy) in terms of the eigenvalues of E^{-1}H. [Displayed formulas lost in conversion.]
Data: X_{ijk}, with i indexing one factor, j the other, and k the replicate.
Model: X_{ijk} independent, X_{ijk} ~ N(mu_{ij}, sigma^2).
Note: this is the fixed effects model.
Usual approach: define grand mean, main effects, interactions:
mu_{ij} = mu + alpha_i + beta_j + gamma_{ij},
with the usual side conditions: sum_i alpha_i = sum_j beta_j = sum_i gamma_{ij} = sum_j gamma_{ij} = 0.
Test additive effects: gamma_{ij} = 0 for all i, j.
Usual test based on ANOVA:
Stack the observations into a vector Y, say.
Estimate mu, alpha_i, etc by least squares.
Form vectors with entries muhat, alphahat_i, etc.
Write Y as the sum of the fitted grand mean vector, the effect vectors, and the residual vector.
Fact: all the vectors on the RHS are independent and orthogonal. So the sums of squares decompose and the usual F tests follow.
Our problem is like this one BUT the errors are not modeled as independent.
In the analogy:
i labels the group.
j labels the columns: i.e. j is a, b, c.
k runs from 1 to n_i.
But measurements sharing the same subject are correlated across j.
Now do analysis in SAS.
Tell SAS that the variables A, B and C are repeated measurements of the same quantity.
proc glm; class group; model a b c = group; repeated scale; run;
The results are as follows:
General Linear Models Procedure Repeated Measures Analysis of Variance Repeated Measures Level Information Dependent Variable A B C Level of SCALE 1 2 3 Manova Test Criteria and Exact F Statistics for the Hypothesis of no SCALE Effect H = Type III SS&CP Matrix for SCALE E = Error SS&CP Matrix S=1 M=0 N=7 Statistic Value F NumDF DenDF Pr > F Wilks' Lambda 0.56373 6.1912 2 16 0.0102 Pillai's Trace 0.43627 6.1912 2 16 0.0102 Hotelling-Lawley 0.77390 6.1912 2 16 0.0102 Roy's 0.77390 6.1912 2 16 0.0102
Note: we should look at interactions first.
Manova Test Criteria and F Approximations for the Hypothesis of no SCALE*GROUP Effect S=2 M=0 N=7 Statistic Value F NumDF DenDF Pr > F Wilks' Lambda 0.56333 1.7725 6 32 0.1364 Pillai's Trace 0.48726 1.8253 6 34 0.1234 Hotelling-Lawley 0.68534 1.7134 6 30 0.1522 Roy's 0.50885 2.8835 3 17 0.0662 NOTE: F Statistic for Roy's Greatest Root is an upper bound. NOTE: F Statistic for Wilks' Lambda is exact.
Repeated Measures Analysis of Variance Tests of Hypotheses for Between Subjects Effects Source DF Type III SS Mean Square F Pr > F GROUP 3 743.900000 247.966667 70.93 0.0001 Error 17 59.433333 3.496078 Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source: SCALE Adj Pr > F DF TypeIIISS MS F Pr > F G - G H - F 2 16.624 8.312 5.39 0.0093 0.0101 0.0093 Source: SCALE*GROUP DF TypeIII MS F Pr > F G - G H - F 6 18.9619 3.160 2.05 0.0860 0.0889 0.0860 Source: Error(SCALE) DF TypeIII SS Mean Square 34 52.4667 1.54313725 Greenhouse-Geisser Epsilon = 0.9664 Huynh-Feldt Epsilon = 1.2806
The Greenhouse-Geisser and Huynh-Feldt epsilons assess departures from the sphericity assumption behind the univariate within-subject F tests and adjust the P values accordingly.
Return to the 2 way anova model. Express as:
X_{ijk} = mu + alpha_i + beta_j + gamma_{ij} + epsilon_{ijk}.
For the fixed effects model the epsilon_{ijk} are iid N(0, sigma^2).
For the MANOVA model the vector of errors for a subject is MVN, but with a general covariance matrix, as for X.
Intermediate model: put in a subject effect. Assume the error is a random subject effect plus independent measurement noise. Essentially the model says measurements on the same subject share a common random subject effect, so they are equally correlated with each other.
Do univariate anova: The data reordered:
1 1 1 19 1 1 2 20 1 1 3 18 2 1 1 20 2 1 2 21 2 1 3 19 et cetera 2 4 2 10 2 4 3 12 3 4 1 11 3 4 2 10 3 4 3 10 4 4 1 11 4 4 2 7 4 4 3 12
The four columns are now labels for subject number, group, scale (a, b or c) and the response. The sas commands:
data long; infile 'table5.7uni'; input subject group scale score; run; proc print; run; proc glm; class group; class scale; class subject; model score =group subject(group) scale group*scale; random subject(group) ; run;
Some of the output:
Dependent Variable: SCORE Sum of Mean Source DF Squares Square F Pr > F Model 28 843.5333 30.126 19.52 0.0001 Error 34 52.4667 1.543 Total 62 896.0000 Root MSE SCORE Mean 1.242231 15.33333 Source DF TypeISS MS F Pr > F GROUP 3 743.9000 247.9667 160.69 0.0001 SUBJECT(GROUP) 17 59.4333 3.4961 2.27 0.0208 SCALE 2 21.2381 10.6190 6.88 0.0031 GROUP*SCALE 6 18.9620 3.1603 2.05 0.0860 Source DF TypeIIISS MS F Pr > F GROUP 3 743.9000 247.9667 160.69 0.0001 SUBJECT(GROUP) 17 59.4333 3.4961 2.27 0.0208 SCALE 2 16.6242 8.3121 5.39 0.0093 GROUP*SCALE 6 18.9619 3.1603 2.05 0.0860 Source Type III Expected Mean Square GROUP Var(Error) + 3 Var(SUBJECT(GROUP)) + Q(GROUP,GROUP*SCALE) SUBJECT(GROUP) Var(Error) + 3 Var(SUBJECT(GROUP)) SCALE Var(Error) + Q(SCALE,GROUP*SCALE) GROUP*SCALE Var(Error) + Q(GROUP*SCALE)
Type I Sums of Squares:
Type III Sums of Squares:
Notice hypothesis of no group by scale interaction is acceptable.
Under the assumption of no such group by scale interaction the hypothesis of no group effect is tested by dividing group ms by subject(group) ms.
Value is 70.9 on 3,17 degrees of freedom.
This is NOT the F value in the table above since the table above is for FIXED effects.
Notice that the sums of squares in this table match those produced in the repeated measures ANOVA. This is not accidental.