next up previous

STAT 350: Lecture 3 Example

Normal Equations for Simple Linear Regression

See Lecture 1 for the framework. Here I consider two models:

The data are

Dose Count
0 27043
0 26902
0 25959
150 27700
150 27530
150 27460
420 30650
420 30150
420 29480
900 34790
900 32020
1800 42280
1800 39370
1800 36200
3600 53230
3600 49260
3600 53030

The design matrix for the linear model is

displaymath23

You can compute tex2html_wrap_inline25 in Minitab or Splus. That matrix has 4 numbers each of which is computed as the dot product of 2 columns of X. For instance the first column dotted with itself produces tex2html_wrap_inline29 . Here is an example S session which reads in the data, produces the design matrices for the two models and computes tex2html_wrap_inline25 .

[36]skekoowahts% S
S-PLUS : Copyright (c) 1988, 1995 MathSoft, Inc.
S : Copyright AT&T.
Version 3.3 Release 1 for Sun SPARC, SunOS 5.3 : 1995 
Working data will be in .Data 
#
# The data are in a file called linear. The !
# tells S that what follows is not an S command but a standard
# UNIX (or DOS) command
#
> !more linear
  Dose Count
    0 27043
    0 26902
    0 25959
  150 27700
  150 27530
  150 27460
  420 30650
  420 30150
  420 29480
  900 34790
  900 32020
 1800 42280
 1800 39370
 1800 36200
 3600 53230
 3600 49260
 3600 53030
#
# The function help(function) produces help for a function such as
# > help(read.table)
#
# Read in the data from a file.  The file has 18 lines:
# 17 lines of data and a first line which has the names of the variables.
# the function read.table reads such data and header=T warns S
# that the first line is variable names.  The first argument of
# read.table is a character string containing the name of the file
# to read from.
#
> x_read.table("linear",header=T)
> x
   Dose Count 
 1    0 27043
 2    0 26902
 3    0 25959
 4  150 27700
 5  150 27530
 6  150 27460
 7  420 30650
 8  420 30150
 9  420 29480
10  900 34790
11  900 32020
12 1800 42280
13 1800 39370
14 1800 36200
15 3600 53230
16 3600 49260
17 3600 53030
#
# the design matrix has a column of 1s and also
# a column consisting of the first column of x
# which is just the list of covariate values
# The notation x[,1] picks out the first column of x
#
> design.mat_cbind(rep(1,17),x[,1])
#
# To print out an object you type its name!
#
> design.mat
      [,1] [,2] 
 [1,]    1    0
 [2,]    1    0
 [3,]    1    0
 [4,]    1  150
 [5,]    1  150
 [6,]    1  150
 [7,]    1  420
 [8,]    1  420
 [9,]    1  420
[10,]    1  900
[11,]    1  900
[12,]    1 1800
[13,]    1 1800
[14,]    1 1800
[15,]    1 3600
[16,]    1 3600
[17,]    1 3600
#
# Compute X^T X  -- this uses %*% to multiply matrices
# and t(x) to compute the transpose of a matrix x.
#
> xprimex <- t(design.mat)%*% design.mat
> xprimex
      [,1]     [,2] 
[1,]    17    19710
[2,] 19710 50816700
#
# Compute X^T Y
#
> xprimey <- t(design.mat)%*% x[,2] 
> xprimey
          [,1] 
[1,]    593054
[2,] 882452100
#
# Next compute the least squares estimates by solving the
# normal equations
#
> solve(xprimex,xprimey)
             [,1] 
[1,] 26806.734691
[2,]     6.968012
#
# solve(A,b) computes the solution of Ax=b for A a 
# square matrix and b a vector. Of course x=A^{-1}b.
#
#
# The next pice of code regresses the variable Count on 
# Dose taking the data from x.
#
> lm( Count~Dose,data=x)
Call:
lm(formula = Count ~ Dose, data = x)

Coefficients:
 (Intercept)     Dose 
    26806.73 6.968012

Degrees of freedom: 17 total; 15 residual
Residual standard error: 1521.238 
#
# Notice that the estimates agree with our calculations
# The residual standard error is the usual estimate of sigma
# namely the square root of the Mean Square for Error.
#
#
# Now we add a third column to fit the quadratic model
#
> design.mat2_cbind(design.mat,design.mat[,2]^2)
#
# Here is X^T X
#
> t(design.mat2)%*% design.mat2
         [,1]         [,2]         [,3] 
[1,]       17        19710 5.081670e+07
[2,]    19710     50816700 1.591544e+11
[3,] 50816700 159154389000 5.367847e+14
#
# Here is X^T Y
#
> t(design.mat2)%*% x[,2]
             [,1] 
[1,] 5.930540e+05
[2,] 8.824521e+08
[3,] 2.469275e+12
#
# But the following illustrates the dangers of doing computations
# blindly on the computer.  The trouble is that the design matrix 
# has a third column which is so much larger that the first two.
#
> solve(t(design.mat2)%*% design.mat2, t(design.mat2)%*% x[,2])
Error in solve.qr(a, b): apparently singular matrix
Dumped
#
# However, good packages know numerical techniques which avoid
# the danger.
#
> lm(Count ~ Dose+Dose^2,data=x)
Call:
lm(formula = Count ~ Dose + Dose^2, data = x)

Coefficients:
 (Intercept)     Dose     I(Dose^2) 
    26718.11 7.240314 -7.596867e-05

Degrees of freedom: 17 total; 14 residual
Residual standard error: 1571.277 
> 
#
# WARNING: you can't tell from the size of the estimate
# of an estimate such as that of beta_3 whether or not
# it is important -- you have to compare it to values
# of the corresponding covariate and to its standard error
#
> q()
# Used to quit S:  pay attention to () -- that part is essential!


next up previous



Richard Lockhart
Mon Mar 3 13:16:50 PST 1997