
STAT 450

Linear Algebra Review Notes

Notation:

Definition: The transpose, $A^T$, of an $m \times n$ matrix $A$ is the $n \times m$ matrix whose entries are given by

\begin{displaymath}(A^T)_{ij} = A_{ji}
\end{displaymath}

so that $A^T$ is $n \times m$. We have

\begin{displaymath}(A+B)^T = A^T+B^T
\end{displaymath}

and

\begin{displaymath}(AB)^T = B^T A^T \, .
\end{displaymath}
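
These rules are easy to check numerically. Here is a minimal sketch in Python using numpy (the matrices are arbitrary values chosen for illustration):

\begin{verbatim}
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.]])    # A is 2 x 3
B = np.array([[1., 0.], [2., 1.], [0., 3.]])  # B is 3 x 2

# (A + B^T)^T = A^T + B: A and B^T are both 2 x 3 so the sum makes sense
print(np.allclose((A + B.T).T, A.T + B))      # True
# (AB)^T = B^T A^T: note the reversal of order
print(np.allclose((A @ B).T, B.T @ A.T))      # True
\end{verbatim}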

Definition: The rank of a matrix $A$ is the number of linearly independent columns of $A$; we write $\text{rank}(A)$. We have
\begin{align*}\text{rank}(A) & = \text{dim}(\text{column space of $A$ })
\\
& = \text{dim}(\text{row space of $A$ })
\\
& = \text{rank}(A^T)
\end{align*}

If A is $m \times n$ then $\text{rank}(A) \le \min(m,n)$.
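
A quick numerical illustration, using numpy's matrix_rank (the matrices below are illustrative choices):

\begin{verbatim}
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])               # 3 x 2; columns are linearly independent
print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(A.T))      # 2: rank(A) = rank(A^T)

B = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])               # second column is twice the first
print(np.linalg.matrix_rank(B))        # 1, which is <= min(3, 2)
\end{verbatim}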

Matrix inverses

For this little section all matrices are square $n \times n$ matrices.

If there is a matrix $B$ such that $BA=I_{n \times n}$ then we call $B$ the inverse of $A$. If $B$ exists it is unique, we also have $AB=I$, and we write $B=A^{-1}$. The matrix $A$ has an inverse if and only if $\text{rank}(A) = n$.

Inverses have the following properties:

\begin{displaymath}(AB)^{-1} = B^{-1}A^{-1}
\end{displaymath}

(if one side exists then so does the other) and

\begin{displaymath}(A^T)^{-1} = (A^{-1})^T
\end{displaymath}
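
Both properties can be verified numerically; a small sketch with numpy, where random matrices stand in for generic invertible ones:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))  # a random square matrix is invertible
B = rng.standard_normal((3, 3))  # with probability one

# (AB)^{-1} = B^{-1} A^{-1}: order reverses, as with transposes
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))     # True
# (A^T)^{-1} = (A^{-1})^T
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))  # True
\end{verbatim}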

Determinants

Again $A$ is $n \times n$. The determinant is a function on the set of $n \times n$ matrices such that:

1.
$\text{det}(I) = 1$.

2.
If $A^\prime$ is the matrix $A$ with two columns interchanged then

\begin{displaymath}\text{det}(A^\prime) = - \text{det}(A)\, .
\end{displaymath}

(Notice that this means that a matrix with two equal columns has $\text{det}(A)=0$: interchanging the two equal columns leaves the matrix unchanged yet flips the sign of the determinant.)

3.
$\text{det}(A)$ is a linear function of each column of $A$. That is, if $A = ( a_1, \ldots, a_n)$ with $a_i$ denoting the $i$th column of the matrix then
\begin{align*}\text{det}(a_1,\ldots,a_i+b_i,\ldots,a_n) = &
\text{det}(a_1,\ldots,a_i,\ldots,a_n)
\\
& + \text{det}(a_1,\ldots,b_i,\ldots,a_n)
\end{align*}

Here are some properties of the determinant:

4.
$\text{det}(A^T) = \text{det}(A)$.

5.
$\text{det}(AB) = \text{det}(A)\text{det}(B)$.

6.
$\text{det}(A^{-1}) = 1/\text{det}(A)$.

7.
A is invertible if and only if $\text{det}(A) \neq 0$ if and only if $\text{rank}(A) = n$.

8.
Determinants can be computed (slowly) by expansion by minors.
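
Properties 2, 4, 5 and 6 are easy to verify numerically; a brief sketch with numpy (the matrices are arbitrary illustrative choices):

\begin{verbatim}
import numpy as np

A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]])
B = np.array([[1., 2., 0.], [0., 1., 0.], [1., 0., 2.]])

# Property 2: interchanging two columns flips the sign
Aswap = A[:, [1, 0, 2]]
print(np.isclose(np.linalg.det(Aswap), -np.linalg.det(A)))  # True
# Properties 4, 5 and 6
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))     # True
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))      # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                 1 / np.linalg.det(A)))                     # True
\end{verbatim}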

Special Kinds of Matrices

1.
A is symmetric if $A^T=A$.

2.
A is orthogonal if $A^T=A^{-1}$ (or $AA^T = A^TA=I$).

3.
A is idempotent if $AA\equiv A^2 = A$.

4.
A is diagonal if $i \neq j$ implies $A_{ij}=0$.

Inner Products and orthogonal and orthonormal vectors

Definition: The inner product or dot product of x and y is

\begin{displaymath}\langle x,y\rangle = x^Ty = \sum x_i y_i
\end{displaymath}

Definition: $x$ and $y$ are orthogonal if $x^Ty = \sum x_i y_i = 0$.

Definition: The norm (or length) of $x$ is $\vert\vert x\vert\vert = (x^Tx)^{1/2} = (\sum x_i^2)^{1/2}$.

A matrix $A$ is orthogonal if each column of $A$ has length 1 and is orthogonal to each other column of $A$.
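
For example (the vectors and the rotation matrix below are illustrative choices):

\begin{verbatim}
import numpy as np

x = np.array([1., 2., 2.])
y = np.array([2., -1., 0.])

print(x @ y)              # 0.0: x and y are orthogonal
print(np.linalg.norm(x))  # 3.0 = (1 + 4 + 4)^(1/2)

# A 2 x 2 rotation is orthogonal: unit columns, mutually orthogonal
theta = 0.3
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(P.T @ P, np.eye(2)))  # True, so P^T = P^{-1}
\end{verbatim}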

Quadratic Forms

Suppose A is an $n \times n$ matrix. The function

\begin{displaymath}g(x) = x^T A x
\end{displaymath}

is called a quadratic form. Now
\begin{align*}g(x) & = \sum_{ij} A_{ij} x_i x_j
\\
& = \sum_i A_{ii} x_i^2 + \sum_{i < j} (A_{ij}+A_{ji}) x_ix_j
\end{align*}
so that $g(x)$ depends on $A$ only through the sums $A_{ij}+A_{ji}$. In fact

\begin{displaymath}x^T Ax = x^T A^T x = x^T\left(\frac{A+A^T}{2} \right) x
\end{displaymath}

Thus we will assume that A is symmetric.
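
A quick numerical check of this symmetrization identity (the random values are purely illustrative):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))  # generally not symmetric
S = (A + A.T) / 2                # symmetric part of A
x = rng.standard_normal(3)

# x^T A x depends on A only through (A + A^T)/2
print(np.isclose(x @ A @ x, x @ S @ x))  # True
\end{verbatim}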

Eigenvalues and eigenvectors

If $A$ is $n \times n$, $v\in R^n$ with $v \neq 0$, and $\lambda\in R$ are such that

\begin{displaymath}Av=\lambda v
\end{displaymath}

then we say that $\lambda$ is an eigenvalue (or characteristic value or latent value) of $A$ and that $v$ is the corresponding eigenvector. Since $Av-\lambda v = (A-\lambda I)v=0$ we find that $A-\lambda I$ must be singular. Therefore $\text{det}(A-\lambda I) =0$. Conversely if $A-\lambda I$ is singular then there is a $v \neq 0$ such that $(A-\lambda I)v=0$. In fact, $\text{det}(A-\lambda I)$ is a polynomial function of $\lambda$ of degree $n$. Each root is an eigenvalue. For general $A$ the roots could be multiple roots or complex valued.
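
Numerically, numpy's eig computes eigenvalues and eigenvectors; a small check of the defining equations (the matrix is an illustrative choice):

\begin{verbatim}
import numpy as np

A = np.array([[2., 1.], [1., 2.]])
eigenvalues, V = np.linalg.eig(A)  # columns of V are the eigenvectors
print(eigenvalues)                 # [3. 1.]

for lam, v in zip(eigenvalues, V.T):
    print(np.allclose(A @ v, lam * v))                          # Av = lambda v
    print(np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0))  # det(A - lambda I) = 0
\end{verbatim}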

Diagonalization

A matrix $A$ is diagonalized by a non-singular matrix $P$ if $P^{-1} A P \equiv D$ is a diagonal matrix. If so then $AP=PD$ and each column of $P$ is an eigenvector of $A$, with the $i$th column having eigenvalue $D_{ii}$. Thus to be diagonalizable $A$ must have $n$ linearly independent eigenvectors.
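
A sketch of diagonalization in numpy (the matrix below, chosen for illustration, has distinct eigenvalues 5 and 2 and so is diagonalizable):

\begin{verbatim}
import numpy as np

A = np.array([[4., 1.], [2., 3.]])
eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P        # P^{-1} A P

print(np.allclose(D, np.diag(eigenvalues)))  # True: D is diagonal
\end{verbatim}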

Symmetric Matrices

If A is symmetric then

1.
Every eigenvalue of A is real (not complex).

2.
A is diagonalizable and the columns of P may be taken to be orthogonal to each other and of unit length. In other words, $A$ is diagonalizable by an orthogonal matrix $P$; in symbols $P^TAP = D$. The diagonal entries in $D$ are the eigenvalues of $A$.

3.
If $\lambda_1 \neq \lambda_2$ are two eigenvalues of $A$ and $v_1$ and $v_2$ are corresponding eigenvectors then

\begin{displaymath}v_1^T A v_2 = v_1^T \lambda_2 v_2 = \lambda_2 v_1^T v_2
\end{displaymath}

and
\begin{align*}(v_1^T A v_2 )& = (v_1^T A v_2 )^T
\\
& = v_2^T A^T v_1
\\
& = v_2^T A v_1
\\
& = v_2^T \lambda_1 v_1
\\
& = \lambda_1 v_2^T v_1
\end{align*}
Subtracting these two expressions for $v_1^T A v_2$ gives $(\lambda_1-\lambda_2)v_1^T v_2 =0$; since $\lambda_1 \neq \lambda_2$ we see that $v_1^T v_2 = 0$. In other words eigenvectors corresponding to distinct eigenvalues are orthogonal.
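
All three facts can be seen numerically using numpy's eigh, the eigenvalue routine for symmetric matrices (the matrix is illustrative):

\begin{verbatim}
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])            # symmetric
eigenvalues, P = np.linalg.eigh(A)      # eigh exploits symmetry

print(eigenvalues)                      # all real
print(np.allclose(P.T @ P, np.eye(3)))  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))  # True: P^T A P = D
\end{verbatim}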

Orthogonal Projections

Suppose that $S$ is a vector subspace of $R^n$ and that $a_1,\ldots,a_m$ form a basis for $S$. Given any $x\in R^n$ there is a unique $y\in S$ which is closest to $x$. That is, $y$ minimizes

\begin{displaymath}(x-y)^T(x-y)
\end{displaymath}

over $y\in S$. Any y in S is of the form

\begin{displaymath}y = c_1 a_1 + \cdots + c_m a_m = Ac
\end{displaymath}

where $A$ is the $n \times m$ matrix with columns $a_1,\ldots,a_m$ and $c$ is the column vector with $i$th entry $c_i$. Define

\begin{displaymath}Q = A(A^TA)^{-1}A^T
\end{displaymath}

(The fact that $A$ has rank $m$ guarantees that $A^TA$ is invertible.) Then
\begin{align*}(x-Ac)^T(x-Ac) & = (x-Qx+Qx-Ac)^T (x-Qx+Qx-Ac)
\\
& = (x-Qx)^T(x-Qx) + (Qx-Ac)^T(x-Qx)
\\
& \qquad +(x-Qx)^T(Qx-Ac) + (Qx-Ac)^T(Qx-Ac)
\end{align*}
Note that $x-Qx = (I-Q)x$ and that

\begin{displaymath}QAc = A(A^TA)^{-1}A^TAc = Ac
\end{displaymath}

so that

\begin{displaymath}Qx-Ac = Q(x-Ac)
\end{displaymath}

Then

\begin{displaymath}(Qx-Ac)^T(x-Qx) = (x-Ac)^T Q^T (I-Q) x
\end{displaymath}

Since $Q^T=Q$ we see that
\begin{align*}Q^T(I-Q) & = Q(I-Q)
\\
& = Q-Q^2
\\
& = Q-A(A^TA)^{-1}A^TA(A^TA)^{-1}A^T
\\
& = Q-Q
\\
& = 0
\end{align*}
This shows that

\begin{displaymath}(x-Ac)^T(x-Ac) = (x-Qx)^T(x-Qx) + (Qx-Ac)^T(Qx-Ac)
\end{displaymath}

Now to choose $Ac$ to minimize this quantity we need only minimize the second term. This is achieved by making $Qx=Ac$. Since $Qx=A(A^TA)^{-1}A^T x$ this can be done by taking $c=(A^TA)^{-1}A^T x$. In summary we find that the closest point $y$ in $S$ is

\begin{displaymath}y = Qx = A(A^TA)^{-1}A^T x
\end{displaymath}

We call $y$ the orthogonal projection of $x$ onto $S$.

Notice that the matrix $Q$ is idempotent:

\begin{displaymath}Q^2 = Q
\end{displaymath}

We call $Qx$ the orthogonal projection of $x$ on $S$ because $Qx$ is perpendicular to the residual $x-Qx=(I-Q)x$.
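
Here is a small numerical sketch of the whole construction (the basis vectors and the point $x$ are illustrative choices):

\begin{verbatim}
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])                # columns a_1, a_2 span S; n = 3, m = 2
x = np.array([1., 2., 2.])

Q = A @ np.linalg.inv(A.T @ A) @ A.T    # Q = A (A^T A)^{-1} A^T
y = Q @ x                               # orthogonal projection of x onto S

print(np.allclose(Q @ Q, Q))            # True: Q is idempotent
print(np.isclose((x - y) @ y, 0.0))     # True: residual is perpendicular to y
c = np.linalg.inv(A.T @ A) @ A.T @ x    # c = (A^T A)^{-1} A^T x
print(np.allclose(A @ c, y))            # True: y = Ac
\end{verbatim}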

Partitioned Matrices

Suppose that $A_{11}$ is a $p \times r$ matrix, $A_{12}$ is $p \times s$, $A_{21}$ is $q \times r$ and $A_{22}$ is $q \times s$. Then we could make a big $(p+q) \times (r+s)$ matrix by putting together the $A_{ij}$ in a 2 by 2 pattern giving the following picture:

\begin{displaymath}A = \left[ \begin{array}{cc}
A_{11} & A_{12}
\\
A_{21} & A_{22}
\end{array}\right]
\end{displaymath}

For instance if

\begin{displaymath}A_{11} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right]
\end{displaymath}


\begin{displaymath}A_{12} = \left[ \begin{array}{c} 2 \\ 3 \end{array}\right]
\end{displaymath}


\begin{displaymath}A_{21} = \left[ \begin{array}{cc} 4 & 5\end{array}\right]
\end{displaymath}

and

\begin{displaymath}A_{22} = \left[ 6 \right]
\end{displaymath}

then

\begin{displaymath}A = \left[ \begin{array}{cc|c}
1 & 0 & 2
\\
0 & 1 & 3
\\
\hline
4 & 5 & 6
\end{array}\right]
\end{displaymath}

where I have drawn in lines to indicate the partitioning.

We can work with partitioned matrices just like ordinary matrices, always making sure that in products we never change the order of multiplication:

\begin{displaymath}\left[ \begin{array}{cc}
A_{11} & A_{12}
\\
A_{21} & A_{22}
\end{array}\right]
+
\left[ \begin{array}{cc}
B_{11} & B_{12}
\\
B_{21} & B_{22}
\end{array}\right]
=
\left[ \begin{array}{cc}
A_{11}+B_{11} & A_{12}+B_{12}
\\
A_{21}+B_{21} & A_{22}+B_{22}
\end{array}\right]
\end{displaymath}

and

\begin{displaymath}\left[ \begin{array}{cc}
A_{11} & A_{12}
\\
A_{21} & A_{22}
\end{array}\right]
\left[ \begin{array}{cc}
B_{11} & B_{12}
\\
B_{21} & B_{22}
\end{array}\right]
=
\left[ \begin{array}{cc}
A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22}
\\
A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+ A_{22}B_{22}
\end{array}\right]
\end{displaymath}

In these formulas the partitioning of $A$ and $B$ must match. For the addition formula the dimensions of $A_{ij}$ and $B_{ij}$ must be the same. For the multiplication formula $A_{12}$ must have as many columns as $B_{21}$ has rows and so on. In general $A_{ij}$ and $B_{jk}$ must be of the right size for $A_{ij}B_{jk}$ to make sense for each $i,j,k$.

The technique can be used with more than a 2 by 2 partitioning.
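
A numerical check of the 2 by 2 multiplication formula, with numpy's block assembling the partitioned matrices (dimensions $p=r=2$, $q=s=1$ chosen for illustration):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A11, A12 = rng.standard_normal((2, 2)), rng.standard_normal((2, 1))
A21, A22 = rng.standard_normal((1, 2)), rng.standard_normal((1, 1))
B11, B12 = rng.standard_normal((2, 2)), rng.standard_normal((2, 1))
B21, B22 = rng.standard_normal((1, 2)), rng.standard_normal((1, 1))

A = np.block([[A11, A12], [A21, A22]])  # assemble the 3 x 3 matrices
B = np.block([[B11, B12], [B21, B22]])

# The blockwise product agrees with the ordinary product
AB = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
               [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
print(np.allclose(A @ B, AB))           # True
\end{verbatim}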

Definition: A block diagonal matrix is a partitioned matrix $A$ with blocks $A_{ij}$ for which $A_{ij}=0$ if $i \neq j$. If

\begin{displaymath}A = \left[ \begin{array}{cc}
A_{11} & 0
\\
0 & A_{22}
\end{array}\right]
\end{displaymath}

then $A$ is invertible if and only if each $A_{ii}$ is invertible and then

\begin{displaymath}A^{-1} = \left[ \begin{array}{cc}
A_{11}^{-1} & 0
\\
0 & A_{22}^{-1}
\end{array}\right]
\end{displaymath}

Moreover $\text{det}(A) = \text{det}(A_{11}) \text{det}(A_{22})$. Similar formulas work for larger matrices.
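
A brief check of both block diagonal facts (the blocks are illustrative choices):

\begin{verbatim}
import numpy as np

A11 = np.array([[2., 1.], [1., 2.]])
A22 = np.array([[3.]])
Z12, Z21 = np.zeros((2, 1)), np.zeros((1, 2))
A = np.block([[A11, Z12], [Z21, A22]])

# The inverse is blockwise, and the determinant multiplies
Ainv = np.block([[np.linalg.inv(A11), Z12], [Z21, np.linalg.inv(A22)]])
print(np.allclose(np.linalg.inv(A), Ainv))                  # True
print(np.isclose(np.linalg.det(A),
                 np.linalg.det(A11) * np.linalg.det(A22)))  # True
\end{verbatim}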



Richard Lockhart
1999-09-23