Chapter 9: The Jordan Forms

9.1 Similar matrices.

Definition 1 A matrix $A$ is said to be similar to a matrix $B$ if there exists an invertible matrix $P$ such that $A = P^{-1}BP$.

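Example 1 The matrices

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} \]

are similar because $A = P^{-1}BP$ with

\[ P = P^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]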

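Example 2 The matrices

\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix} \]

are similar because $A = P^{-1}BP$ with

\[ P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]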
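The claim of Example 2 can be checked by direct multiplication. The following is a minimal sketch in exact rational arithmetic; the helper names `matmul`, `inverse_2x2` and `is_similar_via` are illustrative and not part of the text.

```python
from fractions import Fraction


def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (exact arithmetic)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def is_similar_via(A, B, P):
    """Return True if A == P^{-1} B P for the given invertible 2x2 matrix P."""
    return matmul(matmul(inverse_2x2(P), B), P) == A


# Matrices of Example 2.
A = [[1, 2], [2, 1]]
B = [[3, 0], [0, -1]]
P = [[1, 1], [1, -1]]

print(is_similar_via(A, B, P))  # True
```

Note that the function only tests one particular $P$; a result of False says nothing about whether some other invertible matrix would work.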

 

It is easy to see that the following statements hold:

(i) Every matrix is similar to itself. Indeed, we can choose $P = I$ in this case.

(ii) If a matrix $A$ is similar to a matrix $B$, then $B$ is similar to $A$. Indeed, if $A = P^{-1}BP$, then $B = PAP^{-1} = Q^{-1}AQ$ with $Q = P^{-1}$. Therefore, we can use the terminology "two similar matrices".

(iii) If a matrix $A$ is similar to a matrix $B$, and $B$ is similar to a matrix $C$, then $A$ is similar to $C$. Indeed, if $A = P^{-1}BP$ and $B = Q^{-1}CQ$, then $A = P^{-1}Q^{-1}CQP = (QP)^{-1}C(QP)$.

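(iv) If matrices $A$ and $B$ are similar, then their characteristic polynomials coincide. Indeed, if $A = P^{-1}BP$, then $A - \lambda I = P^{-1}(B - \lambda I)P$, because $P^{-1}(\lambda I)P = \lambda I$. Applying the theorem on the determinant of a product of matrices, we have

\[ \Delta_A(\lambda) = |A - \lambda I| = |P^{-1}(B - \lambda I)P| = |P^{-1}|\,|P|\,|B - \lambda I| = |P^{-1}P|\,|B - \lambda I| = |B - \lambda I| = \Delta_B(\lambda). \]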

 

From (iv) it follows immediately that

(v) If A and B are similar, then they have the same eigenvalues.
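Statements (iv) and (v) can be illustrated on the matrices of Example 2. The sketch below assumes the 2×2 case, where the characteristic polynomial is $t^2 - (\operatorname{tr} M)\,t + \det M$; the function name `char_poly_2x2` is illustrative.

```python
def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of det(M - t*I) = t^2 - trace*t + det."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)


# Matrices of Example 2: A = P^{-1} B P with B diagonal.
A = [[1, 2], [2, 1]]
B = [[3, 0], [0, -1]]

print(char_poly_2x2(A))  # (1, -2, -3)
print(char_poly_2x2(B))  # (1, -2, -3)
```

Both polynomials equal $t^2 - 2t - 3 = (t-3)(t+1)$, so both matrices have the eigenvalues 3 and -1. The converse of (iv) fails in general: the identity matrix and the matrix with rows (1, 1) and (0, 1) share the polynomial $t^2 - 2t + 1$ but are not similar, since $P^{-1}IP = I$ for every invertible $P$.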

9.2 Matrix representations in different bases.

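We know that every square matrix $A$ generates a linear operator, also denoted by $A$, on the space $\mathbb{C}^n$. If we denote by $(A)_i$ the $i$-th column of $A$ and by $e_1, \dots, e_n$ the standard basis vectors of $\mathbb{C}^n$, then

\[ (A)_i = Ae_i = \sum_{k=1}^{n} a_{ki}e_k. \]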

Suppose now that $x_1, x_2, \dots, x_n$ are $n$ arbitrary linearly independent vectors in $\mathbb{C}^n$. As we know (see Chapter 2), every vector $x$ in $\mathbb{C}^n$ is a linear combination of $x_1, x_2, \dots, x_n$. In particular, the vectors $Ax_i$, $i = 1,2,\dots,n$, are linear combinations of $x_1, x_2, \dots, x_n$. Thus, we can write

\[ Ax_i = b_{1i}x_1 + b_{2i}x_2 + \dots + b_{ni}x_n = \sum_{k=1}^{n} b_{ki}x_k. \tag{1} \]

Therefore, a new matrix $B = [b_{ij}]_{n\times n}$ has arisen, which, for the given $A$, depends on the choice of the basis $x_1, x_2, \dots, x_n$. We call $B$ the matrix representation of $A$ in the basis $x_1, x_2, \dots, x_n$. Let us consider the matrix $X$ whose $i$-th column is $x_i$, $i = 1,2,\dots,n$. Thus, we can write $X$ in this way:

\[ X = [\,x_1 \;\; x_2 \;\; \cdots \;\; x_n\,]. \]

Since $x_1, x_2, \dots, x_n$ are linearly independent, the matrix $X$ is invertible. We want to show that $AX = XB$, and for this we compare the $j$-th column of $AX$ with the $j$-th column of $XB$. Here $x_{kj}$ denotes the $k$-th component of the vector $x_j$, i.e. the $(k,j)$-entry of $X$. Since the general $(i,j)$-entry of the matrix $AX$ is

\[ (AX)_{ij} = \sum_{k=1}^{n} a_{ik}x_{kj}, \]

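we see that the $j$-th column of $AX$ coincides with $Ax_j$, which is

\[ Ax_j = \sum_{k=1}^{n} b_{kj}x_k = \sum_{k=1}^{n} b_{kj} \begin{bmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{nk} \end{bmatrix} = \begin{bmatrix} x_{11}b_{1j} + x_{12}b_{2j} + \dots + x_{1n}b_{nj} \\ x_{21}b_{1j} + x_{22}b_{2j} + \dots + x_{2n}b_{nj} \\ \vdots \\ x_{n1}b_{1j} + x_{n2}b_{2j} + \dots + x_{nn}b_{nj} \end{bmatrix}. \tag{2} \]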

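Analogously, the $(i,j)$-entry of the matrix $XB$ is

\[ (XB)_{ij} = \sum_{k=1}^{n} x_{ik}b_{kj}, \]

hence the $j$-th column of $XB$ is

\[ \begin{bmatrix} x_{11}b_{1j} + x_{12}b_{2j} + \dots + x_{1n}b_{nj} \\ x_{21}b_{1j} + x_{22}b_{2j} + \dots + x_{2n}b_{nj} \\ \vdots \\ x_{n1}b_{1j} + x_{n2}b_{2j} + \dots + x_{nn}b_{nj} \end{bmatrix}. \tag{3} \]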

Comparing (2) and (3), we see that the $j$-th columns of $AX$ and $XB$ coincide for all $j = 1,2,\dots,n$, which implies that $AX = XB$. Since $X$ is invertible, we have $B = X^{-1}AX$, which shows that $A$ and $B$ are similar matrices.

Thus, we have proved the following theorem.

Theorem 1 Let $A$ be a given square matrix of order $n\times n$. Assume that $x_1,\dots,x_n$ are linearly independent vectors in $\mathbb{C}^n$, and that $B$ is the matrix of the operator $A$ in the basis $x_1,\dots,x_n$, defined by (1). Then $A$ and $B$ are similar matrices. Moreover, the matrix $X$ whose columns are the vectors $x_1,\dots,x_n$ is the matrix of the similarity transformation, i.e. $B = X^{-1}AX$.
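Theorem 1 can be illustrated on a small case. The sketch below computes $B = X^{-1}AX$ and then checks both $AX = XB$ and relation (1) column by column. The matrices `A` and `X` are arbitrary illustrative choices, not taken from the text, and the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (exact arithmetic)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("columns are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]


def representation_in_basis(A, X):
    """Matrix B = X^{-1} A X of the operator A in the basis given by the columns of X."""
    return matmul(matmul(inverse_2x2(X), A), X)


A = [[2, 1], [0, 3]]
X = [[1, 1], [1, 2]]  # columns x_1 = (1, 1) and x_2 = (1, 2) are linearly independent
B = representation_in_basis(A, X)

# AX = XB, as in the proof of Theorem 1.
assert matmul(A, X) == matmul(X, B)

# Relation (1): A x_i = sum_k b_{ki} x_k for every basis vector x_i.
for i in range(2):
    x_i = [X[r][i] for r in range(2)]
    lhs = [sum(A[r][c] * x_i[c] for c in range(2)) for r in range(2)]
    rhs = [sum(B[k][i] * X[r][k] for k in range(2)) for r in range(2)]
    assert lhs == rhs
```

For these particular matrices, $B$ has rows (3, 2) and (0, 2), so $Ax_1 = 3x_1$ and $Ax_2 = 2x_1 + 2x_2$.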

It is not difficult to see that the converse of Theorem 1 also holds. Namely, if $B$ is a matrix similar to the matrix $A$, then there exists a basis $x_1,\dots,x_n$ such that the matrix representation of $A$ in this basis coincides with $B$. In fact, if

\[ B = P^{-1}AP \]

for some invertible matrix $P$, then it is enough to define

\[ x_i \overset{\text{def}}{=} \text{the $i$-th column of } P, \]

and the statement follows from the above analysis: the columns of an invertible matrix are linearly independent, and $AP = PB$ is exactly relation (1) written column by column.

In connection with Theorem 1 a natural question arises: given a matrix $A$, can we find a basis $x_1,\dots,x_n$ such that the representation of $A$ in this basis, i.e. the corresponding matrix $B$, has the simplest possible form?

By a simple form we mean a matrix of special structure, such as a diagonal matrix or an upper or lower triangular matrix. Let us start with the simplest case: matrices that are similar to diagonal ones.

9.3 Diagonalizable matrices.

Definition 2 A matrix $A$ is called diagonalizable if it is similar to a diagonal matrix.

Theorem 2 If an $n\times n$ matrix $A$ is diagonalizable, then $A$ has $n$ linearly independent eigenvectors.

Proof. Assume that $A$ is diagonalizable, i.e. there exist a diagonal matrix $D$ and an invertible matrix $P$ such that

\[ A = P^{-1}DP. \]

Let $\lambda_1,\dots,\lambda_n$ be the diagonal elements of $D$. If we denote, as before, by $e_i$ the standard basis vectors in $\mathbb{C}^n$, then the $e_i$ are linearly independent and $De_i = \lambda_i e_i$, $i = 1,2,\dots,n$. Let $x_i = P^{-1}e_i$. Then the vectors $x_i$, $i = 1,2,\dots,n$, are linearly independent because $P^{-1}$ is invertible, and

\[ Ax_i = P^{-1}DPx_i = P^{-1}De_i = P^{-1}(\lambda_i e_i) = \lambda_i P^{-1}e_i = \lambda_i x_i, \]

i.e. $x_i$, $i = 1,2,\dots,n$, are $n$ linearly independent eigenvectors of $A$. $\Box$

It is remarkable that the converse statement to Theorem 2 also holds.

Theorem 3 If an $n\times n$ matrix $A$ has $n$ linearly independent eigenvectors, then $A$ is diagonalizable.

Proof. Assume that $A$ has $n$ linearly independent eigenvectors $x_1,\dots,x_n$, so that $Ax_i = \lambda_i x_i$ for some $\lambda_i$, $i = 1,2,\dots,n$. Comparing with (1), we see that the matrix representation of $A$ in the basis $x_1,\dots,x_n$ is a diagonal matrix $D$ with diagonal elements $\lambda_i$, $i = 1,2,\dots,n$. By Theorem 1, $A$ is similar to $D$, which is what was to be proved. $\Box$

Theorem 3 also gives a method of reducing $A$ to diagonal form once $n$ linearly independent eigenvectors of $A$ are known: it is enough to form the matrix $X$ whose columns are $x_i$, $i = 1,2,\dots,n$, and compute $X^{-1}AX$.
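The method just described can be sketched in code for the 2×2 case. The sketch assumes a real matrix with two distinct real eigenvalues and finds them from the quadratic formula applied to $t^2 - (\operatorname{tr}A)\,t + \det A$; this closed-form route and the function names are illustrative choices, not a method prescribed by the text. Floating-point arithmetic is used, so the result is diagonal only up to rounding.

```python
import math


def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def eigenpairs_2x2(A):
    """Eigenvalues and eigenvectors of a real 2x2 matrix with distinct real eigenvalues."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc <= 0:
        raise ValueError("this sketch needs two distinct real eigenvalues")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace - root) / 2, (trace + root) / 2):
        if b != 0:
            vec = (b, lam - a)          # solves the first row of (A - lam*I)v = 0
        elif c != 0:
            vec = (lam - d, c)          # solves the second row of (A - lam*I)v = 0
        else:                           # A is already diagonal
            vec = (1.0, 0.0) if abs(lam - a) < abs(lam - d) else (0.0, 1.0)
        pairs.append((lam, vec))
    return pairs


def diagonalize_2x2(A):
    """Return (X, D) where the columns of X are eigenvectors and D = X^{-1} A X."""
    (_, v1), (_, v2) = eigenpairs_2x2(A)
    X = [[v1[0], v2[0]], [v1[1], v2[1]]]
    return X, matmul(matmul(inverse_2x2(X), A), X)


X, D = diagonalize_2x2([[2, 1], [1, 2]])
print(D)  # approximately [[1, 0], [0, 3]]
```

The eigenvectors produced this way are determined only up to a nonzero scalar factor, so $X$ may differ from a hand-computed one by a rescaling of its columns; the diagonal matrix $D$ is unaffected.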

Theorem 3 and the fact that eigenvectors corresponding to different eigenvalues are linearly independent (Theorem ? of Chapter 5) imply the following corollary.

Corollary 1 If an $n\times n$ matrix $A$ has $n$ distinct eigenvalues, then $A$ is diagonalizable.

Example 3 Reduce the matrix

\[ A = \begin{bmatrix} -19 & -28 \\ 15 & 22 \end{bmatrix} \]

to a diagonal form.

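Solution. The matrix $A$ has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$, and the eigenvectors

\[ x_1 = \begin{bmatrix} 7 \\ -5 \end{bmatrix}, \quad x_2 = \begin{bmatrix} -4 \\ 3 \end{bmatrix}, \]

respectively. Let

\[ X = \begin{bmatrix} 7 & -4 \\ -5 & 3 \end{bmatrix}, \quad X^{-1} = \begin{bmatrix} 3 & 4 \\ 5 & 7 \end{bmatrix}. \]

Then

\[ X^{-1}AX = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}. \]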

 

It is natural to ask whether every matrix $A$ is diagonalizable, i.e. whether we can always find an invertible matrix $P$ such that $P^{-1}AP$ is a diagonal matrix. From Theorem 2 we see that if an $n\times n$ matrix $A$ is diagonalizable, then it must have $n$ linearly independent eigenvectors. Does every matrix have $n$ linearly independent eigenvectors? The following example shows that the answer is "no".

Example 4 The matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

does not have two linearly independent eigenvectors.

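Indeed, the matrix $A$ has only one eigenvalue, $\lambda = 1$. A vector

\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

is an eigenvector of $A$ if and only if $x \neq 0$ and $Ax = x$, i.e.

\[ \begin{bmatrix} x_1 + x_2 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \]

It follows that $x_2 = 0$, so every eigenvector of $A$ is a multiple of $e_1$, and any two eigenvectors of $A$ must be linearly dependent.

Therefore, by Theorem 2, the matrix $A$ in Example 4 is not diagonalizable.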
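The argument of Example 4 amounts to computing the dimension of the solution space of $(A - \lambda I)x = 0$, which equals $n$ minus the rank of $A - \lambda I$. A minimal sketch using Gaussian elimination in exact arithmetic follows; the function names `rank` and `eigenspace_dimension` are illustrative.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def eigenspace_dimension(A, lam):
    """Maximal number of linearly independent solutions of (A - lam*I)x = 0."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


# Example 4: only one independent eigenvector for the single eigenvalue 1.
print(eigenspace_dimension([[1, 1], [0, 1]], 1))  # 1

# The identity matrix has the same eigenvalue but a two-dimensional eigenspace.
print(eigenspace_dimension([[1, 0], [0, 1]], 1))  # 2
```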

9.4 Jordan blocks and Jordan matrices.

Since not every matrix is diagonalizable, our next goal is to find the simplest form to which a matrix can be reduced by similarity transformations. Motivated by Example 4, let us consider in more detail the structure of a more general matrix of the following form:

\[ J_m(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & 0 & \cdots & 0 \\ 0 & 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & \cdots & 0 & \lambda \end{bmatrix}, \tag{4} \]

i.e., $J_m(\lambda)$ is a square matrix of order $m\times m$ whose main diagonal consists of $\lambda$'s, whose diagonal immediately above the main diagonal consists of 1's, and whose remaining entries are zero. Thus, $J_m(\lambda)$ is an upper triangular matrix whose only eigenvalue is $\lambda$, of multiplicity $m$. The matrix $J_m(\lambda)$ is called a Jordan block (of size $m\times m$). In the next paragraphs we denote $J_m(\lambda)$ by $J$ for simplicity. It is not difficult to see that $J$ has the following important property:

Property of J:

\[ (J-\lambda I)e_m = e_{m-1}, \quad (J-\lambda I)e_{m-1} = e_{m-2}, \quad \dots, \quad (J-\lambda I)e_2 = e_1, \quad (J-\lambda I)e_1 = 0. \tag{5} \]
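The relations in (5) can be verified mechanically for any particular block. The sketch below builds a Jordan block and checks that $J - \lambda I$ shifts each standard basis vector to the previous one; the size and eigenvalue used are arbitrary illustrative values, and the helper names are hypothetical.

```python
def jordan_block(lam, m):
    """The m x m Jordan block: lam on the diagonal, 1 just above it, 0 elsewhere."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def unit_vector(k, m):
    """Standard basis vector e_k of length m (k is 1-based, as in the text)."""
    return [1 if i == k - 1 else 0 for i in range(m)]


m, lam = 4, 5  # illustrative size and eigenvalue
J = jordan_block(lam, m)
N = [[J[i][j] - (lam if i == j else 0) for j in range(m)] for i in range(m)]  # J - lam*I

# (J - lam*I) e_k = e_{k-1} for k = m, ..., 2, and (J - lam*I) e_1 = 0.
for k in range(m, 1, -1):
    assert matvec(N, unit_vector(k, m)) == unit_vector(k - 1, m)
assert matvec(N, unit_vector(1, m)) == [0] * m

# Consequently, applying J - lam*I to e_m exactly m times gives the zero vector.
v = unit_vector(m, m)
for _ in range(m):
    v = matvec(N, v)
assert v == [0] * m
```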

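The properties in (5) can be checked easily once we notice that

\[ J - \lambda I = J_m(\lambda) - \lambda I = J_m(0) = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \cdots & 0 & 1 \\ 0 & 0 & \cdots & \cdots & 0 & 0 \end{bmatrix}. \]

Therefore, the $m$ vectors

\[ e_m, \quad e_{m-1} = (J-\lambda I)e_m, \quad e_{m-2} = (J-\lambda I)^2 e_m, \quad \dots, \quad e_1 = (J-\lambda I)^{m-1}e_m \]

are linearly independent, and $(J-\lambda I)^m e_m = 0$. Generalizing this example, we introduce the following definition: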

Definition 3 Let $A$ be an $n\times n$ matrix and $\lambda$ be an eigenvalue of $A$. A vector $x$ is called a root vector of type $m$ of the matrix $A$, corresponding to the eigenvalue $\lambda$, if

\[ (A-\lambda I)^m x = 0 \quad\text{and}\quad (A-\lambda I)^{m-1}x \neq 0. \]

Root vectors are also called generalized eigenvectors. If $x$ is a root vector of type $m$, corresponding to $\lambda$, then $y = (A-\lambda I)^{m-1}x$ is a (nonzero) eigenvector corresponding to the eigenvalue $\lambda$, since $(A-\lambda I)y = (A-\lambda I)^m x = 0$. Eigenvectors are root vectors of type 1.
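Definition 3 translates directly into a procedure: apply $A - \lambda I$ repeatedly until the zero vector appears. The sketch below assumes exact (integer or rational) entries, and it relies on the standard fact, not proved in the text above, that a root vector of an $n\times n$ matrix has type at most $n$. The function names are illustrative.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def shifted(A, lam):
    """The matrix A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]


def root_vector_type(A, lam, x):
    """Smallest m >= 1 with (A - lam*I)^m x = 0, or None if x is not a root vector.

    Assumes exact arithmetic; the search stops after n steps because a root
    vector of an n x n matrix has type at most n (assumed standard fact).
    """
    N = shifted(A, lam)
    v = list(x)
    if not any(v):
        return None  # the zero vector is not a root vector
    for m in range(1, len(A) + 1):
        v = matvec(N, v)
        if not any(v):
            return m
    return None


def associated_eigenvector(A, lam, x):
    """The eigenvector y = (A - lam*I)^(m-1) x for a root vector x of type m."""
    m = root_vector_type(A, lam, x)
    if m is None:
        raise ValueError("x is not a root vector for this eigenvalue")
    N = shifted(A, lam)
    y = list(x)
    for _ in range(m - 1):
        y = matvec(N, y)
    return y


A = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]  # the Jordan block J_3(2)
print(root_vector_type(A, 2, [0, 0, 1]))        # 3
print(root_vector_type(A, 2, [1, 0, 0]))        # 1
print(associated_eigenvector(A, 2, [0, 0, 1]))  # [1, 0, 0]
```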

Theorem 4 If $x$ is a root vector of type $m$ of a matrix $A$, corresponding to $\lambda$, then the vectors

\[ x_1 = (A-\lambda I)^{m-1}x, \quad x_2 = (A-\lambda I)^{m-2}x, \quad \dots, \quad x_{m-1} = (A-\lambda I)x, \quad x_m = x \]

are linearly independent.

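Proof. We have to show that if

\[ a_1x_1 + a_2x_2 + \dots + a_m x_m = 0, \tag{6} \]

then $a_1 = a_2 = \dots = a_m = 0$. Applying the operator $(A-\lambda I)^{m-1}$ to both sides of (6), and taking into account that $(A-\lambda I)^{m-1}x_m = (A-\lambda I)^{m-1}x \neq 0$ and that

\[ (A-\lambda I)^{m-1}x_i = (A-\lambda I)^{m-1}(A-\lambda I)^{m-i}x = (A-\lambda I)^{2m-i-1}x = 0 \quad\text{for all } i = 1,2,\dots,m-1 \]

(because $2m-i-1 \ge m$ for such $i$), we have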