9.1 Similar matrices.
Definition 1 A matrix A is said to be similar to a matrix B if there exists an invertible matrix P such that A = P-1BP.
Example 1 The matrices
|
are similar because A = P-1BP
with
|
Example 2 The matrices
|
are similar because A = P-1BP
with
|
It is easy to see that the following statements hold:
(i) Every matrix is similar to itself. Indeed, we can choose P = I in this case. (ii) If a matrix A is similar to a matrix B, then B is similar to A. Indeed, if A = P-1BP, then B = PAP-1 = Q-1AQ, with Q = P-1. Therefore, we can use the terminology "two similar matrices". (iii) If a matrix A is similar to a matrix B, and B is similar to a matrix C, then A is similar to C. Indeed, if A = P-1BP, B = Q-1CQ, then A = P-1Q-1CQP = (QP)-1C(QP).
(iv) If matrices A and B are similar, then their characteristic polynomial coincide. Indeed, if A = P-1BP, then (A-lI) = P-1(B-lI)P. Applying the theorem on determinant of products of matrices, we have
|
From (iv) it follows immediately that
(v) If A and B are similar, then they have the same eigenvalues.
9.2 Matrix representations in different bases.
We know that every square matrix A generates a linear operator, also denoted by A, on the space Cn. If we denote by (A)i the i-th column of A, then
|
Suppose now that x1, x2,..., xn are arbitrary n-linearly independent vectors in Cn. As we know (see Chapter 2), every vector x in Cn is a linear combination of x1, x2,..., xn. In particular, the vector Axi, i = 1,2,...,n, are linear combinations of x1, x2,..., xn. Thus, we can write
|
(1) |
Therefore, a new matrix B = [bij]n×n has arisen, which for the given A, depends on the choice of the basis x1, x2,..., xn. We call B the matrix representation of A in the basis x1, x2,..., xn. Let consider a matrix X whose i-th column is xi, i = 1,2,...,n. Thus, we can write X in this way:
|
Since x1, x2,..., xn are linearly independent, the matrix X is invertible. We want to show that AX = XB, and for this we compare the j-th column of AX with the j-th column of XB. Since the common {i,j}-entry of the matrix AX, (AX)ij , is
|
we see that the j-column of AX coincides with Axj, wich is
|
(2) |
Analogously, the {i,j}-entry of the matrix XB, (XB)ij , is
|
hence the j-column of XB is
|
(3) |
Comparing (2) and (3), we see that the j-columns of AX and XB coincide, for all j = 1,2,...,n, which implies that AX = XB. Since X is invertible, we have B = X-1AX, which implies that A and B are similar matrices.
Thus, we have proved the following theorem.
Theorem 1 Let A is a given square matrix of order n×n. Assume that x1,...,xn are linearly independent vectors in Cn, and B is the matrix of the operator A in the basis x1,...,xn, defined by (1). Then A and B are similar matrices. Moreover, the matrix X whose columns are the vectors x1,...,xn is the matrix of similar transformation, i.e. B = X-1AX.
It is not difficult to see that the statement converse to Theorem 1 also holds. Namely, if B is a matrix which is similar to the matrix A, then there exists a basis x1,...,xn such that the matrix representation of A in this basis coincides with B. In fact, if
|
for some invertible matrix P, then it is enough to define
|
and the validity of this statement is obvious from the above analysis. In connection with Theorem 1 a natural question arises: given a matrix A, can we find a basis x1,...,xn so that the representation of A in this basis, i.e. the corrsponding matrix B, will have a simplest form?
As a simple form we can consider matrices of special form such as diagonal, matrix, upper or lower triangular, and so on. Let us start with the most simple case of matrices which are similar to diagonal ones.
Diagonalizable matrices.
Definition 2 A matrix A is called diagonalizable if it is similar to a a diagonal matrix.
Theorem 2 If A is diagonalizable, then A has n linearly independent eigenvectors.
Proof. Assume that A is diagonalizable, i.e. there exist a diagonal operator D and an invertible operator P such that
|
Let l1,...,ln be the diagonal elements of D. If we denote, as before, by ei the standard basis vector in Cn, then ei are linearly independent and D ei = li ei, i = 1,2,...n. Let xi = P-1ei. Then xi, i = 1,2,...,n, are linearly independent, and
|
i.e. xi, i = 1,2,...n, are n linearly independent eigenvectors of A.[¯]
It is remarkable that the converse statement to Theorem 2 also holds.
Theorem 3 If a matrix A has n linearly independent eigenvectors, then A is diagonalizable.
Proof. Assume that A has n linearly independent eigenvectors x1,...,xn, so that Axi = lixi for some li, i = 1,2,...,n. Hence, the matrix representation of A in the basis x1,...,xn is a diagonal operator D with diagonal elements li, i = 1,2,...n. By Theorem 1, A is similar to D, what is required to prove.[¯]
Theorem 3 also gives the method of reducing A to a diagonal form once n linearly independent eigenvectors of A are known: it is enough to form the matrix X whose columns coincide with xi, i = 1,2,...,n, and compute X-1AX.
Theorem 3 and the fact that eigenvectors corresponding to different eigenvalues are linearly independent (Theorem ? of chapter 5 ) imply the following corollary.
Corollary 1 If a matrix A has distinct eigenvalues, then A is diagonalizable.
Example 3 Reduce the matrix
|
to a diagonal form.
Solution. The matrix A has eigenvalues l1 = 1, l2 = 2, and the eigenvectors
|
respectively. Let
|
Then
|
It is natural to ask if every matrix A is diagonalizable, i.e. can we always manage to find an invertibe matrix P such that P-1AP is a diagonal matrix? From Theorem 2 we see that if a matrix A is diagonalizable, then it must have n linearly independent eigenvectors. Does every matrix have n linearly independent eigenvectors? The following example shows that the answer is "no".
Example 4 The matrix
|
does not have two linearly independent eigenvectors.
Indeed, the matrix A has only one eigenvalue l = 1. A vector
|
is an eigenvector of A if and only if Ax = x, i.e.
|
It foolows that x2 = 0, so that any two eigenvectors of A must be linearly dependent.
Therefore, the operator A in example 5 is not diagonalizable.
9.4 Jordan blocks and Jordan matrices.
Since not every matrix is diagonalizable, our next goal is to find a simplest form that a matrix can be reduced to by similar transformations. Motivated by Example 5, let us consider in more details the structure of a more general matrix of the following form
|
(4) |
i.e., Jm(l) is a square matrix of order m×m whose main diagonal consists of l's and the diagonal above the main diagonal consists of 1's, and zeros elsewhere. Thus, Jm(l) is an upper triangular matrix which has only eigenvalue l of multiplicity m. Matrix Jm(l) is called a Jordan block (of size m×m). In next paragraphs we denote Jm(l) by J for simplicity. It is not difficult to see that J has the following important property:
Property of J:
|
(5) |
The properties in (50 can be checked easily once we notice that
|
Therefore, the n vectors
|
are linearly independent, and (J-lI)mem = 0. Generalizing this example, we introduce the following definition:
Definition 3 Let A be an n×n matrix and l be an eigenvalue of A. A vector x
is called a root vector of type m of a matrix A, corresponding to the
eigenvalue l, if
|
Root vectors are also called generalized eigenvectors. If x is a root vector of type m, corresponding to l, then y = (A-lI)m-1x is an (nonzero) eigenvector corresponding to the eigenvalue l. Eigenvectors are root vectors of type 1.
Theorem 4 If x is a root vector of type m of a matrix A,
corresponding to l, then the vectors
|
are linearly independent.
Proof. We have to show that if
|
(6) |
then a1 = a2 = ... = am = 0. Apply the operator (A-lI)m-1 to both parts of (6), and taking into account that (A-lI)m-1xm = (A-lI)m-1x ¹ 0,
(A-lI)m-1 xi = (A-lI)m-1(A-lI)m-ix = (A-lI)2m-i-1x = 0 for all i = 1,2,...,m-1, we have
|
which implies am = 0. Hence, (6) is reduced to:
|
(7) |
Next, apply the operator (A-lI)m-2 to both parts of (7) and repeat the above argument, we obtain am-1 = 0, and so on until we obtain a1 = 0. Thus, a1 = a2 = ... = am = 0, which is what is required to prove.[¯]
Definition 4 A set of vectors xm, xm-1,...,x2,
x1 is called a chain for the given matrix A if xm
is a root vector of type m and
|
Thus, if xm, xm-1,...,x2, x1 is a chain corresponding to l, then x1 is a root vector of type 1, i.e. an eigenvalue, x2 is a root vector of type 2,..., xj is a root vector of type j, for all j = 1,2,...,m.
Definition 5 A matrix A of order n×n is said to be unicellular if it has a chain consisting of n vectors xn, xn-1,...,x2, x1.
Theorem 5 Every unicellular matrix is similar to a Jordan block.
Proof. Assume that A is a unicellular matrix, i.e. A has a chain of n vectors xn, xn-1,...,x2, x1. Thus, xn is a root vector of type n and
|
By Theorem 4, xn, xn-1,...,x2, x1 are n linearly independent vectors, so they form a basis in Cn. The matrix B = (A-lI) acts on the vectors of this basis in the following manner:
|
Hence, the matrix representation of (A-lI) in the basis xn, xn-1,...,x2, x1 coincides with Jn(0). By Theorem 1, (A-lI) is similar to Jn(0), i.e. there exists an invertible matrix P such that
|
(8) |
From (8) it follows that A = lI+P-1Jn(0)P = P-1(Jn(0)+lI)P = P-1Jn(l)P, which is what is required to prove.[¯]
Definition 6 A matrix A is called a Jordan matrix if A has a block diagonal form such that the diagonal blocks are Jordan blocks.
In other words, Jordan matrix is a matrix of the following form:
|
(9) |
The culmination of the matrix theory is the following theorem.
Theorem 6 Every matrix is similar to a Jordan matrix.
Definition 7 If A is an arbitrary matrix, and J is a Jordan matrix which is similar to A, then J is called the Jordan canonical form of A.
As preliminary steps to the proof of Theorem 6, we need to develop some further techniques related to the root vectors and the so called minimal functions.
9.5 Minimal polynomials.
Recall that the Calley-Hamilton's theorem states that if A is an arbitrary matrix and D(l) is its characteristic polynomial, then D(A) = 0. It follows that there are many polynomials that satisfy this conditions, e.g. if p(l) is divisible by D(l), then p(A) = 0.
Definition 8 A plolynomial p(l) is called an annihilating polynomial of a matrix A if p(A) = 0.
Lemma 1 If p(A) is an annihilating polynomial of a matrix A, and li is an eigenvalue of A, then p(li) = 0.
Proof. Since li is an eigenvalue of A, there exists a nonzero vector x such that Ax = li x. Then we have p(A)x = p(li)x = 0, which implies p(li) = 0.[¯]
Among all annihilating polynomials of A there is a polynomial of smallest degree.
Definition 9 An annihilating polynomial of A of the smallest degree is called minimal polynomial of A.
We denote the minimal polynomial of A by MA(l). It is defined uniquely up to a multiplication by a scalar.
Lemma 2 If A is a matrix with charateristic polynomial
|
(10) |
and M(l) is the minimal polynomial of A, then D(l) is divisible by M(l).
Proof. We use the method of proof by contradiction. Assume that D(l) is not divisible by M(l). This means that there exist polynomials q(l) and r(l) such that r(l) ¹ 0, the degree of r(l) is less than the degree of M(l) and
|
(11) |
From (11) it follows that
|
Since D(A) = 0, M(A) = 0, it follows that r(A) = 0. Thus, we have found a polynomial r(l) with the degree less than that of M(l), such that r(A) = 0. This is a contradiction because M(l) is the minimal polynomial of A.[¯]
Lemmas 1 and 2 imply the following corollary
Corollary 2 Let D(l)
be the characteristic polynomial of a matrix A given in (10).
Then the minimal polynomial of A has the following form (up to a
multiple coefficient):
|
where r1,r2,...,rk are some integers which satisfy the condition 1 £ ri £ mi, for all i = 1,2,...,k.
The minimal polynomial M(l) of A may have the same degree as D(l), or stricly less.
Example 5 Let A = a I be the scalar matrix of order n×n. Then DA(l) = (a-l)n, MA(l) = (a-l).
Example 6 Let
|
then DA(l) = MA(l) = (1-l)(2-l) (up to a multiple coefficient).
Example 6 can be generalized as in the following theorem.
Theorem 7 If a matrix A has simple spectrum, then DA(l) = MA(l) (up to a multiple coefficient).
Proof. Assume that A has simple spectrum, i.e. the eigenvalues l1,l2,...,ln of A are distinct. By Lemma 1, M(li) = 0, i = 1,2,...,n. Thus M(l) is divisible by (l1-l)(l2-l)...(ln-l), i.e. M(l) is divisible by D(l). Since M(l) is the minimal polynomial, it follows that M(l) = D(l) (up to a multiple coefficient).[¯]
9.6 Direct sum of subspaces.
Definition 10 Let E1,..., Ek be (nonzero)
linear subspaces of Cn. We say that Cn has a decomposition
into the direct sum of E1, E2,..., Ek, and
write
|
(12) |
if every vector x Î Cn has a unique decomposition
|
(13) |
Let dim Ei = mi.
Lemma 3 If Cn has a decomposition into direct sum as
in (12), then m1+m2+...+mk = n. Moreover, if
|
are mi linearly independent vectors in Ei, then ei,j: j = 1,2,..., mi, i = 1,2,...,k are n linearly independent vectors in Cn.
Proof. We have to prove that if
|
then aij = 0 for all i, j. Put
|
Then xi Î Ei, and (13) can be rewritten as
|
Since (12) is a direct sum, it follows that xi = 0, for all i = 1,2,..., k, i.e.
|
for each i = 1,2,...,k. Since the vectors ei,j: j = 1,2,..., mi are linearly independent, it follows that aij = 0 for all j = 1,..., mi and for all i.[¯]
Assume that A is a given matrix and we have a direct sum in (12) such that every subspace Ei is invariant with respect to A. Let ei,j: j = 1,2,..., mi be mi linearly independent vectors in Ei. Then ei,j: j = 1,2,..., mi is a basis of Ei, and the ei,j: j = 1,2,..., mi, i = 1,2,..., k is a basis of Cn. It is not difficult to see that the matrix representation of A in the basis
|
will have a block diagonal form
|
(14) |
Example 7 Assume that A has n linearly independent
eigenvectors x1,..., xn, Axi
= lixi,
i = 1,2,...,n. Let Ei be one-dimensional subspace generated by xi.
Then it is not difficult to see that Ei are invariant subspaces of A,
|
and the block diagonal form of A according to this decomposition coincides with the diagonal representation of A in the basis xi, i = 1,2,...n.
9.7 Spectral decompositions.
We introduce the following subspace Wi which plays an important role in the construction of the Jordan canonical form.
|
Theorem 8 The space Cn is the direct sum of the
subspace Wi, i.e.
|
Proof. We know that the minimal function of A is
|
Put
|
It follows from the known properties of polynomials (see Appendix A), that there exist polynomials P1(l), P2(l),.., Pk(l) such that
|
(15) |
From (15) we have
|
(16) |
Let x be a given vector in Cn. We must show that x has a unique decomposition
|
with xi Î Wi, i = 1,2,..., k. Let
|
Since (A-li I)rixi = M(A)Pi(A)x = 0, it is immediate that xi Î Wi. From (16) it foolows that x = x1+...+xk. It remains to show that the decomposition is unique. Assume, for this reason, that
|
(17) |
is another decomposition such that yi Î Wi, we have to show yi = xi for all i = 1,2,...,k. Applying to both parts of (17) the operator P1(A)Q1(A), and noting that Q1(A)y2 = Q1(A)y3 = ... = Q1(A)yk = 0, P2(A)Q2(A)y1+...+Pk(A)Qk(A)y1 = 0, we have
|
hence x1 = y1. The same argument shows that yi = xi for all i = 2,3,...,k, which is what is required to prove.[¯]
It is not difficult to see that each subspace Wi is invariant with respect to A. Indeed, if x Î Wi, then (A-li I)rix = 0, hence (A-li I)riAx = A(A-li)rix = A0 = 0, hence Ax Î Wi. Put mi = dim Wi. It follows that m1+m2+...+mk = n.
Lemma 4 ri £ mi, for all i = 1,2,...,k.
Proof. Applying Lemma 3, we see that if we choose in each subspace Wi a basis ei,j: j = 1,2,..., mi, then
|
is a basis of Cn in which the matrix representation of A has a block diagonal form as in (14). The block Ai in (14) corresponds to the restriction of A to Wi. [¯]
Lemma 5 Let Ai be the block of A in the subspace Wi. Then (li-l)ri is the minimal function of Ai.
Proof. Let us denote by Mi(l) the minimal polynomial of Ai. It follows from the definition of Wi that (li-l)ri is an annihilating polynomial of Ai, for all i = 1,2,...,k. Thus, for each i = 1,2,...,k, there is pi £ ri such that Mi(l) = (li-l)pi. Since Mi(Ai) = 0, it follows that if N(l) = M1(l)...Mk(l) then
|
i.e. N(l) is an annihilating polynomial of A. It follows that
|
(18) |
If at least for one i we have pi < ri, then the degree of N(l) would be strictly less than the degree of M(l), which is a contradiction to (18). The lemma is proved.[¯]
In order to show that A is similar to a Jordan matrix it remains to show that each block Ai in (14) is similar to a Jordan matrix. Each block Ai is characterized by the property that its has only one eigenvalue li (i.e. one point spectrum). Moreover, (li I-Ai)ri = 0 and (li I-A)ri-1 ¹ 0. Put B = (li I-Ai). Then Bri = 0.
Definition 11 A matrix B is called a nilpotent matrix of there exists r such that Br = 0.
Thus, to show that Ai is a jordan matrix, it remains to show that every nilpotent matrix is similar to a Jordan matrix. But first let us introduce the notion of complementary subspaces.
9.8 Complementary subspaces.
Definition 12 Let E be a linear subspace of Cn. A subspace F of Cn is called complementary subspace to E in Cn if C is the direct sum of E and F.
Lemma 6 A complementary subspace to a given subspace E of Cn always exists.
Proof. For a given subspace E of Cn, we present a construction of a complementary subspace F in the following way. Choose a basis in E, say x1,..., xk, and extend it to be a basis x1,...,xk, xk+1,..., xn in Cn. Set F = L(xk+1,..., xn). It is not difficult to see that Cn is a direct sum of E and F. [¯]
Analogously, if E1 is a subspace of E then a subspace E2 of E is called a complementary subspace to E1 in E2 if E is a direct sum of E1 and E2. Moreover, we have dim E = dim E1 +dim E2. There are infinitely many subspaces complementary to a given subspace E1 in E (E1 ¹ E).
Definition 13 Let E1 be a subspace of Cn. The vectors x1,x2,..., xp in Cn are called linearly independent over E1 if x1,..., xp are linearly independent and such that the sum E+F, where F1: = L(x1,...,xp), is a direct sum.
In other words, x1,..., xp are linearly independent over E1 if from
|
it follows that a1 = a2 = ... = ap = 0.
Lemma 7 If E1 is a linear subspace of a space E and dim E = r and dim E1 = q, then there are r-q vectors in E which are linearly independent over E1.
Proof. The existence of r-q vectors satisfying the condition in the lemma follows immediately from the argument in Lemma 7 . On the other hand, if there are more than r-q vectors x1,..., xl, l > r-q, which are linearly independent over E1, then dim E = q+l > r, which is a contradiction. The lemma is proved.[¯]
9.9 Nilpotent matrices.
Assume that B is a nilpotent matrix of order m×m and r is a positive integer such that
|
(19) |
We put
|
Lemma 8 We have the following strict inclusions
|
Proof. If x Î Wi, then Bix = 0. This implies that Bi+1 = 0. i.e. x Î Wi+1. This proves the inclusion Wi Ì Wi+1. Next we show that all the inclusions are strict, i.e. Wi ¹ Wi+1 for all i = 0,1,..., r = 1. Assuming the contrary, i.e. Wi = Wi+1 for some i, 0 £ i < r. Then we have Br-1-iWi = Br-1-iWi+1, which implies Wr-1 = Wr = Cm. Hence Br-1 = 0, which is a contradiction to (19). The lemma is proved.[¯]
Let
|
We have, by Lemma ?, 0 = d0 < d1 < ... < dr = m. Since Wr-1 ¹ Wr, there are linearly independent vectors x1,..., xp1, (p1 = dr-dr-1) which are linearly independent over Wr-1.
Lemma 9 The vectors
|
(20) |
are linearly independent over Wr-2.
Proof. We prove by contradiction. Assume that Bx1, ..., Bxp1 are not linearly independent over Wr-2. That is, there exists a vector y Î Wr-2, y ¹ 0, such that
|
(21) |
Applying Br-2 to both parts of (21), noting that Br-2y = 0, we have
|
(22) |
From (22) it follows that a1 x1+...+ap1xp1 Î Wr-1, which is a contradiction sine x1,..., xp1 are linearly independent over Wr-1. The lemma is proved.[¯]
Corollary 3 dr-1-dr-2 ³ dr-dr-1 .
Proof. Apply Lemma 8 and Lemma 9.[¯]
Thus, we have a system of p1 vectors Bx1,..., Bxp1 in Wr-1. Since dr-1-dr-2 ³ p1, we can extend this system Bx1,..., Bxp1 to a system of dr-1-dr-2 vectors linearly independent over Wr-2, by adding new ones xp1+1, xp1+2,..., xp2 ( p2 = dr-1-dr-2). Next we apply the operator B to the obtained system of vectors Bx1,...,Bxp1, xp1+1,..., xp2 in Wr-1, so that we obtain vectors
|
(23) |
in Wr-2. By the same reasoning, we see that the vectors in (23) are linearly independent over Wr-3. As in Lemma 10, we see that dr-2-dr-3 ³ dr-1-dr-2 ( = p2), so that we can add new vectors xp2+1,...,xp3 to obtain a system consisting of dr-2-dr-3 vectors linearly independent over Wr-3 (p3 = dr-2-dr-3). We continue this process to exhaust all the subspaces Wr-3, Wr-4,...,W1, W0 = {0}. We list the obtained systems of vectors in the following table.
|
(24) |
The vectors in this table are linearly independent and form a basis of Cm. They are arranged in columns, so we see that the first p1 columns contain r vectors, the next (p2-p1) columns contain (r-1) vecrors, and so on. The linear hull of vectors of each column is an invariant subspace with respect to B, hence Cm is a direct sum of invariant subspaces for B. On each of the subspace the matrix B, in the chosen "column" basis, is a Jordan block, according to Theorem 5. Hence B is similar to a Jordan matrix. Thus, we have proved the Theorem 6.
9.10 Examples of Jordan forms. In this section we show how to practically calculate the Jordan canonical form of a matrix.