Chapter 7: Functions of matrices

7.1 Polynomials and exponential functions. Let A be a square matrix. Then An is defined for any n = 0,1,.... Hence, if p(l) is a polynomial given by

p(l) = anln+an-1ln-1+...+a1l+a0,

then one can define p(A) by

p(A) = anAn+an-1An-1+...+a1A+a0I.

It is clear that p(A) is again a matrix of the same order.

Definition 1 A sequence of matrices An is said to be convergent to a matrix A if aij(k) converges to aij, as k®¥, for every i,j = 1,2,...,n.

Definition 2 The series

 

¥
å
1 

An

is said to converge to A if it partial sums

Sn =

n
å
i = 1 

Ai

converge to A.

It is well known from calculus that the function f(x) = ex has the following representation in series:

ex = 1+

x


1!

+

x2


2!

+...+

xk


k!

+... =

¥
å
k = 0 

 

xk


k!

,

and that the series converges for every x. In the next theorem we show that analogous result holds true if instead of the scalar x we put a matrix A.

Theorem 1 For every matrix A, the series

I+

A


1!

+

A2


2!

+...+

Ak


k!

+...

converges.

Proof: Denote the elements of the matrix Ak by aij(k). Then the partial sum

Sk = I+

A


1!

+...+

Ak


k!

 

is a matrix whose elements are

s(k)ij = dij+

aij


1!

+

a(2)ij


2!

+...+

a(k)ij


k!

.

Denote, for convernience, aij(0) = dij. We need to show that the series

 

¥
å
k = 0 

 

a(k)ij


k!

 

converges, for every i,j = 1,2,...,n. Using the well known test of absolute convergence, we see easily that it is enough to show that there exists a positive number r > 0, depending on A but independent of k, such that

|a(k)ij| £ rk,   for all   k = 1,2,...

(1)

We choose r = åi,j = 1n |aij| and prove the inequality by induction.

The case k = 1: The inequality (?) for k = 1 is obvious.

Assume that (1) for k = m. We prove that it also holds for k = m+1. We have, using the identity Am+1 = AmA, that

a(m+1)ij = a(m)i1a1j+a(m)i2a2j+...+a(m)inanj

Hence

|a(m+1)ij| £ lm (|a1j|+a2j|+...+anj| £ lambdaml = lm+1

which is what is required to prove.

The sum of the series in (?) is denoted by eA.

Example: Let A be a diagonal matrix with the diagonal elements l1,...,ln. Then Ak is a diagonal matrix whose diagonal consists of l1k,l2k,...,lnk. From this one can see easily that eA is a diagonal matrix whose elements are el1, el2,...,eln.

7.2 The Caley-Hamilton Theorem.

Theorem 2 If D(l) º DA(l) is the characteristic polynomial of the matrix A, then D(A) = 0.

Proof for the case of simple spectrum:

We first give the proof of Calley-Hamilton's Theorem for the special case of matrices with simple spectrum (i.e. for matrices having distinct eigenvalues).

Assume that A have distinct eigenvalues l1,l2,...,ln, and le xi be an eigenvector correspoonding to the eigenvalue li (i = 1,2,...,n). By Theorem ?, the vectors xi are linearly independent. Therefore, the linear hull L(x1,...,xn) coincides with Cn. Since (A-li)xi = 0 and

D(A) = (-1)n(A-l1I)(A-l2I)...(A-lnI),

it follows that D(A)xi = 0, for all i = 1,2,...,n. Hence D(A)y = 0 for all y Î L(x1,...,xn) = Cn, so that D(A) = 0.

Proof for the general case: In order to prove that D(A) = 0, we need to show that the only eigenvalue of D(A) is 0. So, assume that l is an eigenvalue of D(A). Consider the set E of all eigenvectors of D(A) corresponding to the eigenvalue l, i.e.

E = {x Î Cn: D(A)x = lx}.

As we have seen before (and easily verified), E is a nonzero linear subspace of Cn. The subspace E is invariant with respect to A in the following sense: if x Î E, then Ax Î E. In fact, if x Î E, then D(A)x = lx. Therefore, D(A)Ax = AD(A)x = A(lx) = lAx, i.e. Ax Î E.

It follows that we can consider the restriction of the operator A to the subspace E. Such an operator must have an eigenvalue and an eigenvector (see?). Thus, there exists m and x Î E such that

Ax = mx.

Clearly, m is an eigenvalue of the matrix A, hence there exists a li such that m = li, and (A-liI)x = 0. This implies that

D(A)x = (-1)n(A-l1I)...(A-liI)...(A-lnI)x = 0.

On the other hand, D(A)x = lx, so we have lx = 0, or l = 0. Thus, all the eigenvalues of D(A) must be zero, so D(A) = 0.

Caley-Hamilton's Theorem also gives another formula for the inverse matrices. Namely, if A is invertible, then |A| = (-1)na0 ¹ 0, hence

A(An-1+an-1An-2+...+a1I) = -a0I,

so that

A-1 = -a0-1(An-1+....).

7.3 Applications of Calley-Hamilton's Theorem.

Let A be a given matrix with the characteristic function D(l), and let f be an arbitrary polynomial:

f(l) = amlm+am-1lm-1+...+a1l+a0.

One can use the formula (?) to calculate f(A), but it m is a large number, then such method is not convenient. Calley-Hamilton's Theorem provides an effective alternative method of calculation of f(A).

First we note that one can always find a polynomial q(l) and r(l such that the degree of r(l) is less than the degree of D(l), and

f(l) = D(l)q(l)+r(l).

(2)

From (?) it follows that f(A) = D(A)q(A)+r(A), so by Calley-Hamilton's Theorem we have f(A) = r(A), i.e. we have reduced the calculation of f(A) to r(A), where r(A) always has the degree less than the order of the matrix A.

In general, representation (?) can be found, e.g. through the so called long division method. However, since we do not need the information about q(l) and we need only to find r(l), we can apply a different, usually more efficient, method. This method uses the information about roots of the polynomial D(l).

Namely, it is easy to see from formula (?) that if li is a root of D(l), then

f(li) = r(li).

(3)

More generally, and this also can be seen from formula (?), if li is a root of D(l) of multiplicity ki, then we have

 

f(li) = r(li),

(4)

f¢(li) = r¢(li),

(5)

....

(6)

f(ki-1)(li) = r(ki-1)(li).

(7)

 

Formula (?)-(?) give relations for determining the coefficients of r(A).

Example 1 Let

A =

é
ê
ë

 

1

2

2

1

 

ù
ú
û

 

and f(l) = l5-3l4+l2+2. Find f(A).

Answer: Since A is a 2×2 matrix, the degree of r(l) is one, i.e.

r(l) = al+b.

The eigenvalues of the matrix A are l1 = 3, l2 = -1. Hence, by formula (?):

 

r(3) = 3a+b = f(3) = 11

(8)

r(-1) = -a+b = f(-1) = -1.

(9)

 

Solving the system (?)-(?), we obtain a = 3, b = 2. Hence r(l) = 3l+2. Therefore

f(A) = A5-3A4+A2+2I = r(A) = 3A+2I =

é
ê
ë

 

5

6

6

5

 

ù
ú
û

.

Example 2 Let

A =

é
ê
ë

 

1

-1

1

3

 

ù
ú
û

 

and f(l) = l7-4l5-10l3+78. Find f(A).

Answer:

Since A is a 2×2 matrix, the degree of r(l) is one, i.e.

r(l) = al+b.

The characteristic polynomial of A is D(l) = l2-4l+4, which has a root l = 2 of multiplicity 2. Hence, by formula (?):

 

r(2) = 2a+b = f(2) = -2

(10)

r¢(2) = a = f¢(2) = 8.

(11)

 

Solving the system (?)-(?), we obtain a = 8, b = -18. Hence r(l) = 5l-18. Therefore

f(A) = A7-4A5-10A3+78I = r(A) = 8A-18 I =

é
ê
ë

 

-10

-8

8

6

 

ù
ú
û

.

The above method can be applied not only to polynomials but also to the exponential functions.

Example 3 Find eA, where

A =

é
ê
ë

 

1

-1

1

3

 

ù
ú
û

.

Asnwer: We apply the above procedure to the function f(l) = el. The eigenvalue of A is l = 2 and has multiplicity 2. Hence the function r(l) = al+b must satisfy the following conditions:

 

f(2) = e2 = r(2) = 2a+b,

(12)

f¢(2) = e2 = r¢(2) = b.

(13)

 

Solving (?)-(?), we find a = 0, b = e2. Hence

eA = e2I =

é
ê
ë

 

e2

0

0

e2

 

ù
ú
û

.