3.1 Matrices as linear maps . Every matrix A of order m×n can be considered as a mapping from the space Cn (of column vectors of dimension n) to the space Cm in the following way:
If u Î Cn, then Au = v is a vector in Cm which is obtained as a result of the product of the matrix A and the ''matrix'' u. Since A and u have orders m×n and n×1, respectively, the product Au is defined and is a matrix of order m×1, i.e. a column vector in Cm. We denote the mapping defined in this way by the same symbol A, and it always will be clear from the context wheather we are talking about a matrix A or the mapping generated by A.
The mapping A defined above satisfies the following properties which follow immediately from properties (15)-(16)of Chapter 1:
|
(1) |
for all vectors u, v in Cn and scalars l. Property (1) is equivalent to
|
(2) |
Because of these properties (1)-(2) the mapping A is called a linear mapping, or a linear transformation, a linear operator, or simply an operator, from the (linear) space Cn to Cm. In the sequel we will just call A an operator.
If Au = v, then v is called the image of u under the mapping A. The set of all image vectors under A is called the range of A, and denoted by R(A).
From the linearlity of A, (1), it follows that R(A) is a linear subset of Cm, i.e. R(A) contains, together with vectors u1,u2,...,uk, all their linear combinations. Linear subsets are called linear subspaces, so that R(A) is a linear subspace of Cm. It may happen that R(A) = Cm (such as for A = I), or R(A) = {0} (such as for A = 0) (where by {0} we denote the subspace consisting of only the zero vector). But in general R(A) is a subspace of Cm which is different from both Cm and {0}. The operator A is said to be onto, or surjective, if its range, R(A), coincides with the whole Cm.
We denote by N(A) the subset of all vectors u Î Cn such that Au = 0. The linearity property of A also implies immediately that N(A) is a linear subset, i.e. a subspace, of Cn. It is called the null-space, or the kernel, of the operator A. It may happen that N(A) = Cn (example: A = 0) or N(A) = {0} (example: A = I), but in general N(A) is a subspace of Cn which is different from both Cn and {0}. It follows immediately from the definitions and the linearity property (1) that A is one-to-one if and only if N(A) = {0}. We will also speak about the range or the kernel of a matrix A as of the corresponding operator associated with A in the above manner.
Let ak, k = 1,2,..,n denote the k-th column of the matrix A, i.e.
|
and let ek, k = 1,2,...,n, be vectors in Cn of the form
|
Then it is easy to see that ak = Aek. Since ek, k = 1,2,...,n, form a basis of Cn, i.e. the linear hull of ek, k = 1,2,...,n coincides with Cn, it follows that R(A) = L(a1,...,an) and that r(A) is equal to the dimension of R(A).
The correspondence between linear operators from Cn to Cm is one-to-one, in the sense that any linear operator f from Cn to Cm is generated by a matrix A in the above manner. Namely, if we take a matrix A whose colums ak sastisfies:
|
then A = f.
Theorem 1 If A is a one-to-one operator from Cn to Cm, then n £ m, and rank(A) = n.
Proof: Take any basis of Cn, for instance, the standard basis e1,...,en. Let
|
Since A is one-to-one, it follows that vk, k = 1,2,..., are linearly independent vectors in Cm. Indeed, if
|
then
|
so that
|
(because A is one-to-one), which implies a1 = a2 = ... = an = 0, since e1,...,en are linearly independent.
Thus, we have found n linearly independent vectors in Cm, which is possible only if m ³ n.
Since Aek, k = 1,2,...,n are linearly independent, we have n £ ran(A). On the other hand, the column rank of A, which is the same as rank(A), is always £ n - the number of columns, we have ran(A) = n.
The following theorem is in a sense '' dual'' to Theorem (1).
Theorem 2 If A is an ''onto'' operator from Cn to Cm, then n ³ m and rank(A) = m.
Proof: Let e1, e2,...,em be the standard basis of Cm. Since A is ''onto'', there exist u1,...,um in Cn such that Auk = ek,k = 1,2,...,m. The vectors uk, k = 1,2,...,m, are linearly independent. Indeed, if
|
then
|
which implies that a1 = a2 = ... = am = 0, since e1,e2,...,em are linearly independent.
Thus, we have found m linearly independent vectors in Cn, so that n ³ m.
Th proof that rank(A) = m is analogous to the proof of the corresponding statement in Theorem 1 and is left as an excercise.
Therems (1) and (2) combined show that an invertible operator exists from Cn to Cm if and only if m = n, i.e. the matrices corresponding to invertible operators are square matrices.
Theorem 3 A square matrix A is invertible if and only if it is silmultaneously one-to-one and onto.
Actually, a stronger statement than theorem 3 holds. Namely, only one of the conditions above above (i.e. ''onto'' or ''one-to-one'') is enough for A to be an invertible matrix. We shall prove this stronger fact later but we also prefer to give a simple direct proof of Theorem 3.
Proof: Assume that A is invertible, i.e. there exists B such that AB = BA = I. If Au = 0, then, multiplying both parts by B from the left, we have, on one hand, BAu = B0 = 0, and on the other hand, BAu = Iu = u. Thus, u = 0. This shows that A is one-to-one.
If v is an arbitrary vector in Cn, we put u = Bv. Then, Au = ABv = Iv = v. Hence, v Î R(A), so that R(A) = Cn, or A is onto.
Conversely, assume that A is one-to-one and onto. Since A is onto, for every v Î Cn there exists an u in Cn such that Au = v. Since A is one-to-one, the element u is unique. Therefore, we can define a mapping B from Cn to Cn by putting
|
It is clear that the mapping B is the inverse mapping of A, i.e. ABu = BAu = u for all u Î Cn.
Since if Au1 = v1, Au2 = v2, then
|
it follows that
|
i.e. B is a linear operator. Thus, A is invertibe and A-1 = B.
Theorem 4 Let A be a square matrix of order n×n, such that ran(A) = n. Then A is invertible.
Proof:According to Theorem 3, one only needs to show that A is one-to-one.
Assume, on the contrary, that A is not one-to-one. Then there exists a nonzero vector x1 in Cn, such that Ax1 = 0.
The vector x1 is linearly independent by itself, and can be complemented by other vectors x2,..., xn to a maximal linearly independent subset in Cn. Theorem 5 Let A be a square matrix of order n×n, such that N(A) = {0}. Then A is invertible.
Consider a systems of linear equations
|
(4) |
and its matrix form
|
(5) |
Together with (5) consider also the homogeneous equation
|
(6) |
Note that the homogeneous equation (6) always has a solution x = 0. Such a solution is called a trivial solution. A vector x is a solution of the homogeneous equation (5) if and only if x Î N(A). Thus, equation (6) has a nontrivial solution if and only if N(A) ¹ {0}. For general solutions of the nonhomogeneous equation (5) we have the following theorem the proof of which is straightforward.
Theorem 6 Let x0 be a particular solution of Eq.(5). Then for every x Î N(A), x0+x is a solution of Eq.(5). Moreover, every solution y of Eq.(5) can be written as y = x0+x, where x is some solution of the homogenepus equation (6).
It is clear that the system (4)
(i) has at least one solution if and only if b Î R(A);
(ii) has at most one solution if and only if N(A = {0}; and
(iii) has precisely one solution if both (i) and (ii) hold. Thus, we have the following theorem.
Theorem 7 In order for Eq. (4) to have a unique solution for every b Î Cm, it is necessary and sufficient that the operator A is invertible (in particular, m = n).
If the inverse matrix A-1 is known, then the unique solution og Eq.(4) is obtained by the simple formula
|
Thus, it is an important question how to find the inverse matrices when they exist. In the next section, we will give a practical method of finding inverse matrices. The formula for the inverse matrices will be given in the next chapter after we introduce the notion of determinant of a matrix.
3.2 Finding the inverse. 3.2.1 Test for the invertibility. Let A be a given square matrix of order n×n. According to Theorem 4, A is invertible if and only if rank(A) = n. In Chapter 2 we have shown that, in order to find rank of A it is enough to reduce A to a row reduced form and count the number of nonzero rows. Since A is a square matrix, its row reduced form is an upper triangular matrix of order n×n whose diagonal elements are 1 or 0. Thus, if some element on the diagonal of the row reduced form of A is zero, then the n-th row of A is a zero row, which implies that rank (A) < n. If none of the elements on the diagonal is zero, then none of the rows are zero row, which implies that rank(A) = n. Thefore, we have the fol
lowing test for the invertibility of a matrix A:
A square matrix A is invertible if and only if its row reduced form has the diagonal consisting of only the unit.
3.2.2 Elementary matrices. Recall that there are three types of elementary row transformations that are used in reduction of a matrix to a row reduced forms: (i) interchanging any two rows, (ii) multiplying a row by a nonzero scalar, and (iii) adding to a row a scalar times another row.
An elementary matrix of order n×n is, by definition, a square matrix of order n×n which is obtained from In by applying one of the elementary row transformations listed above.
Thus, there are three types of elementary matrices. The first type consists of matrices which are obtained from In by replacing any two rows. We denote elementary matrices of this type by E(1).
The second type consists of matrices which are obtained from In by replacing a row by its product with a nonzero scalar. We denote elementary matrices of this type by E(2).
The third type consists of matrices which are obtained from In by adding to a row a scalar times another row. We denote elementary matrices of this type by E(3).
Here are examples of elementary matrices of type 1, 2, and 3, respectively.
|
|
|
One can check directly, that
|
|
|
3.2.3 Constructing the inverse matrix. Assume that A is an invertible matrix. Let us denote the row reduced form of A by Ar = [ai,j(r)], and the elementary matrices corresponding to the elementary row transformations that reduce A to Ar by E1,..., Ek, then we have
|
(7) |
As we have seen, the row reduced matrix Ar is a diagonal matrix whose diagonal consists of units. By applying elementary row transformations, it is possible to reduce Ar further to the identity matrix. Indeed, because the an,n(r) = 1, one can apply an elementary row transformation to reduce all elements above an,n(r) to zero. Then, repeating the argument, one can reduce all elements above an-1,n-1(r) to zero, and so on, until we obtain the identity matrix.
Therefore, there are elementary matrices Ek+1,...,Ej such that
|
(8) |
From (7) and (8) we have
|
(9) |
which implies that
|
(10) |
If we rewrite formula (10) in the obviously equivalent form
|
then we immediately obtain the following method for practical calculations of inverse matrices:
We write two matrices, side by side: the first one is the given matrix A, and the second one is the identity matrix.
Use the elementary row transformations to reduce A, first to a row reduced form, and then to the identity matrix. Each time, when we applied an elementary row transformation to A, we apply the same transformation to the second matrix. The final matrix at the second place will be the inverse matrix to A.
3.3 LU-Decomposition. The above method of calculating the inverse matrix also suggests a method of a possible factorization of a matrix in to a product of a lower triangular matrix with an upper triangular matrix - a LU-decomposition.
Namely, we consider only elementary matrices of type E(3) which correspond to ''adding to a row a scalar times another row''. Moreover, we restrict ourselves to lower triangular matrices of type E(3) which correspond to ''adding to a row a scalar times one of the preceding rows''. Below we describe the process of reducing a given matrix A to an upper diagonal form using the above lower triangular elementary matrices.
The first requirement is that a1,1 ¹ 0. Then there is a lower triangular elementary matrix E1 such that for the matrix
|
we have a(1)2,1 = 0. Next, we choose a lower triangular elementary matrix E2 such that for
|
we have a(2)3,1 = 0, and so on, until we have transformed all elements below a1,1 to zero.
If the {2,2}-entry of the resulted matrix is zero, then we stop the process, i.e. the LU-decomposition method is not applicable, otherwise we repeat the same process as described above to transform all elements below the {2,2}-entry in to zero. As
suming that we never get a zero element on the diagonal, then there exist lower triangular elementary matrices E1,...., Ej such that
|
is an upper triangular matrix, which we denote by Au. Thus,
|
(11) |
From (11) it follows that
|
(12) |
Since a product of lower triangular matrices is a lower triangular matrix, and the inverse of a lower triangular matrix is a lower triangular matrix, the matrix
|
is lower triangular. We denote it by Al. Thus, (12) implies
|
which is a decomposition of A into a product of a lower triangular matrix and an upper triangular matrix. This is called an LU-decomposition of A.
The LU-decomposition is not unique, in general. Indeed, if D is an arbitrary invertible diagonal matrix, then AlD is lower triangular and D-1Au is upper triangular, and
|
gives another LU-decomposition. However, if we require a normalizing condition that all diagonal elements of Al be unity, then one can show that the decomposition, if exists, is unique.