2.1 Systems of Linear equations.
Consider a system of m linear equations with n unknowns
|
(1) |
Here, aij and bj, (i = 1,2,...,m; j = 1,2,...,n) are known scalars and x1,...,xn are unknowns. Some of the scalars aij and bj may be zero, but we assume that among a11, a21,...,am1 at least one is not zero (otherwise, if all ai1 = 0, i = 1,2..., m, then x1 would not be present in any of the equations so the system has only (n-1) unknown x2,...,xn. Analogously, for every j = 2,3,...,n, at least one element among a1j, a2j,...,am,j must be different from zero.
A solution to (1) is a set of n scalars x1,...,xn such that when substituted into (1) all the equations are satisfied. In this chapter we will use the matrix theory to consider the questions of existence of solutions and how to find them. We associate with system (1) a matrix A of order m×n and two column vectors x, b given by
|
Then, as we can see easily, system (1) can be rewritten in the following matrix form:
|
(2) |
Alternatively, if we denote by ak the k-column of A then system (1)-(2) can be rewritten in the following form:
|
(3) |
It may happen the the system (1)-(2) has no solutions, has exactly one solution, or has more than one solution. For example, the system
|
(4) |
has no solutions. The system
|
(5) |
has precisely one solution (x = 1, y = 0), and the system
|
(6) |
has infinitely many solutions (take any number z and put x = -3/2z, y = 1+1/2z).
The representation of system (1) in the form (3) immediately implies the following fact: if (1) has a solution, then the (column) vector b is a linear combination of the (column) vectors ak, k = 1,...,n.
A linear system is called consistent if it has at least one solution. Thus, system (3) is not consistent and systems (4)-(5) are consistent.
System (1)-(2) is called homogeneous if b = 0, i.e. b1 = b2 = ... = bm = 0. If b ¹ 0, i.e. at least one of bi is different from 0, then (1)-(2) is called nonhomogeneous. Since a homogeneous system always has a solution x = 0, it is consistent (the zero solution is called a trivial solution of the homogeneous system).
2.2 Substitution method.
Consider system (1):
|
(7) |
There is a very narural and straighforward method of solving the system (1). Let us first consider some examples.
Example 1: Consider the system
|
From the first equation we obtain x = 1-y-z. Substituting this value of x into the second and third equations we obtain 2-2y-2z-y+z = 2-3y-z = -1, or y = 1-1/3z, and -1+y+z+2y-z = -1+3y = 0, or y = 1/3. Thus, we obtain the following system
|
Now, substituting y = 1/3 back into the second equation we obtain z = 2, and substituting y = 1/3, z = 2 into the first equation we obtain x = -4/3.
Example 2: Consider the system
|
and apply the same method as in Example 1. From the first equation we have x = 1-y. Substituting this value of x into the second equation we have 1-y+y = 2, or 1 = 2, which is absurd. Thus, the system does not have solution (is inconsistent).
The above examples are illustrations of the method of substitution which consists in the following:
1. Since not all ai1 are equall zero, we can always rearrange the equations so that to assume that a11 ¹ 0. Take the first equation a11x1+a12x2+...+a1nxn = b1, solve for x1 in terms of x2,...,xn (namely, x1 = -a11-1(b1-a12x2-...-a1nxn)), and substitude the value of x1 into all other equations in the system. This way we obtain the first derived system of m-1 equations with n-1 unknowns x2,...,xn.
2. We repeat step 1 for the first derived system, as a result we obtain the second derived system.
3. We keep continuing the process until either we obtain a system in the following form which can be solved easily by back substitutions:
|
(8) |
(or until we obtain an obvious inconsistency which means that the system does not have a solution).
2.3 Gaussian elimination. System (1) of linear equations is completely described by the matrix A and the column vector b. Therefore, it is convenient to introduce the notation
|
which will be called the augmented matrix associated with system (1). Any matrix is an augmented matrix of some system of linear equations.
For instance, the matrix
|
is the augmented matrix of the system
|
The Gausian elimination method is in fact a version of the substitution method in which only the numbers aij, bj are present. Therefore, it is effective for computer implementation. First we make the following simple observations. We call two systems of linear equations containing the same unknown equivalent if they have the same solutions. We obtain an equivalent system if we do one or more of the following operations:
(1) Interchange any two equations in a system;
(2) Multiply an equation in a system by a nonzero scalar;
(3) Add to one equation a scalar times another equation.
In terms of the augmented matrices representing systems of linear equations the operations 1-3 above correspond to the following operations which are called elementary row transformations of matrices.
(i) Interchange any two rows in a matrix;
(ii) Multiply any row by a nonzero scalar;
(iii) Add to any row of a matrix a scalar times another row of that same matrix.
The Gaussian elimination method consists in applying the elementary row transformations (i)-(iii) to the augmented matrix of a system of linear equations in order to get a matrix which has a special row-reduced form. The special row-reduced form is suggested by the substitution method described in the previous section, namely, a matrix is said to be in a row-reduced form if it has the following properties:
(a) The first nonzero element of any row is 1;
(b) Elements below the first nonzero elements of any row are zeros;
(c) All zero rows (i.e. rows consisting of only zeros) are below nonzero rows.
We remark that the property (a) above is just a normalized condition which can be achieved by applying the elementary row transform (i) - multiplying by a nonzero scalar. The property (b) is the most important - it means that every next equation in the corresponding system contains less unknown than the preceding equations. The zero rows correspond to trivial equations (of the form 0x1+0x2+...+0xn = 0) which are satisfied trivially. Such equations can be ignored in solving the systems and property (c) means that we put them at the bottom to ignore them.
Below are some examples of row-reduced matrices:
|
(9) |
|
(10) |
|
(11) |
Systems of equations corresponding to the row reduced matrices (9), (10), (11) have exactly one solution, infinitely many solution, and no solutions, respectively.
Below we give an example of using the Gaussian elimination.
Example: Solve the system
|
Solution: The augmented matrix of the system is:
|
The elementary row transformations yield:
|
|
|
|
|
|
|
The last matrix above is the augmented matrix of the following system
|
which is solved easily and has solution z = 2, y = -1, x = 1.
2.4 Linear independent vectors.
We have introduced in Chapter 1 the space Rn of row vectors of dimension n. In this section, we introduce the notions of linearly dependent and independent vectors, rank of a set of vectors, and show how to apply these notions to matrices and systems of linear equations.
We denote elements of Rn by a, b,.... A vector b is called a linear combination of vectors a1,...,ak if there exist scalars a1,...,ak such that
|
We have already met with linear combinations earlier in this section. A set of vectors a1,...,ak is said to be linearly independent if from
|
it follows that a1 = a2 = ... = ak = 0.
A set of vectors a1,...,ak is said to be linearly dependent if it is not linearly independent. In other words, a1,...,ak are linearly dependent if there exist numbers a1,...,ak, not all equal zero, such that
|
The following simple facts follow immediately from the definitions:
1. Any set of vectors which contains the zero vector is linearly dependent.
2. A set consisting of a single vector a is linearly independent if and only if a ¹ 0.
3. If one of the vectors a1,...,ak is a linear combination of the others, then the set a1,...,ak is linearly dependent. Conversely, if the set
|
is linearly dependent, then there is a vector of this set which is a linear combination of the remaining vectors.
4. If a1,...,ak is a linearly dependent set then any set of vectors which contains a1,...,ak is also linearly dependent.
5. If a1,...,ak is a linearly independent set then any subset of this set also is linearly independent.
Given a set of vectors a1,...,ak, not all of which are equal to zero, one can always find a maximal linearly independent subset in the following way:
Start with a nonzero vector, say ak1. Consider another vector, say ak2. If it forms a linearly dependent set with ak1 then discard, otherwise join it to the subset we construct. Next, take another vector ak3. If it forms a linearly dependent set with the chosen vectors, then discard, otherwise join it to the subset we construct, and so on until we exhaust all vectors a1,...,ak. The constructed subset a k1,...,akr is a linearly independent by construction, and is maximal in the sense that if we add to it any other element of the set a1,...,ak, then we obtain a linearly dependent subset (this latter property also holds by construction).
Also remark that the above construction shows that if we are given a linearly independent subset a1,...,am of any set a1,...,ak, (m £ k), then we can complement the subset a1,...,am to a maximal linearly independent subset.
It is obvious that r = k if and only if the set a1,...,ak is linearly independent. In general we have r £ k.
The maximal linearly independent subset a k1,...,akr is not unique in the sense that it depends on the order of the elements a1,...,ak in which we make the construction. Thus, if we proceed in another order we might obtain another maximal linearly independent subset a i1,...,aip (where i1,...,ip are elements from the set {1,..., k}). We show that, however, any two maximal linearly independent subsets of the set a 1,...,ak must have the same number of elements.
Theorem 1 Let a1,...,ak be a set of vectors not all of which are zero, and assume that a k1,...,akr and ai1,...,aip are two maximal linearly independent subsets of a1,...,ak. Then p = r.
In the proof of Theorem 1 we use the following simple and useful facts.
Lemma 1 If a k1,...,akr is a
maximal linearly independent subset of the set a1,...,ak,
then every element
aj,j = 1,2,...,k, is a linear combination of a k1,...,akr.
Lemma 1 just follows immediately from the construction of maximally independent subsets.
Lemma 2 Let a1,...,am, (m > 1), be a set of vectors, and let b
and c be linear combinations of a1,...,am,
i.e.
|
|
such that c is not a linear combination of a1,...,am-1 (thus, bm ¹ 0). Then there exists a scalar l such that b-lc is a linear combination of a1,...,am-1.
Proof: Choose l = [(am)/( bm)] (which is possible because bm ¹ 0 by the assumption). Then we have
|
For a given set of vectors a1,...,am, we denote by L(a1,...,am) the set of all vectors which are linear combinations of a1,...,am, and call it the linear hull of a1,...,am. It is obvious from the definition that L(a1,...,am) contains, together with any elements, their linear combinations. This is the same as saying that L(a1,...,am) is a linear subspace (of Rn).
Lemma 3 For a set of m vectors a1,...,am, every subset of L(a1,...,am) which contains m+1 (or more) vectors is a linearly dependent set.
Proof: We prove Lemma 3 by the method of mathematical induction. see.The statement is obviously true if m = 1. Assume that it is true for m = k, we are going to show that it is true for m = k+1.
Let b1,...,bk+2 be (k+2) vectors from the linear hull L(a1,...,ak+1). If all b1,...,bk+2 belong to L(a1,...,ak), then the induction assumption implies that b1,...,bk+2 are linearly independent.
Therefore, we can assume that at least some vector among b1,...,bk+2 does not belong to L(a1,...,ak). For definitness, let bk+2 Ï L(a1,...,ak). Applying Lemma (2) to each of the pairs of vectors bi,bk+2, i = 1,2,...,k+1, we see that there exist li,i = 1,...,k+1, such that
|
By the induction assumption, the vectors bi-libk+2, i = 1,..., k+1, are linearly dependent, that is there exist bi, i = 1,...,k+1, not all equal to zero, such that
|
But the latter implies imediately that b1,...,bk+2 are linearly dependent.
Now the proof of Theorem 1 follows from Lemma (3) easily. In fact, if p > r, then the system a i1,...,aip would have to be linearly dependent, which is a contradiction.
The number r of elements in a maximal linearly independent subset of a given set a1,...,ak is called the rank of the system a1,...,ak, and denoted by rank( a1,...,ak).
The rank of the set a1,...,ak is called the dimension of the linear subspace L(a1,...,ak). In general, a subset L of Rn is called a linear subspace if
|
The reader is suggested to check that the same argument as in the proof of Theorem 1 shows that maximal linearly independent subsets of L have the same number of elements. Any such maximal linearly independent subset is called a basis of L, and the number of elements in a maximal linearly independent subset is called the dimension of L. Furthermore, a subset a1,...,ak of L is a basis if and only if every vector b in L has a unique decomposition as
|
In particular, L can be the whole space Rn. There is a basis of Rn, called the standard basis, which consists of the following vectors:
|
Every vector b = (b1,b2,...,bn) has a unique decomposition as a linear combination of e1,...,en, namely
|
Theorem 2 Rank of any set of vectors a1,...,ak does not change if we replace a vector in the set by a product of this vector with a nonzero number, or if we add to a vector in the set a scalar times another vector.
Proof: First we show that the sets
|
(12) |
and
|
(13) |
where a ¹ 0, have the same rank. Assume that a k1,...,akr is a maximal linearly independent subset of a1,...,ak. It is clear that if a1 is not contained in this subset, then a k1,...,akr also is a maximal linearly independent subset of (13). If a1 is contained in a k1,...,akr, then we form a new subset of (13) which contains all elements a k1,...,akr with a1 replaced by aa1. This correspondence shows that (12) and (13) have the same rank.
Next we show that the sets
|
(14) |
and
|
(15) |
where b1 = a1+aa2, a is an arbitrary scalar, have the same rank.
Let denote by r and s rank of the set (14) and (15), respectively. Assume that a k1,...,akr is a maximal linearly independent subset of a1,...,ak. It is clear that if a1 is not contained in this subset, then a k1,...,akr also is a maximal linearly independent subset of (15). If a1 is contained in a k1,...,akr, then we form a new subset of (15) which contains all elements a k1,...,akr with a1 replaced by b = a1+aa2. It is easy to see that the new subset of (15) consists of linearly independent vectors. Thus, r £ s. Changing the places of the sets (14) and (15) in this argument, we obtain s £ r. Therefore, r = s.
Theorem 3 If we add to a set of vectors a1,...,ak a vector ak+1, then both sets a1,...,ak and a1,...,ak, ak+1 have the same rank if and only if ak+1 is a linear combination of a1,...,ak,
Proof: Assume that a1,...,am is a maximal linearly independent subset of a1,...,ak, and ak+1 is a linear combination of a1,...,ak. Then ak+1 is a linear combination of a1,...,am, hence a1,...,am is a maximal linearly independent subset of a1,...,ak, ak+1, too.
Conversely, if both sets a1,...,ak and a1,...,ak, ak+1 have the same rank, then a1,...,am must be also a maximal linearly independent subset of a1,...,ak, ak+1, which implies that ak+1 is a linear combination of a1,...,am.
Before applying the previous material to matrices and systems of linear equations we remark that, obviously, all definitions and facts that hold for row vectors also hold for column vectors.
Let A be a given matrix of order m×n. The row rank of A is, be definition, the rank of the row vectors of A. Theorem 2 implies that if a matrix is obtained from another matrix by elementary matrix transformations, then both have the same row rank. Ranks of matrices in row reduced forms are particularly easy to determine. In fact, due to the property (b))the nonzero row vectors of a row reduced matrix are linearly independent, and it is a maximal linearly independent subset because the remaining row vectors are zero. Therefore, rank of a row reduced matrix is equal to the number of nonzero rows. To determine rank of an arbitrary matrix it remains to reduce it to a row reduced form and count the number of nonzero rows.
Analogously one can define the notions of column reduced forms, elementary column transformations of matrices, and column rank.
Now let us apply all the above facts to systems (1)-(3) of linear equations. Recall that x1,...,xn is a solution of (1)-(2) if and only if
|
i.e. b is a linear combination of the column vectors of A. By Theorem 3, A and Ab must have the same rank. Thus, we have proved the following theorem.
Theorem 4 The system Ax = b has a solution if and only if A and Ab have the same rank.