6.1 The inner product of real vectors.
By real vectors we mean vectors in Cn or Rn with real components. We fist introduce and study properties of the inner products of real vectors, and at the end of this chapter we shall discuss the case of complex vectors (i.e. vectors with complex components).
Actually, we have already introduced the notion of inner product in Chapter 1. Recall, that if x = (x1,...,xn) and y = (y1,...,yn) are two vectors with real components, then the inner product áx,yñ is defined by
|
It follows from the definition that the inner product satisfies the following properties:
(1) áx,xñ ³ 0, áx,xñ = 0 if and only if x = 0.
(2) áx,yñ = áy,xñ for all x, y.
(3) álx,yñ = láx,yñ, for all vectors x, y, and all scalars l.
(4) álx1+x2,yñ = álx1,yñ+álx2,yñ.
From (3) it follows that á0,yñ = 0. Properties (3)-(4) mean that if we fix a vector y and consider álx,yñ as a function of x, then is function is a linear function.
The proof of properties (1)-(4) is staightforward and is left to the student as excercises.
For a vector x we put
|
and call it the norm of x. The following inequality, known as Cauchy's inequality, plays an important role in spaces with inner product.
Theorem 1 For any two vectors x, y, we have
|
(1) |
Proof: First we remark that for every two real vectors x and y we have
|
(2) |
In fact, using the definition and properties (2)-(4), we have
|
From (3) it follows that for every real l we have
|
(4) |
We have in (4) a quadratic function of l, of the form al2+bl+c, with a = ||y||2, b = -2áx,yñ,c = ||x||2, which is always nonnegative. Hence the discriminant, b2-4ac = 4(áx,yñ)2-4||x||2||y||2, of this quadratic function must be nonpositive. Hence
|
which implies |áx,yñ| £ ||x||||y||.
The following theorem contains the main properties of the norm.
Theorem 2 We have
(i) ||x|| ³
0, ||x|| = 0 if and only if x = 0.
(ii) ||lx|| = |l|||x|| for any scalar l.
(iii) ||x+|| £ ||x||+||y| for all vectors x,y.
Proof: The proofs of (i) and (ii) are straighforward and left as an excercise. For a proof of (iii), we use the Cauchy's inequality. We have
|
from which (iii) follows immediately.
6.2 Orthonormal vectors.
Definition 1 Two vectors x and y are called orthogonal if their inner product is zero, i.e. if áx,yñ = 0.
It is obvious that the zero vector is orthogonal to any vectors. A set of vectors {x1, x2,...,xk} is called an orthogonal set if each vector in the set is orthogonal to every other vector in the set.
A vector x is called normalized if ||x|| = 1. An orthogonal set consisting of normalized vectors is called an orthonormal set. Given an orthogonal set of nonzero vectors {x1, x2,...,xk}, one can normalize it by dividing each xi by its norm.
It is convenient to introduce the following symbol dij, called Kronecker symbol, by
|
Thus, a set of vectors {x1, x2,...,xk} is orthonormal is and only if
|
Theorem 3 Every orthogonal set of nonzero vectors is linearly independent.
Proof: Let {x1, x2,...,xk} be a set of pairwise orthogonal nonzero vectors. We show that it is linearly independent. Assume that
|
where ci¢s are some scalars. We have to show that all ci¢s are zero. Using the properties (1)-(4) of the inner product, and the fact that áxi,x1ñ = 0 for all i = 2,3,..., we have
|
hence c1 = 0. Analogously, ci = 0 for all i = 2,3,...,k.
Theorem 4 If {x1, x2,...,xk} is a linearly independent set of vectors, then there is an orthonormal set {q1, q2,...,qk} such that qi is a linear combination of x1,...,xi.
Proof: Our proof is constructive in the sense that we give a method for construction of the vectors qi. The method consists in the following steps:
1. We put q1 = [(x1)/( ||x1||)], i.e. we normalize the vector x1.
2. We construct x¢2 = x2-r12q1, and choose r1,2 so that the obtained vector x¢2 is orthogonal to q1. Solving the equation
|
we find that such a r12 always exists. Indeed, r12 = áx2,q1ñ. The second vector q2 is obtained by normalizing x¢2, i.e.
|
In general, if the vectors q1,q2,...,qj-1 have been constructed so that they are pairwise orthogonal and have norm one, such that qi is a linear combination of x1,...,xi for all i = 1,2,...,j, then we construct x¢j in the form
|
(6) |
and choose the coefficients r1j, r2j,...,rj-1,j so that the obtained vector x¢j will be orthogonal to all q1,...,qj-1, i.e.
|
From (6) and the orthonormality of q1,...,qj-1 it follows that
|
(7) |
hence such rij's exist and equal
|
Since the vectors x1,..., xk are linearly independent, it follows that x¢j's are different from zero. So we can normalize it by putting
|
(8) |
We continue the process until all the vectors are exhausted. As a result, we obtain a new set of vectors {q1,q2,...,qk} which are pairwise orthogonal and such that qi is a linear combination of x1,...,xi, for all i = 1,2,...,k. Example 1: Orthonormalize the vectors:
|
Solution: We have ||x1|| = Ö5. Hence
|
and r11 = Ö5, r12 = áx2,q1ñ = [11/( Ö5)]. Next, we find
|
and ||x¢2|| = [Ö(4/5)]. Hence
|
So the orthonormal vectors obtained by orthonormalizing x1 and x2 are:
|
Example 2: Orthonormalize the vectors:
|
Solution: We have ||x1|| = Ö2, hence
|
and r12 = áx2,q1ñ = 1/Ö2, r13 = áx3, q1ñ = 0. Next, we find x¢2:
|
and ||x¢2|| = [Ö(3/2)]. Hence,
|
and r23 = áx3,q2ñ = 2/Ö6.
Finally, we find x¢3:
|
and ||x¢|| = 2/Ö3. Hence
|
Thus the orthonormal vectors obtained from x1, x2, x3 are:
|
The above process of orthonormalizing a given system of vectors is called the Gram-Schmidt orthornormalization.
6.3 The QR-decomposition. As a useful by-product of the Gram-Schmidt orthonormalization process we obtain a method of decomposition of a given matrix X into a product of a matrix Q and a matrix R with the following properties:
1. The columns vectors of Q form an orthonormal system.
2. R is a upper triangular matrix.
In fact, we have rij = áxj,qiñ, for i = 1,2,...,j-1. Let put
|
(9) |
Then formula (6) can be rewritten as
|
Let us denote by X = [xij]n×k and Q = [qij]n×k the matrices (of order n×k) whose columns are x1,...,xk and q1,...,qk, respectively, and let R = [rii]k×k denote the (square) upper triangular matrix of order k×k with the elements rij, i,j = 1,2,...k, i £ j. Thus the i-th component of the vector xj is the {i,j}- entry of the matrix X. Similarly, the i-th component of the vector qk is the {i,k}-entry of Q, hence formula (11) can be rewritten as
|
Now we can see easily that (12) has the following matrix form
|
(12) |
which is a decomposition of the matrix X as a product of two special matrices Q and R, Q has the property that its columns form an orthonormal set, and R is upper triangular.
Now it is apparent how to decompose a given matrix X of order n×k, whose columns are linearly independent vectors, into a product of Q and R with the above properties. One needs to use the decribed Gram-Schmidt orthogonalization process to construct the corresponding orthonormal system q1,...,qk, and simultaneously the matrix R whose elements are given by formulas (8), (9),(10).
Examples: Find the QR-decompossition of the matrix
|
Solution: Consider two column vectors
|
We have found
|
r11 = ||x¢1|| = Ö5, r12 = áx2,q1ñ = [11/( Ö5)] and and r22 = ||x¢2|| = 2/Ö5. Therefore,
|
Example 2: Find the QR-decomposition of the matrix
|
Solution: Consider three column vectors
|
Using the Gram-Schmidt process, we have found
|
and r11 = Ö2, r12 = 1/Ö2, r13 = 0,
r22 = [Ö(3/2)], r23 = 2/Ö6, and r33 = ||x¢3|| = 2/Ö3. Therefore,
|
We remark that it is possible consider RQ-decomposition of a matrix a a product of R and Q, where R is lower triangular and Q is such that its row vectors form an orthonormal set.
6.4 The method of least squares.
Let us return to the system of linear equations written in the matrix form
|
(13) |
We know that, generally speaking, the system may be inconsistent, i.e. has no solution. It is clear that a vector x is a solution of the system (14) if and only if
|
(14) |
and, since
|
for all y Î Cn, the vector x is a solution of a minimization problem:
|
(15) |
Therefore, in the general case when the (exact) solution to the system (14) may exist or not, we may pose the problem of finding the solution to the minimization problem formulated precisely by (16). Any such a solution is called a least square solution.
The notion of inner products leads to a simple and useful method of finding least square solutions. We start with the following theorem.
Theorem 5 If x is such that the vector Ax-b is orthogonal to all column vectors of A, then x is a solution to the minimization problem (6).
Proof: First we remark that if a vector x0 is orthogonal to all columns ai, i = 1,2,...,n, of A, then x0 is orthogonal to Ay for all y Î Cn. In fact, we have ai = Aei,i = 1,2,...,n, where ei is vectors of the standard basis of Cn. So we have that x0 is orthogonal to Aei, for all i. This implies that x0 is orthogonal to Ay, for all y which are linear combinations of ei, i.e. x0 is orthogonal to Ay for all y Î Cn.
Assume that x is such that the vector Ax-b is orthogonal to all column vectors of A. We need to show that the function ||Ax-b|| attains its minimum at x, i.e.,
|
for all x0 Î Cn.
Using the remark above, we have
|
which is what is required to prove.
Theorem ? also suggests a method for finding the solution to problem (?). First, we need the following simple useful fact.
Lemma 1 For any vectors x,y Î Cn and any matrix A, we
have
|
(17) |
Proof: Let A = [aij]n×n, and
|
Then the i-th component of the vector Ay is
|
hence
|
(18) |
Similarly, if we denote by atij the elements of At, then the j-th component of the vector Atx is
|
hence
|
(19) |
Comparing (?) and (?)< we obtain (?).
Theorem ? has reduce the minimization problem ? to the problem of finding x such that
|
(20) |
According to Lemma (?), equation (?) is equivalent to
|
(21) |
which is, in turn, equivalent to
|
(22) |
The matrix AtA is a square matrix, so that the system ? has as many equations as the number of unknowns.
It has a unique solution whenever the colums of A are linearly independent, and the solutions can be found by any of the considered methods.
Example 1 Find a least square solution to the following system of
linear equations:
|
(23) |
Solution: The matrix A is
|
hence we have
|
Therefore, the system (?) has form:
|
(27) |
Solving the last system we obtain x = 1, y = 0.