Chapter 1: Main Definitions

1.1  Definition of matrix.

A matrix is a rectangular array of elements arranged in rows and columns. The elements can be numbers, functions, other matrices, or other objects.

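Example 1 The following are matrices:

\[
\begin{bmatrix} 1 & 3 & 6 \\ -1 & 2 & 0 \\ 7 & 8 & 5 \end{bmatrix}
\tag{1}
\]

\[
\begin{bmatrix} 4 & 5 & 6 & 7 \\ 3 & 0 & 0 & 0 \end{bmatrix}
\tag{2}
\]

\[
\begin{bmatrix} x & \sin x & \cos x \\ x^2 & \ln x & \pi \end{bmatrix}
\tag{3}
\]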

A general notation of a matrix  A  is

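\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
\tag{4}
\]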

Here   m  is the number of rows, and  n  is the number of columns of the matrix   A. We say that  A  is a matrix of order m×n. Note that the first letter in  m×n  denotes the number of rows, and the second letter denotes the number of columns. Thus, a matrix of order   3×4  has  3  rows and 4  columns. Matrices in the examples (1)-(3) above have orders 3×3, 2×4 and 2×3, respectively.

The matrix $A$ can also be denoted by $A = [a_{ij}]_{m\times n}$ (sometimes by $A = [a_{ij}]_1^{m,n}$), or simply by $A = [a_{ij}]$ if the order of $A$ is clear from the context. For fixed $i$ and $j$, the element $a_{ij}$ is the element of $A$ in the $i$-th row and $j$-th column; it is called the $\{i,j\}$-entry of $A$. For instance, $a_{23}$ is the element in the second row and third column. Sometimes the $\{i,j\}$-entry of $A$ is denoted by $(A)_{i,j}$. The $i$-th row of $A$ is the row

\[
[a_{i1}, a_{i2}, \dots, a_{in}],
\]

while the $i$-th column is the column

\[
\begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{bmatrix}.
\]
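
The notions of order, entry, row and column can be tried out on a computer. The following is a minimal sketch, assuming a matrix is stored as a Python list of rows; the function names `order`, `entry`, `row` and `column` are illustrative choices and not notation from the text.

```python
def order(A):
    """Return (m, n), the numbers of rows and columns of A."""
    m = len(A)
    n = len(A[0]) if m else 0
    # A matrix is a rectangular array, so every row must have n elements.
    if any(len(r) != n for r in A):
        raise ValueError("rows have different lengths")
    return m, n


def entry(A, i, j):
    """Return the {i,j}-entry a_ij, with i and j counted from 1 as in the text."""
    return A[i - 1][j - 1]


def row(A, i):
    """Return the i-th row [a_i1, ..., a_in]."""
    return list(A[i - 1])


def column(A, j):
    """Return the j-th column, listed from top to bottom."""
    return [r[j - 1] for r in A]


M1 = [[1, 3, 6], [-1, 2, 0], [7, 8, 5]]   # matrix (1)
M2 = [[4, 5, 6, 7], [3, 0, 0, 0]]         # matrix (2)

print(order(M1), order(M2))   # (3, 3) (2, 4)
print(entry(M1, 2, 3))        # 0
print(row(M2, 1))             # [4, 5, 6, 7]
print(column(M1, 1))          # [1, -1, 7]
```

Python lists are indexed from 0, so the helpers subtract 1 to keep the 1-based row and column numbers used in the text.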

A matrix of order  n×n  is called a square matrix. Elements  aii  of a square matrix are called diagonal elements since they are on the (main) diagonal of the matrix (the line from  a11  to  ann). Diagonal elements can also be defined for non-square rectangular matrices, formally by the same definition, but they do not belong to the ''diagonal'', are of less use, and therefore we prefer not to use this term for non-square matrices.

So far, a matrix is just a symbol: a rectangular array of elements arranged in rows and columns. To give matrices a ''mathematical'' life, their elements must themselves have some mathematical sense. Therefore, we assume from now on that the elements of the matrices under consideration are either real or complex numbers. Such matrices are called real and complex, respectively. (For those who are not yet familiar with complex numbers: the main concepts that we define and study in this book can be formulated and proved for real matrices as well as for complex matrices. However, some results are true for complex matrices but not for real matrices; we will meet such results later. For a full understanding of the matrix theory presented in this book, the reader should know complex numbers. All the necessary information on complex numbers is given in Appendix A.) In the following sections, we bring matrices to mathematical life.

1.2    Addition and multiplication by numbers.

Two matrices  A  and  B  are equal if they have the same order and if every {i,j}-entry of the matrix A is equal to the corresponding {i,j}-entry of the matrix B. In other words:

\[
A = B \;\overset{\text{def}}{\Longleftrightarrow}\; a_{ij} = b_{ij} \quad \forall\; i = 1,2,\dots,m;\ j = 1,2,\dots,n.
\]

If $A = [a_{ij}]$ is an arbitrary matrix and $\lambda$ is a (real or complex) number, then we can define a new matrix $\lambda A$, called the product of $\lambda$ and $A$, by

\[
\lambda A \overset{\text{def}}{=} [\lambda a_{ij}].
\]

Example 2 For the matrix

\[
A = \begin{bmatrix} 1 & 3 & 4 \\ 2 & -2 & 0 \end{bmatrix}
\]

and the number $\lambda = 3$ we have

\[
3A = \begin{bmatrix} 3 & 9 & 12 \\ 6 & -6 & 0 \end{bmatrix}.
\]
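
As a quick check of Example 2, here is a minimal sketch of multiplication by a number, assuming a matrix is stored as a Python list of rows. The name `scalar_multiple` is a hypothetical helper chosen for illustration.

```python
def scalar_multiple(lam, A):
    """Return the matrix lam*A, whose {i,j}-entry is lam * a_ij."""
    return [[lam * a for a in row] for row in A]


A = [[1, 3, 4], [2, -2, 0]]
print(scalar_multiple(3, A))   # [[3, 9, 12], [6, -6, 0]]
```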

 

Multiplication by numbers satisfies the following properties:

\[
1A = A, \qquad \lambda(\mu A) = (\lambda\mu)A.
\tag{5}
\]

If $A = [a_{ij}]$ and $B = [b_{ij}]$ are two matrices of the same order $m\times n$, then we can define the sum $A+B$ as the matrix whose elements are the sums of the corresponding elements of $A$ and $B$, i.e.

\[
A + B \overset{\text{def}}{=} [a_{ij} + b_{ij}]_{m\times n}.
\]

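Example 3 For the matrices

\[
A = \begin{bmatrix} 1 & 3 & 4 \\ 2 & -2 & 0 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} -1 & 3 & 4 \\ 1 & 0 & 2 \end{bmatrix}
\]

we have

\[
A + B = \begin{bmatrix} 0 & 6 & 8 \\ 3 & -2 & 2 \end{bmatrix}.
\]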
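
The sum in Example 3 can be reproduced with a short sketch, again assuming matrices are stored as Python lists of rows. The helper name `add` is illustrative; it refuses matrices of different orders, in line with the definition.

```python
def add(A, B):
    """Return A + B, the matrix of sums of corresponding elements."""
    same_order = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B)
    )
    if not same_order:
        raise ValueError("the sum is defined only for matrices of the same order")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 3, 4], [2, -2, 0]]
B = [[-1, 3, 4], [1, 0, 2]]
print(add(A, B))   # [[0, 6, 8], [3, -2, 2]]
```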

 

Note that the sum is not defined for two matrices of different orders. Thus, if we want to form sums of matrices freely, we must fix an order, say $m\times n$, and consider only matrices of that order. The family of matrices of order $m\times n$ is usually denoted by $M(m,n)$. Among the elements of $M(m,n)$ there is a special one, all of whose elements are zero. We denote this matrix by $0_{m,n}$, or simply by $0$ if the order is clear from the context. The matrix $0$ has the following properties:

\[
0 + A = A + 0 = A, \qquad \lambda 0 = 0A = 0
\tag{6}
\]

for any matrix $A$ (of the same order $m\times n$) and any number $\lambda$.

We can also define the sum of three or more matrices of the same order $m\times n$ in a natural way. It is clear that the algebraic operations we have just defined satisfy the following properties:

\[
A + B = B + A \quad \text{(commutativity of addition)}
\tag{7}
\]

\[
A + (B + C) = (A + B) + C \quad \text{(associativity of addition)}
\tag{8}
\]

\[
\lambda(A + B) = \lambda A + \lambda B, \qquad (\lambda + \mu)A = \lambda A + \mu A \quad \text{(distributivity)}
\tag{9}
\]

for any matrices $A$, $B$, $C$ (of the same order) and any numbers $\lambda$ and $\mu$. Subtraction of matrices, $A - B$, can be defined analogously, or via addition and multiplication by a number:

\[
A - B \overset{\text{def}}{=} A + (-1)B.
\]

1.3 Matrix multiplication.

So far we have defined addition of matrices of the same order and multiplication of a matrix by a number. It is natural to ask whether it is also possible to define multiplication of a matrix by a matrix. The first idea that might come to mind is to define multiplication of matrices of the same order componentwise, by analogy with addition, i.e. to define

\[
AB \overset{\text{def}}{=} [a_{ij}b_{ij}]_{m\times n}.
\]

Of course we can define multiplication in this way, and such a multiplication may prove useful in some concrete problems. However, there is another, deeper definition of multiplication of matrices, which plays a central role in matrix theory and its applications. We introduce this definition below.

In order to define the product   AB  of two matrices   A  and  B, the matrices must have agreeable orders. Namely, if the matrix   A  has order   m×n, then the matrix   B  must have an order   n×p. In other words,

the number of columns of the first matrix must be equal to the number of rows of the second matrix

In this case the product $C = AB$ is, by definition, a matrix $C = [c_{ij}]_{m\times p}$ of order $m\times p$ whose elements $c_{ij}$ are given by the formula

\[
c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \dots + a_{in}b_{nj}, \qquad i = 1,2,\dots,m;\ j = 1,2,\dots,p.
\tag{10}
\]

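Example 4 For the matrices

\[
A = \begin{bmatrix} 1 & 3 & 4 \\ 2 & -2 & 0 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} -3 & 4 \\ 1 & 2 \\ 0 & 3 \end{bmatrix}
\]

we have

\[
AB = \begin{bmatrix} 0 & 22 \\ -8 & 4 \end{bmatrix}.
\]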
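
Formula (10) translates almost literally into code. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; `matmul` is an illustrative name. It reproduces the product of Example 4 and shows that the product in the reverse order has a different order.

```python
def matmul(A, B):
    """Return C = AB, with c_ij = sum over k of a_ik * b_kj, as in formula (10)."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]


A = [[1, 3, 4], [2, -2, 0]]      # order 2x3
B = [[-3, 4], [1, 2], [0, 3]]    # order 3x2

print(matmul(A, B))              # [[0, 22], [-8, 4]]

BA = matmul(B, A)
print(len(BA), len(BA[0]))       # 3 3
```

Here $AB$ has order $2\times 2$, while $BA$ has order $3\times 3$.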

 

Note that the order of the factors in the product $AB$ is essential: if $m \neq p$, then $BA$ is not even defined. If $m = p$, then both products $AB$ and $BA$ are defined, and they are square matrices of order $m\times m$ and $n\times n$, respectively. Thus, they are always different if $m \neq n$. Finally, if $A$ and $B$ are both square matrices of the same order $n\times n$, then $AB$ and $BA$ are both square matrices of order $n\times n$.

Definition 1 Two square matrices A and B of the same order n×n are said to be commuting, if  AB = BA.

Example 5 The matrices

\[
A = \begin{bmatrix} 1 & -1 & 0 \\ -2 & 3 & 1 \\ 2 & 3 & -1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 3 & -4 & -1 \\ -6 & 14 & 2 \\ -6 & 4 & 4 \end{bmatrix}
\]

are commuting because

\[
AB = BA = \begin{bmatrix} 9 & -18 & -3 \\ -30 & 54 & 12 \\ -6 & 30 & 0 \end{bmatrix}.
\]

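Example 6 The matrices

\[
A = \begin{bmatrix} 1 & -1 \\ 3 & 1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 3 & -4 \\ -6 & 1 \end{bmatrix}
\]

are not commuting because

\[
AB = \begin{bmatrix} 9 & -5 \\ 3 & -11 \end{bmatrix}
\quad\text{and}\quad
BA = \begin{bmatrix} -9 & -7 \\ -3 & 7 \end{bmatrix},
\]

so $AB \neq BA$.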
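
Whether two given square matrices commute can be checked mechanically by computing both products. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; `matmul` and `commute` are illustrative names. It is applied to the pairs from Examples 5 and 6.

```python
def matmul(A, B):
    """Return the product AB computed from formula (10)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


def commute(A, B):
    """Return True if the square matrices A and B satisfy AB = BA."""
    return matmul(A, B) == matmul(B, A)


# The pair from Example 5.
A5 = [[1, -1, 0], [-2, 3, 1], [2, 3, -1]]
B5 = [[3, -4, -1], [-6, 14, 2], [-6, 4, 4]]

# The pair from Example 6.
A6 = [[1, -1], [3, 1]]
B6 = [[3, -4], [-6, 1]]

print(commute(A5, B5))   # True
print(commute(A6, B6))   # False
```

The comparison is exact because all entries are integers; for matrices with floating-point entries one would compare up to a tolerance instead.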

For a square matrix $A$, we can define

\[
A^2 \overset{\text{def}}{=} AA,
\]

and, more generally,

\[
A^n \overset{\text{def}}{=} \underbrace{AA\cdots A}_{n\ \text{times}}.
\]

It is instructive to consider the special case of products of matrices of order $1\times n$ with matrices of order $n\times 1$. By the general definition (10) we have

\[
[a_1, a_2, \dots, a_n]
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
= \sum_{k=1}^{n} a_k b_k.
\tag{11}
\]

Thus, products of $1\times n$-matrices with $n\times 1$-matrices are just numbers. In this case we also speak of the product of a row with a column. The following rule for multiplying matrices is now easy to remember:

the product of a matrix A of order m×n with a matrix B of order n×p is a matrix C of order m×p whose {i,j}-entry is the product of the i-th row of A with the j-th column of B.

Thus, to multiply two matrices (having agreeable orders), one multiplies the first row of the first matrix with the first column of the second, then the first row with the second column, and so on. The results form the first row of the product. One then does the same with the second row, and so on.

It is also possible to form the product $A_1A_2\cdots A_k$ of $k$ matrices $A_1, A_2, \dots, A_k$, provided that the number of columns of $A_i$ is equal to the number of rows of $A_{i+1}$ for all $i = 1,\dots,k-1$.

1.4 More on square matrices.

Square matrices form an important class of matrices. A square matrix $A = [a_{ij}]_{n\times n}$ is called upper triangular if $a_{ij} = 0$ for all $i > j$, i.e. if all elements below the main diagonal are equal to zero.

Example 7 The matrix

\[
A = \begin{bmatrix} 2 & 1 & 5 \\ 0 & -1 & 4 \\ 0 & 0 & 3 \end{bmatrix}
\]

is upper triangular.

Analogously, if $a_{ij} = 0$ for all $i < j$, i.e. if all elements above the main diagonal are equal to zero, then the matrix is called lower triangular. A square matrix $A = [a_{ij}]_{n\times n}$ is called a diagonal matrix if $a_{ij} = 0$ for all $i \neq j$, i.e. if all nondiagonal elements are equal to zero. In other words, $A$ is diagonal if and only if it is simultaneously upper and lower triangular. Thus, diagonal matrices have the following form:

\[
A = \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}
\]
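
The three definitions above are conditions on the indices $i$ and $j$, so they can be tested directly. The following is a minimal sketch, assuming a square matrix is stored as a Python list of rows; the function names are illustrative.

```python
def is_upper_triangular(A):
    """True if a_ij = 0 for all i > j (everything below the main diagonal is zero)."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i > j)


def is_lower_triangular(A):
    """True if a_ij = 0 for all i < j (everything above the main diagonal is zero)."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i < j)


def is_diagonal(A):
    """True if A is simultaneously upper and lower triangular."""
    return is_upper_triangular(A) and is_lower_triangular(A)


A = [[2, 1, 5], [0, -1, 4], [0, 0, 3]]   # the matrix of Example 7
D = [[2, 0, 0], [0, -1, 0], [0, 0, 3]]   # its diagonal part

print(is_upper_triangular(A), is_lower_triangular(A), is_diagonal(A))   # True False False
print(is_diagonal(D))                                                   # True
```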

 

An identity matrix is a diagonal matrix all of whose diagonal elements are equal to 1. Thus, there is only one identity matrix of a fixed order $n\times n$, which we denote by $I_n$, or simply by $I$ if the order is clear from the context. The identity matrix $I$ has the following property: for any square matrix $A$ (of the same order) we have

\[
AI = IA = A.
\tag{12}
\]

We denote by $M_n$ the class of all square matrices of order $n\times n$, and we write

\[
A \in M_n
\]

if $A$ is a square matrix of order $n\times n$; we then say that $A$ is an element of $M_n$. As we have seen, the operations of addition, multiplication by numbers, and matrix multiplication can all be performed inside the class $M_n$. These algebraic operations satisfy properties (5)-(9) as well as (12) and the following additional ones:

\[
A(BC) = (AB)C
\tag{13}
\]

\[
(A+B)C = AC + BC, \qquad A(B+C) = AB + AC,
\tag{14}
\]

\[
\lambda(AB) = (\lambda A)B = A(\lambda B)
\tag{15}
\]

for all elements $A$, $B$, $C$ of $M_n$ and all numbers $\lambda$. A square matrix $A$ of order $n\times n$ is called invertible if there exists a matrix $B$ such that $AB = BA = I$. Such a matrix $B$ is necessarily unique (why?) and is called the inverse of $A$. The inverse of $A$ is denoted by $A^{-1}$. We will show in a later chapter how to determine whether a given matrix is invertible, and how to find its inverse. It is clear that the identity matrix $I$ is invertible and $I^{-1} = I$. The operation of inversion has the following properties:

\[
(A^{-1})^{-1} = A, \qquad (AB)^{-1} = B^{-1}A^{-1},
\tag{16}
\]

for arbitrary invertible matrices $A$ and $B$. We denote by $G_n$ the family of all invertible matrices of order $n\times n$. We say that $G_n$ is a subset of $M_n$ and write

\[
G_n \subset M_n.
\]

The availability of the operations of matrix multiplication and inversion in $G_n$, together with the identity property (12) and the associativity relation (13), means that $G_n$ is a group (which is why the letter G was used). Since multiplication in $G_n$ is not commutative (except for $n = 1$), the group $G_n$ is non-commutative, or nonabelian.
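
For multiplication to be an operation inside the family of invertible matrices, the product of two invertible matrices must again be invertible. This is what the second relation in (16) says. A short verification, using only associativity (13) and the identity property (12), might run as follows for invertible $A$ and $B$ of the same order:

```latex
\[
(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I,
\]
\[
(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I.
\]
```

Hence $B^{-1}A^{-1}$ satisfies the definition of the inverse of $AB$, so $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.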

We have already defined $A^n$ for positive integers $n$ and an arbitrary square matrix $A$. If $A$ is invertible, one can also define $A^{-n}$ by

\[
A^{-n} = (A^{-1})^n.
\]

By convention, we put $A^0 = I$. The powers of $A$ satisfy the same basic identity as powers of ordinary numbers, namely

\[
A^{n+m} = A^nA^m.
\tag{17}
\]
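
Non-negative powers can be computed by repeated multiplication, starting from the identity matrix so that the convention for the zeroth power is respected. The following is a minimal sketch, assuming a square matrix is stored as a Python list of rows; `matmul`, `identity` and `power` are illustrative names, and negative exponents are not handled because no method of inversion has been given yet.

```python
def matmul(A, B):
    """Return the product AB of two square matrices of the same order."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def identity(n):
    """Return the identity matrix of order n x n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def power(A, k):
    """Return A to the power k for an integer k >= 0, with A^0 = I."""
    if k < 0:
        raise ValueError("negative powers require the inverse of A")
    result = identity(len(A))
    for _ in range(k):
        result = matmul(result, A)
    return result


A = [[1, -1], [3, 1]]
print(power(A, 0))                                        # [[1, 0], [0, 1]]
print(power(A, 2))                                        # [[-2, -2], [6, -2]]
print(power(A, 5) == matmul(power(A, 2), power(A, 3)))    # True
```

The last line checks identity (17) in the particular case of exponents 2 and 3.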

1.5 Transpose matrices.

Let $A = [a_{ij}]_{m\times n}$ be a matrix of order $m\times n$. The matrix $B = [b_{ij}]_{n\times m}$ of order $n\times m$ is called the transpose of $A$ if $b_{ij} = a_{ji}$ for all $i = 1,2,\dots,n$, $j = 1,2,\dots,m$. The transpose of $A$ is denoted by $A^t$. Thus, the $i$-th row of $A$ is the $i$-th column of $A^t$.

Example 8 If

\[
A = \begin{bmatrix} 1 & 3 & 6 \\ -1 & 2 & 0 \\ 7 & 8 & 5 \end{bmatrix}
\]

then

\[
A^t = \begin{bmatrix} 1 & -1 & 7 \\ 3 & 2 & 8 \\ 6 & 0 & 5 \end{bmatrix}.
\]

It is easily seen that the transpose has the following properties:

\[
(A^t)^t = A, \qquad (\lambda A)^t = \lambda A^t,
\]

\[
(A+B)^t = A^t + B^t, \qquad (AB)^t = B^tA^t.
\]
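
The transpose and the rule for the transpose of a product can be checked numerically. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; `transpose` and `matmul` are illustrative names. It uses the matrix of Example 8 and the pair of matrices from Example 4.

```python
def transpose(A):
    """Return the transpose of A: its rows are the columns of A."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the product AB computed from formula (10)."""
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


A = [[1, 3, 6], [-1, 2, 0], [7, 8, 5]]
print(transpose(A))                    # [[1, -1, 7], [3, 2, 8], [6, 0, 5]]
print(transpose(transpose(A)) == A)    # True

P = [[1, 3, 4], [2, -2, 0]]            # order 2x3
Q = [[-3, 4], [1, 2], [0, 3]]          # order 3x2
print(transpose(matmul(P, Q)) == matmul(transpose(Q), transpose(P)))   # True
```

Note that the factors appear in the reverse order on the right-hand side; with the orders used here, the product of the transposes in the original order would be a $3\times 3$ matrix and could not equal the $2\times 2$ transpose of the product.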

A square matrix $A$ is called symmetric if $A = A^t$, i.e. if $a_{ij} = a_{ji}$ for all $i,j = 1,2,\dots,n$ (the elements of $A$ are symmetric with respect to the main diagonal). If $a_{ij} = -a_{ji}$ for all $i,j$ (i.e. $A^t = -A$), then $A$ is called skew-symmetric.

1.6 Submatrices and block matrices.

Let $A$ be a given matrix. If we remove some rows and some columns from the matrix $A$, then the remaining elements form a matrix, which is called a submatrix of $A$.

Example 9 Let

\[
A = \begin{bmatrix} 1 & 0 & 5 & 3 \\ 2 & -1 & 8 & 7 \\ 3 & 0 & 5 & 10 \\ -2 & 11 & 9 & 1 \end{bmatrix}.
\]

If we remove the second row and the third column, then we obtain the following submatrix:

\[
B = \begin{bmatrix} 1 & 0 & 3 \\ 3 & 0 & 10 \\ -2 & 11 & 1 \end{bmatrix}.
\]

If we remove the third row and do not remove any column, then we obtain the following submatrix:

\[
C = \begin{bmatrix} 1 & 0 & 5 & 3 \\ 2 & -1 & 8 & 7 \\ -2 & 11 & 9 & 1 \end{bmatrix}.
\]

In particular, an element of A can also be considered as a submatrix (of order 1×1) of A : the {i,j}-entry is obtained if we remove all rows but the i-th row and all columns but the j-th column.
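
Forming a submatrix amounts to filtering rows and columns by their indices. The following is a minimal sketch, assuming a matrix is stored as a Python list of rows; the function `submatrix` and its parameters `drop_rows` and `drop_cols` are illustrative names, with rows and columns counted from 1 as in the text. It reproduces the submatrices of Example 9.

```python
def submatrix(A, drop_rows=(), drop_cols=()):
    """Return the submatrix of A left after removing the listed rows and columns."""
    return [
        [a for j, a in enumerate(row, start=1) if j not in drop_cols]
        for i, row in enumerate(A, start=1)
        if i not in drop_rows
    ]


A = [
    [1, 0, 5, 3],
    [2, -1, 8, 7],
    [3, 0, 5, 10],
    [-2, 11, 9, 1],
]

# Remove the second row and the third column.
print(submatrix(A, drop_rows={2}, drop_cols={3}))
# [[1, 0, 3], [3, 0, 10], [-2, 11, 1]]

# Remove the third row only.
print(submatrix(A, drop_rows={3}))
# [[1, 0, 5, 3], [2, -1, 8, 7], [-2, 11, 9, 1]]

# Remove all rows but the second and all columns but the third: a 1x1 submatrix.
print(submatrix(A, drop_rows={1, 3, 4}, drop_cols={1, 2, 4}))
# [[8]]
```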

If we divide a matrix $A$ by horizontal and vertical lines drawn between its rows and columns, then we obtain a partition of $A$ into submatrices. The horizontal lines divide $A$ into several horizontal ''strips''. Each such strip is called a block row. Thus, a block row can consist of one or more rows. Analogously, the vertical lines divide $A$ into block columns. The submatrices in the above partition are the intersections of the block rows and block columns. The matrix $A$ can now be considered as a matrix of smaller order whose elements are themselves matrices (the submatrices of $A$). Such a matrix, whose elements are again matrices, is called a block matrix.

Thus, we can denote a block matrix $A$ by $A = [A_{ij}]_{m\times n}$, where each $A_{ij}$ is a matrix. Note that $A_{ij}$ and $A_{ik}$ must have the same number of rows, whereas $A_{ij}$ and $A_{kj}$ must have the same number of columns.

We can treat block matrices in the same way as ordinary matrices, i.e. we can define addition of block matrices having the same structure (the corresponding elements must be matrices of the same order), multiplication by numbers, and multiplication of block matrices by block matrices with agreeable structures (so that all the products of the blocks make sense).
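
To illustrate multiplication of block matrices, the following sketch partitions two $4\times 4$ matrices into $2\times 2$ block matrices, multiplies them block by block, and compares the result with the ordinary product. It assumes matrices are stored as Python lists of rows; the helpers `split`, `block_matmul` and `assemble`, the matrix `B` and the chosen partitions are all illustrative. The column partition of the first factor matches the row partition of the second, so every product of blocks makes sense.

```python
def matmul(A, B):
    """Return the ordinary product AB computed from formula (10)."""
    n = len(B)
    return [
        [sum(row[k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for row in A
    ]


def add(A, B):
    """Return the sum of two matrices of the same order."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A, r, c):
    """Partition A into 2x2 blocks: a line after row r and a line after column c."""
    top, bottom = A[:r], A[r:]
    return [
        [[row[:c] for row in top], [row[c:] for row in top]],
        [[row[:c] for row in bottom], [row[c:] for row in bottom]],
    ]


def block_matmul(X, Y):
    """Multiply 2x2 block matrices: C_ij = X_i1 Y_1j + X_i2 Y_2j."""
    return [
        [add(matmul(X[i][0], Y[0][j]), matmul(X[i][1], Y[1][j])) for j in range(2)]
        for i in range(2)
    ]


def assemble(blocks):
    """Glue a block matrix back together into an ordinary matrix."""
    rows = []
    for block_row in blocks:
        # Rows of the blocks in one block row are concatenated side by side.
        for parts in zip(*block_row):
            rows.append([a for part in parts for a in part])
    return rows


A = [[1, 0, 5, 3], [2, -1, 8, 7], [3, 0, 5, 10], [-2, 11, 9, 1]]
B = [[1, 2, 0, 1], [0, 1, 3, -1], [2, 0, 1, 4], [1, 1, 0, 2]]

X = split(A, 1, 3)   # columns of A split as 3 + 1
Y = split(B, 3, 2)   # rows of B split as 3 + 1, matching the columns of A

print(assemble(block_matmul(X, Y)) == matmul(A, B))   # True
```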

A matrix $A$ that has a block matrix form $A = [A_{ij}]_{m\times n}$ such that $A_{ij} = 0$ for all $i \neq j$ is called block diagonal.

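Finally, for a matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
a_{31} & a_{32} & \cdots & a_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

we denote by $a_k$ the $k$-th column of $A$, $k = 1,2,\dots,n$. Then we can write

\[
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\tag{18}
\]

and, since columns are matrices (of order $m\times 1$), (18) is also a representation of the matrix $A$ in block matrix form.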

1.7 Vectors.

Matrices of order $1\times n$ are called $n$-dimensional row vectors, or simply row vectors. Similarly, matrices of order $n\times 1$ are called ($n$-dimensional) column vectors. Thus, column vectors are transposes of row vectors (and vice versa). Since they are matrices, we have in fact already defined addition of vectors of the same dimension and multiplication of vectors by numbers, and these operations of course satisfy the properties (5)-(9).

Example 10 If

\[
a = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}
\quad\text{and}\quad
b = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix},
\]

then

\[
2a + 3b = \begin{bmatrix} 2+6 \\ 6+3 \\ -2-3 \end{bmatrix} = \begin{bmatrix} 8 \\ 9 \\ -5 \end{bmatrix}.
\]
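
The computation in Example 10 combines multiplication by numbers with addition. The following is a minimal sketch, assuming a vector is stored as a flat Python list of its elements (so row and column vectors are not distinguished); `linear_combination` is an illustrative name.

```python
def linear_combination(coefficients, vectors):
    """Return c1*v1 + c2*v2 + ... for vectors of the same dimension."""
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [
        sum(c * v[i] for c, v in zip(coefficients, vectors))
        for i in range(n)
    ]


a = [1, 3, -1]
b = [2, 1, -1]
print(linear_combination([2, 3], [a, b]))   # [8, 9, -5]
```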

We denote by Rn the set of all n-dimensional row vectors and by Cn the set of all n-dimensional column vectors (thus, Rn = M(1,n), Cn = M(n,1)). The availability of the operations of addition of elements of Rn and multiplication of elements of Rn by numbers, which satisfy (5)-(9), means that Rn is a linear space (linear spaces are also called vector spaces). The same can be said about the set Cn of column vectors. In fact, the two spaces are the ''same'': the only difference is the way we arrange the elements (in a row or in a column). For the same reason, the set M(m,n) of all matrices of order m×n is also a linear space. As a linear space, M(m,n) coincides with Rmn (or Cmn), the space of mn-dimensional row (respectively, column) vectors. The difference is again the way we arrange the elements, now in an array of rows and columns. Such a notation for mn-dimensional vectors in matrix form, however, opens possibilities for introducing new algebraic structures (such as multiplication, inversion, etc.) and has proved to be very useful in many applications.

As we already know, $n$-dimensional row vectors cannot be multiplied by each other (except in the trivial case $n = 1$). However, row vectors can be multiplied by column vectors, and the result is a number (see (11)). Thus, for any two row vectors $y_1$, $y_2$, we can define a new type of multiplication (denoted by $\langle y_1, y_2\rangle$) as

\[
\langle y_1, y_2\rangle = y_1(y_2)^t.
\]

This product, for real vectors, coincides with the so-called scalar product.
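
By formula (11), multiplying a row vector by the transpose of another row vector of the same dimension gives the sum of the products of corresponding elements. The following is a minimal sketch for real vectors, assuming a row vector is stored as a flat Python list and that the product pairs the first vector with the transpose of the second; `scalar_product` is an illustrative name.

```python
def scalar_product(y1, y2):
    """Return the product of the row y1 with the column obtained by transposing y2."""
    if len(y1) != len(y2):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(y1, y2))


y1 = [1, 3, -1]
y2 = [2, 1, -1]
print(scalar_product(y1, y2))   # 6
print(scalar_product(y1, y1))   # 11
```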