# Matrices

One of the most fundamental objects in linear algebra (in fact, in all applied mathematics) is a matrix, and the most basic core problem is solving systems of linear equations. So, we start with a brief review of matrices and methods for solving systems of linear equations.

> [!Definition]
> A matrix is a rectangular array of numbers:
> $$A=\begin{bmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn}\end{bmatrix}$$

Here, $m$ is the number of rows in $A$ and $n$ is the number of columns. The matrix $A$ is square if $m=n$.

## Matrix Arithmetic

### Matrix Addition:

$$C=A+B \iff c_{ij}=a_{ij}+b_{ij},\qquad A,B,C\in M_{m\times n}$$

It has the usual properties of addition (commutativity and associativity).

### Scalar Multiplication:

$$B=\alpha A \iff b_{ij}=\alpha a_{ij},\qquad \alpha\in\mathbb{R}$$

### Matrix Multiplication:

$$C=AB \iff c_{ij}=\sum_{k=1}^{n}a_{ik}b_{kj},\qquad A\in M_{m\times n},\ B\in M_{n\times p},\ C\in M_{m\times p}$$

Matrix multiplication has the usual properties, such as associativity and distributivity, except commutativity: in general, $AB\neq BA$.
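As an illustration, here is a minimal sketch using NumPy (the specific matrices are arbitrary examples) showing all three operations, including the failure of commutativity:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

C_add = A + B       # entrywise sum: c_ij = a_ij + b_ij
C_scaled = 2.5 * A  # scalar multiplication: b_ij = alpha * a_ij
C_prod = A @ B      # matrix product: c_ij = sum_k a_ik * b_kj

print(C_prod)
print(np.array_equal(A @ B, B @ A))  # False: AB != BA for these A, B
```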

## Linear Equations

The matrix multiplication defined above immediately allows us to rewrite a general linear system of $m$ equations with $n$ unknowns:

$$\begin{cases}a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n=b_1\\ \quad\vdots\\ a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n=b_m\end{cases}$$

in a compact matrix form: $Ax=b$, where $A=[a_{ij}]$, $x=\begin{bmatrix}x_1\\\vdots\\ x_n\end{bmatrix}$, $b=\begin{bmatrix}b_1\\\vdots\\ b_m\end{bmatrix}$.
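For example, a small system can be solved directly (a minimal sketch assuming NumPy; `numpy.linalg.solve` uses LU factorization, i.e., Gaussian elimination with partial pivoting, under the hood):

```python
import numpy as np

# 2x2 system:  x1 + 2*x2 = 5,  3*x1 + 4*x2 = 11
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 11.0])

x = np.linalg.solve(A, b)
print(x)                      # [1. 2.]
print(np.allclose(A @ x, b))  # True: the residual is (numerically) zero
```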

### Curve Fitting:

Suppose we have a set of data points $(x_1,y_1),\dots,(x_n,y_n)$, where $(x_i,y_i)$ are measurements in a certain experiment. Suppose we want to fit a polynomial to the data, i.e., find a polynomial $y=p(x)$ that passes through these points. Given $n$ points, it is enough to consider polynomials of degree $n-1$. Let $p(x)=a_0+a_1x+a_2x^2+\cdots+a_{n-1}x^{n-1}$, where $a_i\in\mathbb{R}$. So the problem is to find the coefficients $a_0,\dots,a_{n-1}$ such that:

$$\begin{cases}p(x_1)=a_0+a_1x_1+a_2x_1^2+\cdots+a_{n-1}x_1^{n-1}=y_1\\ \quad\vdots\\ p(x_n)=a_0+a_1x_n+a_2x_n^2+\cdots+a_{n-1}x_n^{n-1}=y_n\end{cases}\quad\Longleftrightarrow\quad\begin{bmatrix}1 & x_1 & \cdots & x_1^{n-1}\\ 1 & x_2 & \cdots & x_2^{n-1}\\ \vdots & \vdots & & \vdots\\ 1 & x_n & \cdots & x_n^{n-1}\end{bmatrix}\begin{bmatrix}a_0\\ a_1\\ \vdots\\ a_{n-1}\end{bmatrix}=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}$$

The more data points we have, the larger the system.
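A minimal sketch of this interpolation in NumPy (the data points here are made up for illustration): we assemble the coefficient matrix above, known as the Vandermonde matrix, and solve for the coefficients.

```python
import numpy as np

# Hypothetical data points (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 16.0])

n = len(x)
# Vandermonde matrix: row i is [1, x_i, x_i^2, ..., x_i^(n-1)]
V = np.vander(x, n, increasing=True)

a = np.linalg.solve(V, y)        # coefficients a_0, ..., a_{n-1}
p = np.polynomial.Polynomial(a)  # p(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}
print(np.allclose(p(x), y))      # True: p passes through every data point
```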

### Solving Differential Equations:

Very few differential equations can be solved analytically (an intuitive reason: very few functions can be integrated analytically). In most applications, numerical solutions are required. Consider the following equation (the Poisson equation):

$$-\frac{d^2u}{dx^2}=f(x)$$
with boundary conditions:

$$u(0)=0=u(1)$$

This describes many simple physical phenomena:

  • Temperature distribution $u(x)$ in a bar with a heat source $f(x)$
  • Deformation of an elastic bar
  • Deformation of a string under tension

In applications, the "source term" $f$ may not even be known in closed form; we may only be able to measure $f$ at any given point $x$. To solve this problem we need to discretize it.

Let us subdivide the interval $[0,1]$ into $n+1$ equal subintervals:

$$x_i=ih,\quad i=0,\dots,n+1,\qquad h=\frac{1}{n+1}\ll 1$$
Let $u_i=u(x_i)$. From the boundary conditions, we know that $u_0=0$, $u_{n+1}=0$. If we find $u_1,\dots,u_n$, this will give us an approximation of $u(x)$. The first step is to approximate $\frac{d^2u}{dx^2}$. Assuming that $h$ is small ($n$ is large), we approximate $\frac{du}{dx}$ by a central difference, which can be obtained as the average of two more direct approximations, $\frac{u(x+h)-u(x)}{h}$ and $\frac{u(x)-u(x-h)}{h}$:

$$\frac{du}{dx}\approx\frac{u(x+h)-u(x-h)}{2h},\qquad \frac{d^2u}{dx^2}\approx\frac{u'\!\left(x+\frac{h}{2}\right)-u'\!\left(x-\frac{h}{2}\right)}{h}\approx\frac{\frac{u(x+h)-u(x)}{h}-\frac{u(x)-u(x-h)}{h}}{h}=\frac{u(x+h)-2u(x)+u(x-h)}{h^2}$$
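A quick numerical sanity check of the last approximation (a minimal sketch assuming NumPy; $u(x)=\sin x$ is chosen only because its second derivative, $-\sin x$, is known exactly):

```python
import numpy as np

u = np.sin       # test function with known u''(x) = -sin(x)
x, h = 0.7, 1e-3

approx = (u(x + h) - 2 * u(x) + u(x - h)) / h**2
exact = -np.sin(x)
print(approx, exact)        # agree to roughly O(h^2)
print(abs(approx - exact))  # ~5e-8 for h = 1e-3
```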

Therefore, $-\frac{d^2u}{dx^2}=f(x)$ leads to the difference equation $-u_{i+1}+2u_i-u_{i-1}=h^2f_i$, where $f_i=f(x_i)$, $i=1,\dots,n$.
This system of difference equations can be written in matrix form:

$$\begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & & \\ 0 & -1 & 2 & \ddots & 0\\ \vdots & & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{bmatrix}\begin{bmatrix}u_1 \\\vdots\\ u_n\end{bmatrix}=h^2\begin{bmatrix}f_1\\\vdots\\ f_n\end{bmatrix}$$

To obtain an accurate approximation, the discretization step $h$ should be small $\Rightarrow n$ should be large. Numerical schemes for PDEs arising in fluid and solid mechanics, weather prediction, image and video processing, molecular dynamics, chemical processes, etc., often require $n\sim10^6$ and more. The design of efficient numerical algorithms for solving large systems is an active area of research.

## Square Matrices

### Inverses:

The inverse of a matrix is an analog of the reciprocal of a number.

> [!Definition]
> Let $A\in M_{n\times n}$. The inverse of $A$, denoted $A^{-1}$, is an $n\times n$ matrix that satisfies $AA^{-1}=A^{-1}A=I_n$, where $I_n=\begin{bmatrix}1 & & 0 \\ & \ddots & \\ 0 & & 1\end{bmatrix}$ is the $n \times n$ identity matrix.

Let us recall a few important properties of matrix inverses.

> [!info]-
> If $A\in M_{m\times n}$, we can define a right inverse of $A$ as a matrix $X\in M_{n\times m}$ such that $AX=I_m$, or a left inverse of $A$ as a matrix $Y\in M_{n\times m}$ such that $YA=I_n$.

1) If $A^{-1}$ exists, then it is unique.
   Proof: Suppose $AX=XA=I$ and $AY=YA=I$. Then $X=XI=X(AY)=(XA)Y=IY=Y$.

2) $(A^{-1})^{-1}=A$.
   Proof: All we need to check is that the definition is satisfied: $A^{-1}\cdot A=I_n$ and, similarly, $A\cdot A^{-1}=I_n$.

3) If $A,B\in M_{n\times n}$ are invertible $\Rightarrow (AB)^{-1}=B^{-1}A^{-1}$.
   Proof: Simple check.
   $(AB)\cdot(B^{-1}A^{-1})=A(BB^{-1})A^{-1}=AA^{-1}=I$
   $(B^{-1}A^{-1})\cdot(AB)=B^{-1}(A^{-1}A)B=B^{-1}B=I$

From the theory of matrices, it follows that requiring only $AA^{-1}=I$ or only $A^{-1}A=I$ in the definition is enough:

4) If $AX=I_n$, then $XA=I_n$ and thus $X=A^{-1}$. Similarly, if $XA=I_n$, then $AX=I_n$ and thus $X=A^{-1}$.

Not all square matrices have an inverse (but most do!). See below.

### Singularity:

> [!Definition]
> A noninvertible matrix $A$ is called singular. If $A^{-1}$ exists $\Rightarrow A$ is nonsingular.

The original motivation for introducing the matrix inverse is that it allows us to write the solution of any linear system in a compact way: if $A$ is nonsingular, then the unique solution of $Ax=b$ is $x=A^{-1}b$. However, finding the inverse $A^{-1}$ (using, e.g., the Gauss-Jordan method) is computationally inefficient compared to direct Gaussian elimination, which provides a systematic method for solving linear systems. Nevertheless, $A^{-1}$ is of great theoretical importance and provides insights into the design of practical algorithms.
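To make this concrete, here is a minimal sketch (assuming NumPy; the source term $f(x)=\pi^2\sin(\pi x)$ is chosen only because the exact solution $u(x)=\sin(\pi x)$ is then known) that assembles the tridiagonal system from the Poisson example above and solves it by Gaussian elimination via `numpy.linalg.solve`, rather than by forming $A^{-1}$:

```python
import numpy as np

n = 100
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)  # interior points x_1, ..., x_n

# Tridiagonal matrix: 2 on the diagonal, -1 on the off-diagonals
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

f = np.pi**2 * np.sin(np.pi * x)  # exact solution is u(x) = sin(pi*x)
u = np.linalg.solve(A, h**2 * f)  # Gaussian elimination, not A^{-1} b

print(np.max(np.abs(u - np.sin(np.pi * x))))  # error ~ O(h^2)
```

Replacing the `solve` call with `np.linalg.inv(A) @ (h**2 * f)` produces the same answer but requires roughly three times as much arithmetic, which illustrates the practical point made above.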