Least Squares

Applications of Least Squares

One of the most important applications of least squares is data fitting.

Suppose we are studying a certain phenomenon, which can be schematically represented as follows:

Pasted image 20231025050855.png

Let $(t_1, y_1), \dots, (t_m, y_m)$ be the collected data. Suppose that the theory underlying the phenomenon indicates that $y = \alpha + \beta t$.

Goal: to find $\alpha$ and $\beta$ from the data $(t_i, y_i)$.
Simple example: an object is moving with a constant speed $\beta$, and at time $t = 0$ its position was $\alpha$. At time $t$ its position is $y = \alpha + \beta t$.

In an ideal world, where the theory is absolutely correct (i.e. exactly describes the phenomenon) and there are no measurement errors, $y_i = \alpha + \beta t_i$ for all $i = 1, \dots, m$, and just two data points are enough to compute $\alpha$ and $\beta$.

In reality, even if the linear model is correct, the data looks like this:
Pasted image 20231025051653.png
Since measurement errors are inevitable, there is no line that goes exactly through all the data points.

Goal: to find a line $y = \hat\alpha + \hat\beta t$ that "fits best" the measured data and use $\hat\alpha$ and $\hat\beta$ as the estimates for $\alpha$ and $\beta$.

Question: What does it mean "fits best"?

There are several ways to define the best fit, here is one:
The difference between the observed value $y_i$ and the value predicted by the model is called the residual (also called prediction error in engineering fields): $e_i = y_i - (\alpha + \beta t_i)$. Let $e = (e_1, \dots, e_m)^T$ be the vector of residuals. Then
$$\begin{bmatrix} e_1 \\ \vdots \\ e_m \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} - \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \qquad \text{i.e. } e = y - Ax.$$
We would want all residuals to be small. The overall measure of the fit is the Euclidean norm of $e$. Thus, we are looking for the coefficient vector $x = (\alpha, \beta)^T$ that minimizes the Euclidean norm of the residual vector:

$$x^* = \operatorname*{arg\,min}_{x \in \mathbb{R}^2} \|Ax - y\|_2, \qquad \text{where } \|Ax - y\|_2 = \|e\|_2.$$

Geometrically:
Pasted image 20231025052539.png

But this is exactly the least squares solution to the system $Ax = y$.
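As a quick numerical illustration, here is a minimal sketch (the data values are made up for the example) that solves this least squares problem with NumPy and checks the geometric picture: $Ax^*$ is the projection of $y$ onto the column space of $A$, so the residual is orthogonal to the columns of $A$.

```python
import numpy as np

# Made-up measurements (t_i, y_i) for illustration only.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix A with rows [1, t_i].
A = np.column_stack([np.ones_like(t), t])

# x* = (alpha, beta) minimizing ||Ax - y||_2.
x_star, *_ = np.linalg.lstsq(A, y, rcond=None)

# The residual e = y - Ax* is orthogonal to the columns of A.
e = y - A @ x_star
print(x_star, A.T @ e)
```

The orthogonality check `A.T @ e` returning (numerically) zero is exactly the normal equations $A^T(y - Ax^*) = 0$ in disguise.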

Remark: in principle, we can minimize $\|e\|_1 = |e_1| + \dots + |e_m|$ or $\|e\|_\infty = \max\{|e_1|, \dots, |e_m|\}$, but these minimization problems are much harder: they are nonlinear, and to solve them we need tools outside of linear algebra. Because of its simplicity, least squares is used in most applications.

Last time we established the following result: if $\ker A = \{0\}$, then $x^* = (A^TA)^{-1}A^Ty$. In our case,
$$A = \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}.$$
Therefore, $\ker A = \{0\}$ $\iff$ the columns of $A$ are linearly independent $\iff$ not all $t_i$ are equal (a very weak assumption). Basically, we need to measure $y$ at least at two distinct times (or two distinct points).

Under the assumption that not all $t_i$ are equal:

$$A^TA = \begin{bmatrix} m & \sum_i t_i \\ \sum_i t_i & \sum_i t_i^2 \end{bmatrix}, \qquad A^Ty = \begin{bmatrix} \sum_i y_i \\ \sum_i t_i y_i \end{bmatrix},$$
so the normal equations $A^TAx = A^Ty$ read
$$\begin{cases} m\alpha + \left(\sum_i t_i\right)\beta = \sum_i y_i, \\ \left(\sum_i t_i\right)\alpha + \left(\sum_i t_i^2\right)\beta = \sum_i t_i y_i. \end{cases}$$

This system of two equations is easy to solve:

$$\hat\alpha = \bar y - \bar t\,\hat\beta, \qquad \hat\beta = \frac{\overline{ty} - \bar t\,\bar y}{\overline{t^2} - (\bar t)^2} = \frac{\sum_i (t_i - \bar t)(y_i - \bar y)}{\sum_i (t_i - \bar t)^2},$$
where $\bar t = \frac{1}{m}\sum_i t_i$, $\bar y = \frac{1}{m}\sum_i y_i$, $\overline{ty} = \frac{1}{m}\sum_i t_i y_i$, and $\overline{t^2} = \frac{1}{m}\sum_i t_i^2$.
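These closed-form formulas are easy to check numerically. A small sketch with invented data (all values below are illustrative), cross-checking the formulas against the normal-equations solution:

```python
import numpy as np

# Invented data (t_i, y_i) for the sketch.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.1, 3.9, 5.2, 6.0])

t_bar, y_bar = t.mean(), y.mean()

# Closed-form least squares slope and intercept.
beta_hat = np.sum((t - t_bar) * (y - y_bar)) / np.sum((t - t_bar) ** 2)
alpha_hat = y_bar - t_bar * beta_hat

# Cross-check against the normal equations x* = (A^T A)^{-1} A^T y.
A = np.column_stack([np.ones_like(t), t])
x_star = np.linalg.solve(A.T @ A, A.T @ y)
assert np.allclose(x_star, [alpha_hat, beta_hat])
```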

Thus, the best -- in the least squares sense -- straight line that fits the given data $(t_i, y_i)$, $i = 1, \dots, m$, is $y = \hat\alpha + \hat\beta t$.

Remark: Often problems that don't look like linear least squares problems can be converted to the least squares formulation by taking appropriate transformations of participating variables.

Let $m_i$ be the (measured) amount of radioactive material in a sample of an unknown isotope at time $t_i$. Data: $(t_i, m_i)$. Theory: $m(t) = m_0 e^{\beta t}$, where $m_0$ is the initial mass and $\beta < 0$ is the decay rate. Problem: find $m_0$ and $\beta$. The model is not linear, but $\log m(t) = \log m_0 + \beta t$ is linear in the coefficients $\log m_0$ and $\beta$. We can fit a line to the transformed data $D = \{(t_i, \log m_i)\}$.
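A sketch of this log-transform trick; the isotope parameters, noise level, and variable names are assumptions made up for the example:

```python
import numpy as np

# Simulated decay data: "true" parameters chosen only for this sketch.
rng = np.random.default_rng(0)
m0_true, beta_true = 5.0, -0.3
t = np.linspace(0.0, 10.0, 20)
m = m0_true * np.exp(beta_true * t) * np.exp(rng.normal(0.0, 0.01, t.size))

# Linearize: log m(t) = log m0 + beta t, then fit a line to (t_i, log m_i).
A = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(A, np.log(m), rcond=None)
m0_hat, beta_hat = np.exp(coef[0]), coef[1]
```

Note that fitting in log space minimizes the residuals of $\log m$, not of $m$ itself; for small multiplicative noise, as here, the two give very similar answers.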

Suppose the scatterplot of the data (ti,yi) looks like this:
Pasted image 20231025055222.png

We can fit a line to the data, but it does not really make sense. A parabola y=α+βt+γt2 seems to be a better model. In general, suppose we want to fit a polynomial of degree n to the data
$$y(t) = \alpha_0 + \alpha_1 t + \dots + \alpha_n t^n, \qquad \alpha_i \in \mathbb{R}.$$
The $i$th residual is $e_i = y_i - y(t_i) = y_i - (\alpha_0 + \alpha_1 t_i + \dots + \alpha_n t_i^n)$, $i = 1, \dots, m$. In matrix form:
$$\begin{bmatrix} e_1 \\ \vdots \\ e_m \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} - \begin{bmatrix} 1 & t_1 & \cdots & t_1^n \\ 1 & t_2 & \cdots & t_2^n \\ \vdots & \vdots & & \vdots \\ 1 & t_m & \cdots & t_m^n \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \vdots \\ \alpha_n \end{bmatrix}$$
$A$ is called a Vandermonde matrix (named after a French mathematician who, curiously, did not introduce the Vandermonde matrix); $A \in M_{m \times (n+1)}$.
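For instance, a degree-2 (parabola) fit can be sketched with NumPy's `np.vander`, which builds exactly this matrix; the data below is invented and roughly follows $y = t^2$:

```python
import numpy as np

# Invented noisy data that roughly follows y = t^2.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([4.1, 0.9, 0.1, 1.1, 3.9, 9.2])

# Vandermonde design matrix with columns 1, t, t^2.
A = np.vander(t, N=3, increasing=True)

# Least squares coefficients (alpha_0, alpha_1, alpha_2).
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With `increasing=True` the columns come in the order $1, t, t^2$, matching the ordering of the coefficients $\alpha_0, \dots, \alpha_n$ in the notes.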

Consider a special case: $m = n + 1$ (number of measurements = number of coefficients). Then $A$ is square and, if $A$ is nonsingular, we can find $x$ such that $e = 0$. In other words, we can solve $Ax = y$ exactly, i.e. find a polynomial that fits the data $\{(t_i, y_i)\}$ exactly. This polynomial is called the interpolating polynomial.
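A minimal sketch of the interpolation case $m = n + 1$, using three points chosen for illustration from the parabola $y = 1 + t^2$:

```python
import numpy as np

# Three distinct times and the values of y = 1 + t^2 at those times.
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 5.0])

# Square (n+1) x (n+1) Vandermonde matrix; nonsingular since the t_i are distinct.
A = np.vander(t, N=3, increasing=True)

# Solve Ax = y exactly: the coefficients of the interpolating polynomial.
coef = np.linalg.solve(A, y)
```

Here the residual is exactly zero: the fitted polynomial passes through every data point, recovering the coefficients $(1, 0, 1)$.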

Lemma: if $t_1, \dots, t_{n+1}$ are all distinct ($t_i \neq t_j$ for $i \neq j$), then the $(n+1) \times (n+1)$ Vandermonde matrix is nonsingular.
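One way to see the lemma concretely: the Vandermonde determinant factors as $\prod_{i<j}(t_j - t_i)$, which is nonzero exactly when the $t_i$ are distinct. A quick numerical check (the nodes are an arbitrary choice):

```python
import numpy as np
from itertools import combinations

# Arbitrary distinct nodes for the check.
t = np.array([0.5, 1.0, 2.0, 3.5])
A = np.vander(t, increasing=True)

# Classical identity: det A = product over i < j of (t_j - t_i).
det_formula = np.prod([tj - ti for ti, tj in combinations(t, 2)])
assert np.isclose(np.linalg.det(A), det_formula)
```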

Remark: the textbook gives a proof based on an LU decomposition, but the statement is very intuitive if you think about it geometrically.