One of the most important applications of least squares is data fitting.
Suppose we are studying a certain phenomenon, which can be schematically represented as follows: an input $t$ is fed into the phenomenon, and an output $b$ is measured. [schematic diagram omitted]
Let $(t_1, b_1), (t_2, b_2), \dots, (t_m, b_m)$ be the collected data. Suppose that the theory underlying the phenomenon indicates that $b = \alpha + \beta t$.
Goal: To find $\alpha$ and $\beta$ from the data.
Simple example: An object is moving with a constant speed; at time $t_1$ its position was $b_1$, and at time $t_2$ its position was $b_2$. Here $b = \alpha + \beta t$, where $\alpha$ is the initial position and $\beta$ is the speed.
In an ideal world, where the theory is absolutely correct (i.e. $b = \alpha + \beta t$ exactly describes the phenomenon) and there are no measurement errors, $b_i = \alpha + \beta t_i$ for all $i$, and just two data points are enough to compute $\alpha$ and $\beta$.
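Explicitly, two exact measurements give the pair of equations $b_1 = \alpha + \beta t_1$ and $b_2 = \alpha + \beta t_2$, so, provided $t_1 \neq t_2$,
$$\beta = \frac{b_2 - b_1}{t_2 - t_1}, \qquad \alpha = b_1 - \beta t_1.$$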
In reality, even if the linear model is correct, the data looks like this: [scatterplot omitted: the points cluster around a straight line without lying exactly on it]
Measurement errors are inevitable, so there is no line that goes exactly through all the data points.
Goal: To find a line $b = \hat{\alpha} + \hat{\beta} t$ that "fits best" the measured data and use $\hat{\alpha}$ and $\hat{\beta}$ as the estimates for $\alpha$ and $\beta$.
Question: What does it mean to "fit best"?
There are several ways to define the best fit; here is one:
The difference between the observed value $b_i$ and the value predicted by the model is called the residual (also called the prediction error in engineering fields): $r_i = b_i - (\hat{\alpha} + \hat{\beta} t_i)$, $i = 1, \dots, m$. Let $r = (r_1, \dots, r_m)^T$ be the vector of residuals.
We would want all residuals to be small. The overall measure of the fit is the Euclidean norm of $r$,
$$\|r\|_2 = \sqrt{r_1^2 + r_2^2 + \cdots + r_m^2}.$$
Thus, we are looking for the coefficient vector $(\hat{\alpha}, \hat{\beta})$ that minimizes the Euclidean norm of the residual vector.
Geometrically: the residual $r_i$ is the signed vertical distance from the data point $(t_i, b_i)$ to the fitted line. [figure omitted]
But this is exactly the least squares solution to the system $Ax = b$, where
$$A = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix}, \qquad x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.$$
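As a concrete sketch (not part of the lecture; the data values are made up for illustration), here is how this line-fitting problem can be set up and solved numerically with NumPy:

```python
import numpy as np

# Hypothetical measurements (t_i, b_i); any noisy, roughly linear data will do.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Columns [1, t_i], so that (Ax)_i = alpha + beta * t_i.
A = np.column_stack([np.ones_like(t), t])

# lstsq minimizes ||b - Ax||_2 -- exactly the criterion defined above.
x_hat, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
alpha_hat, beta_hat = x_hat
print(alpha_hat, beta_hat)
```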
Remark: In principle, we could minimize $\|r\|_1$ or $\|r\|_\infty$ instead, but these minimization problems are much harder: they are nonlinear in the sense that they cannot be reduced to solving a system of linear equations, so we need tools from outside linear algebra to solve them. Because of its simplicity, least squares is used in most applications.
Last time we established the following result: the least squares solution of $Ax = b$ is unique if and only if the columns of $A$ are linearly independent. In our case,
$$A = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{pmatrix},$$
and the columns of $A$ are linearly independent if and only if not all $t_1, \dots, t_m$ are equal (a very weak assumption). Basically, we need measurements at two or more distinct times (or at two distinct points).
Under the assumption that not all $t_i$ are equal, the least squares solution $(\hat{\alpha}, \hat{\beta})$ is the unique solution of the normal equations $A^T A x = A^T b$:
$$\begin{pmatrix} m & \sum_{i=1}^m t_i \\ \sum_{i=1}^m t_i & \sum_{i=1}^m t_i^2 \end{pmatrix} \begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^m b_i \\ \sum_{i=1}^m t_i b_i \end{pmatrix}.$$
This system of two equations is easy to solve:
$$\hat{\beta} = \frac{m \sum_i t_i b_i - \left(\sum_i t_i\right)\left(\sum_i b_i\right)}{m \sum_i t_i^2 - \left(\sum_i t_i\right)^2}, \qquad \hat{\alpha} = \frac{1}{m}\left(\sum_i b_i - \hat{\beta} \sum_i t_i\right).$$
Thus, the best -- in the least squares sense -- straight line that fits the given data is $b = \hat{\alpha} + \hat{\beta} t$.
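A minimal numerical check of these closed-form expressions (my own sketch, with made-up data; np.polyfit is used only as an independent reference):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
m = len(t)

# Closed-form solution of the 2x2 normal equations above.
beta_hat = (m * (t * b).sum() - t.sum() * b.sum()) / (m * (t**2).sum() - t.sum()**2)
alpha_hat = (b.sum() - beta_hat * t.sum()) / m

# np.polyfit(..., 1) solves the same least squares problem; it returns [slope, intercept].
assert np.allclose([beta_hat, alpha_hat], np.polyfit(t, b, 1))
```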
Remark: Often, problems that do not look like linear least squares problems can be converted to the least squares formulation by applying appropriate transformations to the participating variables.
Example: Let $b_i$ be the (measured) amount of radioactive material in a sample of an unknown isotope at time $t_i$. Data: $(t_1, b_1), \dots, (t_m, b_m)$. Theory: $b = c e^{-\lambda t}$, where $c$ is the initial mass and $\lambda$ is the decay rate. Problem: find $c$ and $\lambda$. The model is not linear, but taking logarithms gives $\ln b = \ln c - \lambda t$. We can therefore fit the line $y = \alpha + \beta t$ to the transformed data $(t_i, \ln b_i)$ and recover $c = e^{\hat{\alpha}}$, $\lambda = -\hat{\beta}$.
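A sketch of this log-transform fit in NumPy (synthetic data; the "true" parameter values are my own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
c_true, lam_true = 5.0, 0.3                      # hypothetical true initial mass and decay rate
t = np.linspace(0.0, 10.0, 20)
b = c_true * np.exp(-lam_true * t) * np.exp(rng.normal(0.0, 0.02, t.size))  # noisy decay data

# Straight-line fit in the transformed variables: ln b = ln c - lambda * t.
slope, intercept = np.polyfit(t, np.log(b), 1)
c_hat, lam_hat = np.exp(intercept), -slope
print(c_hat, lam_hat)                            # should be close to 5.0 and 0.3
```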
Suppose the scatterplot of the data looks like this: [scatterplot omitted: the points follow a curved, parabola-like trend]
We can fit a line to the data, but it does not really make sense; a parabola seems to be a better model. In general, suppose we want to fit a polynomial of degree $n-1$,
$$p(t) = x_1 + x_2 t + \cdots + x_n t^{n-1},$$
to the data $(t_1, b_1), \dots, (t_m, b_m)$.
The $i$th residual: $r_i = b_i - (x_1 + x_2 t_i + \cdots + x_n t_i^{n-1})$, $i = 1, \dots, m$, i.e. $r = b - Ax$ with
$$A = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_m & t_m^2 & \cdots & t_m^{n-1} \end{pmatrix}.$$
$A$ is called a Vandermonde matrix (named after a French mathematician who, in fact, did not introduce the Vandermonde matrix).
Consider a special case: $m = n$ (# measurements = # coefficients). Then $A$ is square and, if $A$ is nonsingular, we can find $x$ such that $Ax = b$. In other words, we can solve the system exactly, i.e. find a polynomial that fits the data exactly. This polynomial is called the interpolating polynomial.
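A sketch of the square case in NumPy (the nodes and values are illustrative choices of mine); np.vander with increasing=True builds exactly the matrix $A$ above:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])   # m = n = 4 distinct nodes
b = np.array([1.0, 2.0, 0.0, 5.0])   # values to interpolate

# Columns [1, t, t^2, t^3], matching the Vandermonde matrix above.
V = np.vander(t, increasing=True)

# The nodes are distinct, so V is nonsingular and Vx = b has a unique solution.
x = np.linalg.solve(V, b)

# The degree-3 polynomial with these coefficients passes through all four points
# (np.polyval expects highest-degree coefficients first, hence the reversal).
assert np.allclose(np.polyval(x[::-1], t), b)
```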
Lemma: If $t_1, \dots, t_m$ are all distinct, the Vandermonde matrix is nonsingular.
Remark: The textbook gives a proof based on an LU decomposition, but the statement is very intuitive if you think about it geometrically: a nonzero polynomial of degree at most $m-1$ has at most $m-1$ roots, so if $Ax = 0$, the corresponding polynomial vanishes at $m$ distinct points and must be the zero polynomial, i.e. $x = 0$.
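For reference (a standard fact, not the textbook's LU argument), the Vandermonde determinant has the closed form
$$\det A = \prod_{1 \le i < j \le m} (t_j - t_i),$$
which is nonzero exactly when the $t_i$ are pairwise distinct.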