LINEAR LEAST SQUARES

'Linear least squares' is a mathematical optimization technique to find an approximate solution for a system of linear equations that has no exact solution. This usually happens if the number of equations (''m'') is bigger than the number of variables (''n''). (See also linear regression.)

Contents
Definition
Computation
Applications
Limitations
Example
See also
External links

Definition


In mathematical terms, we want to find a solution hat{mathbf{x}} for the "equation"
:Ahat{mathbf{x}} pprox mathbf{b},
where ''A'' is a known ''m''-by-''n'' matrix (usually with ''m'' > ''n''), 'x' is an unknown ''n''-dimensional parameter vector, and 'b' is a known ''m''-dimensional measurement vector. More precisely, we want to minimize the Euclidean norm squared of the residual ''A'''x' − 'b', that is, the quantity
:|Amathbf{x}-mathbf{b}|^2 = left([Amathbf{x}]_1-mathbf{b}_1
ight)^2
+left([Amathbf{x}]_2-mathbf{b}_2
ight)^2
+dots+
left([Amathbf{x}]_m-mathbf{b}_m
ight)^2,

where [''A'''x']''i'' denotes the ''i''-th component of the vector ''A'''x'. Hence the name "least squares".
Using the fact that the squared norm of 'v' is 'v'T'v', where 'v'T stands for the transpose of 'v', rewrite the expression as
:(A old{x}- old{b})^T(A old{x}- old{b}) = (A old{x})^T (A old{x}) - old{b}^T A old{x} - (A old{x})^T old{b} + old{b}^T old{b}.
The two middle terms 'b'T(''A'''x') and (''A'''x')T'b' are equal and the minimum is found at the zero of the derivative with respect to 'x',
: rac{d}{dx} [(A old{x})^T (A old{x}) - 2 (A old{x})^T old{b} ] = 2 A^T A hat{old{x}} - 2 A^T old{b} = old{0} .
Therefore the minimizing vector hat{mathbf{x}} is a solution of the 'normal equation'
: A^T ! A hat{mathbf{x}} = A^T mathbf{b}.
Note that this corresponds to a system of linear equations. The matrix ''A''T''A'' on the left-hand side is an ''n''-by-''n'' square matrix, which is invertible if ''A'' has full column rank (that is, if the rank of ''A'' is ''n''). In that case, the solution of the system of linear equations is unique and given by
: hat{mathbf{x}} = (A^T!A)^{-1} A^T mathbf{b}.
The matrix (A^TA)^{-1}A^T is called the pseudoinverse of ''A''. We cannot use the true matrix inverse of ''A'' (that is, A^{-1}), because it does not exist as ''A'' is not a square matrix (''m'' ≠ ''n'').

Computation


If the matrix ''ATA'' has full rank and is well-conditioned, the normal equations can be solved directly by using the Cholesky decomposition ''ATA'' = ''RTR'', giving:
: R^T R hat{mathbf{x}} = A^T mathbf{b}
where ''R'' is an upper triangular matrix.
A slower but more numerically stable method, which still works if ''A'' is not full rank, can be obtained by computing the QR decomposition ''A = Q R''. One can then solve
: R hat{mathbf{x}} = Q^T mathbf{b}
where ''Q'' is an orthogonal matrix and ''R'' is an upper triangular matrix.
A third alternative is to use the singular value decomposition (SVD). If A = USigma V^
★ is the singular value decomposition of ''A'', then the pseudoinverse of the matrix ''A'' is
''V Σ+ U
'', so
: hat{mathbf{x}} = V Sigma^+ U^
★ mathbf{b} ,
where Σ+ is the transpose of Σ with every nonzero entry replaced by its reciprocal. This method is the most computationally intensive, but is particularly useful if the matrix ''A'' is very ill-conditioned (i.e. if its condition number multiplied by the machine's relative round-off error is appreciably large). In that case, including the smallest singular values in the inversion merely adds numerical noise to the solution. This can be cured using the truncated SVD approach, giving a more stable and exact answer, by explicitly setting to zero all singular values below a certain threshold and so ignoring them, before calculating the pseudoinverse.

Applications


The linear least squares method can be used to find an affine function 'R'n → 'R' that best fits a given set of data (see general least squares method). It is widely and erroneously thought that the word ''linear'' in the term ''linear regression'' refers to the linear or affine nature of the fitted function.
For example
: y = lpha + eta x + gamma x^2 + arepsilon
is still a linear regression model, for the right-hand side is a linear combination of the parameters α, β, and γ; moreover, the least-squares estimates of those parameters are linear in the vector of observed ''y''-values. In this case it is useful to think of ''x''2 as a new independent variable, formed by modifying the original variable ''x''. However, it is convention to call this a quadratic fit or a 2nd-order polynomial fit.
We write the linear function we try to find as a 1-by-''n'' matrix 'x'T (so 'x' is actually a column vector, see also linear transformation).
The set of data consists of ''m'' (''n'' + 1)-tuples (''x''1, ..., ''x''''n'', ''y''). Those can be written into an ''m''-by-''n'' matrix ''A'' and a vector 'b', where every tuple corresponds to a row of ''A'', the ''y'' becoming the corresponding entry in 'b'.
Then,
: ''A'''x' ≈ 'b'
yields the function 'x' we seek.

Limitations


The least squares approach relies on the calculation of the pseudo inverse (A^T! A)^{-1} A^T. The pseudo inverse is guaranteed to exist for any full-rank matrix A. However, in some cases the matrix A^T A is ill-conditioned; this occurs when the measurements are only marginally related to the estimated parameters. In these cases, the least squares estimate amplifies the measurement noise, and may be grossly inaccurate. This may occur even when the pseudo inverse itself can be accurately calculated numerically. Various regularization techniques can be applied in such cases, the most common of which is called Tikhonov regularization. If further information about the parameters is known, for example, a range of possible values of 'x', then minimax techniques can also be used to increase the stability of the solution.
Another drawback of the least squares estimator is the fact that it seeks to minimize the norm of the measurement error, ''A'''x' − 'b'. In many cases, one is truly interested in obtaining small error in the parameter 'x', e.g., a small value of |mathbf{x}-hat{mathbf{x}}|. However, since 'x' is unknown, this quantity cannot be directly minimized. If a prior probability on 'x' is known, then a Bayes estimator can be used to minimize the mean squared error, E left{ | mathbf{x} - hat{mathbf{x}} |^2
ight} . The least squares method is often applied when no prior is known. Surprisingly, however, better estimators can be constructed, an effect known as Stein's phenomenon. For example, if the measurement error is Gaussian, several estimators are known which dominate, or outperform, the least squares technique; the most common of these is the James-Stein estimator.

Example


Consider the points (0, 3), (2, 3), (4, 4), (−1, 2). We seek a solution of the form α''x'' + β = ''y'', that is,
: egin{pmatrix}x & 1 end{pmatrix}egin{pmatrix} lpha \ etaend{pmatrix} = y
We can then form the matrix ''A'':
: A=egin{pmatrix}
0 & 1 \
2 & 1 \
4 & 1 \
-1 & 1 \ end{pmatrix}
and the vector 'b'
: mathbf{b} = egin{pmatrix}
3 \
3 \
4 \
2 end{pmatrix}
and then
: A^T=egin{pmatrix}
0 & 2 & 4 & -1 \
1 & 1 & 1 & 1 end{pmatrix}
:A^Tmathbf{b}=egin{pmatrix}
20 \
12 end{pmatrix}
: A^TA=egin{pmatrix}
21 & 5 \
5 & 4 end{pmatrix}
:(A^TA)^{-1}={1over 59}egin{pmatrix}
4 & -5 \
-5 & 21 end{pmatrix}
So, the normal equation is
Plot of points and solution.

:A^TAegin{pmatrix} lpha \ eta end{pmatrix} = A^Tmathbf{b}
:egin{pmatrix}
21 & 5 \
5 & 4 end{pmatrix} egin{pmatrix} lpha \ eta end{pmatrix}=egin{pmatrix}
20 \
12 end{pmatrix}
and
:egin{pmatrix} lpha \ etaend{pmatrix}={1over 59}egin{pmatrix}
4 & -5 \
-5 & 21 end{pmatrix}egin{pmatrix}
20 \
12 end{pmatrix}=egin{pmatrix}
20/59 \
152/59 end{pmatrix}
and the line of best fit is (20/59)''x'' + 152/59 = ''y''.

See also



Linear regression

Least-squares estimation of linear regression coefficients

Tikhonov regularization

James-Stein estimator

External links



Least Squares Fitting – From MathWorld

Least Squares Fitting-Polynomial – From MathWorld

This article provided by Wikipedia. To edit the contents of this article, click here for original source.

psst.. try this: add to faves