In regression analysis, least squares, also known as ordinary least squares analysis, is a method for linear regression that determines the values of unknown quantities in a statistical model by minimizing the sum of the residuals (the difference between the predicted and observed values) squared. This method was first described by Carl Friedrich Gauss at the turn of the 19th century. Today, this method is available in most statistical software packages. The leastsquares approach to regression analysis has been shown to be optimal in the sense that it satisfies the GaussMarkov theorem. In statistics, regression analysis examines the relation of a dependent variable (response variable) to specified independent variables (explanatory variables). ...
In statistics, linear regression is a regression method that models the relationship between a dependent variable Y, independent variables Xp, and a random term Îµ. The model can be written as where Î²1 is the intercept (constant term), the Î²is are the respective parameters of independent variables, and p is the...
Johann Carl Friedrich Gauss or GauÃŸ ( ; Latin: ) (30 April 1777 â€“ 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, analysis, differential geometry, geodesy, electrostatics, astronomy, and optics. ...
Alternative meaning: Nineteenth Century (periodical) (18th century — 19th century — 20th century — more centuries) As a means of recording the passage of time, the 19th century was that century which lasted from 18011900 in the sense of the Gregorian calendar. ...
This article is not about GaussMarkov processes. ...
A related method is the least mean squares (LMS) method. It occurs when the number of measured data is 1 and the gradient descent method is used to minimize the squared residual. LMS is known to minimize the expectation of the squared residual, with the smallest number of operations per iteration). However, it requires a large number of iterations to converge. Least mean squares algorithms are used in adaptive filters to find the filter coefficients that relate to producing the least mean squares of the error signal (difference between the desired and the actual signal). ...
Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point. ...
Furthermore, many other types of optimization problems can be expressed in a least squares form, by either minimizing energy or maximizing entropy. Claude Shannon In information theory, the Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable. ...
History
In 1795, Carl Friedrich Gauss, at the age of 18, is credited with developing the fundamentals of the basis for leastsquares analysis. However, as with many of his discoveries, he did not publish them. The strength of his method was demonstrated in 1801, when it was used to predict the future location of the newly discovered asteroid Ceres. Image File history File links Download high resolution version (576x738, 235 KB) Description: Ausschnitt aus einem GemÃ¤lde von C. F. Gauss Source: evtl. ...
Image File history File links Download high resolution version (576x738, 235 KB) Description: Ausschnitt aus einem GemÃ¤lde von C. F. Gauss Source: evtl. ...
Johann Carl Friedrich Gauss or GauÃŸ ( ; Latin: ) (30 April 1777 â€“ 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, analysis, differential geometry, geodesy, electrostatics, astronomy, and optics. ...
1795 was a common year starting on Thursday (see link for calendar). ...
Johann Carl Friedrich Gauss or GauÃŸ ( ; Latin: ) (30 April 1777 â€“ 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, analysis, differential geometry, geodesy, electrostatics, astronomy, and optics. ...
1 Ceres (SEER eez) was the first asteroid to be discovered, with a diameter of 959. ...
On January 1st, 1801, the Italian astronomer Giuseppe Piazzi had discovered the asteroid Ceres and had been able to track its path for 40 days before it was lost in the glare of the sun. Based on this data, it was desired to determine the location of Ceres after it emerged from behind the sun without solving the complicated Kepler's nonlinear equations of planetary motion. The only predictions that successfully allowed the Hungarian astronomer Franz Xaver von Zach to relocate Ceres were those performed by the 24yearold Gauss using leastsquares analysis. January 1 is the first day of the calendar year in both the Julian and Gregorian calendars. ...
The Union Jack, flag of the newly formed United Kingdom of Great Britain and Ireland. ...
Giuseppe Piazzi. ...
Johannes Keplers primary contributions to astronomy/astrophysics were his three laws of planetary motion. ...
Franz Xaver, Baron Von Zach Baron Franz Xaver von Zach (Franz Xaver Freiherr von Zach) (June 4, 1754  September 2, 1832) was an Austrian astronomer born at Bratislava. ...
However, Gauss did not publish the method until 1809, when it appeared in volume two of his work on celestial mechanics, Theoria Motus Corporum Coelestium in sectionibus conicis solem ambientium. Year 1809 (MDCCCIX) was a common year starting on Sunday (link will display the full calendar). ...
The idea of leastsquares analysis was independently formulated by the Frenchman AdrienMarie Legendre in 1805 and the American Robert Adrain in 1808. AdrienMarie Legendre (September 18, 1752 â€“ January 10, 1833) was a French mathematician. ...
1805 was a common year starting on Tuesday (see link for calendar). ...
Robert Adrain (September 30, 1775  August 10, 1843) was a scientist and mathematician. ...
Year 1808 (MDCCCVIII) was a leap year starting on Friday (link will display the full calendar) of the Gregorian calendar (or a leap year starting on Wednesday of the 12day slower Julian calendar). ...
In 1829, Gauss was able to state that the leastsquares approach to regression analysis is optimal in the sense that in a linear model where the errors have a mean of zero, are uncorrelated, and have equal variances, the best linear unbiased estimators of the coefficients is the leastsquares estimators. This result is known as the GaussMarkov theorem. Johann Wolfgang von Goethe 1829 was a common year starting on Thursday (see link for calendar). ...
This article is not about GaussMarkov processes. ...
Problem statement The objective consists of adjusting a model function to best fit a data set. The chosen model function has adjustable parameters. The data set consist of n points with . The model function has the form , where y is the dependent variable, are the independent variables, and are the model adjustable parameters. We wish to find the parameter values such that the model best fits the data according to a defined error criterion. The least sum square method minimizes the sum square error equation with respect to the adjustable parameters . For an example, the data is height measurements over a surface. We choose to model the data by a plane with parameters for plane mean height, plane tip angle, and plane tilt angle. The model equation is then y = f(x_{1},x_{2}) = a_{1} + a_{2}x_{1} + a_{3}x_{2}, the independent variables are , and the adjustable parameters are .
Solving the least squares problem Least square optimization problems can be divided into linear and nonlinear problems. The linear problem has a closed form solution. The optimization problem is said to be a linear optimization problem if the first order partial derivatives of S with respect to the parameters results in a set of equations that is linear in the parameter variables. The general, nonlinear, unconstrained optimization problem has no closed form solution. In this case recursive methods, such as Newton's method, combined with the gradient descent method, or specialized methods for least squares analysis, such as the GaussNewton algorithm or the LevenbergMarquardt algorithm can be used. In mathematics, the term optimization, or mathematical programming, refers to the study of problems in which one seeks to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set. ...
In mathematics, Newtons method is a wellknown algorithm for finding roots of equations in one or more dimensions. ...
Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point. ...
In mathematics, the GaussNewton algorithm is used to solve nonlinear least squares problems. ...
The LevenbergMarquardt algorithm provides a numerical solution to the mathematical problem of minimizing a function, generally nonlinear, over a space of parameters of the function. ...
Least squares and regression analysis In regression analysis, one replaces the relation In statistics, regression analysis examines the relation of a dependent variable (response variable) to specified independent variables (explanatory variables). ...
by where the noise term ε is a random variable with mean zero. Note that we are assuming that the x values are exact, and all the errors are in the y values. Again, we distinguish between linear regression, in which case the function f is linear in the parameters to be determined (e.g., f(x) = ax^{2} + bx + c), and nonlinear regression. As before, linear regression is much simpler than nonlinear regression. (It is tempting to think that the reason for the name linear regression is that the graph of the function f(x) = ax + b is a line. But fitting a curve like f(x) = ax^{2} + bx + c when estimating a, b, and c by least squares, is an instance of linear regression because the vector of leastsquare estimates of a, b, and c is a linear transformation of the vector whose components are f(x_{i}) + ε_{i}. In probability theory, a random variable is a quantity whose values are random and to which a probability distribution is assigned. ...
In statistics, linear regression is a regression method that models the relationship between a dependent variable Y, independent variables Xp, and a random term Îµ. The model can be written as where Î²1 is the intercept (constant term), the Î²is are the respective parameters of independent variables, and p is the...
dataset with approximating polynomials Nonlinear regression in statistics is the problem of fitting a model to multidimensional x,y data, where f is a nonlinear function of x with parameters Î¸. In general, there is no algebraic expression for the bestfitting parameters, as there is in linear regression. ...
In mathematics, a linear transformation (also called linear map or linear operator) is a function between two vector spaces that preserves the operations of vector addition and scalar multiplication. ...
Parameter estimates By recognizing that the regression model is a system of linear equations we can express the model using data matrix X, target vector Y and parameter vector δ. The ith row of X and Y will contain the x and y value for the ith data sample. Then the model can be written as which when using pure matrix notation becomes where ε is normally distributed with expected value 0 (i.e., a column vector of 0s) and variance σ^{2} I_{n}, where I_{n} is the n×n identity matrix. The leastsquares estimator for δ is In parametric statistics, the leastsquares estimator is often used to estimate the coefficients of a linear regression. ...
(where X^{T} is the transpose of X) and the sum of squares of residuals is One of the properties of leastsquares is that the matrix is the orthogonal projection of Y onto the column space of X. The fact that the matrix X(X^{T}X)^{−1}X^{T} is a symmetric idempotent matrix is incessantly relied on in proofs of theorems. The linearity of as a function of the vector Y, expressed above by saying In linear algebra, a symmetric matrix is a matrix that is its own transpose. ...
In mathematics, an idempotent element is an element which, intuitively, leaves something unchanged. ...
is the reason why this is called "linear" regression. Nonlinear regression uses nonlinear methods of estimation. The matrix I_{n} − X (X^{T} X)^{−1} X^{T} that appears above is a symmetric idempotent matrix of rank n − 2. Here is an example of the use of that fact in the theory of linear regression. The finitedimensional spectral theorem of linear algebra says that any real symmetric matrix M can be diagonalized by an orthogonal matrix G, i.e., the matrix G′MG is a diagonal matrix. If the matrix M is also idempotent, then the diagonal entries in G′MG must be idempotent numbers. Only two real numbers are idempotent: 0 and 1. So I_{n} − X(X^{T}X)^{ 1}X^{T}, after diagonalization, has n − 2 1s and two 0s on the diagonal. That is most of the work in showing that the sum of squares of residuals has a chisquare distribution with n−2 degrees of freedom. In mathematics, particularly linear algebra and functional analysis, the spectral theorem is any of a number of results about linear operators or about matrices. ...
Linear algebra is the branch of mathematics concerned with the study of vectors, vector spaces (also called linear spaces), linear maps (also called linear transformations), and systems of linear equations. ...
In matrix theory, a real orthogonal matrix is a square matrix Q whose transpose is its inverse: // Overview An orthogonal matrix is the real specialization of a unitary matrix, and thus always a normal matrix. ...
In probability theory and statistics, the chisquare distribution (also chisquared or Ï‡2 distribution) is one of the theoretical probability distributions most widely used in inferential statistics, i. ...
Regression parameters can also be estimated by Bayesian methods. This has the advantages that Bayesian refers to probability and statistics  either methods associated with the Reverend Thomas Bayes (ca. ...
 confidence intervals can be produced for parameter estimates without the use of asymptotic approximations,
 prior information can be incorporated into the analysis.
Suppose that in the linear regression In this diagram, the bars represent observation means and the red lines represent the confidence intervals surrounding them. ...
we know from domain knowledge that alpha can only take one of the values {−1, +1} but we do not know which. We can build this information into the analysis by choosing a prior for alpha which is a discrete distribution with a probability of 0.5 on −1 and 0.5 on +1. The posterior for alpha will also be a discrete distribution on {−1, +1}, but the probability weights will change to reflect the evidence from the data. In modern computer applications, the actual value of β is calculated using the QR decomposition or slightly more fancy methods when X^{T}X is near singular. The code for the MATLAB function is an excellent example of a robust method. In linear algebra, the QR decomposition (also called the QR factorization) of a matrix is a decomposition of the matrix into an orthogonal and a triangular matrix. ...
MATLAB is a numerical computing environment and programming language. ...
Summarizing the data We sum the observations, the squares of the Xs and the products XY to obtain the following quantities. Estimating beta (the slope) We use the summary statistics above to calculate , the estimate of β. Estimating alpha (the intercept) We use the estimate of β and the other statistics to estimate α by: A consequence of this estimate is that the regression line will always pass through the "center" .
Limitations Least squares estimation for linear models is notoriously nonrobust to outliers. If the distribution of the outliers is skewed, the estimates can be biased. In the presence of any outliers, the least squares estimates are inefficient and can be extremely so. When outliers occur in the data, methods of robust regression are more appropriate. In robust statistics, robust regression is a form of regression analysis designed to circumvent the limitations of traditional parametric and nonparametric methods. ...
References  Abdi, H. "[1] (2003). Leastsquares. In M. LewisBeck, A. Bryman, T. Futing (Eds): Encyclopedia for research methods for the social sciences. Thousand Oaks (CA): Sage. pp. 792795.]".
 Stigler, S.M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press, Cambridge MA and London, England.
See also Isotonic regression (IR) involves finding a weighted leastsquares fit to a vector with weights vector subject to a set of monotonicity constraints giving a partial order over the variables. ...
Least mean squares (LMS) algorithms are used in adaptive filters to find the filter coefficients that relate to producing the least mean squares of the error signal (difference between the desired and the actual signal). ...
In parametric statistics, the leastsquares estimator is often used to estimate the coefficients of a linear regression. ...
Linear least squares is a mathematical optimization technique to find an approximate solution for a system of linear equations that has no exact solution. ...
In statistics, linear regression is a regression method that models the relationship between a dependent variable Y, independent variables Xp, and a random term Îµ. The model can be written as where Î²1 is the intercept (constant term), the Î²is are the respective parameters of independent variables, and p is the...
Segmented linear regression to detect relations and breakpoints despite scatter // Mustard and salinity In statistics, regression analysis [1] is done to detect a mathematical relation between several series of measured things (elements) that have variable values, especially when the relation is scattered due to random variation. ...
The measurement uncertainty quantifies the distance between the actually measured value of a physical quantity and the true value of the same physical quantity. ...
Moving least squares is a method of reconstructing continuous functions from a set of unorganized point samples via the calculation of a weighted least squares measure biased towards the region around the point at which the reconstructed value is requested. ...
// Description Recursive least squares algorithm is used in adaptive filters to find the filter coefficients that relate to producing the recursively least squares of the error signal (difference between the desired and the actual signal) Performance This algorithm converges faster than the LMS algorithm. ...
In statistics, regression analysis examines the relation of a dependent variable (response variable) to specified independent variables (explanatory variables). ...
In robust statistics, robust regression is a form of regression analysis designed to circumvent the limitations of traditional parametric and nonparametric methods. ...
In mathematics, the root mean square or rms is a statistical measure of the magnitude of a varying quantity. ...
Errorsinvariables is a robust modeling technique in statistics, which assumes that every variable can have error or noise. ...
ErrorsinVariables is a robust modeling technique in statistics, which assumes that every variable can have error or noise. ...
Weighted least squares is a method of regression, similar to least squares in that it uses the same minimization of the sum of the residuals: However, instead of weighting all points equally, they are weighted such that points with a greater weight contribute more to the fit: Often, wi is...
External links  MIT Linear Algebra Lecture on Least Squares at Google Video, from MIT OpenCourseWare
 http://www.physics.csbsju.edu/stats/least_squares.html
 Zunzun.com  Online curve and surface fitting
 http://www.orbitals.com/self/least/least.htm
 Eric W. Weisstein, Least Squares Fitting Polynomial at MathWorld.
 Module for Least Squares Polynomials
 Least squares on PlanetMath
 Derivation of quadratic least squares
Dr. Eric W. Weisstein Encyclopedist Dr. Eric W. Weisstein (born March 18, 1969, in Bloomington, Indiana) is a noted encyclopedist in several technical areas of science and mathematics. ...
MathWorld is an online mathematics reference work, sponsored by Wolfram Research Inc. ...
PlanetMath is a free, collaborative, online mathematics encyclopedia. ...
