# Least Squares Regression for a Straight Line

The method of least squares is used to find the best-fitting straight line (or other curve) to a set of data points by minimizing the sum of the squares of the vertical offsets (residuals) of the points from the line.

## Equation of a Straight Line

The equation of a straight line is given by:

$y = a_0 + a_1 x$

- $a_0$: y-intercept
- $a_1$: slope of the line

## Normal Equations

To find $a_0$ and $a_1$, we solve the following system of normal equations:

$\sum y_i = n a_0 + a_1 \sum x_i$

$\sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2$

where $n$ is the number of data points.

## Formulas for $a_0$ and $a_1$

The coefficients $a_0$ and $a_1$ can also be found directly using these formulas:

$a_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$

$a_0 = \bar{y} - a_1 \bar{x}$

where $\bar{x} = \frac{\sum x_i}{n}$ and $\bar{y} = \frac{\sum y_i}{n}$ are the means of $x$ and $y$ respectively.

## Example Calculation

Given data points:

| $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$ |
|-------|-------|---------|-----------|
| 0     | 1.0   | 0       | 0.0       |
| 1     | 1.8   | 1       | 1.8       |
| 2     | 3.3   | 4       | 6.6       |
| 3     | 4.5   | 9       | 13.5      |
| 4     | 6.3   | 16      | 25.2      |

### Calculate Summations

- $\sum x_i = 0+1+2+3+4 = 10$
- $\sum y_i = 1.0+1.8+3.3+4.5+6.3 = 16.9$
- $\sum x_i^2 = 0+1+4+9+16 = 30$
- $\sum x_i y_i = 0.0+1.8+6.6+13.5+25.2 = 47.1$
- $n = 5$

### Calculate $a_1$ (Slope)

$a_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$

$a_1 = \frac{5(47.1) - (10)(16.9)}{5(30) - (10)^2}$

$a_1 = \frac{235.5 - 169}{150 - 100}$

$a_1 = \frac{66.5}{50} = 1.33$

### Calculate $a_0$ (Y-intercept)

First, find the means:

$\bar{x} = \frac{10}{5} = 2$

$\bar{y} = \frac{16.9}{5} = 3.38$

Then substitute into the formula for $a_0$:

$a_0 = \bar{y} - a_1 \bar{x}$

$a_0 = 3.38 - (1.33)(2)$

$a_0 = 3.38 - 2.66 = 0.72$

## Resulting Straight Line Equation

The best-fitting straight line for the given data is:

$y = 0.72 + 1.33x$
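The hand calculation can be cross-checked with a few lines of code. The following is a minimal Python sketch of the closed-form formulas for $a_1$ and $a_0$, applied to the five data points from the example. The function name `fit_line` and its error handling are illustrative assumptions and do not come from the original text.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1 * x.

    Hypothetical helper written for illustration; it applies the
    closed-form formulas for the slope and intercept directly.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    # The four summations used by the normal equations.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Denominator of the slope formula; it is zero when there are
    # fewer than two points or when all x values are identical.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("slope is undefined for this data")

    a1 = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a0 = sum_y / n - a1 * sum_x / n            # intercept: y_bar - a1 * x_bar
    return a0, a1


# Data points from the worked example.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.8, 3.3, 4.5, 6.3]

a0, a1 = fit_line(xs, ys)
print(f"y = {a0:.2f} + {a1:.2f}x")  # prints: y = 0.72 + 1.33x
```

Because the coefficients are computed in floating point, they may differ from the hand-computed values in the last few decimal places, which is why the output is rounded to two decimals.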