7.11 · 10 min read

Fitting Lines, Parabolas, and Planes

Least squares shines when fitting curves to data. Given $m$ data points $(x_1, y_1), \ldots, (x_m, y_m)$, we want a polynomial $p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$ that fits them best. Writing out $p(x_i) \approx y_i$ for all $i$ gives a system $A\mathbf{c} \approx \mathbf{y}$ with $m$ equations and $n+1$ unknowns, which is overdetermined when $m > n+1$.

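The matrix $A$ is the Vandermonde matrix: row $i$ is $[1, x_i, x_i^2, \ldots, x_i^n]$. Solving the normal equations $A^T A \hat{\mathbf{c}} = A^T \mathbf{y}$ gives the coefficients of the polynomial with the minimum total squared error. The solution is unique when the columns of $A$ are linearly independent, which holds when at least $n+1$ of the $x_i$ are distinct.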
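As a minimal sketch of this recipe, the snippet below builds the Vandermonde matrix with NumPy and solves the normal equations directly. The function name `polyfit_normal_equations` and the sample data are illustrative assumptions, not part of the text above.

```python
import numpy as np

def polyfit_normal_equations(x, y, degree):
    """Fit a polynomial of the given degree by solving A^T A c = A^T y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Row i of A is [1, x_i, x_i^2, ..., x_i^degree].
    A = np.vander(x, degree + 1, increasing=True)
    # Solve the (degree+1) x (degree+1) normal equations.
    return np.linalg.solve(A.T @ A, A.T @ y)

# Illustrative data lying exactly on y = 1 + 2x + 3x^2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1 + 2 * xi + 3 * xi**2 for xi in x]

coeffs = polyfit_normal_equations(x, y, degree=2)
print(coeffs)  # coefficients c_0, c_1, c_2, approximately [1. 2. 3.]
```

Forming $A^T A$ squares the condition number of $A$, so for higher degrees or poorly scaled $x$ values a routine that works on $A$ directly, such as `np.linalg.lstsq`, is usually the safer choice.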

For a line ($n=1$), $A$ has two columns: $[1,\ldots,1]^T$ and $[x_1,\ldots,x_m]^T$. The normal equations then form a $2 \times 2$ system whose solution gives the intercept $c_0$ and slope $c_1$ of the least squares line.
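Written out for the line case, the normal equations take the following form (all sums run over $i = 1, \ldots, m$). The three data points in the worked example are chosen only for illustration.

$$
\begin{bmatrix} m & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix},
\qquad
c_1 = \frac{m \sum x_i y_i - \sum x_i \sum y_i}{m \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
c_0 = \bar{y} - c_1 \bar{x}.
$$

For the points $(0, 1)$, $(1, 3)$, $(2, 4)$ we have $m = 3$, $\sum x_i = 3$, $\sum x_i^2 = 5$, $\sum y_i = 8$, and $\sum x_i y_i = 11$, so $c_1 = (33 - 24)/(15 - 9) = 3/2$ and $c_0 = 8/3 - 3/2 = 7/6$. The least squares line is $y = 7/6 + (3/2)x$.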

To fit a plane to 3D data, there are two input variables, so the rows of $A$ become $[1, x_i, z_i]$ and we solve for the coefficients of $y = c_0 + c_1 x + c_2 z$. The same normal equations apply: the geometry carries over unchanged to any number of input variables.
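The plane fit can be sketched the same way. The data below are made up for illustration and generated from a known plane without noise, so the recovered coefficients can be checked by eye.

```python
import numpy as np

# Illustrative 3D data generated from y = 2 + 0.5x - 1.5z (no noise).
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
z = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 2.0])
y = 2.0 + 0.5 * x - 1.5 * z

# Rows of A are [1, x_i, z_i]: a column of ones plus one column per input.
A = np.column_stack([np.ones_like(x), x, z])

# Same normal equations as in the polynomial case.
c_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(c_hat)  # c_0, c_1, c_2, approximately [2.0, 0.5, -1.5]
```

Only the columns of $A$ changed; the solve step is identical to the one used for a polynomial.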

Formal View

Definition 7.11 — Polynomial Least Squares

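Given data $(x_1, y_1), \ldots, (x_m, y_m)$, the degree-$n$ polynomial least squares fit minimizes $\sum_{i=1}^m \left(y_i - \sum_{j=0}^n c_j x_i^j\right)^2$. This equals $\|\mathbf{y} - A\mathbf{c}\|^2$, where $A_{ij} = x_i^{j-1}$ for $j = 1, \ldots, n+1$ is the $m \times (n+1)$ Vandermonde matrix (column $j$ multiplies the coefficient $c_{j-1}$). The minimizer $\hat{\mathbf{c}}$ solves the normal equations $A^T A \hat{\mathbf{c}} = A^T \mathbf{y}$.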

Remark 7.11 — Underfitting vs. Overfitting

Increasing the polynomial degree $n$ never increases the training error $\|\mathbf{y} - A\hat{\mathbf{c}}\|$, and usually decreases it. At $n = m-1$, with distinct $x_i$, the polynomial interpolates the data exactly. But high-degree polynomials overfit: they fit noise in the data and predict poorly on new points. Choosing $n$ by cross-validation balances bias and variance.
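The behaviour of the training error can be checked numerically. The sketch below fits degrees $0$ through $m-1$ to a small synthetic data set (a noisy line; the noise level and random seed are arbitrary assumptions) and records the residual norm for each degree. It uses `np.linalg.lstsq` rather than the normal equations, because the Vandermonde matrix becomes ill-conditioned as the degree grows.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8
x = np.linspace(-1.0, 1.0, m)
# Synthetic data: a line plus small Gaussian noise (illustrative only).
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(m)

def training_error(x, y, degree):
    """Return ||y - A c_hat|| for the least squares fit of the given degree."""
    A = np.vander(x, degree + 1, increasing=True)
    c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(y - A @ c_hat)

errors = [training_error(x, y, n) for n in range(m)]
for n, err in enumerate(errors):
    print(f"degree {n}: training error {err:.2e}")
```

The printed errors should be non-increasing, dropping to roundoff level at degree $m-1 = 7$. A small training error alone says nothing about overfitting; measuring the error on held-out points is what reveals it.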

Why This Matters

Polynomial curve fitting is one of the simplest forms of supervised machine learning and underpins applications from trend analysis to scientific modeling.

Data science: fitting trend lines and polynomial models in regression
Physics: fitting calibration curves for sensors and instruments
Finance: polynomial trend fitting for time series analysis
Biology: fitting dose-response curves in pharmacology

Learning Resources

Least Squares Fitting (Polynomial Regression)

StatQuest with Josh Starmer

Intuitive explanation of polynomial regression and the bias-variance tradeoff.

20 min

Linear Regression and Least Squares

MIT OpenCourseWare

Covers fitting lines and planes using the normal equations.

30 min

Quiz

Question 1

When fitting a degree-1 polynomial (line) to $m$ points, the matrix $A$ has how many columns?

Question 2

Fitting a degree-$(m-1)$ polynomial to $m$ points with distinct $x$-values always gives zero residual.

Question 3

The Vandermonde matrix for fitting a degree-2 polynomial to 5 data points has shape:

Common Mistakes

Using too high a polynomial degree: the fit looks perfect in-sample but generalizes poorly (overfitting).
Forgetting the constant column of ones in the Vandermonde matrix: omitting it forces the fitted polynomial through the origin.
Confusing interpolation (exact fit, $n = m-1$) with regression (best fit, small $n$): they solve different problems.
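The effect of the missing column of ones is easy to see on a small example. The sketch below uses made-up data on the line $y = 3 + 2x$ and compares the fit with and without the constant column.

```python
import numpy as np

# Illustrative data lying exactly on y = 3 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 + 2.0 * x

A_full = np.column_stack([np.ones_like(x), x])  # columns [1, x]
A_no_ones = x.reshape(-1, 1)                    # column [x] only

c_full, *_ = np.linalg.lstsq(A_full, y, rcond=None)
c_no_ones, *_ = np.linalg.lstsq(A_no_ones, y, rcond=None)

print(c_full)     # intercept and slope, approximately [3. 2.]
print(c_no_ones)  # slope only; the fitted line is forced through the origin
```

Without the ones column the only available model is $y = c_1 x$, so the slope is distorted to compensate for the missing intercept: here it becomes $\sum x_i y_i / \sum x_i^2 = 46/14 \approx 3.29$ instead of $2$.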