Linear Algebra
7.11

Fitting Lines, Parabolas, and Planes

Least squares shines when fitting curves to data. Given $m$ data points $(x_1, y_1), \ldots, (x_m, y_m)$, we want a polynomial $p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$ that fits them best. Writing out $p(x_i) \approx y_i$ for all $i$ gives the overdetermined system $A\mathbf{c} \approx \mathbf{y}$.

The matrix $A$ is the Vandermonde matrix: row $i$ is $[1, x_i, x_i^2, \ldots, x_i^n]$. Solving the normal equations $A^T A \hat{\mathbf{c}} = A^T \mathbf{y}$ gives the polynomial with the minimum total squared error.
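As a minimal sketch of the recipe above, the following NumPy snippet builds the Vandermonde matrix and solves the normal equations directly. The function name `polyfit_normal_equations` and the sample data are made up for illustration, and the code assumes $A$ has full column rank (at least $n+1$ distinct $x_i$).

```python
import numpy as np

def polyfit_normal_equations(x, y, n):
    """Degree-n least squares fit via the normal equations A^T A c = A^T y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Row i of A is [1, x_i, x_i^2, ..., x_i^n].
    A = np.vander(x, N=n + 1, increasing=True)
    # Solve the (n+1) x (n+1) system; assumes A has full column rank.
    return np.linalg.solve(A.T @ A, A.T @ y)

# Illustrative data lying near the parabola y = 1 + 2x - 0.5x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.4, 3.0, 2.4, 1.1])

c_hat = polyfit_normal_equations(x, y, n=2)  # [c_0, c_1, c_2]
```

The returned coefficients are ordered from the constant term upward, matching the column order of $A$. For high degrees, forming $A^T A$ can be numerically delicate, so a routine such as `np.linalg.lstsq` applied to $A$ directly is often preferred in practice.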

For a line ($n=1$): $A$ has columns $[1,\ldots,1]^T$ and $[x_1,\ldots,x_m]^T$. Solving the normal equations gives the intercept $c_0$ and slope $c_1$ of the least squares line.
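For the line case the normal equations are only a 2×2 system, so they can be solved in closed form. The sketch below does this in plain Python; the function name `fit_line` and the sample points are illustrative assumptions, not part of the lesson.

```python
def fit_line(xs, ys):
    """Least squares line y = c0 + c1*x from the 2x2 normal equations.

    [ m      sum(x)   ] [c0]   [ sum(y)   ]
    [ sum(x) sum(x^2) ] [c1] = [ sum(x*y) ]
    """
    m = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    if det == 0:
        # All x values coincide, so the slope is not determined.
        raise ValueError("need at least two distinct x values")
    intercept = (sy * sxx - sx * sxy) / det
    slope = (m * sxy - sx * sy) / det
    return intercept, slope

# Points lying exactly on y = 1 + 2x
intercept, slope = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# intercept == 1.0, slope == 2.0
```

When the points do not lie on a line, the same formulas return the best-fit values; for example, the points $(0,0)$, $(1,1)$, $(2,1)$ give intercept $1/6$ and slope $1/2$.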

For a plane fitted to 3D data, there are two input variables, so the rows of $A$ become $[1, x_i, z_i]$ and we solve for the coefficients of $y = c_0 + c_1 x + c_2 z$. The same normal equations apply, and the geometry generalizes directly to higher dimensions.
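The plane fit can be sketched the same way: only the columns of $A$ change. The helper name `fit_plane` and the data below are hypothetical; the data are generated exactly from a known plane so the recovered coefficients are easy to check.

```python
import numpy as np

def fit_plane(x, z, y):
    """Least squares plane y = c0 + c1*x + c2*z via the normal equations."""
    x, z, y = (np.asarray(v, dtype=float) for v in (x, z, y))
    # Row i of A is [1, x_i, z_i].
    A = np.column_stack([np.ones_like(x), x, z])
    # Assumes the (x_i, z_i) inputs are not all on one line, so A^T A is invertible.
    return np.linalg.solve(A.T @ A, A.T @ y)

# Illustrative data generated exactly from y = 2 + 3x - z
x = np.array([0.0, 1.0, 0.0, 1.0, 2.0])
z = np.array([0.0, 0.0, 1.0, 1.0, 3.0])
y = 2.0 + 3.0 * x - 1.0 * z

c0, c1, c2 = fit_plane(x, z, y)  # approximately 2, 3, -1
```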

Formal View

Definition 7.11 — Polynomial Least Squares
Given data $(x_1, y_1), \ldots, (x_m, y_m)$, the degree-$n$ polynomial least squares fit minimizes
$$\sum_{i=1}^m \left(y_i - \sum_{j=0}^n c_j x_i^j\right)^2.$$
This equals $\|\mathbf{y} - A\mathbf{c}\|^2$, where $A_{ij} = x_i^{j-1}$ for $j = 1, \ldots, n+1$ is the Vandermonde matrix (column $j$ multiplies the coefficient $c_{j-1}$). The minimizer $\hat{\mathbf{c}}$ solves $A^T A \hat{\mathbf{c}} = A^T \mathbf{y}$.
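For readers who want the missing step, here is a short sketch of why minimizing the squared error leads to the normal equations. The symbol $E$ for the error function is introduced here only for this derivation.

```latex
\begin{aligned}
E(\mathbf{c}) &= \|\mathbf{y} - A\mathbf{c}\|^2
  = \mathbf{y}^T\mathbf{y} - 2\,\mathbf{c}^T A^T \mathbf{y} + \mathbf{c}^T A^T A\,\mathbf{c}, \\
\nabla E(\mathbf{c}) &= -2 A^T \mathbf{y} + 2 A^T A\,\mathbf{c}, \\
\nabla E(\hat{\mathbf{c}}) = \mathbf{0}
  &\iff A^T A\,\hat{\mathbf{c}} = A^T \mathbf{y}.
\end{aligned}
```

Because $A^T A$ is positive semidefinite, $E$ is convex, so any solution of this system is a global minimizer. It is unique when $A$ has full column rank.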
Remark 7.11 — Underfitting vs. Overfitting
Increasing the polynomial degree $n$ never increases the training error $\|\mathbf{y} - A\hat{\mathbf{c}}\|$, because every lower-degree polynomial is a special case of a higher-degree one. At $n = m-1$, with distinct $x_i$, the fit interpolates the data exactly. But high-degree polynomials overfit: they fit noise in the data and predict poorly on new points. Choosing $n$ by cross-validation balances bias and variance.
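The behaviour of the training error can be checked numerically. The sketch below fits degrees $0$ through $m-1$ to a small made-up data set and records the residual norm for each; the helper name `training_error` is an assumption, and `np.linalg.lstsq` is used in place of explicitly forming $A^T A$ (it minimizes the same quantity but is better behaved numerically).

```python
import numpy as np

def training_error(x, y, n):
    """Residual norm ||y - A c_hat|| of the degree-n least squares fit."""
    A = np.vander(x, N=n + 1, increasing=True)
    c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(y - A @ c_hat)

# Illustrative data: m = 6 points with distinct x values
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
y = np.array([0.9, 0.1, 0.2, 0.4, 1.2, 2.0])

# Training error for degrees n = 0, 1, ..., m - 1
errors = [training_error(x, y, n) for n in range(len(x))]

for n, err in enumerate(errors):
    print(f"degree {n}: training error {err:.3e}")
```

The printed errors should shrink (or stay equal) as the degree grows and drop to roundoff level at degree $m-1 = 5$. A small training error says nothing about error on new points, which is what cross-validation estimates.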

Why This Matters

Polynomial curve fitting is one of the simplest forms of supervised machine learning and underlies applications ranging from trend analysis to scientific modeling.

  • Data science: fitting trend lines and polynomial models in regression
  • Physics: fitting calibration curves for sensors and instruments
  • Finance: polynomial trend fitting for time series analysis
  • Biology: fitting dose-response curves in pharmacology

Quiz

Question 1

When fitting a degree-1 polynomial (line) to $m$ points, the matrix $A$ has how many columns?

Question 2

Fitting a degree-$(m-1)$ polynomial to $m$ distinct points always gives zero residual.

Question 3

The Vandermonde matrix for fitting a degree-2 polynomial to 5 data points has shape:

Common Mistakes

  • Using too high a polynomial degree — the fit looks perfect in-sample but generalizes poorly (overfitting).
  • Forgetting the constant column of ones in the Vandermonde matrix — omitting it forces the fitted polynomial through the origin.
  • Confusing interpolation (exact fit, $n = m-1$) with regression (best fit, small $n$); they solve different problems.
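The second mistake is easy to see numerically. In this sketch, with made-up data lying roughly on $y = 3 + 2x$, a line is fitted once with the column of ones and once without it; the variable names are illustrative.

```python
import numpy as np

# Illustrative data, roughly y = 3 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 6.9, 9.2, 10.8])

# Correct design matrix: a column of ones plus the x column.
A_full = np.column_stack([np.ones_like(x), x])
# Mistaken design matrix: the ones column is missing.
A_no_const = x.reshape(-1, 1)

c_full, *_ = np.linalg.lstsq(A_full, y, rcond=None)    # [intercept, slope]
c_no, *_ = np.linalg.lstsq(A_no_const, y, rcond=None)  # [slope] only

err_full = np.linalg.norm(y - A_full @ c_full)
err_no = np.linalg.norm(y - A_no_const @ c_no)

print("with ones column:   ", c_full, "error", err_full)
print("without ones column:", c_no, "error", err_no)
```

Without the ones column the model is $y = c_1 x$, which predicts $0$ at $x = 0$ whatever the data say, so the slope is distorted to compensate and the residual is larger.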