## Least-Squares Approximation by Natural Cubic Splines

The construction of a least-squares approximant usually requires that one have in hand a basis for the space from which the data are to be approximated. As the example of the space of “natural” cubic splines illustrates, the explicit construction of a basis is not always straightforward.

This section makes clear that an explicit basis is not actually needed; it is sufficient to have available some means of interpolating in some fashion from the space of approximants. For this, the fact that the Curve Fitting Toolbox™ spline functions support work with vector-valued functions is essential.

This section discusses these aspects of least-squares approximation by “natural” cubic splines.

### Problem

You want to construct the least-squares approximation to given data (`x`,`y`) from the space S of “natural” cubic splines with given breaks `b(1)` < ...< `b(l+1)`.

### General Resolution

If you know a basis, (f1,f2,...,fm), for the linear space S of all “natural” cubic splines with break sequence `b`, then you have learned to find the least-squares approximation in the form `c(1)`f1+ `c(2)`f2+ ... + `c(m)`fm, with the vector `c` the least-squares solution to the linear system `A*c = y`, whose coefficient matrix is given by

```A(i,j) = fj(x(i)), i=1:length(x), j=1:m . ```

In other words, `c = A\y`.

### Need for a Basis Map

The general solution seems to require that you know a basis. However, in order to construct the coefficient sequence `c`, you only need to know the matrix `A`. For this, it is sufficient to have at hand a basis map, namely a function `F` say, so that `F(c)` returns the spline given by the particular weighted sum `c(1)`f1`+c(2)`f2`+`... `+c(m)`fm. For, with that, you can obtain, for `j=1:m`, the `j`-th column of `A` as `fnval(F(ej),x)`, with `ej` the `j`-th column of `eye(m)`, the identity matrix of order `m`.

Better yet, the Curve Fitting Toolbox spline functions can handle vector-valued functions, so you should be able to construct the basis map `F` to handle vector-valued coefficients `c(i)` as well. However, by agreement, in this toolbox, a vector-valued coefficient is a column vector, hence the sequence c is necessarily a row vector of column vectors, i.e., a matrix. With that, `F(eye(m))` is the vector-valued spline whose `i`-th component is the basis element f`i`,` i=1:m`. Hence, assuming the vector `x` of data sites to be a row vector, `fnval(F(eye(m)),x)` is the matrix whose `(i,j)`-entry is the value of f`i` at `x(j)`, i.e., the transpose of the matrix `A` you are seeking. On the other hand, as just pointed out, your basis map `F` expects the coefficient sequence `c` to be a row vector, i.e., the transpose of the vector `A\y`. Hence, assuming, correspondingly, the vector `y` of data values to be a row vector, you can obtain the least-squares approximation from S to data (`x`,`y`) as

```F(y/fnval(F(eye(m)),x)) ```

To be sure, if you wanted to be prepared for `x` and `y` to be arbitrary vectors (of the same length), you would use instead

```F(y(:).'/fnval(F(eye(m)),x(:).')) ```

### A Basis Map for “Natural” Cubic Splines

What exactly is required of a basis map `F` for the linear space S of “natural” cubic splines with break sequence `b(1) < ... < b(l+1)`? Assuming the dimension of this linear space is `m`, the map `F` should set up a linear one-to-one correspondence between `m`-vectors and elements of S. But that is exactly what `csape(b, . ,'var')` does.

To be explicit, consider the following function `F`:

```function s = F(c) s = csape(b,c,'var'); ```

For given vector `c` (of the same length as b), it provides the unique “natural” cubic spline with break sequence b that takes the value `c(i)` at `b(i)`, `i=1:l+1`. The uniqueness is key. It ensures that the correspondence between the vector `c` and the resulting spline `F(c)` is one-to-one. In particular, `m` equals `length(b)`. More than that, because the value f(t) of a function f at a point t depends linearly on f, this uniqueness ensures that `F(c)` depends linearly on `c` (because `c` equals `fnval(F(c),b)` and the inverse of an invertible linear map is again a linear map).

### The One-line Solution

Putting it all together, you arrive at the following code

```csape(b,y(:).'/fnval(csape(b,eye(length(b)),'var'),x(:).'),... 'var') ```

for the least-squares approximation by “natural” cubic splines with break sequence `b`.

### The Need for Proper Extrapolation

Let's try it on some data, the census data, say, which is provided in MATLAB® by the command

```load census ```

and which supplies the years, `1790:10:1990`, as `cdate` and the values as `pop`. Use the break sequence `1810:40:1970`.

```b = 1810:40:1970; s = csape(b, ... pop(:)'/fnval(csape(b,eye(length(b)),'var'),cdate(:)'),'var'); fnplt(s, [1750,2050],2.2); hold on plot(cdate,pop,'or'); hold off ```

Have a look at Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks which shows, in thick blue, the resulting approximation, along with the given data.

This looks like a good approximation, -- except that it doesn't look like a “natural” cubic spline. A “natural” cubic spline, to recall, must be linear to the left of its first break and to the right of its last break, and this approximation satisfies neither condition. This is due to the following facts.

The “natural” cubic spline interpolant to given data is provided by `csape` in ppform, with the interval spanned by the data sites its basic interval. On the other hand, evaluation of a ppform outside its basic interval is done, in MATLAB `ppval` or Curve Fitting Toolbox spline function `fnval`, by using the relevant polynomial end piece of the ppform, i.e., by full-order extrapolation. In case of a “natural” cubic spline, you want instead second-order extrapolation. This means that you want, to the left of the first break, the straight line that agrees with the cubic spline in value and slope at the first break. Such an extrapolation is provided by `fnxtr`. Because the “natural” cubic spline has zero second derivative at its first break, such an extrapolation is even third-order, i.e., it satisfies three matching conditions. In the same way, beyond the last break of the cubic spline, you want the straight line that agrees with the spline in value and slope at the last break, and this, too, is supplied by `fnxtr`.

Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks ### The Correct One-Line Solution

The following one-line code provides the correct least-squares approximation to data (`x`,`y`) by “natural” cubic splines with break sequence `b`:

```fnxtr(csape(b,y(:).'/ ... fnval(fnxtr(csape(b,eye(length(b)),'var')),x(:).'),'var')) ```

But it is, admittedly, a rather long line.

The following code uses this correct formula and plots, in a thinner, red line, the resulting approximation on top of the earlier plots, as shown in Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks.

``` ss = fnxtr(csape(b,pop(:)'/ ... fnval(fnxtr(csape(b,eye(length(b)),'var')),cdate(:)'),'var')); hold on, fnplt(ss,[1750,2050],1.2,'r'),grid, hold off legend('incorrect approximation','population', ... 'correct approximation') ```

### Least-Squares Approximation by Cubic Splines

The one-line solution works perfectly if you want to approximate by the space S of all cubic splines with the given break sequence `b`. You don't even have to use the Curve Fitting Toolbox spline functions for this because you can rely on the MATLAB `spline`. You know that, with `c` a sequence containing two more entries than does `b`, `spline(b,c)` provides the unique cubic spline with break sequence `b` that takes the value `c(i+1)` at `b(i)`, all `i`, and takes the slope `c(1)` at `b(1)`, and the slope `c(end)` at `b(end)`. Hence, `spline(b,.)` is a basis map for `S`.

More than that, you know that `spline(b,c,xi)` provides the value(s) at `xi` of this interpolating spline. Finally, you know that `spline` can handle vector-valued data. Therefore, the following one-line code constructs the least-squares approximation by cubic splines with break sequence `b` to data (`x`,`y`) :

```spline(b,y(:)'/spline(b,eye(length(b)),x(:)')) ```