The construction of a least-squares approximant usually requires that one have in hand a basis for the space from which the data are to be approximated. As the example of the space of “natural” cubic splines illustrates, the explicit construction of a basis is not always straightforward.

This section makes clear that an explicit basis is not actually needed; it is sufficient to have available some means of interpolating in some fashion from the space of approximants. For this, the fact that the Curve Fitting Toolbox™ spline functions support work with vector-valued functions is essential.

This section discusses these aspects of least-squares approximation by “natural” cubic splines.

You want to construct the least-squares approximation to given
data (`x`

,`y`

) from the space *S* of
“natural” cubic splines with given breaks `b(1)`

<
...< `b(l+1)`

.

If you know a basis, (f1,f2,...,fm), for the linear space *S* of
all “natural” cubic splines with break sequence `b`

,
then you have learned to find the least-squares approximation in the
form `c(1)`

f1+ `c(2)`

f2+ ... + `c(m)`

fm,
with the vector `c`

the least-squares solution to
the linear system `A*c = y`

, whose coefficient matrix
is given by

A(i,j) = fj(x(i)), i=1:length(x), j=1:m .

In other words, `c = A\y`

.

The general solution seems to require that you know a basis.
However, in order to construct the coefficient sequence `c`

,
you only need to know the matrix `A`

. For this, it
is sufficient to have at hand a *basis map*, namely a function `F`

say,
so that `F(c)`

returns the spline given by the particular
weighted sum `c(1)`

f1`+c(2)`

f2`+`

... `+c(m)`

fm.
For, with that, you can obtain, for `j=1:m`

, the `j`

-th
column of `A`

as `fnval(F(ej),x)`

,
with `ej`

the `j`

-th column of `eye(m)`

,
the identity matrix of order `m`

.

Better yet, the Curve Fitting
Toolbox spline functions can
handle *vector-valued* functions, so you should
be able to construct the basis map `F`

to handle
vector-valued coefficients `c(i)`

as well. However,
by agreement, in this toolbox, a vector-valued coefficient is a *column* vector,
hence the sequence c is necessarily a row vector of column vectors,
i.e., a *matrix*. With that, `F(eye(m))`

is
the vector-valued spline whose `i`

-th component is
the basis element f`i`

,` i=1:m`

.
Hence, assuming the vector `x`

of data sites to be
a row vector, `fnval(F(eye(m)),x)`

is the matrix
whose `(i,j)`

-entry is the value of f`i`

at `x(j)`

,
i.e., the *transpose* of the matrix `A`

you
are seeking. On the other hand, as just pointed out, your basis map `F`

expects
the coefficient sequence `c`

to be a row vector,
i.e., the transpose of the vector `A\y`

. Hence,
assuming, correspondingly, the vector `y`

of data
values to be a row vector, you can obtain the least-squares approximation
from *S* to data (`x`

,`y`

)
as

F(y/fnval(F(eye(m)),x))

To be sure, if you wanted to be prepared for `x`

and `y`

to
be arbitrary vectors (of the same length), you would use instead

F(y(:).'/fnval(F(eye(m)),x(:).'))

What exactly is required of a basis map `F`

for
the linear space *S* of “natural” cubic
splines with break sequence `b(1) < ... < b(l+1)`

?
Assuming the dimension of this linear space is `m`

,
the map `F`

should set up a linear one-to-one correspondence
between `m`

-vectors and elements of *S*.
But that is exactly what `csape(b, . ,'var')`

does.

To be explicit, consider the following function `F`

:

function s = F(c) s = csape(b,c,'var');

For given vector `c`

(of the same length as
b), it provides the *unique* “natural”
cubic spline with break sequence b that takes the value `c(i)`

at `b(i)`

, `i=1:l+1`

.
The uniqueness is key. It ensures that the correspondence between
the vector `c`

and the resulting spline `F(c)`

is
one-to-one. In particular, `m`

equals `length(b)`

.
More than that, because the value *f*(*t*)
of a function *f* at a point *t* depends
linearly on *f*, this uniqueness ensures that `F(c)`

depends
linearly on `c`

(because `c`

equals `fnval(F(c),b)`

and
the inverse of an invertible linear map is again a linear map).

Putting it all together, you arrive at the following code

csape(b,y(:).'/fnval(csape(b,eye(length(b)),'var'),x(:).'),... 'var')

for the least-squares approximation by “natural”
cubic splines with break sequence `b`

.

Let's try it on some data, the census data, say, which is provided
in MATLAB^{®} by the command

load census

and which supplies the years, `1790:10:1990`

,
as `cdate`

and the values as `pop`

.
Use the break sequence `1810:40:1970`

.

b = 1810:40:1970; s = csape(b, ... pop(:)'/fnval(csape(b,eye(length(b)),'var'),cdate(:)'),'var'); fnplt(s, [1750,2050],2.2); hold on plot(cdate,pop,'or'); hold off

Have a look at Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks which shows, in thick blue, the resulting approximation, along with the given data.

This looks like a good approximation, -- except that it doesn't look like a “natural” cubic spline. A “natural” cubic spline, to recall, must be linear to the left of its first break and to the right of its last break, and this approximation satisfies neither condition. This is due to the following facts.

The “natural” cubic spline interpolant to given
data is provided by `csape`

in ppform, with the interval
spanned by the data sites its basic interval. On the other hand, evaluation
of a ppform outside its basic interval is done, in MATLAB `ppval`

or Curve Fitting
Toolbox spline
function `fnval`

, by using the relevant polynomial
end piece of the ppform, i.e., by full-order extrapolation. In case
of a “natural” cubic spline, you want instead second-order
extrapolation. This means that you want, to the left of the first
break, the straight line that agrees with the cubic spline in value
and slope at the first break. Such an extrapolation is provided by `fnxtr`

. Because the “natural” cubic
spline has zero second derivative at its first break, such an extrapolation
is even third-order, i.e., it satisfies three matching conditions.
In the same way, beyond the last break of the cubic spline, you want
the straight line that agrees with the spline in value and slope at
the last break, and this, too, is supplied by `fnxtr`

.

**Least-Squares Approximation by “Natural” Cubic
Splines With Three Interior Breaks**

The following one-line code provides the correct least-squares
approximation to data (`x`

,`y`

)
by “natural” cubic splines with break sequence `b`

:

fnxtr(csape(b,y(:).'/ ... fnval(fnxtr(csape(b,eye(length(b)),'var')),x(:).'),'var'))

But it is, admittedly, a rather long line.

The following code uses this correct formula and plots, in a thinner, red line, the resulting approximation on top of the earlier plots, as shown in Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks.

ss = fnxtr(csape(b,pop(:)'/ ... fnval(fnxtr(csape(b,eye(length(b)),'var')),cdate(:)'),'var')); hold on, fnplt(ss,[1750,2050],1.2,'r'),grid, hold off legend('incorrect approximation','population', ... 'correct approximation')

The one-line solution works perfectly if you want to approximate
by the space *S* of all cubic splines with the given
break sequence `b`

. You don't even have to use the Curve Fitting
Toolbox spline
functions for this because you can rely on the MATLAB `spline`

.
You know that, with `c`

a sequence containing two
more entries than does `b`

, `spline(b,c)`

provides
the unique cubic spline with break sequence `b`

that
takes the value `c(i+1)`

at `b(i)`

,
all `i`

, and takes the slope `c(1)`

at `b(1)`

,
and the slope `c(end)`

at `b(end)`

.
Hence, `spline(b,.)`

is a basis map for `S`

.

More than that, you know that `spline(b,c,xi)`

provides
the value(s) at `xi`

of this interpolating spline.
Finally, you know that `spline`

can handle vector-valued
data. Therefore, the following one-line code constructs the least-squares
approximation by cubic splines with break sequence `b`

to
data (`x`

,`y`

) :

spline(b,y(:)'/spline(b,eye(length(b)),x(:)'))