Regression splines tutorial

This tutorial describes the spline basis and smoothing techniques which are based on splines.

using Plots
pyplot()

Here is a simple function and noisy data. The object of this tutorial is to estimate the sinuosoidal curve given the noisy observations.

x = linspace(0, 2*pi, 100)[1:end-1]
y = sin(4*x)
yn = y + randn(size(y)) *.2


plot(x,y)
scatter!(x,yn)

To do this we will use cubic splines. The spline basis with knots at $ \xi_i $ is given by $ (x-\xi_i)^3_i $ . This can be expressed in terms of so-called B-splines by the formula,

$$ p(x) = \sum_i c_i B_{i,3}(x). $$

The B-splines are a nice basis for the space of cubic splines, because they are smooth at the knots $\xi_i$. They are defined recursively in the following fashion:

$$ B_{i,0}(x):=\left\{\begin{matrix}1&\mathrm {if} \quad t_{i}\leq x < t_{i+1}\\0&\mathrm {otherwise} \end{matrix}\right\} $$
$$ B_{i,k}(x):={\frac {x-t_{i}}{t_{i+k}-t_{i}}}B_{i,k-1}(x)+{\frac {t_{i+k+1}-x}{t_{i+k+1}-t_{i+1}}}B_{i+1,k-1}(x).$$

This formula, and the corresponding recursive formula for the derivatives of beta splines are implemented by the following pair of functions. B-splines are typically used because they have compact support, which can speed up evaluation. I do not take advantage of this here.

function B(i, k, knots)
    e = eps(0.0)
    L = knots[end] - knots[1]
    # pad the knots vector for periodic case
    knots = [knots..., (L+knots[2:2+k]-knots[1])...]
    
    if k == 0
        x-> float( knots[i] - e .<= x .< knots[i+1] + e)
    else
        x->(B(i,k-1, knots)(x) .* (x- knots[i])/(knots[i+k]-knots[i])
        + B(i+1,k-1, knots)(x) .* (knots[i+k+1]-x)/(knots[i+k+1]-knots[i+1]))
    end 
end


function Bder(i, k, knots, p)
    e = eps(0.0)
    L = knots[end] - knots[1]
    # pad the knots vector for periodic case
    knots = [knots..., (L+knots[2:2+k]-knots[1])...]

    
    if p > k
        x -> zero(x)
    elseif p == 0
        x -> B(i,k,knots)(x)
    else
        x->((Bder(i,k-1, knots,p)(x) .* (x- knots[i]) + Bder(i,k-1, knots,p-1)(x))/(knots[i+k]-knots[i])
        + (Bder(i+1,k-1, knots,p)(x) .* (knots[i+k+1]-x) - Bder(i+1,k-1,knots,p-1)(x))/(knots[i+k+1]-knots[i+1]))
    end 
end
Bder (generic function with 1 method)

This is what the B-splines for orders 0-3 look like.

knots = collect(0:.1:1.0)
xx = linspace(0,1,200)
plot(xx,[B(4, k, knots)(xx) for k in 0:3])

And these are the first through third derivatives of the 3-rd order B-spline.

plot([Bder(4, 3, knots, p)(xx) for p in 0:2])

Smoothing using splines

Now I return to the original problem, and use a spline basis to smooth the noisy observations.

n = 20
p = 3
knots = collect(-p:n-1)*(2*pi)/n;


sp3basis = hcat([B(i,p,knots)(x) for i in 1:length(knots)]...);
yhat = sp3basis * (sp3basis \ yn);
plot(yhat)
plot!(y)
# scatter!(yn)

While this looks okay, we are not taking advantage of the boundary conditions. For periodic boundaries, we demand that the first and second derivatives be continuous at the boundary.

$$ \sum c_i B_{i,3}^{(p)}(0) = \sum c_i B_{i,3}^{(p)}(L),\ \text{for p = 0,1,2} $$
periodic_constraints = [Bder(i,3,knots, p)(x[1]) - Bder(i,3,knots, p)(x[end]) for p=0:2, i=1:length(knots)]
3×23 Array{Float64,2}:
  0.333333    1.33333   0.333333  0.0  …  -0.558076   -0.164105  -0.00423442
 -3.1831      0.0       3.1831    0.0     -0.0909283  -0.674302  -0.0506727 
 10.1321    -20.2642   10.1321    0.0      1.42003    -0.40682   -0.202131  

This is a projection operator which applies the constraints.

$y = A (I - B B^{\dagger}) c + \epsilon$

P = eye(size(periodic_constraints,2)) - pinv(periodic_constraints)*periodic_constraints;

Indeed the constraints are satisfied for random vectors of cofficents

periodic_constraints*P * rand(n+3)
3-element Array{Float64,1}:
  1.70889e-15
  1.08728e-15
 -2.96689e-15

Let’s perform a least squares fit

A = sp3basis * P


beta = A\yn

yhat = A* beta
yhat = yhat;

The spline does a decent job at fitting the data.

plot(yhat)
plot!(y)
scatter!(yn)