Regression splines tutorial
This tutorial describes the spline basis and smoothing techniques which are based on splines.
using Plots
pyplot()
Here is a simple function and noisy data. The object of this tutorial is to estimate the sinuosoidal curve given the noisy observations.
x = linspace(0, 2*pi, 100)[1:end-1]
y = sin(4*x)
yn = y + randn(size(y)) *.2
plot(x,y)
scatter!(x,yn)
To do this we will use cubic splines. The spline basis with knots at
The B-splines are a nice basis for the space of cubic splines, because they are smooth at the knots
This formula, and the corresponding recursive formula for the derivatives of beta splines are implemented by the following pair of functions. B-splines are typically used because they have compact support, which can speed up evaluation. I do not take advantage of this here.
function B(i, k, knots)
e = eps(0.0)
L = knots[end] - knots[1]
# pad the knots vector for periodic case
knots = [knots..., (L+knots[2:2+k]-knots[1])...]
if k == 0
x-> float( knots[i] - e .<= x .< knots[i+1] + e)
else
x->(B(i,k-1, knots)(x) .* (x- knots[i])/(knots[i+k]-knots[i])
+ B(i+1,k-1, knots)(x) .* (knots[i+k+1]-x)/(knots[i+k+1]-knots[i+1]))
end
end
function Bder(i, k, knots, p)
e = eps(0.0)
L = knots[end] - knots[1]
# pad the knots vector for periodic case
knots = [knots..., (L+knots[2:2+k]-knots[1])...]
if p > k
x -> zero(x)
elseif p == 0
x -> B(i,k,knots)(x)
else
x->((Bder(i,k-1, knots,p)(x) .* (x- knots[i]) + Bder(i,k-1, knots,p-1)(x))/(knots[i+k]-knots[i])
+ (Bder(i+1,k-1, knots,p)(x) .* (knots[i+k+1]-x) - Bder(i+1,k-1,knots,p-1)(x))/(knots[i+k+1]-knots[i+1]))
end
end
Bder (generic function with 1 method)
This is what the B-splines for orders 0-3 look like.
knots = collect(0:.1:1.0)
xx = linspace(0,1,200)
plot(xx,[B(4, k, knots)(xx) for k in 0:3])
And these are the first through third derivatives of the 3-rd order B-spline.
plot([Bder(4, 3, knots, p)(xx) for p in 0:2])
Smoothing using splines
Now I return to the original problem, and use a spline basis to smooth the noisy observations.
n = 20
p = 3
knots = collect(-p:n-1)*(2*pi)/n;
sp3basis = hcat([B(i,p,knots)(x) for i in 1:length(knots)]...);
yhat = sp3basis * (sp3basis \ yn);
plot(yhat)
plot!(y)
# scatter!(yn)
While this looks okay, we are not taking advantage of the boundary conditions. For periodic boundaries, we demand that the first and second derivatives be continuous at the boundary.
periodic_constraints = [Bder(i,3,knots, p)(x[1]) - Bder(i,3,knots, p)(x[end]) for p=0:2, i=1:length(knots)]
3×23 Array{Float64,2}:
0.333333 1.33333 0.333333 0.0 … -0.558076 -0.164105 -0.00423442
-3.1831 0.0 3.1831 0.0 -0.0909283 -0.674302 -0.0506727
10.1321 -20.2642 10.1321 0.0 1.42003 -0.40682 -0.202131
This is a projection operator which applies the constraints.
P = eye(size(periodic_constraints,2)) - pinv(periodic_constraints)*periodic_constraints;
Indeed the constraints are satisfied for random vectors of cofficents
periodic_constraints*P * rand(n+3)
3-element Array{Float64,1}:
1.70889e-15
1.08728e-15
-2.96689e-15
Let’s perform a least squares fit
A = sp3basis * P
beta = A\yn
yhat = A* beta
yhat = yhat;
The spline does a decent job at fitting the data.
plot(yhat)
plot!(y)
scatter!(yn)