Overfitting, Ridge & Lasso Regression

Create sinusoidal dataset

Let's look at a synthetic dataset consisting of 30 points drawn from the sinusoid $y = \sin(4x)$:

In [3]:
import graphlab
import math
import random
import numpy
from matplotlib import pyplot as plt
%matplotlib inline

Create random values for x in interval [0,1)

In [4]:
random.seed(98103)
n = 30
x = graphlab.SArray([random.random() for i in range(n)]).sort()

Compute y, add random Gaussian noise, and put the result in an SFrame

In [5]:
y = x.apply(lambda x: math.sin(4*x))
random.seed(1)
e = graphlab.SArray([random.gauss(0,1.0/3.0) for i in range(n)])
y = y + e
data = graphlab.SFrame({'X1':x,'Y':y})
data.head(3)
Out[5]:
X1 Y
0.0395789449501 0.587050191026
0.0415680996791 0.648655851372
0.0724319480801 0.307803309485
[3 rows x 2 columns]

Useful Polynomial Regression Functions

In [6]:
# Create features for a polynomial regression model of any degree
def polynomial_features(data, deg):
    data_copy=data.copy()
    for i in range(1,deg):
        data_copy['X'+str(i+1)]=data_copy['X'+str(i)]*data_copy['X1']
    return data_copy
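
For example, expanding the dataset above to degree 3 should simply add the columns X2 and X3 (a quick sketch; the exact column order may differ):

poly3 = polynomial_features(data, 3)
print poly3.column_names()   # expect something like ['X1', 'Y', 'X2', 'X3'], with X2 = X1^2 and X3 = X1^3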
In [7]:
# Plot data and predictions made
def plot_poly_predictions(data, model):
    # Plot actual data
    plt.plot(data['X1'],data['Y'],'k.')
    plt.xlabel('x')
    plt.ylabel('y')

    # Get the degree of the polynomial
    deg = len(model.coefficients['value'])-1
    
    # Create 200 points in the x axis and compute the predicted value for each point
    x_pred = graphlab.SFrame({'X1':[i/200.0 for i in range(200)]})
    y_pred = model.predict(polynomial_features(x_pred,deg))
    
    # plot predictions
    plt.plot(x_pred['X1'], y_pred, 'g-', label='degree ' + str(deg) + ' fit')
    plt.legend(loc='upper left')
    plt.axis([0,1,-1.5,2])
In [8]:
# Prints the polynomial coefficients in a pretty way
def print_coefficients(model):    
    # Get the degree of the polynomial
    deg = len(model.coefficients['value'])-1

    # Get learned parameters as a list
    w = list(model.coefficients['value'])

    # Numpy has a nifty function to print out polynomials in a pretty way
    # (We'll use it, but it needs the parameters in the reverse order)
    print 'Learned polynomial for degree ' + str(deg) + ':'
    w.reverse()
    print numpy.poly1d(w)

Fit a degree-2 polynomial

In [10]:
model = graphlab.linear_regression.create(polynomial_features(data, 2), target='Y', l2_penalty=0.,l1_penalty=0., validation_set=None,verbose=False)
print_coefficients(model)
plot_poly_predictions(data,model)
Learned polynomial for degree 2:
        2
-5.129 x + 4.147 x + 0.07471

Fit a degree-16 polynomial

In [11]:
model = graphlab.linear_regression.create(polynomial_features(data, 16), target='Y', l2_penalty=0.,l1_penalty=0., validation_set=None,verbose=False)
print_coefficients(model)
plot_poly_predictions(data,model)
Learned polynomial for degree 16:
           16             15             14             13
2.583e+06 x  - 1.092e+07 x  + 1.443e+07 x  + 1.873e+06 x 
              12             11             10             9
 - 2.095e+07 x  + 1.295e+07 x  + 9.366e+06 x  - 1.232e+07 x
              8             7             6             5             4
 - 2.544e+06 x + 1.181e+07 x - 9.325e+06 x + 3.887e+06 x - 9.666e+05 x
              3             2
 + 1.441e+05 x - 1.215e+04 x + 506.6 x - 7.325

Look at the difference in magnitude of the coefficients in these models.
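
One quick way to see this numerically (a sketch reusing the model variable, which still holds the degree-16 fit, and the coefficients SFrame that GraphLab Create models expose):

# Largest coefficient magnitude: roughly 2e7 for the degree-16 fit above,
# versus about 5 for the degree-2 fit.
print max(abs(v) for v in model.coefficients['value'])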

Ridge Regression

Ridge regression aims to avoid overfitting by adding to the RSS term of standard least squares a cost that depends on the (squared) 2-norm of the coefficients, $\|w\|_2^2$. This penalizes fits with large coefficients. The strength of the penalty, and thus the balance between fit and model complexity, is controlled by a parameter lambda (here called "L2_penalty").
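
Concretely, ridge regression chooses the coefficients by minimizing

$$\text{RSS}(w) + \lambda \|w\|_2^2 = \sum_{i=1}^{n}\big(y_i - \hat{y}_i(w)\big)^2 + \lambda \sum_{j} w_j^2,$$

so $\lambda = 0$ recovers the unregularized least-squares fits above, while larger values of $\lambda$ shrink the coefficients toward zero.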

Fits with increasing L2 penalty

In [12]:
for l2_penalty in [1e-25, 1e-10, 1e-6, 1e-3, 1e2]:
    model = graphlab.linear_regression.create(polynomial_features(data, 16), target='Y', l2_penalty=l2_penalty, validation_set=None, verbose=False)
    print 'lambda = %.2e' % l2_penalty
    print_coefficients(model)
    print '\n'
    plt.figure()
    plot_poly_predictions(data,model)
    plt.title('Ridge, lambda = %.2e' % l2_penalty)