.. _sphx_glr_packages_scikit-learn_auto_examples_plot_bias_variance.py:

====================================
Bias and variance of polynomial fit
====================================

Demonstrate overfitting, underfitting, and validation and learning curves
with polynomial regression. Fit polynomials of different degrees to a
dataset: for too small a degree, the model *underfits*; for too large a
degree, it *overfits*.

.. code-block:: python

    import numpy as np
    import matplotlib.pyplot as plt

    def generating_func(x, err=0.5):
        # ground-truth signal with Gaussian noise of scale `err`
        return np.random.normal(10 - 1. / (x + 0.1), err)

A polynomial regression

.. code-block:: python

    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

A simple figure to illustrate the problem

.. code-block:: python

    n_samples = 8

    np.random.seed(0)
    x = 10 ** np.linspace(-2, 0, n_samples)
    y = generating_func(x)

    x_test = np.linspace(-0.2, 1.2, 1000)

    titles = ['d = 1 (under-fit; high bias)',
              'd = 2',
              'd = 6 (over-fit; high variance)']
    degrees = [1, 2, 6]

    fig = plt.figure(figsize=(9, 3.5))
    fig.subplots_adjust(left=0.06, right=0.98, bottom=0.15, top=0.85,
                        wspace=0.05)

    for i, d in enumerate(degrees):
        ax = fig.add_subplot(131 + i, xticks=[], yticks=[])
        ax.scatter(x, y, marker='x', c='k', s=50)

        model = make_pipeline(PolynomialFeatures(d), LinearRegression())
        model.fit(x[:, np.newaxis], y)
        ax.plot(x_test, model.predict(x_test[:, np.newaxis]), '-b')

        ax.set_xlim(-0.2, 1.2)
        ax.set_ylim(0, 12)
        ax.set_xlabel('house size')
        if i == 0:
            ax.set_ylabel('price')

        ax.set_title(titles[i])

.. image:: /packages/scikit-learn/auto_examples/images/sphx_glr_plot_bias_variance_001.png
    :class: sphx-glr-single-img

Generate a larger dataset

.. code-block:: python

    from sklearn.model_selection import train_test_split

    n_samples = 200
    test_size = 0.4
    error = 1.0

    # randomly sample the data
    np.random.seed(1)
    x = np.random.random(n_samples)
    y = generating_func(x, error)

    # split into training and testing sets
    x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                        test_size=test_size)

    # show the training and testing sets
    plt.figure(figsize=(6, 4))
    plt.scatter(x_train, y_train, color='red', label='Training set')
    plt.scatter(x_test, y_test, color='blue', label='Test set')
    plt.title('The data')
    plt.legend(loc='best')

.. image:: /packages/scikit-learn/auto_examples/images/sphx_glr_plot_bias_variance_002.png
    :class: sphx-glr-single-img

Plot a validation curve

.. code-block:: python

    from sklearn.model_selection import validation_curve

    degrees = np.arange(1, 21)

    model = make_pipeline(PolynomialFeatures(), LinearRegression())

    # The parameter to vary is the "degree" of the pipeline step
    # "polynomialfeatures"
    train_scores, validation_scores = validation_curve(
        model, x[:, np.newaxis], y,
        param_name='polynomialfeatures__degree',
        param_range=degrees)

    # Plot the mean train and validation scores across folds
    plt.figure(figsize=(6, 4))
    plt.plot(degrees, validation_scores.mean(axis=1), lw=2,
             label='cross-validation')
    plt.plot(degrees, train_scores.mean(axis=1), lw=2, label='training')
    plt.legend(loc='best')
    plt.xlabel('degree of fit')
    plt.ylabel('explained variance')
    plt.title('Validation curve')
    plt.tight_layout()

.. image:: /packages/scikit-learn/auto_examples/images/sphx_glr_plot_bias_variance_003.png
    :class: sphx-glr-single-img
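The validation curve shows where the mean cross-validated score peaks as the
degree varies. As a minimal sketch (not part of the original example), the
same selection can be automated with scikit-learn's ``GridSearchCV``, reusing
the ``polynomialfeatures__degree`` parameter name and the degree range from
the code above:

.. code-block:: python

    from sklearn.model_selection import GridSearchCV

    # Sketch: search the same degree range as the validation curve above,
    # fitting only on the training set so the test set stays held out.
    search = GridSearchCV(
        make_pipeline(PolynomialFeatures(), LinearRegression()),
        param_grid={'polynomialfeatures__degree': np.arange(1, 21)})
    search.fit(x_train[:, np.newaxis], y_train)

    # degree with the best mean cross-validated score
    print(search.best_params_)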
Learning curves
###########################################################

Plot the training and cross-validation scores for an increasing number of
training samples

.. code-block:: python

    from sklearn.model_selection import learning_curve

    # A learning curve for d=1, 5, 15
    for d in [1, 5, 15]:
        model = make_pipeline(PolynomialFeatures(degree=d),
                              LinearRegression())

        train_sizes, train_scores, validation_scores = learning_curve(
            model, x[:, np.newaxis], y,
            train_sizes=np.logspace(-1, 0, 20))

        # Plot the mean train and validation scores across folds
        plt.figure(figsize=(6, 4))
        plt.plot(train_sizes, validation_scores.mean(axis=1),
                 lw=2, label='cross-validation')
        plt.plot(train_sizes, train_scores.mean(axis=1),
                 lw=2, label='training')
        plt.ylim(-0.1, 1)
        plt.legend(loc='best')
        plt.xlabel('number of train samples')
        plt.ylabel('explained variance')
        plt.title('Learning curve (degree=%i)' % d)
        plt.tight_layout()

    plt.show()

.. rst-class:: sphx-glr-horizontal

    *

      .. image:: /packages/scikit-learn/auto_examples/images/sphx_glr_plot_bias_variance_004.png
            :class: sphx-glr-multi-img

    *

      .. image:: /packages/scikit-learn/auto_examples/images/sphx_glr_plot_bias_variance_005.png
            :class: sphx-glr-multi-img

    *

      .. image:: /packages/scikit-learn/auto_examples/images/sphx_glr_plot_bias_variance_006.png
            :class: sphx-glr-multi-img
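As a closing sketch (not part of the original script, and the degree chosen
here is purely illustrative), a model at one of the degrees explored above can
be refit on the training set and scored on the held-out test set with
``explained_variance_score`` from ``sklearn.metrics``, which corresponds to
the "explained variance" label on the plots:

.. code-block:: python

    from sklearn.metrics import explained_variance_score

    # Sketch: refit at an intermediate degree (d=5, one of the degrees used
    # in the learning curves above) and evaluate on the held-out test set.
    best_model = make_pipeline(PolynomialFeatures(degree=5),
                               LinearRegression())
    best_model.fit(x_train[:, np.newaxis], y_train)

    y_pred = best_model.predict(x_test[:, np.newaxis])
    print('test explained variance: %.3f'
          % explained_variance_score(y_test, y_pred))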