{"id":3506,"date":"2020-05-28T19:00:22","date_gmt":"2020-05-28T19:00:22","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/28\/how-to-use-polynomial-feature-transforms-for-machine-learning\/"},"modified":"2020-05-28T19:00:22","modified_gmt":"2020-05-28T19:00:22","slug":"how-to-use-polynomial-feature-transforms-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/28\/how-to-use-polynomial-feature-transforms-for-machine-learning\/","title":{"rendered":"How to Use Polynomial Feature Transforms for Machine Learning"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Often, the input features for a predictive modeling task interact in unexpected and often nonlinear ways.<\/p>\n<p>These interactions can be identified and modeled by a learning algorithm. Another approach is to engineer new features that expose these interactions and see if they improve model performance. Additionally, transforms like raising input variables to a power can help to better expose the important relationships between input variables and the target variable.<\/p>\n<p>These features are called interaction and polynomial features and allow the use of simpler modeling algorithms as some of the complexity of interpreting the input variables and their relationships is pushed back to the data preparation stage. 
Sometimes these features can result in improved modeling performance, although at the cost of adding thousands or even millions of additional input variables.<\/p>\n<p>In this tutorial, you will discover how to use polynomial feature transforms for feature engineering with numerical input variables.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Some machine learning algorithms prefer or perform better with polynomial input features.<\/li>\n<li>How to use the polynomial features transform to create new versions of input variables for predictive modeling.<\/li>\n<li>How the degree of the polynomial impacts the number of input features created by the transform.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10373\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10373\" class=\"size-full wp-image-10373\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Polynomial-Features-Transforms-for-Machine-Learning.jpg\" alt=\"How to Use Polynomial Features Transforms for Machine Learning\" width=\"800\" height=\"532\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Polynomial-Features-Transforms-for-Machine-Learning.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Polynomial-Features-Transforms-for-Machine-Learning-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Polynomial-Features-Transforms-for-Machine-Learning-768x511.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-10373\" class=\"wp-caption-text\">How to Use Polynomial Feature Transforms for Machine Learning<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/dcoetzee\/3573755676\/\">D Coetzee<\/a>, some rights 
reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>Polynomial Features<\/li>\n<li>Polynomial Feature Transform<\/li>\n<li>Sonar Dataset<\/li>\n<li>Polynomial Feature Transform Example<\/li>\n<li>Effect of Polynomial Degree<\/li>\n<\/ol>\n<h2>Polynomial Features<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Polynomial\">Polynomial<\/a> features are those features created by raising existing features to an exponent.<\/p>\n<p>For example, if a dataset had one input feature X, then a polynomial feature would be the addition of a new feature (column) where values were calculated by squaring the values in X, e.g. X^2. This process can be repeated for each input variable in the dataset, creating a transformed version of each.<\/p>\n<p>As such, polynomial features are a type of feature engineering, e.g. the creation of new input features based on the existing features.<\/p>\n<p>The &ldquo;<em>degree<\/em>&rdquo; of the polynomial is used to control the number of features added, e.g. a degree of 3 will add two new variables for each input variable. Typically a small degree is used, such as 2 or 3.<\/p>\n<blockquote>\n<p>Generally speaking, it is unusual to use d greater than 3 or 4 because for large values of d, the polynomial curve can become overly flexible and can take on some very strange shapes.<\/p>\n<\/blockquote>\n<p>&mdash; Page 266, <a href=\"https:\/\/amzn.to\/2SfkCXh\">An Introduction to Statistical Learning with Applications in R<\/a>, 2014.<\/p>\n<p>It is also common to add new variables that represent the interaction between features, e.g. a new column that represents one variable multiplied by another. 
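Such an interaction column can be sketched directly with NumPy before reaching for a library transform; this is an illustrative snippet (not part of the original post), with the toy array values chosen arbitrarily:

```python
# illustrative sketch: building an interaction feature by hand with NumPy
from numpy import asarray

# a toy dataset with two input columns
X = asarray([[2, 3],
             [4, 5],
             [6, 7]])
# multiply the two input columns element-wise to form the x1 * x2 interaction
interaction = X[:, 0] * X[:, 1]
print(interaction)  # [ 6 20 42]
```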
This too can be repeated for each input variable, creating a new &ldquo;<em>interaction<\/em>&rdquo; variable for each pair of input variables.<\/p>\n<p>A squared or cubed version of an input variable will change the probability distribution, separating the small and large values, a separation that is increased with the size of the exponent.<\/p>\n<p>This separation can help some machine learning algorithms make better predictions and is common for regression predictive modeling tasks and, more generally, tasks that have numerical input variables.<\/p>\n<p>Typically linear algorithms, such as linear regression and logistic regression, respond well to the use of polynomial input variables.<\/p>\n<blockquote>\n<p>Linear regression is linear in the model parameters and adding polynomial terms to the model can be an effective way of allowing the model to identify nonlinear patterns.<\/p>\n<\/blockquote>\n<p>&mdash; Page 11, <a href=\"https:\/\/amzn.to\/2Yvcupn\">Feature Engineering and Selection<\/a>, 2019.<\/p>\n<p>For example, when used as input to a linear regression algorithm, the method is more broadly referred to as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Polynomial_regression\">polynomial regression<\/a>.<\/p>\n<blockquote>\n<p>Polynomial regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power. For example, a cubic regression uses three variables, X, X^2, and X^3, as predictors. 
This approach provides a simple way to provide a non-linear fit to data.<\/p>\n<\/blockquote>\n<p>&mdash; Page 265, <a href=\"https:\/\/amzn.to\/2SfkCXh\">An Introduction to Statistical Learning with Applications in R<\/a>, 2014.<\/p>\n<h2>Polynomial Feature Transform<\/h2>\n<p>The polynomial features transform is available in the scikit-learn Python machine learning library via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.PolynomialFeatures.html\">PolynomialFeatures class<\/a>.<\/p>\n<p>The features created include:<\/p>\n<ul>\n<li>The bias (the value of 1.0)<\/li>\n<li>Values raised to a power for each degree (e.g. x^1, x^2, x^3, &hellip;)<\/li>\n<li>Interactions between all pairs of features (e.g. x1 * x2, x1 * x3, &hellip;)<\/li>\n<\/ul>\n<p>For example, with two input variables with values 2 and 3 and a degree of 2, the features created would be:<\/p>\n<ul>\n<li>1 (the bias)<\/li>\n<li>2^1 = 2<\/li>\n<li>3^1 = 3<\/li>\n<li>2^2 = 4<\/li>\n<li>3^2 = 9<\/li>\n<li>2 * 3 = 6<\/li>\n<\/ul>\n<p>We can demonstrate this with an example:<\/p>\n<pre class=\"crayon-plain-tag\"># demonstrate the types of features created\r\nfrom numpy import asarray\r\nfrom sklearn.preprocessing import PolynomialFeatures\r\n# define the dataset\r\ndata = asarray([[2,3],[2,3],[2,3]])\r\nprint(data)\r\n# perform a polynomial features transform of the dataset\r\ntrans = PolynomialFeatures(degree=2)\r\ndata = trans.fit_transform(data)\r\nprint(data)<\/pre>\n<p>Running the example first reports the raw data with two features (columns) and each feature has the same value, either 2 or 3.<\/p>\n<p>Then the polynomial features are created, resulting in six features, matching what was described above.<\/p>\n<pre class=\"crayon-plain-tag\">[[2 3]\r\n [2 3]\r\n [2 3]]\r\n\r\n[[1. 2. 3. 4. 6. 9.]\r\n [1. 2. 3. 4. 6. 9.]\r\n [1. 2. 3. 4. 6. 
9.]]<\/pre>\n<p>The &ldquo;<em>degree<\/em>&rdquo; argument controls the number of features created and defaults to 2.<\/p>\n<p>The &ldquo;<em>interaction_only<\/em>&rdquo; argument, when set to <em>True<\/em>, includes only the raw values (degree 1) and interaction terms (pairs of values multiplied together); it defaults to <em>False<\/em>.<\/p>\n<p>The &ldquo;<em>include_bias<\/em>&rdquo; argument defaults to <em>True<\/em> to include the bias feature.<\/p>\n<p>We will take a closer look at how to use the polynomial feature transform on a real dataset.<\/p>\n<p>First, let&rsquo;s introduce a real dataset.<\/p>\n<h2>Sonar Dataset<\/h2>\n<p>The sonar dataset is a standard machine learning dataset for binary classification.<\/p>\n<p>It involves 60 real-valued inputs and a two-class target variable. There are 208 examples in the dataset and the classes are reasonably balanced.<\/p>\n<p>A baseline classification algorithm can achieve a classification accuracy of about 53.4 percent using repeated stratified 10-fold cross-validation. <a href=\"https:\/\/machinelearningmastery.com\/results-for-standard-classification-and-regression-machine-learning-datasets\/\">Top performance<\/a> on this dataset is about 88 percent using repeated stratified 10-fold cross-validation.<\/p>\n<p>The dataset describes sonar returns from rocks or simulated mines.<\/p>\n<p>You can learn more about the dataset from here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\">Sonar Dataset<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.names\">Sonar Dataset Description<\/a><\/li>\n<\/ul>\n<p>No need to download the dataset; we will download it automatically as part of the worked examples.<\/p>\n<p>First, let&rsquo;s load and summarize the dataset. 
The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load and summarize the sonar dataset\r\nfrom pandas import read_csv\r\nfrom pandas.plotting import scatter_matrix\r\nfrom matplotlib import pyplot\r\n# Load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\n# summarize the shape of the dataset\r\nprint(dataset.shape)\r\n# summarize each variable\r\nprint(dataset.describe())\r\n# histograms of the variables\r\ndataset.hist()\r\npyplot.show()<\/pre>\n<p>Running the example first summarizes the shape of the loaded dataset.<\/p>\n<p>This confirms the 60 input variables, one output variable, and 208 rows of data.<\/p>\n<p>A statistical summary of the input variables is provided showing that values are numeric and range approximately from 0 to 1.<\/p>\n<pre class=\"crayon-plain-tag\">(208, 61)\r\n               0           1           2   ...          57          58          59\r\ncount  208.000000  208.000000  208.000000  ...  208.000000  208.000000  208.000000\r\nmean     0.029164    0.038437    0.043832  ...    0.007949    0.007941    0.006507\r\nstd      0.022991    0.032960    0.038428  ...    0.006470    0.006181    0.005031\r\nmin      0.001500    0.000600    0.001500  ...    0.000300    0.000100    0.000600\r\n25%      0.013350    0.016450    0.018950  ...    0.003600    0.003675    0.003100\r\n50%      0.022800    0.030800    0.034300  ...    0.005800    0.006400    0.005300\r\n75%      0.035550    0.047950    0.057950  ...    0.010350    0.010325    0.008525\r\nmax      0.137100    0.233900    0.305900  ...    
0.044000    0.036400    0.043900\r\n\r\n[8 rows x 60 columns]<\/pre>\n<p>Finally, a histogram is created for each input variable.<\/p>\n<p>If we ignore the clutter of the plots and focus on the histograms themselves, we can see that many variables have a skewed distribution.<\/p>\n<div id=\"attachment_10370\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10370\" class=\"size-full wp-image-10370\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-3.png\" alt=\"Histogram Plots of Input Variables for the Sonar Binary Classification Dataset\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-3.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-3-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-3-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-3-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10370\" class=\"wp-caption-text\">Histogram Plots of Input Variables for the Sonar Binary Classification Dataset<\/p>\n<\/div>\n<p>Next, let&rsquo;s fit and evaluate a machine learning model on the raw dataset.<\/p>\n<p>We will use a k-nearest neighbor algorithm with default hyperparameters and evaluate it using repeated stratified k-fold cross-validation. 
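The evaluation harness can be sketched in isolation first; this minimal snippet (my addition, not from the original post) uses synthetic data from make_classification as a stand-in for the sonar dataset, whose loading code appears in the complete example:

```python
# sketch: repeated stratified 10-fold cross-validation of a default KNN
# on synthetic stand-in data (the tutorial itself uses the sonar dataset)
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# synthetic binary classification data as a placeholder
X, y = make_classification(n_samples=200, n_features=20, random_state=1)
# 10 folds, repeated 3 times, stratified by class label
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(KNeighborsClassifier(), X, y, scoring='accuracy', cv=cv)
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
```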
The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate knn on the raw sonar dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom matplotlib import pyplot\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\ndata = dataset.values\r\n# separate into input and output columns\r\nX, y = data[:, :-1], data[:, -1]\r\n# ensure inputs are floats and output is an integer label\r\nX = X.astype('float32')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define and configure the model\r\nmodel = KNeighborsClassifier()\r\n# evaluate the model\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report model performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example evaluates a KNN model on the raw sonar dataset.<\/p>\n<p>We can see that the model achieved a mean classification accuracy of about 79.7 percent, showing that it has skill (better than 53.4 percent) and is in the ball-park of good performance (88 percent).<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.797 (0.073)<\/pre>\n<p>Next, let&rsquo;s explore a polynomial features transform of the dataset.<\/p>\n<h2>Polynomial Feature Transform Example<\/h2>\n<p>We can apply the polynomial features transform to the Sonar dataset directly.<\/p>\n<p>In this case, we will use a degree of 3.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# perform a polynomial features transform of the dataset\r\ntrans = PolynomialFeatures(degree=3)\r\ndata = 
trans.fit_transform(data)<\/pre>\n<p>Let&rsquo;s try it on our sonar dataset.<\/p>\n<p>The complete example of creating a polynomial features transform of the sonar dataset and summarizing the created features is below.<\/p>\n<pre class=\"crayon-plain-tag\"># summarize a polynomial features transform of the sonar dataset\r\nfrom pandas import read_csv\r\nfrom pandas import DataFrame\r\nfrom sklearn.preprocessing import PolynomialFeatures\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\n# retrieve just the numeric input values\r\ndata = dataset.values[:, :-1]\r\n# perform a polynomial features transform of the dataset\r\ntrans = PolynomialFeatures(degree=3)\r\ndata = trans.fit_transform(data)\r\n# convert the array back to a dataframe\r\ndataset = DataFrame(data)\r\n# summarize\r\nprint(dataset.shape)<\/pre>\n<p>Running the example performs the polynomial features transform on the sonar dataset.<\/p>\n<p>We can see that the number of columns increased from 61 in the raw dataset (60 input features plus the target) to 39,711 after the transform (39,710 input features plus the bias column).<\/p>\n<pre class=\"crayon-plain-tag\">(208, 39711)<\/pre>\n<p>Next, let&rsquo;s evaluate the same KNN model as the previous section, but in this case on a polynomial features transform of the dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate knn on the sonar dataset with polynomial features transform\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.preprocessing import PolynomialFeatures\r\nfrom sklearn.pipeline import Pipeline\r\nfrom 
matplotlib import pyplot\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\ndata = dataset.values\r\n# separate into input and output columns\r\nX, y = data[:, :-1], data[:, -1]\r\n# ensure inputs are floats and output is an integer label\r\nX = X.astype('float32')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define the pipeline\r\ntrans = PolynomialFeatures(degree=3)\r\nmodel = KNeighborsClassifier()\r\npipeline = Pipeline(steps=[('t', trans), ('m', model)])\r\n# evaluate the pipeline\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report pipeline performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example, we can see that the polynomial features transform results in a small lift in performance, from 79.7 percent accuracy without the transform to about 80.0 percent with the transform.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.800 (0.077)<\/pre>\n<p>Next, let&rsquo;s explore the effect of the polynomial degree.<\/p>\n<h2>Effect of Polynomial Degree<\/h2>\n<p>Increasing the degree of the polynomial dramatically increases the number of input features.<\/p>\n<p>To get an idea of how much this impacts the number of features, we can perform the transform with a range of different degrees and compare the number of features in the dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># compare the effect of the degree on the number of created features\r\nfrom pandas import read_csv\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.preprocessing import PolynomialFeatures\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\t# load dataset\r\n\turl = 
\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\n\tdataset = read_csv(url, header=None)\r\n\tdata = dataset.values\r\n\t# separate into input and output columns\r\n\tX, y = data[:, :-1], data[:, -1]\r\n\t# ensure inputs are floats and output is an integer label\r\n\tX = X.astype('float32')\r\n\ty = LabelEncoder().fit_transform(y.astype('str'))\r\n\treturn X, y\r\n\r\n# define dataset\r\nX, y = get_dataset()\r\n# calculate change in number of features\r\nnum_features = list()\r\ndegrees = [i for i in range(1, 6)]\r\nfor d in degrees:\r\n\t# create transform\r\n\ttrans = PolynomialFeatures(degree=d)\r\n\t# fit and transform\r\n\tdata = trans.fit_transform(X)\r\n\t# record number of features\r\n\tnum_features.append(data.shape[1])\r\n\t# summarize\r\n\tprint('Degree: %d, Features: %d' % (d, data.shape[1]))\r\n# plot degree vs number of features\r\npyplot.plot(degrees, num_features)\r\npyplot.show()<\/pre>\n<p>Running the example first reports the degree from 1 to 5 and the number of features in the dataset.<\/p>\n<p>We can see that a degree of 1 has no effect and that the number of features dramatically increases from 2 through to 5.<\/p>\n<p>In general, the transform creates one column for every monomial of degree d or less in the n input variables, i.e. C(n + d, d) features including the bias; for the 60 sonar inputs at degree 3, that is C(63, 3) = 39,711.<\/p>\n<p>This highlights that for anything other than very small datasets, a degree of 2 or 3 should be used to avoid a dramatic increase in input variables.<\/p>\n<pre class=\"crayon-plain-tag\">Degree: 1, Features: 61\r\nDegree: 2, Features: 1891\r\nDegree: 3, Features: 39711\r\nDegree: 4, Features: 635376\r\nDegree: 5, Features: 8259888<\/pre>\n<div id=\"attachment_10371\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10371\" class=\"size-full wp-image-10371\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Line-Plot-of-the-Degree-vs-the-Number-of-Input-Features-for-the-Polynomial-Feature-Transform.png\" alt=\"Line Plot of the Degree vs. 
the Number of Input Features for the Polynomial Feature Transform\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Line-Plot-of-the-Degree-vs-the-Number-of-Input-Features-for-the-Polynomial-Feature-Transform.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Line-Plot-of-the-Degree-vs-the-Number-of-Input-Features-for-the-Polynomial-Feature-Transform-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Line-Plot-of-the-Degree-vs-the-Number-of-Input-Features-for-the-Polynomial-Feature-Transform-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Line-Plot-of-the-Degree-vs-the-Number-of-Input-Features-for-the-Polynomial-Feature-Transform-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10371\" class=\"wp-caption-text\">Line Plot of the Degree vs. 
the Number of Input Features for the Polynomial Feature Transform<\/p>\n<\/div>\n<p>More features may result in more overfitting, and in turn, worse results.<\/p>\n<p>It may be a good idea to treat the degree of the polynomial features transform as a hyperparameter and test different values for your dataset.<\/p>\n<p>The example below explores degree values from 1 to 4 and evaluates their effect on classification accuracy with the chosen model.<\/p>\n<pre class=\"crayon-plain-tag\"># explore the effect of degree on accuracy for the polynomial features transform\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing import PolynomialFeatures\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.pipeline import Pipeline\r\nfrom matplotlib import pyplot\r\n\r\n# get the dataset\r\ndef get_dataset():\r\n\t# load dataset\r\n\turl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\n\tdataset = read_csv(url, header=None)\r\n\tdata = dataset.values\r\n\t# separate into input and output columns\r\n\tX, y = data[:, :-1], data[:, -1]\r\n\t# ensure inputs are floats and output is an integer label\r\n\tX = X.astype('float32')\r\n\ty = LabelEncoder().fit_transform(y.astype('str'))\r\n\treturn X, y\r\n\r\n# get a list of models to evaluate\r\ndef get_models():\r\n\tmodels = dict()\r\n\tfor d in range(1,5):\r\n\t\t# define the pipeline\r\n\t\ttrans = PolynomialFeatures(degree=d)\r\n\t\tmodel = KNeighborsClassifier()\r\n\t\tmodels[str(d)] = Pipeline(steps=[('t', trans), ('m', model)])\r\n\treturn models\r\n\r\n# evaluate a given model using cross-validation\r\ndef evaluate_model(model):\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(model, X, y, 
scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n\treturn scores\r\n\r\n# define dataset\r\nX, y = get_dataset()\r\n# get the models to evaluate\r\nmodels = get_models()\r\n# evaluate the models and store results\r\nresults, names = list(), list()\r\nfor name, model in models.items():\r\n\tscores = evaluate_model(model)\r\n\tresults.append(scores)\r\n\tnames.append(name)\r\n\tprint('&gt;%s %.3f (%.3f)' % (name, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=names, showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running the example reports the mean classification accuracy for each polynomial degree.<\/p>\n<p>In this case, we can see that performance is slightly worse than no transform (degree 1) for degrees 2 and 4, and slightly better for degree 3.<\/p>\n<p>It might be interesting to explore scaling the data before or after performing the transform to see how it impacts model performance.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;1 0.797 (0.073)\r\n&gt;2 0.793 (0.085)\r\n&gt;3 0.800 (0.077)\r\n&gt;4 0.795 (0.079)<\/pre>\n<p>Box and whisker plots are created to summarize the classification accuracy scores for each polynomial degree.<\/p>\n<p>We can see that performance remains flat, perhaps with the first signs of overfitting with a degree of 4.<\/p>\n<div id=\"attachment_10372\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10372\" class=\"size-full wp-image-10372\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Box-Plots-of-Degree-for-the-Polynomial-Feature-Transform-vs-Classification-Accuracy-of-KNN-on-the-Sonar-Dataset.png\" alt=\"Box Plots of Degree for the Polynomial Feature Transform vs. 
Classification Accuracy of KNN on the Sonar Dataset\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plots-of-Degree-for-the-Polynomial-Feature-Transform-vs-Classification-Accuracy-of-KNN-on-the-Sonar-Dataset.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plots-of-Degree-for-the-Polynomial-Feature-Transform-vs-Classification-Accuracy-of-KNN-on-the-Sonar-Dataset-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plots-of-Degree-for-the-Polynomial-Feature-Transform-vs-Classification-Accuracy-of-KNN-on-the-Sonar-Dataset-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Box-Plots-of-Degree-for-the-Polynomial-Feature-Transform-vs-Classification-Accuracy-of-KNN-on-the-Sonar-Dataset-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10372\" class=\"wp-caption-text\">Box Plots of Degree for the Polynomial Feature Transform vs. 
Classification Accuracy of KNN on the Sonar Dataset<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2SfkCXh\">An Introduction to Statistical Learning with Applications in R<\/a>, 2014.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2Yvcupn\">Feature Engineering and Selection<\/a>, 2019.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.PolynomialFeatures.html\">sklearn.preprocessing.PolynomialFeatures API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Polynomial\">Polynomial, Wikipedia<\/a>.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Polynomial_regression\">Polynomial regression, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use polynomial feature transforms for feature engineering with numerical input variables.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Some machine learning algorithms prefer or perform better with polynomial input features.<\/li>\n<li>How to use the polynomial features transform to create new versions of input variables for predictive modeling.<\/li>\n<li>How the degree of the polynomial impacts the number of input features created by the transform.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/polynomial-features-transforms-for-machine-learning\/\">How to Use Polynomial Feature Transforms for Machine Learning<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a 
href=\"https:\/\/machinelearningmastery.com\/polynomial-features-transforms-for-machine-learning\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Often, the input features for a predictive modeling task interact in unexpected and often nonlinear ways. These interactions can be identified and [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/28\/how-to-use-polynomial-feature-transforms-for-machine-learning\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3507,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3506"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3506"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3506\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3507"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3506"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3506"},{"taxonomy":"pos
t_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}