{"id":3463,"date":"2020-05-17T19:00:27","date_gmt":"2020-05-17T19:00:27","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/17\/how-to-use-power-transforms-for-machine-learning\/"},"modified":"2020-05-17T19:00:27","modified_gmt":"2020-05-17T19:00:27","slug":"how-to-use-power-transforms-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/17\/how-to-use-power-transforms-for-machine-learning\/","title":{"rendered":"How to Use Power Transforms for Machine Learning"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Machine learning algorithms like <a href=\"https:\/\/machinelearningmastery.com\/linear-regression-with-maximum-likelihood-estimation\/\">Linear Regression<\/a> and <a href=\"https:\/\/machinelearningmastery.com\/classification-as-conditional-probability-and-the-naive-bayes-algorithm\/\">Gaussian Naive Bayes<\/a> assume the numerical variables have a Gaussian probability distribution.<\/p>\n<p>Your data may not have a <a href=\"https:\/\/machinelearningmastery.com\/continuous-probability-distributions-for-machine-learning\/\">Gaussian distribution<\/a> and instead may have a Gaussian-like distribution (e.g. nearly Gaussian but with outliers or a skew) or a totally different distribution (e.g. exponential).<\/p>\n<p>As such, you may be able to achieve better performance on a wide range of machine learning algorithms by transforming input and\/or output variables to have a Gaussian or more-Gaussian distribution. 
Power transforms like the Box-Cox transform and the Yeo-Johnson transform provide an automatic way of performing these transforms on your data and are available in the scikit-learn Python machine learning library.<\/p>\n<p>In this tutorial, you will discover how to use power transforms in scikit-learn to make variables more Gaussian for modeling.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Many machine learning algorithms prefer or perform better when numerical variables have a Gaussian probability distribution.<\/li>\n<li>Power transforms are a technique for transforming numerical input or output variables to have a Gaussian or more-Gaussian-like probability distribution.<\/li>\n<li>How to use the PowerTransformer class in scikit-learn to apply the Box-Cox and Yeo-Johnson transforms when preparing data for predictive modeling.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10324\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10324\" class=\"size-full wp-image-10324\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Power-Transforms-With-scikit-learn.jpg\" alt=\"How to Use Power Transforms With scikit-learn\" width=\"800\" height=\"451\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Power-Transforms-With-scikit-learn.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Power-Transforms-With-scikit-learn-300x169.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/How-to-Use-Power-Transforms-With-scikit-learn-768x433.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-10324\" class=\"wp-caption-text\">How to Use Power Transforms With scikit-learn<br \/>Photo by <a 
href=\"https:\/\/flickr.com\/photos\/ian-arlett\/30436658200\/\">Ian D. Keating<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>Make Data More Gaussian<\/li>\n<li>Power Transforms<\/li>\n<li>Sonar Dataset<\/li>\n<li>Box-Cox Transform<\/li>\n<li>Yeo-Johnson Transform<\/li>\n<\/ol>\n<h2>Make Data More Gaussian<\/h2>\n<p>Many machine learning algorithms perform better when the distribution of variables is Gaussian.<\/p>\n<p>Recall that the observations for each variable may be thought to be drawn from a probability distribution. The Gaussian is a common distribution with the familiar bell shape. It is so common that it is often referred to as the &ldquo;<em>normal<\/em>&rdquo; distribution.<\/p>\n<p>For more on the Gaussian probability distribution, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/continuous-probability-distributions-for-machine-learning\/\">Continuous Probability Distributions for Machine Learning<\/a><\/li>\n<\/ul>\n<p>Some algorithms like linear regression and logistic regression explicitly assume the real-valued variables have a Gaussian distribution. Other nonlinear algorithms may not have this assumption, yet often perform better when variables have a Gaussian distribution.<\/p>\n<p>This applies both to real-valued input variables in the case of classification and regression tasks, and real-valued target variables in the case of regression tasks.<\/p>\n<p>There are data preparation techniques that can be used to transform each variable to make the distribution Gaussian, or if not Gaussian, then more Gaussian like.<\/p>\n<p>These transforms are most effective when the data distribution is nearly-Gaussian to begin with and is afflicted with a skew or outliers.<\/p>\n<blockquote>\n<p>Another common reason for transformations is to remove distributional skewness. An un-skewed distribution is one that is roughly symmetric. 
This means that the probability of falling on either side of the distribution&rsquo;s mean is roughly equal<\/p>\n<\/blockquote>\n<p>&mdash; Page 31, <a href=\"https:\/\/amzn.to\/3b2LHTL\">Applied Predictive Modeling<\/a>, 2013.<\/p>\n<p>Power transforms refer to a class of techniques that use a power function (such as a square root or reciprocal, with the logarithm as a limiting case) to make the probability distribution of a variable Gaussian or more Gaussian-like.<\/p>\n<p>For more on the topic of making variables Gaussian, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-transform-data-to-fit-the-normal-distribution\/\">How to Transform Data to Better Fit the Normal Distribution<\/a><\/li>\n<\/ul>\n<h2>Power Transforms<\/h2>\n<p>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Power_transform\">power transform<\/a> can make the probability distribution of a variable more Gaussian.<\/p>\n<p>This is often described as removing a skew in the distribution, although more generally it is described as stabilizing the variance of the distribution.<\/p>\n<blockquote>\n<p>The log transform is a specific example of a family of transformations known as power transforms. 
In statistical terms, these are variance-stabilizing transformations.<\/p>\n<\/blockquote>\n<p>&mdash; Page 23, <a href=\"https:\/\/amzn.to\/2zZOQXN\">Feature Engineering for Machine Learning<\/a>, 2018.<\/p>\n<p>We can apply a power transform directly by calculating the log or square root of the variable, although this may or may not be the best power transform for a given variable.<\/p>\n<blockquote>\n<p>Replacing the data with the log, square root, or inverse may help to remove the skew.<\/p>\n<\/blockquote>\n<p>&mdash; Page 31, <a href=\"https:\/\/amzn.to\/3b2LHTL\">Applied Predictive Modeling<\/a>, 2013.<\/p>\n<p>Instead, we can use a generalized version of the transform that finds a parameter (<em>lambda<\/em>) that best transforms a variable to a Gaussian probability distribution.<\/p>\n<p>There are two popular approaches for such automatic power transforms; they are:<\/p>\n<ul>\n<li>Box-Cox Transform<\/li>\n<li>Yeo-Johnson Transform<\/li>\n<\/ul>\n<p>The transformed training dataset can then be fed to a machine learning model to learn a predictive modeling task.<\/p>\n<p>A hyperparameter, often referred to as <em>lambda<\/em>, is used to control the nature of the transform.<\/p>\n<blockquote>\n<p>&hellip; statistical methods can be used to empirically identify an appropriate transformation. Box and Cox (1964) propose a family of transformations that are indexed by a parameter, denoted as lambda<\/p>\n<\/blockquote>\n<p>&mdash; Page 32, <a href=\"https:\/\/amzn.to\/3b2LHTL\">Applied Predictive Modeling<\/a>, 2013.<\/p>\n<p>Below are some common values for <em>lambda<\/em>:<\/p>\n<ul>\n<li><em>lambda<\/em> = -1.0 
is a reciprocal transform.<\/li>\n<li><em>lambda<\/em> = -0.5 is a reciprocal square root transform.<\/li>\n<li><em>lambda<\/em> = 0.0 is a log transform.<\/li>\n<li><em>lambda<\/em> = 0.5 is a square root transform.<\/li>\n<li><em>lambda<\/em> = 1.0 is no transform.<\/li>\n<\/ul>\n<p>The value of this hyperparameter estimated for each variable on the training dataset can be stored and reused to transform new data, such as a test dataset, in an identical manner.<\/p>\n<p>These power transforms are available in the scikit-learn Python machine learning library via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.PowerTransformer.html\">PowerTransformer class<\/a>.<\/p>\n<p>The class takes an argument named &ldquo;<em>method<\/em>&rdquo; that can be set to &lsquo;<em>yeo-johnson<\/em>&rsquo; or &lsquo;<em>box-cox<\/em>&rsquo; for the preferred method. It will also standardize the data automatically after the transform, meaning each variable will have a zero mean and unit variance. This can be turned off by setting the &ldquo;<em>standardize<\/em>&rdquo; argument to <em>False<\/em>.<\/p>\n<p>We can demonstrate the <em>PowerTransformer<\/em> with a small worked example. We can generate a sample of <a href=\"https:\/\/machinelearningmastery.com\/how-to-generate-random-numbers-in-python\/\">random Gaussian numbers<\/a> and impose a skew on the distribution by exponentiating each value. 
The PowerTransformer can then be used to automatically remove the skew from the data.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># demonstration of the power transform on data with a skew\r\nfrom numpy import exp\r\nfrom numpy.random import randn\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom matplotlib import pyplot\r\n# generate gaussian data sample\r\ndata = randn(1000)\r\n# add a skew to the data distribution\r\ndata = exp(data)\r\n# histogram of the raw data with a skew\r\npyplot.hist(data, bins=25)\r\npyplot.show()\r\n# reshape data to have rows and columns\r\ndata = data.reshape((len(data),1))\r\n# power transform the raw data\r\npower = PowerTransformer(method='yeo-johnson', standardize=True)\r\ndata_trans = power.fit_transform(data)\r\n# histogram of the transformed data\r\npyplot.hist(data_trans, bins=25)\r\npyplot.show()<\/pre>\n<p>Running the example first creates a sample of 1,000 random Gaussian values and adds a skew to the dataset.<\/p>\n<p>A histogram is created from the skewed dataset and clearly shows the distribution pushed to the far left.<\/p>\n<div id=\"attachment_10925\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10925\" class=\"size-full wp-image-10925\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Distribution.png\" alt=\"Histogram of Skewed Gaussian Distribution\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Distribution.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Distribution-300x225.png 300w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Distribution-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Distribution-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10925\" class=\"wp-caption-text\">Histogram of Skewed Gaussian Distribution<\/p>\n<\/div>\n<p>Then a <em>PowerTransformer<\/em> is used to make the data distribution more Gaussian and standardize the result, so that the values have a mean of 0 and a standard deviation of 1.0.<\/p>\n<p>A histogram of the transformed data is created, showing a more Gaussian-shaped data distribution.<\/p>\n<div id=\"attachment_10926\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10926\" class=\"size-full wp-image-10926\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Data-After-Power-Transform.png\" alt=\"Histogram of Skewed Gaussian Data After Power Transform\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Data-After-Power-Transform.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Data-After-Power-Transform-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Data-After-Power-Transform-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Histogram-of-Skewed-Gaussian-Data-After-Power-Transform-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10926\" 
class=\"wp-caption-text\">Histogram of Skewed Gaussian Data After Power Transform<\/p>\n<\/div>\n<p>In the following sections, we will take a closer look at how to use these two power transforms on a real dataset.<\/p>\n<p>Next, let&rsquo;s introduce the dataset.<\/p>\n<h2>Sonar Dataset<\/h2>\n<p>The sonar dataset is a standard machine learning dataset for binary classification.<\/p>\n<p>It involves 60 real-valued inputs and a 2-class target variable. There are 208 examples in the dataset and the classes are reasonably balanced.<\/p>\n<p>A baseline classification algorithm can achieve a classification accuracy of about 53.4 percent using repeated stratified 10-fold cross-validation. <a href=\"https:\/\/machinelearningmastery.com\/results-for-standard-classification-and-regression-machine-learning-datasets\/\">Top performance<\/a> on this dataset is about 88 percent using repeated stratified 10-fold cross-validation.<\/p>\n<p>The dataset describes sonar returns of rocks or simulated mines.<\/p>\n<p>You can learn more about the dataset here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\">Sonar Dataset<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.names\">Sonar Dataset Description<\/a><\/li>\n<\/ul>\n<p>There is no need to download the dataset manually; the worked examples download it automatically.<\/p>\n<p>First, let&rsquo;s load and summarize the dataset. 
The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load and summarize the sonar dataset\r\nfrom pandas import read_csv\r\nfrom pandas.plotting import scatter_matrix\r\nfrom matplotlib import pyplot\r\n# Load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\n# summarize the shape of the dataset\r\nprint(dataset.shape)\r\n# summarize each variable\r\nprint(dataset.describe())\r\n# histograms of the variables\r\ndataset.hist()\r\npyplot.show()<\/pre>\n<p>Running the example first summarizes the shape of the loaded dataset.<\/p>\n<p>This confirms the 60 input variables, one output variable, and 208 rows of data.<\/p>\n<p>A statistical summary of the input variables is provided showing that values are numeric and range approximately from 0 to 1.<\/p>\n<pre class=\"crayon-plain-tag\">(208, 61)\r\n               0           1           2   ...          57          58          59\r\ncount  208.000000  208.000000  208.000000  ...  208.000000  208.000000  208.000000\r\nmean     0.029164    0.038437    0.043832  ...    0.007949    0.007941    0.006507\r\nstd      0.022991    0.032960    0.038428  ...    0.006470    0.006181    0.005031\r\nmin      0.001500    0.000600    0.001500  ...    0.000300    0.000100    0.000600\r\n25%      0.013350    0.016450    0.018950  ...    0.003600    0.003675    0.003100\r\n50%      0.022800    0.030800    0.034300  ...    0.005800    0.006400    0.005300\r\n75%      0.035550    0.047950    0.057950  ...    0.010350    0.010325    0.008525\r\nmax      0.137100    0.233900    0.305900  ...    
0.044000    0.036400    0.043900\r\n\r\n[8 rows x 60 columns]<\/pre>\n<p>Finally, a histogram is created for each input variable.<\/p>\n<p>If we ignore the clutter of the plots and focus on the histograms themselves, we can see that many variables have a skewed distribution.<\/p>\n<p>The dataset provides a good candidate for using a power transform to make the variables more-Gaussian.<\/p>\n<div id=\"attachment_10321\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10321\" class=\"size-full wp-image-10321\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset.png\" alt=\"Histogram Plots of Input Variables for the Sonar Binary Classification Dataset\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Input-Variables-for-the-Sonar-Binary-Classification-Dataset-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10321\" class=\"wp-caption-text\">Histogram Plots of Input Variables for the Sonar Binary Classification Dataset<\/p>\n<\/div>\n<p>Next, let&rsquo;s fit and evaluate a machine learning model on the raw dataset.<\/p>\n<p>We will use a <a 
href=\"https:\/\/machinelearningmastery.com\/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch\/\">k-nearest neighbor algorithm<\/a> with default hyperparameters and evaluate it using repeated stratified k-fold cross-validation. The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate knn on the raw sonar dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom matplotlib import pyplot\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\ndata = dataset.values\r\n# separate into input and output columns\r\nX, y = data[:, :-1], data[:, -1]\r\n# ensure inputs are floats and output is an integer label\r\nX = X.astype('float32')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define and configure the model\r\nmodel = KNeighborsClassifier()\r\n# evaluate the model\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report model performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example evaluates a KNN model on the raw sonar dataset.<\/p>\n<p>We can see that the model achieved a mean classification accuracy of about 79.7 percent, showing that it has skill (better than 53.4 percent) and is in the ball-park of good performance (88 percent).<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.797 (0.073)<\/pre>\n<p>Next, let&rsquo;s explore a Box-Cox power transform of the dataset.<\/p>\n<h2>Box-Cox Transform<\/h2>\n<p>The Box-Cox transform is named for the two authors of the 
method.<\/p>\n<p>It is a power transform that assumes the values of the input variable to which it is applied are <strong>strictly positive<\/strong>. That means 0 and negative values are not supported.<\/p>\n<blockquote>\n<p>It is important to note that the Box-Cox procedure can only be applied to data that is strictly positive.<\/p>\n<\/blockquote>\n<p>&mdash; Page 123, <a href=\"https:\/\/amzn.to\/2Yvcupn\">Feature Engineering and Selection<\/a>, 2019.<\/p>\n<p>We can apply the Box-Cox transform using the <em>PowerTransformer<\/em> class and setting the &ldquo;<em>method<\/em>&rdquo; argument to &ldquo;<em>box-cox<\/em>&rdquo;. Once defined, we can call the <em>fit_transform()<\/em> function and pass it our dataset to create a Box-Cox transformed version of our dataset.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\npt = PowerTransformer(method='box-cox')\r\ndata = pt.fit_transform(data)<\/pre>\n<p>Our dataset does not have negative values but may have zero values. This may cause a problem.<\/p>\n<p>Let&rsquo;s try anyway.<\/p>\n<p>The complete example of creating a Box-Cox transform of the sonar dataset and plotting histograms of the result is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># visualize a box-cox transform of the sonar dataset\r\nfrom pandas import read_csv\r\nfrom pandas import DataFrame\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom matplotlib import pyplot\r\n# Load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\n# retrieve just the numeric input values\r\ndata = dataset.values[:, :-1]\r\n# perform a box-cox transform of the dataset\r\npt = PowerTransformer(method='box-cox')\r\ndata = pt.fit_transform(data)\r\n# convert the array back to a dataframe\r\ndataset = DataFrame(data)\r\n# histograms of the variables\r\ndataset.hist()\r\npyplot.show()<\/pre>\n<p>Running the example results in 
an error as follows:<\/p>\n<pre class=\"crayon-plain-tag\">ValueError: The Box-Cox transformation can only be applied to strictly positive data<\/pre>\n<p>As expected, we cannot use the transform on the raw data because it is <strong>not strictly positive<\/strong>.<\/p>\n<p>One way to solve this problem is to use a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.MinMaxScaler.html\">MinMaxScaler transform<\/a> first to scale the data to a strictly positive range, such as 1 to 2, then apply the transform.<\/p>\n<p>We can use a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.pipeline.Pipeline.html\">Pipeline object<\/a> to apply both transforms in sequence; for example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# perform a box-cox transform of the dataset\r\nscaler = MinMaxScaler(feature_range=(1, 2))\r\npower = PowerTransformer(method='box-cox')\r\npipeline = Pipeline(steps=[('s', scaler),('p', power)])\r\ndata = pipeline.fit_transform(data)<\/pre>\n<p>The updated version of applying the Box-Cox transform to the scaled dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># visualize a box-cox transform of the scaled sonar dataset\r\nfrom pandas import read_csv\r\nfrom pandas import DataFrame\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom matplotlib import pyplot\r\n# Load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\n# retrieve just the numeric input values\r\ndata = dataset.values[:, :-1]\r\n# perform a box-cox transform of the dataset\r\nscaler = MinMaxScaler(feature_range=(1, 2))\r\npower = PowerTransformer(method='box-cox')\r\npipeline = Pipeline(steps=[('s', scaler),('p', power)])\r\ndata = pipeline.fit_transform(data)\r\n# convert the array back to a 
dataframe\r\ndataset = DataFrame(data)\r\n# histograms of the variables\r\ndataset.hist()\r\npyplot.show()<\/pre>\n<p>Running the example transforms the dataset and plots histograms of each input variable.<\/p>\n<p>We can see that the shape of the histograms for each variable looks more Gaussian than the raw data.<\/p>\n<div id=\"attachment_10322\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10322\" class=\"size-full wp-image-10322\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Box-Cox-Transformed-Input-Variables-for-the-Sonar-Dataset.png\" alt=\"Histogram Plots of Box-Cox Transformed Input Variables for the Sonar Dataset\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Box-Cox-Transformed-Input-Variables-for-the-Sonar-Dataset.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Box-Cox-Transformed-Input-Variables-for-the-Sonar-Dataset-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Box-Cox-Transformed-Input-Variables-for-the-Sonar-Dataset-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Box-Cox-Transformed-Input-Variables-for-the-Sonar-Dataset-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10322\" class=\"wp-caption-text\">Histogram Plots of Box-Cox Transformed Input Variables for the Sonar Dataset<\/p>\n<\/div>\n<p>Next, let&rsquo;s evaluate the same KNN model as the previous section, but in this case on a Box-Cox transform of the scaled dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate 
knn on the box-cox sonar dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.pipeline import Pipeline\r\nfrom matplotlib import pyplot\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\ndata = dataset.values\r\n# separate into input and output columns\r\nX, y = data[:, :-1], data[:, -1]\r\n# ensure inputs are floats and output is an integer label\r\nX = X.astype('float32')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define the pipeline\r\nscaler = MinMaxScaler(feature_range=(1, 2))\r\npower = PowerTransformer(method='box-cox')\r\nmodel = KNeighborsClassifier()\r\npipeline = Pipeline(steps=[('s', scaler),('p', power), ('m', model)])\r\n# evaluate the pipeline\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report pipeline performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example, we can see that the Box-Cox transform results in a lift in performance from 79.7 percent accuracy without the transform to about 81.1 percent with the transform.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.811 (0.085)<\/pre>\n<p>Next, let&rsquo;s take a closer look at the Yeo-Johnson transform.<\/p>\n<h2>Yeo-Johnson Transform<\/h2>\n<p>The Yeo-Johnson transform is also named for the authors.<\/p>\n<p>Unlike the Box-Cox transform, it does not require the values for each input variable to be strictly positive. 
It supports zero values and negative values. This means we can apply it to our dataset without scaling it first.<\/p>\n<p>We can apply the transform by defining a <em>PowerTransformer<\/em> object and setting the &ldquo;<em>method<\/em>&rdquo; argument to &ldquo;<em>yeo-johnson<\/em>&rdquo; (the default).<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# perform a yeo-johnson transform of the dataset\r\npt = PowerTransformer(method='yeo-johnson')\r\ndata = pt.fit_transform(data)<\/pre>\n<p>The example below applies the Yeo-Johnson transform and creates histogram plots of each of the transformed variables.<\/p>\n<pre class=\"crayon-plain-tag\"># visualize a yeo-johnson transform of the sonar dataset\r\nfrom pandas import read_csv\r\nfrom pandas import DataFrame\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom matplotlib import pyplot\r\n# Load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\n# retrieve just the numeric input values\r\ndata = dataset.values[:, :-1]\r\n# perform a yeo-johnson transform of the dataset\r\npt = PowerTransformer(method='yeo-johnson')\r\ndata = pt.fit_transform(data)\r\n# convert the array back to a dataframe\r\ndataset = DataFrame(data)\r\n# histograms of the variables\r\ndataset.hist()\r\npyplot.show()<\/pre>\n<p>Running the example transforms the dataset and plots histograms of each input variable.<\/p>\n<p>We can see that the shape of the histograms for each variable looks more Gaussian than the raw data, much like the Box-Cox transform.<\/p>\n<div id=\"attachment_10323\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10323\" class=\"size-full wp-image-10323\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Yeo-Johnson-Transformed-Input-Variables-for-the-Sonar-Dataset.png\" alt=\"Histogram Plots of Yeo-Johnson Transformed Input Variables for the Sonar Dataset\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Yeo-Johnson-Transformed-Input-Variables-for-the-Sonar-Dataset.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Yeo-Johnson-Transformed-Input-Variables-for-the-Sonar-Dataset-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Yeo-Johnson-Transformed-Input-Variables-for-the-Sonar-Dataset-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/02\/Histogram-Plots-of-Yeo-Johnson-Transformed-Input-Variables-for-the-Sonar-Dataset-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10323\" class=\"wp-caption-text\">Histogram Plots of Yeo-Johnson Transformed Input Variables for the Sonar Dataset<\/p>\n<\/div>\n<p>Next, let&rsquo;s evaluate the same KNN model as in the previous section, but in this case on a Yeo-Johnson transform of the raw dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate knn on the yeo-johnson sonar dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom 
sklearn.pipeline import Pipeline\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\ndata = dataset.values\r\n# separate into input and output columns\r\nX, y = data[:, :-1], data[:, -1]\r\n# ensure inputs are floats and output is an integer label\r\nX = X.astype('float32')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define the pipeline\r\npower = PowerTransformer(method='yeo-johnson')\r\nmodel = KNeighborsClassifier()\r\npipeline = Pipeline(steps=[('p', power), ('m', model)])\r\n# evaluate the pipeline\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report pipeline performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example, we can see that the Yeo-Johnson transform lifts accuracy from about 79.7 percent without the transform to about 80.8 percent with it, slightly below the roughly 81.1 percent achieved with the Box-Cox transform.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.808 (0.082)<\/pre>\n<p>Sometimes a lift in performance can be achieved by standardizing the raw dataset before performing a Yeo-Johnson transform.<\/p>\n<p>We can explore this by adding a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.StandardScaler.html\">StandardScaler<\/a> as a first step in the pipeline.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate knn on the yeo-johnson standardized sonar dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\nfrom sklearn.preprocessing 
import LabelEncoder\r\nfrom sklearn.preprocessing import PowerTransformer\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.pipeline import Pipeline\r\n# load dataset\r\nurl = \"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\"\r\ndataset = read_csv(url, header=None)\r\ndata = dataset.values\r\n# separate into input and output columns\r\nX, y = data[:, :-1], data[:, -1]\r\n# ensure inputs are floats and output is an integer label\r\nX = X.astype('float32')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define the pipeline\r\nscaler = StandardScaler()\r\npower = PowerTransformer(method='yeo-johnson')\r\nmodel = KNeighborsClassifier()\r\npipeline = Pipeline(steps=[('s', scaler), ('p', power), ('m', model)])\r\n# evaluate the pipeline\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\nn_scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')\r\n# report pipeline performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))<\/pre>\n<p>Running the example, we can see that standardizing the data prior to the Yeo-Johnson transform gives a small lift in accuracy from about 80.8 percent to about 81.6 percent, slightly better than the result for the Box-Cox transform.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.816 (0.077)<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/continuous-probability-distributions-for-machine-learning\/\">Continuous Probability Distributions for Machine Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/power-transform-time-series-forecast-data-python\/\">How to Use Power Transforms for Time Series Forecast Data with Python<\/a><\/li>\n<li><a 
href=\"https:\/\/machinelearningmastery.com\/how-to-transform-target-variables-for-regression-with-scikit-learn\/\">How to Transform Target Variables for Regression With Scikit-Learn<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-transform-data-to-fit-the-normal-distribution\/\">How to Transform Data to Better Fit The Normal Distribution<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/machine-learning-data-transforms-for-time-series-forecasting\/\">4 Common Machine Learning Data Transforms for Time Series Forecasting<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2zZOQXN\">Feature Engineering for Machine Learning<\/a>, 2018.<\/li>\n<li><a href=\"https:\/\/amzn.to\/3b2LHTL\">Applied Predictive Modeling<\/a>, 2013.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2Yvcupn\">Feature Engineering and Selection<\/a>, 2019.<\/li>\n<\/ul>\n<h3>Dataset<\/h3>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.csv\">Sonar Dataset<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/sonar.names\">Sonar Dataset Description<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/preprocessing.html#preprocessing-transformer\">Non-linear transformation, scikit-learn Guide<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.PowerTransformer.html\">sklearn.preprocessing.PowerTransformer API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Power_transform\">Power transform, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use power transforms in scikit-learn to make variables more Gaussian for modeling.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Many machine learning algorithms prefer or perform better when numerical variables have a Gaussian 
probability distribution.<\/li>\n<li>Power transforms are a technique for transforming numerical input or output variables to have a Gaussian or more-Gaussian-like probability distribution.<\/li>\n<li>How to use the PowerTransformer class in scikit-learn to apply the Box-Cox and Yeo-Johnson transforms when preparing data for predictive modeling.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/power-transforms-with-scikit-learn\/\">How to Use Power Transforms for Machine Learning<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/power-transforms-with-scikit-learn\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Machine learning algorithms like Linear Regression and Gaussian Naive Bayes assume the numerical variables have a Gaussian probability distribution. 
Your data may [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/17\/how-to-use-power-transforms-for-machine-learning\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3464,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3463"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3463"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3463\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3464"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3463"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3463"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3463"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}