{"id":3632,"date":"2020-07-05T19:00:24","date_gmt":"2020-07-05T19:00:24","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/07\/05\/how-to-use-feature-extraction-on-tabular-data-for-machine-learning\/"},"modified":"2020-07-05T19:00:24","modified_gmt":"2020-07-05T19:00:24","slug":"how-to-use-feature-extraction-on-tabular-data-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/07\/05\/how-to-use-feature-extraction-on-tabular-data-for-machine-learning\/","title":{"rendered":"How to Use Feature Extraction on Tabular Data for Machine Learning"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling.<\/p>\n<p>The most common approach to data preparation is to study a dataset and review the expectations of a machine learning algorithm, then carefully choose the most appropriate data preparation techniques to transform the raw data to best meet the expectations of the algorithm. This is slow, expensive, and requires a vast amount of expertise.<\/p>\n<p>An alternative approach to data preparation is to apply a suite of common and commonly useful data preparation techniques to the raw data in parallel and combine the results of all of the transforms together into a single large dataset from which a model can be fit and evaluated.<\/p>\n<p>This is an alternative philosophy for data preparation that treats data transforms as an approach to extract salient features from raw data to expose the structure of the problem to the learning algorithms. 
It requires learning algorithms that scale to a large number of input features and are capable of weighting them, so that the model uses the features most relevant to the target being predicted.<\/p>\n<p>This approach requires less expertise, is computationally efficient compared to a full grid search of data preparation methods, and can aid in the discovery of unintuitive data preparation solutions that achieve good or best performance for a given predictive modeling problem.<\/p>\n<p>In this tutorial, you will discover how to use feature extraction for data preparation with tabular data.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Feature extraction provides an alternate approach to data preparation for tabular data, where all data transforms are applied in parallel to raw input data and combined together to create one large dataset.<\/li>\n<li>How to use the feature extraction method for data preparation to improve model performance over a baseline for a standard classification dataset.<\/li>\n<li>How to add feature selection to the feature extraction modeling pipeline to give a further lift in modeling performance on a standard dataset.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10984\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10984\" class=\"size-full wp-image-10984\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/07\/How-to-Use-Feature-Extraction-on-Tabular-Data-for-Data-Preparation.jpg\" alt=\"How to Use Feature Extraction on Tabular Data for Data Preparation\" width=\"800\" height=\"429\" 
srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/How-to-Use-Feature-Extraction-on-Tabular-Data-for-Data-Preparation.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/How-to-Use-Feature-Extraction-on-Tabular-Data-for-Data-Preparation-300x161.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/07\/How-to-Use-Feature-Extraction-on-Tabular-Data-for-Data-Preparation-768x412.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-10984\" class=\"wp-caption-text\">How to Use Feature Extraction on Tabular Data for Data Preparation<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/vonfer\/42261101585\/\">Nicolas Valdes<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Feature Extraction Technique for Data Preparation<\/li>\n<li>Dataset and Performance Baseline\n<ol>\n<li>Wine Classification Dataset<\/li>\n<li>Baseline Model Performance<\/li>\n<\/ol>\n<\/li>\n<li>Feature Extraction Approach to Data Preparation<\/li>\n<\/ol>\n<h2>Feature Extraction Technique for Data Preparation<\/h2>\n<p>Data preparation can be challenging.<\/p>\n<p>The approach that is most often prescribed and followed is to analyze the dataset, review the requirements of the algorithms, and transform the raw data to best meet the expectations of the algorithms.<\/p>\n<p>This can be effective, but is also slow and can require deep expertise both with data analysis and machine learning algorithms.<\/p>\n<p>An alternative approach is to treat the preparation of input variables as a hyperparameter of the modeling pipeline and to tune it along with the choice of algorithm and algorithm configuration.<\/p>\n<p>This too can be an effective approach exposing unintuitive solutions and requiring very little expertise, although it can 
be computationally expensive.<\/p>\n<p>An approach that seeks a middle ground between these two approaches to data preparation is to treat the transformation of input data as a <strong>feature engineering<\/strong> or <strong>feature extraction<\/strong> procedure. This involves applying a suite of common or commonly useful data preparation techniques to the raw data, then aggregating all features together to create one large dataset, then fitting and evaluating a model on this data.<\/p>\n<p>The philosophy of the approach treats each data preparation technique as a transform that extracts salient features from raw data to be presented to the learning algorithm. Ideally, such transforms untangle complex relationships and compound input variables, in turn allowing the use of simpler modeling algorithms, such as linear machine learning techniques.<\/p>\n<p>For lack of a better name, we will refer to this as the &ldquo;<strong>Feature Engineering Method<\/strong>&rdquo; or the &ldquo;<strong>Feature Extraction Method<\/strong>&rdquo; for configuring data preparation for a predictive modeling project.<\/p>\n<p>It allows data analysis and algorithm expertise to be used in the selection of data preparation methods and allows unintuitive solutions to be found but at a much lower computational cost.<\/p>\n<p>The explosion in the number of input features can also be explicitly addressed through the use of feature selection techniques that attempt to rank order the importance or value of the vast number of extracted features and only select a small subset of the most relevant to predicting the target variable.<\/p>\n<p>We can explore this approach to data preparation with a worked example.<\/p>\n<p>Before we dive into a worked example, let&rsquo;s first select a standard dataset and develop a baseline in performance.<\/p>\n<h2>Dataset and Performance Baseline<\/h2>\n<p>In this section, we will first select a standard machine learning dataset and establish a baseline in performance on this dataset. 
This will provide the context for exploring the feature extraction method of data preparation in the next section.<\/p>\n<h3>Wine Classification Dataset<\/h3>\n<p>We will use the wine classification dataset.<\/p>\n<p>This dataset has 13 input variables that describe the chemical composition of samples of wine and requires that the wine be classified as one of three types.<\/p>\n<p>You can learn more about the dataset here:<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/wine.csv\">Wine Dataset (wine.csv)<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/wine.names\">Wine Dataset Description (wine.names)<\/a><\/li>\n<\/ul>\n<p>No need to download the dataset as we will download it automatically as part of our worked examples.<\/p>\n<p>Open the dataset and review the raw data. The first few rows of data are listed below.<\/p>\n<p>We can see that it is a <a href=\"https:\/\/machinelearningmastery.com\/types-of-classification-in-machine-learning\/\">multi-class classification<\/a> predictive modeling problem with numerical input variables, each of which has different scales.<\/p>\n<pre class=\"crayon-plain-tag\">14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065,1\r\n13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050,1\r\n13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185,1\r\n14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480,1\r\n13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735,1\r\n...<\/pre>\n<p>The example loads the dataset and splits it into the input and output columns, then summarizes the data arrays.<\/p>\n<pre class=\"crayon-plain-tag\"># example of loading and summarizing the wine dataset\r\nfrom pandas import read_csv\r\n# define the location of the dataset\r\nurl = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/wine.csv'\r\n# load the dataset as a data frame\r\ndf = read_csv(url, header=None)\r\n# 
retrieve the numpy array\r\ndata = df.values\r\n# split the columns into input and output variables\r\nX, y = data[:, :-1], data[:, -1]\r\n# summarize the shape of the loaded data\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example, we can see that the dataset was loaded correctly and that there are 178 rows of data with 13 input variables and a single target variable.<\/p>\n<pre class=\"crayon-plain-tag\">(178, 13) (178,)<\/pre>\n<p>Next, let&rsquo;s evaluate a model on this dataset and establish a baseline in performance.<\/p>\n<h3>Baseline Model Performance<\/h3>\n<p>We can establish a baseline in performance on the wine classification task by evaluating a model on the raw input data.<\/p>\n<p>In this case, we will evaluate a logistic regression model.<\/p>\n<p>First, we can perform minimal data preparation by ensuring the input variables are numeric and that the target variable is label encoded, as expected by the scikit-learn library.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# minimally prepare dataset\r\nX = X.astype('float')\r\ny = LabelEncoder().fit_transform(y.astype('str'))<\/pre>\n<p>Next, we can define our predictive model.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the model\r\nmodel = LogisticRegression(solver='liblinear')<\/pre>\n<p>We will evaluate the model using the gold standard of <a href=\"https:\/\/machinelearningmastery.com\/k-fold-cross-validation\/\">repeated stratified k-fold cross-validation<\/a> with 10 folds and three repeats.<\/p>\n<p>Model performance will be evaluated using classification accuracy.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nmodel = LogisticRegression(solver='liblinear')\r\n# define the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)<\/pre>\n<p>At the end of the run, we will report the mean and standard deviation of the accuracy scores collected 
across all repeats and evaluation folds.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Tying this together, the complete example of evaluating a logistic regression model on the raw wine classification dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># baseline model performance on the wine dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\n# load the dataset\r\nurl = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/wine.csv'\r\ndf = read_csv(url, header=None)\r\ndata = df.values\r\nX, y = data[:, :-1], data[:, -1]\r\n# minimally prepare dataset\r\nX = X.astype('float')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# define the model\r\nmodel = LogisticRegression(solver='liblinear')\r\n# define the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the model performance and reports the mean and standard deviation classification accuracy.<\/p>\n<p>Your results may vary given the stochastic nature of the learning algorithm, the evaluation procedure, and differences in precision across machines. 
Try running the example a few times.<\/p>\n<p>In this case, we can see that the logistic regression model fit on the raw input data achieved a mean classification accuracy of about 95.3 percent, providing a baseline in performance.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.953 (0.048)<\/pre>\n<p>Next, let&rsquo;s explore whether we can improve the performance using the feature extraction based approach to data preparation.<\/p>\n<h2>Feature Extraction Approach to Data Preparation<\/h2>\n<p>In this section, we will explore whether we can improve performance using the feature extraction approach to data preparation.<\/p>\n<p>The first step is to select a suite of common and commonly useful data preparation techniques.<\/p>\n<p>In this case, given that the input variables are numeric, we will use a range of transforms to change the scale of the input variables such as MinMaxScaler, StandardScaler, and <a href=\"https:\/\/machinelearningmastery.com\/robust-scaler-transforms-for-machine-learning\/\">RobustScaler<\/a>, as well as transforms for changing the distribution of the input variables such as <a href=\"https:\/\/machinelearningmastery.com\/quantile-transforms-for-machine-learning\/\">QuantileTransformer<\/a> and <a href=\"https:\/\/machinelearningmastery.com\/discretization-transforms-for-machine-learning\/\">KBinsDiscretizer<\/a>. Finally, we will also use transforms that remove linear dependencies between the input variables such as <a href=\"https:\/\/machinelearningmastery.com\/principal-components-analysis-for-dimensionality-reduction-in-python\/\">PCA<\/a> and <a href=\"https:\/\/machinelearningmastery.com\/singular-value-decomposition-for-dimensionality-reduction-in-python\/\">TruncatedSVD<\/a>.<\/p>\n<p>The <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.pipeline.FeatureUnion.html\">FeatureUnion class<\/a> can be used to define a list of transforms to perform, the results of which will be aggregated together, i.e. 
unioned. This will create a new dataset that has a vast number of columns.<\/p>\n<p>An estimate of the number of columns would be 13 input variables times five transforms or 65 plus the 14 columns output from the PCA and SVD dimensionality reduction methods, to give a total of about 79 features.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# transforms for the feature union\r\ntransforms = list()\r\ntransforms.append(('mms', MinMaxScaler()))\r\ntransforms.append(('ss', StandardScaler()))\r\ntransforms.append(('rs', RobustScaler()))\r\ntransforms.append(('qt', QuantileTransformer(n_quantiles=100, output_distribution='normal')))\r\ntransforms.append(('kbd', KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='uniform')))\r\ntransforms.append(('pca', PCA(n_components=7)))\r\ntransforms.append(('svd', TruncatedSVD(n_components=7)))\r\n# create the feature union\r\nfu = FeatureUnion(transforms)<\/pre>\n<p>We can then create a modeling <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.pipeline.Pipeline.html\">Pipeline<\/a> with the FeatureUnion as the first step and the logistic regression model as the final step.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the model\r\nmodel = LogisticRegression(solver='liblinear')\r\n# define the pipeline\r\nsteps = list()\r\nsteps.append(('fu', fu))\r\nsteps.append(('m', model))\r\npipeline = Pipeline(steps=steps)<\/pre>\n<p>The pipeline can then be evaluated using repeated stratified k-fold cross-validation as before.<\/p>\n<p>Tying this together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># data preparation as feature engineering for wine dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.pipeline 
import FeatureUnion\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import RobustScaler\r\nfrom sklearn.preprocessing import QuantileTransformer\r\nfrom sklearn.preprocessing import KBinsDiscretizer\r\nfrom sklearn.decomposition import PCA\r\nfrom sklearn.decomposition import TruncatedSVD\r\n# load the dataset\r\nurl = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/wine.csv'\r\ndf = read_csv(url, header=None)\r\ndata = df.values\r\nX, y = data[:, :-1], data[:, -1]\r\n# minimally prepare dataset\r\nX = X.astype('float')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# transforms for the feature union\r\ntransforms = list()\r\ntransforms.append(('mms', MinMaxScaler()))\r\ntransforms.append(('ss', StandardScaler()))\r\ntransforms.append(('rs', RobustScaler()))\r\ntransforms.append(('qt', QuantileTransformer(n_quantiles=100, output_distribution='normal')))\r\ntransforms.append(('kbd', KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='uniform')))\r\ntransforms.append(('pca', PCA(n_components=7)))\r\ntransforms.append(('svd', TruncatedSVD(n_components=7)))\r\n# create the feature union\r\nfu = FeatureUnion(transforms)\r\n# define the model\r\nmodel = LogisticRegression(solver='liblinear')\r\n# define the pipeline\r\nsteps = list()\r\nsteps.append(('fu', fu))\r\nsteps.append(('m', model))\r\npipeline = Pipeline(steps=steps)\r\n# define the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the model performance and reports the mean and standard deviation classification accuracy.<\/p>\n<p>Your results may vary given the stochastic 
nature of the learning algorithm, the evaluation procedure, and differences in precision across machines. Try running the example a few times.<\/p>\n<p>In this case, we can see a lift in performance over the baseline, achieving a mean classification accuracy of about 96.8 percent as compared to 95.3 percent in the previous section.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.968 (0.037)<\/pre>\n<p>Try adding more data preparation methods to the FeatureUnion to see if you can improve the performance.<\/p>\n<p><strong>Can you get better results?<\/strong><br \/>\nLet me know what you discover in the comments below.<\/p>\n<p>We can also use feature selection to reduce the approximately 80 extracted features down to a subset of those that are most relevant to the model. In addition to reducing the complexity of the model, it can also result in a lift in performance by removing irrelevant and redundant input features.<\/p>\n<p>In this case, we will use the <a href=\"https:\/\/machinelearningmastery.com\/rfe-feature-selection-in-python\/\">Recursive Feature Elimination<\/a>, or RFE, technique for feature selection and configure it to select the 15 most relevant features.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the feature selection\r\nrfe = RFE(estimator=LogisticRegression(solver='liblinear'), n_features_to_select=15)<\/pre>\n<p>We can then add the RFE feature selection to the modeling pipeline after the <em>FeatureUnion<\/em> and before the <em>LogisticRegression<\/em> algorithm.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the pipeline\r\nsteps = list()\r\nsteps.append(('fu', fu))\r\nsteps.append(('rfe', rfe))\r\nsteps.append(('m', model))\r\npipeline = Pipeline(steps=steps)<\/pre>\n<p>Tying this together, the complete example of the feature extraction data preparation method with feature selection is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># data preparation as feature engineering with feature selection for wine 
dataset\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.pipeline import FeatureUnion\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom sklearn.preprocessing import MinMaxScaler\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.preprocessing import RobustScaler\r\nfrom sklearn.preprocessing import QuantileTransformer\r\nfrom sklearn.preprocessing import KBinsDiscretizer\r\nfrom sklearn.feature_selection import RFE\r\nfrom sklearn.decomposition import PCA\r\nfrom sklearn.decomposition import TruncatedSVD\r\n# load the dataset\r\nurl = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/wine.csv'\r\ndf = read_csv(url, header=None)\r\ndata = df.values\r\nX, y = data[:, :-1], data[:, -1]\r\n# minimally prepare dataset\r\nX = X.astype('float')\r\ny = LabelEncoder().fit_transform(y.astype('str'))\r\n# transforms for the feature union\r\ntransforms = list()\r\ntransforms.append(('mms', MinMaxScaler()))\r\ntransforms.append(('ss', StandardScaler()))\r\ntransforms.append(('rs', RobustScaler()))\r\ntransforms.append(('qt', QuantileTransformer(n_quantiles=100, output_distribution='normal')))\r\ntransforms.append(('kbd', KBinsDiscretizer(n_bins=10, encode='ordinal', strategy='uniform')))\r\ntransforms.append(('pca', PCA(n_components=7)))\r\ntransforms.append(('svd', TruncatedSVD(n_components=7)))\r\n# create the feature union\r\nfu = FeatureUnion(transforms)\r\n# define the feature selection\r\nrfe = RFE(estimator=LogisticRegression(solver='liblinear'), n_features_to_select=15)\r\n# define the model\r\nmodel = LogisticRegression(solver='liblinear')\r\n# define the pipeline\r\nsteps = list()\r\nsteps.append(('fu', fu))\r\nsteps.append(('rfe', rfe))\r\nsteps.append(('m', 
model))\r\npipeline = Pipeline(steps=steps)\r\n# define the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# evaluate model\r\nscores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1)\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the model performance and reports the mean and standard deviation classification accuracy.<\/p>\n<p>Your results may vary given the stochastic nature of the learning algorithm, the evaluation procedure, and differences in precision across machines. Try running the example a few times.<\/p>\n<p>Here, we can see a further lift in performance from 96.8 percent with all extracted features to about 98.9 percent with feature selection used prior to modeling.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.989 (0.022)<\/pre>\n<p><strong>Can you achieve better performance with a different feature selection technique or with more or fewer selected features?<\/strong><br \/>\nLet me know what you discover in the comments below.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Related Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/results-for-standard-classification-and-regression-machine-learning-datasets\/\">Results for Standard Classification and Regression Machine Learning Datasets<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/3aydNGf\">Feature Engineering and Selection<\/a>, 2019.<\/li>\n<li><a href=\"https:\/\/amzn.to\/2XZJNR2\">Feature Engineering for Machine Learning<\/a>, 2018.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.pipeline.Pipeline.html\">sklearn.pipeline.Pipeline API<\/a>.<\/li>\n<li><a 
href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.pipeline.FeatureUnion.html\">sklearn.pipeline.FeatureUnion API<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use feature extraction for data preparation with tabular data.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Feature extraction provides an alternate approach to data preparation for tabular data, where all data transforms are applied in parallel to raw input data and combined together to create one large dataset.<\/li>\n<li>How to use the feature extraction method for data preparation to improve model performance over a baseline for a standard classification dataset.<\/li>\n<li>How to add feature selection to the feature extraction modeling pipeline to give a further lift in modeling performance on a standard dataset.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/feature-extraction-on-tabular-data\/\">How to Use Feature Extraction on Tabular Data for Machine Learning<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/feature-extraction-on-tabular-data\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/07\/05\/how-to-use-feature-extraction-on-tabular-data-for-machine-learning\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3633,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3632"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3632"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3632\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3633"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3632"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3632"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3632"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}