{"id":3538,"date":"2020-06-07T19:00:25","date_gmt":"2020-06-07T19:00:25","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/06\/07\/how-to-perform-feature-selection-for-regression-data\/"},"modified":"2020-06-07T19:00:25","modified_gmt":"2020-06-07T19:00:25","slug":"how-to-perform-feature-selection-for-regression-data","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/06\/07\/how-to-perform-feature-selection-for-regression-data\/","title":{"rendered":"How to Perform Feature Selection for Regression Data"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><strong>Feature selection<\/strong> is the process of identifying and selecting a subset of input variables that are most relevant to the target variable.<\/p>\n<p>Perhaps the simplest case of feature selection is the case where there are numerical input variables and a numerical target for regression predictive modeling. This is because the strength of the relationship between each input variable and the target can be calculated, called correlation, and compared relative to each other.<\/p>\n<p>In this tutorial, you will discover how to perform feature selection with numerical input data for regression predictive modeling.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to evaluate the importance of numerical input data using the correlation and mutual information statistics.<\/li>\n<li>How to perform feature selection for numerical input data when fitting and evaluating a regression model.<\/li>\n<li>How to tune the number of features selected in a modeling pipeline using a grid search.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10822\" style=\"width: 809px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10822\" class=\"size-full wp-image-10822\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/08\/How-to-Perform-Feature-Selection-for-Regression-Data.jpg\" alt=\"How to Perform Feature Selection for Regression Data\" width=\"799\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/How-to-Perform-Feature-Selection-for-Regression-Data.jpg 799w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/How-to-Perform-Feature-Selection-for-Regression-Data-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/How-to-Perform-Feature-Selection-for-Regression-Data-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-10822\" class=\"wp-caption-text\">How to Perform Feature Selection for Regression Data<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/archer10\/44973439431\/\">Dennis Jarvis<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into four parts; they are:<\/p>\n<ol>\n<li>Regression Dataset<\/li>\n<li>Numerical Feature Selection\n<ol>\n<li>Correlation Feature Selection<\/li>\n<li>Mutual Information Feature Selection<\/li>\n<\/ol>\n<\/li>\n<li>Modeling With Selected Features\n<ol>\n<li>Model Built Using All Features<\/li>\n<li>Model Built Using Correlation Features<\/li>\n<li>Model Built Using Mutual Information Features<\/li>\n<\/ol>\n<\/li>\n<li>Tune the Number of Selected Features<\/li>\n<\/ol>\n<h2>Regression Dataset<\/h2>\n<p>We will use a synthetic regression dataset as the basis of this tutorial.<\/p>\n<p>Recall that a regression problem is a problem in which we want to predict a numerical value. 
In this case, we require a dataset that also has numerical input variables.<\/p>\n<p>The <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">make_regression() function<\/a> from the scikit-learn library can be used to define a dataset. It provides control over the number of samples, number of input features, and, importantly, the number of relevant and redundant input features. This is critical as we specifically desire a dataset that we know has some redundant input features.<\/p>\n<p>In this case, we will define a dataset with 1,000 samples, each with 100 input features where 10 are informative and the remaining 90 are redundant.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# generate regression dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)<\/pre>\n<p>The hope is that feature selection techniques can identify some or all of those features that are relevant to the target, or, at the very least, identify and remove some of the redundant input features.<\/p>\n<p>Once defined, we can split the data into training and test sets so we can fit and evaluate a learning model.<\/p>\n<p>We will use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.train_test_split.html\">train_test_split() function<\/a> from scikit-learn and use 67 percent of the data for training and 33 percent for testing.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)<\/pre>\n<p>Tying these elements together, the complete example of defining, splitting, and summarizing the raw regression dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># load and summarize the dataset\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\n# generate regression dataset\r\nX, y 
= make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# summarize\r\nprint('Train', X_train.shape, y_train.shape)\r\nprint('Test', X_test.shape, y_test.shape)<\/pre>\n<p>Running the example reports the size of the input and output elements of the train and test sets.<\/p>\n<p>We can see that we have 670 examples for training and 330 for testing.<\/p>\n<pre class=\"crayon-plain-tag\">Train (670, 100) (670,)\r\nTest (330, 100) (330,)<\/pre>\n<p>Now that we have loaded and prepared the dataset, we can explore feature selection.<\/p>\n<h2>Numerical Feature Selection<\/h2>\n<p>There are two popular feature selection techniques that can be used for numerical input data and a numerical target variable.<\/p>\n<p>They are:<\/p>\n<ol>\n<li>Correlation Statistics.<\/li>\n<li>Mutual Information Statistics.<\/li>\n<\/ol>\n<p>Let&rsquo;s take a closer look at each in turn.<\/p>\n<h3>Correlation Feature Selection<\/h3>\n<p>Correlation is a measure of how two variables change together. 
Perhaps the most common correlation measure is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pearson_correlation_coefficient\">Pearson&rsquo;s correlation<\/a>, which assumes a Gaussian distribution for each variable and reports on their linear relationship.<\/p>\n<blockquote>\n<p>For numeric predictors, the classic approach to quantifying each relationship with the outcome uses the sample correlation statistic.<\/p>\n<\/blockquote>\n<p>&mdash; Page 464, <a href=\"https:\/\/amzn.to\/3b2LHTL\">Applied Predictive Modeling<\/a>, 2013.<\/p>\n<p>For more on linear or parametric correlation, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-use-correlation-to-understand-the-relationship-between-variables\/\">How to Calculate Correlation Between Variables in Python<\/a><\/li>\n<\/ul>\n<p>Linear correlation scores are typically a value between -1 and 1, with 0 representing no relationship. For feature selection, we are often interested in a positive score: the larger the positive value, the stronger the relationship, and the more likely the feature should be selected for modeling. As such, the linear correlation can be converted into a correlation statistic with only positive values.<\/p>\n<p>The scikit-learn machine learning library provides an implementation of the correlation statistic in the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_selection.f_regression.html\">f_regression() function<\/a>. 
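To build intuition for what this statistic measures, the sketch below shows how a signed Pearson correlation can be turned into the kind of positive-valued ranking score described above. The synthetic data and the helper function name are illustrative assumptions, not the library's internals.

```python
# Sketch: convert Pearson's correlation into a positive score for
# ranking features. Data and the f_score() helper are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x_related = rng.normal(size=n)
y = 3.0 * x_related + rng.normal(scale=0.5, size=n)  # strong linear link
x_noise = rng.normal(size=n)                         # no relationship

def f_score(feature, target):
    # Pearson's r can be negative; squaring discards the sign
    r = np.corrcoef(feature, target)[0, 1]
    # convert to an F-statistic with n - 2 degrees of freedom
    return (r ** 2 / (1.0 - r ** 2)) * (len(feature) - 2)

print('related feature :', f_score(x_related, y))  # very large
print('noise feature   :', f_score(x_noise, y))    # near zero
```

A related feature yields a score orders of magnitude larger than an unrelated one, which is exactly the separation a selection strategy can rank on.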
This function can be used in a feature selection strategy, such as selecting the top k most relevant features (largest values) via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_selection.SelectKBest.html\">SelectKBest class<\/a>.<\/p>\n<p>For example, we can define the <em>SelectKBest<\/em> class to use the <em>f_regression()<\/em> function and select all features, then transform the train and test sets.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# configure to select all features\r\nfs = SelectKBest(score_func=f_regression, k='all')\r\n# learn relationship from training data\r\nfs.fit(X_train, y_train)\r\n# transform train input data\r\nX_train_fs = fs.transform(X_train)\r\n# transform test input data\r\nX_test_fs = fs.transform(X_test)\r\nreturn X_train_fs, X_test_fs, fs<\/pre>\n<p>We can then print the scores for each variable (largest is better) and plot the scores for each variable as a bar graph to get an idea of how many features we should select.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# what are scores for the features\r\nfor i in range(len(fs.scores_)):\r\n\tprint('Feature %d: %f' % (i, fs.scores_[i]))\r\n# plot the scores\r\npyplot.bar([i for i in range(len(fs.scores_))], fs.scores_)\r\npyplot.show()<\/pre>\n<p>Tying this together with the data preparation for the dataset in the previous section, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of correlation feature selection for numerical data\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import f_regression\r\nfrom matplotlib import pyplot\r\n\r\n# feature selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select all features\r\n\tfs = SelectKBest(score_func=f_regression, k='all')\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, 
y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs\r\n\r\n# load the dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# feature selection\r\nX_train_fs, X_test_fs, fs = select_features(X_train, y_train, X_test)\r\n# what are scores for the features\r\nfor i in range(len(fs.scores_)):\r\n\tprint('Feature %d: %f' % (i, fs.scores_[i]))\r\n# plot the scores\r\npyplot.bar([i for i in range(len(fs.scores_))], fs.scores_)\r\npyplot.show()<\/pre>\n<p>Running the example first prints the scores calculated for each input feature and the target variable.<\/p>\n<p>Note that your specific results may vary. Try running the example a few times.<\/p>\n<p>We will not list the scores for all 100 input variables as it will take up too much space. Nevertheless, we can see that some variables have larger scores than others, e.g. less than 1 vs. 
5, and others have much larger scores, such as Feature 9 with a score of about 101.<\/p>\n<pre class=\"crayon-plain-tag\">Feature 0: 0.009419\r\nFeature 1: 1.018881\r\nFeature 2: 1.205187\r\nFeature 3: 0.000138\r\nFeature 4: 0.167511\r\nFeature 5: 5.985083\r\nFeature 6: 0.062405\r\nFeature 7: 1.455257\r\nFeature 8: 0.420384\r\nFeature 9: 101.392225\r\n...<\/pre>\n<p>A bar chart of the feature importance scores for each input feature is created.<\/p>\n<p>The plot clearly shows that 8 to 10 features are far more important than the other features.<\/p>\n<p>We could set <em>k=10<\/em> when configuring the <em>SelectKBest<\/em> to select these top features.<\/p>\n<div id=\"attachment_10819\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10819\" class=\"size-full wp-image-10819\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-Correlation-Feature-Importance-y.png\" alt=\"Bar Chart of the Input Features (x) vs. 
Correlation Feature Importance (y)\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-Correlation-Feature-Importance-y.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-Correlation-Feature-Importance-y-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-Correlation-Feature-Importance-y-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-Correlation-Feature-Importance-y-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10819\" class=\"wp-caption-text\">Bar Chart of the Input Features (x) vs. Correlation Feature Importance (y)<\/p>\n<\/div>\n<h3>Mutual Information Feature Selection<\/h3>\n<p>Mutual information from the field of <a href=\"https:\/\/machinelearningmastery.com\/what-is-information-entropy\/\">information theory<\/a> is the application of information gain (typically used in the construction of decision trees) to feature selection.<\/p>\n<p>Mutual information is calculated between two variables and measures the reduction in uncertainty for one variable given a known value of the other variable.<\/p>\n<p>You can learn more about mutual information in the following tutorial.<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/information-gain-and-mutual-information\">What Is Information Gain and Mutual Information for Machine Learning<\/a><\/li>\n<\/ul>\n<p>Mutual information is straightforward when considering the distribution of two discrete (categorical or ordinal) variables, such as categorical input and categorical output data. 
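As a concrete illustration of the discrete case, the toy example below computes mutual information directly from the observed joint distribution of two label sequences using scikit-learn's mutual_info_score; the labels themselves are made up for illustration.

```python
# Sketch: mutual information between two discrete variables, computed
# from their observed joint distribution. The toy labels are
# illustrative, not data from the tutorial.
from sklearn.metrics import mutual_info_score

x = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']
y_dependent = [0, 0, 1, 1, 0, 0, 1, 1]    # fully determined by x
y_independent = [0, 1, 0, 1, 0, 1, 0, 1]  # joint equals product of marginals

print(mutual_info_score(x, y_dependent))    # log(2) nats, about 0.693
print(mutual_info_score(x, y_independent))  # about 0: no shared information
```

Knowing x removes all uncertainty about y_dependent (one full bit, log(2) in nats) and none about y_independent.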
Nevertheless, it can be adapted for use with numerical input and output data.<\/p>\n<p>For technical details on how this can be achieved, see the 2014 paper titled &ldquo;<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3929353\/\">Mutual Information between Discrete and Continuous Data Sets<\/a>.&rdquo;<\/p>\n<p>The scikit-learn machine learning library provides an implementation of mutual information for feature selection with numeric input and output variables via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_selection.mutual_info_regression.html\">mutual_info_regression() function<\/a>.<\/p>\n<p>Like <em>f_regression()<\/em>, it can be used in the <em>SelectKBest<\/em> feature selection strategy (and other strategies).<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# configure to select all features\r\nfs = SelectKBest(score_func=mutual_info_regression, k='all')\r\n# learn relationship from training data\r\nfs.fit(X_train, y_train)\r\n# transform train input data\r\nX_train_fs = fs.transform(X_train)\r\n# transform test input data\r\nX_test_fs = fs.transform(X_test)\r\nreturn X_train_fs, X_test_fs, fs<\/pre>\n<p>We can perform feature selection using mutual information on the dataset and print and plot the scores (larger is better) as we did in the previous section.<\/p>\n<p>The complete example of using mutual information for numerical feature selection is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of mutual information feature selection for numerical input data\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import mutual_info_regression\r\nfrom matplotlib import pyplot\r\n\r\n# feature selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select all features\r\n\tfs = SelectKBest(score_func=mutual_info_regression, 
k='all')\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs\r\n\r\n# load the dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# feature selection\r\nX_train_fs, X_test_fs, fs = select_features(X_train, y_train, X_test)\r\n# what are scores for the features\r\nfor i in range(len(fs.scores_)):\r\n\tprint('Feature %d: %f' % (i, fs.scores_[i]))\r\n# plot the scores\r\npyplot.bar([i for i in range(len(fs.scores_))], fs.scores_)\r\npyplot.show()<\/pre>\n<p>Running the example first prints the scores calculated for each input feature and the target variable.<\/p>\n<p>Note that your specific results may vary. Try running the example a few times.<\/p>\n<p>Again, we will not list the scores for all 100 input variables. We can see that many features have a score of 0.0, although this technique identifies many more features that may be relevant to the target than the correlation statistic did.<\/p>\n<pre class=\"crayon-plain-tag\">Feature 0: 0.045484\r\nFeature 1: 0.000000\r\nFeature 2: 0.000000\r\nFeature 3: 0.000000\r\nFeature 4: 0.024816\r\nFeature 5: 0.000000\r\nFeature 6: 0.022659\r\nFeature 7: 0.000000\r\nFeature 8: 0.000000\r\nFeature 9: 0.074320\r\n...<\/pre>\n<p>A bar chart of the feature importance scores for each input feature is created.<\/p>\n<p>Compared to the correlation feature selection method, we can clearly see many more features scored as being relevant. 
This may be because of the statistical noise that we added to the dataset in its construction.<\/p>\n<div id=\"attachment_10820\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10820\" class=\"size-full wp-image-10820\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-the-Mutual-Information-Feature-Importance-y-1.png\" alt=\"Bar Chart of the Input Features (x) vs. the Mutual Information Feature Importance (y)\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-the-Mutual-Information-Feature-Importance-y-1.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-the-Mutual-Information-Feature-Importance-y-1-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-the-Mutual-Information-Feature-Importance-y-1-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/05\/Bar-Chart-of-the-Input-Features-x-vs-the-Mutual-Information-Feature-Importance-y-1-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10820\" class=\"wp-caption-text\">Bar Chart of the Input Features (x) vs. 
the Mutual Information Feature Importance (y)<\/p>\n<\/div>\n<p>Now that we know how to perform feature selection on numerical input data for a regression predictive modeling problem, we can try developing a model using the selected features and compare the results.<\/p>\n<h2>Modeling With Selected Features<\/h2>\n<p>There are many different techniques for scoring features and selecting features based on scores; how do you know which one to use?<\/p>\n<p>A robust approach is to evaluate models using different feature selection methods (and numbers of features) and select the method that results in a model with the best performance.<\/p>\n<p>In this section, we will evaluate a Linear Regression model with all features compared to a model built from features selected by correlation statistics and those features selected via mutual information.<\/p>\n<p>Linear regression is a good model for testing feature selection methods as it can perform better if irrelevant features are removed from the model.<\/p>\n<h3>Model Built Using All Features<\/h3>\n<p>As a first step, we will evaluate a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LinearRegression.html\">LinearRegression<\/a> model using all the available features.<\/p>\n<p>The model is fit on the training dataset and evaluated on the test dataset.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluation of a model using all input features\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.metrics import mean_absolute_error\r\n# load the dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# fit the model\r\nmodel = 
LinearRegression()\r\nmodel.fit(X_train, y_train)\r\n# evaluate the model\r\nyhat = model.predict(X_test)\r\n# evaluate predictions\r\nmae = mean_absolute_error(y_test, yhat)\r\nprint('MAE: %.3f' % mae)<\/pre>\n<p>Running the example prints the mean absolute error (MAE) of the model on the test dataset.<\/p>\n<p>Note that your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that the model achieves an error of about 0.086.<\/p>\n<p>We would prefer to use a subset of features that achieves an error that is as good as or better than this.<\/p>\n<pre class=\"crayon-plain-tag\">MAE: 0.086<\/pre>\n<h3>Model Built Using Correlation Features<\/h3>\n<p>We can use the correlation method to score the features and select the 10 most relevant ones.<\/p>\n<p>The <em>select_features()<\/em> function below is updated to achieve this.<\/p>\n<pre class=\"crayon-plain-tag\"># feature selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select a subset of features\r\n\tfs = SelectKBest(score_func=f_regression, k=10)\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs<\/pre>\n<p>The complete example of a linear regression model fit and evaluated on data prepared with this feature selection method is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluation of a model using 10 features chosen with correlation\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import f_regression\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.metrics import mean_absolute_error\r\n\r\n# feature 
selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select a subset of features\r\n\tfs = SelectKBest(score_func=f_regression, k=10)\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs\r\n\r\n# load the dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# feature selection\r\nX_train_fs, X_test_fs, fs = select_features(X_train, y_train, X_test)\r\n# fit the model\r\nmodel = LinearRegression()\r\nmodel.fit(X_train_fs, y_train)\r\n# evaluate the model\r\nyhat = model.predict(X_test_fs)\r\n# evaluate predictions\r\nmae = mean_absolute_error(y_test, yhat)\r\nprint('MAE: %.3f' % mae)<\/pre>\n<p>Running the example reports the performance of the model on just 10 of the 100 input features selected using the correlation statistic.<\/p>\n<p>Note that your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we see that the model achieved an error score of about 2.7, which is much larger than the baseline model that used all features and achieved an MAE of 0.086.<\/p>\n<p>This suggests that although the method has a strong idea of what features to select, building a model from these features alone does not result in a more skillful model. 
This could be because features that are important to the target are being left out, meaning that the method is being deceived about what is important.<\/p>\n<pre class=\"crayon-plain-tag\">MAE: 2.740<\/pre>\n<p>Let&rsquo;s go the other way and try to use the method to remove some redundant features rather than all redundant features.<\/p>\n<p>We can do this by setting the number of selected features to a much larger value, in this case, 88, hoping it can find and discard 12 of the 90 redundant features.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluation of a model using 88 features chosen with correlation\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import f_regression\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.metrics import mean_absolute_error\r\n\r\n# feature selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select a subset of features\r\n\tfs = SelectKBest(score_func=f_regression, k=88)\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs\r\n\r\n# load the dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# feature selection\r\nX_train_fs, X_test_fs, fs = select_features(X_train, y_train, X_test)\r\n# fit the model\r\nmodel = LinearRegression()\r\nmodel.fit(X_train_fs, y_train)\r\n# evaluate the model\r\nyhat = model.predict(X_test_fs)\r\n# evaluate predictions\r\nmae = mean_absolute_error(y_test, yhat)\r\nprint('MAE: %.3f' % 
mae)<\/pre>\n<p>Running the example reports the performance of the model on 88 of the 100 input features selected using the correlation statistic.<\/p>\n<p>Note that your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see that removing some of the redundant features has resulted in a small lift in performance with an error of about 0.085 compared to the baseline that achieved an error of about 0.086.<\/p>\n<pre class=\"crayon-plain-tag\">MAE: 0.085<\/pre>\n<\/p>\n<h3>Model Built Using Mutual Information Features<\/h3>\n<p>We can repeat the experiment and select the top 88 features using a mutual information statistic.<\/p>\n<p>The updated version of the <em>select_features()<\/em> function to achieve this is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># feature selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select a subset of features\r\n\tfs = SelectKBest(score_func=mutual_info_regression, k=88)\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs<\/pre>\n<p>The complete example of using mutual information for feature selection to fit a linear regression model is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluation of a model using 88 features chosen with mutual information\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import mutual_info_regression\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.metrics import mean_absolute_error\r\n\r\n# feature selection\r\ndef select_features(X_train, y_train, X_test):\r\n\t# configure to select a subset of 
features\r\n\tfs = SelectKBest(score_func=mutual_info_regression, k=88)\r\n\t# learn relationship from training data\r\n\tfs.fit(X_train, y_train)\r\n\t# transform train input data\r\n\tX_train_fs = fs.transform(X_train)\r\n\t# transform test input data\r\n\tX_test_fs = fs.transform(X_test)\r\n\treturn X_train_fs, X_test_fs, fs\r\n\r\n# load the dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# split into train and test sets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)\r\n# feature selection\r\nX_train_fs, X_test_fs, fs = select_features(X_train, y_train, X_test)\r\n# fit the model\r\nmodel = LinearRegression()\r\nmodel.fit(X_train_fs, y_train)\r\n# evaluate the model\r\nyhat = model.predict(X_test_fs)\r\n# evaluate predictions\r\nmae = mean_absolute_error(y_test, yhat)\r\nprint('MAE: %.3f' % mae)<\/pre>\n<p>Running the example fits the model on the 88 top selected features chosen using mutual information.<\/p>\n<p>Note: your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p>In this case, we can see a further reduction in error as compared to the correlation statistic, in this case, achieving a MAE of about 0.084 compared to 0.085 in the previous section.<\/p>\n<pre class=\"crayon-plain-tag\">MAE: 0.084<\/pre>\n<\/p>\n<h2>Tune the Number of Selected Features<\/h2>\n<p>In the previous example, we selected 88 features, but how do we know that is a good or best number of features to select?<\/p>\n<p>Instead of guessing, we can systematically test a range of different numbers of selected features and discover which results in the best performing model. 
This is called a grid search, where the <em>k<\/em> argument to the <em>SelectKBest<\/em> class can be tuned.<\/p>\n<p>It is good practice to evaluate model configurations on regression tasks using <a href=\"https:\/\/machinelearningmastery.com\/k-fold-cross-validation\/\">repeated k-fold cross-validation<\/a>. We will use three repeats of 10-fold cross-validation via the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RepeatedKFold.html\">RepeatedKFold&nbsp;class<\/a>.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the evaluation method\r\ncv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)<\/pre>\n<p>We can define a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.pipeline.Pipeline.html\">Pipeline<\/a> that correctly fits the feature selection transform on the training folds and applies it to the train and test folds within each iteration of the cross-validation.<\/p>\n<p>In this case, we will use the mutual information statistical method for selecting features.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the pipeline to evaluate\r\nmodel = LinearRegression()\r\nfs = SelectKBest(score_func=mutual_info_regression)\r\npipeline = Pipeline(steps=[('sel',fs), ('lr', model)])<\/pre>\n<p>We can then define the grid of values to evaluate as 80 to 100.<\/p>\n<p>Note that the grid is a dictionary mapping of parameter names to values to search, and given that we are using a <em>Pipeline<\/em>, we can access the <em>SelectKBest<\/em> object via the name we gave it, &lsquo;<em>sel<\/em>&rsquo;, followed by the parameter name &lsquo;<em>k<\/em>&rsquo; separated by two underscores, or &lsquo;<em>sel__k<\/em>&rsquo;.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the grid\r\ngrid = dict()\r\ngrid['sel__k'] = [i for i in range(X.shape[1]-20, X.shape[1]+1)]<\/pre>\n<p>We can then define and run the search.<\/p>\n<p>In this case, we will evaluate models using the negative mean absolute 
error (<em>neg_mean_absolute_error<\/em>). It is negative because scikit-learn requires the score to be maximized, so the MAE is made negative, meaning scores scale from -infinity to 0 (best).<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the grid search\r\nsearch = GridSearchCV(pipeline, grid, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)\r\n# perform the search\r\nresults = search.fit(X, y)<\/pre>\n<p>Tying this together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># compare different numbers of features selected using mutual information\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import RepeatedKFold\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import mutual_info_regression\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.pipeline import Pipeline\r\nfrom sklearn.model_selection import GridSearchCV\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# define the evaluation method\r\ncv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n# define the pipeline to evaluate\r\nmodel = LinearRegression()\r\nfs = SelectKBest(score_func=mutual_info_regression)\r\npipeline = Pipeline(steps=[('sel',fs), ('lr', model)])\r\n# define the grid\r\ngrid = dict()\r\ngrid['sel__k'] = [i for i in range(X.shape[1]-20, X.shape[1]+1)]\r\n# define the grid search\r\nsearch = GridSearchCV(pipeline, grid, scoring='neg_mean_absolute_error', n_jobs=-1, cv=cv)\r\n# perform the search\r\nresults = search.fit(X, y)\r\n# summarize best\r\nprint('Best MAE: %.3f' % results.best_score_)\r\nprint('Best Config: %s' % results.best_params_)\r\n# summarize all\r\nmeans = results.cv_results_['mean_test_score']\r\nparams = results.cv_results_['params']\r\nfor mean, param in zip(means, params):\r\n    print(\"&gt;%.3f with: %r\" % (mean, param))<\/pre>\n<p>Running the example 
grid searches different numbers of selected features using mutual information statistics, where each modeling pipeline is evaluated using repeated cross-validation.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm and the evaluation procedure. Try running the example a few times.<\/p>\n<p>In this case, we can see that the best number of selected features is 81, which achieves an MAE of about 0.082 (ignoring the sign).<\/p>\n<pre class=\"crayon-plain-tag\">Best MAE: -0.082\r\nBest Config: {'sel__k': 81}\r\n&gt;-1.100 with: {'sel__k': 80}\r\n&gt;-0.082 with: {'sel__k': 81}\r\n&gt;-0.082 with: {'sel__k': 82}\r\n&gt;-0.082 with: {'sel__k': 83}\r\n&gt;-0.082 with: {'sel__k': 84}\r\n&gt;-0.082 with: {'sel__k': 85}\r\n&gt;-0.082 with: {'sel__k': 86}\r\n&gt;-0.082 with: {'sel__k': 87}\r\n&gt;-0.082 with: {'sel__k': 88}\r\n&gt;-0.083 with: {'sel__k': 89}\r\n&gt;-0.083 with: {'sel__k': 90}\r\n&gt;-0.083 with: {'sel__k': 91}\r\n&gt;-0.083 with: {'sel__k': 92}\r\n&gt;-0.083 with: {'sel__k': 93}\r\n&gt;-0.083 with: {'sel__k': 94}\r\n&gt;-0.083 with: {'sel__k': 95}\r\n&gt;-0.083 with: {'sel__k': 96}\r\n&gt;-0.083 with: {'sel__k': 97}\r\n&gt;-0.083 with: {'sel__k': 98}\r\n&gt;-0.083 with: {'sel__k': 99}\r\n&gt;-0.083 with: {'sel__k': 100}<\/pre>\n<p>We might want to see the relationship between the number of selected features and MAE. We may expect that more features result in better performance, to a point.<\/p>\n<p>This relationship can be explored by manually evaluating each configuration of <em>k<\/em> for the <em>SelectKBest<\/em> from 81 to 100, gathering the sample of MAE scores, and plotting the results using box and whisker plots side by side. 
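<\/p>\n<p>As an aside, it can be useful to confirm which columns a fitted <em>SelectKBest<\/em> transform actually retains, via its <em>get_support()<\/em> method. The sketch below is illustrative only and not part of the modeling pipeline above; it assumes the same synthetic dataset and the best configuration of <em>k=81<\/em> found by the grid search.<\/p>

```python
# fit SelectKBest alone and inspect the retained columns
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_regression

# the same synthetic dataset used throughout the tutorial
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)

# fit the transform with the best k found by the grid search
fs = SelectKBest(score_func=mutual_info_regression, k=81)
fs.fit(X, y)

# indices of the 81 retained columns
print(fs.get_support(indices=True))
```

<p>Calling <em>transform()<\/em> on the fitted object would then reduce <em>X<\/em> from 100 columns to the 81 retained columns.<\/p>\n<p>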
The spread and mean of these box plots would be expected to show any interesting relationship between the number of selected features and the MAE of the pipeline.<\/p>\n<p>Note that we started the spread of <em>k<\/em> values at 81 instead of 80 because the MAE scores for <em>k=80<\/em> are dramatically larger than for all other values of <em>k<\/em> considered, which washed out the plot of the results.<\/p>\n<p>The complete example of achieving this is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># compare different numbers of features selected using mutual information\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_regression\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.model_selection import RepeatedKFold\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import mutual_info_regression\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.pipeline import Pipeline\r\nfrom matplotlib import pyplot\r\n# define dataset\r\nX, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)\r\n# define number of features to evaluate\r\nnum_features = [i for i in range(X.shape[1]-19, X.shape[1]+1)]\r\n# enumerate each number of features\r\nresults = list()\r\nfor k in num_features:\r\n\t# create pipeline\r\n\tmodel = LinearRegression()\r\n\tfs = SelectKBest(score_func=mutual_info_regression, k=k)\r\n\tpipeline = Pipeline(steps=[('sel',fs), ('lr', model)])\r\n\t# evaluate the model\r\n\tcv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)\r\n\tscores = cross_val_score(pipeline, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)\r\n\tresults.append(scores)\r\n\t# summarize the results\r\n\tprint('&gt;%d %.3f (%.3f)' % (k, mean(scores), std(scores)))\r\n# plot model performance for comparison\r\npyplot.boxplot(results, labels=num_features, showmeans=True)\r\npyplot.show()<\/pre>\n<p>Running 
the example first reports the mean and standard deviation of the MAE for each number of selected features.<\/p>\n<p>Your specific results may vary given the stochastic nature of the learning algorithm and the evaluation procedure. Try running the example a few times.<\/p>\n<p>In this case, reporting the mean and standard deviation of MAE is not very interesting, other than that values of <em>k<\/em> in the 80s appear better than those in the 90s.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;81 -0.082 (0.006)\r\n&gt;82 -0.082 (0.006)\r\n&gt;83 -0.082 (0.006)\r\n&gt;84 -0.082 (0.006)\r\n&gt;85 -0.082 (0.006)\r\n&gt;86 -0.082 (0.006)\r\n&gt;87 -0.082 (0.006)\r\n&gt;88 -0.082 (0.006)\r\n&gt;89 -0.083 (0.006)\r\n&gt;90 -0.083 (0.006)\r\n&gt;91 -0.083 (0.006)\r\n&gt;92 -0.083 (0.006)\r\n&gt;93 -0.083 (0.006)\r\n&gt;94 -0.083 (0.006)\r\n&gt;95 -0.083 (0.006)\r\n&gt;96 -0.083 (0.006)\r\n&gt;97 -0.083 (0.006)\r\n&gt;98 -0.083 (0.006)\r\n&gt;99 -0.083 (0.006)\r\n&gt;100 -0.083 (0.006)<\/pre>\n<p>Box and whisker plots are created side by side showing the trend of <em>k<\/em> vs. 
MAE where the green triangle represents the mean and orange line represents the median of the distribution.<\/p>\n<div id=\"attachment_10945\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10945\" class=\"size-full wp-image-10945\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/06\/Box-and-Whisker-Plots-of-MAE-for-Each-Number-of-Selected-Features-using-Mutual-Information2.png\" alt=\"Box and Whisker Plots of MAE for Each Number of Selected Features Using Mutual Information\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/06\/Box-and-Whisker-Plots-of-MAE-for-Each-Number-of-Selected-Features-using-Mutual-Information2.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/06\/Box-and-Whisker-Plots-of-MAE-for-Each-Number-of-Selected-Features-using-Mutual-Information2-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/06\/Box-and-Whisker-Plots-of-MAE-for-Each-Number-of-Selected-Features-using-Mutual-Information2-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/06\/Box-and-Whisker-Plots-of-MAE-for-Each-Number-of-Selected-Features-using-Mutual-Information2-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10945\" class=\"wp-caption-text\">Box and Whisker Plots of MAE for Each Number of Selected Features Using Mutual Information<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/feature-selection-with-real-and-categorical-data\/\">How to Choose a Feature Selection Method For Machine Learning<\/a><\/li>\n<li><a 
href=\"https:\/\/machinelearningmastery.com\/feature-selection-with-categorical-data\/\">How to Perform Feature Selection with Categorical Data<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-use-correlation-to-understand-the-relationship-between-variables\/\">How to Calculate Correlation Between Variables in Python<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/information-gain-and-mutual-information\">What Is Information Gain and Mutual Information for Machine Learning<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/3b2LHTL\">Applied Predictive Modeling<\/a>, 2013.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/feature_selection.html\">Feature selection, Scikit-Learn User Guide<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_regression.html\">sklearn.datasets.make_regression API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_selection.f_regression.html\">sklearn.feature_selection.f_regression API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_selection.mutual_info_regression.html\">sklearn.feature_selection.mutual_info_regression API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Pearson_correlation_coefficient\">Pearson correlation coefficient, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to perform feature selection with numerical input data for regression predictive modeling.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to evaluate the importance of numerical input data using the correlation and mutual information statistics.<\/li>\n<li>How to perform feature selection for numerical input data when fitting and evaluating a regression model.<\/li>\n<li>How to tune the number of features 
selected in a modeling pipeline using a grid search.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/feature-selection-for-regression-data\/\">How to Perform Feature Selection for Regression Data<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/feature-selection-for-regression-data\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Feature selection is the process of identifying and selecting a subset of input variables that are most relevant to the target variable. [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/06\/07\/how-to-perform-feature-selection-for-regression-data\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3539,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3538"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3538"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3538\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3539"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3538"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3538"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3538"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}