{"id":3515,"date":"2020-05-31T19:00:37","date_gmt":"2020-05-31T19:00:37","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/31\/test-time-augmentation-for-structured-data-with-scikit-learn\/"},"modified":"2020-05-31T19:00:37","modified_gmt":"2020-05-31T19:00:37","slug":"test-time-augmentation-for-structured-data-with-scikit-learn","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/31\/test-time-augmentation-for-structured-data-with-scikit-learn\/","title":{"rendered":"Test-Time Augmentation For Structured Data With Scikit-Learn"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Test-time augmentation, or TTA for short, is a technique for improving the skill of predictive models.<\/p>\n<p>It is typically used to improve the predictive performance of deep learning models on image datasets where predictions are averaged across multiple augmented versions of each image in the test dataset.<\/p>\n<p>Although popular with image datasets and neural network models, test-time augmentation can be used with any machine learning algorithm on tabular datasets, such as those often seen in regression and classification predictive modeling problems.<\/p>\n<p>In this tutorial, you will discover how to use test-time augmentation for tabular data in scikit-learn.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Test-time augmentation is a technique for improving model performance and is commonly used for deep learning models on image datasets.<\/li>\n<li>How to implement test-time augmentation for regression and classification tabular datasets in Python with scikit-learn.<\/li>\n<li>How to tune the number of synthetic examples and amount of statistical noise used in test-time augmentation.<\/li>\n<\/ul>\n<p>Let&rsquo;s get started.<\/p>\n<div id=\"attachment_10643\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" 
aria-describedby=\"caption-attachment-10643\" class=\"size-full wp-image-10643\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/08\/Test-Time-Augmentation-With-Scikit-Learn.jpg\" alt=\"Test-Time Augmentation With Scikit-Learn\" width=\"800\" height=\"534\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/Test-Time-Augmentation-With-Scikit-Learn.jpg 800w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/Test-Time-Augmentation-With-Scikit-Learn-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/08\/Test-Time-Augmentation-With-Scikit-Learn-768x513.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\"><\/p>\n<p id=\"caption-attachment-10643\" class=\"wp-caption-text\">Test-Time Augmentation With Scikit-Learn<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/barnimages\/21187172316\/\">barnimages<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Test-Time Augmentation<\/li>\n<li>Standard Model Evaluation<\/li>\n<li>Test-Time Augmentation Example<\/li>\n<\/ol>\n<h2>Test-Time Augmentation<\/h2>\n<p>Test-time augmentation, or TTA for short, is a technique for improving the skill of a predictive model.<\/p>\n<p>It is a procedure implemented when using a fit model to make predictions, such as on a test dataset or on new data. The procedure involves creating multiple slightly modified copies of each example in the dataset. A prediction is made for each modified example and the predictions are averaged to give a more accurate prediction for the original example.<\/p>\n<p>TTA is often used with image classification, where image data augmentation is used to create multiple modified versions of each image, such as crops, zooms, rotations, and other image-specific modifications. 
As such, the technique often results in a lift in the performance of image classification algorithms on standard datasets.<\/p>\n<p>In their 2015 paper &ldquo;<em>Very Deep Convolutional Networks for Large-Scale Image Recognition<\/em>,&rdquo; which achieved then state-of-the-art results on the ILSVRC dataset, the authors use horizontal flip test-time augmentation:<\/p>\n<blockquote>\n<p>We also augment the test set by horizontal flipping of the images; the soft-max class posteriors of the original and flipped images are averaged to obtain the final scores for the image.<\/p>\n<\/blockquote>\n<p>&mdash; <a href=\"https:\/\/arxiv.org\/abs\/1409.1556\">Very Deep Convolutional Networks for Large-Scale Image Recognition<\/a>, 2015.<\/p>\n<p>For more on test-time augmentation with image data, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/\">How to Use Test-Time Augmentation to Make Better Predictions<\/a><\/li>\n<\/ul>\n<p>Although often used for image data, test-time augmentation can also be used for other data types, such as tabular data (e.g. rows and columns of numbers).<\/p>\n<p>There are many ways that TTA can be used with tabular data. One simple approach involves creating copies of rows of data with small Gaussian noise added. 
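As a minimal illustration of this idea for a regression problem, where the predictions for the noisy copies are averaged rather than voted on, consider the sketch below. The Ridge model, the make_regression dataset, and the 0.02 noise scale are illustrative assumptions, not part of the tutorial.

```python
# sketch of TTA for regression: average the model's predictions over the
# original row plus noisy copies. Ridge, make_regression, and the 0.02
# noise scale are illustrative assumptions, not from the tutorial.
from numpy import mean
from numpy.random import normal, seed
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

seed(1)
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = Ridge()
model.fit(X_train, y_train)
# build a small test set for one row: the row itself plus three noisy copies
row = X_test[0]
test_set = [row + normal(loc=0.0, scale=0.02, size=len(row)) for _ in range(3)]
test_set.insert(0, row)
# for regression, average (rather than take the mode of) the predictions
y_hat = mean(model.predict(test_set))
print(len(test_set))  # prints 4
```

The same averaging would apply to predicted probabilities in classification.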
The predictions from the copied rows can then be averaged to result in an improved prediction for regression or classification.<\/p>\n<p>We will explore how this might be achieved using the scikit-learn Python machine learning library.<\/p>\n<p>First, let&rsquo;s define a standard approach for evaluating a model.<\/p>\n<h2>Standard Model Evaluation<\/h2>\n<p>In this section, we will explore the typical way of evaluating a machine learning model before we introduce test-time augmentation in the next section.<\/p>\n<p>First, let&rsquo;s define a synthetic classification dataset.<\/p>\n<p>We will use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">make_classification() function<\/a> to create a dataset with 100 examples, each with 20 input variables.<\/p>\n<p>The example creates and summarizes the dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># test classification dataset\r\nfrom sklearn.datasets import make_classification\r\n# define dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# summarize the dataset\r\nprint(X.shape, y.shape)<\/pre>\n<p>Running the example creates the dataset and confirms the number of rows and columns of the dataset.<\/p>\n<pre class=\"crayon-plain-tag\">(100, 20) (100,)<\/pre>\n<p>This is a binary classification task, and we will fit and evaluate a linear model, specifically, a logistic regression model.<\/p>\n<p>A good practice when evaluating machine learning models is to use repeated k-fold cross-validation. When working on a classification problem, it is important to ensure that a stratified version of k-fold cross-validation is used so that each fold preserves the class distribution. 
As such, we will use repeated stratified k-fold cross-validation with 10 folds and 5 repeats.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# prepare the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)<\/pre>\n<p>We will enumerate the folds and repeats manually so that later we can perform test-time augmentation.<\/p>\n<p>Each loop, we must define and fit the model, then use the fit model to make a prediction, evaluate the predictions, and store the result.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nscores = list()\r\nfor train_ix, test_ix in cv.split(X, y):\r\n\t# split the data\r\n\tX_train, X_test = X[train_ix], X[test_ix]\r\n\ty_train, y_test = y[train_ix], y[test_ix]\r\n\t# fit model\r\n\tmodel = LogisticRegression()\r\n\tmodel.fit(X_train, y_train)\r\n\t# evaluate model\r\n\ty_hat = model.predict(X_test)\r\n\tacc = accuracy_score(y_test, y_hat)\r\n\tscores.append(acc)<\/pre>\n<p>At the end, we can report the mean classification accuracy across all folds and repeats.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Tying this together, the complete example of evaluating a logistic regression model on the synthetic binary classification dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate logistic regression using repeated stratified k-fold cross-validation\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.metrics import accuracy_score\r\n# create dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# prepare the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, 
n_repeats=5, random_state=1)\r\nscores = list()\r\nfor train_ix, test_ix in cv.split(X, y):\r\n\t# split the data\r\n\tX_train, X_test = X[train_ix], X[test_ix]\r\n\ty_train, y_test = y[train_ix], y[test_ix]\r\n\t# fit model\r\n\tmodel = LogisticRegression()\r\n\tmodel.fit(X_train, y_train)\r\n\t# evaluate model\r\n\ty_hat = model.predict(X_test)\r\n\tacc = accuracy_score(y_test, y_hat)\r\n\tscores.append(acc)\r\n# report performance\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the logistic regression using repeated stratified k-fold cross-validation.<\/p>\n<p>Your specific results may differ given the stochastic nature of the learning algorithm. Consider running the example a few times.<\/p>\n<p>In this case, we can see that the model achieved a mean classification accuracy of 79.8 percent.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.798 (0.110)<\/pre>\n<p>Next, let&rsquo;s explore how we might update this example to use test-time augmentation.<\/p>\n<h2>Test-Time Augmentation Example<\/h2>\n<p>Implementing test-time augmentation involves two steps.<\/p>\n<p>The first step is to select a method for creating modified versions of each row in the test set.<\/p>\n<p>In this tutorial, we will add Gaussian random noise to each feature. An alternate approach might be to add uniformly random noise or even copy feature values from examples in the test dataset.<\/p>\n<p>The <a href=\"https:\/\/docs.scipy.org\/doc\/numpy-1.15.0\/reference\/generated\/numpy.random.normal.html\">normal() NumPy function<\/a> will be used to create a vector of random Gaussian values with a zero mean and small standard deviation. The standard deviation should be proportional to the scale of each variable in the training dataset. 
In this case, we will keep the example simple and use a value of 0.2.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# create vector of random gaussians\r\ngauss = normal(loc=0.0, scale=feature_scale, size=len(row))\r\n# add to test case\r\nnew_row = row + gauss<\/pre>\n<p>Given a row of data from the test set, we can create a given number of modified copies. It is a good idea to use an odd number of copies, such as 3, 5, or 7, so that when we later summarize the labels assigned to the copies, ties can be broken automatically.<\/p>\n<p>The <em>create_test_set()<\/em> function below implements this; given a row of data, it will return a test set that contains the row as well as &ldquo;<em>n_cases<\/em>&rdquo; modified copies, defaulting to 3 (so the test set size is 4).<\/p>\n<pre class=\"crayon-plain-tag\"># create a test set for a row of real data with an unknown label\r\ndef create_test_set(row, n_cases=3, feature_scale=0.2):\r\n\ttest_set = list()\r\n\ttest_set.append(row)\r\n\t# make copies of row\r\n\tfor _ in range(n_cases):\r\n\t\t# create vector of random gaussians\r\n\t\tgauss = normal(loc=0.0, scale=feature_scale, size=len(row))\r\n\t\t# add to test case\r\n\t\tnew_row = row + gauss\r\n\t\t# store in test set\r\n\t\ttest_set.append(new_row)\r\n\treturn test_set<\/pre>\n<p>An improvement to this approach would be to standardize or normalize the train and test datasets within each loop and then use a single standard deviation for the <a href=\"https:\/\/docs.scipy.org\/doc\/numpy-1.15.0\/reference\/generated\/numpy.random.normal.html\">normal()<\/a> function across all features, as each standardized feature would then be on the scale of a standard normal variable. This is left as an exercise for the reader.<\/p>\n<p>The second step is to use the <em>create_test_set()<\/em> function for each example in the test set, make a prediction for the constructed test set, and record the predicted label using a summary statistic across the predictions. 
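The standardization improvement left as an exercise above might look something like the following sketch. The use of StandardScaler, a single train/test split instead of the cross-validation loop, and the 0.02 noise scale are all illustrative assumptions, not part of the tutorial's harness.

```python
# illustrative sketch: standardize the features first so that one small
# noise scale is meaningful for every feature. StandardScaler, the single
# split, and the 0.02 noise scale are assumptions, not from the tutorial.
from numpy.random import normal, seed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

seed(1)
X, y = make_classification(n_samples=100, n_features=20, n_informative=15,
	n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# fit the scaler on the training data only, then transform both splits
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
model = LogisticRegression()
model.fit(X_train_std, y_train)
# every feature now has roughly unit variance, so a single small standard
# deviation is meaningful for all of them
row = X_test_std[0]
test_set = [row] + [row + normal(loc=0.0, scale=0.02, size=len(row)) for _ in range(3)]
labels = model.predict(test_set)
print(labels.shape)  # prints (4,)
```

The mode of these four labels would then be taken exactly as in the tutorial's `test_time_augmentation()` function.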
Given that the prediction is categorical, the statistical mode would be appropriate, via the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.mode.html\">mode() scipy function<\/a>. If the dataset was regression or we were predicting probabilities, the mean or median would be more appropriate.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# create the test set\r\ntest_set = create_test_set(row)\r\n# make a prediction for all examples in the test set\r\nlabels = model.predict(test_set)\r\n# select the label as the mode of the distribution\r\nlabel, _ = mode(labels)<\/pre>\n<p>The <em>test_time_augmentation()<\/em> function below implements this; given a model and a test set, it returns an array of predictions where each prediction was made using test-time augmentation.<\/p>\n<pre class=\"crayon-plain-tag\"># make predictions using test-time augmentation\r\ndef test_time_augmentation(model, X_test):\r\n\t# evaluate model\r\n\ty_hat = list()\r\n\tfor i in range(X_test.shape[0]):\r\n\t\t# retrieve the row\r\n\t\trow = X_test[i]\r\n\t\t# create the test set\r\n\t\ttest_set = create_test_set(row)\r\n\t\t# make a prediction for all examples in the test set\r\n\t\tlabels = model.predict(test_set)\r\n\t\t# select the label as the mode of the distribution\r\n\t\tlabel, _ = mode(labels)\r\n\t\t# store the prediction\r\n\t\ty_hat.append(label)\r\n\treturn y_hat<\/pre>\n<p>Tying all of this together, the complete example of evaluating the logistic regression model on the dataset using test-time augmentation is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate logistic regression using test-time augmentation\r\nfrom numpy.random import seed\r\nfrom numpy.random import normal\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom scipy.stats import mode\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom 
sklearn.linear_model import LogisticRegression\r\nfrom sklearn.metrics import accuracy_score\r\n\r\n# create a test set for a row of real data with an unknown label\r\ndef create_test_set(row, n_cases=3, feature_scale=0.2):\r\n\ttest_set = list()\r\n\ttest_set.append(row)\r\n\t# make copies of row\r\n\tfor _ in range(n_cases):\r\n\t\t# create vector of random gaussians\r\n\t\tgauss = normal(loc=0.0, scale=feature_scale, size=len(row))\r\n\t\t# add to test case\r\n\t\tnew_row = row + gauss\r\n\t\t# store in test set\r\n\t\ttest_set.append(new_row)\r\n\treturn test_set\r\n\r\n# make predictions using test-time augmentation\r\ndef test_time_augmentation(model, X_test):\r\n\t# evaluate model\r\n\ty_hat = list()\r\n\tfor i in range(X_test.shape[0]):\r\n\t\t# retrieve the row\r\n\t\trow = X_test[i]\r\n\t\t# create the test set\r\n\t\ttest_set = create_test_set(row)\r\n\t\t# make a prediction for all examples in the test set\r\n\t\tlabels = model.predict(test_set)\r\n\t\t# select the label as the mode of the distribution\r\n\t\tlabel, _ = mode(labels)\r\n\t\t# store the prediction\r\n\t\ty_hat.append(label)\r\n\treturn y_hat\r\n\r\n# initialize numpy random number generator\r\nseed(1)\r\n# create dataset\r\nX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n# prepare the cross-validation procedure\r\ncv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)\r\nscores = list()\r\nfor train_ix, test_ix in cv.split(X, y):\r\n\t# split the data\r\n\tX_train, X_test = X[train_ix], X[test_ix]\r\n\ty_train, y_test = y[train_ix], y[test_ix]\r\n\t# fit model\r\n\tmodel = LogisticRegression()\r\n\tmodel.fit(X_train, y_train)\r\n\t# make predictions using test-time augmentation\r\n\ty_hat = test_time_augmentation(model, X_test)\r\n\t# calculate the accuracy for this iteration\r\n\tacc = accuracy_score(y_test, y_hat)\r\n\t# store the result\r\n\tscores.append(acc)\r\n# report performance\r\nprint('Accuracy: %.3f 
(%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example evaluates the logistic regression using repeated stratified k-fold cross-validation and test-time augmentation.<\/p>\n<p>Your specific results may differ given the stochastic nature of the learning algorithm. Consider running the example a few times.<\/p>\n<p>In this case, we can see that the model achieved a mean classification accuracy of 81.0 percent, which is better than the 79.8 percent achieved by the test harness without test-time augmentation.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: 0.810 (0.114)<\/pre>\n<p>It might be interesting to grid search the number of synthetic examples created each time a prediction is made during test-time augmentation.<\/p>\n<p>The example below explores values between 1 and 20 and plots the results.<\/p>\n<pre class=\"crayon-plain-tag\"># compare the number of synthetic examples created during the test-time augmentation\r\nfrom numpy.random import seed\r\nfrom numpy.random import normal\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom scipy.stats import mode\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.metrics import accuracy_score\r\nfrom matplotlib import pyplot\r\n\r\n# create a test set for a row of real data with an unknown label\r\ndef create_test_set(row, n_cases=3, feature_scale=0.2):\r\n\ttest_set = list()\r\n\ttest_set.append(row)\r\n\t# make copies of row\r\n\tfor _ in range(n_cases):\r\n\t\t# create vector of random gaussians\r\n\t\tgauss = normal(loc=0.0, scale=feature_scale, size=len(row))\r\n\t\t# add to test case\r\n\t\tnew_row = row + gauss\r\n\t\t# store in test set\r\n\t\ttest_set.append(new_row)\r\n\treturn test_set\r\n\r\n# make predictions using test-time augmentation\r\ndef test_time_augmentation(model, 
X_test, cases):\r\n\t# evaluate model\r\n\ty_hat = list()\r\n\tfor i in range(X_test.shape[0]):\r\n\t\t# retrieve the row\r\n\t\trow = X_test[i]\r\n\t\t# create the test set\r\n\t\ttest_set = create_test_set(row, n_cases=cases)\r\n\t\t# make a prediction for all examples in the test set\r\n\t\tlabels = model.predict(test_set)\r\n\t\t# select the label as the mode of the distribution\r\n\t\tlabel, _ = mode(labels)\r\n\t\t# store the prediction\r\n\t\ty_hat.append(label)\r\n\treturn y_hat\r\n\r\n# evaluate different number of synthetic examples created at test time\r\nexamples = range(1, 21)\r\nresults = list()\r\nfor e in examples:\r\n\t# initialize numpy random number generator\r\n\tseed(1)\r\n\t# create dataset\r\n\tX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n\t# prepare the cross-validation procedure\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)\r\n\tscores = list()\r\n\tfor train_ix, test_ix in cv.split(X, y):\r\n\t\t# split the data\r\n\t\tX_train, X_test = X[train_ix], X[test_ix]\r\n\t\ty_train, y_test = y[train_ix], y[test_ix]\r\n\t\t# fit model\r\n\t\tmodel = LogisticRegression()\r\n\t\tmodel.fit(X_train, y_train)\r\n\t\t# make predictions using test-time augmentation\r\n\t\ty_hat = test_time_augmentation(model, X_test, e)\r\n\t\t# calculate the accuracy for this iteration\r\n\t\tacc = accuracy_score(y_test, y_hat)\r\n\t\t# store the result\r\n\t\tscores.append(acc)\r\n\t# report performance\r\n\tprint('&gt;%d, acc: %.3f (%.3f)' % (e, mean(scores), std(scores)))\r\n\tresults.append(mean(scores))\r\n# plot the results\r\npyplot.plot(examples, results)\r\npyplot.show()<\/pre>\n<p>Running the example reports the accuracy for different numbers of synthetic examples created during test-time augmentation.<\/p>\n<p>Your specific results may differ given the stochastic nature of the learning algorithm. 
Consider running the example a few times.<\/p>\n<p>Recall that we used three examples in the previous example.<\/p>\n<p>In this case, it looks like a value of three might be optimal for this test harness, as all other values seem to result in lower performance.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;1, acc: 0.800 (0.118)\r\n&gt;2, acc: 0.806 (0.114)\r\n&gt;3, acc: 0.810 (0.114)\r\n&gt;4, acc: 0.798 (0.105)\r\n&gt;5, acc: 0.802 (0.109)\r\n&gt;6, acc: 0.798 (0.107)\r\n&gt;7, acc: 0.800 (0.111)\r\n&gt;8, acc: 0.802 (0.110)\r\n&gt;9, acc: 0.806 (0.105)\r\n&gt;10, acc: 0.802 (0.110)\r\n&gt;11, acc: 0.798 (0.112)\r\n&gt;12, acc: 0.806 (0.110)\r\n&gt;13, acc: 0.802 (0.110)\r\n&gt;14, acc: 0.802 (0.109)\r\n&gt;15, acc: 0.798 (0.110)\r\n&gt;16, acc: 0.796 (0.111)\r\n&gt;17, acc: 0.806 (0.112)\r\n&gt;18, acc: 0.796 (0.111)\r\n&gt;19, acc: 0.800 (0.113)\r\n&gt;20, acc: 0.804 (0.109)<\/pre>\n<p>A line plot of number of examples vs. classification accuracy is created showing that perhaps odd numbers of examples generally result in better performance than even numbers of examples.<\/p>\n<p>This might be expected due to their ability to break ties when using the mode of the predictions.<\/p>\n<div id=\"attachment_10641\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10641\" class=\"size-full wp-image-10641\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Number-of-Synthetic-Examples-in-TTA-vs-Classification-Accuracy.png\" alt=\"Line Plot of Number of Synthetic Examples in TTA vs. 
Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Number-of-Synthetic-Examples-in-TTA-vs-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Number-of-Synthetic-Examples-in-TTA-vs-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Number-of-Synthetic-Examples-in-TTA-vs-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Number-of-Synthetic-Examples-in-TTA-vs-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10641\" class=\"wp-caption-text\">Line Plot of Number of Synthetic Examples in TTA vs. Classification Accuracy<\/p>\n<\/div>\n<p>We can also perform the same sensitivity analysis with the amount of random noise added to examples in the test set during test-time augmentation.<\/p>\n<p>The example below demonstrates this with noise values between 0.01 and 0.3 with a grid of 0.01.<\/p>\n<pre class=\"crayon-plain-tag\"># compare amount of noise added to examples created during the test-time augmentation\r\nfrom numpy.random import seed\r\nfrom numpy.random import normal\r\nfrom numpy import arange\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom scipy.stats import mode\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.model_selection import RepeatedStratifiedKFold\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.metrics import accuracy_score\r\nfrom matplotlib import pyplot\r\n\r\n# create a test set for a row of real data with an unknown label\r\ndef create_test_set(row, n_cases=3, 
feature_scale=0.2):\r\n\ttest_set = list()\r\n\ttest_set.append(row)\r\n\t# make copies of row\r\n\tfor _ in range(n_cases):\r\n\t\t# create vector of random gaussians\r\n\t\tgauss = normal(loc=0.0, scale=feature_scale, size=len(row))\r\n\t\t# add to test case\r\n\t\tnew_row = row + gauss\r\n\t\t# store in test set\r\n\t\ttest_set.append(new_row)\r\n\treturn test_set\r\n\r\n# make predictions using test-time augmentation\r\ndef test_time_augmentation(model, X_test, noise):\r\n\t# evaluate model\r\n\ty_hat = list()\r\n\tfor i in range(X_test.shape[0]):\r\n\t\t# retrieve the row\r\n\t\trow = X_test[i]\r\n\t\t# create the test set\r\n\t\ttest_set = create_test_set(row, feature_scale=noise)\r\n\t\t# make a prediction for all examples in the test set\r\n\t\tlabels = model.predict(test_set)\r\n\t\t# select the label as the mode of the distribution\r\n\t\tlabel, _ = mode(labels)\r\n\t\t# store the prediction\r\n\t\ty_hat.append(label)\r\n\treturn y_hat\r\n\r\n# evaluate different number of synthetic examples created at test time\r\nnoise = arange(0.01, 0.31, 0.01)\r\nresults = list()\r\nfor n in noise:\r\n\t# initialize numpy random number generator\r\n\tseed(1)\r\n\t# create dataset\r\n\tX, y = make_classification(n_samples=100, n_features=20, n_informative=15, n_redundant=5, random_state=1)\r\n\t# prepare the cross-validation procedure\r\n\tcv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)\r\n\tscores = list()\r\n\tfor train_ix, test_ix in cv.split(X, y):\r\n\t\t# split the data\r\n\t\tX_train, X_test = X[train_ix], X[test_ix]\r\n\t\ty_train, y_test = y[train_ix], y[test_ix]\r\n\t\t# fit model\r\n\t\tmodel = LogisticRegression()\r\n\t\tmodel.fit(X_train, y_train)\r\n\t\t# make predictions using test-time augmentation\r\n\t\ty_hat = test_time_augmentation(model, X_test, n)\r\n\t\t# calculate the accuracy for this iteration\r\n\t\tacc = accuracy_score(y_test, y_hat)\r\n\t\t# store the result\r\n\t\tscores.append(acc)\r\n\t# report 
performance\r\n\tprint('&gt;noise=%.3f, acc: %.3f (%.3f)' % (n, mean(scores), std(scores)))\r\n\tresults.append(mean(scores))\r\n# plot the results\r\npyplot.plot(noise, results)\r\npyplot.show()<\/pre>\n<p>Running the example reports the accuracy for different amounts of statistical noise added to examples created during test-time augmentation.<\/p>\n<p>Your specific results may differ given the stochastic nature of the learning algorithm. Consider running the example a few times.<\/p>\n<p>Recall that we used a standard deviation of 0.2 in the first example.<\/p>\n<p>In this case, it looks like a value of about 0.230 might be optimal for this test harness, resulting in a slightly higher accuracy of 81.2 percent.<\/p>\n<pre class=\"crayon-plain-tag\">&gt;noise=0.010, acc: 0.798 (0.110)\r\n&gt;noise=0.020, acc: 0.798 (0.110)\r\n&gt;noise=0.030, acc: 0.798 (0.110)\r\n&gt;noise=0.040, acc: 0.800 (0.113)\r\n&gt;noise=0.050, acc: 0.802 (0.112)\r\n&gt;noise=0.060, acc: 0.804 (0.111)\r\n&gt;noise=0.070, acc: 0.806 (0.108)\r\n&gt;noise=0.080, acc: 0.806 (0.108)\r\n&gt;noise=0.090, acc: 0.806 (0.108)\r\n&gt;noise=0.100, acc: 0.806 (0.108)\r\n&gt;noise=0.110, acc: 0.806 (0.108)\r\n&gt;noise=0.120, acc: 0.806 (0.108)\r\n&gt;noise=0.130, acc: 0.806 (0.108)\r\n&gt;noise=0.140, acc: 0.806 (0.108)\r\n&gt;noise=0.150, acc: 0.808 (0.111)\r\n&gt;noise=0.160, acc: 0.808 (0.111)\r\n&gt;noise=0.170, acc: 0.808 (0.111)\r\n&gt;noise=0.180, acc: 0.810 (0.114)\r\n&gt;noise=0.190, acc: 0.810 (0.114)\r\n&gt;noise=0.200, acc: 0.810 (0.114)\r\n&gt;noise=0.210, acc: 0.810 (0.114)\r\n&gt;noise=0.220, acc: 0.810 (0.114)\r\n&gt;noise=0.230, acc: 0.812 (0.114)\r\n&gt;noise=0.240, acc: 0.812 (0.114)\r\n&gt;noise=0.250, acc: 0.812 (0.114)\r\n&gt;noise=0.260, acc: 0.812 (0.114)\r\n&gt;noise=0.270, acc: 0.810 (0.114)\r\n&gt;noise=0.280, acc: 0.808 (0.116)\r\n&gt;noise=0.290, acc: 0.808 (0.116)\r\n&gt;noise=0.300, acc: 0.808 (0.116)<\/pre>\n<p>A line plot of the amount of noise added to examples vs. 
classification accuracy is created, showing that perhaps a small range of noise around a standard deviation of 0.250 might be optimal on this test harness.<\/p>\n<div id=\"attachment_10642\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-10642\" class=\"size-full wp-image-10642\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Statistical-Noise-Added-to-Examples-in-TTA-vs-Classification-Accuracy.png\" alt=\"Line Plot of Statistical Noise Added to Examples in TTA vs. Classification Accuracy\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Statistical-Noise-Added-to-Examples-in-TTA-vs-Classification-Accuracy.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Statistical-Noise-Added-to-Examples-in-TTA-vs-Classification-Accuracy-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Statistical-Noise-Added-to-Examples-in-TTA-vs-Classification-Accuracy-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/04\/Line-Plot-of-Statistical-Noise-Added-to-Examples-in-TTA-vs-Classification-Accuracy-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-10642\" class=\"wp-caption-text\">Line Plot of Statistical Noise Added to Examples in TTA vs. Classification Accuracy<\/p>\n<\/div>\n<h3><strong>Why not use an oversampling method like SMOTE?<\/strong><\/h3>\n<p><a href=\"https:\/\/machinelearningmastery.com\/smote-oversampling-for-imbalanced-classification\/\">SMOTE<\/a> is a popular oversampling method for rebalancing observations for each class in a training dataset. 
It can create synthetic examples but requires knowledge of the class labels, which makes it difficult to use in test-time augmentation.<\/p>\n<p>One approach might be to take a given example for which a prediction is required and assume it belongs to a given class. Then generate synthetic samples from the training dataset using the new example as the focal point of the synthesis, and classify them. This is then repeated for each class label. The total or average classification response (perhaps probability) can be tallied for each class group and the group with the largest response can be taken as the prediction.<\/p>\n<p>This is just off the cuff; I have not actually tried this approach. Have a go and let me know if it works.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/\">How to Use Test-Time Augmentation to Make Better Predictions<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-generate-random-numbers-in-python\/\">How to Generate Random Numbers in Python<\/a><\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_classification.html\">sklearn.datasets.make_classification API<\/a>.<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RepeatedStratifiedKFold.html\">sklearn.model_selection.RepeatedStratifiedKFold API<\/a>.<\/li>\n<li><a href=\"https:\/\/docs.scipy.org\/doc\/numpy-1.15.0\/reference\/generated\/numpy.random.normal.html\">numpy.random.normal API<\/a>.<\/li>\n<li><a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.mode.html\">scipy.stats.mode API<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use 
test-time augmentation for tabular data in scikit-learn.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Test-time augmentation is a technique for improving model performance and is commonly used for deep learning models on image datasets.<\/li>\n<li>How to implement test-time augmentation for regression and classification tabular datasets in Python with scikit-learn.<\/li>\n<li>How to tune the number of synthetic examples and amount of statistical noise used in test-time augmentation.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/test-time-augmentation-with-scikit-learn\/\">Test-Time Augmentation For Structured Data With Scikit-Learn<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/test-time-augmentation-with-scikit-learn\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Test-time augmentation, or TTA for short, is a technique for improving the skill of predictive models. 
It is typically used to improve [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/05\/31\/test-time-augmentation-for-structured-data-with-scikit-learn\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":3516,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3515"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=3515"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/3515\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/3516"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=3515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=3515"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=3515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}