{"id":2013,"date":"2019-04-14T19:00:01","date_gmt":"2019-04-14T19:00:01","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/14\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/"},"modified":"2019-04-14T19:00:01","modified_gmt":"2019-04-14T19:00:01","slug":"how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/14\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/","title":{"rendered":"How to Use Test-Time Augmentation to Improve Model Performance for Image Classification"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Data augmentation is a technique often used to improve performance and reduce generalization error when training neural network models for computer vision problems.<\/p>\n<p>The image data augmentation technique can also be applied when making predictions with a fit model in order to allow the model to make predictions for multiple different versions of each image in the test dataset. 
The predictions on the augmented images can be averaged, which can result in better predictive performance.<\/p>\n<p>In this tutorial, you will discover test-time augmentation for improving the performance of models for image classification tasks.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Test-time augmentation is the application of data augmentation techniques normally used during training when making predictions.<\/li>\n<li>How to implement test-time augmentation from scratch in Keras.<\/li>\n<li>How to use test-time augmentation to improve the performance of a convolutional neural network model on a standard image classification task.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_7440\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7440\" class=\"size-full wp-image-7440\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/04\/How-to-Use-Test-Time-Augmentation-to-Improve-Model-Performance-for-Image-Classification.jpg\" alt=\"How to Use Test-Time Augmentation to Improve Model Performance for Image Classification\" width=\"640\" height=\"427\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/04\/How-to-Use-Test-Time-Augmentation-to-Improve-Model-Performance-for-Image-Classification.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/04\/How-to-Use-Test-Time-Augmentation-to-Improve-Model-Performance-for-Image-Classification-300x200.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-7440\" class=\"wp-caption-text\">How to Use Test-Time Augmentation to Improve Model Performance for Image Classification<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/daveynin\/7206430966\/\">daveynin<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This 
tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>Test-Time Augmentation<\/li>\n<li>Test-Time Augmentation in Keras<\/li>\n<li>Dataset and Baseline Model<\/li>\n<li>Example of Test-Time Augmentation<\/li>\n<li>How to Tune Test-Time Augmentation Configuration<\/li>\n<\/ol>\n<h2>Test-Time Augmentation<\/h2>\n<p>Data augmentation is an approach typically used during the training of the model that expands the training set with modified copies of samples from the training dataset.<\/p>\n<p>Data augmentation is often performed with image data, where copies of images in the training dataset are created with some image manipulation techniques performed, such as zooms, flips, shifts, and more.<\/p>\n<p>The artificially expanded training dataset can result in a more skillful model, as often the performance of deep learning models continues to scale in concert with the size of the training dataset. In addition, the modified or augmented versions of the images in the training dataset assist the model in extracting and learning features in a way that is invariant to their position, lighting, and more.<\/p>\n<p>Test-time augmentation, or TTA for short, is an application of data augmentation to the test dataset.<\/p>\n<p>Specifically, it involves creating multiple augmented copies of each image in the test set, having the model make a prediction for each, then returning an ensemble of those predictions.<\/p>\n<p>Augmentations are chosen to give the model the best opportunity for correctly classifying a given image, and the number of copies of an image for which a model must make a prediction is often small, such as less than 10 or 20.<\/p>\n<p>Often, a single simple test-time augmentation is performed, such as a shift, crop, or image flip.<\/p>\n<p>In their 2015 paper that achieved then state-of-the-art results on the ILSVRC dataset titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1409.1556\">Very Deep Convolutional Networks for Large-Scale Image 
Recognition<\/a>,\u201d the authors use horizontal flip test-time augmentation:<\/p>\n<blockquote>\n<p>We also augment the test set by horizontal flipping of the images; the soft-max class posteriors of the original and flipped images are averaged to obtain the final scores for the image.<\/p>\n<\/blockquote>\n<p>Similarly, in their 2015 paper on the inception architecture titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1512.00567\">Rethinking the Inception Architecture for Computer Vision<\/a>,\u201d the authors at Google use cropping test-time augmentation, which they refer to as multi-crop evaluation.<\/p>\n<h2>Test-Time Augmentation in Keras<\/h2>\n<p>Test-time augmentation is not provided natively in the Keras deep learning library but can be 
implemented easily.<\/p>\n<p>The <a href=\"https:\/\/keras.io\/preprocessing\/image\/\">ImageDataGenerator class<\/a> can be used to configure the choice of test-time augmentation. For example, the data generator below is configured for horizontal flip image data augmentation.<\/p>\n<pre class=\"crayon-plain-tag\"># configure image data augmentation\r\ndatagen = ImageDataGenerator(horizontal_flip=True)<\/pre>\n<p>The augmentation can then be applied to each sample in the test dataset separately.<\/p>\n<p>First, the dimensions of the single image can be expanded from <em>[rows][cols][channels]<\/em> to <em>[samples][rows][cols][channels]<\/em>, where the number of samples is one, for the single image. This transforms the array for the image into an array of samples with one image.<\/p>\n<pre class=\"crayon-plain-tag\"># convert image into dataset\r\nsamples = expand_dims(image, 0)<\/pre>\n<p>Next, an iterator can be created for the sample. A batch size of 10 can be specified, although because the dataset contains only a single image, each batch yielded by the iterator will contain just one augmented image.<\/p>\n<pre class=\"crayon-plain-tag\"># prepare iterator\r\nit = datagen.flow(samples, batch_size=10)<\/pre>\n<p>The iterator can then be passed to the <em>predict_generator()<\/em> function of the model in order to make a prediction. Specifically, by running the iterator for 10 steps, 10 augmented images will be generated and the model will make a prediction for each.<\/p>\n<pre class=\"crayon-plain-tag\"># make predictions for each augmented image\r\nyhats = model.predict_generator(it, steps=10, verbose=0)<\/pre>\n<p>Finally, an ensemble prediction can be made. 
A prediction is made for each augmented image and, in the case of multiclass image classification, each prediction contains the probability of the image belonging to each class.<\/p>\n<p>An ensemble prediction can be made using <a href=\"https:\/\/machinelearningmastery.com\/weighted-average-ensemble-for-deep-learning-neural-networks\/\">soft voting<\/a>, where the probabilities of each class are summed across the predictions and a class prediction is made by calculating the <a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/generated\/numpy.argmax.html\">argmax()<\/a> of the summed predictions, returning the index or class number of the largest summed probability.<\/p>\n<pre class=\"crayon-plain-tag\"># sum across predictions\r\nsummed = numpy.sum(yhats, axis=0)\r\n# argmax across classes\r\nreturn argmax(summed)<\/pre>\n<p>We can tie these elements together into a function that will take a configured data generator, fit model, and single image, and will return a class prediction (integer) using test-time augmentation.<\/p>\n<pre class=\"crayon-plain-tag\"># make a prediction using test-time augmentation\r\ndef tta_prediction(datagen, model, image, n_examples):\r\n\t# convert image into dataset\r\n\tsamples = expand_dims(image, 0)\r\n\t# prepare iterator\r\n\tit = datagen.flow(samples, batch_size=n_examples)\r\n\t# make predictions for each augmented image\r\n\tyhats = model.predict_generator(it, steps=n_examples, verbose=0)\r\n\t# sum across predictions\r\n\tsummed = numpy.sum(yhats, axis=0)\r\n\t# argmax across classes\r\n\treturn argmax(summed)<\/pre>\n<p>Now that we know how to make predictions in Keras using test-time augmentation, let\u2019s work through an example to demonstrate the approach.<\/p>\n<h2>Dataset and Baseline Model<\/h2>\n<p>We can demonstrate test-time augmentation using a standard computer vision dataset and a convolutional neural network.<\/p>\n<p>Before we can do that, we must select a dataset and a baseline model.<\/p>\n<p>We will use the 
CIFAR-10 dataset, comprised of 60,000 32\u00d732 pixel color photographs of objects from 10 classes, such as frogs, birds, cats, ships, etc. CIFAR-10 is a well-understood dataset and widely used for benchmarking computer vision algorithms in the field of machine learning. The problem is \u201c<em>solved<\/em>.\u201d Top performance on the problem is achieved by deep learning convolutional neural networks with a classification accuracy above 96% or 97% on the test dataset.<\/p>\n<p>We will also use a convolutional neural network, or CNN, model that is capable of achieving good (better than random) results, but not state-of-the-art results, on the problem. This will be sufficient to demonstrate the lift in performance that test-time augmentation can provide.<\/p>\n<p>The CIFAR-10 dataset can be loaded easily via the Keras API by calling the <em>cifar10.load_data()<\/em> function, which returns a tuple with the training and test datasets split into input (images) and output (class labels) components.<\/p>\n<pre class=\"crayon-plain-tag\"># load dataset\r\n(trainX, trainY), (testX, testY) = load_data()<\/pre>\n<p>It is good practice to normalize the pixel values from the range 0-255 down to the range 0-1 prior to modeling. 
This ensures that the inputs are small and close to zero, and will, in turn, mean that the weights of the model will be kept small, leading to faster and better learning.<\/p>\n<pre class=\"crayon-plain-tag\"># normalize pixel values\r\ntrainX = trainX.astype('float32') \/ 255\r\ntestX = testX.astype('float32') \/ 255<\/pre>\n<p>The class labels are integers and must be converted to a one hot encoding prior to modeling.<\/p>\n<p>This can be achieved using the <em>to_categorical()<\/em> Keras utility function.<\/p>\n<pre class=\"crayon-plain-tag\"># one hot encode target values\r\ntrainY = to_categorical(trainY)\r\ntestY = to_categorical(testY)<\/pre>\n<p>We are now ready to define a model for this multi-class classification problem.<\/p>\n<p>The model has a convolutional layer with 32 filter maps with a 3\u00d73 kernel using the <a href=\"https:\/\/machinelearningmastery.com\/how-to-fix-vanishing-gradients-using-the-rectified-linear-activation-function\/\">rectifier linear activation<\/a>, \u201c<em>same<\/em>\u201d padding so the output is the same size as the input and the <em>He weight initialization<\/em>. This is followed by a batch normalization layer and a max pooling layer.<\/p>\n<p>This pattern is repeated with a convolutional, batch norm, and max pooling layer, although the number of filters is increased to 64. 
The output is then flattened before being interpreted by a dense layer and finally provided to the output layer to make a prediction.<\/p>\n<pre class=\"crayon-plain-tag\"># define model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))\r\nmodel.add(BatchNormalization())\r\nmodel.add(MaxPooling2D((2, 2)))\r\nmodel.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))\r\nmodel.add(BatchNormalization())\r\nmodel.add(MaxPooling2D((2, 2)))\r\nmodel.add(Flatten())\r\nmodel.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))\r\nmodel.add(BatchNormalization())\r\nmodel.add(Dense(10, activation='softmax'))<\/pre>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\">Adam variation of stochastic gradient descent<\/a> is used to find the model weights.<\/p>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/loss-and-loss-functions-for-training-deep-learning-neural-networks\/\">categorical cross entropy loss function<\/a> is used, required for multi-class classification, and classification accuracy is monitored during training.<\/p>\n<pre class=\"crayon-plain-tag\"># compile model\r\nmodel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])<\/pre>\n<p>The model is fit for three training epochs and a large batch size of 128 images is used.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model\r\nmodel.fit(trainX, trainY, epochs=3, batch_size=128)<\/pre>\n<p>Once fit, the model is evaluated on the test dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate model\r\n_, acc = model.evaluate(testX, testY, verbose=0)\r\nprint(acc)<\/pre>\n<p>The complete example is listed below and will easily run on the CPU in a few minutes.<\/p>\n<pre class=\"crayon-plain-tag\"># baseline cnn model for the cifar10 problem\r\nfrom keras.datasets.cifar10 import 
load_data\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.layers import BatchNormalization\r\n# load dataset\r\n(trainX, trainY), (testX, testY) = load_data()\r\n# normalize pixel values\r\ntrainX = trainX.astype('float32') \/ 255\r\ntestX = testX.astype('float32') \/ 255\r\n# one hot encode target values\r\ntrainY = to_categorical(trainY)\r\ntestY = to_categorical(testY)\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))\r\nmodel.add(BatchNormalization())\r\nmodel.add(MaxPooling2D((2, 2)))\r\nmodel.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))\r\nmodel.add(BatchNormalization())\r\nmodel.add(MaxPooling2D((2, 2)))\r\nmodel.add(Flatten())\r\nmodel.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))\r\nmodel.add(BatchNormalization())\r\nmodel.add(Dense(10, activation='softmax'))\r\n# compile model\r\nmodel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])\r\n# fit model\r\nhistory = model.fit(trainX, trainY, epochs=3, batch_size=128)\r\n# evaluate model\r\n_, acc = model.evaluate(testX, testY, verbose=0)\r\nprint(acc)<\/pre>\n<p>Running the example shows that the model is capable of learning the problem well and quickly.<\/p>\n<p>A test set accuracy of about 66% is achieved, which is okay, but not terrific. The chosen model configuration has already started to overfit and could benefit from the use of <a href=\"https:\/\/machinelearningmastery.com\/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error\/\">regularization<\/a> and further tuning. 
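<\/p>\n<p>As a quick, hypothetical sketch (not part of the original tutorial), one way to add regularization would be to insert dropout layers into the baseline model. The dropout rates below are assumed starting points to experiment with, not tuned values:<\/p>

```python
# hypothetical: the baseline CNN with added Dropout regularization
# (the tutorial suggests regularization but does not prescribe a scheme)
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import BatchNormalization
from keras.layers import Dropout

def define_regularized_model():
	model = Sequential()
	model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))
	model.add(BatchNormalization())
	model.add(MaxPooling2D((2, 2)))
	# drop 20% of activations after each pooling block (assumed rate)
	model.add(Dropout(0.2))
	model.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))
	model.add(BatchNormalization())
	model.add(MaxPooling2D((2, 2)))
	model.add(Dropout(0.2))
	model.add(Flatten())
	model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
	model.add(BatchNormalization())
	# heavier dropout before the output layer (assumed rate)
	model.add(Dropout(0.5))
	model.add(Dense(10, activation='softmax'))
	# compile model
	model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
	return model
```

<p>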
Nevertheless, this provides a good starting point for demonstrating test-time augmentation.<\/p>\n<pre class=\"crayon-plain-tag\">Epoch 1\/3\r\n50000\/50000 [==============================] - 64s 1ms\/step - loss: 1.2135 - acc: 0.5766\r\nEpoch 2\/3\r\n50000\/50000 [==============================] - 63s 1ms\/step - loss: 0.8498 - acc: 0.7035\r\nEpoch 3\/3\r\n50000\/50000 [==============================] - 63s 1ms\/step - loss: 0.6799 - acc: 0.7632\r\n0.6679<\/pre>\n<p>Neural networks are stochastic algorithms and the same model fit on the same data multiple times may find a different set of weights and, in turn, have different performance each time.<\/p>\n<p>In order to even out the estimate of model performance, we can change the example to re-run the fit and evaluation of the model multiple times and report the mean and standard deviation of the distribution of scores on the test dataset.<\/p>\n<p>First, we can define a function named <em>load_dataset()<\/em> that will load the CIFAR-10 dataset and prepare it for modeling.<\/p>\n<pre class=\"crayon-plain-tag\"># load and return the cifar10 dataset ready for modeling\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = load_data()\r\n\t# normalize pixel values\r\n\ttrainX = trainX.astype('float32') \/ 255\r\n\ttestX = testX.astype('float32') \/ 255\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY<\/pre>\n<p>Next, we can define a function named define_model() that will define a model for the CIFAR-10 dataset, ready to be fit and then evaluated.<\/p>\n<pre class=\"crayon-plain-tag\"># define the cnn model for the cifar10 dataset\r\ndef define_model():\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 
2)))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model<\/pre>\n<p>Next, an <em>evaluate_model()<\/em> function is defined that will fit the defined model on the training dataset and then evaluate it on the test dataset, returning the estimated classification accuracy for the run.<\/p>\n<pre class=\"crayon-plain-tag\"># fit and evaluate a defined model\r\ndef evaluate_model(model, trainX, trainY, testX, testY):\r\n\t# fit model\r\n\tmodel.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)\r\n\t# evaluate model\r\n\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\treturn acc<\/pre>\n<p>Next, we can define a function with new behavior to repeatedly define, fit, and evaluate a new model and return the distribution of accuracy scores.<\/p>\n<p>The <em>repeated_evaluation()<\/em>\u00a0function below implements this, taking the dataset and using a default of 10 repeated evaluations.<\/p>\n<pre class=\"crayon-plain-tag\"># repeatedly evaluate model, return distribution of scores\r\ndef repeated_evaluation(trainX, trainY, testX, testY, repeats=10):\r\n\tscores = list()\r\n\tfor _ in range(repeats):\r\n\t\t# define model\r\n\t\tmodel = define_model()\r\n\t\t# fit and evaluate model\r\n\t\taccuracy = evaluate_model(model, trainX, trainY, testX, testY)\r\n\t\t# store score\r\n\t\tscores.append(accuracy)\r\n\t\tprint('> %.3f' % accuracy)\r\n\treturn scores<\/pre>\n<p>Finally, we can call the <em>load_dataset()<\/em> function to prepare the dataset, then <em>repeated_evaluation()<\/em> to get a distribution of accuracy scores 
that can be summarized by reporting the mean and standard deviation.<\/p>\n<pre class=\"crayon-plain-tag\"># load dataset\r\ntrainX, trainY, testX, testY = load_dataset()\r\n# evaluate model\r\nscores = repeated_evaluation(trainX, trainY, testX, testY)\r\n# summarize result\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Tying all of this together, the complete code example of repeatedly evaluating a CNN model on the CIFAR-10 dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># baseline cnn model for the cifar10 problem, repeated evaluation\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom keras.datasets.cifar10 import load_data\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.layers import BatchNormalization\r\n\r\n# load and return the cifar10 dataset ready for modeling\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = load_data()\r\n\t# normalize pixel values\r\n\ttrainX = trainX.astype('float32') \/ 255\r\n\ttestX = testX.astype('float32') \/ 255\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# define the cnn model for the cifar10 dataset\r\ndef define_model():\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(128, activation='relu', 
kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model\r\n\r\n# fit and evaluate a defined model\r\ndef evaluate_model(model, trainX, trainY, testX, testY):\r\n\t# fit model\r\n\tmodel.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)\r\n\t# evaluate model\r\n\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\treturn acc\r\n\r\n# repeatedly evaluate model, return distribution of scores\r\ndef repeated_evaluation(trainX, trainY, testX, testY, repeats=10):\r\n\tscores = list()\r\n\tfor _ in range(repeats):\r\n\t\t# define model\r\n\t\tmodel = define_model()\r\n\t\t# fit and evaluate model\r\n\t\taccuracy = evaluate_model(model, trainX, trainY, testX, testY)\r\n\t\t# store score\r\n\t\tscores.append(accuracy)\r\n\t\tprint('> %.3f' % accuracy)\r\n\treturn scores\r\n\r\n# load dataset\r\ntrainX, trainY, testX, testY = load_dataset()\r\n# evaluate model\r\nscores = repeated_evaluation(trainX, trainY, testX, testY)\r\n# summarize result\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example may take a while on modern CPU hardware and is much faster on GPU hardware.<\/p>\n<p>The accuracy of the model is reported for each repeated evaluation and the final mean model performance is reported.<\/p>\n<p>In this case, we can see that the mean accuracy of the chosen model configuration is about 68%, which is close to the estimate from a single model run.<\/p>\n<pre class=\"crayon-plain-tag\">> 0.690\r\n> 0.662\r\n> 0.698\r\n> 0.681\r\n> 0.686\r\n> 0.680\r\n> 0.697\r\n> 0.696\r\n> 0.689\r\n> 0.679\r\nAccuracy: 0.686 (0.010)<\/pre>\n<p>Now that we have developed a baseline model for a standard dataset, let\u2019s look at updating the example to use test-time augmentation.<\/p>\n<h2>Example of Test-Time Augmentation<\/h2>\n<p>We can now 
update our repeated evaluation of the CNN model on CIFAR-10 to use test-time augmentation.<\/p>\n<p>The <em>tta_prediction()<\/em> function developed in the section above on how to implement test-time augmentation in Keras can be used directly.<\/p>\n<pre class=\"crayon-plain-tag\"># make a prediction using test-time augmentation\r\ndef tta_prediction(datagen, model, image, n_examples):\r\n\t# convert image into dataset\r\n\tsamples = expand_dims(image, 0)\r\n\t# prepare iterator\r\n\tit = datagen.flow(samples, batch_size=n_examples)\r\n\t# make predictions for each augmented image\r\n\tyhats = model.predict_generator(it, steps=n_examples, verbose=0)\r\n\t# sum across predictions\r\n\tsummed = numpy.sum(yhats, axis=0)\r\n\t# argmax across classes\r\n\treturn argmax(summed)<\/pre>\n<p>We can develop a function that will drive the test-time augmentation by defining the <em>ImageDataGenerator<\/em> configuration and calling <em>tta_prediction()<\/em> for each image in the test dataset.<\/p>\n<p>It is important to consider the types of image augmentations that may benefit a model fit on the CIFAR-10 dataset. Augmentations that cause minor modifications to the photographs might be useful. This might include augmentations such as zooms, shifts, and horizontal flips.<\/p>\n<p>In this example, we will only use horizontal flips.<\/p>\n<pre class=\"crayon-plain-tag\"># configure image data augmentation\r\ndatagen = ImageDataGenerator(horizontal_flip=True)<\/pre>\n<p>We will configure the image generator to generate seven augmented images for each example in the test set, from which an ensemble prediction will be made.<\/p>\n<p>The <em>tta_evaluate_model()<\/em> function below configures the <em>ImageDataGenerator<\/em> then enumerates the test dataset, making a class label prediction for each image. The accuracy is then calculated by comparing the predicted class labels to the class labels in the test dataset. 
This requires that we reverse the one hot encoding performed in <em>load_dataset()<\/em> by using <em>argmax()<\/em>.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a model on a dataset using test-time augmentation\r\ndef tta_evaluate_model(model, testX, testY):\r\n\t# configure image data augmentation\r\n\tdatagen = ImageDataGenerator(horizontal_flip=True)\r\n\t# define the number of augmented images to generate per test set image\r\n\tn_examples_per_image = 7\r\n\tyhats = list()\r\n\tfor i in range(len(testX)):\r\n\t\t# make augmented prediction\r\n\t\tyhat = tta_prediction(datagen, model, testX[i], n_examples_per_image)\r\n\t\t# store for evaluation\r\n\t\tyhats.append(yhat)\r\n\t# calculate accuracy\r\n\ttestY_labels = argmax(testY, axis=1)\r\n\tacc = accuracy_score(testY_labels, yhats)\r\n\treturn acc<\/pre>\n<p>The <em>evaluate_model()<\/em> function can then be updated to call <em>tta_evaluate_model()<\/em> in order to get model accuracy scores.<\/p>\n<pre class=\"crayon-plain-tag\"># fit and evaluate a defined model\r\ndef evaluate_model(model, trainX, trainY, testX, testY):\r\n\t# fit model\r\n\tmodel.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)\r\n\t# evaluate model using tta\r\n\tacc = tta_evaluate_model(model, testX, testY)\r\n\treturn acc<\/pre>\n<p>Tying all of this together, the complete example of the repeated evaluation of a CNN for CIFAR-10 with test-time augmentation is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># cnn model for the cifar10 problem with test-time augmentation\r\nimport numpy\r\nfrom numpy import argmax\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom numpy import expand_dims\r\nfrom sklearn.metrics import accuracy_score\r\nfrom keras.datasets.cifar10 import load_data\r\nfrom keras.utils import to_categorical\r\nfrom keras.preprocessing.image import ImageDataGenerator\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom 
keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.layers import BatchNormalization\r\n\r\n# load and return the cifar10 dataset ready for modeling\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = load_data()\r\n\t# normalize pixel values\r\n\ttrainX = trainX.astype('float32') \/ 255\r\n\ttestX = testX.astype('float32') \/ 255\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# define the cnn model for the cifar10 dataset\r\ndef define_model():\r\n\t# define model\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\tmodel.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model\r\n\r\n# make a prediction using test-time augmentation\r\ndef tta_prediction(datagen, model, image, n_examples):\r\n\t# convert image into dataset\r\n\tsamples = expand_dims(image, 0)\r\n\t# prepare iterator\r\n\tit = datagen.flow(samples, batch_size=n_examples)\r\n\t# make predictions for each augmented image\r\n\tyhats = model.predict_generator(it, steps=n_examples, verbose=0)\r\n\t# sum across predictions\r\n\tsummed = numpy.sum(yhats, axis=0)\r\n\t# argmax across classes\r\n\treturn argmax(summed)\r\n\r\n# evaluate a model on a dataset using test-time augmentation\r\ndef tta_evaluate_model(model, testX, testY):\r\n\t# 
configure image data augmentation\r\n\tdatagen = ImageDataGenerator(horizontal_flip=True)\r\n\t# define the number of augmented images to generate per test set image\r\n\tn_examples_per_image = 7\r\n\tyhats = list()\r\n\tfor i in range(len(testX)):\r\n\t\t# make augmented prediction\r\n\t\tyhat = tta_prediction(datagen, model, testX[i], n_examples_per_image)\r\n\t\t# store for evaluation\r\n\t\tyhats.append(yhat)\r\n\t# calculate accuracy\r\n\ttestY_labels = argmax(testY, axis=1)\r\n\tacc = accuracy_score(testY_labels, yhats)\r\n\treturn acc\r\n\r\n# fit and evaluate a defined model\r\ndef evaluate_model(model, trainX, trainY, testX, testY):\r\n\t# fit model\r\n\tmodel.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)\r\n\t# evaluate model using tta\r\n\tacc = tta_evaluate_model(model, testX, testY)\r\n\treturn acc\r\n\r\n# repeatedly evaluate model, return distribution of scores\r\ndef repeated_evaluation(trainX, trainY, testX, testY, repeats=10):\r\n\tscores = list()\r\n\tfor _ in range(repeats):\r\n\t\t# define model\r\n\t\tmodel = define_model()\r\n\t\t# fit and evaluate model\r\n\t\taccuracy = evaluate_model(model, trainX, trainY, testX, testY)\r\n\t\t# store score\r\n\t\tscores.append(accuracy)\r\n\t\tprint('> %.3f' % accuracy)\r\n\treturn scores\r\n\r\n# load dataset\r\ntrainX, trainY, testX, testY = load_dataset()\r\n# evaluate model\r\nscores = repeated_evaluation(trainX, trainY, testX, testY)\r\n# summarize result\r\nprint('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))<\/pre>\n<p>Running the example may take some time given the repeated evaluation and the slower manual test-time augmentation used to evaluate each model.<\/p>\n<p>In this case, we can see a modest lift in performance from about 68.6% on the test set without test-time augmentation to about 69.8% accuracy on the test set with test-time augmentation.<\/p>\n<pre class=\"crayon-plain-tag\">> 0.719\r\n> 0.716\r\n> 0.709\r\n> 0.694\r\n> 0.690\r\n> 0.694\r\n> 0.680\r\n> 0.676\r\n> 
0.702\r\n> 0.704\r\nAccuracy: 0.698 (0.013)<\/pre>\n<h2>How to Tune Test-Time Augmentation Configuration<\/h2>\n<p>Choosing the augmentation configurations that give the biggest lift in model performance can be challenging.<\/p>\n<p>Not only are there many augmentation methods to choose from and configuration options for each, but the time to fit and evaluate a model on a single set of configuration options can take a long time, even if fit on a fast GPU.<\/p>\n<p>Instead, I recommend fitting the model once and saving it to file. For example:<\/p>\n<pre class=\"crayon-plain-tag\"># save model\r\nmodel.save('model.h5')<\/pre>\n<p>Then load the model from a separate file and evaluate different test-time augmentation schemes on a small validation dataset or a small subset of the test set.<\/p>\n<p>For example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# load model\r\nmodel = load_model('model.h5')\r\n# evaluate model\r\ndatagen = ImageDataGenerator(...)\r\n...<\/pre>\n<p>Once you find a set of augmentation options that give the biggest lift, you can then evaluate the model on the whole test set or trial a repeated evaluation experiment as above.<\/p>\n<p>Test-time augmentation configuration not only includes the options for the <em>ImageDataGenerator<\/em>, but also the number of images generated from which the average prediction will be made for each example in the test set.<\/p>\n<p>I used this approach to choose the test-time augmentation in the previous section, discovering that seven examples worked better than three or five, and that random zooming and random shifts appeared to decrease model accuracy.<\/p>\n<p>Remember, if you also use image data augmentation for the training dataset and that augmentation uses a type of pixel scaling that involves calculating statistics on the dataset (e.g. 
you call <em>datagen.fit()<\/em>), then those same statistics and pixel scaling techniques must also be used during test-time augmentation.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>API<\/h3>\n<ul>\n<li><a href=\"https:\/\/keras.io\/preprocessing\/image\/\">Image Preprocessing Keras API<\/a>.<\/li>\n<li><a href=\"https:\/\/keras.io\/models\/sequential\/\">Keras Sequential Model API<\/a>.<\/li>\n<li><a href=\"https:\/\/docs.scipy.org\/doc\/numpy\/reference\/generated\/numpy.argmax.html\">numpy.argmax API<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.depends-on-the-definition.com\/test-time-augmentation-keras\/\">Image Segmentation With Test Time Augmentation With Keras<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/tsterbak\/keras_tta\">keras_tta, Simple test-time augmentation (TTA) for keras python library<\/a>.<\/li>\n<li><a href=\"https:\/\/github.com\/qubvel\/tta_wrapper\">tta_wrapper, Test Time image Augmentation (TTA) wrapper for Keras model<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered test-time augmentation for improving the performance of models for image classification tasks.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Test-time augmentation is the application of data augmentation techniques normally used during training when making predictions.<\/li>\n<li>How to implement test-time augmentation from scratch in Keras.<\/li>\n<li>How to use test-time augmentation to improve the performance of a convolutional neural network model on a standard image classification task.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/\">How to Use Test-Time Augmentation to 
Improve Model Performance for Image Classification<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Data augmentation is a technique often used to improve performance and reduce generalization error when training neural network models for computer vision [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/14\/how-to-use-test-time-augmentation-to-improve-model-performance-for-image-classification\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2014,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2013"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2013"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2013\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2014
"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2013"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}