{"id":2506,"date":"2019-08-27T19:00:03","date_gmt":"2019-08-27T19:00:03","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/27\/how-to-implement-the-inception-score-is-for-evaluating-gans\/"},"modified":"2019-08-27T19:00:03","modified_gmt":"2019-08-27T19:00:03","slug":"how-to-implement-the-inception-score-is-for-evaluating-gans","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/27\/how-to-implement-the-inception-score-is-for-evaluating-gans\/","title":{"rendered":"How to Implement the Inception Score (IS) for Evaluating GANs"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/what-are-generative-adversarial-networks-gans\/\">Generative Adversarial Networks<\/a>, or GANs for short, are a deep learning neural network architecture for training a generator model to generate synthetic images.<\/p>\n<p>A problem with generative models is that there is no objective way to evaluate the quality of the generated images.<\/p>\n<p>As such, it is common to periodically generate and save images during the model training process and to use subjective human evaluation of the generated images, both to assess their quality and to select a final generator model.<\/p>\n<p>Many attempts have been made to establish an objective measure of generated image quality. 
An early and somewhat widely adopted example of an objective evaluation method for generated images is the Inception Score, or IS.<\/p>\n<p>In this tutorial, you will discover the inception score for evaluating the quality of generated images.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to calculate the inception score and the intuition behind what it measures.<\/li>\n<li>How to implement the inception score in Python with NumPy and the Keras deep learning library.<\/li>\n<li>How to calculate the inception score for small images such as those in the CIFAR-10 dataset.<\/li>\n<\/ul>\n<p>Discover how to develop DCGANs, conditional GANs, Pix2Pix, CycleGANs, and more with Keras <a href=\"https:\/\/machinelearningmastery.com\/generative_adversarial_networks\/\" rel=\"nofollow\">in my new GANs book<\/a>, with 29 step-by-step tutorials and full source code.<\/p>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_8548\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8548\" class=\"size-full wp-image-8548\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/08\/How-to-Implement-the-Inception-Score-IS-From-Scratch-for-Evaluating-Generated-Images.jpg\" alt=\"How to Implement the Inception Score (IS) From Scratch for Evaluating Generated Images\" width=\"640\" height=\"427\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/How-to-Implement-the-Inception-Score-IS-From-Scratch-for-Evaluating-Generated-Images.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/08\/How-to-Implement-the-Inception-Score-IS-From-Scratch-for-Evaluating-Generated-Images-300x200.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-8548\" class=\"wp-caption-text\">How to Implement the Inception Score (IS) From Scratch for Evaluating 
Generated Images<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/alaffat\/34963676170\/\">alfredo affatato<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>What Is the Inception Score?<\/li>\n<li>How to Calculate the Inception Score<\/li>\n<li>How to Implement the Inception Score With NumPy<\/li>\n<li>How to Implement the Inception Score With Keras<\/li>\n<li>Problems With the Inception Score<\/li>\n<\/ol>\n<h2>What Is the Inception Score?<\/h2>\n<p>The Inception Score, or IS for short, is an objective metric for evaluating the quality of generated images, specifically synthetic images output by generative adversarial network models.<\/p>\n<p>The inception score was proposed by <a href=\"https:\/\/ai.google\/research\/people\/106222\">Tim Salimans<\/a>, et al. in their 2016 paper titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1606.03498\">Improved Techniques for Training GANs<\/a>.\u201d<\/p>\n<p>In the paper, the authors use a crowd-sourcing platform (<a href=\"https:\/\/www.mturk.com\/\">Amazon Mechanical Turk<\/a>) to evaluate a large number of GAN-generated images. They developed the inception score as an attempt to remove the need for subjective human evaluation of images.<\/p>\n<p>The authors found that their scores correlated well with the subjective evaluation.<\/p>\n<blockquote>\n<p>As an alternative to human annotators, we propose an automatic method to evaluate samples, which we find to correlate well with human evaluation \u2026<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1606.03498\">Improved Techniques for Training GANs<\/a>, 2016.<\/p>\n<p>The inception score involves using a pre-trained deep learning neural network model for image classification to classify the generated images. 
Specifically, the <a href=\"https:\/\/machinelearningmastery.com\/how-to-implement-major-architecture-innovations-for-convolutional-neural-networks\/\">Inception v3 model<\/a> described by <a href=\"https:\/\/ai.google\/research\/people\/ChristianSzegedy\">Christian Szegedy<\/a>, et al. in their 2015 paper titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1512.00567\">Rethinking the Inception Architecture for Computer Vision<\/a>\u201d is used. The reliance on the inception model gives the inception score its name.<\/p>\n<p>A large number of generated images are classified using the model. Specifically, the probability of each image belonging to each class is predicted. These predictions are then summarized into the inception score.<\/p>\n<p>The score seeks to capture two properties of a collection of generated images:<\/p>\n<ul>\n<li><strong>Image Quality<\/strong>. Do images look like a specific object?<\/li>\n<li><strong>Image Diversity<\/strong>. Is a wide range of objects generated?<\/li>\n<\/ul>\n<p>The inception score has a lowest value of 1.0 and a highest value equal to the number of classes supported by the classification model; in this case, the Inception v3 model supports the 1,000 classes of the <a href=\"https:\/\/machinelearningmastery.com\/introduction-to-the-imagenet-large-scale-visual-recognition-challenge-ilsvrc\/\">ILSVRC 2012 dataset<\/a>, and as such, the highest possible inception score on this dataset is 1,000.<\/p>\n<p>The <a href=\"https:\/\/machinelearningmastery.com\/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification\/\">CIFAR-10 dataset<\/a> is a collection of 60,000 small images divided into 10 classes of objects, of which 50,000 form the training set. 
The paper that introduced the inception score calculated it on the real CIFAR-10 training dataset, achieving a result of 11.24 +\/- 0.12.<\/p>\n<p>Using the GAN model also introduced in their paper, they achieved an inception score of 8.09 +\/- 0.07 when generating synthetic images for this dataset.<\/p>\n<h2>How to Calculate the Inception Score<\/h2>\n<p>The inception score is calculated by first using a pre-trained Inception v3 model to predict the class probabilities for each generated image.<\/p>\n<p>These are conditional probabilities, i.e. the class label conditional on the generated image. Images that are classified strongly as one class over all other classes indicate a high quality. 
As such, the conditional probability distribution of each generated image in the collection should have a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Entropy_(information_theory)\">low entropy<\/a>.<\/p>\n<blockquote>\n<p>Images that contain meaningful objects should have a conditional label distribution p(y|x) with low entropy.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1606.03498\">Improved Techniques for Training GANs<\/a>, 2016.<\/p>\n<p>The entropy is calculated as the negative sum of each observed probability multiplied by the log of that probability. The intuition here is that large probabilities carry less information than small probabilities.<\/p>\n<ul>\n<li>entropy = -sum(p_i * log(p_i))<\/li>\n<\/ul>\n<p>The conditional probability captures our interest in image quality.<\/p>\n<p>To capture our interest in a variety of images, we use the marginal probability. This is the probability distribution over classes for all generated images taken together. We would therefore prefer this marginal probability distribution to have a high entropy.<\/p>\n<blockquote>\n<p>Moreover, we expect the model to generate varied images, so the marginal \u222b p(y|x = G(z))dz should have high entropy.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1606.03498\">Improved Techniques for Training GANs<\/a>, 2016.<\/p>\n<p>These elements are combined by calculating the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Kullback%E2%80%93Leibler_divergence\">Kullback-Leibler divergence<\/a>, or KL divergence (relative entropy), between the conditional and marginal probability distributions.<\/p>\n<p>The divergence between two distributions is written using the \u201c||\u201d operator; we can therefore say we are interested in the KL divergence between C, the conditional distribution, and M, the marginal distribution, or:<\/p>\n<ul>\n<li>KL (C || M)<\/li>\n<\/ul>\n<p>Specifically, we are interested in the average of the KL divergence for all generated 
images.<\/p>\n<blockquote>\n<p>Combining these two requirements, the metric that we propose is: exp(Ex KL(p(y|x)||p(y))).<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1606.03498\">Improved Techniques for Training GANs<\/a>, 2016.<\/p>\n<p>We don\u2019t need to work from the equation alone. Thankfully, the authors of the paper also provide <a href=\"https:\/\/github.com\/openai\/improved-gan\">source code on GitHub<\/a> that includes an <a href=\"https:\/\/github.com\/openai\/improved-gan\/blob\/master\/inception_score\/model.py\">implementation of the inception score<\/a>.<\/p>\n<p>The calculation of the score assumes a large number of images for a range of objects, such as 50,000.<\/p>\n<p>The images are split into 10 groups, e.g. 5,000 images per group, and the inception score is calculated on each group of images, then the average and standard deviation of the scores are reported.<\/p>\n<p>The calculation of the inception score on a group of images involves first using the inception v3 model to calculate the conditional probability for each image (p(y|x)). 
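<\/p>\n<p>As a quick check of the entropy intuition, the short NumPy sketch below (an illustrative addition, not part of the original implementation) compares the entropy of a confident prediction with that of a maximally unsure one:<\/p>

```python
from numpy import asarray, log

# entropy = -sum(p_i * log(p_i)); a tiny epsilon avoids log(0)
def entropy(p, eps=1e-16):
    return -(p * log(p + eps)).sum()

confident = asarray([0.98, 0.01, 0.01])  # looks like one specific object
unsure = asarray([0.333, 0.333, 0.333])  # could be anything

print(entropy(confident))  # low entropy, about 0.11
print(entropy(unsure))     # high entropy, about 1.10 (the log of 3)
```

<p>A confident conditional distribution has low entropy, while an unsure one has close to the maximum possible entropy, which is exactly the property the score exploits.<\/p>\n<p>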
The marginal probability is then calculated as the average of the conditional probabilities for the images in the group (p(y)).<\/p>\n<p>The KL divergence is then calculated for each image as the conditional probability multiplied by the log of the conditional probability minus the log of the marginal probability.<\/p>\n<ul>\n<li>KL divergence = p(y|x) * (log(p(y|x)) \u2013 log(p(y)))<\/li>\n<\/ul>\n<p>The KL divergence is then summed over all classes and averaged over all images, and the exponent of the result is calculated to give the final score.<\/p>\n<p>This defines the official inception score implementation used in most papers that report the score, although variations on how to calculate the score do exist.<\/p>\n<h2>How to Implement the Inception Score With NumPy<\/h2>\n<p>Implementing the calculation of the inception score in Python with NumPy arrays is straightforward.<\/p>\n<p>First, let\u2019s define a function that will take a collection of conditional probabilities and calculate the inception score.<\/p>\n<p>The <em>calculate_inception_score()<\/em> function listed below implements the procedure.<\/p>\n<p>One small change is the introduction of an epsilon (a tiny number close to zero) when calculating the log probabilities to avoid blowing up when trying to calculate the log of a zero probability. This is probably not needed in practice (e.g. 
with real generated images) but is useful here and good practice when working with log probabilities.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate the inception score for p(y|x)\r\ndef calculate_inception_score(p_yx, eps=1E-16):\r\n\t# calculate p(y)\r\n\tp_y = expand_dims(p_yx.mean(axis=0), 0)\r\n\t# kl divergence for each image\r\n\tkl_d = p_yx * (log(p_yx + eps) - log(p_y + eps))\r\n\t# sum over classes\r\n\tsum_kl_d = kl_d.sum(axis=1)\r\n\t# average over images\r\n\tavg_kl_d = mean(sum_kl_d)\r\n\t# undo the logs\r\n\tis_score = exp(avg_kl_d)\r\n\treturn is_score<\/pre>\n<p>We can then test out this function to calculate the inception score for some contrived conditional probabilities.<\/p>\n<p>We can imagine the case of three classes of image and a perfect confident prediction for each class for three images.<\/p>\n<pre class=\"crayon-plain-tag\"># conditional probabilities for high quality images\r\np_yx = asarray([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])<\/pre>\n<p>We would expect the inception score for this case to be 3.0 (or very close to it). 
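<\/p>\n<p>We can also check this expected value by hand. The sketch below (an illustrative addition, not part of the tutorial code) works through the arithmetic for three one-hot predictions spread evenly over three classes:<\/p>

```python
from numpy import log, exp

# With one confident one-hot prediction per class, the marginal p(y)
# assigns roughly one third to each class. Only the predicted class
# contributes to each image's KL divergence.
p_y_predicted = 0.333  # marginal probability of the predicted class
kl_per_image = 1.0 * (log(1.0) - log(p_y_predicted))

# The average KL divergence over the three images is the same value,
# and exponentiating it recovers the score of approximately 3.0
print(exp(kl_per_image))
```

<p>Exponentiating the average KL divergence undoes the logarithm, which is why the best possible score equals the number of classes.<\/p>\n<p>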
This is because we have the same number of images for each image class (one image for each of the three classes) and each conditional probability is maximally confident.<\/p>\n<p>The complete example for calculating the inception score for these probabilities is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate inception score in numpy\r\nfrom numpy import asarray\r\nfrom numpy import expand_dims\r\nfrom numpy import log\r\nfrom numpy import mean\r\nfrom numpy import exp\r\n\r\n# calculate the inception score for p(y|x)\r\ndef calculate_inception_score(p_yx, eps=1E-16):\r\n\t# calculate p(y)\r\n\tp_y = expand_dims(p_yx.mean(axis=0), 0)\r\n\t# kl divergence for each image\r\n\tkl_d = p_yx * (log(p_yx + eps) - log(p_y + eps))\r\n\t# sum over classes\r\n\tsum_kl_d = kl_d.sum(axis=1)\r\n\t# average over images\r\n\tavg_kl_d = mean(sum_kl_d)\r\n\t# undo the logs\r\n\tis_score = exp(avg_kl_d)\r\n\treturn is_score\r\n\r\n# conditional probabilities for high quality images\r\np_yx = asarray([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])\r\nscore = calculate_inception_score(p_yx)\r\nprint(score)<\/pre>\n<p>Running the example gives the expected score of 3.0 (or a number extremely close to it).<\/p>\n<pre class=\"crayon-plain-tag\">2.999999999999999<\/pre>\n<p>We can also try the worst case.<\/p>\n<p>This is where we still have the same number of images for each class (one for each of the three classes), but the objects are unknown, giving a uniform predicted probability distribution across the classes (approximately one third for each class).<\/p>\n<pre class=\"crayon-plain-tag\"># conditional probabilities for low quality images\r\np_yx = asarray([[0.333, 0.333, 0.333], [0.333, 0.333, 0.333], [0.333, 0.333, 0.333]])\r\nscore = calculate_inception_score(p_yx)\r\nprint(score)<\/pre>\n<p>In this case, we would expect the inception score to be the worst possible, where there is no difference between the conditional and marginal distributions, e.g. an inception score of 1.0.<\/p>\n<p>Tying this together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate inception score in numpy\r\nfrom numpy import asarray\r\nfrom numpy import expand_dims\r\nfrom numpy import log\r\nfrom numpy import mean\r\nfrom numpy import exp\r\n\r\n# calculate the inception score for p(y|x)\r\ndef calculate_inception_score(p_yx, eps=1E-16):\r\n\t# calculate p(y)\r\n\tp_y = expand_dims(p_yx.mean(axis=0), 0)\r\n\t# kl divergence for each image\r\n\tkl_d = p_yx * (log(p_yx + eps) - log(p_y + eps))\r\n\t# sum over classes\r\n\tsum_kl_d = kl_d.sum(axis=1)\r\n\t# average over images\r\n\tavg_kl_d = mean(sum_kl_d)\r\n\t# undo the logs\r\n\tis_score = exp(avg_kl_d)\r\n\treturn is_score\r\n\r\n# conditional probabilities for low quality images\r\np_yx = asarray([[0.333, 0.333, 0.333], [0.333, 0.333, 0.333], [0.333, 0.333, 0.333]])\r\nscore = calculate_inception_score(p_yx)\r\nprint(score)<\/pre>\n<p>Running the example reports the expected inception score of 1.0.<\/p>\n<pre class=\"crayon-plain-tag\">1.0<\/pre>\n<p>You may want to experiment with the calculation of the inception score and test other pathological cases.<\/p>\n<h2>How to Implement the Inception Score With Keras<\/h2>\n<p>Now that we know how to calculate the inception score and how to implement it in Python, we can develop an implementation in Keras.<\/p>\n<p>This involves using the real Inception v3 model to classify images and to average the calculation of the score across multiple splits of a collection of images.<\/p>\n<p>First, we can load the Inception v3 model in Keras directly.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# load inception v3 model\r\nmodel = InceptionV3()<\/pre>\n<p>The model expects images to be color and to have the shape 299\u00d7299 pixels.<\/p>\n<p>Additionally, the pixel values must be scaled in the same way as the training data images before they can be classified.<\/p>\n<p>This can be achieved by converting the pixel values from 
integers to floating point values and then calling the <em>preprocess_input()<\/em> function for the images.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# convert from uint8 to float32\r\nprocessed = images.astype('float32')\r\n# pre-process raw images for inception v3 model\r\nprocessed = preprocess_input(processed)<\/pre>\n<p>Then the conditional probabilities for each of the 1,000 image classes can be predicted for the images.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# predict class probabilities for images\r\nyhat = model.predict(processed)<\/pre>\n<p>The inception score can then be calculated directly on the NumPy array of probabilities as we did in the previous section.<\/p>\n<p>Before we do that, we must split the conditional probabilities into groups, controlled by an <em>n_split<\/em> argument that defaults to 10, as was used in the original paper.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nn_part = floor(images.shape[0] \/ n_split)<\/pre>\n<p>We can then enumerate over the conditional probabilities in blocks of <em>n_part<\/em> images or predictions and calculate the inception score.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# retrieve p(y|x)\r\nix_start, ix_end = i * n_part, (i+1) * n_part\r\np_yx = yhat[ix_start:ix_end]<\/pre>\n<p>After calculating the scores for each split of conditional probabilities, we can calculate and return the average and standard deviation inception scores.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# average across images\r\nis_avg, is_std = mean(scores), std(scores)<\/pre>\n<p>Tying all of this together, the <em>calculate_inception_score()<\/em> function below takes an array of images with the expected size and pixel values in [0,255] and calculates the average and standard deviation inception scores using the inception v3 model in Keras.<\/p>\n<pre class=\"crayon-plain-tag\"># assumes images have the shape 299x299x3, pixels in [0,255]\r\ndef calculate_inception_score(images, n_split=10, eps=1E-16):\r\n\t# 
load inception v3 model\r\n\tmodel = InceptionV3()\r\n\t# convert from uint8 to float32\r\n\tprocessed = images.astype('float32')\r\n\t# pre-process raw images for inception v3 model\r\n\tprocessed = preprocess_input(processed)\r\n\t# predict class probabilities for images\r\n\tyhat = model.predict(processed)\r\n\t# enumerate splits of images\/predictions\r\n\tscores = list()\r\n\tn_part = floor(images.shape[0] \/ n_split)\r\n\tfor i in range(n_split):\r\n\t\t# retrieve p(y|x)\r\n\t\tix_start, ix_end = i * n_part, i * n_part + n_part\r\n\t\tp_yx = yhat[ix_start:ix_end]\r\n\t\t# calculate p(y)\r\n\t\tp_y = expand_dims(p_yx.mean(axis=0), 0)\r\n\t\t# calculate KL divergence using log probabilities\r\n\t\tkl_d = p_yx * (log(p_yx + eps) - log(p_y + eps))\r\n\t\t# sum over classes\r\n\t\tsum_kl_d = kl_d.sum(axis=1)\r\n\t\t# average over images\r\n\t\tavg_kl_d = mean(sum_kl_d)\r\n\t\t# undo the log\r\n\t\tis_score = exp(avg_kl_d)\r\n\t\t# store\r\n\t\tscores.append(is_score)\r\n\t# average across images\r\n\tis_avg, is_std = mean(scores), std(scores)\r\n\treturn is_avg, is_std<\/pre>\n<p>We can test this function with 50 artificial images with the value 1.0 for all pixels.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# pretend to load images\r\nimages = ones((50, 299, 299, 3))\r\nprint('loaded', images.shape)<\/pre>\n<p>This will calculate the score for each group of five images, and because the images are identical and meaningless, we expect an average inception score of 1.0 to be reported.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate inception score with Keras\r\nfrom math import floor\r\nfrom numpy import ones\r\nfrom numpy import expand_dims\r\nfrom numpy import log\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom numpy import exp\r\nfrom keras.applications.inception_v3 import InceptionV3\r\nfrom keras.applications.inception_v3 import preprocess_input\r\n\r\n# assumes images have the shape 299x299x3, pixels in [0,255]\r\ndef 
calculate_inception_score(images, n_split=10, eps=1E-16):\r\n\t# load inception v3 model\r\n\tmodel = InceptionV3()\r\n\t# convert from uint8 to float32\r\n\tprocessed = images.astype('float32')\r\n\t# pre-process raw images for inception v3 model\r\n\tprocessed = preprocess_input(processed)\r\n\t# predict class probabilities for images\r\n\tyhat = model.predict(processed)\r\n\t# enumerate splits of images\/predictions\r\n\tscores = list()\r\n\tn_part = floor(images.shape[0] \/ n_split)\r\n\tfor i in range(n_split):\r\n\t\t# retrieve p(y|x)\r\n\t\tix_start, ix_end = i * n_part, i * n_part + n_part\r\n\t\tp_yx = yhat[ix_start:ix_end]\r\n\t\t# calculate p(y)\r\n\t\tp_y = expand_dims(p_yx.mean(axis=0), 0)\r\n\t\t# calculate KL divergence using log probabilities\r\n\t\tkl_d = p_yx * (log(p_yx + eps) - log(p_y + eps))\r\n\t\t# sum over classes\r\n\t\tsum_kl_d = kl_d.sum(axis=1)\r\n\t\t# average over images\r\n\t\tavg_kl_d = mean(sum_kl_d)\r\n\t\t# undo the log\r\n\t\tis_score = exp(avg_kl_d)\r\n\t\t# store\r\n\t\tscores.append(is_score)\r\n\t# average across images\r\n\tis_avg, is_std = mean(scores), std(scores)\r\n\treturn is_avg, is_std\r\n\r\n# pretend to load images\r\nimages = ones((50, 299, 299, 3))\r\nprint('loaded', images.shape)\r\n# calculate inception score\r\nis_avg, is_std = calculate_inception_score(images)\r\nprint('score', is_avg, is_std)<\/pre>\n<p>Running the example first defines the 50 fake images, then calculates the inception score on each batch and reports the expected inception score of 1.0, with a standard deviation of 0.0.<\/p>\n<p><strong>Note<\/strong>: the first time the InceptionV3 model is used, Keras will download the model weights and save them into the <em>~\/.keras\/models\/<\/em> directory on your workstation. 
The weights are about 100 megabytes and may take a moment to download depending on the speed of your internet connection.<\/p>\n<pre class=\"crayon-plain-tag\">loaded (50, 299, 299, 3)\r\nscore 1.0 0.0<\/pre>\n<p>We can test the calculation of the inception score on some real images.<\/p>\n<p>The Keras API provides access to the <a href=\"https:\/\/machinelearningmastery.com\/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification\/\">CIFAR-10 dataset<\/a>.<\/p>\n<p>These are color photos with the small size of 32\u00d732 pixels. First, we can split the images into groups, then upsample the images to the expected size of 299\u00d7299, preprocess the pixel values, predict the class probabilities, then calculate the inception score.<\/p>\n<p>This will be a useful example if you intend to calculate the inception score on your own generated images, as you may have to either scale the images to the expected size for the inception v3 model or change the model to perform the upsampling for you.<\/p>\n<p>First, the images can be loaded and shuffled to ensure each split covers a diverse set of classes.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# load cifar10 images\r\n(images, _), (_, _) = cifar10.load_data()\r\n# shuffle images\r\nshuffle(images)<\/pre>\n<p>Next, we need a way to scale the images.<\/p>\n<p>We will use the <a href=\"https:\/\/scikit-image.org\/\">scikit-image library<\/a> to resize the NumPy array of pixel values to the required size. The <em>scale_images()<\/em> function below implements this.<\/p>\n<pre class=\"crayon-plain-tag\"># scale an array of images to a new size\r\ndef scale_images(images, new_shape):\r\n\timages_list = list()\r\n\tfor image in images:\r\n\t\t# resize with nearest neighbor interpolation\r\n\t\tnew_image = resize(image, new_shape, 0)\r\n\t\t# store\r\n\t\timages_list.append(new_image)\r\n\treturn asarray(images_list)<\/pre>\n<p>Note, you may have to install the scikit-image library if it is not already installed. 
This can be achieved as follows:<\/p>\n<pre class=\"crayon-plain-tag\">sudo pip install scikit-image<\/pre>\n<p>We can then enumerate the number of splits, select a subset of the images, scale them, pre-process them, and use the model to predict the conditional class probabilities.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# retrieve images\r\nix_start, ix_end = i * n_part, (i+1) * n_part\r\nsubset = images[ix_start:ix_end]\r\n# convert from uint8 to float32\r\nsubset = subset.astype('float32')\r\n# scale images to the required size\r\nsubset = scale_images(subset, (299,299,3))\r\n# pre-process images, scale to [-1,1]\r\nsubset = preprocess_input(subset)\r\n# predict p(y|x)\r\np_yx = model.predict(subset)<\/pre>\n<p>The rest of the calculation of the inception score is the same.<\/p>\n<p>Tying this all together, the complete example for calculating the inception score on the real CIFAR-10 training dataset is listed below.<\/p>\n<p>Based on the similar calculation reported in the original inception score paper, we would expect the reported score on this dataset to be approximately 11. 
Interestingly, the <a href=\"https:\/\/paperswithcode.com\/sota\/image-generation-on-cifar-10\">best inception score for CIFAR-10<\/a> with generated images is about 8.8 at the time of writing using a progressive growing GAN.<\/p>\n<pre class=\"crayon-plain-tag\"># calculate inception score for cifar-10 in Keras\r\nfrom math import floor\r\nfrom numpy import ones\r\nfrom numpy import expand_dims\r\nfrom numpy import log\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom numpy import exp\r\nfrom numpy.random import shuffle\r\nfrom keras.applications.inception_v3 import InceptionV3\r\nfrom keras.applications.inception_v3 import preprocess_input\r\nfrom keras.datasets import cifar10\r\nfrom skimage.transform import resize\r\nfrom numpy import asarray\r\n\r\n# scale an array of images to a new size\r\ndef scale_images(images, new_shape):\r\n\timages_list = list()\r\n\tfor image in images:\r\n\t\t# resize with nearest neighbor interpolation\r\n\t\tnew_image = resize(image, new_shape, 0)\r\n\t\t# store\r\n\t\timages_list.append(new_image)\r\n\treturn asarray(images_list)\r\n\r\n# assumes images have any shape and pixels in [0,255]\r\ndef calculate_inception_score(images, n_split=10, eps=1E-16):\r\n\t# load inception v3 model\r\n\tmodel = InceptionV3()\r\n\t# enumerate splits of images\/predictions\r\n\tscores = list()\r\n\tn_part = floor(images.shape[0] \/ n_split)\r\n\tfor i in range(n_split):\r\n\t\t# retrieve images\r\n\t\tix_start, ix_end = i * n_part, (i+1) * n_part\r\n\t\tsubset = images[ix_start:ix_end]\r\n\t\t# convert from uint8 to float32\r\n\t\tsubset = subset.astype('float32')\r\n\t\t# scale images to the required size\r\n\t\tsubset = scale_images(subset, (299,299,3))\r\n\t\t# pre-process images, scale to [-1,1]\r\n\t\tsubset = preprocess_input(subset)\r\n\t\t# predict p(y|x)\r\n\t\tp_yx = model.predict(subset)\r\n\t\t# calculate p(y)\r\n\t\tp_y = expand_dims(p_yx.mean(axis=0), 0)\r\n\t\t# calculate KL divergence using log probabilities\r\n\t\tkl_d = 
p_yx * (log(p_yx + eps) - log(p_y + eps))\r\n\t\t# sum over classes\r\n\t\tsum_kl_d = kl_d.sum(axis=1)\r\n\t\t# average over images\r\n\t\tavg_kl_d = mean(sum_kl_d)\r\n\t\t# undo the log\r\n\t\tis_score = exp(avg_kl_d)\r\n\t\t# store\r\n\t\tscores.append(is_score)\r\n\t# average across images\r\n\tis_avg, is_std = mean(scores), std(scores)\r\n\treturn is_avg, is_std\r\n\r\n# load cifar10 images\r\n(images, _), (_, _) = cifar10.load_data()\r\n# shuffle images\r\nshuffle(images)\r\nprint('loaded', images.shape)\r\n# calculate inception score\r\nis_avg, is_std = calculate_inception_score(images)\r\nprint('score', is_avg, is_std)<\/pre>\n<p>Running the example loads the dataset, prepares the model, and calculates the inception score on the CIFAR-10 training dataset.<\/p>\n<p>We can see that the score is 11.3, which is close to the expected score of 11.24.<\/p>\n<p><strong>Note<\/strong>: the first time that the CIFAR-10 dataset is used, Keras will download the images in a compressed format and store them in the <em>~\/.keras\/datasets\/<\/em> directory. The download is about 161 megabytes and may take a few minutes based on the speed of your internet connection.<\/p>\n<pre class=\"crayon-plain-tag\">loaded (50000, 32, 32, 3)\r\nscore 11.317895 0.14821531<\/pre>\n<\/p>\n<h2>Problems With the Inception Score<\/h2>\n<p>The inception score is effective, but it is not perfect.<\/p>\n<p>Generally, the inception score is appropriate for generated images of objects known to the model used to calculate the conditional class probabilities.<\/p>\n<p>In this case, because the inception v3 model is used, this means that it is most suitable for 1,000 object types used in the <a href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2012\/\">ILSVRC 2012 dataset<\/a>. 
This is a lot of classes, but it does not cover all objects that may interest us.<\/p>\n<p>You can see a full list of the classes here:<\/p>\n<ul>\n<li><a href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2012\/browse-synsets\">1,000 Object Classes of the ILSVRC 2012 dataset<\/a>.<\/li>\n<\/ul>\n<p>It also requires that the images are square and have the relatively small size of about 300\u00d7300 pixels, including any scaling required to get your generated images to that size.<\/p>\n<p>A good score also requires having a good distribution of generated images across the possible objects supported by the model, and a roughly even number of examples for each class. This can be hard to control for many GAN models that don\u2019t offer controls over the types of objects generated.<\/p>\n<p><a href=\"http:\/\/web.stanford.edu\/~sbarratt\/\">Shane Barratt<\/a> and Rishi Sharma take a closer look at the inception score and list a number of technical issues and edge cases in their 2018 paper titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1801.01973\">A Note on the Inception Score<\/a>.\u201d This is a good reference if you wish to dive deeper.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1606.03498\">Improved Techniques for Training GANs<\/a>, 2016.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1801.01973\">A Note on the Inception Score<\/a>, 2018.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1512.00567\">Rethinking the Inception Architecture for Computer Vision<\/a>, 2015.<\/li>\n<\/ul>\n<h3>Projects<\/h3>\n<ul>\n<li><a href=\"https:\/\/github.com\/openai\/improved-gan\">Code for the paper \u201cImproved Techniques for Training GANs\u201d<\/a><\/li>\n<li><a href=\"http:\/\/image-net.org\/challenges\/LSVRC\/2012\/\">Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)<\/a><\/li>\n<\/ul>\n<h3>API<\/h3>\n<ul>\n<li><a 
href=\"https:\/\/keras.io\/applications\/#inceptionv3\">Keras Inception v3 Model<\/a><\/li>\n<li><a href=\"https:\/\/scikit-image.org\/\">scikit-image Library<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/paperswithcode.com\/sota\/image-generation-on-cifar-10\">Image Generation on CIFAR-10<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/openai\/improved-gan\/issues\/29\">Inception Score calculation<\/a>, 2017.<\/li>\n<li><a href=\"https:\/\/medium.com\/octavian-ai\/a-simple-explanation-of-the-inception-score-372dff6a8c7a\">A simple explanation of the Inception Score<\/a><\/li>\n<li><a href=\"https:\/\/sudomake.ai\/inception-score-explained\/\">Inception Score \u2014 evaluating the realism of your GAN<\/a>, 2018.<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Kullback%E2%80%93Leibler_divergence\">Kullback\u2013Leibler divergence, Wikipedia.<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Entropy_(information_theory)\">Entropy (information theory), Wikipedia.<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the inception score for evaluating the quality of generated images.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to calculate the inception score and the intuition behind what it measures.<\/li>\n<li>How to implement the inception score in Python with NumPy and the Keras deep learning library.<\/li>\n<li>How to calculate the inception score for small images such as those in the CIFAR-10 dataset.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/how-to-implement-the-inception-score-from-scratch-for-evaluating-generated-images\/\">How to Implement the Inception Score (IS) for Evaluating GANs<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning 
Mastery<\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Generative Adversarial Networks, or GANs for short, is a deep learning neural network architecture for training a generator model for generating synthetic [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/27\/how-to-implement-the-inception-score-is-for-evaluating-gans\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2507,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2506"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2506"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2506\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2507"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2506"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.c
om\/index.php\/wp-json\/wp\/v2\/categories?post=2506"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}