{"id":2289,"date":"2019-06-23T19:00:38","date_gmt":"2019-06-23T19:00:38","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/06\/23\/a-gentle-introduction-to-upsampling-and-transpose-convolution-layers-for-gans\/"},"modified":"2019-06-23T19:00:38","modified_gmt":"2019-06-23T19:00:38","slug":"a-gentle-introduction-to-upsampling-and-transpose-convolution-layers-for-gans","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/06\/23\/a-gentle-introduction-to-upsampling-and-transpose-convolution-layers-for-gans\/","title":{"rendered":"A Gentle Introduction to Upsampling and Transpose Convolution Layers for GANs"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Generative Adversarial Networks, or GANs, are an architecture for training generative models, such as deep convolutional neural networks for generating images.<\/p>\n<p>The GAN architecture is comprised of both a generator and a discriminator model. The generator is responsible for creating new outputs, such as images, that plausibly could have come from the original dataset. 
The generator model is typically implemented using a deep convolutional neural network with specialized layers that learn to fill in features in an image rather than extract features from an input image.<\/p>\n<p>Two common types of layers that can be used in the generator model are an upsampling layer that simply doubles the dimensions of the input and the transpose convolutional layer that performs an inverse convolution operation.<\/p>\n<p>In this tutorial, you will discover how to use Upsampling and Transpose Convolutional Layers in Generative Adversarial Networks when generating images.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Generative models in the GAN architecture are required to upsample input data in order to generate an output image.<\/li>\n<li>The Upsampling layer is a simple layer with no weights that will double the dimensions of the input and can be used in a generative model when followed by a traditional convolutional layer.<\/li>\n<li>The Transpose Convolutional layer is an inverse convolutional layer that will both upsample input and learn how to fill in details during the model training process.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_8080\" style=\"width: 649px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8080\" class=\"size-full wp-image-8080\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/06\/A-Gentle-Introduction-to-Upsampling-and-Transpose-Convolution-Layers-for-Generative-Adversarial-Networks.jpg\" alt=\"A Gentle Introduction to Upsampling and Transpose Convolution Layers for Generative Adversarial Networks\" width=\"639\" height=\"426\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/06\/A-Gentle-Introduction-to-Upsampling-and-Transpose-Convolution-Layers-for-Generative-Adversarial-Networks.jpg 639w, 
http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/06\/A-Gentle-Introduction-to-Upsampling-and-Transpose-Convolution-Layers-for-Generative-Adversarial-Networks-300x200.jpg 300w\" sizes=\"(max-width: 639px) 100vw, 639px\"><\/p>\n<p id=\"caption-attachment-8080\" class=\"wp-caption-text\">A Gentle Introduction to Upsampling and Transpose Convolution Layers for Generative Adversarial Networks<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/blmnevada\/29182914095\/\">BLM Nevada<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ul>\n<li>Need for Upsampling in GANs<\/li>\n<li>How to Use the Upsampling Layer<\/li>\n<li>How to Use the Transpose Convolutional Layer<\/li>\n<\/ul>\n<h2>Need for Upsampling in Generative Adversarial Networks<\/h2>\n<p>Generative Adversarial Networks are an architecture for neural networks for training a generative model.<\/p>\n<p>The architecture is comprised of a generator and a discriminator model, both of which are implemented as a deep convolutional neural network. The discriminator is responsible for classifying images as either real (from the domain) or fake (generated). The generator is responsible for generating new plausible examples from the problem domain.<\/p>\n<p>The generator works by taking a random point from the latent space as input and outputting a complete image, in a one-shot manner.<\/p>\n<p>A traditional convolutional neural network for image classification, and related tasks, will use <a href=\"https:\/\/machinelearningmastery.com\/pooling-layers-for-convolutional-neural-networks\/\">pooling layers<\/a> to downsample input images. 
For example, an average pooling or max pooling layer will reduce the feature maps from a convolutional layer by half on each dimension, resulting in an output that is one quarter the area of the input.<\/p>\n<p>Convolutional layers themselves also perform a form of downsampling by applying each filter across the input images or feature maps; the resulting activations are an output feature map that is smaller because of the border effects. Often <a href=\"https:\/\/machinelearningmastery.com\/padding-and-stride-for-convolutional-neural-networks\/\">padding<\/a> is used to counter this effect.<\/p>\n<p>The generator model in a GAN requires the inverse of the pooling operation used in a traditional convolutional neural network. It needs a layer to translate from coarse salient features to a more dense and detailed output.<\/p>\n<p>A simple version of an unpooling or opposite pooling layer is called an upsampling layer. It works by repeating the rows and columns of the input.<\/p>\n<p>A more elaborate approach is to perform a backwards convolutional operation. This was originally referred to as a deconvolution, which is technically incorrect, and is now more commonly referred to as a fractional convolutional layer or a transposed convolutional layer.<\/p>\n<p>Both of these layers can be used in a GAN to perform the required upsampling operation to transform a small input into a large image output.<\/p>\n<p>In the following sections, we will take a closer look at each and develop an intuition for how they work so that we can use them effectively in our GAN models.<\/p>\n<h2>How to Use the Upsampling Layer<\/h2>\n<p>Perhaps the simplest way to upsample an input is to double each row and column.<\/p>\n<p>For example, an input image with the shape 2\u00d72 would be output as 4\u00d74.<\/p>\n<pre class=\"crayon-plain-tag\">         1, 2\r\nInput = (3, 4)\r\n\r\n          1, 1, 2, 2\r\nOutput = (1, 1, 2, 2)\r\n          3, 3, 4, 4\r\n          3, 3, 4, 4<\/pre>\n<\/p>\n<h3>Worked Example Using the UpSampling2D 
Layer<\/h3>\n<p>The Keras deep learning library provides this capability in a layer called <em>UpSampling2D<\/em>.<\/p>\n<p>It can be added to a convolutional neural network and repeats the rows and columns of its input in the output. For example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(UpSampling2D())<\/pre>\n<p>We can demonstrate the behavior of this layer with a simple contrived example.<\/p>\n<p>First, we can define a contrived input image that is 2\u00d72 pixels. We can use specific values for each pixel so that after upsampling, we can see exactly what effect the operation had on the input.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define input data\r\nX = asarray([[1, 2],\r\n\t\t\t [3, 4]])\r\n# show input data for context\r\nprint(X)<\/pre>\n<p>Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# reshape input data into one sample with one channel\r\nX = X.reshape((1, 2, 2, 1))<\/pre>\n<p>We can now define our model.<\/p>\n<p>The model has only the <em>UpSampling2D<\/em> layer, which takes 2\u00d72 grayscale images as input directly and outputs the result of the upsampling operation.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(UpSampling2D(input_shape=(2, 2, 1)))\r\n# summarize the model\r\nmodel.summary()<\/pre>\n<p>We can then use the model to make a prediction, that is, upsample a provided input image.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# make a prediction with the model\r\nyhat = model.predict(X)<\/pre>\n<p>The output will have four dimensions, like the input; therefore, we can reshape it to a 4\u00d74 array to make it easier to review the result.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# reshape output to remove channel to make printing 
easier\r\nyhat = yhat.reshape((4, 4))\r\n# summarize output\r\nprint(yhat)<\/pre>\n<p>Tying all of this together, the complete example of using the <em>UpSampling2D<\/em> layer in Keras is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using the upsampling layer\r\nfrom numpy import asarray\r\nfrom keras.models import Sequential\r\nfrom keras.layers import UpSampling2D\r\n# define input data\r\nX = asarray([[1, 2],\r\n\t\t\t [3, 4]])\r\n# show input data for context\r\nprint(X)\r\n# reshape input data into one sample with one channel\r\nX = X.reshape((1, 2, 2, 1))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(UpSampling2D(input_shape=(2, 2, 1)))\r\n# summarize the model\r\nmodel.summary()\r\n# make a prediction with the model\r\nyhat = model.predict(X)\r\n# reshape output to remove channel to make printing easier\r\nyhat = yhat.reshape((4, 4))\r\n# summarize output\r\nprint(yhat)<\/pre>\n<p>Running the example first creates and prints our 2\u00d72 input data.<\/p>\n<p>Next, the model is summarized. We can see that it will output a 4\u00d74 result as we expect, and importantly, the layer has no parameters or model weights. This is because it is not learning anything; it is just doubling the input.<\/p>\n<p>Finally, the model is used to upsample our input, resulting in a doubling of each row and column for our input data, as we expected.<\/p>\n<pre class=\"crayon-plain-tag\">[[1 2]\r\n [3 4]]\r\n\r\n_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nup_sampling2d_1 (UpSampling2 (None, 4, 4, 1)           0\r\n=================================================================\r\nTotal params: 0\r\nTrainable params: 0\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\n\r\n\r\n[[1. 1. 2. 2.]\r\n [1. 1. 2. 2.]\r\n [3. 3. 4. 4.]\r\n [3. 
3. 4. 4.]]<\/pre>\n<p>By default, the <em>UpSampling2D<\/em> layer will double each input dimension. This is defined by the \u2018<em>size<\/em>\u2019 argument that is set to the tuple (2,2).<\/p>\n<p>You may want to use different factors on each dimension, such as double the height and triple the width. This could be achieved by setting the \u2018<em>size<\/em>\u2019 argument to (2, 3), where the first value is the factor for rows and the second is the factor for columns. The result of applying this operation to a 2\u00d72 image would be an output with 4 rows and 6 columns (e.g. 2\u00d72 and 2\u00d73). For example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# example of using different scale factors for each dimension\r\nmodel.add(UpSampling2D(size=(2, 3)))<\/pre>\n<p>Additionally, by default, the <em>UpSampling2D<\/em> layer will use a nearest neighbor algorithm to fill in the new rows and columns. This has the effect of simply doubling rows and columns, as described, and is specified by the \u2018<em>interpolation<\/em>\u2019 argument set to \u2018<em>nearest<\/em>\u2019.<\/p>\n<p>Alternatively, a bilinear interpolation method can be used, which draws upon multiple surrounding points. This can be specified by setting the \u2018<em>interpolation<\/em>\u2019 argument to \u2018<em>bilinear<\/em>\u2019. For example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# example of using bilinear interpolation when upsampling\r\nmodel.add(UpSampling2D(interpolation='bilinear'))<\/pre>\n<\/p>\n<h3>Simple Generator Model With the UpSampling2D Layer<\/h3>\n<p>The <em>UpSampling2D<\/em> layer is simple and effective, although it does not perform any learning.<\/p>\n<p>It is not able to fill in useful detail in the upsampling operation. 
To be useful in a GAN, each <em>UpSampling2D<\/em> layer must be followed by a <a href=\"https:\/\/machinelearningmastery.com\/convolutional-layers-for-deep-learning-neural-networks\/\">Conv2D layer<\/a> that will learn to interpret the doubled input and be trained to translate it into meaningful detail.<\/p>\n<p>We can demonstrate this with an example.<\/p>\n<p>In this case, our little GAN generator model must produce a 10\u00d710 image and take a 100-element vector from the latent space as input.<\/p>\n<p>First, a Dense fully connected layer can be used to interpret the input vector and create a sufficient number of activations (outputs) that can be reshaped into a low-resolution version of our output image, in this case, 128 versions of a 5\u00d75 image.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = Sequential()\r\n# define input shape, output enough activations for 128 5x5 feature maps\r\nmodel.add(Dense(128 * 5 * 5, input_dim=100))\r\n# reshape vector of activations into 128 5x5 feature maps\r\nmodel.add(Reshape((5, 5, 128)))<\/pre>\n<p>Next, the 5\u00d75 feature maps can be upsampled to 10\u00d710 feature maps.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# double input from 128 5x5 to 128 10x10 feature maps\r\nmodel.add(UpSampling2D())<\/pre>\n<p>Finally, the upsampled feature maps can be interpreted and filled in with hopefully useful detail by a Conv2D layer.<\/p>\n<p>The Conv2D layer has a single feature map as output to create the single image we require.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# fill in detail in the upsampled feature maps\r\nmodel.add(Conv2D(1, (3,3), padding='same'))<\/pre>\n<p>Tying this together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using upsampling in a simple generator model\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Reshape\r\nfrom keras.layers import 
Conv2D\r\n# define model\r\nmodel = Sequential()\r\n# define input shape, output enough activations for 128 5x5 feature maps\r\nmodel.add(Dense(128 * 5 * 5, input_dim=100))\r\n# reshape vector of activations into 128 5x5 feature maps\r\nmodel.add(Reshape((5, 5, 128)))\r\n# double input from 128 5x5 to 128 10x10 feature maps\r\nmodel.add(UpSampling2D())\r\n# fill in detail in the upsampled feature maps and output a single image\r\nmodel.add(Conv2D(1, (3,3), padding='same'))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example creates the model and summarizes the output shape of each layer.<\/p>\n<p>We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5\u00d75.<\/p>\n<p>The widths and heights are doubled to 10\u00d710 by the <em>UpSampling2D<\/em> layer, resulting in 128 feature maps, each with quadruple the area.<\/p>\n<p>Finally, the Conv2D layer processes these feature maps and adds in detail, outputting a single 10\u00d710 image.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\ndense_1 (Dense)              (None, 3200)              323200\r\n_________________________________________________________________\r\nreshape_1 (Reshape)          (None, 5, 5, 128)         0\r\n_________________________________________________________________\r\nup_sampling2d_1 (UpSampling2 (None, 10, 10, 128)       0\r\n_________________________________________________________________\r\nconv2d_1 (Conv2D)            (None, 10, 10, 1)         1153\r\n=================================================================\r\nTotal params: 324,353\r\nTrainable params: 324,353\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<\/p>\n<h2>How to Use the Transpose Convolutional 
Layer<\/h2>\n<p>The transpose convolutional layer is more complex than a simple upsampling layer.<\/p>\n<p>A simple way to think about it is that it both performs the upsample operation and interprets the coarse input data to fill in the detail while it is upsampling. It is like a layer that combines the <em>UpSampling2D<\/em> and Conv2D layers into one layer. This is a crude understanding, but a practical starting point.<\/p>\n<blockquote>\n<p>The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1603.07285\">A Guide To Convolution Arithmetic For Deep Learning<\/a>, 2016.<\/p>\n<p>In fact, the transpose convolutional layer performs an inverse convolution operation.<\/p>\n<p>Specifically, the forward and backward passes of the convolutional layer are reversed.<\/p>\n<blockquote>\n<p>One way to put it is to note that the kernel defines a convolution, but whether it\u2019s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1603.07285\">A Guide To Convolution Arithmetic For Deep Learning<\/a>, 2016.<\/p>\n<p>It is sometimes called a deconvolution or deconvolutional layer and models that use these layers can be referred to as deconvolutional networks, or deconvnets.<\/p>\n<blockquote>\n<p>A deconvnet can be thought of as a convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1311.2901\">Visualizing and Understanding 
Convolutional Networks<\/a>, 2013.<\/p>\n<p>Referring to this operation as a deconvolution is technically incorrect, as a deconvolution is a specific mathematical operation not performed by this layer.<\/p>\n<p>In fact, the traditional <a href=\"https:\/\/machinelearningmastery.com\/convolutional-layers-for-deep-learning-neural-networks\/\">convolutional layer<\/a> does not technically perform a convolution operation; it performs a cross-correlation.<\/p>\n<blockquote>\n<p>The deconvolution layer, to which people commonly refer, first appears in Zeiler\u2019s paper as part of the deconvolutional network but does not have a specific name. [\u2026] It also has many names including (but not limited to) sub-pixel or fractional convolutional layer, transposed convolutional layer, inverse, up or backward convolutional layer.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1609.07009\">Is the deconvolution layer the same as a convolutional layer?<\/a>, 2016.<\/p>\n<p>It is a very flexible layer, although we will focus on its use in generative models for upsampling an input image.<\/p>\n<p>The transpose convolutional layer is much like a normal convolutional layer. It requires that you specify the number of filters and the kernel size of each filter. The key to the layer is the stride.<\/p>\n<p>Typically, the <a href=\"https:\/\/machinelearningmastery.com\/padding-and-stride-for-convolutional-neural-networks\/\">stride of a convolutional layer<\/a> is (1\u00d71); that is, a filter is moved along one pixel horizontally for each read from left to right, then down one pixel for the next row of reads. A stride of 2\u00d72 on a normal convolutional layer has the effect of downsampling the input, much like a <a href=\"https:\/\/machinelearningmastery.com\/pooling-layers-for-convolutional-neural-networks\/\">pooling layer<\/a>. 
In fact, a 2\u00d72 stride can be used instead of a pooling layer in the discriminator model.<\/p>\n<p>The transpose convolutional layer is like an inverse convolutional layer. As such, you would intuitively think that a 2\u00d72 stride would upsample the input instead of downsample, which is exactly what happens.<\/p>\n<p>Stride or strides refers to the manner of a filter scanning across an input in a traditional convolutional layer. Whereas, in a transpose convolutional layer, stride refers to the manner in which outputs in the feature map are laid down.<\/p>\n<p>This effect can be implemented with a normal convolutional layer using a fractional input stride (f), e.g. with a stride of f=1\/2. When inverted, the output stride is set to the numerator of this fraction, e.g. f=2.<\/p>\n<blockquote>\n<p>In a sense, upsampling with factor f is convolution with a fractional input stride of 1\/f. So long as f is integral, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of f.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1411.4038\">Fully Convolutional Networks for Semantic Segmentation<\/a>, 2014.<\/p>\n<p>One way that this effect can be achieved with a normal convolutional layer is by inserting new rows and columns of 0.0 values in the input data.<\/p>\n<blockquote>\n<p>Finally note that it is always possible to emulate a transposed convolution with a direct convolution. 
The disadvantage is that it usually involves adding many columns and rows of zeros to the input \u2026<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1603.07285\">A Guide To Convolution Arithmetic For Deep Learning<\/a>, 2016.<\/p>\n<p>Let\u2019s make this concrete with an example.<\/p>\n<p>Consider an input image with the size 2\u00d72 as follows:<\/p>\n<pre class=\"crayon-plain-tag\">         1, 2\r\nInput = (3, 4)<\/pre>\n<p>Assuming a single filter with a 1\u00d71 kernel and model weights that result in no changes to the inputs when output (e.g. a model weight of 1.0 and a bias of 0.0), a transpose convolutional operation with an output stride of 1\u00d71 will reproduce the input as-is:<\/p>\n<pre class=\"crayon-plain-tag\">          1, 2\r\nOutput = (3, 4)<\/pre>\n<p>With an output stride of (2,2), the 1\u00d71 convolution requires the insertion of additional rows and columns into the input image so that the reads of the operation can be performed. Therefore, the input looks as follows:<\/p>\n<pre class=\"crayon-plain-tag\">         1, 0, 2, 0\r\nInput = (0, 0, 0, 0)\r\n         3, 0, 4, 0\r\n         0, 0, 0, 0<\/pre>\n<p>The model can then read across this input using an output stride of (2,2) and will output a 4\u00d74 image, in this case with no change as our model weights have no effect by design:<\/p>\n<pre class=\"crayon-plain-tag\">          1, 0, 2, 0\r\nOutput = (0, 0, 0, 0)\r\n          3, 0, 4, 0\r\n          0, 0, 0, 0<\/pre>\n<\/p>\n<h3>Worked Example Using the Conv2DTranspose Layer<\/h3>\n<p>Keras provides the transpose convolution capability via the Conv2DTranspose layer.<\/p>\n<p>It can be added to your model directly; for example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Conv2DTranspose(...))<\/pre>\n<p>We can demonstrate the behavior of this layer with a simple contrived example.<\/p>\n<p>First, we can define a contrived input image that is 2\u00d72 pixels, as we did in the previous section. 
We can use specific values for each pixel so that after the transpose convolutional operation, we can see exactly what effect the operation had on the input.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define input data\r\nX = asarray([[1, 2],\r\n\t\t\t [3, 4]])\r\n# show input data for context\r\nprint(X)<\/pre>\n<p>Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# reshape input data into one sample with one channel\r\nX = X.reshape((1, 2, 2, 1))<\/pre>\n<p>We can now define our model.<\/p>\n<p>The model has only the <em>Conv2DTranspose<\/em> layer, which takes 2\u00d72 grayscale images as input directly and outputs the result of the operation.<\/p>\n<p>The Conv2DTranspose layer both upsamples and performs a convolution. As such, we must specify both the number of filters and the size of the filters as we do for Conv2D layers. Additionally, we must specify a stride of (2,2) because the upsampling is achieved by the stride behavior of the convolution on the input.<\/p>\n<p>Specifying a stride of (2,2) has the effect of spacing out the input. 
Specifically, rows and columns of 0.0 values are inserted to achieve the desired stride.<\/p>\n<p>In this example, we will use one filter, with a 1\u00d71 kernel and a stride of 2\u00d72 so that the 2\u00d72 input image is upsampled to 4\u00d74.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Conv2DTranspose(1, (1,1), strides=(2,2), input_shape=(2, 2, 1)))\r\n# summarize the model\r\nmodel.summary()<\/pre>\n<p>To make it clear what the <em>Conv2DTranspose<\/em> layer is doing, we will fix the single weight in the single filter to the value of 1.0 and use a bias value of 0.0.<\/p>\n<p>These weights, along with a kernel size of (1,1), will mean that values in the input will be multiplied by 1 and output as-is, and the 0 values in the new rows and columns added via the stride of 2\u00d72 will be output as 0 (e.g. 1 * 0 in each case).<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define weights so that they do nothing\r\nweights = [asarray([[[[1]]]]), asarray([0])]\r\n# store the weights in the model\r\nmodel.set_weights(weights)<\/pre>\n<p>We can then use the model to make a prediction, that is, upsample a provided input image.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# make a prediction with the model\r\nyhat = model.predict(X)<\/pre>\n<p>The output will have four dimensions, like the input; therefore, we can reshape it to a 4\u00d74 array to make it easier to review the result.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# reshape output to remove channel to make printing easier\r\nyhat = yhat.reshape((4, 4))\r\n# summarize output\r\nprint(yhat)<\/pre>\n<p>Tying all of this together, the complete example of using the <em>Conv2DTranspose<\/em> layer in Keras is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using the transpose convolutional layer\r\nfrom numpy import asarray\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2DTranspose\r\n# define input data\r\nX = 
asarray([[1, 2],\r\n\t\t\t [3, 4]])\r\n# show input data for context\r\nprint(X)\r\n# reshape input data into one sample with one channel\r\nX = X.reshape((1, 2, 2, 1))\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Conv2DTranspose(1, (1,1), strides=(2,2), input_shape=(2, 2, 1)))\r\n# summarize the model\r\nmodel.summary()\r\n# define weights so that they do nothing\r\nweights = [asarray([[[[1]]]]), asarray([0])]\r\n# store the weights in the model\r\nmodel.set_weights(weights)\r\n# make a prediction with the model\r\nyhat = model.predict(X)\r\n# reshape output to remove channel to make printing easier\r\nyhat = yhat.reshape((4, 4))\r\n# summarize output\r\nprint(yhat)<\/pre>\n<p>Running the example first creates and prints our 2\u00d72 input data.<\/p>\n<p>Next, the model is summarized. We can see that it will output a 4\u00d74 result as we expect, and importantly, the layer has two parameters or model weights: one for the single 1\u00d71 filter and one for the bias. Unlike the <em>UpSampling2D<\/em> layer, the <em>Conv2DTranspose<\/em> layer will learn during training and will attempt to fill in detail as part of the upsampling process.<\/p>\n<p>Finally, the model is used to upsample our input. We can see that the calculations of the cells that involve real values as input result in the real value as output (e.g. 1\u00d71, 1\u00d72, etc.). 
We can see that where new rows and columns have been inserted by the stride of 2\u00d72, that their 0.0 values multiplied by the 1.0 values in the single 1\u00d71 filter have resulted in 0 values in the output.<\/p>\n<pre class=\"crayon-plain-tag\">[[1 2]\r\n [3 4]]\r\n\r\n_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_transpose_1 (Conv2DTr (None, 4, 4, 1)           2\r\n=================================================================\r\nTotal params: 2\r\nTrainable params: 2\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\n\r\n\r\n[[1. 0. 2. 0.]\r\n [0. 0. 0. 0.]\r\n [3. 0. 4. 0.]\r\n [0. 0. 0. 0.]]<\/pre>\n<p><strong>Remember<\/strong>: this is a contrived case where we artificially specified the model weights so that we could see the effect of the transpose convolutional operation.<\/p>\n<p>In practice, we will use a large number of filters (e.g. 64 or 128), a larger kernel (e.g. 3\u00d73, 5\u00d75, etc.), and the layer will be initialized with random weights that will learn how to effectively upsample with detail during training.<\/p>\n<p>In fact, you might imagine how different sized kernels will result in different sized outputs, more than doubling the width and height of the input. 
In this case, the \u2018<em>padding<\/em>\u2019 argument of the layer can be set to \u2018<em>same<\/em>\u2019 to force the output to have the desired (doubled) output shape; for example:<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# example of using padding to ensure that the output is only doubled\r\nmodel.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same', input_shape=(2, 2, 1)))<\/pre>\n<\/p>\n<h3>Simple Generator Model With the Conv2DTranspose Layer<\/h3>\n<p>The <em>Conv2DTranspose<\/em> layer is more complex than the <em>UpSampling2D<\/em> layer, but it is also effective when used in GAN models, specifically the generator model.<\/p>\n<p>Either approach can be used, although the <em>Conv2DTranspose<\/em> layer is often preferred, perhaps because of the simpler generator models and possibly better results, although GAN performance and skill are notoriously difficult to quantify.<\/p>\n<p>We can demonstrate using the <em>Conv2DTranspose<\/em> layer in a generator model with another simple example.<\/p>\n<p>In this case, our little GAN generator model must produce a 10\u00d710 image and take a 100-element vector from the latent space as input, as in the previous <em>UpSampling2D<\/em> example.<\/p>\n<p>First, a Dense fully connected layer can be used to interpret the input vector and create a sufficient number of activations (outputs) that can be reshaped into a low-resolution version of our output image, in this case, 128 versions of a 5\u00d75 image.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define model\r\nmodel = Sequential()\r\n# define input shape, output enough activations for 128 5x5 feature maps\r\nmodel.add(Dense(128 * 5 * 5, input_dim=100))\r\n# reshape vector of activations into 128 5x5 feature maps\r\nmodel.add(Reshape((5, 5, 128)))<\/pre>\n<p>Next, the 5\u00d75 feature maps can be upsampled to a 10\u00d710 feature map.<\/p>\n<p>We will use a 3\u00d73 kernel size for the single filter, which will result in a slightly larger than doubled width and 
height in the output feature map (11\u00d711).<\/p>\n<p>Therefore, we will set \u2018<em>padding<\/em>\u2019 to \u2018same\u2019 to ensure the output dimensions are 10\u00d710 as required.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# upsample the 128 5x5 feature maps to a single 10x10 feature map\r\nmodel.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))<\/pre>\n<p>Tying this together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using transpose conv in a simple generator model\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Reshape\r\nfrom keras.layers import Conv2DTranspose\r\n# define model\r\nmodel = Sequential()\r\n# define input shape, output enough activations for 128 5x5 feature maps\r\nmodel.add(Dense(128 * 5 * 5, input_dim=100))\r\n# reshape vector of activations into 128 5x5 feature maps\r\nmodel.add(Reshape((5, 5, 128)))\r\n# upsample the 128 5x5 feature maps to a single 10x10 feature map\r\nmodel.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example creates the model and summarizes the output shape of each layer.<\/p>\n<p>We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5\u00d75.<\/p>\n<p>The widths and heights are doubled to 10\u00d710 by the Conv2DTranspose layer, resulting in a single feature map with quadruple the area.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\ndense_1 (Dense)              (None, 3200)              323200\r\n_________________________________________________________________\r\nreshape_1 (Reshape)          (None, 5, 5, 128)         
0\r\n_________________________________________________________________\r\nconv2d_transpose_1 (Conv2DTr (None, 10, 10, 1)         1153\r\n=================================================================\r\nTotal params: 324,353\r\nTrainable params: 324,353\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Papers<\/h3>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1603.07285\">A Guide To Convolution Arithmetic For Deep Learning<\/a>, 2016.<\/li>\n<li><a href=\"https:\/\/ieeexplore.ieee.org\/document\/5539957\">Deconvolutional Networks<\/a>, 2010.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1609.07009\">Is The Deconvolution Layer The Same As A Convolutional Layer?<\/a>, 2016.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1311.2901\">Visualizing and Understanding Convolutional Networks<\/a>, 2013.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1411.4038\">Fully Convolutional Networks for Semantic Segmentation<\/a>, 2014.<\/li>\n<\/ul>\n<h3>API<\/h3>\n<ul>\n<li><a href=\"https:\/\/keras.io\/layers\/convolutional\/\">Keras Convolutional Layers API<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/github.com\/vdumoulin\/conv_arithmetic\">Convolution Arithmetic Project, GitHub<\/a>.<\/li>\n<li><a href=\"https:\/\/datascience.stackexchange.com\/questions\/6107\/what-are-deconvolutional-layers\">What are deconvolutional layers?, Data Science Stack Exchange<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to use Upsampling and Transpose Convolutional Layers in Generative Adversarial Networks when generating images.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Generative models in the GAN architecture are required to upsample input data in order to generate an output image.<\/li>\n<li>The Upsampling layer is a simple layer with no 
weights that will double the dimensions of input and can be used in a generative model when followed by a traditional convolutional layer.<\/li>\n<li>The Transpose Convolutional layer is an inverse convolutional layer that will both upsample input and learn how to fill in details during the model training process.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/upsampling-and-transpose-convolution-layers-for-generative-adversarial-networks\/\">A Gentle Introduction to Upsampling and Transpose Convolution Layers for GANs<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/upsampling-and-transpose-convolution-layers-for-generative-adversarial-networks\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Generative Adversarial Networks, or GANs, are an architecture for training generative models, such as deep convolutional neural networks for generating images. 
The [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/06\/23\/a-gentle-introduction-to-upsampling-and-transpose-convolution-layers-for-gans\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2290,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2289"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2289"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2289\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2290"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2289"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2289"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2289"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}