{"id":2031,"date":"2019-04-18T19:00:46","date_gmt":"2019-04-18T19:00:46","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/18\/a-gentle-introduction-to-padding-and-stride-for-convolutional-neural-networks\/"},"modified":"2019-04-18T19:00:46","modified_gmt":"2019-04-18T19:00:46","slug":"a-gentle-introduction-to-padding-and-stride-for-convolutional-neural-networks","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/18\/a-gentle-introduction-to-padding-and-stride-for-convolutional-neural-networks\/","title":{"rendered":"A Gentle Introduction to Padding and Stride for Convolutional Neural Networks"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>The convolutional layer in convolutional neural networks systematically applies filters to an input and creates output feature maps.<\/p>\n<p>Although the convolutional layer is very simple, it is capable of achieving sophisticated and impressive results. Nevertheless, it can be challenging to develop an intuition for how the shape of the filters impacts the shape of the output feature map and how related configuration hyperparameters such as padding and stride should be configured.<\/p>\n<p>In this tutorial, you will discover an intuition for filter size, the need for padding, and stride in convolutional neural networks.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How filter size or kernel size impacts the shape of the output feature map.<\/li>\n<li>How the filter size creates a border effect in the feature map and how it can be overcome with padding.<\/li>\n<li>How the stride of the filter on the input image can be used to downsample the size of the output feature map.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_7451\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7451\" class=\"size-full wp-image-7451\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/04\/A-Gentle-Introduction-to-Padding-and-Stride-for-Convolutional-Neural-Networks.jpg\" alt=\"A Gentle Introduction to Padding and Stride for Convolutional Neural Networks\" width=\"640\" height=\"426\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/04\/A-Gentle-Introduction-to-Padding-and-Stride-for-Convolutional-Neural-Networks.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/04\/A-Gentle-Introduction-to-Padding-and-Stride-for-Convolutional-Neural-Networks-300x200.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-7451\" class=\"wp-caption-text\">A Gentle Introduction to Padding and Stride for Convolutional Neural Networks<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/rainriver\/4138418274\/\">Red~Star<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>Convolutional Layer<\/li>\n<li>Problem of Border Effects<\/li>\n<li>Effect of Filter Size (Kernel Size)<\/li>\n<li>Fix the Border Effect Problem With Padding<\/li>\n<li>Downsample Input With Stride<\/li>\n<\/ol>\n<h2>Convolutional Layer<\/h2>\n<p>In a convolutional neural network, a convolutional layer is responsible for the systematic application of one or more filters to an input.<\/p>\n<p>The multiplication of the filter to the input image results in a single output. The input is typically three-dimensional images (e.g. rows, columns and channels), and in turn, the filters are also three-dimensional with the same number of channels and fewer rows and columns than the input image. As such, the filter is repeatedly applied to each part of the input image, resulting in a two-dimensional output map of activations, called a feature map.<\/p>\n<p>Keras provides an implementation of the convolutional layer called a Conv2D.<\/p>\n<p>It requires that you specify the expected shape of the input images in terms of rows (height), columns (width), and channels (depth) or <em>[rows, columns, channels]<\/em>.<\/p>\n<p>The filter contains the weights that must be learned during the training of the layer. The filter weights represent the structure or feature that the filter will detect and the strength of the activation indicates the degree to which the feature was detected.<\/p>\n<p>The layer requires that both the number of filters be specified and that the shape of the filters be specified.<\/p>\n<p>We can demonstrate this with a small example. 
In this example, we define a single input image or sample that has one channel and is an eight pixel by eight pixel square with all 0 values and a two-pixel wide vertical line in the center.<\/p>\n<pre class=\"crayon-plain-tag\"># define input data\r\ndata = [[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0]]\r\ndata = asarray(data)\r\ndata = data.reshape(1, 8, 8, 1)<\/pre>\n<p>Next, we can define a model that expects input samples to have the shape (8, 8, 1) and has a single hidden convolutional layer with a single filter with the shape of three pixels by three pixels.<\/p>\n<pre class=\"crayon-plain-tag\"># create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>The filter is initialized with random weights as part of the initialization of the model. We will overwrite the random weights and hard code our own 3\u00d73 filter that will detect vertical lines.<\/p>\n<p>That is, the filter will strongly activate when it detects a vertical line and weakly activate when it does not. 
We expect that by applying this filter across the input image, the output feature map will show that the vertical line was detected.<\/p>\n<pre class=\"crayon-plain-tag\"># define a vertical line detector\r\ndetector = [[[[0]],[[1]],[[0]]],\r\n            [[[0]],[[1]],[[0]]],\r\n            [[[0]],[[1]],[[0]]]]\r\nweights = [asarray(detector), asarray([0.0])]\r\n# store the weights in the model\r\nmodel.set_weights(weights)<\/pre>\n<p>Next, we can apply the filter to our input image by calling the <em>predict()<\/em> function on the model.<\/p>\n<pre class=\"crayon-plain-tag\"># apply filter to input data\r\nyhat = model.predict(data)<\/pre>\n<p>The result is a four-dimensional output with one batch, a given number of rows and columns, and one filter, or <em>[batch, rows, columns, filters]<\/em>.<\/p>\n<p>We can print the activations in the single feature map to confirm that the line was detected.<\/p>\n<pre class=\"crayon-plain-tag\"># enumerate rows\r\nfor r in range(yhat.shape[1]):\r\n\t# print each column in the row\r\n\tprint([yhat[0,r,c,0] for c in range(yhat.shape[2])])<\/pre>\n<p>Tying all of this together, the complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using a single convolutional layer\r\nfrom numpy import asarray\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# define input data\r\ndata = [[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0]]\r\ndata = asarray(data)\r\ndata = data.reshape(1, 8, 8, 1)\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()\r\n# define a vertical line detector\r\ndetector = [[[[0]],[[1]],[[0]]],\r\n            [[[0]],[[1]],[[0]]],\r\n            
[[[0]],[[1]],[[0]]]]\r\nweights = [asarray(detector), asarray([0.0])]\r\n# store the weights in the model\r\nmodel.set_weights(weights)\r\n# apply filter to input data\r\nyhat = model.predict(data)\r\n# enumerate rows\r\nfor r in range(yhat.shape[1]):\r\n\t# print each column in the row\r\n\tprint([yhat[0,r,c,0] for c in range(yhat.shape[2])])<\/pre>\n<p>Running the example first summarizes the structure of the model.<\/p>\n<p>Of note is that the single hidden convolutional layer will take the 8\u00d78 pixel input image and will produce a feature map with the dimensions of 6\u00d76. We will go into why this is the case in the next section.<\/p>\n<p>We can also see that the layer has 10 parameters, that is nine weights for the filter (3\u00d73) and one weight for the bias.<\/p>\n<p>Finally, the feature map is printed. We can see from reviewing the numbers in the 6\u00d76 matrix that indeed the manually specified filter detected the vertical line in the middle of our input image.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 6, 6, 1)           10\r\n=================================================================\r\nTotal params: 10\r\nTrainable params: 10\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\n\r\n\r\n[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]\r\n[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]\r\n[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]\r\n[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]\r\n[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]\r\n[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]<\/pre>\n<\/p>\n<h2>Problem of Border Effects<\/h2>\n<p>In the previous section, we defined a single filter with the size of three pixels high and three pixels wide (rows, columns).<\/p>\n<p>We saw that the application of the 3\u00d73 filter, referred to as the kernel size in Keras, to 
the 8\u00d78 input image resulted in a feature map with the size of 6\u00d76.<\/p>\n<p>That is, the input image with 64 pixels was reduced to a feature map with 36 pixels. Where did the other 28 pixels go?<\/p>\n<p>The filter is applied systematically to the input image. It starts at the top left corner of the image and is moved from left to right one pixel column at a time until the edge of the filter reaches the edge of the image.<\/p>\n<p>For a 3\u00d73 pixel filter applied to an 8\u00d78 input image, we can see that it can only be applied six times, resulting in a width of six in the output feature map.<\/p>\n<p>For example, let\u2019s work through each of the six patches of the input image (left), taking the dot product (\u201c.\u201d operator) with the filter (right):<\/p>\n<pre class=\"crayon-plain-tag\">0, 0, 0   0, 1, 0\r\n0, 0, 0 . 0, 1, 0 = 0\r\n0, 0, 0   0, 1, 0<\/pre>\n<p>Moved right one pixel:<\/p>\n<pre class=\"crayon-plain-tag\">0, 0, 1   0, 1, 0\r\n0, 0, 1 . 0, 1, 0 = 0\r\n0, 0, 1   0, 1, 0<\/pre>\n<p>Moved right one pixel:<\/p>\n<pre class=\"crayon-plain-tag\">0, 1, 1   0, 1, 0\r\n0, 1, 1 . 0, 1, 0 = 3\r\n0, 1, 1   0, 1, 0<\/pre>\n<p>Moved right one pixel:<\/p>\n<pre class=\"crayon-plain-tag\">1, 1, 0   0, 1, 0\r\n1, 1, 0 . 0, 1, 0 = 3\r\n1, 1, 0   0, 1, 0<\/pre>\n<p>Moved right one pixel:<\/p>\n<pre class=\"crayon-plain-tag\">1, 0, 0   0, 1, 0\r\n1, 0, 0 . 0, 1, 0 = 0\r\n1, 0, 0   0, 1, 0<\/pre>\n<p>Moved right one pixel:<\/p>\n<pre class=\"crayon-plain-tag\">0, 0, 0   0, 1, 0\r\n0, 0, 0 . 0, 1, 0 = 0\r\n0, 0, 0   0, 1, 0<\/pre>\n<p>That gives us the first complete row of the output feature map:<\/p>\n<pre class=\"crayon-plain-tag\">0.0, 0.0, 3.0, 3.0, 0.0, 0.0<\/pre>\n<p>The reduction in the size of the input to the feature map is referred to as the border effect. It is caused by the interaction of the filter with the border of the image.<\/p>\n<p>This is often not a problem for large images and small filters but can be a problem with small images. 
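The patch-by-patch computation above can be reproduced in a few lines of plain NumPy (a minimal sketch of "valid" cross-correlation for illustration, not the actual Keras implementation; the image and detector values mirror the worked example):

```python
import numpy as np

# 8x8 input with a two-pixel vertical line, as in the worked example
image = np.zeros((8, 8))
image[:, 3:5] = 1

# 3x3 vertical line detector
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

def valid_correlate(img, k):
    # slide the kernel over every position where it fully fits inside the image
    kh, kw = k.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(img[r:r+kh, c:c+kw] * k)
    return out

fmap = valid_correlate(image, kernel)
print(fmap.shape)       # (6, 6)
print(fmap[0].tolist()) # [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
```

The first output row matches the six dot products worked through above, and the 6x6 shape shows the two lost pixels per dimension.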
It can also become a problem once a number of convolutional layers are stacked.<\/p>\n<p>For example, below is the same model updated to have two stacked convolutional layers.<\/p>\n<p>This means that a 3\u00d73 filter is applied to the 8\u00d78 input image to result in a 6\u00d76 feature map as in the previous section. A 3\u00d73 filter is then applied to the 6\u00d76 feature map.<\/p>\n<pre class=\"crayon-plain-tag\"># example of stacked convolutional layers\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))\r\nmodel.add(Conv2D(1, (3,3)))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example summarizes the shape of the output from each layer.<\/p>\n<p>We can see that the application of filters to the feature map output of the first layer, in turn, results in a smaller 4\u00d74 feature map.<\/p>\n<p>This can become a problem as we develop very deep convolutional neural network models with tens or hundreds of layers. 
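Before running the model, the shrinkage is easy to compute by hand: each unpadded convolution with a k×k filter removes k−1 pixels from each spatial dimension. A small sketch of this arithmetic (framework-independent, not Keras code):

```python
def output_size(n, k):
    # spatial size along one dimension after an unpadded ("valid") convolution
    return n - k + 1

size = 8
for layer in range(1, 4):
    size = output_size(size, 3)
    print(f"after layer {layer}: {size}x{size}")
# after layer 1: 6x6
# after layer 2: 4x4
# after layer 3: 2x2
```

Each 3×3 layer trims two pixels from every dimension, so an 8×8 input supports only a few such layers before the map collapses.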
We will simply run out of data in our feature maps upon which to operate.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 6, 6, 1)           10\r\n_________________________________________________________________\r\nconv2d_2 (Conv2D)            (None, 4, 4, 1)           10\r\n=================================================================\r\nTotal params: 20\r\nTrainable params: 20\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<\/p>\n<h2>Effect of Filter Size (Kernel Size)<\/h2>\n<p>Different sized filters will detect different sized features in the input image and, in turn, will result in differently sized feature maps.<\/p>\n<p>It is common to use 3\u00d73 sized filters, and perhaps 5\u00d75 or even 7\u00d77 sized filters, for larger input images.<\/p>\n<p>For example, below is an example of the model with a single filter updated to use a filter size of 5\u00d75 pixels.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a convolutional layer\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (5,5), input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example demonstrates that the 5\u00d75 filter can only be applied to the 8\u00d78 input image 4 times, resulting in a 4\u00d74 feature map output.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 4, 4, 1)           
26\r\n=================================================================\r\nTotal params: 26\r\nTrainable params: 26\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<p>To further develop the intuition about the relationship between filter size and the output feature map, it may help to look at two extreme cases.<\/p>\n<p>The first is a filter with the size of 1\u00d71 pixels.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a convolutional layer\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (1,1), input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example demonstrates that the output feature map has the same size as the input, specifically 8\u00d78. This is because the filter only has a single weight (and a bias).<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 8, 8, 1)           2\r\n=================================================================\r\nTotal params: 2\r\nTrainable params: 2\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<p>The other extreme is a filter with the same size as the input, in this case, 8\u00d78 pixels.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a convolutional layer\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (8,8), input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example, we can see that, as you might expect, there is one weight for each pixel in the input image (64 + 1 for the bias) and that the output is a feature map with a single 
pixel.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 1, 1, 1)           65\r\n=================================================================\r\nTotal params: 65\r\nTrainable params: 65\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<p>Now that we are familiar with the effect of filter sizes on the size of the resulting feature map, let\u2019s look at how we can stop losing pixels.<\/p>\n<h2>Fix the Border Effect Problem With Padding<\/h2>\n<p>By default, a filter starts at the left of the image with the left-hand side of the filter sitting on the far left pixels of the image. The filter is then stepped across the image one column at a time until the right-hand side of the filter is sitting on the far right pixels of the image.<\/p>\n<p>An alternative approach to applying a filter to an image is to ensure that each pixel in the image is given an opportunity to be at the center of the filter.<\/p>\n<p>By default, this is not the case, as the pixels on the edge of the input are only ever exposed to the edge of the filter. By starting the filter outside the frame of the image, it gives the pixels on the border of the image more of an opportunity for interacting with the filter, more of an opportunity for features to be detected by the filter, and in turn, an output feature map that has the same shape as the input image.<\/p>\n<p>For example, in the case of applying a 3\u00d73 filter to the 8\u00d78 input image, we can add a border of one pixel around the outside of the image. This has the effect of artificially creating a 10\u00d710 input image. When the 3\u00d73 filter is applied, it results in an 8\u00d78 feature map. 
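The idea can be sketched with NumPy's np.pad (an illustration of zero padding and the resulting shape arithmetic, not how Keras applies padding internally):

```python
import numpy as np

# 8x8 input with a two-pixel vertical line, as in the worked example
image = np.zeros((8, 8))
image[:, 3:5] = 1

# add a one-pixel border of zeros: 8x8 -> 10x10
padded = np.pad(image, 1, mode='constant')
print(padded.shape)  # (10, 10)

# a 3x3 filter now fits at 10 - 3 + 1 = 8 positions per dimension,
# so the output feature map is 8x8, the same shape as the original input
print(padded.shape[0] - 3 + 1)  # 8
```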
The added pixels can take the value zero, which has no effect on the dot product operation when the filter is applied.<\/p>\n<pre class=\"crayon-plain-tag\">x, x, x   0, 1, 0\r\nx, 0, 0 . 0, 1, 0 = 0\r\nx, 0, 0   0, 1, 0<\/pre>\n<p>The addition of pixels to the edge of the image is called padding.<\/p>\n<p>In Keras, this is specified via the \u201c<em>padding<\/em>\u201d argument on the Conv2D layer, which has the default value of \u2018<em>valid<\/em>\u2019 (no padding). This means that the filter is applied only where it fully overlaps the input.<\/p>\n<p>The \u2018<em>padding<\/em>\u2019 value of \u2018<em>same<\/em>\u2019 calculates and adds the padding required to the input image (or feature map) to ensure that the output has the same shape as the input.<\/p>\n<p>The example below adds padding to the convolutional layer in our worked example.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a convolutional layer with padding\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (3,3), padding='same', input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example demonstrates that the shape of the output feature map is the same as the input image, confirming that the padding had the desired effect.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 8, 8, 1)           10\r\n=================================================================\r\nTotal params: 10\r\nTrainable params: 10\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<p>The addition of padding allows the development of very deep models in such a way that the feature maps do not dwindle away to 
nothing.<\/p>\n<p>The example below demonstrates this with three stacked convolutional layers.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a deep cnn with padding\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (3,3), padding='same', input_shape=(8, 8, 1)))\r\nmodel.add(Conv2D(1, (3,3), padding='same'))\r\nmodel.add(Conv2D(1, (3,3), padding='same'))\r\n# summarize model\r\nmodel.summary()<\/pre>\n<p>Running the example, we can see that with the addition of padding, the shape of the output feature maps remains fixed at 8\u00d78 even three layers deep.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 8, 8, 1)           10\r\n_________________________________________________________________\r\nconv2d_2 (Conv2D)            (None, 8, 8, 1)           10\r\n_________________________________________________________________\r\nconv2d_3 (Conv2D)            (None, 8, 8, 1)           10\r\n=================================================================\r\nTotal params: 30\r\nTrainable params: 30\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<\/p>\n<h2>Downsample Input With Stride<\/h2>\n<p>The filter is moved across the image left to right, top to bottom, with a one-pixel column change on the horizontal movements, then a one-pixel row change on the vertical movements.<\/p>\n<p>The amount of movement between applications of the filter to the input image is referred to as the stride, and it is almost always symmetrical in height and width dimensions.<\/p>\n<p>The default stride in two dimensions is (1,1) for the height and the width movement. 
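For reference, the output size along each dimension follows a standard formula combining the input size n, kernel size k, padding p per side, and stride s (general convolution arithmetic, not a Keras API):

```python
from math import floor

def conv_output_size(n, k, p=0, s=1):
    # number of valid filter placements along one spatial dimension
    return floor((n + 2 * p - k) / s) + 1

print(conv_output_size(8, 3))        # 6: unpadded, stride 1
print(conv_output_size(8, 3, p=1))   # 8: one pixel of padding preserves the size
print(conv_output_size(8, 3, s=2))   # 3: a stride of two halves the map
```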
And this default works well in most cases.<\/p>\n<p>The stride can be changed, which has an effect both on how the filter is applied to the image and, in turn, the size of the resulting feature map.<\/p>\n<p>For example, the stride can be changed to (2,2). This has the effect of moving the filter two pixels left for each horizontal movement of the filter and two pixels down for each vertical movement of the filter when creating the feature map.<\/p>\n<p>We can demonstrate this with an example using the 8\u00d78 image with a vertical line (left) dot product (\u201c.\u201d operator) with the vertical line filter (right) with a stride of two pixels:<\/p>\n<pre class=\"crayon-plain-tag\">0, 0, 0   0, 1, 0\r\n0, 0, 0 . 0, 1, 0 = 0\r\n0, 0, 0   0, 1, 0<\/pre>\n<p>Moved right two pixels:<\/p>\n<pre class=\"crayon-plain-tag\">0, 1, 1   0, 1, 0\r\n0, 1, 1 . 0, 1, 0 = 3\r\n0, 1, 1   0, 1, 0<\/pre>\n<p>Moved right two pixels:<\/p>\n<pre class=\"crayon-plain-tag\">1, 0, 0   0, 1, 0\r\n1, 0, 0 . 0, 1, 0 = 0\r\n1, 0, 0   0, 1, 0<\/pre>\n<p>We can see that there are only three valid applications of the 3\u00d73 filters to the 8\u00d78 input image with a stride of two. 
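This stride-two sweep can be mimicked in plain NumPy (an illustrative sketch using the same image and detector as the worked example, not the Keras code path):

```python
import numpy as np

# 8x8 input with a two-pixel vertical line
image = np.zeros((8, 8))
image[:, 3:5] = 1

# 3x3 vertical line detector
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

# slide the 3x3 window with a step of two in both dimensions
stride = 2
rows = []
for r in range(0, 8 - 3 + 1, stride):
    row = [np.sum(image[r:r+3, c:c+3] * kernel)
           for c in range(0, 8 - 3 + 1, stride)]
    rows.append(row)
fmap = np.array(rows)
print(fmap.shape)       # (3, 3)
print(fmap[0].tolist()) # [0.0, 3.0, 0.0]
```

Only three horizontal placements fit, producing the row [0, 3, 0].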
This will be the same in the vertical dimension.<\/p>\n<p>This has the effect of applying the filter in such a way that the normal feature map output (6\u00d76) is down-sampled so that the size of each dimension is reduced by half (3\u00d73), resulting in 1\/4 the number of pixels (36 pixels down to 9).<\/p>\n<p>The stride can be specified in Keras on the <em>Conv2D<\/em> layer via the \u2018<em>strides<\/em>\u2019 argument, specified as a tuple with height and width.<\/p>\n<p>The example below demonstrates the application of our manual vertical line filter on the 8\u00d78 input image with a convolutional layer that has a stride of two.<\/p>\n<pre class=\"crayon-plain-tag\"># example of vertical line filter with a stride of 2\r\nfrom numpy import asarray\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\n# define input data\r\ndata = [[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0],\r\n\t\t[0, 0, 0, 1, 1, 0, 0, 0]]\r\ndata = asarray(data)\r\ndata = data.reshape(1, 8, 8, 1)\r\n# create model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(1, (3,3), strides=(2, 2), input_shape=(8, 8, 1)))\r\n# summarize model\r\nmodel.summary()\r\n# define a vertical line detector\r\ndetector = [[[[0]],[[1]],[[0]]],\r\n            [[[0]],[[1]],[[0]]],\r\n            [[[0]],[[1]],[[0]]]]\r\nweights = [asarray(detector), asarray([0.0])]\r\n# store the weights in the model\r\nmodel.set_weights(weights)\r\n# apply filter to input data\r\nyhat = model.predict(data)\r\n# enumerate rows\r\nfor r in range(yhat.shape[1]):\r\n\t# print each column in the row\r\n\tprint([yhat[0,r,c,0] for c in range(yhat.shape[2])])<\/pre>\n<p>Running the example, we can see from the summary of the model that the shape of the output feature map will be 3\u00d73.<\/p>\n<p>Applying the handcrafted filter to the input image 
and printing the resulting activation feature map, we can see that, indeed, the filter still detected the vertical line, and can represent this finding with less information.<\/p>\n<p>Downsampling may be desirable in some cases where deeper knowledge of the filters used in the model or of the model architecture allows for some compression in the resulting feature maps.<\/p>\n<pre class=\"crayon-plain-tag\">_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\nconv2d_1 (Conv2D)            (None, 3, 3, 1)           10\r\n=================================================================\r\nTotal params: 10\r\nTrainable params: 10\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\n\r\n\r\n[0.0, 3.0, 0.0]\r\n[0.0, 3.0, 0.0]\r\n[0.0, 3.0, 0.0]<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Posts<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/crash-course-convolutional-neural-networks\/\">Crash Course in Convolutional Neural Networks for Machine Learning<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li>Chapter 9: Convolutional Networks, <a href=\"https:\/\/amzn.to\/2Dl124s\">Deep Learning<\/a>, 2016.<\/li>\n<li>Chapter 5: Deep Learning for Computer Vision, <a href=\"https:\/\/amzn.to\/2Dnshvc\">Deep Learning with Python<\/a>, 2017.<\/li>\n<\/ul>\n<h3>API<\/h3>\n<ul>\n<li><a href=\"https:\/\/keras.io\/layers\/convolutional\/\">Keras Convolutional Layers API<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered an intuition for filter size, the need for padding, and stride in convolutional neural networks.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How filter size or kernel size impacts the shape of the output feature map.<\/li>\n<li>How the 
filter size creates a border effect in the feature map and how it can be overcome with padding.<\/li>\n<li>How the stride of the filter on the input image can be used to downsample the size of the output feature map.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/padding-and-stride-for-convolutional-neural-networks\/\">A Gentle Introduction to Padding and Stride for Convolutional Neural Networks<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/padding-and-stride-for-convolutional-neural-networks\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee The convolutional layer in convolutional neural networks systematically applies filters to an input and creates output feature maps. 
Although the convolutional layer [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/04\/18\/a-gentle-introduction-to-padding-and-stride-for-convolutional-neural-networks\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2032,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2031"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2031"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2031\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2032"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}