{"id":5840,"date":"2022-08-19T06:28:57","date_gmt":"2022-08-19T06:28:57","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/08\/19\/using-depthwise-separable-convolutions-in-tensorflow\/"},"modified":"2022-08-19T06:28:57","modified_gmt":"2022-08-19T06:28:57","slug":"using-depthwise-separable-convolutions-in-tensorflow","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/08\/19\/using-depthwise-separable-convolutions-in-tensorflow\/","title":{"rendered":"Using Depthwise Separable Convolutions in Tensorflow"},"content":{"rendered":"<p>Author: Zhe Ming Chng<\/p>\n<div>\n<p>Looking at very large convolutional neural networks such as ResNets, VGGs, and the like raises the question of how we can make these networks smaller, with fewer parameters, while still maintaining the same level of accuracy or even improving the generalization of the model. One approach is depthwise separable convolutions, also known as separable convolutions in TensorFlow and PyTorch (not to be confused with spatially separable convolutions, which are also referred to as separable convolutions). Depthwise separable convolutions were introduced by Sifre in \u201cRigid-motion scattering for image classification\u201d and have been adopted by popular model architectures such as MobileNet, with a similar version used in Xception. 
They split the channel-wise and spatial convolutions that are usually combined in normal convolutional layers.<\/p>\n<p>In this tutorial, we\u2019ll be looking at what depthwise separable convolutions are and how we can use them to speed up our convolutional neural network image models.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>What depthwise, pointwise, and depthwise separable convolutions are<\/li>\n<li>How to implement depthwise separable convolutions in TensorFlow<\/li>\n<li>How to use them as part of our computer vision models<\/li>\n<\/ul>\n<p>Let\u2019s get started!<\/p>\n<div id=\"attachment_13796\" style=\"width: 810px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-13796\" class=\"size-full wp-image-13796\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/07\/arisa-chattasa-o58Xi32Rnlk-unsplash.jpg\" alt=\"\" width=\"800\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/07\/arisa-chattasa-o58Xi32Rnlk-unsplash.jpg 1920w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/07\/arisa-chattasa-o58Xi32Rnlk-unsplash-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/07\/arisa-chattasa-o58Xi32Rnlk-unsplash-1024x684.jpg 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/07\/arisa-chattasa-o58Xi32Rnlk-unsplash-768x513.jpg 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/07\/arisa-chattasa-o58Xi32Rnlk-unsplash-1536x1026.jpg 1536w\" sizes=\"(max-width: 1920px) 100vw, 1920px\"><\/p>\n<p id=\"caption-attachment-13796\" class=\"wp-caption-text\">Using Depthwise Separable Convolutions in Tensorflow<br \/>Photo by <a href=\"https:\/\/unsplash.com\/photos\/o58Xi32Rnlk\">Arisa Chattasa<\/a>. 
Some rights reserved.<\/p>\n<\/div>\n<h2>Overview<\/h2>\n<p>This tutorial is split into 3 parts:<\/p>\n<ul>\n<li>What is a depthwise separable convolution<\/li>\n<li>Why are they useful<\/li>\n<li>Using depthwise separable convolutions in computer vision models<\/li>\n<\/ul>\n<h2>What is a Depthwise Separable Convolution<\/h2>\n<p>Before diving into depthwise and depthwise separable convolutions, it might be helpful to have a quick recap on convolutions. A convolution in image processing applies a kernel over an input volume, computing a weighted sum of the pixels with the kernel values as the weights. Visually, it looks as follows:<\/p>\n<div id=\"attachment_13657\" style=\"width: 799px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/normal_convolution.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13657\" loading=\"lazy\" class=\"wp-image-13657\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/normal_convolution.png\" alt=\"\" width=\"789\" height=\"233\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/normal_convolution.png 1095w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/normal_convolution-300x88.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/normal_convolution-1024x302.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/normal_convolution-768x227.png 768w\" sizes=\"(max-width: 789px) 100vw, 789px\"><\/a><\/p>\n<p id=\"caption-attachment-13657\" class=\"wp-caption-text\">Applying a 3\u00d73 kernel on a 10x10x3 input volume outputs an 8x8x1 volume<\/p>\n<\/div>\n<p>Now, let\u2019s introduce a depthwise convolution. A depthwise convolution is basically a convolution applied to only one channel of the image at a time. 
Visually, this is what a single depthwise convolutional filter would look like and do:<\/p>\n<div id=\"attachment_13658\" style=\"width: 825px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_green.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13658\" loading=\"lazy\" class=\"wp-image-13658\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_green.png\" alt=\"\" width=\"815\" height=\"230\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_green.png 1163w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_green-300x85.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_green-1024x289.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_green-768x217.png 768w\" sizes=\"(max-width: 815px) 100vw, 815px\"><\/a><\/p>\n<p id=\"caption-attachment-13658\" class=\"wp-caption-text\">Applying a depthwise <code>3x3<\/code> kernel on the green channel in this example<\/p>\n<\/div>\n<p>The key difference between a normal convolutional layer and a depthwise convolution is that each depthwise filter is applied to only one channel, while a normal convolutional filter is applied across all channels at each step.<\/p>\n<p>If we look at what an entire depthwise layer does on all RGB channels,<\/p>\n<div id=\"attachment_13659\" style=\"width: 853px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_combined.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13659\" loading=\"lazy\" class=\"wp-image-13659\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_combined.png\" alt=\"\" width=\"843\" height=\"233\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_combined.png 1529w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_combined-300x83.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_combined-1024x283.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/depthwise_combined-768x212.png 768w\" sizes=\"(max-width: 843px) 100vw, 843px\"><\/a><\/p>\n<p id=\"caption-attachment-13659\" class=\"wp-caption-text\">Applying a depthwise convolutional filter on a <code>10x10x3<\/code> input volume outputs an <code>8x8x3<\/code> volume<\/p>\n<\/div>\n<p>Notice that since we are applying one convolutional filter for each input channel, the number of output channels is equal to the number of input channels. After applying this depthwise convolutional layer, we then apply a pointwise convolutional layer.<\/p>\n<p>Simply put, a pointwise convolutional layer is a regular convolutional layer with a <code>1x1<\/code> kernel (hence looking at a single point across all the channels). 
Visually, it looks like this:<\/p>\n<div id=\"attachment_13660\" style=\"width: 878px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/pointwise.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13660\" loading=\"lazy\" class=\"wp-image-13660\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/pointwise.png\" alt=\"\" width=\"868\" height=\"206\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/pointwise.png 1309w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/pointwise-300x71.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/pointwise-1024x243.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/pointwise-768x182.png 768w\" sizes=\"(max-width: 868px) 100vw, 868px\"><\/a><\/p>\n<p id=\"caption-attachment-13660\" class=\"wp-caption-text\">Applying a pointwise convolution on a <code>10x10x3<\/code> input volume outputs a <code>10x10x1<\/code> output volume<\/p>\n<\/div>\n<h2>Why are Depthwise Separable Convolutions Useful?<\/h2>\n<p>Now, you might be wondering, what\u2019s the use of doing two operations with the depthwise separable convolutions? Given that the goal of this article is to speed up computer vision models, how does doing two operations instead of one help to speed things up?<\/p>\n<p>To answer that question, let\u2019s look at the number of parameters in the model (there would be some additional overhead associated with doing two convolutions instead of one, though). Let\u2019s say we wanted to apply 64 convolutional filters to our RGB image to have 64 channels in our output. The number of parameters in a normal convolutional layer with $3 \\times 3$ kernels (including bias terms) is $3 \\times 3 \\times 3 \\times 64 + 64 = 1792$. 
On the other hand, a depthwise separable convolutional layer would only have $(3 \\times 3 \\times 1 \\times 3 + 3) + (1 \\times 1 \\times 3 \\times 64 + 64) = 30 + 256 = 286$ parameters, which is a significant reduction, with the depthwise separable convolution having more than 6 times fewer parameters than the normal convolution.<\/p>\n<p>This reduces the number of computations, which shortens training and inference time, and the number of parameters, which can help to regularize our model.<\/p>\n<p>Let\u2019s see this in action. For our inputs, let\u2019s use the CIFAR-10 image dataset of <code>32x32x3<\/code> images,<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">from tensorflow import keras\r\nfrom tensorflow.keras.layers import Conv2D, DepthwiseConv2D, Input\r\nfrom tensorflow.keras.models import Model\r\n\r\n# load dataset\r\n(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()<\/pre>\n<p>Then, we implement a depthwise separable convolution layer. There\u2019s an implementation in TensorFlow, but we\u2019ll go into that in the final example.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">class DepthwiseSeparableConv2D(keras.layers.Layer):\r\n  def __init__(self, filters, kernel_size, padding, activation):\r\n    super(DepthwiseSeparableConv2D, self).__init__()\r\n    # depthwise step: one spatial kernel per input channel\r\n    self.depthwise = DepthwiseConv2D(kernel_size = kernel_size, padding = padding, activation = activation)\r\n    # pointwise step: 1x1 convolution that mixes the channels\r\n    self.pointwise = Conv2D(filters = filters, kernel_size = (1, 1), activation = activation)\r\n\r\n  def call(self, input_tensor):\r\n    x = self.depthwise(input_tensor)\r\n    return self.pointwise(x)<\/pre>\n<p>Constructing a model using a depthwise separable convolutional layer and looking at the number of parameters,<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">visible = Input(shape=(32, 32, 3))\r\ndepthwise_separable = DepthwiseSeparableConv2D(filters=64, kernel_size=(3,3), padding=\"valid\", activation=\"relu\")(visible)\r\ndepthwise_model = Model(inputs=visible, 
outputs=depthwise_separable)\r\ndepthwise_model.summary()<\/pre>\n<p>which gives the output<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">_________________________________________________________________\r\n Layer (type)                Output Shape              Param #   \r\n=================================================================\r\n input_15 (InputLayer)       [(None, 32, 32, 3)]       0         \r\n                                                                 \r\n depthwise_separable_conv2d_  (None, 30, 30, 64)       286       \r\n 11 (DepthwiseSeparableConv2                                     \r\n D)                                                              \r\n\r\n=================================================================\r\nTotal params: 286\r\nTrainable params: 286\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<p>which we can compare with a similar model using a regular 2D convolutional layer,<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">normal = Conv2D(filters=64, kernel_size=(3,3), padding=\"valid\", activation=\"relu\")(visible)\r\nModel(inputs=visible, outputs=normal).summary()<\/pre>\n<p>which gives the output<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">_________________________________________________________________\r\n Layer (type)                Output Shape              Param #   \r\n=================================================================\r\n input (InputLayer)       [(None, 32, 32, 3)]       0         \r\n                                                                 \r\n conv2d (Conv2D)          (None, 30, 30, 64)        1792      \r\n                                                                 \r\n=================================================================\r\nTotal params: 1,792\r\nTrainable params: 1,792\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<p>That corroborates our 
initial calculations of the number of parameters done earlier and shows the reduction in the number of parameters that can be achieved by using depthwise separable convolutions.<\/p>\n<p>More specifically, let\u2019s look at the number and size of kernels in a normal convolutional layer and a depthwise separable one. A regular 2D convolutional layer with $c$ input channels, a $w \\times h$ kernel spatial resolution, and $n$ output channels has a weight tensor of shape $(n, w, h, c)$, that is, $n$ filters, each with a kernel size of $(w, h, c)$, for a total of $n \\times w \\times h \\times c$ parameters (plus $n$ bias units). This is different for a depthwise separable convolution, even with the same number of input channels, kernel spatial resolution, and output channels. First, there\u2019s the depthwise convolution, which involves $c$ filters, each with a kernel size of $(w, h, 1)$, and which outputs $c$ channels since each filter acts on a single input channel. This depthwise convolutional layer has a weight tensor of shape $(c, w, h, 1)$, that is, $c \\times w \\times h$ parameters (plus $c$ bias units). Then comes the pointwise convolution, which takes in the $c$ channels from the depthwise layer and outputs $n$ channels, so we have $n$ filters, each with a kernel size of $(1, 1, c)$. This pointwise convolutional layer has a weight tensor of shape $(n, 1, 1, c)$, that is, $n \\times c$ parameters (plus $n$ bias units). With $c = 3$, $w = h = 3$, and $n = 64$, this gives $27 + 3 = 30$ depthwise and $192 + 64 = 256$ pointwise parameters, which matches the total of 286 computed earlier.<\/p>\n<p>You might be thinking right now, but why do they work?<\/p>\n<p>One way of thinking about it, from the Xception paper by Chollet, is that depthwise separable convolutions assume that we can map cross-channel and spatial correlations separately. Given this, there will be a bunch of redundant weights in a normal convolutional layer, which we can remove by separating the convolution into its depthwise and pointwise components. 
Another way of thinking about it, for those familiar with linear algebra, is that it resembles how we can decompose a matrix into the outer product of two vectors when the column vectors of the matrix are multiples of each other.<\/p>\n<h2>Using Depthwise Separable Convolutions in Computer Vision Models<\/h2>\n<p>Now that we\u2019ve seen the reduction in parameters that we can achieve by using a depthwise separable convolution over a normal convolutional layer, let\u2019s see how we can use it in practice with TensorFlow\u2019s <code>SeparableConv2D<\/code> layer.<\/p>\n<p>For this example, we will be using the CIFAR-10 image dataset used in the above example, while for the model we will be using a model built from VGG blocks. The potential of depthwise separable convolutions lies in deeper models, where the regularization effect is more beneficial and the reduction in parameters is more pronounced than in a lightweight model such as LeNet-5.<\/p>\n<p>Creating our model from VGG blocks with normal convolutional layers,<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">from tensorflow.keras.models import Model\r\nfrom tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dense, Flatten, SeparableConv2D\r\nimport tensorflow as tf\r\n\r\n# function for creating a vgg block\r\ndef vgg_block(layer_in, n_filters, n_conv):\r\n\t# add convolutional layers\r\n\tfor _ in range(n_conv):\r\n\t\tlayer_in = Conv2D(filters = n_filters, kernel_size = (3,3), padding='same', activation=\"relu\")(layer_in)\r\n\t# add max pooling layer\r\n\tlayer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)\r\n\treturn layer_in\r\n\r\nvisible = Input(shape=(32, 32, 3))\r\nlayer = vgg_block(visible, 64, 2)\r\nlayer = vgg_block(layer, 128, 2)\r\nlayer = vgg_block(layer, 256, 2)\r\nlayer = Flatten()(layer)\r\nlayer = Dense(units=10, activation=\"softmax\")(layer)\r\n\r\n# create model\r\nmodel = Model(inputs=visible, outputs=layer)\r\n\r\n# summarize 
model\r\nmodel.summary()\r\n\r\nmodel.compile(optimizer=\"adam\", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=\"acc\")\r\n\r\nhistory = model.fit(x=trainX, y=trainY, batch_size=128, epochs=10, validation_data=(testX, testY))<\/pre>\n<p>Then we look at the results of this 6-layer convolutional neural network with normal convolutional layers,<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">_________________________________________________________________\r\n Layer (type)                Output Shape              Param #   \r\n=================================================================\r\n input_1 (InputLayer)        [(None, 32, 32, 3)]       0         \r\n                                                                 \r\n conv2d (Conv2D)             (None, 32, 32, 64)        1792                                                                       \r\n\r\n conv2d_1 (Conv2D)           (None, 32, 32, 64)        36928     \r\n                                                                 \r\n max_pooling2d (MaxPooling2D  (None, 16, 16, 64)       0         \r\n )                                                               \r\n                                                                 \r\n conv2d_2 (Conv2D)           (None, 16, 16, 128)       73856     \r\n                                                                 \r\n conv2d_3 (Conv2D)           (None, 16, 16, 128)       147584                                                                     \r\n\r\n max_pooling2d_1 (MaxPooling  (None, 8, 8, 128)        0         \r\n 2D)                                                             \r\n                                                                 \r\n conv2d_4 (Conv2D)           (None, 8, 8, 256)         295168    \r\n                                                                 \r\n conv2d_5 (Conv2D)           (None, 8, 8, 256)         590080    \r\n                                                               
  \r\n max_pooling2d_2 (MaxPooling  (None, 4, 4, 256)        0         \r\n 2D)                                                             \r\n                                                                 \r\n flatten (Flatten)           (None, 4096)              0         \r\n                                                                 \r\n dense (Dense)               (None, 10)                40970     \r\n                                                                 \r\n=================================================================\r\nTotal params: 1,186,378\r\nTrainable params: 1,186,378\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\nEpoch 1\/10\r\n391\/391 [==============================] - 11s 27ms\/step - loss: 1.7468 - acc: 0.4496 - val_loss: 1.3347 - val_acc: 0.5297\r\nEpoch 2\/10\r\n391\/391 [==============================] - 10s 26ms\/step - loss: 1.0224 - acc: 0.6399 - val_loss: 0.9457 - val_acc: 0.6717\r\nEpoch 3\/10\r\n391\/391 [==============================] - 10s 26ms\/step - loss: 0.7846 - acc: 0.7282 - val_loss: 0.8566 - val_acc: 0.7109\r\nEpoch 4\/10\r\n391\/391 [==============================] - 10s 26ms\/step - loss: 0.6394 - acc: 0.7784 - val_loss: 0.8289 - val_acc: 0.7235\r\nEpoch 5\/10\r\n391\/391 [==============================] - 10s 26ms\/step - loss: 0.5385 - acc: 0.8118 - val_loss: 0.7445 - val_acc: 0.7516\r\nEpoch 6\/10\r\n391\/391 [==============================] - 11s 27ms\/step - loss: 0.4441 - acc: 0.8461 - val_loss: 0.7927 - val_acc: 0.7501\r\nEpoch 7\/10\r\n391\/391 [==============================] - 11s 27ms\/step - loss: 0.3786 - acc: 0.8672 - val_loss: 0.8279 - val_acc: 0.7455\r\nEpoch 8\/10\r\n391\/391 [==============================] - 10s 26ms\/step - loss: 0.3261 - acc: 0.8855 - val_loss: 0.8886 - val_acc: 0.7560\r\nEpoch 9\/10\r\n391\/391 [==============================] - 10s 27ms\/step - loss: 0.2747 - acc: 0.9044 - val_loss: 1.0134 - val_acc: 
0.7387\r\nEpoch 10\/10\r\n391\/391 [==============================] - 10s 26ms\/step - loss: 0.2519 - acc: 0.9126 - val_loss: 0.9571 - val_acc: 0.7484<\/pre>\n<p>Let\u2019s try out the same architecture but replace the normal convolutional layers with Keras\u2019 <code>SeparableConv2D<\/code> layers instead:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># depthwise separable VGG block\r\ndef vgg_depthwise_block(layer_in, n_filters, n_conv):\r\n\t# add convolutional layers\r\n\tfor _ in range(n_conv):\r\n\t\tlayer_in = SeparableConv2D(filters = n_filters, kernel_size = (3,3), padding='same', activation='relu')(layer_in)\r\n\t# add max pooling layer\r\n\tlayer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)\r\n\treturn layer_in\r\n\r\nvisible = Input(shape=(32, 32, 3))\r\nlayer = vgg_depthwise_block(visible, 64, 2)\r\nlayer = vgg_depthwise_block(layer, 128, 2)\r\nlayer = vgg_depthwise_block(layer, 256, 2)\r\nlayer = Flatten()(layer)\r\nlayer = Dense(units=10, activation=\"softmax\")(layer)\r\n# create model\r\nmodel = Model(inputs=visible, outputs=layer)\r\n\r\n# summarize model\r\nmodel.summary()\r\n\r\nmodel.compile(optimizer=\"adam\", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=\"acc\")\r\n\r\nhistory_dsconv = model.fit(x=trainX, y=trainY, batch_size=128, epochs=10, validation_data=(testX, testY))<\/pre>\n<p>Running the above code gives us the result:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">_________________________________________________________________\r\n Layer (type)                Output Shape              Param #   \r\n=================================================================\r\n input_1 (InputLayer)        [(None, 32, 32, 3)]       0         \r\n                                                                 \r\n separable_conv2d (Separab  (None, 32, 32, 64)       283       \r\nleConv2D)                                                       \r\n                                                       
          \r\n separable_conv2d_2 (Separab  (None, 32, 32, 64)       4736      \r\n leConv2D)                                                       \r\n                                                                 \r\n max_pooling2d (MaxPoolin  (None, 16, 16, 64)       0         \r\n g2D)                                                            \r\n                                                                 \r\n separable_conv2d_3 (Separab  (None, 16, 16, 128)      8896      \r\n leConv2D)                                                       \r\n                                                                 \r\n separable_conv2d_4 (Separab  (None, 16, 16, 128)      17664     \r\n leConv2D)                                                       \r\n                                                                 \r\n max_pooling2d_2 (MaxPoolin  (None, 8, 8, 128)        0         \r\n g2D)                                                            \r\n                                                                 \r\n separable_conv2d_5 (Separa  (None, 8, 8, 256)        34176     \r\n bleConv2D)                                                      \r\n                                                                 \r\n separable_conv2d_6 (Separa  (None, 8, 8, 256)        68096     \r\n bleConv2D)                                                      \r\n                                                                 \r\n max_pooling2d_3 (MaxPoolin  (None, 4, 4, 256)        0         \r\n g2D)                                                            \r\n                                                                 \r\n flatten (Flatten)         (None, 4096)              0         \r\n                                                                 \r\n dense (Dense)             (None, 10)                40970     \r\n                                                                 
\r\n=================================================================\r\nTotal params: 174,821\r\nTrainable params: 174,821\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\nEpoch 1\/10\r\n391\/391 [==============================] - 10s 22ms\/step - loss: 1.7578 - acc: 0.3534 - val_loss: 1.4138 - val_acc: 0.4918\r\nEpoch 2\/10\r\n391\/391 [==============================] - 8s 21ms\/step - loss: 1.2712 - acc: 0.5452 - val_loss: 1.1618 - val_acc: 0.5861\r\nEpoch 3\/10\r\n391\/391 [==============================] - 8s 22ms\/step - loss: 1.0560 - acc: 0.6286 - val_loss: 0.9950 - val_acc: 0.6501\r\nEpoch 4\/10\r\n391\/391 [==============================] - 8s 21ms\/step - loss: 0.9175 - acc: 0.6800 - val_loss: 0.9327 - val_acc: 0.6721\r\nEpoch 5\/10\r\n391\/391 [==============================] - 9s 22ms\/step - loss: 0.7939 - acc: 0.7227 - val_loss: 0.8348 - val_acc: 0.7056\r\nEpoch 6\/10\r\n391\/391 [==============================] - 8s 22ms\/step - loss: 0.7120 - acc: 0.7515 - val_loss: 0.8228 - val_acc: 0.7153\r\nEpoch 7\/10\r\n391\/391 [==============================] - 8s 21ms\/step - loss: 0.6346 - acc: 0.7772 - val_loss: 0.7444 - val_acc: 0.7415\r\nEpoch 8\/10\r\n391\/391 [==============================] - 8s 21ms\/step - loss: 0.5534 - acc: 0.8061 - val_loss: 0.7417 - val_acc: 0.7537\r\nEpoch 9\/10\r\n391\/391 [==============================] - 8s 21ms\/step - loss: 0.4865 - acc: 0.8301 - val_loss: 0.7348 - val_acc: 0.7582\r\nEpoch 10\/10\r\n391\/391 [==============================] - 8s 21ms\/step - loss: 0.4321 - acc: 0.8485 - val_loss: 0.7968 - val_acc: 0.7458<\/pre>\n<p>Notice that there are significantly fewer parameters in the depthwise separable convolution version (~175k vs ~1.19M parameters), along with a slightly lower training time per epoch (roughly 8s vs 10s in the logs above). 
Depthwise separable convolutions are more likely to work better on deeper models that might face an overfitting problem, and on layers with larger kernels, since the greater decrease in parameters and computations there offsets the additional computational cost of doing two convolutions instead of one. Next, we plot the training and validation accuracy of the two models to see the differences in their training performance:<\/p>\n<p>\u00a0<\/p>\n<div id=\"attachment_13662\" style=\"width: 478px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accnormal.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13662\" loading=\"lazy\" class=\"wp-image-13662\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accnormal.png\" alt=\"\" width=\"468\" height=\"339\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accnormal.png 720w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accnormal-300x217.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\"><\/a><\/p>\n<p id=\"caption-attachment-13662\" class=\"wp-caption-text\">Training and validation accuracy of network with normal convolutional layers<\/p>\n<\/div>\n<div id=\"attachment_13661\" style=\"width: 482px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-13661\" loading=\"lazy\" class=\"wp-image-13661\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accdsconv.png\" alt=\"\" width=\"472\" height=\"301\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accdsconv.png 946w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accdsconv-300x192.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/06\/train_val_accdsconv-768x490.png 768w\" 
sizes=\"(max-width: 472px) 100vw, 472px\"><\/p>\n<p id=\"caption-attachment-13661\" class=\"wp-caption-text\">Training and validation accuracy of network with depthwise separable convolutional layers<\/p>\n<\/div>\n<p>The highest validation accuracy is similar for both models, but the depthwise separable convolution version appears to overfit the training set less, which might help it generalize better to new data.<\/p>\n<p>Combining all the code together for the depthwise separable convolutions version of the model,<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">import tensorflow as tf\r\nfrom tensorflow import keras\r\nfrom tensorflow.keras.models import Model\r\nfrom tensorflow.keras.layers import Input, MaxPooling2D, Dense, Flatten, SeparableConv2D\r\n\r\n# load dataset\r\n(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()\r\n# depthwise separable VGG block\r\ndef vgg_depthwise_block(layer_in, n_filters, n_conv):\r\n\t# add convolutional layers\r\n\tfor _ in range(n_conv):\r\n\t\tlayer_in = SeparableConv2D(filters = n_filters, kernel_size = (3,3), padding='same',activation='relu')(layer_in)\r\n\t# add max pooling layer\r\n\tlayer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)\r\n\treturn layer_in\r\n\r\nvisible = Input(shape=(32, 32, 3))\r\nlayer = vgg_depthwise_block(visible, 64, 2)\r\nlayer = vgg_depthwise_block(layer, 128, 2)\r\nlayer = vgg_depthwise_block(layer, 256, 2)\r\nlayer = Flatten()(layer)\r\nlayer = Dense(units=10, activation=\"softmax\")(layer)\r\n# create model\r\nmodel = Model(inputs=visible, outputs=layer)\r\n\r\n# summarize model\r\nmodel.summary()\r\n\r\nmodel.compile(optimizer=\"adam\", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=\"acc\")\r\n\r\nhistory_dsconv = model.fit(x=trainX, y=trainY, batch_size=128, epochs=10, validation_data=(testX, testY))<\/pre>\n<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<p><strong>Papers:<\/strong><\/p>\n<ul>\n<li>\n<a 
href=\"https:\/\/www.di.ens.fr\/data\/publications\/papers\/phd_sifre.pdf\">Rigid-Motion Scattering For Image Classification<\/a> (depthwise separable convolutions)<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1704.04861\">MobileNet<\/a><\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1610.02357\">Xception<\/a><\/li>\n<\/ul>\n<p><strong>APIs:<\/strong><\/p>\n<ul>\n<li>\n<a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/layers\/SeparableConv2D\">Depthwise Separable Convolutions<\/a> in TensorFlow (SeparableConv2D)<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this post, you\u2019ve seen what depthwise, pointwise, and depthwise separable convolutions are. You\u2019ve also seen how using depthwise separable convolutions allows us to get competitive results while using a significantly smaller number of parameters.<\/p>\n<p>Specifically, you\u2019ve learnt:<\/p>\n<ul>\n<li>What depthwise, pointwise, and depthwise separable convolutions are<\/li>\n<li>How to implement depthwise separable convolutions in TensorFlow<\/li>\n<li>How to use them as part of our computer vision models<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/using-depthwise-separable-convolutions-in-tensorflow\/\">Using Depthwise Separable Convolutions in Tensorflow<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/using-depthwise-separable-convolutions-in-tensorflow\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Zhe Ming Chng Looking at all of the very large convolutional neural networks such as ResNets, VGGs, and the like, it begs the question [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2022\/08\/19\/using-depthwise-separable-convolutions-in-tensorflow\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":5841,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5840"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5840"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5840\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/5841"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=5840"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5840"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5840"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}