{"id":2105,"date":"2019-05-07T19:00:02","date_gmt":"2019-05-07T19:00:02","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/07\/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification\/"},"modified":"2019-05-07T19:00:02","modified_gmt":"2019-05-07T19:00:02","slug":"how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/07\/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification\/","title":{"rendered":"How to Develop a Convolutional Neural Network From Scratch for MNIST Handwritten Digit Classification"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>The MNIST handwritten digit classification problem is a standard dataset used in computer vision and deep learning.<\/p>\n<p>Although the dataset is effectively solved, it can be used as the basis for learning and practicing how to develop, evaluate, and use convolutional deep learning neural networks for image classification from scratch. 
This includes how to develop a robust test harness for estimating the performance of the model, how to explore improvements to the model, and how to save the model and later load it to make predictions on new data.<\/p>\n<p>In this tutorial, you will discover how to develop a convolutional neural network for handwritten digit classification from scratch.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>How to develop a test harness to develop a robust evaluation of a model and establish a baseline of performance for a classification task.<\/li>\n<li>How to explore extensions to a baseline model to improve learning and model capacity.<\/li>\n<li>How to develop a finalized model, evaluate the performance of the final model, and use it to make predictions on new images.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_7563\" style=\"width: 650px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7563\" class=\"size-full wp-image-7563\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/05\/How-to-Develop-a-Convolutional-Neural-Network-From-Scratch-for-MNIST-Handwritten-Digit-Classification.jpg\" alt=\"How to Develop a Convolutional Neural Network From Scratch for MNIST Handwritten Digit Classification\" width=\"640\" height=\"426\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/How-to-Develop-a-Convolutional-Neural-Network-From-Scratch-for-MNIST-Handwritten-Digit-Classification.jpg 640w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/05\/How-to-Develop-a-Convolutional-Neural-Network-From-Scratch-for-MNIST-Handwritten-Digit-Classification-300x200.jpg 300w\" sizes=\"(max-width: 640px) 100vw, 640px\"><\/p>\n<p id=\"caption-attachment-7563\" class=\"wp-caption-text\">How to Develop a Convolutional Neural Network From Scratch for MNIST Handwritten Digit 
Classification<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/geographyalltheway_photos\/2918451763\/\">Richard Allaway<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into five parts; they are:<\/p>\n<ol>\n<li>MNIST Handwritten Digit Classification Dataset<\/li>\n<li>Model Evaluation Methodology<\/li>\n<li>How to Develop a Baseline Model<\/li>\n<li>How to Develop an Improved Model<\/li>\n<li>How to Finalize the Model and Make Predictions<\/li>\n<\/ol>\n<h2>MNIST Handwritten Digit Classification Dataset<\/h2>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/MNIST_database\">MNIST dataset<\/a> is an acronym that stands for the Modified National Institute of Standards and Technology 
dataset.<\/p>\n<p>It is a dataset of 70,000 small square 28\u00d728 pixel grayscale images of handwritten single digits between 0 and 9, split into 60,000 training images and 10,000 test images.<\/p>\n<p>The task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9, inclusive.<\/p>\n<p>It is a widely used and deeply understood dataset and, for the most part, is \u201c<em>solved<\/em>.\u201d Top-performing models are deep learning convolutional neural networks that achieve a classification accuracy of above 99%, with an error rate between 0.4% and 0.2% on the hold-out test dataset.<\/p>\n<p>The example below loads the MNIST dataset using the Keras API and creates a plot of the first nine images in the training dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># example of loading the mnist dataset\r\nfrom keras.datasets import mnist\r\nfrom matplotlib import pyplot\r\n# load dataset\r\n(trainX, trainy), (testX, testy) = mnist.load_data()\r\n# summarize loaded dataset\r\nprint('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))\r\nprint('Test: X=%s, y=%s' % (testX.shape, testy.shape))\r\n# plot first few images\r\nfor i in range(9):\r\n\t# define subplot\r\n\tpyplot.subplot(330 + 1 + i)\r\n\t# plot raw pixel data\r\n\tpyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))\r\n# show the figure\r\npyplot.show()<\/pre>\n<p>Running the example loads the MNIST train and test datasets and prints their shapes.<\/p>\n<p>We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset and that images are indeed square with 28\u00d728 pixels.<\/p>\n<pre class=\"crayon-plain-tag\">Train: X=(60000, 28, 28), y=(60000,)\r\nTest: X=(10000, 28, 28), y=(10000,)<\/pre>\n<p>A plot of the first nine images in the dataset is also created, showing the handwritten nature of the images to be classified.<\/p>\n<div id=\"attachment_7555\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" 
aria-describedby=\"caption-attachment-7555\" class=\"size-large wp-image-7555\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Plot-of-a-Subset-of-Images-from-the-MNIST-Dataset-1024x768.png\" alt=\"Plot of a Subset of Images From the MNIST Dataset\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Plot-of-a-Subset-of-Images-from-the-MNIST-Dataset-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Plot-of-a-Subset-of-Images-from-the-MNIST-Dataset-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Plot-of-a-Subset-of-Images-from-the-MNIST-Dataset-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Plot-of-a-Subset-of-Images-from-the-MNIST-Dataset.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7555\" class=\"wp-caption-text\">Plot of a Subset of Images From the MNIST Dataset<\/p>\n<\/div>\n<h2>Model Evaluation Methodology<\/h2>\n<p>Although the MNIST dataset is effectively solved, it can be a useful starting point for developing and practicing a methodology for solving image classification tasks using convolutional neural networks.<\/p>\n<p>Instead of reviewing the literature on well-performing models on the dataset, we can develop a new model from scratch.<\/p>\n<p>The dataset already has a well-defined train and test dataset that we can use.<\/p>\n<p>In order to estimate the performance of a model for a given training run, we can further split the training set into a train and validation dataset. 
Performance on the train and validation datasets over each run can then be plotted to provide learning curves and insight into how well a model is learning the problem.<\/p>\n<p>The Keras API supports this through the \u201c<em>validation_data<\/em>\u201d argument to the <em>model.fit()<\/em> function, which in turn returns an object that describes model performance for the chosen loss and metrics on each training epoch.<\/p>\n<pre class=\"crayon-plain-tag\"># record model performance on a validation dataset during training\r\nhistory = model.fit(..., validation_data=(valX, valY))<\/pre>\n<p>In order to estimate the performance of a model on the problem in general, we can use <a href=\"https:\/\/machinelearningmastery.com\/k-fold-cross-validation\/\">k-fold cross-validation<\/a>, perhaps five-fold cross-validation. This will give some account of the model\u2019s variance, both with respect to differences in the training and test datasets and with respect to the stochastic nature of the learning algorithm. The performance of a model can be taken as the mean performance across the k folds, reported together with the standard deviation, which could be used to estimate a confidence interval if desired.<\/p>\n<p>We can use the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.KFold.html\">KFold class<\/a> from the scikit-learn API to implement the k-fold cross-validation evaluation of a given neural network model. 
There are many ways to achieve this, although we can choose a flexible approach where the <em>KFold<\/em> class is only used to specify the row indexes used for each split.<\/p>\n<pre class=\"crayon-plain-tag\"># example of k-fold cv for a neural net\r\nfrom sklearn.model_selection import KFold\r\ndata = ...\r\nmodel = ...\r\n# prepare cross validation\r\nkfold = KFold(5, shuffle=True, random_state=1)\r\n# enumerate splits\r\nfor train_ix, test_ix in kfold.split(data):\r\n\t...<\/pre>\n<p>We will hold back the actual test dataset and use it as an evaluation of our final model.<\/p>\n<h2>How to Develop a Baseline Model<\/h2>\n<p>The first step is to develop a baseline model.<\/p>\n<p>This is critical as it both involves developing the infrastructure for the test harness, so that any model we design can be evaluated on the dataset, and establishes a baseline in model performance on the problem, against which all improvements can be compared.<\/p>\n<p>The design of the test harness is modular, and we can develop a separate function for each piece. This allows a given aspect of the test harness to be modified or interchanged, if we desire, separately from the rest.<\/p>\n<p>We can develop this test harness with five key elements. They are the loading of the dataset, the preparation of the dataset, the definition of the model, the evaluation of the model, and the presentation of results.<\/p>\n<h3>Load Dataset<\/h3>\n<p>We know some things about the dataset.<\/p>\n<p>For example, we know that the images are all pre-aligned (e.g. 
each image only contains a hand-drawn digit), that the images all have the same square size of 28\u00d728 pixels, and that the images are grayscale.<\/p>\n<p>Therefore, we can load the images and reshape the data arrays to have a single color channel.<\/p>\n<pre class=\"crayon-plain-tag\"># load dataset\r\n(trainX, trainY), (testX, testY) = mnist.load_data()\r\n# reshape dataset to have a single channel\r\ntrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\ntestX = testX.reshape((testX.shape[0], 28, 28, 1))<\/pre>\n<p>We also know that there are 10 classes and that classes are represented as unique integers.<\/p>\n<p>We can, therefore, use a one hot encoding for the class element of each sample, transforming the integer into a 10 element binary vector with a 1 for the index of the class value, and 0 values for all other classes. We can achieve this with the <em>to_categorical()<\/em> utility function.<\/p>\n<pre class=\"crayon-plain-tag\"># one hot encode target values\r\ntrainY = to_categorical(trainY)\r\ntestY = to_categorical(testY)<\/pre>\n<p>The <em>load_dataset()<\/em> function implements these behaviors and can be used to load the dataset.<\/p>\n<pre class=\"crayon-plain-tag\"># load train and test dataset\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = mnist.load_data()\r\n\t# reshape dataset to have a single channel\r\n\ttrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\n\ttestX = testX.reshape((testX.shape[0], 28, 28, 1))\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY<\/pre>\n<\/p>\n<h3>Prepare Pixel Data<\/h3>\n<p>We know that the pixel values for each image in the dataset are unsigned integers in the range between black and white, or 0 and 255.<\/p>\n<p>We do not know the best way to scale the pixel values for modeling, but we know that some scaling will be required.<\/p>\n<p>A good starting point is to 
normalize the pixel values of grayscale images, e.g. rescale them to the range [0,1]. This involves first converting the data type from unsigned integers to floats, then dividing the pixel values by the maximum value.<\/p>\n<pre class=\"crayon-plain-tag\"># convert from integers to floats\r\ntrain_norm = train.astype('float32')\r\ntest_norm = test.astype('float32')\r\n# normalize to range 0-1\r\ntrain_norm = train_norm \/ 255.0\r\ntest_norm = test_norm \/ 255.0<\/pre>\n<p>The <em>prep_pixels()<\/em> function below implements these behaviors and is provided with the pixel values for both the train and test datasets that will need to be scaled.<\/p>\n<pre class=\"crayon-plain-tag\"># scale pixels\r\ndef prep_pixels(train, test):\r\n\t# convert from integers to floats\r\n\ttrain_norm = train.astype('float32')\r\n\ttest_norm = test.astype('float32')\r\n\t# normalize to range 0-1\r\n\ttrain_norm = train_norm \/ 255.0\r\n\ttest_norm = test_norm \/ 255.0\r\n\t# return normalized images\r\n\treturn train_norm, test_norm<\/pre>\n<p>This function must be called to prepare the pixel values prior to any modeling.<\/p>\n<h3>Define Model<\/h3>\n<p>Next, we need to define a baseline convolutional neural network model for the problem.<\/p>\n<p>The model has two main aspects: the feature extraction front end comprised of convolutional and pooling layers, and the classifier backend that will make a prediction.<\/p>\n<p>For the convolutional front-end, we can start with a single convolutional layer with a small filter size (3,3) and a modest number of filters (32) followed by a max pooling layer. The filter maps can then be flattened to provide features to the classifier.<\/p>\n<p>Given that the problem is a multi-class classification task, we know that we will require an output layer with 10 nodes in order to predict the probability distribution of an image belonging to each of the 10 classes. This will also require the use of a softmax activation function. 
Between the feature extractor and the output layer, we can add a dense layer to interpret the features, in this case with 100 nodes.<\/p>\n<p>All hidden layers will use the ReLU activation function and the He weight initialization scheme, both best practices.<\/p>\n<p>We will use a conservative configuration for the stochastic gradient descent optimizer with a learning rate of 0.01 and a momentum of 0.9. The categorical cross-entropy loss function will be optimized, suitable for multi-class classification, and we will monitor the classification accuracy metric, which is appropriate given we have roughly the same number of examples in each of the 10 classes.<\/p>\n<p>The <em>define_model()<\/em> function below will define and return this model.<\/p>\n<pre class=\"crayon-plain-tag\"># define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model<\/pre>\n<\/p>\n<h3>Evaluate Model<\/h3>\n<p>After the model is defined, we need to evaluate it.<\/p>\n<p>The model will be evaluated using five-fold cross-validation. The value of <em>k=5<\/em> was chosen both to provide a baseline for repeated evaluation and to not be so large as to require a long running time. 
Each test set will be 20% of the training dataset, or 12,000 examples, close to the size of the actual test set for this problem.<\/p>\n<p>The training dataset is shuffled prior to being split, and the same shuffling is performed each time, so that any model we evaluate will have the same train and test datasets in each fold, providing an apples-to-apples comparison between models.<\/p>\n<p>We will train the baseline model for a modest 10 training epochs with a default batch size of 32 examples. The test set for each fold will be used to evaluate the model both during each epoch of the training run, so that we can later create learning curves, and at the end of the run, so that we can estimate the performance of the model. As such, we will keep track of the resulting history from each run, as well as the classification accuracy of the fold.<\/p>\n<p>The <em>evaluate_model()<\/em> function below implements these behaviors, taking the defined model and training dataset as arguments and returning lists of accuracy scores and training histories that can be later summarized.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate a model using k-fold cross-validation\r\ndef evaluate_model(model, dataX, dataY, n_folds=5):\r\n\tscores, histories = list(), list()\r\n\t# prepare cross validation\r\n\tkfold = KFold(n_folds, shuffle=True, random_state=1)\r\n\t# enumerate splits\r\n\tfor train_ix, test_ix in kfold.split(dataX):\r\n\t\t# select rows for train and test\r\n\t\ttrainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]\r\n\t\t# fit model\r\n\t\thistory = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)\r\n\t\t# evaluate model\r\n\t\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\t\tprint('> %.3f' % (acc * 100.0))\r\n\t\t# store scores\r\n\t\tscores.append(acc)\r\n\t\thistories.append(history)\r\n\treturn scores, histories<\/pre>\n<\/p>\n<h3>Present Results<\/h3>\n<p>Once 
the model has been evaluated, we can present the results.<\/p>\n<p>There are two key aspects to present: the diagnostics of the learning behavior of the model during training and the estimation of the model performance. These can be implemented using separate functions.<\/p>\n<p>First, the diagnostics involve creating a line plot showing model performance on the train and test set during each fold of the k-fold cross-validation. These plots are valuable for getting an idea of whether a model is overfitting, underfitting, or has a good fit for the dataset.<\/p>\n<p>We will create a single figure with two subplots, one for loss and one for accuracy. Blue lines will indicate model performance on the training dataset and orange lines will indicate performance on the hold out test dataset. The <em>summarize_diagnostics()<\/em> function below creates and shows this plot given the collected training histories.<\/p>\n<pre class=\"crayon-plain-tag\"># plot diagnostic learning curves\r\ndef summarize_diagnostics(histories):\r\n\tfor i in range(len(histories)):\r\n\t\t# plot loss\r\n\t\tpyplot.subplot(211)\r\n\t\tpyplot.title('Cross Entropy Loss')\r\n\t\tpyplot.plot(histories[i].history['loss'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_loss'], color='orange', label='test')\r\n\t\t# plot accuracy\r\n\t\tpyplot.subplot(212)\r\n\t\tpyplot.title('Classification Accuracy')\r\n\t\tpyplot.plot(histories[i].history['acc'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_acc'], color='orange', label='test')\r\n\tpyplot.show()<\/pre>\n<p>Next, the classification accuracy scores collected during each fold can be summarized by calculating the mean and standard deviation. This provides an estimate of the average expected performance of the model trained on this dataset, with an estimate of the average variance in the mean. 
We will also summarize the distribution of scores by creating and showing a box and whisker plot.<\/p>\n<p>The <em>summarize_performance()<\/em> function below implements this for a given list of scores collected during model evaluation.<\/p>\n<pre class=\"crayon-plain-tag\"># summarize model performance\r\ndef summarize_performance(scores):\r\n\t# print summary\r\n\tprint('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))\r\n\t# box and whisker plots of results\r\n\tpyplot.boxplot(scores)\r\n\tpyplot.show()<\/pre>\n<\/p>\n<h3>Complete Example<\/h3>\n<p>We need a function that will drive the test harness.<\/p>\n<p>This involves calling all of the functions defined above.<\/p>\n<pre class=\"crayon-plain-tag\"># run the test harness for evaluating a model\r\ndef run_test_harness():\r\n\t# load dataset\r\n\ttrainX, trainY, testX, testY = load_dataset()\r\n\t# prepare pixel data\r\n\ttrainX, testX = prep_pixels(trainX, testX)\r\n\t# define model\r\n\tmodel = define_model()\r\n\t# evaluate model\r\n\tscores, histories = evaluate_model(model, trainX, trainY)\r\n\t# learning curves\r\n\tsummarize_diagnostics(histories)\r\n\t# summarize estimated performance\r\n\tsummarize_performance(scores)<\/pre>\n<p>We now have everything we need; the complete code example for a baseline convolutional neural network model on the MNIST dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># baseline cnn model for mnist\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.model_selection import KFold\r\nfrom keras.datasets import mnist\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.optimizers import SGD\r\n\r\n# load train and test dataset\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, 
testY) = mnist.load_data()\r\n\t# reshape dataset to have a single channel\r\n\ttrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\n\ttestX = testX.reshape((testX.shape[0], 28, 28, 1))\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# scale pixels\r\ndef prep_pixels(train, test):\r\n\t# convert from integers to floats\r\n\ttrain_norm = train.astype('float32')\r\n\ttest_norm = test.astype('float32')\r\n\t# normalize to range 0-1\r\n\ttrain_norm = train_norm \/ 255.0\r\n\ttest_norm = test_norm \/ 255.0\r\n\t# return normalized images\r\n\treturn train_norm, test_norm\r\n\r\n# define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model\r\n\r\n# evaluate a model using k-fold cross-validation\r\ndef evaluate_model(model, dataX, dataY, n_folds=5):\r\n\tscores, histories = list(), list()\r\n\t# prepare cross validation\r\n\tkfold = KFold(n_folds, shuffle=True, random_state=1)\r\n\t# enumerate splits\r\n\tfor train_ix, test_ix in kfold.split(dataX):\r\n\t\t# select rows for train and test\r\n\t\ttrainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]\r\n\t\t# fit model\r\n\t\thistory = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)\r\n\t\t# evaluate model\r\n\t\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\t\tprint('> %.3f' % (acc * 100.0))\r\n\t\t# stores 
scores\r\n\t\tscores.append(acc)\r\n\t\thistories.append(history)\r\n\treturn scores, histories\r\n\r\n# plot diagnostic learning curves\r\ndef summarize_diagnostics(histories):\r\n\tfor i in range(len(histories)):\r\n\t\t# plot loss\r\n\t\tpyplot.subplot(211)\r\n\t\tpyplot.title('Cross Entropy Loss')\r\n\t\tpyplot.plot(histories[i].history['loss'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_loss'], color='orange', label='test')\r\n\t\t# plot accuracy\r\n\t\tpyplot.subplot(212)\r\n\t\tpyplot.title('Classification Accuracy')\r\n\t\tpyplot.plot(histories[i].history['acc'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_acc'], color='orange', label='test')\r\n\tpyplot.show()\r\n\r\n# summarize model performance\r\ndef summarize_performance(scores):\r\n\t# print summary\r\n\tprint('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))\r\n\t# box and whisker plots of results\r\n\tpyplot.boxplot(scores)\r\n\tpyplot.show()\r\n\r\n# run the test harness for evaluating a model\r\ndef run_test_harness():\r\n\t# load dataset\r\n\ttrainX, trainY, testX, testY = load_dataset()\r\n\t# prepare pixel data\r\n\ttrainX, testX = prep_pixels(trainX, testX)\r\n\t# define model\r\n\tmodel = define_model()\r\n\t# evaluate model\r\n\tscores, histories = evaluate_model(model, trainX, trainY)\r\n\t# learning curves\r\n\tsummarize_diagnostics(histories)\r\n\t# summarize estimated performance\r\n\tsummarize_performance(scores)\r\n\r\n# entry point, run the test harness\r\nrun_test_harness()<\/pre>\n<p>Running the example prints the classification accuracy for each fold of the cross-validation process. This is helpful to get an idea that the model evaluation is progressing.<\/p>\n<p>We can see two cases where the model achieves perfect skill and one case where it achieved lower than 99% accuracy. 
These are good results.<\/p>\n<pre class=\"crayon-plain-tag\">> 98.558\r\n> 99.842\r\n> 99.992\r\n> 100.000\r\n> 100.000<\/pre>\n<p>Next, a diagnostic plot is shown, giving insight into the learning behavior of the model across each fold.<\/p>\n<p>In this case, we can see that the model generally achieves a good fit, with train and test learning curves converging. There is no obvious sign of over- or underfitting.<\/p>\n<div id=\"attachment_7556\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7556\" class=\"size-large wp-image-7556\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Baseline-Model-During-k-Fold-Cross-Validation-1024x768.png\" alt=\"Loss and Accuracy Learning Curves for the Baseline Model During k-Fold Cross-Validation\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Baseline-Model-During-k-Fold-Cross-Validation-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Baseline-Model-During-k-Fold-Cross-Validation-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Baseline-Model-During-k-Fold-Cross-Validation-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Baseline-Model-During-k-Fold-Cross-Validation.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7556\" class=\"wp-caption-text\">Loss and Accuracy Learning Curves for the Baseline Model During k-Fold Cross-Validation<\/p>\n<\/div>\n<p>Next, a summary of the model performance is 
calculated. We can see that, in this case, the model has an estimated skill of about 99.7%, which is impressive, although it has a high standard deviation of about half a percentage point.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: mean=99.678 std=0.563, n=5<\/pre>\n<p>Finally, a box and whisker plot is created to summarize the distribution of accuracy scores.<\/p>\n<div id=\"attachment_7557\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7557\" class=\"size-large wp-image-7557\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Baseline-Model-Evaluated-Using-k-Fold-Cross-Validation-1024x768.png\" alt=\"Box and Whisker Plot of Accuracy Scores for the Baseline Model Evaluated Using k-Fold Cross-Validation\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Baseline-Model-Evaluated-Using-k-Fold-Cross-Validation-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Baseline-Model-Evaluated-Using-k-Fold-Cross-Validation-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Baseline-Model-Evaluated-Using-k-Fold-Cross-Validation-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Baseline-Model-Evaluated-Using-k-Fold-Cross-Validation.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7557\" class=\"wp-caption-text\">Box and Whisker Plot of Accuracy Scores for the Baseline Model Evaluated Using k-Fold Cross-Validation<\/p>\n<\/div>\n<p>As we 
would expect, the distribution is tight, above 99.8% accuracy, with one outlier result.<\/p>\n<p>We now have a robust test harness and a well-performing baseline model.<\/p>\n<h2>How to Develop an Improved Model<\/h2>\n<p>There are many ways that we might explore improvements to the baseline model.<\/p>\n<p>We will look at areas of model configuration that often result in an improvement, so-called low-hanging fruit. The first is a change to the learning algorithm, and the second is an increase in the depth of the model.<\/p>\n<h3>Improvement to Learning<\/h3>\n<p>There are many aspects of the learning algorithm that can be explored for improvement.<\/p>\n<p>Perhaps the point of biggest leverage is the learning rate, such as evaluating the impact that smaller or larger values of the learning rate may have, as well as schedules that change the learning rate during training.<\/p>\n<p>Another approach that can rapidly accelerate the learning of a model and can result in large performance improvements is batch normalization. We will evaluate the effect that batch normalization has on our baseline model.<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-accelerate-learning-of-deep-neural-networks-with-batch-normalization\/\">Batch normalization<\/a> can be used after convolutional and fully connected layers. It has the effect of changing the distribution of the output of the layer, specifically by standardizing the outputs. This has the effect of stabilizing and accelerating the learning process.<\/p>\n<p>We can update the model definition to use batch normalization after the activation function for the convolutional and dense layers of our baseline model. 
The updated version of the <em>define_model()<\/em> function with batch normalization is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model<\/pre>\n<p>The complete code listing with this change is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># cnn model with batch normalization for mnist\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.model_selection import KFold\r\nfrom keras.datasets import mnist\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.optimizers import SGD\r\nfrom keras.layers import BatchNormalization\r\n\r\n# load train and test dataset\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = mnist.load_data()\r\n\t# reshape dataset to have a single channel\r\n\ttrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\n\ttestX = testX.reshape((testX.shape[0], 28, 28, 1))\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# scale pixels\r\ndef prep_pixels(train, test):\r\n\t# convert from integers to floats\r\n\ttrain_norm = train.astype('float32')\r\n\ttest_norm = 
test.astype('float32')\r\n\t# normalize to range 0-1\r\n\ttrain_norm = train_norm \/ 255.0\r\n\ttest_norm = test_norm \/ 255.0\r\n\t# return normalized images\r\n\treturn train_norm, test_norm\r\n\r\n# define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(BatchNormalization())\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model\r\n\r\n# evaluate a model using k-fold cross-validation\r\ndef evaluate_model(model, dataX, dataY, n_folds=5):\r\n\tscores, histories = list(), list()\r\n\t# prepare cross validation\r\n\tkfold = KFold(n_folds, shuffle=True, random_state=1)\r\n\t# enumerate splits\r\n\tfor train_ix, test_ix in kfold.split(dataX):\r\n\t\t# select rows for train and test\r\n\t\ttrainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]\r\n\t\t# fit model\r\n\t\thistory = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)\r\n\t\t# evaluate model\r\n\t\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\t\tprint('> %.3f' % (acc * 100.0))\r\n\t\t# stores scores\r\n\t\tscores.append(acc)\r\n\t\thistories.append(history)\r\n\treturn scores, histories\r\n\r\n# plot diagnostic learning curves\r\ndef summarize_diagnostics(histories):\r\n\tfor i in range(len(histories)):\r\n\t\t# plot loss\r\n\t\tpyplot.subplot(211)\r\n\t\tpyplot.title('Cross Entropy Loss')\r\n\t\tpyplot.plot(histories[i].history['loss'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_loss'], color='orange', 
label='test')\r\n\t\t# plot accuracy\r\n\t\tpyplot.subplot(212)\r\n\t\tpyplot.title('Classification Accuracy')\r\n\t\tpyplot.plot(histories[i].history['acc'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_acc'], color='orange', label='test')\r\n\tpyplot.show()\r\n\r\n# summarize model performance\r\ndef summarize_performance(scores):\r\n\t# print summary\r\n\tprint('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))\r\n\t# box and whisker plots of results\r\n\tpyplot.boxplot(scores)\r\n\tpyplot.show()\r\n\r\n# run the test harness for evaluating a model\r\ndef run_test_harness():\r\n\t# load dataset\r\n\ttrainX, trainY, testX, testY = load_dataset()\r\n\t# prepare pixel data\r\n\ttrainX, testX = prep_pixels(trainX, testX)\r\n\t# define model\r\n\tmodel = define_model()\r\n\t# evaluate model\r\n\tscores, histories = evaluate_model(model, trainX, trainY)\r\n\t# learning curves\r\n\tsummarize_diagnostics(histories)\r\n\t# summarize estimated performance\r\n\tsummarize_performance(scores)\r\n\r\n# entry point, run the test harness\r\nrun_test_harness()<\/pre>\n<p>Running the example again reports model performance for each fold of the cross-validation process.<\/p>\n<p>We can see perhaps a small drop in model performance as compared to the baseline across the cross-validation folds.<\/p>\n<pre class=\"crayon-plain-tag\">> 98.592\r\n> 99.792\r\n> 99.933\r\n> 99.992\r\n> 99.983<\/pre>\n<p>A plot of the learning curves is created, in this case showing that the speed of learning (improvement over epochs) does not appear to be different from the baseline model.<\/p>\n<p>The plots suggest that batch normalization, at least as implemented in this case, does not offer any benefit.<\/p>\n<div id=\"attachment_7558\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7558\" class=\"size-large wp-image-7558\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-BatchNormalization-Model-During-k-Fold-Cross-Validation-1024x768.png\" alt=\"Loss and Accuracy Learning Curves for the BatchNormalization Model During k-Fold Cross-Validation\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-BatchNormalization-Model-During-k-Fold-Cross-Validation-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-BatchNormalization-Model-During-k-Fold-Cross-Validation-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-BatchNormalization-Model-During-k-Fold-Cross-Validation-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-BatchNormalization-Model-During-k-Fold-Cross-Validation.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7558\" class=\"wp-caption-text\">Loss and Accuracy Learning Curves for the BatchNormalization Model During k-Fold Cross-Validation<\/p>\n<\/div>\n<p>Next, the estimated performance of the model is presented, showing a slight decrease in mean accuracy, 99.658 as compared to 99.678 with the baseline model, along with perhaps a small decrease in the standard deviation.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: mean=99.658 std=0.538, n=5<\/pre>\n<\/p>\n<div id=\"attachment_7559\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7559\" class=\"size-large wp-image-7559\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-BatchNormalization-Model-Evaluated-Using-k-Fold-Cross-Validation-1024x768.png\" alt=\"Box and Whisker Plot of Accuracy Scores for the BatchNormalization Model Evaluated Using k-Fold Cross-Validation\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-BatchNormalization-Model-Evaluated-Using-k-Fold-Cross-Validation-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-BatchNormalization-Model-Evaluated-Using-k-Fold-Cross-Validation-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-BatchNormalization-Model-Evaluated-Using-k-Fold-Cross-Validation-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-BatchNormalization-Model-Evaluated-Using-k-Fold-Cross-Validation.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7559\" class=\"wp-caption-text\">Box and Whisker Plot of Accuracy Scores for the BatchNormalization Model Evaluated Using k-Fold Cross-Validation<\/p>\n<\/div>\n<h3>Increase in Model Depth<\/h3>\n<p>There are many ways to change the model configuration in order to explore improvements over the baseline model.<\/p>\n<p>Two common approaches involve changing the <a href=\"https:\/\/machinelearningmastery.com\/how-to-control-neural-network-model-capacity-with-nodes-and-layers\/\">capacity<\/a> of the feature extraction part of the model or changing the capacity or function of the classifier part of the model. 
Perhaps the point of biggest influence is a change to the feature extractor.<\/p>\n<p>We can increase the depth of the feature extractor part of the model, following a VGG-like pattern of adding more convolutional and pooling layers with the same sized filter, while increasing the number of filters. In this case, we will add a double convolutional layer with 64 filters each, followed by another max pooling layer.<\/p>\n<p>The updated version of the <em>define_model()<\/em> function with this change is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model<\/pre>\n<p>For completeness, the entire code listing, including this change, is provided below.<\/p>\n<pre class=\"crayon-plain-tag\"># deeper cnn model for mnist\r\nfrom numpy import mean\r\nfrom numpy import std\r\nfrom matplotlib import pyplot\r\nfrom sklearn.model_selection import KFold\r\nfrom keras.datasets import mnist\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.optimizers import SGD\r\n\r\n# load train and test dataset\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), 
(testX, testY) = mnist.load_data()\r\n\t# reshape dataset to have a single channel\r\n\ttrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\n\ttestX = testX.reshape((testX.shape[0], 28, 28, 1))\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# scale pixels\r\ndef prep_pixels(train, test):\r\n\t# convert from integers to floats\r\n\ttrain_norm = train.astype('float32')\r\n\ttest_norm = test.astype('float32')\r\n\t# normalize to range 0-1\r\n\ttrain_norm = train_norm \/ 255.0\r\n\ttest_norm = test_norm \/ 255.0\r\n\t# return normalized images\r\n\treturn train_norm, test_norm\r\n\r\n# define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model\r\n\r\n# evaluate a model using k-fold cross-validation\r\ndef evaluate_model(model, dataX, dataY, n_folds=5):\r\n\tscores, histories = list(), list()\r\n\t# prepare cross validation\r\n\tkfold = KFold(n_folds, shuffle=True, random_state=1)\r\n\t# enumerate splits\r\n\tfor train_ix, test_ix in kfold.split(dataX):\r\n\t\t# select rows for train and test\r\n\t\ttrainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]\r\n\t\t# fit model\r\n\t\thistory = model.fit(trainX, trainY, epochs=10, batch_size=32, 
validation_data=(testX, testY), verbose=0)\r\n\t\t# evaluate model\r\n\t\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\t\tprint('> %.3f' % (acc * 100.0))\r\n\t\t# stores scores\r\n\t\tscores.append(acc)\r\n\t\thistories.append(history)\r\n\treturn scores, histories\r\n\r\n# plot diagnostic learning curves\r\ndef summarize_diagnostics(histories):\r\n\tfor i in range(len(histories)):\r\n\t\t# plot loss\r\n\t\tpyplot.subplot(211)\r\n\t\tpyplot.title('Cross Entropy Loss')\r\n\t\tpyplot.plot(histories[i].history['loss'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_loss'], color='orange', label='test')\r\n\t\t# plot accuracy\r\n\t\tpyplot.subplot(212)\r\n\t\tpyplot.title('Classification Accuracy')\r\n\t\tpyplot.plot(histories[i].history['acc'], color='blue', label='train')\r\n\t\tpyplot.plot(histories[i].history['val_acc'], color='orange', label='test')\r\n\tpyplot.show()\r\n\r\n# summarize model performance\r\ndef summarize_performance(scores):\r\n\t# print summary\r\n\tprint('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))\r\n\t# box and whisker plots of results\r\n\tpyplot.boxplot(scores)\r\n\tpyplot.show()\r\n\r\n# run the test harness for evaluating a model\r\ndef run_test_harness():\r\n\t# load dataset\r\n\ttrainX, trainY, testX, testY = load_dataset()\r\n\t# prepare pixel data\r\n\ttrainX, testX = prep_pixels(trainX, testX)\r\n\t# define model\r\n\tmodel = define_model()\r\n\t# evaluate model\r\n\tscores, histories = evaluate_model(model, trainX, trainY)\r\n\t# learning curves\r\n\tsummarize_diagnostics(histories)\r\n\t# summarize estimated performance\r\n\tsummarize_performance(scores)\r\n\r\n# entry point, run the test harness\r\nrun_test_harness()<\/pre>\n<p>Running the example reports model performance for each fold of the cross-validation process.<\/p>\n<p>The per-fold scores may suggest some improvement over the baseline.<\/p>\n<pre class=\"crayon-plain-tag\">> 98.925\r\n> 
99.867\r\n> 99.983\r\n> 99.992\r\n> 100.000<\/pre>\n<p>A plot of the learning curves is created, in this case showing that the models still have a good fit on the problem, with no clear signs of overfitting. The plots may even suggest that further training epochs could be helpful.<\/p>\n<div id=\"attachment_7560\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7560\" class=\"size-large wp-image-7560\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Deeper-Model-During-k-Fold-Cross-Validation-1024x768.png\" alt=\"Loss and Accuracy Learning Curves for the Deeper Model During k-Fold Cross-Validation\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Deeper-Model-During-k-Fold-Cross-Validation-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Deeper-Model-During-k-Fold-Cross-Validation-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Deeper-Model-During-k-Fold-Cross-Validation-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Loss-and-Accuracy-Learning-Curves-for-the-Deeper-Model-During-k-Fold-Cross-Validation.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7560\" class=\"wp-caption-text\">Loss and Accuracy Learning Curves for the Deeper Model During k-Fold Cross-Validation<\/p>\n<\/div>\n<p>Next, the estimated performance of the model is presented, showing a small improvement in performance as compared to the baseline from 99.678 to 99.753, with a small drop in the standard deviation 
as well.<\/p>\n<pre class=\"crayon-plain-tag\">Accuracy: mean=99.753 std=0.417, n=5<\/pre>\n<\/p>\n<div id=\"attachment_7561\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7561\" class=\"size-large wp-image-7561\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Deeper-Model-Evaluated-Using-k-Fold-Cross-Validation-1024x768.png\" alt=\"Box and Whisker Plot of Accuracy Scores for the Deeper Model Evaluated Using k-Fold Cross-Validation\" width=\"1024\" height=\"768\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Deeper-Model-Evaluated-Using-k-Fold-Cross-Validation-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Deeper-Model-Evaluated-Using-k-Fold-Cross-Validation-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Deeper-Model-Evaluated-Using-k-Fold-Cross-Validation-768x576.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/Box-and-Whisker-Plot-of-Accuracy-Scores-for-the-Deeper-Model-Evaluated-Using-k-Fold-Cross-Validation.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/p>\n<p id=\"caption-attachment-7561\" class=\"wp-caption-text\">Box and Whisker Plot of Accuracy Scores for the Deeper Model Evaluated Using k-Fold Cross-Validation<\/p>\n<\/div>\n<h2>How to Finalize the Model and Make Predictions<\/h2>\n<p>The process of model improvement may continue for as long as we have ideas and the time and resources to test them out.<\/p>\n<p>At some point, a final model configuration must be chosen and adopted. 
In this case, we will choose the deeper model as our final model.<\/p>\n<p>First, we will finalize our model by fitting it on the entire training dataset and saving the model to file for later use. We will then load the model and evaluate its performance on the hold out test dataset to get an idea of how well the chosen model actually performs in practice. Finally, we will use the saved model to make a prediction on a single image.<\/p>\n<h3>Save Final Model<\/h3>\n<p>A final model is typically fit on all available data, such as the combination of the train and test datasets.<\/p>\n<p>In this tutorial, we are intentionally holding back a test dataset so that we can estimate the performance of the final model, which can be a good idea in practice. As such, we will fit our model on the training dataset only.<\/p>\n<pre class=\"crayon-plain-tag\"># fit model\r\nmodel.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)<\/pre>\n<p>Once fit, we can save the final model to an H5 file by calling the <em>save()<\/em> function on the model and passing in the chosen filename.<\/p>\n<pre class=\"crayon-plain-tag\"># save model\r\nmodel.save('final_model.h5')<\/pre>\n<p>Note, saving and loading a Keras model requires that the <a href=\"https:\/\/www.h5py.org\/\">h5py library<\/a> is installed on your workstation.<\/p>\n<p>The complete example of fitting the final deep model on the training dataset and saving it to file is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># save the final model to file\r\nfrom keras.datasets import mnist\r\nfrom keras.utils import to_categorical\r\nfrom keras.models import Sequential\r\nfrom keras.layers import Conv2D\r\nfrom keras.layers import MaxPooling2D\r\nfrom keras.layers import Dense\r\nfrom keras.layers import Flatten\r\nfrom keras.optimizers import SGD\r\n\r\n# load train and test dataset\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = mnist.load_data()\r\n\t# reshape dataset to have a 
single channel\r\n\ttrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\n\ttestX = testX.reshape((testX.shape[0], 28, 28, 1))\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# scale pixels\r\ndef prep_pixels(train, test):\r\n\t# convert from integers to floats\r\n\ttrain_norm = train.astype('float32')\r\n\ttest_norm = test.astype('float32')\r\n\t# normalize to range 0-1\r\n\ttrain_norm = train_norm \/ 255.0\r\n\ttest_norm = test_norm \/ 255.0\r\n\t# return normalized images\r\n\treturn train_norm, test_norm\r\n\r\n# define cnn model\r\ndef define_model():\r\n\tmodel = Sequential()\r\n\tmodel.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(MaxPooling2D((2, 2)))\r\n\tmodel.add(Flatten())\r\n\tmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\n\tmodel.add(Dense(10, activation='softmax'))\r\n\t# compile model\r\n\topt = SGD(lr=0.01, momentum=0.9)\r\n\tmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])\r\n\treturn model\r\n\r\n# run the test harness for evaluating a model\r\ndef run_test_harness():\r\n\t# load dataset\r\n\ttrainX, trainY, testX, testY = load_dataset()\r\n\t# prepare pixel data\r\n\ttrainX, testX = prep_pixels(trainX, testX)\r\n\t# define model\r\n\tmodel = define_model()\r\n\t# fit model\r\n\tmodel.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)\r\n\t# save model\r\n\tmodel.save('final_model.h5')\r\n\r\n# entry point, run the test harness\r\nrun_test_harness()<\/pre>\n<p>After running this example, you will now have a 1.2-megabyte file with the name \u2018<em>final_model.h5<\/em>\u2018 in your current 
working directory.<\/p>\n<h3>Evaluate Final Model<\/h3>\n<p>We can now load the final model and evaluate it on the hold out test dataset.<\/p>\n<p>This is something we might do if we were interested in presenting the performance of the chosen model to project stakeholders.<\/p>\n<p>The model can be loaded via the <em>load_model()<\/em> function.<\/p>\n<p>The complete example of loading the saved model and evaluating it on the test dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># evaluate the deep model on the test dataset\r\nfrom keras.datasets import mnist\r\nfrom keras.models import load_model\r\nfrom keras.utils import to_categorical\r\n\r\n# load train and test dataset\r\ndef load_dataset():\r\n\t# load dataset\r\n\t(trainX, trainY), (testX, testY) = mnist.load_data()\r\n\t# reshape dataset to have a single channel\r\n\ttrainX = trainX.reshape((trainX.shape[0], 28, 28, 1))\r\n\ttestX = testX.reshape((testX.shape[0], 28, 28, 1))\r\n\t# one hot encode target values\r\n\ttrainY = to_categorical(trainY)\r\n\ttestY = to_categorical(testY)\r\n\treturn trainX, trainY, testX, testY\r\n\r\n# scale pixels\r\ndef prep_pixels(train, test):\r\n\t# convert from integers to floats\r\n\ttrain_norm = train.astype('float32')\r\n\ttest_norm = test.astype('float32')\r\n\t# normalize to range 0-1\r\n\ttrain_norm = train_norm \/ 255.0\r\n\ttest_norm = test_norm \/ 255.0\r\n\t# return normalized images\r\n\treturn train_norm, test_norm\r\n\r\n# run the test harness for evaluating a model\r\ndef run_test_harness():\r\n\t# load dataset\r\n\ttrainX, trainY, testX, testY = load_dataset()\r\n\t# prepare pixel data\r\n\ttrainX, testX = prep_pixels(trainX, testX)\r\n\t# load model\r\n\tmodel = load_model('final_model.h5')\r\n\t# evaluate model on test dataset\r\n\t_, acc = model.evaluate(testX, testY, verbose=0)\r\n\tprint('> %.3f' % (acc * 100.0))\r\n\r\n# entry point, run the test harness\r\nrun_test_harness()<\/pre>\n<p>Running the example loads the saved model and 
evaluates the model on the hold out test dataset.<\/p>\n<p>The classification accuracy for the model on the test dataset is calculated and printed. In this case, we can see that the model achieved an accuracy of 99.090%, or an error rate of just under 1%, which is not bad at all and reasonably close to the cross-validation estimate of 99.753%, which had a standard deviation of about 0.4 percent.<\/p>\n<pre class=\"crayon-plain-tag\">> 99.090<\/pre>\n<\/p>\n<h3>Make Prediction<\/h3>\n<p>We can use our saved model to make a prediction on new images.<\/p>\n<p>The model assumes that new images are grayscale, that they have been aligned so that one image contains one centered handwritten digit, and that the size of the image is square with the size 28\u00d728 pixels.<\/p>\n<p>Below is an image extracted from the MNIST test dataset. You can save it in your current working directory with the filename \u2018<em>sample_image.png<\/em>\u2018.<\/p>\n<div id=\"attachment_7562\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-7562\" class=\"wp-image-7562 size-medium\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/sample_image-300x298.png\" alt=\"Sample Handwritten Digit\" width=\"300\" height=\"298\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/sample_image-300x298.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/sample_image-150x150.png 150w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/sample_image-768x763.png 768w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/sample_image-1024x1017.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2019\/02\/sample_image.png 1490w\" sizes=\"(max-width: 300px) 100vw, 300px\"><\/p>\n<p 
id=\"caption-attachment-7562\" class=\"wp-caption-text\">Sample Handwritten Digit<\/p>\n<\/div>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/02\/sample_image.png\">Download the sample image (sample_image.png)<\/a><\/li>\n<\/ul>\n<p>We will pretend this is an entirely new and unseen image, prepared in the required way, and see how we might use our saved model to predict the integer that the image represents (e.g. we expect \u201c<em>7<\/em>\u201c).<\/p>\n<p>First, we can load the image, force it to be in grayscale format, and force the size to be 28\u00d728 pixels. The loaded image can then be reshaped to have a single channel and represent a single sample in a dataset. The <em>load_image()<\/em> function implements this and will return the loaded image ready for classification.<\/p>\n<p>Importantly, the pixel values are prepared in the same way as the pixel values were prepared for the training dataset when fitting the final model, in this case, normalized.<\/p>\n<pre class=\"crayon-plain-tag\"># load and prepare the image\r\ndef load_image(filename):\r\n\t# load the image\r\n\timg = load_img(filename, grayscale=True, target_size=(28, 28))\r\n\t# convert to array\r\n\timg = img_to_array(img)\r\n\t# reshape into a single sample with 1 channel\r\n\timg = img.reshape(1, 28, 28, 1)\r\n\t# prepare pixel data\r\n\timg = img.astype('float32')\r\n\timg = img \/ 255.0\r\n\treturn img<\/pre>\n<p>Next, we can load the model as in the previous section and call the <em>predict_classes()<\/em> function to predict the digit that the image represents.<\/p>\n<pre class=\"crayon-plain-tag\"># predict the class\r\ndigit = model.predict_classes(img)<\/pre>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># make a prediction for a new image.\r\nfrom keras.preprocessing.image import load_img\r\nfrom keras.preprocessing.image import img_to_array\r\nfrom keras.models import load_model\r\n\r\n# load and prepare the 
image\r\ndef load_image(filename):\r\n\t# load the image\r\n\timg = load_img(filename, grayscale=True, target_size=(28, 28))\r\n\t# convert to array\r\n\timg = img_to_array(img)\r\n\t# reshape into a single sample with 1 channel\r\n\timg = img.reshape(1, 28, 28, 1)\r\n\t# prepare pixel data\r\n\timg = img.astype('float32')\r\n\timg = img \/ 255.0\r\n\treturn img\r\n\r\n# load an image and predict the class\r\ndef run_example():\r\n\t# load the image\r\n\timg = load_image('sample_image.png')\r\n\t# load model\r\n\tmodel = load_model('final_model.h5')\r\n\t# predict the class\r\n\tdigit = model.predict_classes(img)\r\n\tprint(digit[0])\r\n\r\n# entry point, run the example\r\nrun_example()<\/pre>\n<p>Running the example first loads and prepares the image, loads the model, and then correctly predicts that the loaded image represents the digit \u2018<em>7<\/em>\u2018.<\/p>\n<pre class=\"crayon-plain-tag\">7<\/pre>\n<\/p>\n<h2>Extensions<\/h2>\n<p>This section lists some ideas for extending the tutorial that you may wish to explore.<\/p>\n<ul>\n<li><strong>Tune Pixel Scaling<\/strong>. Explore how alternate pixel scaling methods impact model performance as compared to the baseline model, including centering and standardization.<\/li>\n<li><strong>Tune the Learning Rate<\/strong>. Explore how different learning rates impact the model performance as compared to the baseline model, such as 0.001 and 0.0001.<\/li>\n<li><strong>Tune Model Depth<\/strong>. 
Explore how adding more layers to the model impacts the model performance as compared to the baseline model, such as another block of convolutional and pooling layers or another dense layer in the classifier part of the model.<\/li>\n<\/ul>\n<p>If you explore any of these extensions, I\u2019d love to know.<br \/>\nPost your findings in the comments below.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/keras.io\/datasets\/\">Keras Datasets API<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/keras-team\/keras\/tree\/master\/keras\/datasets\">Keras Datasets Code<\/a><\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.KFold.html\">sklearn.model_selection.KFold API<\/a><\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/MNIST_database\">MNIST database, Wikipedia.<\/a><\/li>\n<li><a href=\"http:\/\/rodrigob.github.io\/are_we_there_yet\/build\/classification_datasets_results.html\">Classification datasets results, What is the class of this image?<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop a convolutional neural network for handwritten digit classification from scratch.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>How to develop a test harness to develop a robust evaluation of a model and establish a baseline of performance for a classification task.<\/li>\n<li>How to explore extensions to a baseline model to improve learning and model capacity.<\/li>\n<li>How to develop a finalized model, evaluate the performance of the final model, and use it to make predictions on new images.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" 
href=\"https:\/\/machinelearningmastery.com\/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification\/\">How to Develop a Convolutional Neural Network From Scratch for MNIST Handwritten Digit Classification<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee The MNIST handwritten digit classification problem is a standard dataset used in computer vision and deep learning. Although the dataset is effectively [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/07\/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2106,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2105"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2105"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2105\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2106"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2105"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2105"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}