{"id":2943,"date":"2019-12-18T18:00:59","date_gmt":"2019-12-18T18:00:59","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/18\/tensorflow-2-tutorial-get-started-in-deep-learning-with-tf-keras\/"},"modified":"2019-12-18T18:00:59","modified_gmt":"2019-12-18T18:00:59","slug":"tensorflow-2-tutorial-get-started-in-deep-learning-with-tf-keras","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/18\/tensorflow-2-tutorial-get-started-in-deep-learning-with-tf-keras\/","title":{"rendered":"TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Predictive modeling with deep learning is a skill that modern developers need to know.<\/p>\n<p>TensorFlow is the premier open-source deep learning framework developed and maintained by Google. Although using TensorFlow directly can be challenging, the modern tf.keras API brings the simplicity and ease of use of Keras to the TensorFlow project.<\/p>\n<p>Using tf.keras allows you to design, fit, evaluate, and use deep learning models to make predictions in just a few lines of code. 
It makes common deep learning tasks, such as classification and regression predictive modeling, accessible to average developers looking to get things done.<\/p>\n<p>In this tutorial, you will discover a step-by-step guide to developing deep learning models in TensorFlow using the tf.keras API.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>The difference between Keras and tf.keras and how to install and confirm TensorFlow is working.<\/li>\n<li>The 5-step life-cycle of tf.keras models and how to use the sequential and functional APIs.<\/li>\n<li>How to develop MLP, CNN, and RNN models with tf.keras for regression, classification, and time series forecasting.<\/li>\n<li>How to use the advanced features of the tf.keras API to inspect and diagnose your model.<\/li>\n<li>How to improve the performance of your tf.keras model by reducing overfitting and accelerating training.<\/li>\n<\/ul>\n<p>This is a large tutorial, and a lot of fun. You might want to bookmark it.<\/p>\n<p>The examples are small and focused; you can finish this tutorial in about 60 minutes.<\/p>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_9900\" style=\"width: 809px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9900\" class=\"size-full wp-image-9900\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/How-to-Develop-Deep-Learning-Models-With-tf.keras_.jpg\" alt=\"How to Develop Deep Learning Models With tf.keras\" width=\"799\" height=\"533\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/How-to-Develop-Deep-Learning-Models-With-tf.keras_.jpg 799w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/How-to-Develop-Deep-Learning-Models-With-tf.keras_-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/How-to-Develop-Deep-Learning-Models-With-tf.keras_-768x512.jpg 768w\" sizes=\"(max-width: 
799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-9900\" class=\"wp-caption-text\">How to Develop Deep Learning Models With tf.keras<br \/>Photo by <a href=\"https:\/\/flickr.com\/photos\/gogostevie\/4148516651\/\">Stephen Harlan<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>TensorFlow Tutorial Overview<\/h2>\n<p>This tutorial is designed to be your complete introduction to tf.keras for your deep learning project.<\/p>\n<p>The focus is on using the API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend <a href=\"https:\/\/amzn.to\/2Y8JuBv\">starting with this excellent book<\/a>.<\/p>\n<p>The best way to learn deep learning in python is by doing. Dive in. You can circle back for more theory later.<\/p>\n<p>I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.<\/p>\n<p>It is a large tutorial and as such, it is divided into five parts; they are:<\/p>\n<ol>\n<li>Install TensorFlow and tf.keras\n<ol>\n<li>What Are Keras and tf.keras?<\/li>\n<li>How to Install TensorFlow<\/li>\n<li>How to Confirm TensorFlow Is Installed<\/li>\n<\/ol>\n<\/li>\n<li>Deep Learning Model Life-Cycle\n<ol>\n<li>The 5-Step Model Life-Cycle<\/li>\n<li>Sequential Model API (Simple)<\/li>\n<li>Functional Model API (Advanced)<\/li>\n<\/ol>\n<\/li>\n<li>How to Develop Deep Learning Models\n<ol>\n<li>Develop Multilayer Perceptron Models<\/li>\n<li>Develop Convolutional Neural Network Models<\/li>\n<li>Develop Recurrent Neural Network Models<\/li>\n<\/ol>\n<\/li>\n<li>How to Use Advanced Model Features\n<ol>\n<li>How to Visualize a Deep Learning Model<\/li>\n<li>How to Plot Model Learning Curves<\/li>\n<li>How to Save and Load Your Model<\/li>\n<\/ol>\n<\/li>\n<li>How to Get Better 
Model Performance\n<ol>\n<li>How to Reduce Overfitting With Dropout<\/li>\n<li>How to Accelerate Training With Batch Normalization<\/li>\n<li>How to Halt Training at the Right Time With Early Stopping<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h3>You Can Do Deep Learning in Python!<\/h3>\n<p>Work through the tutorial at your own pace.<\/p>\n<p><strong>You do not need to understand everything (at least not right now)<\/strong>. Your goal is to run through the tutorial end-to-end and get results. You do not need to understand everything on the first pass. List down your questions as you go. Make heavy use of the API documentation to learn about all of the functions that you\u2019re using.<\/p>\n<p><strong>You do not need to know the math first<\/strong>. Math is a compact way of describing how algorithms work, specifically tools from <a href=\"https:\/\/machinelearningmastery.com\/start-here\/#linear_algebra\">linear algebra<\/a>, <a href=\"https:\/\/machinelearningmastery.com\/start-here\/#probability\">probability<\/a>, and <a href=\"https:\/\/machinelearningmastery.com\/start-here\/#statistical_methods\">statistics<\/a>. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through careful, controlled experiments.<\/p>\n<p><strong>You do not need to know how the algorithms work<\/strong>. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.<\/p>\n<p><strong>You do not need to be a Python programmer<\/strong>. The syntax of the Python language can be intuitive if you are new to it. 
Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = \u201cb\u201d). This will get you most of the way. You are a developer, so you know how to pick up the basics of a language really fast. Just get started and dive into the details later.<\/p>\n<p><strong>You do not need to be a deep learning expert<\/strong>. You can learn about the benefits and limitations of various algorithms later, and there are plenty of posts that you can read later to brush up on the steps of a deep learning project and the importance of evaluating model skill using cross-validation.<\/p>\n<h2>1. Install TensorFlow and tf.keras<\/h2>\n<p>In this section, you will discover what tf.keras is, how to install it, and how to confirm that it is installed correctly.<\/p>\n<h3>1.1 What Are Keras and tf.keras?<\/h3>\n<p><a href=\"https:\/\/keras.io\/\">Keras<\/a> is an open-source deep learning library written in Python.<\/p>\n<p>The project was started in 2015 by <a href=\"https:\/\/twitter.com\/fchollet\">Francois Chollet<\/a>. It quickly became a popular framework for developers, becoming one of the most popular deep learning libraries, if not the most popular.<\/p>\n<p>During the period of 2015-2019, developing deep learning models using mathematical libraries like TensorFlow, Theano, and PyTorch was cumbersome, requiring tens or even hundreds of lines of code to achieve the simplest tasks. The focus of these libraries was on research, flexibility, and speed, not ease of use.<\/p>\n<p>Keras was popular because the API was clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code.<\/p>\n<p>A secondary reason Keras took off was that it allowed you to use any one of a range of popular deep learning mathematical libraries as the backend (i.e. 
used to perform the computation), such as <a href=\"https:\/\/github.com\/tensorflow\/tensorflow\">TensorFlow<\/a>, <a href=\"https:\/\/github.com\/Theano\/Theano\">Theano<\/a>, and later, <a href=\"https:\/\/github.com\/microsoft\/CNTK\">CNTK<\/a>. This allowed the power of these libraries to be harnessed (e.g. GPUs) with a very clean and simple interface.<\/p>\n<p>In 2019, Google released a new version of their TensorFlow deep learning library (TensorFlow 2) that integrated the Keras API directly and promoted this interface as the default or standard interface for deep learning development on the platform.<\/p>\n<p>This integration is commonly referred to as the <em>tf.keras<\/em> interface or API (\u201c<em>tf<\/em>\u201d is short for \u201c<em>TensorFlow<\/em>\u201d). This is to distinguish it from the so-called standalone Keras open source project.<\/p>\n<ul>\n<li><strong>Standalone Keras<\/strong>. The standalone open source project that supports TensorFlow, Theano and CNTK backends.<\/li>\n<li><strong>tf.keras<\/strong>. The Keras API integrated into TensorFlow 2.<\/li>\n<\/ul>\n<p>The Keras API implementation in TensorFlow is referred to as \u201c<em>tf.keras<\/em>\u201d because this is the Python idiom used when referencing the API. First, the TensorFlow module is imported and named \u201c<em>tf<\/em>\u201d; then, Keras API elements are accessed via calls to <em>tf.keras<\/em>; for example:<\/p>\n<pre class=\"crayon-plain-tag\"># example of tf.keras python idiom\r\nimport tensorflow as tf\r\n# use keras API\r\nmodel = tf.keras.Sequential()\r\n...<\/pre>\n<p>I generally don\u2019t use this idiom myself; I don\u2019t think it reads cleanly.<\/p>\n<p>Given that TensorFlow was the de facto standard backend for the Keras open source project, the integration means that a single library can now be used instead of two separate libraries. 
Further, the standalone Keras project now recommends all future Keras development use the <em>tf.keras<\/em> API.<\/p>\n<blockquote>\n<p>At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features (eager execution, distribution support and other).<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/keras.io\/\">Keras Project Homepage<\/a>.<\/p>\n<h3>1.2 How to Install TensorFlow<\/h3>\n<p>Before installing TensorFlow, ensure that you have Python installed, such as Python 3.6 or higher.<\/p>\n<p>If you don\u2019t have Python installed, you can install it using Anaconda. This tutorial will show you how:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/setup-python-environment-machine-learning-deep-learning-anaconda\/\">How to Setup Your Python Environment for Machine Learning With Anaconda<\/a><\/li>\n<\/ul>\n<p>There are many ways to install the TensorFlow open-source deep learning library.<\/p>\n<p>The most common, and perhaps the simplest, way to install TensorFlow on your workstation is by using <em>pip<\/em>.<\/p>\n<p>For example, on the command line, you can type:<\/p>\n<pre class=\"crayon-plain-tag\">sudo pip install tensorflow<\/pre>\n<p>If you prefer to use an installation method more specific to your platform or package manager, you can see a complete list of installation instructions here:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/install\">Install TensorFlow 2 Guide<\/a><\/li>\n<\/ul>\n<p>There is no need to set up the GPU now.<\/p>\n<p>All examples in this tutorial will work just fine on a modern CPU. If you want to configure TensorFlow for your GPU, you can do that after completing this tutorial. 
Don\u2019t get distracted!<\/p>\n<h3>1.3 How to Confirm TensorFlow Is Installed<\/h3>\n<p>Once TensorFlow is installed, it is important to confirm that the library was installed successfully and that you can start using it.<\/p>\n<p><em>Don\u2019t skip this step<\/em>.<\/p>\n<p>If TensorFlow is not installed correctly or raises an error on this step, you won\u2019t be able to run the examples later.<\/p>\n<p>Create a new file called <em>versions.py<\/em> and copy and paste the following code into the file.<\/p>\n<pre class=\"crayon-plain-tag\"># check version\r\nimport tensorflow\r\nprint(tensorflow.__version__)<\/pre>\n<p>Save the file, then open your <a href=\"https:\/\/machinelearningmastery.com\/faq\/single-faq\/how-do-i-run-a-script-from-the-command-line\">command line<\/a> and change directory to where you saved the file.<\/p>\n<p>Then type:<\/p>\n<pre class=\"crayon-plain-tag\">python versions.py<\/pre>\n<p>You should then see output like the following:<\/p>\n<pre class=\"crayon-plain-tag\">2.0.0<\/pre>\n<p>This confirms that TensorFlow is installed correctly and that we are all using the same version.<\/p>\n<p><strong>What version did you get?\u00a0<\/strong><br \/>\nPost your output in the comments below.<\/p>\n<p>This also shows you how to run a Python script from the command line. 
I recommend running all code from the command line in this manner, and <a href=\"https:\/\/machinelearningmastery.com\/faq\/single-faq\/why-dont-use-or-recommend-notebooks\">not from a notebook or an IDE<\/a>.<\/p>\n<h4>If You Get Warning Messages<\/h4>\n<p>Sometimes when you use the <em>tf.keras<\/em> API, you may see warnings printed.<\/p>\n<p>This might include messages that your hardware supports features that your TensorFlow installation was not configured to use.<\/p>\n<p>Some examples on my workstation include:<\/p>\n<pre class=\"crayon-plain-tag\">Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA\r\nXLA service 0x7fde3f2e6180 executing computations on platform Host. Devices:\r\nStreamExecutor device (0): Host, Default Version<\/pre>\n<p>They are not your fault. <strong>You did nothing wrong<\/strong>.<\/p>\n<p>These are information messages and they will not prevent the execution of your code. You can safely ignore messages of this type for now.<\/p>\n<p>It\u2019s an intentional design decision made by the TensorFlow team to show these warning messages. A downside of this decision is that it confuses beginners and it trains developers to ignore all messages, including those that potentially may impact the execution.<\/p>\n<p>Now that you know what tf.keras is, how to install TensorFlow, and how to confirm your development environment is working, let\u2019s look at the life-cycle of deep learning models in TensorFlow.<\/p>\n<h2>2. 
Deep Learning Model Life-Cycle<\/h2>\n<p>In this section, you will discover the life-cycle for a deep learning model and the two tf.keras APIs that you can use to define models.<\/p>\n<h3>2.1 The 5-Step Model Life-Cycle<\/h3>\n<p>A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the tf.keras API.<\/p>\n<p>The five steps in the life-cycle are as follows:<\/p>\n<ol>\n<li>Define the model.<\/li>\n<li>Compile the model.<\/li>\n<li>Fit the model.<\/li>\n<li>Evaluate the model.<\/li>\n<li>Make predictions.<\/li>\n<\/ol>\n<p>Let\u2019s take a closer look at each step in turn.<\/p>\n<h4>Define the Model<\/h4>\n<p>Defining the model requires that you first select the type of model that you need and then choose the architecture or network topology.<\/p>\n<p>From an API perspective, this involves defining the layers of the model, configuring each layer with a number of nodes and activation function, and connecting the layers together into a cohesive model.<\/p>\n<p>Models can be defined either with the Sequential API or the Functional API, and we will take a look at this in the next section.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the model\r\nmodel = ...<\/pre>\n<\/p>\n<h4>Compile the Model<\/h4>\n<p>Compiling the model requires that you first select a loss function that you want to optimize, such as mean squared error or cross-entropy.<\/p>\n<p>It also requires that you select an algorithm to perform the optimization procedure, typically stochastic gradient descent, or a modern variation, such as Adam. 
It may also require that you select any performance metrics to keep track of during the model training process.<\/p>\n<p>From an API perspective, this involves calling a function to compile the model with the chosen configuration, which will prepare the appropriate data structures required for the efficient use of the model you have defined.<\/p>\n<p>The optimizer can be specified as a string for a known optimizer class, e.g. \u2018<em>sgd<\/em>\u2018 for stochastic gradient descent, or you can configure an instance of an optimizer class and use that.<\/p>\n<p>For a list of supported optimizers, see this:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/optimizers\">tf.keras Optimizers<\/a><\/li>\n<\/ul>\n<pre class=\"crayon-plain-tag\">...\r\n# compile the model\r\nopt = SGD(learning_rate=0.01, momentum=0.9)\r\nmodel.compile(optimizer=opt, loss='binary_crossentropy')<\/pre>\n<p>The three most common loss functions are:<\/p>\n<ul>\n<li>\u2018<em>binary_crossentropy<\/em>\u2018 for binary classification.<\/li>\n<li>\u2018<em>sparse_categorical_crossentropy<\/em>\u2018 for multi-class classification.<\/li>\n<li>\u2018<em>mse<\/em>\u2018 (mean squared error) for regression.<\/li>\n<\/ul>\n<pre class=\"crayon-plain-tag\">...\r\n# compile the model\r\nmodel.compile(optimizer='sgd', loss='mse')<\/pre>\n<p>For a list of supported loss functions, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/losses\">tf.keras Loss Functions<\/a><\/li>\n<\/ul>\n<p>Metrics are defined as a list of strings for known metric functions or a list of functions to call to evaluate predictions.<\/p>\n<p>For a list of supported metrics, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/metrics\">tf.keras Metrics<\/a><\/li>\n<\/ul>\n<pre class=\"crayon-plain-tag\">...\r\n# compile the model\r\nmodel.compile(optimizer='sgd', loss='binary_crossentropy', 
metrics=['accuracy'])<\/pre>\n<\/p>\n<h4>Fit the Model<\/h4>\n<p>Fitting the model requires that you first select the training configuration, such as the number of epochs (loops through the training dataset) and the batch size (number of samples used to estimate model error before each update to the model weights).<\/p>\n<p>Training applies the chosen optimization algorithm to minimize the chosen loss function and updates the model using the backpropagation of error algorithm.<\/p>\n<p>Fitting the model is the slow part of the whole process and can take anywhere from seconds to days, depending on the complexity of the model, the hardware you\u2019re using, and the size of the training dataset.<\/p>\n<p>From an API perspective, this involves calling a function to perform the training process. This function will block (not return) until the training process has finished.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# fit the model\r\nmodel.fit(X, y, epochs=100, batch_size=32)<\/pre>\n<p>For help on how to choose the batch size, see this tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size\/\">How to Control the Stability of Training Neural Networks With the Batch Size<\/a><\/li>\n<\/ul>\n<p>While fitting the model, a progress bar will summarize the status of each epoch and the overall training process. This can be reduced to a one-line report of model performance per epoch by setting the \u201c<em>verbose<\/em>\u201d argument to 2. All output can be turned off during training by setting \u201c<em>verbose<\/em>\u201d to 0.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# fit the model\r\nmodel.fit(X, y, epochs=100, batch_size=32, verbose=0)<\/pre>\n<\/p>\n<h4>Evaluate the Model<\/h4>\n<p>Evaluating the model requires that you first choose a holdout dataset used to evaluate the model. 
This should be data not used in the training process so that we can get an unbiased estimate of the performance of the model when making predictions on new data.<\/p>\n<p>The speed of model evaluation is proportional to the amount of data you want to use for the evaluation, although it is much faster than training as the model is not changed.<\/p>\n<p>From an API perspective, this involves calling a function with the holdout dataset and getting a loss and perhaps other metrics that can be reported.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# evaluate the model\r\nloss = model.evaluate(X, y, verbose=0)<\/pre>\n<\/p>\n<h4>Make a Prediction<\/h4>\n<p>Making a prediction is the final step in the life-cycle. It is why we wanted the model in the first place.<\/p>\n<p>It requires you have new data for which a prediction is required, e.g. where you do not have the target values.<\/p>\n<p>From an API perspective, you simply call a function to make a prediction of a class label, probability, or numerical value: whatever you designed your model to predict.<\/p>\n<p>You may want to save the model and later load it to make predictions. 
You may also choose to fit a model on all of the available data before you start using it.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# make a prediction\r\nyhat = model.predict(X)<\/pre>\n<\/p>\n<p>Now that we are familiar with the model life-cycle, let\u2019s take a look at the two main ways to use the tf.keras API to build models: sequential and functional.<\/p>\n<h3>2.2 Sequential Model API (Simple)<\/h3>\n<p>The sequential model API is the simplest and is the API that I recommend, especially when getting started.<\/p>\n<p>It is referred to as \u201c<em>sequential<\/em>\u201d because it involves defining a <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/Sequential\">Sequential class<\/a> and adding layers to the model one by one in a linear manner, from input to output.<\/p>\n<p>The example below defines a Sequential MLP model that accepts eight inputs, has one hidden layer with 10 nodes and then an output layer with one node to predict a numerical value.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a model defined with the sequential api\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\n# define the model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, input_shape=(8,)))\r\nmodel.add(Dense(1))<\/pre>\n<p>Note that the visible layer of the network is defined by the \u201c<em>input_shape<\/em>\u201d argument on the first hidden layer. 
That means in the above example, the model expects the input for one sample to be a vector of eight numbers.<\/p>\n<p>The sequential API is easy to use because you keep calling <em>model.add()<\/em> until you have added all of your layers.<\/p>\n<p>For example, here is a deep MLP with five hidden layers.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a model defined with the sequential api\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\n# define the model\r\nmodel = Sequential()\r\nmodel.add(Dense(100, input_shape=(8,)))\r\nmodel.add(Dense(80))\r\nmodel.add(Dense(30))\r\nmodel.add(Dense(10))\r\nmodel.add(Dense(5))\r\nmodel.add(Dense(1))<\/pre>\n<\/p>\n<h3>2.3 Functional Model API (Advanced)<\/h3>\n<p>The functional API is more complex but is also more flexible.<\/p>\n<p>It involves explicitly connecting the output of one layer to the input of another layer. Each connection is specified.<\/p>\n<p>First, an input layer must be defined via the <em>Input<\/em> class, and the shape of an input sample is specified. We must retain a reference to the input layer when defining the model.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\n# define the layers\r\nx_in = Input(shape=(8,))<\/pre>\n<p>Next, a fully connected layer can be connected to the input by calling the layer and passing the input layer. This will return a reference to the output connection in this new layer.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nx = Dense(10)(x_in)<\/pre>\n<p>We can then connect this to an output layer in the same manner.<\/p>\n<pre class=\"crayon-plain-tag\">...\r\nx_out = Dense(1)(x)<\/pre>\n<p>Once connected, we define a Model object and specify the input and output layers. 
The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a model defined with the functional api\r\nfrom tensorflow.keras import Model\r\nfrom tensorflow.keras import Input\r\nfrom tensorflow.keras.layers import Dense\r\n# define the layers\r\nx_in = Input(shape=(8,))\r\nx = Dense(10)(x_in)\r\nx_out = Dense(1)(x)\r\n# define the model\r\nmodel = Model(inputs=x_in, outputs=x_out)<\/pre>\n<p>Because each connection is specified explicitly, the functional API allows for more complicated model designs, such as models that may have multiple input paths (separate vectors) and models that have multiple output paths (e.g. a word and a number).<\/p>\n<p>The functional API can be a lot of fun when you get used to it.<\/p>\n<p>For more on the functional API, see:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/guide\/keras\/functional\">The Keras functional API in TensorFlow<\/a><\/li>\n<\/ul>\n<p>Now that we are familiar with the model life-cycle and the two APIs that can be used to define models, let\u2019s look at developing some standard models.<\/p>\n<h2>3. How to Develop Deep Learning Models<\/h2>\n<p>In this section, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).<\/p>\n<h3>3.1 Develop Multilayer Perceptron Models<\/h3>\n<p>A Multilayer Perceptron model, or MLP for short, is a standard fully connected neural network model.<\/p>\n<p>It is composed of layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs for nodes in the next layer.<\/p>\n<p>An MLP is created with one or more <em>Dense<\/em> layers. This model is appropriate for tabular data, that is, data as it looks in a table or spreadsheet with one column for each variable and one row for each observation. 
There are three predictive modeling problems you may want to explore with an MLP; they are binary classification, multiclass classification, and regression.<\/p>\n<p>Let\u2019s fit a model on a real dataset for each of these cases.<\/p>\n<p>Note, the models in this section are effective, but not optimized. See if you can improve their performance. Post your findings in the comments below.<\/p>\n<h4>MLP for Binary Classification<\/h4>\n<p>We will use the Ionosphere binary (two-class) classification dataset to demonstrate an MLP for binary classification.<\/p>\n<p>This dataset involves predicting whether a structure is in the atmosphere or not given radar returns.<\/p>\n<p>The dataset will be downloaded automatically using <a href=\"https:\/\/pandas.pydata.org\/\">Pandas<\/a>, but you can learn more about it here.<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/ionosphere.csv\">Ionosphere Dataset (csv)<\/a>.<\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/ionosphere.names\">Ionosphere Dataset Description (csv)<\/a>.<\/li>\n<\/ul>\n<p>We will use a <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.LabelEncoder.html\">LabelEncoder<\/a> to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.train_test_split.html\">train_test_split()<\/a> function.<\/p>\n<p>It is a good practice to use \u2018<em>relu<\/em>\u2018 activation with a \u2018<em>he_normal<\/em>\u2018 weight initialization. This combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models. 
For more on ReLU, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/rectified-linear-activation-function-for-deep-learning-neural-networks\/\">A Gentle Introduction to the Rectified Linear Unit (ReLU)<\/a><\/li>\n<\/ul>\n<p>The model predicts the probability of class 1 and uses the sigmoid activation function. The model is optimized using the <a href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\">adam version of stochastic gradient descent<\/a> and seeks to minimize the <a href=\"https:\/\/machinelearningmastery.com\/cross-entropy-for-machine-learning\/\">cross-entropy loss<\/a>.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># mlp for binary classification\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\n# load the dataset\r\npath = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/ionosphere.csv'\r\ndf = read_csv(path, header=None)\r\n# split into input and output columns\r\nX, y = df.values[:, :-1], df.values[:, -1]\r\n# ensure all data are floating point values\r\nX = X.astype('float32')\r\n# encode strings to integer\r\ny = LabelEncoder().fit_transform(y)\r\n# split into train and test datasets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\nprint(X_train.shape, X_test.shape, y_train.shape, y_test.shape)\r\n# determine the number of input features\r\nn_features = X_train.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dense(8, activation='relu', kernel_initializer='he_normal'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='binary_crossentropy', 
metrics=['accuracy'])\r\n# fit the model\r\nmodel.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)\r\n# evaluate the model\r\nloss, acc = model.evaluate(X_test, y_test, verbose=0)\r\nprint('Test Accuracy: %.3f' % acc)\r\n# make a prediction\r\nrow = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]\r\nyhat = model.predict([row])\r\n# predict() returns a 2D array (one row per sample), so index the single value\r\nprint('Predicted: %.3f' % yhat[0][0])<\/pre>\n<p>Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.<\/p>\n<p>Your specific results will vary given the <a href=\"https:\/\/machinelearningmastery.com\/stochastic-in-machine-learning\/\">stochastic nature of the learning algorithm<\/a>. Try running the example a few times.<\/p>\n<p><strong>What results did you get?<\/strong> Can you change the model to do better?<br \/>\nPost your findings to the comments below.<\/p>\n<p>In this case, we can see that the model achieved a classification accuracy of about 94 percent and then predicted a probability of about 0.99 that the one row of data belongs to class 1.<\/p>\n<pre class=\"crayon-plain-tag\">(235, 34) (116, 34) (235,) (116,)\r\nTest Accuracy: 0.940\r\nPredicted: 0.991<\/pre>\n<\/p>\n<h4>MLP for Multiclass Classification<\/h4>\n<p>We will use the Iris flowers multiclass classification dataset to demonstrate an MLP for multiclass classification.<\/p>\n<p>This problem involves predicting the species of iris flower given measurements of the flower.<\/p>\n<p>The dataset will be downloaded automatically using Pandas, but you can learn more about it here.<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/iris.csv\">Iris Dataset (csv)<\/a>.<\/li>\n<li><a 
href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/iris.names\">Iris Dataset Description (csv)<\/a>.<\/li>\n<\/ul>\n<p>Given that it is a multiclass classification, the model must have one node for each class in the output layer and use the softmax activation function. The loss function is the \u2018<em>sparse_categorical_crossentropy<\/em>\u2018, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.)<\/p>\n<p>The complete example of fitting and evaluating an MLP on the iris flowers dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># mlp for multiclass classification\r\nfrom numpy import argmax\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.preprocessing import LabelEncoder\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\n# load the dataset\r\npath = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/iris.csv'\r\ndf = read_csv(path, header=None)\r\n# split into input and output columns\r\nX, y = df.values[:, :-1], df.values[:, -1]\r\n# ensure all data are floating point values\r\nX = X.astype('float32')\r\n# encode strings to integer\r\ny = LabelEncoder().fit_transform(y)\r\n# split into train and test datasets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\nprint(X_train.shape, X_test.shape, y_train.shape, y_test.shape)\r\n# determine the number of input features\r\nn_features = X_train.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dense(8, activation='relu', kernel_initializer='he_normal'))\r\nmodel.add(Dense(3, activation='softmax'))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\r\n# fit the model\r\nmodel.fit(X_train, y_train, epochs=150, 
batch_size=32, verbose=0)\r\n# evaluate the model\r\nloss, acc = model.evaluate(X_test, y_test, verbose=0)\r\nprint('Test Accuracy: %.3f' % acc)\r\n# make a prediction\r\nrow = [5.1,3.5,1.4,0.2]\r\nyhat = model.predict([row])\r\nprint('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))<\/pre>\n<p>Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.<\/p>\n<p>Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p><strong>What results did you get?<\/strong> Can you change the model to do better?<br \/>\nPost your findings to the comments below.<\/p>\n<p>In this case, we can see that the model achieved a classification accuracy of about 98 percent and then predicted a probability of a row of data belonging to each class, although class 0 has the highest probability.<\/p>\n<pre class=\"crayon-plain-tag\">(100, 4) (50, 4) (100,) (50,)\r\nTest Accuracy: 0.980\r\nPredicted: [[0.8680804 0.12356871 0.00835086]] (class=0)<\/pre>\n<\/p>\n<h4>MLP for Regression<\/h4>\n<p>We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.<\/p>\n<p>This problem involves predicting house value based on properties of the house and neighborhood.<\/p>\n<p>The dataset will be downloaded automatically using Pandas, but you can learn more about it here.<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/housing.csv\">Boston Housing Dataset (csv)<\/a>.<\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/housing.names\">Boston Housing Dataset Description (csv)<\/a>.<\/li>\n<\/ul>\n<p>This is a regression problem that involves predicting a single numerical value. As such, the output layer has a single node and uses the default or linear activation function (no activation function). 
The mean squared error (mse) loss is minimized when fitting the model.<\/p>\n<p>Recall that this is regression, not classification; therefore, we cannot calculate classification accuracy. For more on this, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/classification-versus-regression-in-machine-learning\/\">Difference Between Classification and Regression in Machine Learning<\/a><\/li>\n<\/ul>\n<p>The complete example of fitting and evaluating an MLP on the Boston housing dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># mlp for regression\r\nfrom numpy import sqrt\r\nfrom pandas import read_csv\r\nfrom sklearn.model_selection import train_test_split\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\n# load the dataset\r\npath = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/housing.csv'\r\ndf = read_csv(path, header=None)\r\n# split into input and output columns\r\nX, y = df.values[:, :-1], df.values[:, -1]\r\n# split into train and test datasets\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\r\nprint(X_train.shape, X_test.shape, y_train.shape, y_test.shape)\r\n# determine the number of input features\r\nn_features = X_train.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dense(8, activation='relu', kernel_initializer='he_normal'))\r\nmodel.add(Dense(1))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='mse')\r\n# fit the model\r\nmodel.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)\r\n# evaluate the model\r\nerror = model.evaluate(X_test, y_test, verbose=0)\r\nprint('MSE: %.3f, RMSE: %.3f' % (error, sqrt(error)))\r\n# make a prediction\r\nrow = 
[0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]\r\nyhat = model.predict([row])\r\nprint('Predicted: %.3f' % yhat)<\/pre>\n<p>Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.<\/p>\n<p>Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p><strong>What results did you get?<\/strong> Can you change the model to do better?<br \/>\nPost your findings to the comments below.<\/p>\n<p>In this case, we can see that the model achieved an MSE of about 60, which is an RMSE of about 7 (units are thousands of dollars). A value of about 27 is then predicted for the single example.<\/p>\n<pre class=\"crayon-plain-tag\">(339, 13) (167, 13) (339,) (167,)\r\nMSE: 60.751, RMSE: 7.794\r\nPredicted: 26.983<\/pre>\n<\/p>\n<h3>3.2 Develop Convolutional Neural Network Models<\/h3>\n<p>Convolutional Neural Networks, or CNNs for short, are a type of network designed for image input.<\/p>\n<p>They are composed of <a href=\"https:\/\/machinelearningmastery.com\/convolutional-layers-for-deep-learning-neural-networks\/\">convolutional layers<\/a> that extract features (called feature maps) and <a href=\"https:\/\/machinelearningmastery.com\/pooling-layers-for-convolutional-neural-networks\/\">pooling layers<\/a> that distill features down to the most salient elements.<\/p>\n<p>CNNs are best suited to image classification tasks, although they can be used on a wide array of tasks that take images as input.<\/p>\n<p>A popular image classification task is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/MNIST_database\">MNIST handwritten digit classification<\/a>. 
It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.<\/p>\n<p>The tf.keras API provides a convenience function to download and load this dataset directly.<\/p>\n<p>The example below loads the dataset and plots the first few images.<\/p>\n<pre class=\"crayon-plain-tag\"># example of loading and plotting the mnist dataset\r\nfrom tensorflow.keras.datasets.mnist import load_data\r\nfrom matplotlib import pyplot\r\n# load dataset\r\n(trainX, trainy), (testX, testy) = load_data()\r\n# summarize loaded dataset\r\nprint('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))\r\nprint('Test: X=%s, y=%s' % (testX.shape, testy.shape))\r\n# plot first few images\r\nfor i in range(25):\r\n\t# define subplot\r\n\tpyplot.subplot(5, 5, i+1)\r\n\t# plot raw pixel data\r\n\tpyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))\r\n# show the figure\r\npyplot.show()<\/pre>\n<p>Running the example loads the MNIST dataset, then summarizes the default train and test datasets.<\/p>\n<pre class=\"crayon-plain-tag\">Train: X=(60000, 28, 28), y=(60000,)\r\nTest: X=(10000, 28, 28), y=(10000,)<\/pre>\n<p>A plot is then created showing a grid of examples of handwritten images in the training dataset.<\/p>\n<div id=\"attachment_9897\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9897\" class=\"size-full wp-image-9897\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Handwritten-Digits-from-the-MNIST-dataset.png\" alt=\"Plot of Handwritten Digits From the MNIST dataset\" width=\"1280\" height=\"960\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Handwritten-Digits-from-the-MNIST-dataset.png 1280w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Handwritten-Digits-from-the-MNIST-dataset-300x225.png 300w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Handwritten-Digits-from-the-MNIST-dataset-1024x768.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Handwritten-Digits-from-the-MNIST-dataset-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-9897\" class=\"wp-caption-text\">Plot of Handwritten Digits From the MNIST dataset<\/p>\n<\/div>\n<p>We can train a CNN model to classify the images in the MNIST dataset.<\/p>\n<p>Note that the images are arrays of grayscale pixel data; therefore, we must add a channel dimension to the data before we can use the images as input to the model. The reason is that CNN models expect images in a <a href=\"https:\/\/machinelearningmastery.com\/a-gentle-introduction-to-channels-first-and-channels-last-image-formats-for-deep-learning\/\">channels-last format<\/a>, that is each example to the network has the dimensions [rows, columns, channels], where channels represent the color channels of the image data.<\/p>\n<p>It is also a good idea to scale the pixel values from the default range of 0-255 to 0-1 when training a CNN. 
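<\/p>\n<p>As a minimal sketch of these two preparation steps, the snippet below uses NumPy on a small synthetic array rather than the MNIST data. The array contents and the variable name <em>images<\/em> are illustrative assumptions, not part of the dataset API.<\/p>\n<pre class=\"crayon-plain-tag\"># sketch: add a channel dimension and scale pixel values (synthetic data, not MNIST)\r\nfrom numpy import arange\r\n# four fake 28x28 grayscale images with integer values in 0-255\r\nimages = (arange(4 * 28 * 28) % 256).astype('uint8').reshape((4, 28, 28))\r\n# add a trailing channel dimension: [samples, rows, columns, channels]\r\nimages = images.reshape((images.shape[0], images.shape[1], images.shape[2], 1))\r\n# scale pixel values from 0-255 to 0-1\r\nimages = images.astype('float32') \/ 255.0\r\nprint(images.shape, images.min(), images.max())<\/pre>\n<p>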
For more on scaling pixel values, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-manually-scale-image-pixel-data-for-deep-learning\/\">How to Manually Scale Image Pixel Data for Deep Learning<\/a><\/li>\n<\/ul>\n<p>The complete example of fitting and evaluating a CNN model on the MNIST dataset is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># example of a cnn for image classification\r\nfrom numpy import unique\r\nfrom numpy import argmax\r\nfrom tensorflow.keras.datasets.mnist import load_data\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.layers import Conv2D\r\nfrom tensorflow.keras.layers import MaxPooling2D\r\nfrom tensorflow.keras.layers import Flatten\r\nfrom tensorflow.keras.layers import Dropout\r\n# load dataset\r\n(x_train, y_train), (x_test, y_test) = load_data()\r\n# reshape data to have a single channel\r\nx_train = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2], 1))\r\nx_test = x_test.reshape((x_test.shape[0], x_test.shape[1], x_test.shape[2], 1))\r\n# determine the shape of the input images\r\nin_shape = x_train.shape[1:]\r\n# determine the number of classes\r\nn_classes = len(unique(y_train))\r\nprint(in_shape, n_classes)\r\n# normalize pixel values\r\nx_train = x_train.astype('float32') \/ 255.0\r\nx_test = x_test.astype('float32') \/ 255.0\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=in_shape))\r\nmodel.add(MaxPooling2D((2, 2)))\r\nmodel.add(Flatten())\r\nmodel.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))\r\nmodel.add(Dropout(0.5))\r\nmodel.add(Dense(n_classes, activation='softmax'))\r\n# define loss and optimizer\r\nmodel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\r\n# fit the model\r\nmodel.fit(x_train, y_train, epochs=10, batch_size=128, 
verbose=0)\r\n# evaluate the model\r\nloss, acc = model.evaluate(x_test, y_test, verbose=0)\r\nprint('Accuracy: %.3f' % acc)\r\n# make a prediction\r\nimage = x_train[0]\r\nyhat = model.predict([[image]])\r\nprint('Predicted: class=%d' % argmax(yhat))<\/pre>\n<p>Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single image.<\/p>\n<p>Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p><strong>What results did you get?<\/strong> Can you change the model to do better?<br \/>\nPost your findings to the comments below.<\/p>\n<p>First, the shape of each image is reported along with the number of classes; we can see that each image is 28\u00d728 pixels and there are 10 classes as we expected.<\/p>\n<p>In this case, we can see that the model achieved a classification accuracy of about 98 percent on the test dataset. We can then see that the model predicted class 5 for the first image in the training set.<\/p>\n<pre class=\"crayon-plain-tag\">(28, 28, 1) 10\r\nAccuracy: 0.987\r\nPredicted: class=5<\/pre>\n<\/p>\n<h3>3.3 Develop Recurrent Neural Network Models<\/h3>\n<p>Recurrent Neural Networks, or RNNs for short, are designed to operate upon sequences of data.<\/p>\n<p>They have proven to be very effective for natural language processing problems where sequences of text are provided as input to the model. RNNs have also seen some modest success for time series forecasting and speech recognition.<\/p>\n<p>The most popular type of RNN is the Long Short-Term Memory network, or LSTM for short. 
LSTMs can be used in a model to accept a sequence of input data and make a prediction, such as assigning a class label or predicting a numerical value like the next value or values in the sequence.<\/p>\n<p>We will use the car sales dataset to demonstrate an LSTM RNN for univariate time series forecasting.<\/p>\n<p>This problem involves predicting the number of car sales per month.<\/p>\n<p>The dataset will be downloaded automatically using Pandas, but you can learn more about it here.<\/p>\n<ul>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/monthly-car-sales.csv\">Car Sales Dataset (csv)<\/a>.<\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/monthly-car-sales.names\">Car Sales Dataset Description (csv)<\/a>.<\/li>\n<\/ul>\n<p>We will frame the problem to take a window of the last five months of data to predict the current month\u2019s data.<\/p>\n<p>To achieve this, we will define a new function named <em>split_sequence()<\/em> that will <a href=\"https:\/\/machinelearningmastery.com\/time-series-forecasting-supervised-learning\/\">split the input sequence into windows<\/a> of data appropriate for fitting a supervised learning model, like an LSTM.<\/p>\n<p>For example, if the sequence was:<\/p>\n<pre class=\"crayon-plain-tag\">1, 2, 3, 4, 5, 6, 7, 8, 9, 10<\/pre>\n<p>Then the samples for training the model will look like:<\/p>\n<pre class=\"crayon-plain-tag\">Input \t\t\t\tOutput\r\n1, 2, 3, 4, 5 \t\t6\r\n2, 3, 4, 5, 6 \t\t7\r\n3, 4, 5, 6, 7 \t\t8\r\n...<\/pre>\n<p>We will use the last 12 months of data as the test dataset.<\/p>\n<p>LSTMs expect each sample in the dataset to have two dimensions; the first is the number of time steps (in this case it is 5), and the second is the number of observations per time step (in this case it is 1).<\/p>\n<p>Because it is a regression type problem, we will use a linear activation function (no activation function) in the output layer and optimize the 
mean squared error loss function. We will also evaluate the model using the mean absolute error (MAE) metric.<\/p>\n<p>The complete example of fitting and evaluating an LSTM for a univariate time series forecasting problem is listed below.<\/p>\n<pre class=\"crayon-plain-tag\"># lstm for time series forecasting\r\nfrom numpy import sqrt\r\nfrom numpy import asarray\r\nfrom pandas import read_csv\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.layers import LSTM\r\n\r\n# split a univariate sequence into samples\r\ndef split_sequence(sequence, n_steps):\r\n\tX, y = list(), list()\r\n\tfor i in range(len(sequence)):\r\n\t\t# find the end of this pattern\r\n\t\tend_ix = i + n_steps\r\n\t\t# check if we are beyond the sequence\r\n\t\tif end_ix > len(sequence)-1:\r\n\t\t\tbreak\r\n\t\t# gather input and output parts of the pattern\r\n\t\tseq_x, seq_y = sequence[i:end_ix], sequence[end_ix]\r\n\t\tX.append(seq_x)\r\n\t\ty.append(seq_y)\r\n\treturn asarray(X), asarray(y)\r\n\r\n# load the dataset and squeeze the single data column into a series\r\npath = 'https:\/\/raw.githubusercontent.com\/jbrownlee\/Datasets\/master\/monthly-car-sales.csv'\r\ndf = read_csv(path, header=0, index_col=0).squeeze('columns')\r\n# retrieve the values\r\nvalues = df.values.astype('float32')\r\n# specify the window size\r\nn_steps = 5\r\n# split into samples\r\nX, y = split_sequence(values, n_steps)\r\n# reshape into [samples, timesteps, features]\r\nX = X.reshape((X.shape[0], X.shape[1], 1))\r\n# split into train\/test\r\nn_test = 12\r\nX_train, X_test, y_train, y_test = X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]\r\nprint(X_train.shape, X_test.shape, y_train.shape, y_test.shape)\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(LSTM(100, activation='relu', kernel_initializer='he_normal', input_shape=(n_steps,1)))\r\nmodel.add(Dense(50, activation='relu', kernel_initializer='he_normal'))\r\nmodel.add(Dense(50, activation='relu', 
kernel_initializer='he_normal'))\r\nmodel.add(Dense(1))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='mse', metrics=['mae'])\r\n# fit the model\r\nmodel.fit(X_train, y_train, epochs=350, batch_size=32, verbose=2, validation_data=(X_test, y_test))\r\n# evaluate the model\r\nmse, mae = model.evaluate(X_test, y_test, verbose=0)\r\nprint('MSE: %.3f, RMSE: %.3f, MAE: %.3f' % (mse, sqrt(mse), mae))\r\n# make a prediction\r\nrow = asarray([18024.0, 16722.0, 14385.0, 21342.0, 17180.0]).reshape((1, n_steps, 1))\r\nyhat = model.predict(row)\r\nprint('Predicted: %.3f' % (yhat))<\/pre>\n<p>Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single example.<\/p>\n<p>Your specific results will vary given the stochastic nature of the learning algorithm. Try running the example a few times.<\/p>\n<p><strong>What results did you get?<\/strong> Can you change the model to do better?<br \/>\nPost your findings to the comments below.<\/p>\n<p>First, the shape of the train and test datasets is displayed, confirming that the last 12 examples are used for model evaluation.<\/p>\n<p>In this case, the model achieved an MAE of about 2,800 and predicted the next value in the sequence from the test set as 13,199, where the expected value is 14,577 (pretty close).<\/p>\n<pre class=\"crayon-plain-tag\">(91, 5, 1) (12, 5, 1) (91,) (12,)\r\nMSE: 12755421.000, RMSE: 3571.473, MAE: 2856.084\r\nPredicted: 13199.325<\/pre>\n<p><strong>Note<\/strong>: it is good practice to scale the data and make the series stationary prior to fitting the model. I recommend this as an extension in order to achieve better performance. 
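<\/p>\n<p>As a rough sketch of what such preparation could look like, the snippet below applies first-order differencing and min-max scaling to a toy series with NumPy and then inverts both transforms. The toy values and the variable names are illustrative assumptions; in practice the scaling range would be estimated from the training portion of the series only.<\/p>\n<pre class=\"crayon-plain-tag\"># sketch: difference and scale a series before windowing (toy values, hypothetical names)\r\nfrom numpy import asarray\r\nseries = asarray([10.0, 12.0, 15.0, 19.0, 24.0], dtype='float32')\r\n# first-order differencing removes a trend: value(t) - value(t-1)\r\ndiff = series[1:] - series[:-1]\r\n# min-max scale the differenced values to the range 0-1\r\nlo, hi = diff.min(), diff.max()\r\nscaled = (diff - lo) \/ (hi - lo)\r\n# invert the scaling, then invert the differencing to recover the original values\r\nrestored_diff = scaled * (hi - lo) + lo\r\nrestored = series[0] + restored_diff.cumsum()\r\nprint(diff, scaled, restored)<\/pre>\n<p>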
For more on preparing time series data for modeling, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/machine-learning-data-transforms-for-time-series-forecasting\/\">4 Common Machine Learning Data Transforms for Time Series Forecasting<\/a><\/li>\n<\/ul>\n<h2>4. How to Use Advanced Model Features<\/h2>\n<p>In this section, you will discover how to use some of the slightly more advanced model features, such as reviewing learning curves and saving models for later use.<\/p>\n<h3>4.1 How to Visualize a Deep Learning Model<\/h3>\n<p>The architecture of deep learning models can quickly become large and complex.<\/p>\n<p>As such, it is important to have a clear idea of the connections and data flow in your model. This is especially important if you are using the functional API to ensure you have indeed connected the layers of the model in the way you intended.<\/p>\n<p>There are two tools you can use to visualize your model: a text description and a plot.<\/p>\n<h4>Model Text Description<\/h4>\n<p>A text description of your model can be displayed by calling the <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/Model#summary\">summary() function<\/a> on your model.<\/p>\n<p>The example below defines a small model with three layers and then summarizes the structure.<\/p>\n<pre class=\"crayon-plain-tag\"># example of summarizing a model\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,)))\r\nmodel.add(Dense(8, activation='relu', kernel_initializer='he_normal'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# summarize the model\r\nmodel.summary()<\/pre>\n<p>Running the example prints a summary of each layer, as well as a total summary.<\/p>\n<p>This is an invaluable diagnostic for checking the output shapes and number of parameters (weights) in your 
model.<\/p>\n<pre class=\"crayon-plain-tag\">Model: \"sequential\"\r\n_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #\r\n=================================================================\r\ndense (Dense)                (None, 10)                90\r\n_________________________________________________________________\r\ndense_1 (Dense)              (None, 8)                 88\r\n_________________________________________________________________\r\ndense_2 (Dense)              (None, 1)                 9\r\n=================================================================\r\nTotal params: 187\r\nTrainable params: 187\r\nNon-trainable params: 0\r\n_________________________________________________________________<\/pre>\n<\/p>\n<h4>Model Architecture Plot<\/h4>\n<p>You can create a plot of your model by calling the <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/utils\/plot_model\">plot_model() function<\/a>.<\/p>\n<p>This will create an image file that contains a box and line diagram of the layers in your model.<\/p>\n<p>The example below creates a small three-layer model and saves a plot of the model architecture to \u2018<em>model.png<\/em>\u2018 that includes input and output shapes.<\/p>\n<pre class=\"crayon-plain-tag\"># example of plotting a model\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.utils import plot_model\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,)))\r\nmodel.add(Dense(8, activation='relu', kernel_initializer='he_normal'))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# summarize the model\r\nplot_model(model, 'model.png', show_shapes=True)<\/pre>\n<p>Running the example creates a plot of the model showing a box for each layer with shape information, and arrows that connect the layers, showing 
the flow of data through the network.<\/p>\n<div id=\"attachment_9898\" style=\"width: 374px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9898\" class=\"size-full wp-image-9898\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Neural-Network-Architecture.png\" alt=\"Plot of Neural Network Architecture\" width=\"364\" height=\"405\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Neural-Network-Architecture.png 364w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Plot-of-Neural-Network-Architecture-270x300.png 270w\" sizes=\"(max-width: 364px) 100vw, 364px\"><\/p>\n<p id=\"caption-attachment-9898\" class=\"wp-caption-text\">Plot of Neural Network Architecture<\/p>\n<\/div>\n<h3>4.2 How to Plot Model Learning Curves<\/h3>\n<p>Learning curves are a plot of neural network model performance over time, such as calculated at the end of each training epoch.<\/p>\n<p>Plots of learning curves provide insight into the learning dynamics of the model, such as whether the model is learning well, whether it is underfitting the training dataset, or whether it is overfitting the training dataset.<\/p>\n<p>For a gentle introduction to learning curves and how to use them to diagnose learning dynamics of models, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/learning-curves-for-diagnosing-machine-learning-model-performance\/\">How to use Learning Curves to Diagnose Machine Learning Model Performance<\/a><\/li>\n<\/ul>\n<p>You can easily create learning curves for your deep learning models.<\/p>\n<p>First, you must update your call to the fit function to include reference to a <a href=\"https:\/\/machinelearningmastery.com\/difference-test-validation-datasets\/\">validation dataset<\/a>. 
This is a portion of the training set not used to fit the model, and is instead used to evaluate the performance of the model during training.<\/p>\n<p>You can split the data manually and specify the <em>validation_data<\/em> argument, or you can use the <em>validation_split<\/em> argument and specify a percentage split of the training dataset and let the API perform the split for you. The latter is simpler for now.<\/p>\n<p>The fit function will return a <em>history<\/em> object that contains a trace of performance metrics recorded at the end of each training epoch. This includes the chosen loss function and each configured metric, such as accuracy, and each loss and metric is calculated for the training and validation datasets.<\/p>\n<p>A learning curve is a plot of the loss on the training dataset and the validation dataset. We can create this plot from the <em>history<\/em> object using the <a href=\"https:\/\/matplotlib.org\/\">Matplotlib<\/a> library.<\/p>\n<p>The example below fits a small neural network on a synthetic binary classification problem. 
A validation split of 30 percent is used to evaluate the model during training, and the <a href=\"https:\/\/machinelearningmastery.com\/cross-entropy-for-machine-learning\/\">cross-entropy loss<\/a> on the train and validation datasets is then graphed using a line plot.<\/p>\n<pre class=\"crayon-plain-tag\"># example of plotting learning curves\r\nfrom sklearn.datasets import make_classification\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.optimizers import SGD\r\nfrom matplotlib import pyplot\r\n# create the dataset\r\nX, y = make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n# determine the number of input features\r\nn_features = X.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# compile the model\r\nsgd = SGD(learning_rate=0.001, momentum=0.8)\r\nmodel.compile(optimizer=sgd, loss='binary_crossentropy')\r\n# fit the model\r\nhistory = model.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3)\r\n# plot learning curves\r\npyplot.title('Learning Curves')\r\npyplot.xlabel('Epoch')\r\npyplot.ylabel('Cross Entropy')\r\npyplot.plot(history.history['loss'], label='train')\r\npyplot.plot(history.history['val_loss'], label='val')\r\npyplot.legend()\r\npyplot.show()<\/pre>\n<p>Running the example fits the model on the dataset. 
At the end of the run, the <em>history<\/em> object is returned and used as the basis for creating the line plot.<\/p>\n<p>The cross-entropy loss for the training dataset is accessed via the \u2018<em>loss<\/em>\u2018 key and the loss on the validation dataset is accessed via the \u2018<em>val_loss<\/em>\u2018 key on the history attribute of the history object.<\/p>\n<div id=\"attachment_9899\" style=\"width: 1290px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9899\" class=\"size-full wp-image-9899\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Learning-Curves-of-Cross-Entropy-Loss-for-a-Deep-Learning-Model.png\" alt=\"Learning Curves of Cross-Entropy Loss for a Deep Learning Model\" width=\"1280\" height=\"960\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Learning-Curves-of-Cross-Entropy-Loss-for-a-Deep-Learning-Model.png 1280w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Learning-Curves-of-Cross-Entropy-Loss-for-a-Deep-Learning-Model-300x225.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Learning-Curves-of-Cross-Entropy-Loss-for-a-Deep-Learning-Model-1024x768.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2019\/12\/Learning-Curves-of-Cross-Entropy-Loss-for-a-Deep-Learning-Model-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-9899\" class=\"wp-caption-text\">Learning Curves of Cross-Entropy Loss for a Deep Learning Model<\/p>\n<\/div>\n<h3>4.3 How to Save and Load Your Model<\/h3>\n<p>Training and evaluating models is great, but we may want to use a model later without retraining it each time.<\/p>\n<p>This can be achieved by saving the model to file and later loading it and using it to make predictions.<\/p>\n<p>This can be achieved using the <em>save()<\/em> function on the model to save the model. 
It can be loaded later using the <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/models\/load_model\">load_model() function<\/a>.<\/p>\n<p>The model is saved in H5 format, an efficient array storage format. As such, you must ensure that the <a href=\"https:\/\/www.h5py.org\/\">h5py library<\/a> is installed on your workstation. This can be achieved using <em>pip<\/em>; for example:<\/p>\n<pre class=\"crayon-plain-tag\">pip install h5py<\/pre>\n<p>The example below fits a simple model on a synthetic binary classification problem and then saves the model file.<\/p>\n<pre class=\"crayon-plain-tag\"># example of saving a fit model\r\nfrom sklearn.datasets import make_classification\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.optimizers import SGD\r\n# create the dataset\r\nX, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1)\r\n# determine the number of input features\r\nn_features = X.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# compile the model\r\nsgd = SGD(learning_rate=0.001, momentum=0.8)\r\nmodel.compile(optimizer=sgd, loss='binary_crossentropy')\r\n# fit the model\r\nmodel.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3)\r\n# save model to file\r\nmodel.save('model.h5')<\/pre>\n<p>Running the example fits the model and saves it to file with the name \u2018<em>model.h5<\/em>\u2018.<\/p>\n<p>We can then load the model and use it to make a prediction, or continue training it, or do whatever we wish with it.<\/p>\n<p>The example below loads the model and uses it to make a prediction.<\/p>\n<pre class=\"crayon-plain-tag\"># example of loading a saved model\r\nfrom sklearn.datasets import make_classification\r\nfrom tensorflow.keras.models import load_model\r\n# create 
the dataset\r\nX, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1)\r\n# load the model from file\r\nmodel = load_model('model.h5')\r\n# make a prediction\r\nrow = [1.91518414, 1.14995454, -1.52847073, 0.79430654]\r\nyhat = model.predict([row])\r\nprint('Predicted: %.3f' % yhat[0])<\/pre>\n<p>Running the example loads the model from file, then uses it to make a prediction on a new row of data and prints the result.<\/p>\n<pre class=\"crayon-plain-tag\">Predicted: 0.831<\/pre>\n<\/p>\n<h2>5. How to Get Better Model Performance<\/h2>\n<p>In this section, you will discover some of the techniques that you can use to improve the performance of your deep learning models.<\/p>\n<p>A big part of improving deep learning performance involves avoiding overfitting by slowing down the learning process or stopping the learning process at the right time.<\/p>\n<h3>5.1 How to Reduce Overfitting With Dropout<\/h3>\n<p>Dropout is a clever regularization method that reduces overfitting of the training dataset and makes the model more robust.<\/p>\n<p>This is achieved during training, where some number of layer outputs are randomly ignored or \u201c<em>dropped out<\/em>.\u201d This has the effect of making the layer look like \u2013 and be treated like \u2013 a layer with a different number of nodes and connectivity to the prior layer.<\/p>\n<p>Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.<\/p>\n<p>For more on how dropout works, see this tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/dropout-for-regularizing-deep-neural-networks\/\">A Gentle Introduction to Dropout for Regularizing Deep Neural Networks<\/a><\/li>\n<\/ul>\n<p>You can add dropout to your models as a new layer prior to the layer that you want to have input connections dropped out.<\/p>\n<p>This involves adding a layer called <a 
href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/layers\/Dropout\">Dropout()<\/a> that takes an argument specifying the probability of dropping each output from the previous layer. For example, 0.4 means that 40 percent of inputs will be dropped on each update to the model.<\/p>\n<p>You can add Dropout layers in MLP, CNN, and RNN models, although there are also specialized versions of dropout for use with CNN and RNN models that you might also want to explore.<\/p>\n<p>The example below fits a small neural network model on a synthetic binary classification problem.<\/p>\n<p>A dropout layer with 50 percent dropout is inserted between the first hidden layer and the output layer.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using dropout\r\nfrom sklearn.datasets import make_classification\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.layers import Dropout\r\nfrom matplotlib import pyplot\r\n# create the dataset\r\nX, y = make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n# determine the number of input features\r\nn_features = X.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dropout(0.5))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='binary_crossentropy')\r\n# fit the model\r\nmodel.fit(X, y, epochs=100, batch_size=32, verbose=0)<\/pre>\n<\/p>\n<h3>5.2 How to Accelerate Training With Batch Normalization<\/h3>\n<p>The scale and distribution of inputs to a layer can greatly impact how easily or quickly that layer can be trained.<\/p>\n<p>This is generally why it is a good idea to scale input data prior to modeling it with a neural network model.<\/p>\n<p>Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. 
This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.<\/p>\n<p>For more on how batch normalization works, see this tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/batch-normalization-for-training-of-deep-neural-networks\/\">A Gentle Introduction to Batch Normalization for Deep Neural Networks<\/a><\/li>\n<\/ul>\n<p>You can use batch normalization in your network by adding a batch normalization layer prior to the layer that you wish to have standardized inputs. You can use batch normalization with MLP, CNN, and RNN models.<\/p>\n<p>This can be achieved by adding the <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/layers\/BatchNormalization\">BatchNormalization layer directly<\/a>.<\/p>\n<p>The example below defines a small MLP network for a binary classification prediction problem with a batch normalization layer between the first hidden layer and the output layer.<\/p>\n<pre class=\"crayon-plain-tag\"># example of using batch normalization\r\nfrom sklearn.datasets import make_classification\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.layers import BatchNormalization\r\nfrom matplotlib import pyplot\r\n# create the dataset\r\nX, y = make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n# determine the number of input features\r\nn_features = X.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(BatchNormalization())\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='binary_crossentropy')\r\n# fit the model\r\nmodel.fit(X, y, epochs=100, batch_size=32, verbose=0)<\/pre>\n<p>Also, tf.keras has a range of other normalization layers you might like to explore; see:<\/p>\n<ul>\n<li><a 
href=\"https:\/\/www.tensorflow.org\/addons\/tutorials\/layers_normalizations\">tf.keras Normalization Layers Guide<\/a>.<\/li>\n<\/ul>\n<h3>5.3 How to Halt Training at the Right Time With Early Stopping<\/h3>\n<p>Neural networks are challenging to train.<\/p>\n<p>Too little training and the model is underfit; too much training and the model overfits the training dataset. Both cases result in a model that is less effective than it could be.<\/p>\n<p>One approach to solving this problem is to use early stopping. This involves monitoring the loss on the training dataset and a validation dataset (a subset of the training set not used to fit the model). As soon as loss for the validation set starts to show signs of overfitting, the training process can be stopped.<\/p>\n<p>For more on early stopping, see the tutorial:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/early-stopping-to-avoid-overtraining-neural-network-models\/\">A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks<\/a><\/li>\n<\/ul>\n<p>Early stopping can be used with your model by first ensuring that you have a <a href=\"https:\/\/machinelearningmastery.com\/difference-test-validation-datasets\/\">validation dataset<\/a>. You can define the validation dataset manually via the <em>validation_data<\/em> argument to the <em>fit()<\/em> function, or you can use the <em>validation_split<\/em> argument to specify the amount of the training dataset to hold back for validation.<\/p>\n<p>You can then define an EarlyStopping callback and tell it which performance measure to monitor, such as \u2018<em>val_loss<\/em>\u2018 for loss on the validation dataset, and the number of epochs to observe overfitting before taking action, e.g. 
5.<\/p>\n<p>This configured <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/callbacks\/EarlyStopping\">EarlyStopping<\/a> callback can then be provided to the <em>fit()<\/em> function via the \u201c<em>callbacks<\/em>\u201d argument that takes a list of callbacks.<\/p>\n<p>This allows you to set the number of epochs to a large number and be confident that training will end as soon as the model starts overfitting. You might also like to create a learning curve to discover more insights into the learning dynamics of the run and when training was halted.<\/p>\n<p>The example below demonstrates a small neural network on a synthetic binary classification problem that uses early stopping to halt training as soon as the model starts overfitting (after about 50 epochs).<\/p>\n<pre class=\"crayon-plain-tag\"># example of using early stopping\r\nfrom sklearn.datasets import make_classification\r\nfrom tensorflow.keras import Sequential\r\nfrom tensorflow.keras.layers import Dense\r\nfrom tensorflow.keras.callbacks import EarlyStopping\r\n# create the dataset\r\nX, y = make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n# determine the number of input features\r\nn_features = X.shape[1]\r\n# define model\r\nmodel = Sequential()\r\nmodel.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))\r\nmodel.add(Dense(1, activation='sigmoid'))\r\n# compile the model\r\nmodel.compile(optimizer='adam', loss='binary_crossentropy')\r\n# configure early stopping\r\nes = EarlyStopping(monitor='val_loss', patience=5)\r\n# fit the model\r\nhistory = model.fit(X, y, epochs=200, batch_size=32, verbose=0, validation_split=0.3, callbacks=[es])<\/pre>\n<p>The tf.keras API provides a number of callbacks that you might like to explore; you can learn more here:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/callbacks\/\">tf.keras Callbacks<\/a><\/li>\n<\/ul>\n<h2>Further Reading<\/h2>\n<p>This 
section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Tutorials<\/h3>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size\/\">How to Control the Stability of Training Neural Networks With the Batch Size<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/rectified-linear-activation-function-for-deep-learning-neural-networks\/\">A Gentle Introduction to the Rectified Linear Unit (ReLU)<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/classification-versus-regression-in-machine-learning\/\">Difference Between Classification and Regression in Machine Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/how-to-manually-scale-image-pixel-data-for-deep-learning\/\">How to Manually Scale Image Pixel Data for Deep Learning<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/machine-learning-data-transforms-for-time-series-forecasting\/\">4 Common Machine Learning Data Transforms for Time Series Forecasting<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/learning-curves-for-diagnosing-machine-learning-model-performance\/\">How to use Learning Curves to Diagnose Machine Learning Model Performance<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/dropout-for-regularizing-deep-neural-networks\/\">A Gentle Introduction to Dropout for Regularizing Deep Neural Networks<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/batch-normalization-for-training-of-deep-neural-networks\/\">A Gentle Introduction to Batch Normalization for Deep Neural Networks<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/early-stopping-to-avoid-overtraining-neural-network-models\/\">A Gentle Introduction to Early Stopping to Avoid Overtraining Neural Networks<\/a><\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li><a href=\"https:\/\/amzn.to\/2Y8JuBv\">Deep 
Learning<\/a>, 2016.<\/li>\n<\/ul>\n<h3>Guides<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/install\">Install TensorFlow 2 Guide<\/a>.<\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/guide\/keras\">TensorFlow Core: Keras<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/guide\/keras\/overview\">Tensorflow Core: Keras Overview Guide<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/guide\/keras\/functional\">The Keras functional API in TensorFlow<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/tutorials\/keras\/save_and_load\">Save and load models<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/addons\/tutorials\/layers_normalizations\">Normalization Layers Guide<\/a>.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\">tf.keras Module API<\/a>.<\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/optimizers\">tf.keras Optimizers<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/losses\">tf.keras Loss Functions<\/a><\/li>\n<li><a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/metrics\">tf.keras Metrics<\/a><\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered a step-by-step guide to developing deep learning models in TensorFlow using the tf.keras API.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>The difference between Keras and tf.keras and how to install and confirm TensorFlow is working.<\/li>\n<li>The 5-step life-cycle of tf.keras models and how to use the sequential and functional APIs.<\/li>\n<li>How to develop MLP, CNN, and RNN models with tf.keras for regression, classification, and time series forecasting.<\/li>\n<li>How to use the advanced features of the tf.keras API to inspect and diagnose your model.<\/li>\n<li>How to improve the performance of your tf.keras model by reducing overfitting and accelerating training.<\/li>\n<\/ul>\n<p><strong>Do you have 
any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/tensorflow-tutorial-deep-learning-with-tf-keras\/\">TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/tensorflow-tutorial-deep-learning-with-tf-keras\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Predictive modeling with deep learning is a skill that modern developers need to know. TensorFlow is the premier open-source deep learning framework [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/12\/18\/tensorflow-2-tutorial-get-started-in-deep-learning-with-tf-keras\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":2944,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2943"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2943"}],"version-history":[{"count":0,"href":"https:\/\/
www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2943\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/2944"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2943"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2943"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2943"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}