{"id":4364,"date":"2021-02-04T18:00:41","date_gmt":"2021-02-04T18:00:41","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/04\/gradient-descent-with-momentum-from-scratch\/"},"modified":"2021-02-04T18:00:41","modified_gmt":"2021-02-04T18:00:41","slug":"gradient-descent-with-momentum-from-scratch","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/04\/gradient-descent-with-momentum-from-scratch\/","title":{"rendered":"Gradient Descent With Momentum from Scratch"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p><strong>Gradient descent<\/strong> is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function.<\/p>\n<p>A problem with gradient descent is that it can bounce around the search space on optimization problems that have large amounts of curvature or noisy gradients, and it can get stuck in flat spots in the search space that have no gradient.<\/p>\n<p><strong>Momentum<\/strong> is an extension to the gradient descent optimization algorithm that allows the search to build inertia in a direction in the search space and overcome the oscillations of noisy gradients and coast across flat spots of the search space.<\/p>\n<p>In this tutorial, you will discover the gradient descent with momentum algorithm.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Gradient descent is an optimization algorithm that uses the gradient of the objective function to navigate the search space.<\/li>\n<li>Gradient descent can be accelerated by using momentum from past updates to the search position.<\/li>\n<li>How to implement gradient descent optimization with momentum and develop an intuition for its behavior.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div id=\"attachment_12093\" style=\"width: 809px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12093\" 
loading=\"lazy\" class=\"size-full wp-image-12093\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Momentum-from-Scratch.jpg\" alt=\"Gradient Descent With Momentum from Scratch\" width=\"799\" height=\"533\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Momentum-from-Scratch.jpg 799w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Momentum-from-Scratch-300x200.jpg 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Momentum-from-Scratch-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-12093\" class=\"wp-caption-text\">Gradient Descent With Momentum from Scratch<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/matlock-photo\/5314931117\/\">Chris Barnes<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Gradient Descent<\/li>\n<li>Momentum<\/li>\n<li>Gradient Descent With Momentum\n<ol>\n<li>One-Dimensional Test Problem<\/li>\n<li>Gradient Descent Optimization<\/li>\n<li>Visualization of Gradient Descent Optimization<\/li>\n<li>Gradient Descent Optimization With Momentum<\/li>\n<li>Visualization of Gradient Descent Optimization With Momentum<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Gradient Descent<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">Gradient descent<\/a> is an optimization algorithm.<\/p>\n<p>It is technically referred to as a first-order optimization algorithm as it explicitly makes use of the first-order derivative of the target objective function.<\/p>\n<blockquote>\n<p>First-order methods rely on gradient information to help direct the search for a minimum \u2026<\/p>\n<\/blockquote>\n<p>\u2014 Page 69, <a 
href=\"https:\/\/amzn.to\/39KZSQn\">Algorithms for Optimization<\/a>, 2019.<\/p>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Derivative\">first-order derivative<\/a>, or simply the \u201c<em>derivative<\/em>,\u201d is the rate of change or slope of the target function at a specific point, e.g. for a specific input.<\/p>\n<p>If the target function takes multiple input variables, it is referred to as a multivariate function and the input variables can be thought of as a vector. In turn, the derivative of a multivariate target function may also be taken as a vector and is referred to generally as the \u201c<a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient\">gradient<\/a>.\u201d<\/p>\n<ul>\n<li>\n<strong>Gradient<\/strong>: First-order derivative for a multivariate objective function.<\/li>\n<\/ul>\n<p>The derivative or the gradient points in the direction of the steepest ascent of the target function for a specific input.<\/p>\n<p>Gradient descent refers to a minimization algorithm that follows the negative of the gradient of the target function downhill to locate its minimum.<\/p>\n<p>The gradient descent algorithm requires a target function that is being optimized and the derivative function for the objective function. 
The target function <em>f()<\/em> returns a score for a given set of inputs, and the derivative function <em>f'()<\/em> gives the derivative of the target function for a given set of inputs.<\/p>\n<p>The gradient descent algorithm requires a starting point (<em>x<\/em>) in the problem, such as a randomly selected point in the input space.<\/p>\n<p>The derivative is then calculated and a step is taken in the input space that is expected to result in a downhill movement in the target function, assuming we are minimizing the target function.<\/p>\n<p>A downhill movement is made by first calculating how far to move in the input space, calculated as the step size (called <em>alpha<\/em> or the <em>learning rate<\/em>) multiplied by the gradient. This is then subtracted from the current point, ensuring we move against the gradient, or down the target function.<\/p>\n<ul>\n<li>x = x \u2013 step_size * f'(x)<\/li>\n<\/ul>\n<p>The steeper the objective function at a given point, the larger the magnitude of the gradient and, in turn, the larger the step taken in the search space. The size of the step taken is scaled using a step size hyperparameter.<\/p>\n<ul>\n<li>\n<strong>Step Size<\/strong> (<em>alpha<\/em>): Hyperparameter that controls how far to move in the search space against the gradient each iteration of the algorithm, also called the learning rate.<\/li>\n<\/ul>\n<p>If the step size is too small, the movement in the search space will be small and the search will take a long time. If the step size is too large, the search may bounce around the search space and skip over the optima.<\/p>\n<p>Now that we are familiar with the gradient descent optimization algorithm, let\u2019s take a look at momentum.<\/p>\n<h2>Momentum<\/h2>\n<p>Momentum is an extension to the gradient descent optimization algorithm, often referred to as <strong>gradient descent with momentum<\/strong>.<\/p>\n<p>It is designed to accelerate the optimization process, e.g. 
decrease the number of function evaluations required to reach the optima, or to improve the capability of the optimization algorithm, e.g. result in a better final result.<\/p>\n<p>A problem with the gradient descent algorithm is that the progression of the search can bounce around the search space based on the gradient. For example, the search may progress downhill towards the minima, but during this progression, it may move in another direction, even uphill, depending on the gradient of specific points (sets of parameters) encountered during the search.<\/p>\n<p>This can slow down the progress of the search, especially for those optimization problems where the broader trend or shape of the search space is more useful than specific gradients along the way.<\/p>\n<p>One approach to this problem is to add history to the parameter update equation based on the gradient encountered in the previous updates.<\/p>\n<p>This change is based on the metaphor of momentum from physics where acceleration in a direction can be accumulated from past updates.<\/p>\n<blockquote>\n<p>The name momentum derives from a physical analogy, in which the negative gradient is a force moving a particle through parameter space, according to Newton\u2019s laws of motion.<\/p>\n<\/blockquote>\n<p>\u2014 Page 296, <a href=\"https:\/\/amzn.to\/3qSk3C2\">Deep Learning<\/a>, 2016.<\/p>\n<p>Momentum involves adding an additional hyperparameter that controls the amount of history (momentum) to include in the update equation, i.e. the step to a new point in the search space. The value for the hyperparameter is defined in the range 0.0 to 1.0 and often has a value close to 1.0, such as 0.8, 0.9, or 0.99. 
A momentum of 0.0 is the same as gradient descent without momentum.<\/p>\n<p>First, let\u2019s break the gradient descent update equation down into two parts: the calculation of the change to the position and the update of the old position to the new position.<\/p>\n<p>The change in the parameters is calculated as the gradient for the point scaled by the step size.<\/p>\n<ul>\n<li>change_x = step_size * f'(x)<\/li>\n<\/ul>\n<p>The new position is calculated by simply subtracting the change from the current point.<\/p>\n<ul>\n<li>x = x \u2013 change_x<\/li>\n<\/ul>\n<p>Momentum involves maintaining the change in the position and using it in the subsequent calculation of the change in position.<\/p>\n<p>If we think of updates over time, then the update at the current iteration or time (t) will add the change used at the previous time (t-1) weighted by the momentum hyperparameter, as follows:<\/p>\n<ul>\n<li>change_x(t) = step_size * f'(x(t-1)) + momentum * change_x(t-1)<\/li>\n<\/ul>\n<p>The update to the position is then performed as before.<\/p>\n<ul>\n<li>x(t) = x(t-1) \u2013 change_x(t)<\/li>\n<\/ul>\n<p>The change in the position accumulates the magnitude and direction of changes over the iterations of the search, proportional to the size of the momentum hyperparameter.<\/p>\n<p>For example, a large momentum (e.g. 
0.9) will mean that the update is strongly influenced by the previous update, whereas a modest momentum (0.2) will mean very little influence.<\/p>\n<blockquote>\n<p>The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction.<\/p>\n<\/blockquote>\n<p>\u2014 Page 296, <a href=\"https:\/\/amzn.to\/3qSk3C2\">Deep Learning<\/a>, 2016.<\/p>\n<p>Momentum has the effect of dampening down the change in the gradient and, in turn, the step size with each new point in the search space.<\/p>\n<blockquote>\n<p>Momentum can increase speed when the cost surface is highly nonspherical because it damps the size of the steps along directions of high curvature thus yielding a larger effective learning rate along the directions of low curvature.<\/p>\n<\/blockquote>\n<p>\u2014 Page 21, <a href=\"https:\/\/amzn.to\/3ac5S4Q\">Neural Networks: Tricks of the Trade<\/a>, 2012.<\/p>\n<p>Momentum is most useful in optimization problems where the objective function has a large amount of curvature (e.g. changes a lot), meaning that the gradient may change a lot over relatively small regions of the search space.<\/p>\n<blockquote>\n<p>The method of momentum is designed to accelerate learning, especially in the face of high curvature, small but consistent gradients, or noisy gradients.<\/p>\n<\/blockquote>\n<p>\u2014 Page 296, <a href=\"https:\/\/amzn.to\/3qSk3C2\">Deep Learning<\/a>, 2016.<\/p>\n<p>It is also helpful when the gradient is estimated, such as from a simulation, and may be noisy, e.g. when the gradient has a high variance.<\/p>\n<p>Finally, momentum is helpful when the search space is flat or nearly flat, e.g. zero gradient. 
The momentum allows the search to progress in the same direction as before the flat spot and helpfully cross the flat region.<\/p>\n<p>Now that we are familiar with what momentum is, let\u2019s look at a worked example.<\/p>\n<h2>Gradient Descent with Momentum<\/h2>\n<p>In this section, we will first implement the gradient descent optimization algorithm, then update it to use momentum and compare results.<\/p>\n<h3>One-Dimensional Test Problem<\/h3>\n<p>First, let\u2019s define an optimization function.<\/p>\n<p>We will use a simple one-dimensional function that squares the input and defines the range of valid inputs from -1.0 to 1.0.<\/p>\n<p>The <em>objective()<\/em> function below implements this function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(x):\r\n\treturn x**2.0<\/pre>\n<p>We can then sample all inputs in the range and calculate the objective function value for each.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define range for input\r\nr_min, r_max = -1.0, 1.0\r\n# sample input range uniformly at 0.1 increments\r\ninputs = arange(r_min, r_max+0.1, 0.1)\r\n# compute targets\r\nresults = objective(inputs)<\/pre>\n<p>Finally, we can create a line plot of the inputs (x-axis) versus the objective function values (y-axis) to get an intuition for the shape of the objective function that we will be searching.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# create a line plot of input vs result\r\npyplot.plot(inputs, results)\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>The example below ties this together and provides an example of plotting the one-dimensional test function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># plot of simple function\r\nfrom numpy import arange\r\nfrom matplotlib import pyplot\r\n\r\n# objective function\r\ndef objective(x):\r\n\treturn x**2.0\r\n\r\n# define range for input\r\nr_min, r_max = -1.0, 1.0\r\n# sample input range 
uniformly at 0.1 increments\r\ninputs = arange(r_min, r_max+0.1, 0.1)\r\n# compute targets\r\nresults = objective(inputs)\r\n# create a line plot of input vs result\r\npyplot.plot(inputs, results)\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>Running the example creates a line plot of the inputs to the function (x-axis) and the calculated output of the function (y-axis).<\/p>\n<p>We can see the familiar U-shape called a parabola.<\/p>\n<div id=\"attachment_12090\" style=\"width: 1290px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12090\" loading=\"lazy\" class=\"size-full wp-image-12090\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Line-Plot-of-Simple-One-Dimensional-Function-2.png\" alt=\"Line Plot of Simple One Dimensional Function\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Line-Plot-of-Simple-One-Dimensional-Function-2.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Line-Plot-of-Simple-One-Dimensional-Function-2-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Line-Plot-of-Simple-One-Dimensional-Function-2-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Line-Plot-of-Simple-One-Dimensional-Function-2-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-12090\" class=\"wp-caption-text\">Line Plot of Simple One Dimensional Function<\/p>\n<\/div>\n<h2>Gradient Descent Optimization<\/h2>\n<p>Next, we can apply the gradient descent algorithm to the problem.<\/p>\n<p>First, we need a function that calculates the derivative for the objective function.<\/p>\n<p>The derivative of x^2 is x * 2 and the <em>derivative()<\/em> function implements this below.<\/p>\n<pre 
class=\"urvanov-syntax-highlighter-plain-tag\"># derivative of objective function\r\ndef derivative(x):\r\n\treturn x * 2.0<\/pre>\n<p>We can define a function that implements the gradient descent optimization algorithm.<\/p>\n<p>The procedure involves starting with a randomly selected point in the search space, then calculating the gradient, updating the position in the search space, evaluating the new position, and reporting the progress. This process is then repeated for a fixed number of iterations. The final point and its evaluation are then returned from the function.<\/p>\n<p>The function <em>gradient_descent()<\/em> below implements this and takes the name of the objective and gradient functions as well as the bounds on the inputs to the objective function, number of iterations, and step size, then returns the solution and its evaluation at the end of the search.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size):\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# take a step\r\n\t\tsolution = solution - step_size * gradient\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>We can then define the bounds of the objective function, the step size, and the number of iterations for the algorithm.<\/p>\n<p>We will use a step size of 0.1 and 30 iterations, both found after a little experimentation.<\/p>\n<p>The seed for the pseudorandom number generator is fixed so that we always get the same sequence of random numbers, and in this case, it ensures that we get the same starting point for the 
search each time the code is run (e.g. something interesting far from the optima).<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# seed the pseudo random number generator\r\nseed(4)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 30\r\n# define the step size\r\nstep_size = 0.1\r\n# perform the gradient descent search\r\nbest, score = gradient_descent(objective, derivative, bounds, n_iter, step_size)<\/pre>\n<p>Tying this together, the complete example of applying gradient descent to our one-dimensional test function is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># example of gradient descent for a one-dimensional function\r\nfrom numpy import asarray\r\nfrom numpy.random import rand\r\nfrom numpy.random import seed\r\n\r\n# objective function\r\ndef objective(x):\r\n\treturn x**2.0\r\n\r\n# derivative of objective function\r\ndef derivative(x):\r\n\treturn x * 2.0\r\n\r\n# gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size):\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# take a step\r\n\t\tsolution = solution - step_size * gradient\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# seed the pseudo random number generator\r\nseed(4)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 30\r\n# define the step size\r\nstep_size = 0.1\r\n# perform the gradient descent search\r\nbest, score = gradient_descent(objective, derivative, bounds, n_iter, step_size)\r\nprint('Done!')\r\nprint('f(%s) = %f' 
% (best, score))<\/pre>\n<p>Running the example starts with a random point in the search space, then applies the gradient descent algorithm, reporting performance along the way.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the algorithm finds a good solution after about 27 iterations, with a function evaluation of about 0.0.<\/p>\n<p>Note the optima for this function is at f(0.0) = 0.0.<\/p>\n<p>We would expect that gradient descent with momentum will accelerate the optimization procedure and find a similarly evaluated solution in fewer iterations.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;0 f([0.74724774]) = 0.55838\r\n&gt;1 f([0.59779819]) = 0.35736\r\n&gt;2 f([0.47823856]) = 0.22871\r\n&gt;3 f([0.38259084]) = 0.14638\r\n&gt;4 f([0.30607268]) = 0.09368\r\n&gt;5 f([0.24485814]) = 0.05996\r\n&gt;6 f([0.19588651]) = 0.03837\r\n&gt;7 f([0.15670921]) = 0.02456\r\n&gt;8 f([0.12536737]) = 0.01572\r\n&gt;9 f([0.10029389]) = 0.01006\r\n&gt;10 f([0.08023512]) = 0.00644\r\n&gt;11 f([0.06418809]) = 0.00412\r\n&gt;12 f([0.05135047]) = 0.00264\r\n&gt;13 f([0.04108038]) = 0.00169\r\n&gt;14 f([0.0328643]) = 0.00108\r\n&gt;15 f([0.02629144]) = 0.00069\r\n&gt;16 f([0.02103315]) = 0.00044\r\n&gt;17 f([0.01682652]) = 0.00028\r\n&gt;18 f([0.01346122]) = 0.00018\r\n&gt;19 f([0.01076897]) = 0.00012\r\n&gt;20 f([0.00861518]) = 0.00007\r\n&gt;21 f([0.00689214]) = 0.00005\r\n&gt;22 f([0.00551372]) = 0.00003\r\n&gt;23 f([0.00441097]) = 0.00002\r\n&gt;24 f([0.00352878]) = 0.00001\r\n&gt;25 f([0.00282302]) = 0.00001\r\n&gt;26 f([0.00225842]) = 0.00001\r\n&gt;27 f([0.00180673]) = 0.00000\r\n&gt;28 f([0.00144539]) = 0.00000\r\n&gt;29 
f([0.00115631]) = 0.00000\r\nDone!\r\nf([0.00115631]) = 0.000001<\/pre>\n<h2>Visualization of Gradient Descent Optimization<\/h2>\n<p>Next, we can visualize the progress of the search on a plot of the target function.<\/p>\n<p>First, we can update the <em>gradient_descent()<\/em> function to store all solutions and their scores found during the optimization as lists and return them at the end of the search instead of the best solution found.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size):\r\n\t# track all solutions\r\n\tsolutions, scores = list(), list()\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# take a step\r\n\t\tsolution = solution - step_size * gradient\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# store solution\r\n\t\tsolutions.append(solution)\r\n\t\tscores.append(solution_eval)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solutions, scores]<\/pre>\n<p>The function can be called and we can get the lists of the solutions and the scores found during the search.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# perform the gradient descent search\r\nsolutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)<\/pre>\n<p>We can create a line plot of the objective function, as before.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# sample input range uniformly at 0.1 increments\r\ninputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\r\n# compute targets\r\nresults = objective(inputs)\r\n# create a line plot of input vs result\r\npyplot.plot(inputs, results)<\/pre>\n<p>Finally, we can 
plot each solution found as a red dot and connect the dots with a line so we can see how the search moved downhill.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# plot the solutions found\r\npyplot.plot(solutions, scores, '.-', color='red')<\/pre>\n<p>Tying this all together, the complete example of plotting the result of the gradient descent search on the one-dimensional test function is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># example of plotting a gradient descent search on a one-dimensional function\r\nfrom numpy import asarray\r\nfrom numpy import arange\r\nfrom numpy.random import rand\r\nfrom numpy.random import seed\r\nfrom matplotlib import pyplot\r\n\r\n# objective function\r\ndef objective(x):\r\n\treturn x**2.0\r\n\r\n# derivative of objective function\r\ndef derivative(x):\r\n\treturn x * 2.0\r\n\r\n# gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size):\r\n\t# track all solutions\r\n\tsolutions, scores = list(), list()\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# take a step\r\n\t\tsolution = solution - step_size * gradient\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# store solution\r\n\t\tsolutions.append(solution)\r\n\t\tscores.append(solution_eval)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solutions, scores]\r\n\r\n# seed the pseudo random number generator\r\nseed(4)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 30\r\n# define the step size\r\nstep_size = 0.1\r\n# perform the gradient descent search\r\nsolutions, scores = gradient_descent(objective, derivative, bounds, n_iter, 
step_size)\r\n# sample input range uniformly at 0.1 increments\r\ninputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\r\n# compute targets\r\nresults = objective(inputs)\r\n# create a line plot of input vs result\r\npyplot.plot(inputs, results)\r\n# plot the solutions found\r\npyplot.plot(solutions, scores, '.-', color='red')\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>Running the example performs the gradient descent search on the objective function as before, except in this case, each point found during the search is plotted.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that the search started more than halfway up the right part of the function and stepped downhill to the bottom of the basin.<\/p>\n<p>We can see that in the parts of the objective function with the larger curve, the derivative (gradient) is larger, and in turn, larger steps are taken. 
Similarly, the gradient is smaller as we get closer to the optima, and in turn, smaller steps are taken.<\/p>\n<p>This highlights that the step size is used as a scale factor on the magnitude of the gradient (curvature) of the objective function.<\/p>\n<div id=\"attachment_12091\" style=\"width: 1290px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12091\" loading=\"lazy\" class=\"size-full wp-image-12091\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-on-a-One-Dimensional-Objective-Function-1.png\" alt=\"Plot of the Progress of Gradient Descent on a One Dimensional Objective Function\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-on-a-One-Dimensional-Objective-Function-1.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-on-a-One-Dimensional-Objective-Function-1-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-on-a-One-Dimensional-Objective-Function-1-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-on-a-One-Dimensional-Objective-Function-1-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-12091\" class=\"wp-caption-text\">Plot of the Progress of Gradient Descent on a One Dimensional Objective Function<\/p>\n<\/div>\n<h2>Gradient Descent Optimization With Momentum<\/h2>\n<p>Next, we can update the gradient descent optimization algorithm to use momentum.<\/p>\n<p>This can be achieved by updating the <em>gradient_descent()<\/em> function to take a \u201c<em>momentum<\/em>\u201d 
argument that defines the amount of momentum used during the search.<\/p>\n<p>The change made to the solution must be remembered from the previous iteration of the loop, with an initial value of 0.0.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# keep track of the change\r\nchange = 0.0<\/pre>\n<p>We can then break the update procedure down into first calculating the gradient, then calculating the change to the solution, calculating the position of the new solution, then saving the change for the next iteration.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate gradient\r\ngradient = derivative(solution)\r\n# calculate update\r\nnew_change = step_size * gradient + momentum * change\r\n# take a step\r\nsolution = solution - new_change\r\n# save the change\r\nchange = new_change<\/pre>\n<p>The updated version of the <em>gradient_descent()<\/em> function with these changes is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# keep track of the change\r\n\tchange = 0.0\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# calculate update\r\n\t\tnew_change = step_size * gradient + momentum * change\r\n\t\t# take a step\r\n\t\tsolution = solution - new_change\r\n\t\t# save the change\r\n\t\tchange = new_change\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p>We can then choose a momentum value and pass it to the <em>gradient_descent()<\/em> function.<\/p>\n<p>After a little trial and error, a momentum value of 0.3 was found 
to be effective on this problem, given the fixed step size of 0.1.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# define momentum\r\nmomentum = 0.3\r\n# perform the gradient descent search with momentum\r\nbest, score = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)<\/pre>\n<p>Tying this together, the complete example of gradient descent optimization with momentum is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># example of gradient descent with momentum for a one-dimensional function\r\nfrom numpy import asarray\r\nfrom numpy.random import rand\r\nfrom numpy.random import seed\r\n\r\n# objective function\r\ndef objective(x):\r\n\treturn x**2.0\r\n\r\n# derivative of objective function\r\ndef derivative(x):\r\n\treturn x * 2.0\r\n\r\n# gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# keep track of the change\r\n\tchange = 0.0\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# calculate update\r\n\t\tnew_change = step_size * gradient + momentum * change\r\n\t\t# take a step\r\n\t\tsolution = solution - new_change\r\n\t\t# save the change\r\n\t\tchange = new_change\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# seed the pseudo random number generator\r\nseed(4)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 30\r\n# define the step size\r\nstep_size = 0.1\r\n# define momentum\r\nmomentum = 0.3\r\n# perform the gradient descent search with momentum\r\nbest, score = gradient_descent(objective, 
derivative, bounds, n_iter, step_size, momentum)\r\nprint('Done!')\r\nprint('f(%s) = %f' % (best, score))<\/pre>\n<p>Running the example starts with a random point in the search space, then applies the gradient descent algorithm with momentum, reporting performance along the way.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, we can see that the algorithm finds a good solution after about 13 iterations, with a function evaluation of about 0.0.<\/p>\n<p>As expected, this is faster (fewer iterations) than gradient descent without momentum, which took 27 iterations from the same starting point with the same step size.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">&gt;0 f([0.74724774]) = 0.55838\r\n&gt;1 f([0.54175461]) = 0.29350\r\n&gt;2 f([0.37175575]) = 0.13820\r\n&gt;3 f([0.24640494]) = 0.06072\r\n&gt;4 f([0.15951871]) = 0.02545\r\n&gt;5 f([0.1015491]) = 0.01031\r\n&gt;6 f([0.0638484]) = 0.00408\r\n&gt;7 f([0.03976851]) = 0.00158\r\n&gt;8 f([0.02459084]) = 0.00060\r\n&gt;9 f([0.01511937]) = 0.00023\r\n&gt;10 f([0.00925406]) = 0.00009\r\n&gt;11 f([0.00564365]) = 0.00003\r\n&gt;12 f([0.0034318]) = 0.00001\r\n&gt;13 f([0.00208188]) = 0.00000\r\n&gt;14 f([0.00126053]) = 0.00000\r\n&gt;15 f([0.00076202]) = 0.00000\r\n&gt;16 f([0.00046006]) = 0.00000\r\n&gt;17 f([0.00027746]) = 0.00000\r\n&gt;18 f([0.00016719]) = 0.00000\r\n&gt;19 f([0.00010067]) = 0.00000\r\n&gt;20 f([6.05804744e-05]) = 0.00000\r\n&gt;21 f([3.64373635e-05]) = 0.00000\r\n&gt;22 f([2.19069576e-05]) = 0.00000\r\n&gt;23 f([1.31664443e-05]) = 0.00000\r\n&gt;24 f([7.91100141e-06]) = 0.00000\r\n&gt;25 f([4.75216828e-06]) = 0.00000\r\n&gt;26 f([2.85408468e-06]) = 0.00000\r\n&gt;27 
f([1.71384267e-06]) = 0.00000\r\n&gt;28 f([1.02900153e-06]) = 0.00000\r\n&gt;29 f([6.17748881e-07]) = 0.00000\r\nDone!\r\nf([6.17748881e-07]) = 0.000000<\/pre>\n<h2>Visualization of Gradient Descent Optimization With Momentum<\/h2>\n<p>Finally, we can visualize the progress of the gradient descent optimization algorithm with momentum.<\/p>\n<p>The complete example is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># example of plotting gradient descent with momentum for a one-dimensional function\r\nfrom numpy import asarray\r\nfrom numpy import arange\r\nfrom numpy.random import rand\r\nfrom numpy.random import seed\r\nfrom matplotlib import pyplot\r\n\r\n# objective function\r\ndef objective(x):\r\n\treturn x**2.0\r\n\r\n# derivative of objective function\r\ndef derivative(x):\r\n\treturn x * 2.0\r\n\r\n# gradient descent algorithm\r\ndef gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\r\n\t# track all solutions\r\n\tsolutions, scores = list(), list()\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# keep track of the change\r\n\tchange = 0.0\r\n\t# run the gradient descent\r\n\tfor i in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution)\r\n\t\t# calculate update\r\n\t\tnew_change = step_size * gradient + momentum * change\r\n\t\t# take a step\r\n\t\tsolution = solution - new_change\r\n\t\t# save the change\r\n\t\tchange = new_change\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution)\r\n\t\t# store solution\r\n\t\tsolutions.append(solution)\r\n\t\tscores.append(solution_eval)\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (i, solution, solution_eval))\r\n\treturn [solutions, scores]\r\n\r\n# seed the pseudo random number generator\r\nseed(4)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 30\r\n# define 
the step size\r\nstep_size = 0.1\r\n# define momentum\r\nmomentum = 0.3\r\n# perform the gradient descent search with momentum\r\nsolutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\r\n# sample input range uniformly at 0.1 increments\r\ninputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\r\n# compute targets\r\nresults = objective(inputs)\r\n# create a line plot of input vs result\r\npyplot.plot(inputs, results)\r\n# plot the solutions found\r\npyplot.plot(solutions, scores, '.-', color='red')\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>Running the example performs the gradient descent search with momentum on the objective function as before, except in this case, each point found during the search is plotted.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. 
Consider running the example a few times and comparing the average outcome.<\/p>\n<p>In this case, if we compare this plot to the one created previously for the performance of gradient descent (without momentum), we can see that the search indeed reaches the optimum in fewer steps, indicated by fewer distinct red dots on the path to the bottom of the basin.<\/p>\n<div id=\"attachment_12092\" style=\"width: 1290px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12092\" loading=\"lazy\" class=\"size-full wp-image-12092\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-with-Momentum-on-a-One-Dimensional-Objective-Function.png\" alt=\"Plot of the Progress of Gradient Descent With Momentum on a One Dimensional Objective Function\" width=\"1280\" height=\"960\" srcset=\"http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-with-Momentum-on-a-One-Dimensional-Objective-Function.png 1280w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-with-Momentum-on-a-One-Dimensional-Objective-Function-300x225.png 300w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-with-Momentum-on-a-One-Dimensional-Objective-Function-1024x768.png 1024w, http:\/\/3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com\/wp-content\/uploads\/2020\/12\/Plot-of-the-Progress-of-Gradient-Descent-with-Momentum-on-a-One-Dimensional-Objective-Function-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-12092\" class=\"wp-caption-text\">Plot of the Progress of Gradient Descent With Momentum on a One Dimensional Objective Function<\/p>\n<\/div>\n<p>As an extension, try different values for momentum, such as 0.8, 
and review the resulting plot.<br \/>\nLet me know what you discover in the comments below.<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Books<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/amzn.to\/39KZSQn\">Algorithms for Optimization<\/a>, 2019.<\/li>\n<li>\n<a href=\"https:\/\/amzn.to\/3qSk3C2\">Deep Learning<\/a>, 2016.<\/li>\n<li>\n<a href=\"https:\/\/amzn.to\/380Yjvd\">Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks<\/a>, 1999.<\/li>\n<li>\n<a href=\"https:\/\/amzn.to\/2Wd6uze\">Neural Networks for Pattern Recognition<\/a>, 1996.<\/li>\n<li>\n<a href=\"https:\/\/amzn.to\/3ac5S4Q\">Neural Networks: Tricks of the Trade<\/a>, 2012.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">numpy.random.rand API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/generated\/numpy.asarray.html\">numpy.asarray API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/matplotlib.org\/api\/pyplot_api.html\">Matplotlib API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">Gradient descent, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Stochastic_gradient_descent\">Stochastic gradient descent, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient\">Gradient, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Derivative\">Derivative, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Differentiable_function\">Differentiable function, Wikipedia<\/a>.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered the gradient descent with momentum algorithm.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Gradient descent is an optimization algorithm that uses the gradient of the objective function to 
navigate the search space.<\/li>\n<li>Gradient descent can be accelerated by using momentum from past updates to the search position.<\/li>\n<li>How to implement gradient descent optimization with momentum and develop an intuition for its behavior.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/gradient-descent-with-momentum-from-scratch\/\">Gradient Descent With Momentum from Scratch<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/gradient-descent-with-momentum-from-scratch\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/02\/04\/gradient-descent-with-momentum-from-scratch\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4365,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4364"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4364"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4364\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4365"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}