{"id":4602,"date":"2021-04-26T06:30:43","date_gmt":"2021-04-26T06:30:43","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/04\/26\/gradient-descent-with-adadelta-from-scratch\/"},"modified":"2021-04-26T06:30:43","modified_gmt":"2021-04-26T06:30:43","slug":"gradient-descent-with-adadelta-from-scratch","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/04\/26\/gradient-descent-with-adadelta-from-scratch\/","title":{"rendered":"Gradient Descent With Adadelta from Scratch"},"content":{"rendered":"<p>Author: Jason Brownlee<\/p>\n<div>\n<p>Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function.<\/p>\n<p>A limitation of gradient descent is that it uses the same step size (learning rate) for each input variable. AdaGrad and RMSProp are extensions to gradient descent that add a self-adaptive learning rate for each parameter of the objective function.<\/p>\n<p><strong>Adadelta<\/strong> can be considered a further extension of gradient descent that builds upon AdaGrad and RMSProp and changes the calculation of the custom step size so that the units are consistent and, in turn, an initial learning rate hyperparameter is no longer required.<\/p>\n<p>In this tutorial, you will discover how to develop the gradient descent with Adadelta optimization algorithm from scratch.<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>Gradient descent is an optimization algorithm that uses the gradient of the objective function to navigate the search space.<\/li>\n<li>Gradient descent can be updated to use an automatically adaptive step size for each input variable using a decaying average of partial derivatives, called Adadelta.<\/li>\n<li>How to implement the Adadelta optimization algorithm from scratch and apply it to an objective function and evaluate the results.<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<div 
id=\"attachment_12119\" style=\"width: 809px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12119\" loading=\"lazy\" class=\"size-full wp-image-12119\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Adadelta-from-Scratch.jpg\" alt=\"Gradient Descent With Adadelta from Scratch\" width=\"799\" height=\"533\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Adadelta-from-Scratch.jpg 799w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Adadelta-from-Scratch-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/04\/Gradient-Descent-With-Adadelta-from-Scratch-768x512.jpg 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/p>\n<p id=\"caption-attachment-12119\" class=\"wp-caption-text\">Gradient Descent With Adadelta from Scratch<br \/>Photo by <a href=\"https:\/\/www.flickr.com\/photos\/bobanddusty\/45082094802\/\">Robert Minkler<\/a>, some rights reserved.<\/p>\n<\/div>\n<h2>Tutorial Overview<\/h2>\n<p>This tutorial is divided into three parts; they are:<\/p>\n<ol>\n<li>Gradient Descent<\/li>\n<li>Adadelta Algorithm<\/li>\n<li>Gradient Descent With Adadelta\n<ol>\n<li>Two-Dimensional Test Problem<\/li>\n<li>Gradient Descent Optimization With Adadelta<\/li>\n<li>Visualization of Adadelta<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Gradient Descent<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">Gradient descent<\/a> is an optimization algorithm.<\/p>\n<p>It is technically referred to as a first-order optimization algorithm as it explicitly makes use of the first-order derivative of the target objective function.<\/p>\n<blockquote>\n<p>First-order methods rely on gradient information to help direct the search for a minimum \u2026<\/p>\n<\/blockquote>\n<p>\u2014 Page 69, <a href=\"https:\/\/amzn.to\/39KZSQn\">Algorithms for 
Optimization<\/a>, 2019.<\/p>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Derivative\">first-order derivative<\/a>, or simply the \u201c<em>derivative<\/em>,\u201d is the rate of change or slope of the target function at a specific point, e.g. for a specific input.<\/p>\n<p>If the target function takes multiple input variables, it is referred to as a multivariate function and the input variables can be thought of as a vector. In turn, the derivative of a multivariate target function may also be taken as a vector and is referred to generally as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient\">gradient<\/a>.<\/p>\n<ul>\n<li>\n<strong>Gradient<\/strong>: First-order derivative for a multivariate objective function.<\/li>\n<\/ul>\n<p>The derivative or the gradient points in the direction of the steepest ascent of the target function for a specific input.<\/p>\n<p>Gradient descent refers to a minimization optimization algorithm that follows the negative of the gradient downhill of the target function to locate the minimum of the function.<\/p>\n<p>The gradient descent algorithm requires a target function that is being optimized and the derivative function for the objective function. The target function <em>f()<\/em> returns a score for a given set of inputs, and the derivative function <em>f'()<\/em> gives the derivative of the target function for a given set of inputs.<\/p>\n<p>The gradient descent algorithm requires a starting point (<em>x<\/em>) in the problem, such as a randomly selected point in the input space.<\/p>\n<p>The derivative is then calculated and a step is taken in the input space that is expected to result in a downhill movement in the target function, assuming we are minimizing the target function.<\/p>\n<p>A downhill movement is made by first calculating how far to move in the input space, computed as the step size (called alpha or the learning rate) multiplied by the gradient. 
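<\/p>\n<p>As a quick illustration, the distance to move can be sketched for a one-variable case; the names <em>objective_prime<\/em>, <em>x<\/em> and <em>step_size<\/em> below are illustrative stand-ins, not part of the tutorial code.<\/p>\n

```python
# sketch of how far to move for one gradient descent step on f(x) = x^2
# (illustrative names and values, not the tutorial's final code)
def objective_prime(x):
	return 2.0 * x

x = 0.5           # current point in the search space
step_size = 0.1   # alpha, the learning rate
# distance to move: step size multiplied by the gradient at the current point
move = step_size * objective_prime(x)
```

\n<p>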
This is then subtracted from the current point, ensuring we move against the gradient, or down the target function.<\/p>\n<ul>\n<li>x = x \u2013 step_size * f'(x)<\/li>\n<\/ul>\n<p>The steeper the objective function at a given point, the larger the magnitude of the gradient, and in turn, the larger the step taken in the search space. The size of the step taken is scaled using a step size hyperparameter.<\/p>\n<ul>\n<li>\n<strong>Step Size<\/strong> (<em>alpha<\/em>): Hyperparameter that controls how far to move in the search space against the gradient each iteration of the algorithm.<\/li>\n<\/ul>\n<p>If the step size is too small, the movement in the search space will be small and the search will take a long time. If the step size is too large, the search may bounce around the search space and skip over the optima.<\/p>\n<p>Now that we are familiar with the gradient descent optimization algorithm, let\u2019s take a look at Adadelta.<\/p>\n<h2>Adadelta Algorithm<\/h2>\n<p>Adadelta (or \u201cADADELTA\u201d) is an extension to the gradient descent optimization algorithm.<\/p>\n<p>The algorithm was described in the 2012 paper by <a href=\"https:\/\/www.linkedin.com\/in\/mattzeiler\/\">Matthew Zeiler<\/a> titled \u201c<a href=\"https:\/\/arxiv.org\/abs\/1212.5701\">ADADELTA: An Adaptive Learning Rate Method<\/a>.\u201d<\/p>\n<p>Adadelta is designed to accelerate the optimization process, e.g. decrease the number of function evaluations required to reach the optima, or to improve the capability of the optimization algorithm, e.g. result in a better final result.<\/p>\n<p>It is best understood as an extension of the AdaGrad and RMSProp algorithms.<\/p>\n<p>AdaGrad is an extension of gradient descent that calculates a step size (learning rate) for each parameter for the objective function each time an update is made. 
The step size is calculated by first summing the partial derivatives for the parameter seen so far during the search, then dividing the initial step size hyperparameter by the square root of the sum of the squared partial derivatives.<\/p>\n<p>The calculation of the custom step size for one parameter with AdaGrad is as follows:<\/p>\n<ul>\n<li>cust_step_size(t+1) = step_size \/ (1e-8 + sqrt(s(t)))<\/li>\n<\/ul>\n<p>Where <em>cust_step_size(t+1)<\/em> is the calculated step size for an input variable for a given point during the search, <em>step_size<\/em> is the initial step size, <em>sqrt()<\/em> is the square root operation, and <em>s(t)<\/em> is the sum of the squared partial derivatives for the input variable seen during the search so far (including the current iteration).<\/p>\n<p>RMSProp can be thought of as an extension of AdaGrad in that it uses a decaying average or moving average of the partial derivatives instead of the sum in the calculation of the step size for each parameter. 
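<\/p>\n<p>To make the contrast concrete, the two accumulators can be compared side by side for a single parameter; the gradient values and hyperparameter settings below are made up purely for illustration.<\/p>\n

```python
from math import sqrt

# made-up partial derivatives for one parameter over three iterations
grads = [0.5, 0.4, 0.3]
step_size, rho, eps = 0.01, 0.9, 1e-8

s_sum, s_avg = 0.0, 0.0
for g in grads:
	# AdaGrad: running sum of squared partial derivatives
	s_sum += g**2.0
	# RMSProp: decaying moving average of squared partial derivatives
	s_avg = (s_avg * rho) + (g**2.0 * (1.0-rho))

# AdaGrad divides the initial step size by the root of the sum
adagrad_step = step_size / (eps + sqrt(s_sum))
# RMSProp divides it by the root of the decaying average instead
rmsprop_step = step_size / (eps + sqrt(s_avg))
```

\n<p>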
This is achieved by adding a new hyperparameter \u201c<em>rho<\/em>\u201d that acts like a momentum for the partial derivatives.<\/p>\n<p>The calculation of the decaying moving average squared partial derivative for one parameter is as follows:<\/p>\n<ul>\n<li>s(t+1) = (s(t) * rho) + (f'(x(t))^2 * (1.0-rho))<\/li>\n<\/ul>\n<p>Where <em>s(t+1)<\/em> is the mean squared partial derivative for one parameter for the current iteration of the algorithm, <em>s(t)<\/em> is the decaying moving average squared partial derivative for the previous iteration, <em>f'(x(t))^2<\/em> is the squared partial derivative for the current parameter, and rho is a hyperparameter, typically with the value of 0.9 like momentum.<\/p>\n<p>Adadelta is a further extension of RMSProp designed to improve the convergence of the algorithm and to remove the need for a manually specified initial learning rate.<\/p>\n<blockquote>\n<p>The idea presented in this paper was derived from ADAGRAD in order to improve upon the two main drawbacks of the method: 1) the continual decay of learning rates throughout training, and 2) the need for a manually selected global learning rate.<\/p>\n<\/blockquote>\n<p>\u2014 <a href=\"https:\/\/arxiv.org\/abs\/1212.5701\">ADADELTA: An Adaptive Learning Rate Method<\/a>, 2012.<\/p>\n<p>The decaying moving average of the squared partial derivative is calculated for each parameter, as with RMSProp. The key difference is in the calculation of the step size for a parameter that uses the decaying average of the delta or change in parameter.<\/p>\n<p>This choice of numerator was to ensure that both parts of the calculation have the same units.<\/p>\n<blockquote>\n<p>After independently deriving the RMSProp update, the authors noticed that the units in the update equations for gradient descent, momentum and Adagrad do not match. 
To fix this, they use an exponentially decaying average of the square updates<\/p>\n<\/blockquote>\n<p>\u2014 Pages 78-79, <a href=\"https:\/\/amzn.to\/39KZSQn\">Algorithms for Optimization<\/a>, 2019.<\/p>\n<p>First, the custom step size is calculated as the square root of the decaying moving average of the change in the delta divided by the square root of the decaying moving average of the squared partial derivatives.<\/p>\n<ul>\n<li>cust_step_size(t+1) = (ep + sqrt(delta(t))) \/ (ep + sqrt(s(t)))<\/li>\n<\/ul>\n<p>Where <em>cust_step_size(t+1)<\/em> is the custom step size for a parameter for a given update, <em>ep<\/em> is a hyperparameter that is added to the numerator and denominator to avoid a divide by zero error, <em>delta(t)<\/em> is the decaying moving average of the squared change to the parameter (calculated in the last iteration), and <em>s(t)<\/em> is the decaying moving average of the squared partial derivative (calculated in the current iteration).<\/p>\n<p>The <em>ep<\/em> hyperparameter is set to a small value such as 1e-3 or 1e-8. 
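<\/p>\n<p>A single Adadelta update for one parameter can be sketched as follows; the starting point and hyperparameter values are made up, and the change and moving-average updates shown here follow the equations described in the remainder of this section.<\/p>\n

```python
from math import sqrt

rho, ep = 0.99, 1e-3   # decay hyperparameter and small constant
x = 1.0                # current value of the parameter (made up)
s = 0.0                # decaying average of the squared partial derivative
delta = 0.0            # decaying average of the squared change

g = 2.0 * x                                    # partial derivative of f(x) = x^2
s = (s * rho) + (g**2.0 * (1.0-rho))           # update average squared gradient
alpha = (ep + sqrt(delta)) / (ep + sqrt(s))    # custom step size, no learning rate needed
change = alpha * g                             # change to the parameter
delta = (delta * rho) + (change**2.0 * (1.0-rho))  # update average squared change
x = x - change                                 # new value for the parameter
```

\n<p>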
In addition to avoiding a divide by zero error, it also helps with the first step of the algorithm when the decaying moving average squared change and decaying moving average squared gradient are zero.<\/p>\n<p>Next, the change to the parameter is calculated as the custom step size multiplied by the partial derivative<\/p>\n<ul>\n<li>change(t+1) = cust_step_size(t+1) * f'(x(t))<\/li>\n<\/ul>\n<p>Next, the decaying average of the squared change to the parameter is updated.<\/p>\n<ul>\n<li>delta(t+1) = (delta(t) * rho) + (change(t+1)^2 * (1.0-rho))<\/li>\n<\/ul>\n<p>Where <em>delta(t+1)<\/em> is the decaying average of the change to the variable to be used in the next iteration, <em>change(t+1)<\/em> was calculated in the step before and <em>rho<\/em> is a hyperparameter that acts like momentum and has a value like 0.9.<\/p>\n<p>Finally, the new value for the variable is calculated using the change.<\/p>\n<ul>\n<li>x(t+1) = x(t) \u2013 change(t+1)<\/li>\n<\/ul>\n<p>This process is then repeated for each variable for the objective function, then the entire process is repeated to navigate the search space for a fixed number of algorithm iterations.<\/p>\n<p>Now that we are familiar with the Adadelta algorithm, let\u2019s explore how we might implement it and evaluate its performance.<\/p>\n<h2>Gradient Descent With Adadelta<\/h2>\n<p>In this section, we will explore how to implement the gradient descent optimization algorithm with Adadelta.<\/p>\n<h3>Two-Dimensional Test Problem<\/h3>\n<p>First, let\u2019s define an optimization function.<\/p>\n<p>We will use a simple two-dimensional function that squares the input of each dimension and define the range of valid inputs from -1.0 to 1.0.<\/p>\n<p>The objective() function below implements this function<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># objective function\r\ndef objective(x, y):\r\n\treturn x**2.0 + y**2.0<\/pre>\n<p>We can create a three-dimensional plot of the dataset to get a feeling for the 
curvature of the response surface.<\/p>\n<p>The complete example of plotting the objective function is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># 3d plot of the test function\r\nfrom numpy import arange\r\nfrom numpy import meshgrid\r\nfrom matplotlib import pyplot\r\n\r\n# objective function\r\ndef objective(x, y):\r\n\treturn x**2.0 + y**2.0\r\n\r\n# define range for input\r\nr_min, r_max = -1.0, 1.0\r\n# sample input range uniformly at 0.1 increments\r\nxaxis = arange(r_min, r_max, 0.1)\r\nyaxis = arange(r_min, r_max, 0.1)\r\n# create a mesh from the axis\r\nx, y = meshgrid(xaxis, yaxis)\r\n# compute targets\r\nresults = objective(x, y)\r\n# create a surface plot with the jet color scheme\r\nfigure = pyplot.figure()\r\naxis = figure.add_subplot(projection='3d')\r\naxis.plot_surface(x, y, results, cmap='jet')\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>Running the example creates a three-dimensional surface plot of the objective function.<\/p>\n<p>We can see the familiar bowl shape with the global minimum at f(0, 0) = 0.<\/p>\n<div id=\"attachment_12105\" style=\"width: 1290px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12105\" loading=\"lazy\" class=\"size-full wp-image-12105\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Three-Dimensional-Plot-of-the-Test-Objective-Function.png\" alt=\"Three-Dimensional Plot of the Test Objective Function\" width=\"1280\" height=\"960\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Three-Dimensional-Plot-of-the-Test-Objective-Function.png 1280w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Three-Dimensional-Plot-of-the-Test-Objective-Function-300x225.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Three-Dimensional-Plot-of-the-Test-Objective-Function-1024x768.png 1024w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Three-Dimensional-Plot-of-the-Test-Objective-Function-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-12105\" class=\"wp-caption-text\">Three-Dimensional Plot of the Test Objective Function<\/p>\n<\/div>\n<p>We can also create a two-dimensional plot of the function. This will be helpful later when we want to plot the progress of the search.<\/p>\n<p>The example below creates a contour plot of the objective function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># contour plot of the test function\r\nfrom numpy import asarray\r\nfrom numpy import arange\r\nfrom numpy import meshgrid\r\nfrom matplotlib import pyplot\r\n\r\n# objective function\r\ndef objective(x, y):\r\n\treturn x**2.0 + y**2.0\r\n\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])\r\n# sample input range uniformly at 0.1 increments\r\nxaxis = arange(bounds[0,0], bounds[0,1], 0.1)\r\nyaxis = arange(bounds[1,0], bounds[1,1], 0.1)\r\n# create a mesh from the axis\r\nx, y = meshgrid(xaxis, yaxis)\r\n# compute targets\r\nresults = objective(x, y)\r\n# create a filled contour plot with 50 levels and jet color scheme\r\npyplot.contourf(x, y, results, levels=50, cmap='jet')\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>Running the example creates a two-dimensional contour plot of the objective function.<\/p>\n<p>We can see the bowl shape compressed to contours shown with a color gradient. 
We will use this plot to plot the specific points explored during the progress of the search.<\/p>\n<div id=\"attachment_12106\" style=\"width: 1290px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12106\" loading=\"lazy\" class=\"size-full wp-image-12106\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Two-Dimensional-Contour-Plot-of-the-Test-Objective-Function.png\" alt=\"Two-Dimensional Contour Plot of the Test Objective Function\" width=\"1280\" height=\"960\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Two-Dimensional-Contour-Plot-of-the-Test-Objective-Function.png 1280w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Two-Dimensional-Contour-Plot-of-the-Test-Objective-Function-300x225.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Two-Dimensional-Contour-Plot-of-the-Test-Objective-Function-1024x768.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Two-Dimensional-Contour-Plot-of-the-Test-Objective-Function-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-12106\" class=\"wp-caption-text\">Two-Dimensional Contour Plot of the Test Objective Function<\/p>\n<\/div>\n<p>Now that we have a test objective function, let\u2019s look at how we might implement the Adadelta optimization algorithm.<\/p>\n<h3>Gradient Descent Optimization With Adadelta<\/h3>\n<p>We can apply the gradient descent with Adadelta to the test problem.<\/p>\n<p>First, we need a function that calculates the derivative for this function.<\/p>\n<ul>\n<li>f(x) = x^2<\/li>\n<li>f'(x) = x * 2<\/li>\n<\/ul>\n<p>The derivative of x^2 is x * 2 in each dimension. 
The derivative() function implements this below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># derivative of objective function\r\ndef derivative(x, y):\r\n\treturn asarray([x * 2.0, y * 2.0])<\/pre>\n<p>Next, we can implement gradient descent optimization.<\/p>\n<p>First, we can select a random point in the bounds of the problem as a starting point for the search.<\/p>\n<p>This assumes we have an array that defines the bounds of the search with one row for each dimension and the first column defines the minimum and the second column defines the maximum of the dimension.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# generate an initial point\r\nsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])<\/pre>\n<p>Next, we need to initialize the decaying average of the squared partial derivatives and squared change for each dimension to 0.0 values.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# list of the average square gradients for each variable\r\nsq_grad_avg = [0.0 for _ in range(bounds.shape[0])]\r\n# list of the average parameter updates\r\nsq_para_avg = [0.0 for _ in range(bounds.shape[0])]<\/pre>\n<p>We can then enumerate a fixed number of iterations of the search optimization algorithm defined by a \u201c<em>n_iter<\/em>\u201d hyperparameter.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# run the gradient descent\r\nfor it in range(n_iter):\r\n\t...<\/pre>\n<p>The first step is to calculate the gradient for the current solution using the <em>derivative()<\/em> function.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate gradient\r\ngradient = derivative(solution[0], solution[1])<\/pre>\n<p>We then need to calculate the square of the partial derivative and update the decaying moving average of the squared partial derivatives with the \u201c<em>rho<\/em>\u201d hyperparameter.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# 
update the average of the squared partial derivatives\r\nfor i in range(gradient.shape[0]):\r\n\t# calculate the squared gradient\r\n\tsg = gradient[i]**2.0\r\n\t# update the moving average of the squared gradient\r\n\tsq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))<\/pre>\n<p>We can then use the decaying moving average of the squared partial derivatives and gradient to calculate the step size for the next point. We will do this one variable at a time.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# build solution\r\nnew_solution = list()\r\nfor i in range(solution.shape[0]):\r\n\t...<\/pre>\n<p>First, we will calculate the custom step size for this variable on this iteration using the decaying moving average of the squared changes and squared partial derivatives, as well as the \u201cep\u201d hyperparameter.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate the step size for this variable\r\nalpha = (ep + sqrt(sq_para_avg[i])) \/ (ep + sqrt(sq_grad_avg[i]))<\/pre>\n<p>Next, we can use the custom step size and partial derivative to calculate the change to the variable.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate the change\r\nchange = alpha * gradient[i]<\/pre>\n<p>We can then use the change to update the decaying moving average of the squared change using the \u201c<em>rho<\/em>\u201d hyperparameter.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# update the moving average of squared parameter changes\r\nsq_para_avg[i] = (sq_para_avg[i] * rho) + (change**2.0 * (1.0-rho))<\/pre>\n<p>Finally, we can change the variable and store the result before moving on to the next variable.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# calculate the new position in this variable\r\nvalue = solution[i] - change\r\n# store this variable\r\nnew_solution.append(value)<\/pre>\n<p>This new solution can then be evaluated using the objective() function and 
the performance of the search can be reported.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# evaluate candidate point\r\nsolution = asarray(new_solution)\r\nsolution_eval = objective(solution[0], solution[1])\r\n# report progress\r\nprint('&gt;%d f(%s) = %.5f' % (it, solution, solution_eval))<\/pre>\n<p>And that\u2019s it.<\/p>\n<p>We can tie all of this together into a function named <em>adadelta()<\/em> that takes the names of the objective function and the derivative function, an array with the bounds of the domain and hyperparameter values for the total number of algorithm iterations and <em>rho<\/em>, and returns the final solution and its evaluation.<\/p>\n<p>The <em>ep<\/em> hyperparameter can also be taken as an argument, although it has a sensible default value of 1e-3.<\/p>\n<p>This complete function is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># gradient descent algorithm with adadelta\r\ndef adadelta(objective, derivative, bounds, n_iter, rho, ep=1e-3):\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# list of the average square gradients for each variable\r\n\tsq_grad_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# list of the average parameter updates\r\n\tsq_para_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# run the gradient descent\r\n\tfor it in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution[0], solution[1])\r\n\t\t# update the average of the squared partial derivatives\r\n\t\tfor i in range(gradient.shape[0]):\r\n\t\t\t# calculate the squared gradient\r\n\t\t\tsg = gradient[i]**2.0\r\n\t\t\t# update the moving average of the squared gradient\r\n\t\t\tsq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))\r\n\t\t# build a solution one variable at a time\r\n\t\tnew_solution = list()\r\n\t\tfor i in range(solution.shape[0]):\r\n\t\t\t# calculate the step size for this 
variable\r\n\t\t\talpha = (ep + sqrt(sq_para_avg[i])) \/ (ep + sqrt(sq_grad_avg[i]))\r\n\t\t\t# calculate the change\r\n\t\t\tchange = alpha * gradient[i]\r\n\t\t\t# update the moving average of squared parameter changes\r\n\t\t\tsq_para_avg[i] = (sq_para_avg[i] * rho) + (change**2.0 * (1.0-rho))\r\n\t\t\t# calculate the new position in this variable\r\n\t\t\tvalue = solution[i] - change\r\n\t\t\t# store this variable\r\n\t\t\tnew_solution.append(value)\r\n\t\t# evaluate candidate point\r\n\t\tsolution = asarray(new_solution)\r\n\t\tsolution_eval = objective(solution[0], solution[1])\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (it, solution, solution_eval))\r\n\treturn [solution, solution_eval]<\/pre>\n<p><strong>Note<\/strong>: we have intentionally used lists and imperative coding style instead of vectorized operations for readability. Feel free to adapt the implementation to a vectorization implementation with NumPy arrays for better performance.<\/p>\n<p>We can then define our hyperparameters and call the <em>adadelta()<\/em> function to optimize our test objective function.<\/p>\n<p>In this case, we will use 120 iterations of the algorithm and a value of 0.99 for the rho hyperparameter, chosen after a little trial and error.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# seed the pseudo random number generator\r\nseed(1)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 120\r\n# momentum for adadelta\r\nrho = 0.99\r\n# perform the gradient descent search with adadelta\r\nbest, score = adadelta(objective, derivative, bounds, n_iter, rho)\r\nprint('Done!')\r\nprint('f(%s) = %f' % (best, score))<\/pre>\n<p>Tying all of this together, the complete example of gradient descent optimization with Adadelta is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># gradient descent optimization with adadelta for a two-dimensional test 
function\r\nfrom math import sqrt\r\nfrom numpy import asarray\r\nfrom numpy.random import rand\r\nfrom numpy.random import seed\r\n\r\n# objective function\r\ndef objective(x, y):\r\n\treturn x**2.0 + y**2.0\r\n\r\n# derivative of objective function\r\ndef derivative(x, y):\r\n\treturn asarray([x * 2.0, y * 2.0])\r\n\r\n# gradient descent algorithm with adadelta\r\ndef adadelta(objective, derivative, bounds, n_iter, rho, ep=1e-3):\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# list of the average square gradients for each variable\r\n\tsq_grad_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# list of the average parameter updates\r\n\tsq_para_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# run the gradient descent\r\n\tfor it in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution[0], solution[1])\r\n\t\t# update the average of the squared partial derivatives\r\n\t\tfor i in range(gradient.shape[0]):\r\n\t\t\t# calculate the squared gradient\r\n\t\t\tsg = gradient[i]**2.0\r\n\t\t\t# update the moving average of the squared gradient\r\n\t\t\tsq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))\r\n\t\t# build a solution one variable at a time\r\n\t\tnew_solution = list()\r\n\t\tfor i in range(solution.shape[0]):\r\n\t\t\t# calculate the step size for this variable\r\n\t\t\talpha = (ep + sqrt(sq_para_avg[i])) \/ (ep + sqrt(sq_grad_avg[i]))\r\n\t\t\t# calculate the change\r\n\t\t\tchange = alpha * gradient[i]\r\n\t\t\t# update the moving average of squared parameter changes\r\n\t\t\tsq_para_avg[i] = (sq_para_avg[i] * rho) + (change**2.0 * (1.0-rho))\r\n\t\t\t# calculate the new position in this variable\r\n\t\t\tvalue = solution[i] - change\r\n\t\t\t# store this variable\r\n\t\t\tnew_solution.append(value)\r\n\t\t# evaluate candidate point\r\n\t\tsolution = asarray(new_solution)\r\n\t\tsolution_eval = objective(solution[0], solution[1])\r\n\t\t# report 
progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (it, solution, solution_eval))\r\n\treturn [solution, solution_eval]\r\n\r\n# seed the pseudo random number generator\r\nseed(1)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 120\r\n# momentum for adadelta\r\nrho = 0.99\r\n# perform the gradient descent search with adadelta\r\nbest, score = adadelta(objective, derivative, bounds, n_iter, rho)\r\nprint('Done!')\r\nprint('f(%s) = %f' % (best, score))<\/pre>\n<p>Running the example applies the Adadelta optimization algorithm to our test problem and reports performance of the search for each iteration of the algorithm.<\/p>\n<p><strong>Note<\/strong>: Your <a href=\"https:\/\/machinelearningmastery.com\/different-results-each-time-in-machine-learning\/\">results may vary<\/a> given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.<\/p>\n<p>In this case, we can see that a near optimal solution was found after perhaps 105 iterations of the search, with input values near 0.0 and 0.0, evaluating to 0.0.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n&gt;100 f([-1.45142626e-07 2.71163181e-03]) = 0.00001\r\n&gt;101 f([-1.24898699e-07 2.56875692e-03]) = 0.00001\r\n&gt;102 f([-1.07454197e-07 2.43328237e-03]) = 0.00001\r\n&gt;103 f([-9.24253035e-08 2.30483111e-03]) = 0.00001\r\n&gt;104 f([-7.94803792e-08 2.18304501e-03]) = 0.00000\r\n&gt;105 f([-6.83329263e-08 2.06758392e-03]) = 0.00000\r\n&gt;106 f([-5.87354975e-08 1.95812477e-03]) = 0.00000\r\n&gt;107 f([-5.04744185e-08 1.85436071e-03]) = 0.00000\r\n&gt;108 f([-4.33652179e-08 1.75600036e-03]) = 0.00000\r\n&gt;109 f([-3.72486699e-08 1.66276699e-03]) = 0.00000\r\n&gt;110 f([-3.19873691e-08 1.57439783e-03]) = 0.00000\r\n&gt;111 f([-2.74627662e-08 1.49064334e-03]) = 0.00000\r\n&gt;112 f([-2.3572602e-08 1.4112666e-03]) 
= 0.00000\r\n&gt;113 f([-2.02286891e-08 1.33604264e-03]) = 0.00000\r\n&gt;114 f([-1.73549914e-08 1.26475787e-03]) = 0.00000\r\n&gt;115 f([-1.48859650e-08 1.19720951e-03]) = 0.00000\r\n&gt;116 f([-1.27651224e-08 1.13320504e-03]) = 0.00000\r\n&gt;117 f([-1.09437923e-08 1.07256172e-03]) = 0.00000\r\n&gt;118 f([-9.38004754e-09 1.01510604e-03]) = 0.00000\r\n&gt;119 f([-8.03777865e-09 9.60673346e-04]) = 0.00000\r\nDone!\r\nf([-8.03777865e-09 9.60673346e-04]) = 0.000001<\/pre>\n<\/p>\n<h3>Visualization of Adadelta<\/h3>\n<p>We can plot the progress of the Adadelta search on a contour plot of the domain.<\/p>\n<p>This can provide an intuition for the progress of the search over the iterations of the algorithm.<\/p>\n<p>We must update the <em>adadelta()<\/em> function to maintain a list of all solutions found during the search, then return this list at the end of the search.<\/p>\n<p>The updated version of the function with these changes is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># gradient descent algorithm with adadelta\r\ndef adadelta(objective, derivative, bounds, n_iter, rho, ep=1e-3):\r\n\t# track all solutions\r\n\tsolutions = list()\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# list of the average square gradients for each variable\r\n\tsq_grad_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# list of the average parameter updates\r\n\tsq_para_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# run the gradient descent\r\n\tfor it in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution[0], solution[1])\r\n\t\t# update the average of the squared partial derivatives\r\n\t\tfor i in range(gradient.shape[0]):\r\n\t\t\t# calculate the squared gradient\r\n\t\t\tsg = gradient[i]**2.0\r\n\t\t\t# update the moving average of the squared gradient\r\n\t\t\tsq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))\r\n\t\t# build 
solution\r\n\t\tnew_solution = list()\r\n\t\tfor i in range(solution.shape[0]):\r\n\t\t\t# calculate the step size for this variable\r\n\t\t\talpha = (ep + sqrt(sq_para_avg[i])) \/ (ep + sqrt(sq_grad_avg[i]))\r\n\t\t\t# calculate the change\r\n\t\t\tchange = alpha * gradient[i]\r\n\t\t\t# update the moving average of squared parameter changes\r\n\t\t\tsq_para_avg[i] = (sq_para_avg[i] * rho) + (change**2.0 * (1.0-rho))\r\n\t\t\t# calculate the new position in this variable\r\n\t\t\tvalue = solution[i] - change\r\n\t\t\t# store this variable\r\n\t\t\tnew_solution.append(value)\r\n\t\t# store the new solution\r\n\t\tsolution = asarray(new_solution)\r\n\t\tsolutions.append(solution)\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution[0], solution[1])\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (it, solution, solution_eval))\r\n\treturn solutions<\/pre>\n<p>We can then execute the search as before, and this time retrieve the list of solutions instead of the best final solution.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# seed the pseudo random number generator\r\nseed(1)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 120\r\n# rho for adadelta\r\nrho = 0.99\r\n# perform the gradient descent search with adadelta\r\nsolutions = adadelta(objective, derivative, bounds, n_iter, rho)<\/pre>\n<p>We can then create a contour plot of the objective function, as before.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# sample input range uniformly at 0.1 increments\r\nxaxis = arange(bounds[0,0], bounds[0,1], 0.1)\r\nyaxis = arange(bounds[1,0], bounds[1,1], 0.1)\r\n# create a mesh from the axis\r\nx, y = meshgrid(xaxis, yaxis)\r\n# compute targets\r\nresults = objective(x, y)\r\n# create a filled contour plot with 50 levels and jet color scheme\r\npyplot.contourf(x, y, results, levels=50, cmap='jet')<\/pre>\n<p>Finally, we can 
plot each solution found during the search as a white dot connected by a line.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">...\r\n# plot the solutions as white dots connected by a line\r\nsolutions = asarray(solutions)\r\npyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')<\/pre>\n<p>Tying this all together, the complete example of performing the Adadelta optimization on the test problem and plotting the results on a contour plot is listed below.<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\"># example of plotting the adadelta search on a contour plot of the test function\r\nfrom math import sqrt\r\nfrom numpy import asarray\r\nfrom numpy import arange\r\nfrom numpy.random import rand\r\nfrom numpy.random import seed\r\nfrom numpy import meshgrid\r\nfrom matplotlib import pyplot\r\n\r\n# objective function\r\ndef objective(x, y):\r\n\treturn x**2.0 + y**2.0\r\n\r\n# derivative of objective function\r\ndef derivative(x, y):\r\n\treturn asarray([x * 2.0, y * 2.0])\r\n\r\n# gradient descent algorithm with adadelta\r\ndef adadelta(objective, derivative, bounds, n_iter, rho, ep=1e-3):\r\n\t# track all solutions\r\n\tsolutions = list()\r\n\t# generate an initial point\r\n\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\r\n\t# list of the average squared gradients for each variable\r\n\tsq_grad_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# list of the average squared parameter updates\r\n\tsq_para_avg = [0.0 for _ in range(bounds.shape[0])]\r\n\t# run the gradient descent\r\n\tfor it in range(n_iter):\r\n\t\t# calculate gradient\r\n\t\tgradient = derivative(solution[0], solution[1])\r\n\t\t# update the average of the squared partial derivatives\r\n\t\tfor i in range(gradient.shape[0]):\r\n\t\t\t# calculate the squared gradient\r\n\t\t\tsg = gradient[i]**2.0\r\n\t\t\t# update the moving average of the squared gradient\r\n\t\t\tsq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * 
(1.0-rho))\r\n\t\t# build solution\r\n\t\tnew_solution = list()\r\n\t\tfor i in range(solution.shape[0]):\r\n\t\t\t# calculate the step size for this variable\r\n\t\t\talpha = (ep + sqrt(sq_para_avg[i])) \/ (ep + sqrt(sq_grad_avg[i]))\r\n\t\t\t# calculate the change\r\n\t\t\tchange = alpha * gradient[i]\r\n\t\t\t# update the moving average of squared parameter changes\r\n\t\t\tsq_para_avg[i] = (sq_para_avg[i] * rho) + (change**2.0 * (1.0-rho))\r\n\t\t\t# calculate the new position in this variable\r\n\t\t\tvalue = solution[i] - change\r\n\t\t\t# store this variable\r\n\t\t\tnew_solution.append(value)\r\n\t\t# store the new solution\r\n\t\tsolution = asarray(new_solution)\r\n\t\tsolutions.append(solution)\r\n\t\t# evaluate candidate point\r\n\t\tsolution_eval = objective(solution[0], solution[1])\r\n\t\t# report progress\r\n\t\tprint('&gt;%d f(%s) = %.5f' % (it, solution, solution_eval))\r\n\treturn solutions\r\n\r\n# seed the pseudo random number generator\r\nseed(1)\r\n# define range for input\r\nbounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])\r\n# define the total iterations\r\nn_iter = 120\r\n# rho for adadelta\r\nrho = 0.99\r\n# perform the gradient descent search with adadelta\r\nsolutions = adadelta(objective, derivative, bounds, n_iter, rho)\r\n# sample input range uniformly at 0.1 increments\r\nxaxis = arange(bounds[0,0], bounds[0,1], 0.1)\r\nyaxis = arange(bounds[1,0], bounds[1,1], 0.1)\r\n# create a mesh from the axis\r\nx, y = meshgrid(xaxis, yaxis)\r\n# compute targets\r\nresults = objective(x, y)\r\n# create a filled contour plot with 50 levels and jet color scheme\r\npyplot.contourf(x, y, results, levels=50, cmap='jet')\r\n# plot the solutions as white dots connected by a line\r\nsolutions = asarray(solutions)\r\npyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')\r\n# show the plot\r\npyplot.show()<\/pre>\n<p>Running the example performs the search as before, except in this case, the contour plot of the objective function is created.<\/p>\n<p>In this case, we 
can see that a white dot is shown for each solution found during the search, starting above the optimum and progressively getting closer to the optimum at the center of the plot.<\/p>\n<div id=\"attachment_12118\" style=\"width: 1290px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-12118\" loading=\"lazy\" class=\"size-full wp-image-12118\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Contour-Plot-of-the-Test-Objective-Function-With-Adadelta-Search-Results-Shown.png\" alt=\"Contour Plot of the Test Objective Function With Adadelta Search Results Shown\" width=\"1280\" height=\"960\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Contour-Plot-of-the-Test-Objective-Function-With-Adadelta-Search-Results-Shown.png 1280w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Contour-Plot-of-the-Test-Objective-Function-With-Adadelta-Search-Results-Shown-300x225.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Contour-Plot-of-the-Test-Objective-Function-With-Adadelta-Search-Results-Shown-1024x768.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2020\/12\/Contour-Plot-of-the-Test-Objective-Function-With-Adadelta-Search-Results-Shown-768x576.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\"><\/p>\n<p id=\"caption-attachment-12118\" class=\"wp-caption-text\">Contour Plot of the Test Objective Function With Adadelta Search Results Shown<\/p>\n<\/div>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3>Papers<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/arxiv.org\/abs\/1212.5701\">ADADELTA: An Adaptive Learning Rate Method<\/a>, 2012.<\/li>\n<\/ul>\n<h3>Books<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/amzn.to\/39KZSQn\">Algorithms for Optimization<\/a>, 2019.<\/li>\n<li>\n<a href=\"https:\/\/amzn.to\/3qSk3C2\">Deep Learning<\/a>, 
2016.<\/li>\n<\/ul>\n<h3>APIs<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/random\/generated\/numpy.random.rand.html\">numpy.random.rand API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/generated\/numpy.asarray.html\">numpy.asarray API<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/matplotlib.org\/api\/pyplot_api.html\">Matplotlib API<\/a>.<\/li>\n<\/ul>\n<h3>Articles<\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">Gradient descent, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Stochastic_gradient_descent\">Stochastic gradient descent, Wikipedia<\/a>.<\/li>\n<li>\n<a href=\"https:\/\/ruder.io\/optimizing-gradient-descent\/index.html\">An overview of gradient descent optimization algorithms<\/a>, 2016.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you discovered how to develop the gradient descent with Adadelta optimization algorithm from scratch.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>Gradient descent is an optimization algorithm that uses the gradient of the objective function to navigate the search space.<\/li>\n<li>Gradient descent can be updated to use an automatically adaptive step size for each input variable using a decaying average of partial derivatives, called Adadelta.<\/li>\n<li>How to implement the Adadelta optimization algorithm from scratch and apply it to an objective function and evaluate the results.<\/li>\n<\/ul>\n<p><strong>Do you have any questions?<\/strong><br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/gradient-descent-with-adadelta-from-scratch\/\">Gradient Descent With Adadelta from Scratch<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a 
href=\"https:\/\/machinelearningmastery.com\/gradient-descent-with-adadelta-from-scratch\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Jason Brownlee Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/04\/26\/gradient-descent-with-adadelta-from-scratch\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4603,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4602"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4602"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4602\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4603"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4602"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}