{"id":2736,"date":"2019-10-25T06:35:17","date_gmt":"2019-10-25T06:35:17","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/25\/5-algorithms-to-train-a-neural-network\/"},"modified":"2019-10-25T06:35:17","modified_gmt":"2019-10-25T06:35:17","slug":"5-algorithms-to-train-a-neural-network","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/25\/5-algorithms-to-train-a-neural-network\/","title":{"rendered":"5 Algorithms to Train a Neural Network"},"content":{"rendered":"<p>Author: Andrea Manero-Bastin<\/p>\n<div>\n<p><em><span>This article was written by <a href=\"https:\/\/www.artelnics.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Alberto Quesada<\/a>.<\/span><\/em><\/p>\n<p><em><span>\u00a0<\/span><\/em><\/p>\n<p><span>The procedure used to carry out the learning process in a neural network is called the optimization algorithm (or optimizer).\u00a0<\/span><span>There are many different optimization algorithms, each with different characteristics and performance in terms of memory requirements, speed, and precision.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Problem formulation<\/strong><\/span><\/p>\n<p><span>The learning problem is formulated in terms of the minimization of a\u00a0loss index,\u00a0f. This is a function which measures the performance of a\u00a0neural network\u00a0on a\u00a0data set.<\/span><\/p>\n<p><span>The loss index is, in general, composed of an error term and a regularization term. The\u00a0error term\u00a0evaluates how well a neural network fits the data set. The\u00a0regularization term\u00a0is used to prevent overfitting by controlling the effective complexity of the neural network.<\/span><\/p>\n<p><span>The loss function depends on the adaptive\u00a0parameters (biases and synaptic weights) in the neural network. 
We can conveniently group them together into a single n-dimensional weight vector\u00a0w.<\/span><\/p>\n<p><span>The problem of minimizing continuous and differentiable functions of many variables has been widely studied. Many of the conventional approaches to this problem are directly applicable to training neural networks.<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3678365363?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3678365363?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/span><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>One-dimensional optimization<\/strong><\/span><\/p>\n<p><span>Although the loss function depends on many parameters, one-dimensional optimization methods are of great importance here. Indeed, they are very often used in the training process of a neural network.<\/span><\/p>\n<p><span>Many training algorithms first compute a training direction\u00a0d\u00a0and then a training rate\u00a0<\/span><span>\u03b7<\/span><span>\u00a0that minimizes the loss in that direction,\u00a0f(<\/span><span>\u03b7<\/span><span>).<\/span><\/p>\n<p><span>In this regard, one-dimensional optimization methods search for the minimum of a given one-dimensional function. Widely used algorithms include the golden-section method and Brent&#8217;s method. Both reduce the bracket of a minimum until the distance between the two outer points in the bracket is less than a defined tolerance.<\/span><\/p>\n<p><strong><span>\u00a0<\/span><\/strong><\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Multidimensional optimization<\/strong><\/span><\/p>\n<p><span>The learning problem for neural networks is formulated as the search for a parameter vector\u00a0w<\/span>\u2217<span>\u00a0at which the loss function\u00a0f takes a minimum value. 
The necessary condition states that if the neural network is at a minimum of the loss function, then the gradient is the zero vector.<\/span><\/p>\n<p><span>The loss function is, in general, a non-linear function of the parameters. As a consequence, it is not possible to find closed-form training algorithms for the minima. Instead, we consider a search through the parameter space consisting of a succession of steps. At each step, the loss is reduced by adjusting the neural network parameters.<\/span><\/p>\n<p><span>In this way, to train a neural network we start with some parameter vector (often chosen at random). Then we generate a sequence of parameters, so that the loss function is reduced at each iteration of the algorithm. The change of loss between two steps is called the loss decrement. The training algorithm stops when a specified condition, or stopping criterion, is satisfied.<\/span><\/p>\n<p><span>\u00a0<\/span><\/p>\n<p><span>These are the main training algorithms for neural networks:<\/span><\/p>\n<ol>\n<li><span>Gradient descent<\/span><\/li>\n<li><span>Newton&#8217;s method<\/span><\/li>\n<li><span>Conjugate gradient<\/span><\/li>\n<li><span>Quasi-Newton method<\/span><\/li>\n<li><span>Levenberg-Marquardt algorithm<\/span><\/li>\n<\/ol>\n<p><em><span>\u00a0<\/span><\/em><\/p>\n<p><em>To read the whole article, with formulas and illustrations, click <a href=\"https:\/\/www.neuraldesigner.com\/blog\/5_algorithms_to_train_a_neural_network\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/em><\/p>\n<p><span>\u00a0<\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:900608\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Andrea Manero-Bastin This article was written by Alberto Quesada. 
\u00a0 The procedure used to carry out the learning process in a neural network is [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/10\/25\/5-algorithms-to-train-a-neural-network\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":460,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2736"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2736"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2736\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}