{"id":4929,"date":"2021-08-18T06:29:04","date_gmt":"2021-08-18T06:29:04","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/08\/18\/the-chain-rule-of-calculus-even-more-functions\/"},"modified":"2021-08-18T06:29:04","modified_gmt":"2021-08-18T06:29:04","slug":"the-chain-rule-of-calculus-even-more-functions","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/08\/18\/the-chain-rule-of-calculus-even-more-functions\/","title":{"rendered":"The Chain Rule of Calculus \u2013 Even More Functions"},"content":{"rendered":"<p>Author: Stefania Cristina<\/p>\n<div>\n<p>The chain rule is an important derivative rule that allows us to work with composite functions. It is essential in understanding the workings of the backpropagation algorithm, which applies the chain rule extensively in order to calculate the error gradient of the loss function with respect to each weight of a neural network. We will be building on our earlier introduction to the chain rule, by tackling more challenging functions.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>In this tutorial, you will discover how to apply the chain rule of calculus to challenging functions.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>The process of applying the chain rule to univariate functions can be extended to multivariate ones.<span class=\"Apple-converted-space\">\u00a0<\/span>\n<\/li>\n<li>The application of the chain rule follows a similar process, no matter how complex the function is: take the derivative of the outer function first, and then move inwards. 
Along the way, the application of other derivative rules might be required.<span class=\"Apple-converted-space\">\u00a0<\/span>\n<\/li>\n<li>Applying the chain rule to multivariate functions requires the use of partial derivatives.<span class=\"Apple-converted-space\">\u00a0<\/span>\n<\/li>\n<\/ul>\n<p>Let\u2019s get started.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<div id=\"attachment_12733\" style=\"width: 1034px\" class=\"wp-caption alignnone\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-scaled.jpg\"><img decoding=\"async\" aria-describedby=\"caption-attachment-12733\" loading=\"lazy\" class=\"wp-image-12733 size-large\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-1024x684.jpg\" alt=\"\" width=\"1024\" height=\"684\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-1024x684.jpg 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-768x513.jpg 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-1536x1025.jpg 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-2048x1367.jpg 2048w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_cover-600x400.jpg 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><\/a><\/p>\n<p id=\"caption-attachment-12733\" class=\"wp-caption-text\">The Chain Rule of Calculus \u2013 Even More Functions<br \/>Photo by <a href=\"https:\/\/unsplash.com\/photos\/mNuLRRjLwjA\">Nan Ingraham<\/a>, some rights reserved.<\/p>\n<\/div>\n<p>\u00a0<\/p>\n<h2><b>Tutorial Overview<\/b><\/h2>\n<p>This tutorial is divided into two parts; they are:<\/p>\n<ul>\n<li>The Chain Rule on Univariate 
Functions<\/li>\n<li>The Chain Rule on Multivariate Functions<\/li>\n<\/ul>\n<h2><b>Prerequisites<\/b><\/h2>\n<p>For this tutorial, we assume that you are already familiar with:<\/p>\n<ul>\n<li><a href=\"https:\/\/machinelearningmastery.com\/a-gentle-introduction-to-multivariate-calculus\/\">Multivariate functions<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/the-power-product-and-quotient-rules\/\">The power and product rules<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/a-gentle-introduction-to-partial-derivatives-and-gradient-vectors\">Partial derivatives<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/?p=12720&amp;preview=true\">The chain rule<\/a><\/li>\n<\/ul>\n<p>You can review these concepts by clicking on the links given above.<\/p>\n<h2><b>The Chain Rule on Univariate Functions<\/b><\/h2>\n<p>We have already introduced the chain rule for univariate and multivariate functions, but we have only seen a few simple examples so far. Let\u2019s work through a few more challenging ones here. 
We will start with univariate functions, and then apply what we learn to multivariate functions.<\/p>\n<p><b>EXAMPLE 1<\/b>: Let\u2019s raise the bar a little by considering the following composite function:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12734\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_1.png\" alt=\"\" width=\"204\" height=\"40\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_1.png 930w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_1-300x59.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_1-768x150.png 768w\" sizes=\"(max-width: 204px) 100vw, 204px\"><\/a><\/p>\n<p>We can separate the composite function into the inner function, <i>f<\/i>(<i>x<\/i>) = <i>x<\/i><sup>2<\/sup> \u2013 10, and the outer function, <i>g<\/i>(<i>u<\/i>) = \u221a<i>u<\/i> = <i>u<\/i><sup>1\/2<\/sup>. Here, the intermediate variable, <i>u<\/i>, denotes the output of the inner function, and its value is fed into the input of the outer function.<\/p>\n<p>The first step is to find the derivative of the outer part of the composite function, while treating whatever is inside as the single variable, <i>u<\/i>. For this purpose, we can apply the power rule:<\/p>\n<p style=\"text-align: center;\"><i>dh \/ du<\/i> = (1\/2) <i>u<\/i><sup>-1\/2<\/sup> = (1\/2) (<i>x<\/i><sup>2<\/sup> \u2013 10)<sup>-1\/2<\/sup><\/p>\n<p>The next step is to find the derivative of the inner part of the composite function, this time ignoring whatever is outside. 
We can apply the power rule here too:<\/p>\n<p style=\"text-align: center;\"><i>du \/ dx<\/i> = 2<i>x<\/i><\/p>\n<p>Putting the two parts together and simplifying, we have:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12735\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2-1024x165.png\" alt=\"\" width=\"372\" height=\"60\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2-1024x165.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2-300x48.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2-768x124.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2-1536x248.png 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_2.png 1824w\" sizes=\"(max-width: 372px) 100vw, 372px\"><\/a><\/p>\n<p><b>EXAMPLE 2<\/b>: Let\u2019s repeat the procedure, this time with a different composite function:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_3.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12736\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_3.png\" alt=\"\" width=\"153\" height=\"36\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_3.png 640w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_3-300x70.png 300w\" sizes=\"(max-width: 153px) 100vw, 153px\"><\/a><\/p>\n<p>We will again use, <i>u<\/i>, the output of the inner function, as our intermediate variable.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>The outer function in this case is, cos <i>x<\/i>. 
Finding its derivative, again ignoring the inside, gives us:<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p style=\"text-align: center;\"><i>dh<\/i> \/ <i>du<\/i> = (cos(<i>x<\/i><sup>3<\/sup> \u2013 1))\u2019 = -sin(<i>x<\/i><sup>3<\/sup> \u2013 1)<\/p>\n<p>The inner function is, <i>x<\/i><sup>3<\/sup> \u2013 1. Hence, its derivative becomes:<\/p>\n<p style=\"text-align: center;\"><i>du<\/i> \/ <i>dx<\/i> = (<i>x<\/i><sup>3<\/sup> \u2013 1)\u2019 = 3<i>x<\/i><sup>2<\/sup><\/p>\n<p>Putting the two parts together, we obtain the derivative of the composite function:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_4.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12737\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_4-1024x205.png\" alt=\"\" width=\"269\" height=\"54\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_4-1024x205.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_4-300x60.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_4-768x154.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_4.png 1256w\" sizes=\"(max-width: 269px) 100vw, 269px\"><\/a><\/p>\n<p><b>EXAMPLE 3<\/b>: Let\u2019s now raise the bar a little further by considering a more challenging composite function:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_5.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12738\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_5.png\" alt=\"\" width=\"179\" height=\"40\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_5.png 822w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_5-300x67.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_5-768x172.png 768w\" sizes=\"(max-width: 179px) 100vw, 179px\"><\/a><\/p>\n<p>If we observe this closely, we realize that not only do we have nested functions for which we will need to apply the chain rule multiple times, but we also have a product to which we will need to apply the product rule. <span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p>We find that the outermost function is a cosine. In finding its derivative by the chain rule, we shall be using the intermediate variable, <i>u<\/i>:<\/p>\n<p style=\"text-align: center;\"><i>dh<\/i> \/ <i>du<\/i> = (cos(<i>x <\/i>\u221a(<i>x<\/i><sup>2<\/sup> \u2013 10) ))\u2019 = -sin(<i>x <\/i>\u221a(<i>x<\/i><sup>2<\/sup> \u2013 10) )<\/p>\n<p>Inside the cosine, we have the product, <i>x <\/i>\u221a(x<sup>2<\/sup> \u2013 10), to which we will be applying the product rule to find its derivative (notice that we are always moving from the outside to the inside, in order to discover the operation that needs to be tackled next):<\/p>\n<p style=\"text-align: center;\"><i>du<\/i> \/ <i>dx<\/i> = (<i>x <\/i>\u221a(x<sup>2<\/sup> \u2013 10) )\u2019 = \u221a(x<sup>2<\/sup> \u2013 10) + <i>x<\/i> ( \u221a(x<sup>2<\/sup> \u2013 10) )\u2019<\/p>\n<p>One of the components in the resulting term is, ( \u221a(x<sup>2<\/sup> \u2013 10) )\u2019, to which we shall be applying the chain rule again. 
Indeed, we have already done so above, and hence we can simply re-utilise the result:<\/p>\n<p style=\"text-align: center;\">( \u221a(x<sup>2<\/sup> \u2013 10) )\u2019 = <i>x<\/i> (<i>x<\/i><sup>2<\/sup> \u2013 10)<sup>-1\/2<\/sup><\/p>\n<p>Putting all the parts together, we obtain the derivative of the composite function:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12739\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6-1024x135.png\" alt=\"\" width=\"456\" height=\"60\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6-1024x135.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6-300x39.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6-768x101.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6-1536x202.png 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_6-2048x269.png 2048w\" sizes=\"(max-width: 456px) 100vw, 456px\"><\/a><\/p>\n<p>This can be simplified further into:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_7.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12740\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_7-1024x219.png\" alt=\"\" width=\"300\" height=\"64\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_7-1024x219.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_7-300x64.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_7-768x164.png 768w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_7.png 1440w\" sizes=\"(max-width: 300px) 100vw, 300px\"><\/a><\/p>\n<h2><b>The Chain Rule on Multivariate Functions<\/b><\/h2>\n<p><b>EXAMPLE 4<\/b>: Suppose that we are now presented with a multivariate function of two variables, <i>s<\/i> and <i>t<\/i>, each of which depends in turn on two independent variables, <i>x<\/i> and <i>y<\/i>:<\/p>\n<p style=\"text-align: center;\"><i>h<\/i> = <i>g<\/i>(<i>s<\/i>, <i>t<\/i>) = <i>s<\/i><sup>2<\/sup> + <i>t<\/i><sup>3<\/sup><\/p>\n<p>where <i>s<\/i> = <i>xy<\/i> and <i>t<\/i> = 2<i>x<\/i> \u2013 <i>y<\/i>.<\/p>\n<p>Implementing the chain rule here requires the computation of partial derivatives, since we are working with multiple independent variables. Furthermore, <i>s<\/i> and <i>t<\/i> will act as our intermediate variables. The formulae that we will be working with, defined with respect to each input, are the following:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_8.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12741\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_8.png\" alt=\"\" width=\"224\" height=\"140\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_8.png 950w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_8-300x188.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_8-768x480.png 768w\" sizes=\"(max-width: 224px) 100vw, 224px\"><\/a><\/p>\n<p>From these formulae, we can see that we will need to find six different partial derivatives:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_9.png\"><img 
decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12742\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_9.png\" alt=\"\" width=\"211\" height=\"210\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_9.png 910w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_9-300x298.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_9-150x150.png 150w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_9-768x763.png 768w\" sizes=\"(max-width: 211px) 100vw, 211px\"><\/a><\/p>\n<p>We can now proceed to substitute these terms in the formulae for \u2202<i>h<\/i> \/ \u2202<i>x <\/i>and<i> <\/i>\u2202<i>h<\/i> \/ \u2202<i>y<\/i>:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_10.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12743\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_10-1024x456.png\" alt=\"\" width=\"314\" height=\"140\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_10-1024x456.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_10-300x134.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_10-768x342.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_10.png 1342w\" sizes=\"(max-width: 314px) 100vw, 314px\"><\/a><\/p>\n<p>And subsequently substitute for <i>s<\/i> and <i>t <\/i>to find the derivatives:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12744\" 
src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11-1024x310.png\" alt=\"\" width=\"429\" height=\"130\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11-1024x310.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11-300x91.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11-768x233.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11-1536x466.png 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_11.png 1940w\" sizes=\"(max-width: 429px) 100vw, 429px\"><\/a><\/p>\n<p><b>EXAMPLE 5<\/b>: Let\u2019s repeat the procedure, this time with a multivariate function of three variables, <i>r<\/i>, <i>s<\/i> and <i>t<\/i>, each of which depends in turn on two independent variables, <i>x<\/i> and <i>y<\/i>:<\/p>\n<p style=\"text-align: center;\"><i>h<\/i> = <i>g<\/i>(<i>r<\/i>, <i>s<\/i>, <i>t<\/i>) = <i>r<\/i><sup>2<\/sup> \u2013 <i>rs<\/i> + <i>t<\/i><sup>3<\/sup><\/p>\n<p>where <i>r<\/i> = <i>x<\/i> cos <i>y<\/i>, <i>s<\/i> = <i>x<\/i> e<i><sup>y<\/sup><\/i>, and <i>t<\/i> = <i>x<\/i> + <i>y<\/i>.<\/p>\n<p>This time round, <i>r<\/i>, <i>s<\/i> and <i>t<\/i> will act as our intermediate variables. 
The formulae that we will be working with, defined with respect to each input, are the following:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_12.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12745\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_12-1024x468.png\" alt=\"\" width=\"306\" height=\"140\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_12-1024x468.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_12-300x137.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_12-768x351.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_12.png 1296w\" sizes=\"(max-width: 306px) 100vw, 306px\"><\/a><\/p>\n<p>From these formulae, we can see that we will now need to find nine different partial derivatives:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12746\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13-1024x560.png\" alt=\"\" width=\"384\" height=\"210\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13-1024x560.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13-300x164.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13-768x420.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13-1536x840.png 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_13.png 1704w\" sizes=\"(max-width: 384px) 100vw, 384px\"><\/a><\/p>\n<p>Again, we proceed to substitute these terms in 
the formulae for \u2202<i>h<\/i> \/ \u2202<i>x <\/i>and<i> <\/i>\u2202<i>h<\/i> \/ \u2202<i>y<\/i>:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12747\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14-1024x256.png\" alt=\"\" width=\"559\" height=\"140\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14-1024x256.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14-300x75.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14-768x192.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14-1536x385.png 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_14-2048x513.png 2048w\" sizes=\"(max-width: 559px) 100vw, 559px\"><\/a><\/p>\n<p>And subsequently substitute for <i>r<\/i>, <i>s<\/i> and <i>t <\/i>to find the derivatives:<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12748\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15-1024x297.png\" alt=\"\" width=\"482\" height=\"140\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15-1024x297.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15-300x87.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15-768x223.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15-1536x446.png 1536w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_15-2048x595.png 2048w\" sizes=\"(max-width: 482px) 100vw, 482px\"><\/a><\/p>\n<p>Which may be simplified a little further (hint: apply the trigonometric identity 2sin <i>y<\/i> cos <i>y<\/i> = sin 2<i>y<\/i> to \u2202<i>h<\/i> \/ \u2202<i>y<\/i>):<\/p>\n<p><a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-12749\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16-1024x350.png\" alt=\"\" width=\"410\" height=\"140\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16-1024x350.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16-300x102.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16-768x262.png 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16-1536x524.png 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2021\/07\/more_chain_rule_16.png 1752w\" sizes=\"(max-width: 410px) 100vw, 410px\"><\/a><\/p>\n<p>No matter how complex the expression is, the procedure to follow remains similar:<\/p>\n<blockquote>\n<p><i>Your last computation tells you the first thing to do.<\/i><\/p>\n<p>\u2013 Page 143, <a href=\"https:\/\/www.amazon.com\/Calculus-Dummies-Math-Science\/dp\/1119293499\/ref=as_li_ss_tl?dchild=1&amp;keywords=calculus&amp;qid=1606170839&amp;sr=8-2&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=539ed0b89e326b6eb27b1a9a028e9cee&amp;language=en_US\">Calculus for Dummies<\/a>, 2016.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<\/blockquote>\n<p>Hence, start by tackling the outer function first, then move inwards to the next one. 
You may need to apply other rules along the way, as we have seen for Example 3. Do not forget to take the partial derivatives if you are working with multivariate functions.<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<h2><b>Further Reading<\/b><\/h2>\n<p>This section provides more resources on the topic if you are looking to go deeper.<\/p>\n<h3><b>Books<\/b><\/h3>\n<ul>\n<li>\n<a href=\"https:\/\/www.amazon.com\/Calculus-Dummies-Math-Science\/dp\/1119293499\/ref=as_li_ss_tl?dchild=1&amp;keywords=calculus&amp;qid=1606170839&amp;sr=8-2&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=539ed0b89e326b6eb27b1a9a028e9cee&amp;language=en_US\">Calculus for Dummies<\/a>, 2016.<\/li>\n<li>\n<a href=\"https:\/\/www.whitman.edu\/mathematics\/multivariable\/multivariable.pdf\">Single and Multivariable Calculus<\/a>, 2020.<\/li>\n<li>\n<a href=\"https:\/\/www.amazon.com\/Mathematics-Machine-Learning-Peter-Deisenroth\/dp\/110845514X\/ref=as_li_ss_tl?dchild=1&amp;keywords=calculus+machine+learning&amp;qid=1606171788&amp;s=books&amp;sr=1-3&amp;linkCode=sl1&amp;tag=inspiredalgor-20&amp;linkId=209ba69202a6cc0a9f2b07439b4376ca&amp;language=en_US\">Mathematics for Machine Learning<\/a>, 2020.<\/li>\n<\/ul>\n<h2><b>Summary<\/b><\/h2>\n<p>In this tutorial, you discovered\u00a0how to apply the chain rule of calculus to challenging functions.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>The process of applying the chain rule to univariate functions can be extended to multivariate ones.<span class=\"Apple-converted-space\">\u00a0<\/span>\n<\/li>\n<li>The application of the chain rule follows a similar process, no matter how complex the function is: take the derivative of the outer function first, and then move inwards. 
Along the way, the application of other derivative rules might be required.<span class=\"Apple-converted-space\">\u00a0<\/span>\n<\/li>\n<li>Applying the chain rule to multivariate functions requires the use of partial derivatives.<\/li>\n<\/ul>\n<p>Do you have any questions?<br \/>\nAsk your questions in the comments below and I will do my best to answer.<\/p>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/the-chain-rule-of-calculus-even-more-functions\/\">The Chain Rule of Calculus \u2013 Even More Functions<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n<p><a href=\"https:\/\/machinelearningmastery.com\/the-chain-rule-of-calculus-even-more-functions\/\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Stefania Cristina The chain rule is an important derivative rule that allows us to work with composite functions. It is essential in understanding the [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/08\/18\/the-chain-rule-of-calculus-even-more-functions\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":4930,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4929"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4929"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4929\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/4930"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4929"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4929"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4929"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}