{"id":2091,"date":"2019-05-03T06:31:41","date_gmt":"2019-05-03T06:31:41","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/03\/the-approximation-power-of-neural-networks-with-python-codes\/"},"modified":"2019-05-03T06:31:41","modified_gmt":"2019-05-03T06:31:41","slug":"the-approximation-power-of-neural-networks-with-python-codes","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/05\/03\/the-approximation-power-of-neural-networks-with-python-codes\/","title":{"rendered":"The Approximation Power of Neural Networks (with Python\u00a0codes)"},"content":{"rendered":"<p>Author: Marco Tavora<\/p>\n<div>\n<h3 id=\"e7ee\" class=\"graf graf--h3 graf-after--h4\">Introduction<\/h3>\n<p id=\"f03f\" class=\"graf graf--p graf-after--h3\">It is a well-known fact that<span>\u00a0<\/span><span class=\"markup--quote markup--p-quote is-other\">neural networks can approximate the output of any continuous mathematical function<\/span>, no matter how complicated it might be. Take for instance the function below:<\/p>\n<\/p>\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\"><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*mV98Hjz5qQ5OGn0LbWkgVA@2x.png\"><\/div>\n<\/div>\n<p><\/p>\n<p id=\"79fd\" class=\"graf graf--p graf-after--figure\">Though it has a pretty complicated shape, the theorems we will discuss shortly guarantee that one can build some neural network that can approximate<span>\u00a0<\/span><em class=\"markup--em markup--p-em\">f(x)<\/em><span>\u00a0<\/span>as accurately as we want. 
Neural networks, therefore, display a type of universal behavior.

One of the reasons neural networks have received so much attention is that, in addition to these rather remarkable universal properties, [they possess many powerful algorithms for learning functions](http://neuralnetworksanddeeplearning.com/chap4.html).

### Universality and the underlying mathematics

This piece will give an informal overview of some of the fundamental mathematical results (theorems) underlying the approximation capabilities of artificial neural networks.

> "Almost any process you can imagine can be thought of as function computation… [Examples include] naming a piece of music based on a short sample of the piece […], translating a Chinese text into English […], taking an mp4 movie file and generating a description of the plot of the movie, and a discussion of the quality of the acting."
>
> — Michael Nielsen

### A motivation for using neural nets as approximators: Kolmogorov's Theorem

In 1957, the Russian mathematician [Andrey Kolmogorov](https://en.wikipedia.org/wiki/Andrey_Kolmogorov) (pictured below), known for his contributions to a wide range of mathematical topics (such as probability theory, topology, turbulence, and computational complexity), [proved](https://link.springer.com/chapter/10.1007/978-94-011-3030-1_56) an important theorem about the representation of real functions of many variables.
According to Kolmogorov's theorem, multivariate functions can be expressed as a combination of sums and compositions of a finite number of univariate functions.

![Andrey Kolmogorov](https://cdn-images-1.medium.com/max/1600/1*QaPof9Po6bD9S2onTnwbZg.jpeg)

A little more formally, the theorem states that a continuous function *f* of *n* real variables defined on the *n*-dimensional hypercube (with *n* ≥ 2) can be represented as:

$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} g_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$

In this expression, the *g*s are univariate functions and the ϕs are continuous, monotonically increasing functions (as shown in the figure below) that do not depend on the choice of *f*.

![The monotonically increasing inner functions](https://cdn-images-1.medium.com/max/1600/1*ursqbQ8rIECr3v9J0w08wQ@2x.png)
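To get a feel for the flavor of this result, consider a toy identity (my own illustration, not Kolmogorov's construction): the two-variable product *xy* can be built entirely out of sums and the single univariate function *t* ↦ *t*², since *xy* = ((*x*+*y*)² − (*x*−*y*)²)/4. A minimal numerical check:

```python
import numpy as np

# The only univariate building block we allow ourselves: squaring.
def sq(t):
    return t ** 2

def product_via_univariate(x, y):
    # xy expressed using only sums and the univariate function sq:
    # xy = (sq(x + y) - sq(x - y)) / 4
    return (sq(x + y) - sq(x - y)) / 4

rng = np.random.default_rng(0)
x, y = rng.uniform(-5, 5, size=(2, 1000))
assert np.allclose(product_via_univariate(x, y), x * y)
```

Kolmogorov's theorem says something far stronger: *every* continuous multivariate function admits such a decomposition.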
### Universal Approximation Theorem (UAT)

The UAT states that feed-forward neural networks containing a single hidden layer with a finite number of nodes can be used to approximate any continuous function, provided rather mild assumptions about the form of the activation function are satisfied. Now, since almost any process we can imagine can be described by some mathematical function, neural networks can, at least in principle, predict the outcome of nearly every process.

There are several rigorous proofs of the universality of feed-forward artificial neural nets using different activation functions. For the sake of brevity, let us restrict ourselves to the sigmoid function. Sigmoids are "S"-shaped and include as special cases the [logistic function](https://en.wikipedia.org/wiki/Logistic_function), the [Gompertz curve](https://en.wikipedia.org/wiki/Gompertz_curve), and the [ogee curve](https://en.wikipedia.org/wiki/Ogee_curve).

### Python code for the sigmoid

A quick Python code to build and plot a sigmoid function is:

```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

upper, lower = 6, -6
num = 100

def sigmoid_activation(x):
    # Clip the tails so the function saturates exactly at 0 and 1.
    if x > upper:
        return 1
    elif x < lower:
        return 0
    return 1 / (1 + np.exp(-x))

vals = [sigmoid_activation(x) for x in np.linspace(lower, upper, num=num)]
plt.plot(np.linspace(lower, upper, num=num), vals)
plt.title('Sigmoid')
```

![Plot of the sigmoid function](https://cdn-images-1.medium.com/max/1600/1*4fIu1vrSy1A2LcksP0TQBA@2x.png)

### George Cybenko's Proof

The proof given by [Cybenko (1989)](https://pdfs.semanticscholar.org/85a4/8564b709025bca9ab9f72373a64637f8217d.pdf) is known for its elegance, simplicity, and conciseness. In his article, he proves the following statement. Let ϕ be any continuous function of the sigmoid type (see the discussion above).
Given any multivariate continuous function $f(\mathbf{x})$ defined on a [compact](https://en.wikipedia.org/wiki/Compact_space) subset of the *N*-dimensional real space and any positive ε, there are vectors $\mathbf{w}_1, \dots, \mathbf{w}_m$ (the **weights**), constants $b_1, \dots, b_m$ (the **bias** terms) and $\alpha_1, \dots, \alpha_m$ such that

$$\left| G(\mathbf{x}) - f(\mathbf{x}) \right| < \varepsilon$$

for any $\mathbf{x}$ (the NN inputs) inside the compact subset, where the function $G$ is given by:

$$G(\mathbf{x}) = \sum_{j=1}^{m} \alpha_j \, \phi\!\left(\mathbf{w}_j^{T}\mathbf{x} + b_j\right)$$

By choosing the appropriate parameters, neural nets can therefore represent functions with a wide range of different forms.
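In code, $G$ is nothing but a single-hidden-layer network evaluated in one line. Here is a minimal sketch (the parameter values below are arbitrary, chosen purely for illustration; the theorem says that for large enough *m* some choice of them brings $G$ uniformly close to $f$):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def G(x, W, b, alpha):
    # Cybenko-style sum: G(x) = sum_j alpha_j * sigmoid(w_j . x + b_j)
    return alpha @ sigmoid(W @ x + b)

# Arbitrary illustrative parameters: m = 3 hidden units, N = 2 inputs.
rng = np.random.default_rng(42)
W = rng.normal(size=(3, 2))   # rows are the weight vectors w_j
b = rng.normal(size=3)        # bias terms b_j
alpha = rng.normal(size=3)    # output coefficients alpha_j

x = np.array([0.5, -1.0])
print(G(x, W, b, alpha))
```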
### An example using Python

To make these statements less abstract, let us build a simple Python code to illustrate what has been discussed so far. The following analysis is based on Michael Nielsen's [great online book](http://neuralnetworksanddeeplearning.com/chap4.html) and on the exceptional lectures by [Matt Brems](http://argmatt.com/) and [Justin Pounders](http://linkedin.com/in/pounders) at the [General Assembly Data Science Immersive (DSI)](https://generalassemb.ly/education/data-science-immersive).

We will build a neural network to approximate the following simple function:

$$f(x) = 0.2 + 0.4x^2 + 0.3x\sin(15x) + 0.05\cos(50x)$$
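To see what we are up against, a quick snippet (mine, not part of the original construction) plots this target function on [0, 1]; the definition of `f` matches the one used in the code further below:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

x = np.linspace(0, 1, 1000)
plt.plot(x, f(x))
plt.title('Target function f(x)')
```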
A few remarks are needed before diving into the code:

- To make the analysis more straightforward, I will work with a simple limiting case of the sigmoid function. When the weights are extremely large, the sigmoid approaches the [Heaviside step function](https://en.m.wikipedia.org/wiki/Heaviside_step_function). Since we will need to add contributions coming from several neurons, it is much more convenient to work with step functions than with general sigmoids.

![Sigmoid approaching a step function as the weight grows](https://cdn-images-1.medium.com/max/1600/1*2Aep1Y5QPjsWJEFZwF3UiQ@2x.png)

- In the limit where the sigmoid approaches the step function, we need only one parameter to describe it, namely the point *s* where the step occurs. [The value of s can be shown](http://neuralnetworksanddeeplearning.com/chap4.html) to be *s = -b/w*, where *b* and *w* are the neuron's bias and weight, respectively.
- The neural network I will build will be a very simple one, with one input, one output, and one hidden layer. If the weights of two hidden neurons are equal in absolute value but opposite in sign, their combined output is a "[bump](http://neuralnetworksanddeeplearning.com/chap4.html)" whose height equals the absolute value of the weights and whose width equals the difference between the two neurons' values of *s* (see the figure below, and the code sketch after this list).

![Two step functions combining into a bump](https://cdn-images-1.medium.com/max/1600/1*LSoLmRpuMeFOCpiuajQ7Wg@2x.png)

- We use the following [notation](http://neuralnetworksanddeeplearning.com/chap4.html), where *h* denotes the height of the bump (the absolute value of the weights):

![Bump notation with height h](https://cdn-images-1.medium.com/max/1600/1*xAIjKDUX78BYMDoG7fWT7A@2x.png)

- Since we want to approximate *f*, the weighted output of the hidden layer must involve the inverse of the sigmoid. In fact, it must be equal to $\sigma^{-1} \circ f(x)$.
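The bump construction mentioned in the list is easy to see in code. A minimal sketch (my own illustration of the remark above): two step-like neurons with a large shared weight *w*, biases *b = -ws*, and output weights *+h* and *-h* produce a bump of height *h* between the two step positions:

```python
import numpy as np
import matplotlib.pyplot as plt

w = 500  # large weight: the sigmoid is effectively a step function

def sigmoid_activation(x):
    if x > 6:
        return 1
    elif x < -6:
        return 0
    return 1 / (1 + np.exp(-x))

def bump(x, s1, s2, h):
    # Step up of height h at s1, step down at s2 (bias b = -w*s).
    step1 = sigmoid_activation(w * x - w * s1)
    step2 = sigmoid_activation(w * x - w * s2)
    return h * step1 - h * step2

xs = np.linspace(0, 1, 1000)
plt.plot(xs, [bump(x, 0.4, 0.6, 0.8) for x in xs])
plt.title('Bump of height 0.8 between s = 0.4 and s = 0.6')
```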
To reproduce $\sigma^{-1} \circ f$, we choose the values of the *h*s as follows (see the corresponding figures below, taken from [here](http://neuralnetworksanddeeplearning.com/chap4.html)):

![Values of h for each bump](https://cdn-images-1.medium.com/max/1600/1*5nHiXlPDz0NPa5G8PQcHpg@2x.png)

![Approximating the inverse-sigmoid-transformed target with bumps](https://cdn-images-1.medium.com/max/1600/1*cbE8OFO3zckMJ2qZm-J3rw@2x.png)

### The code

The code starts with the following definitions:

- We first import `inversefunc`, which we will need to build the inverse sigmoid function.
- We then choose a very large weight for the sigmoid so that it is close to the [Heaviside](https://en.m.wikipedia.org/wiki/Heaviside_step_function) function (as just discussed).
- We choose the output activation to be the identity function `identity_activation`.
- The role of `solve_for_bias` is to recover the original *(w, b)* parametrization from *s* and *w* (recall that *s* is the step position, so *b = -ws*).
- The architecture is set, and all the *w*s and *b*s are chosen.
  The elements of the array `weights_output` are obtained from the values of the output weights given in the previous section.

```python
from pynverse import inversefunc

w = 500

def identity_activation(x):
    return x

def solve_for_bias(s, w=w):
    # Recover the bias from the step position s, using s = -b/w.
    return -w * s

# Step positions: each consecutive pair of entries delimits one bump.
steps = [0, .2, .2, .4, .4, .6, .6, .8, .8, 1]

bias_hl = np.array([solve_for_bias(s) for s in steps])
weights_hl = np.array([w] * len(steps))
bias_output = 0
weights_output = np.array([-1.2, 1.2, -1.6, 1.6, -.3, .3, -1, 1, 1, -1])
```

The final steps are:

- Writing a `Python` function, which I called `nn`, that builds and runs the network
- Plotting the comparison between the approximation and the actual function.

```python
def nn(input_value):
    # Hidden layer: step-like sigmoid units.
    Z_hl = input_value * weights_hl + bias_hl
    activation_hl = np.array([sigmoid_activation(Z) for Z in Z_hl])

    # Output layer: identity activation on the weighted sum.
    Z_output = np.sum(activation_hl * weights_output) + bias_output
    return identity_activation(Z_output)

x_values = np.linspace(0, 1, 1000)
y_hat = [nn(x) for x in x_values]

def f(x):
    return 0.2 + 0.4 * (x**2) + 0.3 * x * np.sin(15 * x) + 0.05 * np.cos(50 * x)

y = [f(x) for x in x_values]

inv_sigmoid = inversefunc(sigmoid_activation)
y_invsig = [inv_sigmoid(i) for i in y]
plt.plot(x_values, y_invsig)
plt.plot(x_values, y_hat)
plt.xlim((0, 1))
```

![Network output (bumps) versus the inverse-sigmoid-transformed target](https://cdn-images-1.medium.com/max/1600/1*YryNBGz5VYBOQ-2oZqesUA@2x.png)

This approximation is far from ideal. However, it is straightforward to improve it, for example, by increasing the number of nodes or the number of layers (while at the same time avoiding overfitting).
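As a quick, purely illustrative check of that claim, one can shrink the bump width by using a finer grid of step positions and setting each bump height to the average of $\sigma^{-1} \circ f$ on its interval (everything here reuses the objects defined above; rebinding the globals changes what `nn` computes):

```python
# A finer grid: 10 bumps of width 0.1 instead of 5 of width 0.2.
steps_fine = list(np.repeat(np.linspace(0, 1, 11), 2)[1:-1])

bias_hl = np.array([solve_for_bias(s) for s in steps_fine])
weights_hl = np.array([w] * len(steps_fine))

# One output weight pair (h, -h) per bump, with h set to the mean of the
# inverse-sigmoid-transformed target over that interval.
heights = [np.mean([inv_sigmoid(f(x)) for x in np.linspace(a, b, 50)])
           for a, b in zip(np.linspace(0, 0.9, 10), np.linspace(0.1, 1, 10))]
weights_output = np.array([v for h in heights for v in (h, -h)])

y_hat_fine = [nn(x) for x in x_values]
plt.plot(x_values, y_invsig)
plt.plot(x_values, y_hat_fine)
```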
### Conclusion

In this article, I described some of the basic mathematics underlying the universal approximation properties of neural networks, and I showed a simple Python code that implements an approximation of a simple function.

Though the full code is already included in the article, my [GitHub](https://github.com/marcotav) and my personal website [www.marcotavora.me](https://marcotavora.me/) have (hopefully) some other interesting material about both data science and physics.

Thanks for reading, and see you soon!

By the way, constructive criticism and feedback are always welcome!

This article was [originally published](https://towardsdatascience.com/the-approximation-power-of-neural-networks-with-python-codes-ddfc250bdb58?source=friends_link&sk=4a810070b293da5813b9f22d936261be) on [Towards Data Science](https://towardsdatascience.com/).