{"id":4502,"date":"2021-03-22T06:33:54","date_gmt":"2021-03-22T06:33:54","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/22\/hurwitz-riemann-zeta-and-other-special-probability-distributions\/"},"modified":"2021-03-22T06:33:54","modified_gmt":"2021-03-22T06:33:54","slug":"hurwitz-riemann-zeta-and-other-special-probability-distributions","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/22\/hurwitz-riemann-zeta-and-other-special-probability-distributions\/","title":{"rendered":"Hurwitz-Riemann Zeta And Other Special Probability Distributions"},"content":{"rendered":"<p>Author: Vincent Granville<\/p>\n<div>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691835652?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691835652?profile=RESIZE_710x\" width=\"600\" class=\"align-center\"><\/a><\/p>\n<p style=\"text-align: center;\"><em>Source: <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/babar-mimou\" target=\"_blank\" rel=\"noopener\">here<\/a><\/em><\/p>\n<p>In my previous article\u00a0<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/an-easy-way-to-solve-complex-optimization-problems\" target=\"_blank\" rel=\"noopener\">here<\/a>, I discussed a simple way to solve complex optimization problems in machine learning. This was illustrated in the case of complex dynamical systems, involving non-linear equations in infinite dimensions, known as functional equations. These equations were solved using a fixed point algorithm, of which the Newton\u2013Raphson method is a well known, widely used case.<\/p>\n<p>These equations are typically solved numerically, as no known theoretical solution is known in most cases. Nevertheless, in our case, a few examples have an exact, known solution. These examples with known solution are very useful, in the sense that they allow you to test your numerical algorithm and assess how fast it converges, or not. All the solutions were probability distributions, and in this article we introduce an even larger, generic class of problems (chaotic discrete dynamical systems) with known solution. The distributions presented here can thus be used as tests to benchmark optimization algorithms, but they also have their own interest for statistical modeling purposes, especially in risk management and extreme event modeling.<\/p>\n<p>Each dynamical system discussed here (or in my previous article) comes with two distributions:<\/p>\n<ul>\n<li>A continuous one on [0, 1], known as the <em>invariant distribution<\/em>.<\/li>\n<li>A discrete one taking on strictly positive integer values, known as the <em>digit distribution<\/em>.<\/li>\n<\/ul>\n<p>Besides, these distributions are very useful in number theory, though this will not be discussed here. The name Hurwitz and Riemann-Zeta is just a reminder of their strong connection to number theory problems such as continued fractions, approximation of irrational numbers by rational ones, the construction and distribution of the digits of random numbers in various numeration systems, and the famous <a href=\"https:\/\/en.wikipedia.org\/wiki\/Riemann_hypothesis\" target=\"_blank\" rel=\"noopener\">Riemann Hypothesis<\/a> that has a one million dollar prize attached to it. Some of this is discussed <a href=\"https:\/\/mathoverflow.net\/questions\/383925\/about-generalized-continued-fractions\" target=\"_blank\" rel=\"noopener\">here<\/a> and in some of my past MathOverflow questions. However, our focus here is applications in machine learning.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>1. The\u00a0Hurwitz-Riemann Zeta distribution<\/strong><\/span><\/p>\n<p>Without diving into the details, let me first briefly discuss other Riemann-related distributions invented by different authors. For a definition of the Hurwitz function, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hurwitz_zeta_function\" target=\"_blank\" rel=\"noopener\">here<\/a>. It generalizes the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Riemann_zeta_function\" target=\"_blank\" rel=\"noopener\">Riemann Zeta function<\/a>. The most well known probability distribution related to these functions is the discrete <a href=\"https:\/\/en.wikipedia.org\/wiki\/Zipf%27s_law\" target=\"_blank\" rel=\"noopener\">Zipf distribution<\/a>. It is well known by machine learning practitioners, and used to model phenomena such as &#8220;the top 10 websites amount to (say) 95% of the Internet traffic&#8221;.\u00a0 Another example, this time continuous over the set of all positive real numbers, can be found <a href=\"https:\/\/benthamopen.com\/FULLTEXT\/TOSPJ-7-53\" target=\"_blank\" rel=\"noopener\">here<\/a>. The paper is entitled <em>A New Class of Distributions Based on Hurwitz Zeta Function with Applications for Risk Management<\/em>. The author defines a family of distribution that generalizes the\u00a0exponential power, normal, gamma, Weibull, Rayleigh, Maxwell-Boltzmann and chi-squared distributions, with applications in actuarial sciences. Finally, there is also a well known example (for mathematicians) defined on the complex plane, see <a href=\"https:\/\/arxiv.org\/pdf\/1504.03438.pdf\" target=\"_blank\" rel=\"noopener\">here<\/a>. The paper is entitled\u00a0<em>A complete Riemann zeta distribution and the Riemann hypothesis<\/em>.<\/p>\n<p>Our Hurwitz-Riemann Zeta distribution is yet another example arising this time from discrete dynamical systems, continuous on [0, 1]. It also has a sister discrete distribution attached to it, useful for statistical modeling. It is defined as follows.<\/p>\n<p><strong>1.1. Our Hurwitz-Riemann Zeta distribution<\/strong><\/p>\n<p>The distribution discussed here is the most basic example, from the generic family described in section 2. It depends on one parameter <em>s<\/em>\u00a0 &gt;\u00a0 0, and the support domain is\u00a0[0, 1]. The construction mechanism is defined in section 2, for the general case. Our Hurwitz-Riemann zeta function has the following density:<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691224099?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691224099?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p>where\u00a0<span><em>\u03b6<\/em>(<em>s<\/em>, <em>x<\/em>) is the Hurwitz function, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hurwitz_zeta_function\" target=\"_blank\" rel=\"noopener\">here<\/a>. It has the following two first moments:<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691286058?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691286058?profile=RESIZE_710x\" width=\"550\" class=\"align-center\"><\/a><\/span><\/p>\n<p>where\u00a0<em>\u03b6<\/em>(<em>s<\/em>) =\u00a0<em>\u03b6<\/em>(<em>s<\/em>, 1) is the Riemann Zeta function. This allows you to compute its variance. Higher moments can also be computed exactly. The cases <em>s<\/em> = 0, 1 or 2 are limiting cases, with the limit as <em>s<\/em> tends to zero, corresponding to the uniform density on [0, 1]. Particular values (<em>s<\/em> = 1, 2), empirically verified, are:<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691307680?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691307680?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p>Here\u00a0<span><em>\u03b3<\/em> = 0.57721&#8230; is the Euler-Mascheroni constant, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Euler%E2%80%93Mascheroni_constant\" target=\"_blank\" rel=\"noopener\">here<\/a>.\u00a0<\/span><\/p>\n<p><strong>1.2. The discrete version<\/strong><\/p>\n<p>These systems also have a discrete distribution attached to them, called the digit distribution, and described in section 2.\u00a0 In the Hurwitz-Riemann case, the probability that a digit is equal to <em>k<\/em>, is\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691322267?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691322267?profile=RESIZE_710x\" width=\"300\" class=\"align-center\"><\/a><\/p>\n<p>The expectation is finite only if <em>s<\/em>\u00a0 &gt;\u00a0 1. Likewise, the variance is finite only if <em>s<\/em>\u00a0 &gt;\u00a0 2.\u00a0<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>2. A generic family of distributions, with applications<\/strong><\/span><\/p>\n<p><span>We are dealing with a particular type of discrete dynamical system defined by\u00a0<\/span><em>x<\/em><span><span style=\"font-size: 8pt;\"><em>n<\/em>+1<\/span> = <em>p<\/em>(<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>) &#8211; INT(<em>p<\/em>(<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>)), where INT is the integer part function, and <em>x<\/em><span style=\"font-size: 8pt;\">0<\/span>\u00a0in [0, 1] is the initial condition. The function <em>p<\/em>, defined for real numbers in [0, 1], is strictly decreasing and invertible, with <em>p<\/em>(1) = 1 and <em>p<\/em>(0) infinite. The results discussed here valid for the vast majority of initial conditions, nevertheless there are infinitely many exceptions, for instance <em>x<\/em><span style=\"font-size: 8pt;\">0<\/span> = 0. These systems are discussed in details in my previous article, <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/an-easy-way-to-solve-complex-optimization-problems\" target=\"_blank\" rel=\"noopener\">here<\/a>. In this section, only the main results are presented. These systems have the following properties:<\/span><\/p>\n<ul>\n<li><span>The <em>n<\/em>-th digit\u00a0of <em>x<\/em><span style=\"font-size: 8pt;\">0<\/span> is <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em> = INT(<em>p<\/em>(<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>)). These digits are called <a href=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/026811199282100?journalCode=cdss19\" target=\"_blank\" rel=\"noopener\">codewords<\/a> in the context of dynamical systems. The probability that a digit is equal to <em>k<\/em> (<em>k<\/em> = 1, 2, 3 and so on) is <em>F<\/em>(<em>q<\/em>(<em>k<\/em>)) &#8211; <em>F<\/em>(<em>q<\/em>(<em>k<\/em>+1)) where <em>F<\/em> and <em>q<\/em> are defined below.<\/span><\/li>\n<li><span>The invariant distribution <em>F<\/em>, which is the limit of the empirical distribution of the <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&#8216;s, satisfies the following functional equation:\u00a0<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691388861?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691388861?profile=RESIZE_710x\" width=\"250\" class=\"align-center\"><\/a><\/span><\/li>\n<\/ul>\n<p><span>where\u00a0<em>q<\/em>\u00a0is the inverse of the function\u00a0<em>p,\u00a0q<\/em>&#8216; denotes the derivative of <em>q<\/em>, and <em>f<\/em> (the invariant density) is the derivative of <em>F<\/em>. We focus only on the results that are of interest to machine learning professionals.\u00a0\u00a0<\/span><\/p>\n<p><span>Typically numerical methods are needed to solve the above functional equation, however here we are dealing with a large class of dynamical systems for which the theoretical solution is known. The purpose is to test numerical algorithms to check how well and how fast they can approach the exact solution, as discussed in section 2 <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/an-easy-way-to-solve-complex-optimization-problems\" target=\"_blank\" rel=\"noopener\">in my previous article<\/a>. The invariant distribution <em>F<\/em> discussed below is far more general than the ones described in my earlier article.\u00a0<\/span><\/p>\n<p><strong>2.1. Generalized Hurwitz-Riemann Zeta distribution<\/strong><\/p>\n<p><span>One way to find a dynamical system with know invariant distribution is to specify that distribution upfront, and then compute the resulting function <em>p<\/em>(<em>x<\/em>) that defines the system in question. Based on theory discussed <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/an-easy-way-to-solve-complex-optimization-problems\" target=\"_blank\" rel=\"noopener\">here<\/a> and <a href=\"https:\/\/mathoverflow.net\/questions\/385156\/exact-invariant-distribution-for-2d-discrete-dynamical-systems-including-contin\" target=\"_blank\" rel=\"noopener\">here<\/a>, one can proceed as follows. Try a monotonic increasing function <em>r<\/em>(<em>x<\/em>) with <em>r<\/em>(2) = 1 + <em>r<\/em>(1). Let <em>F<\/em>(<em>x<\/em>) = <em>r<\/em>(<em>x<\/em>+1) &#8211; <em>r<\/em>(1), and <em>R<\/em>(<em>x<\/em>) = <em>r<\/em>(<em>x<\/em>+1) &#8211; <em>r<\/em>(<em>x<\/em>). Then <em>R<\/em>(<em>x<\/em>) = <em>F<\/em>(<em>q<\/em>(<em>x<\/em>)), that is, <em>R<\/em>(<em>p<\/em>(<em>x<\/em>)) = <em>F<\/em>(<em>x<\/em>) since <em>q<\/em>(<em>p<\/em>(<em>x<\/em>)) = <em>x<\/em>. You can retrieve\u00a0<em>p<\/em>(<em>x<\/em>) by inverting <em>R<\/em>(<em>x<\/em>).\u00a0<\/span><\/p>\n<p><span>A simple but generic example is\u00a0<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691691652?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691691652?profile=RESIZE_710x\" width=\"190\" class=\"align-center\"><\/a><\/span><\/p>\n<p><span>where\u00a0<em>\u03c8<\/em> is a strictly decreasing function with\u00a0<em>\u03c8<\/em>(\u221e) = 0,\u00a0 <em>\u03c8<\/em>(1) = 1, and <em>\u03c8<\/em>(0) =\u00a0\u221e. Then you have<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691705091?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691705091?profile=RESIZE_710x\" width=\"280\" class=\"align-center\"><\/a><\/span><\/p>\n<p><span>In particular, the Hurwitz-Riemann particular case in section 1.1 corresponds to<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691709297?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691709297?profile=RESIZE_710x\" width=\"300\" class=\"align-center\"><\/a><\/span><\/p>\n<p>It is easy to show that\u00a0<em>R<\/em>(<em>x<\/em>) =\u00a0\u00a0<span><em>\u03c8<\/em>(<em>x<\/em>), thanks to a careful choice for the function <em>r<\/em>(<em>x<\/em>). This explains why the system has a simple theoretical solution; it was indeed built to meet that goal.<\/span><\/p>\n<p><strong>2.2. Application<\/strong><\/p>\n<p><span>Besides being useful to test optimization algorithms against the exact solution (such as solving the above functional equation), the digits\u00a0 of the system have applications in simulations, encoding, random number generation, and statistical modeling. In particular, below is a picture featuring the typical behavior of the first 2,000 values of <em>p<\/em>(<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>), starting with <em>x<\/em><span style=\"font-size: 8pt;\">0<\/span> = 0.5. Depending on the choice of the function\u00a0<em>\u03c8<\/em>,<em>\u00a0<\/em>these values may or may not be highly autocorrelated, and in some cases expectation and\/or variance are infinite, which implies that the autocorrelation does not exist. The picture below features the Hurwitz-Riemann case with <em>s<\/em> = 2 (expectation is finite but variance is infinite).<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691827873?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8691827873?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/span><\/p>\n<p><span>Other special distributions are discussed in my previous articles:<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/new-family-of-generalized-gaussian-distributions\" target=\"_blank\" rel=\"noopener\">New Family of Generalized Gaussian Distributions<\/a><\/li>\n<li><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/interesting-application-of-the-poisson-binomial-distribution\" target=\"_blank\" rel=\"noopener\">Interesting Application of the Poisson-Binomial Distribution<\/a><\/li>\n<li><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/a-strange-family-of-statistical-distributions\" target=\"_blank\" rel=\"noopener\">A Strange Family of Statistical Distributions<\/a><\/li>\n<\/ul>\n<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter,\u00a0<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/check-out-our-dsc-newsletter\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/em><\/span><\/p>\n<p><span><em><strong>About the author<\/strong>:\u00a0 Vincent Granville is a d<span class=\"lt-line-clamp__raw-line\">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at\u00a0<a href=\"http:\/\/datashaping.com\/\" target=\"_blank\" rel=\"noopener\">DataShaping.com<\/a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).<\/span>\u00a0You can access Vincent&#8217;s articles and books,\u00a0<a href=\"http:\/\/datashaping.com\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/em><\/span><\/p>\n<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:1044813\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Vincent Granville Source: here In my previous article\u00a0here, I discussed a simple way to solve complex optimization problems in machine learning. This was illustrated [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/03\/22\/hurwitz-riemann-zeta-and-other-special-probability-distributions\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":475,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4502"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4502"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4502\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/460"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4502"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4502"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4502"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}