{"id":2495,"date":"2019-08-24T06:34:54","date_gmt":"2019-08-24T06:34:54","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/24\/a-strange-family-of-statistical-distributions\/"},"modified":"2019-08-24T06:34:54","modified_gmt":"2019-08-24T06:34:54","slug":"a-strange-family-of-statistical-distributions","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/24\/a-strange-family-of-statistical-distributions\/","title":{"rendered":"A Strange Family of Statistical Distributions"},"content":{"rendered":"<p>Author: Vincent Granville<\/p>\n<div>\n<p>I introduce here a family of very peculiar statistical distributions governed by two parameters: <em>p<\/em>, a real number in [0, 1], and <em>b<\/em>, an integer > 1. These distributions were discovered by solving the following functional equation, corresponding to <em>b<\/em> = 2.\u00a0<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444011716?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444011716?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/p>\n<p>Here\u00a0<em>f<\/em>(<em>x<\/em>) is the density attached to that distribution. The support domain for <em>x<\/em> is also [0, 1]. This type of distribution appears in the following context.\u00a0<\/p>\n<p>Let <i>Z<\/i>\u00a0be an irrational number in [0, 1] (called\u00a0<em>seed<\/em>) and consider the sequence <em>x<\/em>(<em>n<\/em>) = {<em>b<\/em>^<em>n<\/em>\u00a0<i>Z<\/i>}. Here the brackets represent the fractional part function. 
In particular,\u00a0INT(<em>b<\/em>\u00a0<em>x<\/em>(<em>n<\/em>)) is the\u00a0<span style=\"font-size: 15px;\"><em><span class=\"math-container\"><span class=\"MathJax\" id=\"MathJax-Element-6-Frame\"><span class=\"MJX_Assistive_MathML\">n<\/span><\/span><\/span><\/em><\/span>-th digit of <i>Z<\/i><span style=\"font-size: 15px; white-space: nowrap;\">\u00a0<\/span>in base <em>b<\/em><span style=\"font-size: 15px; white-space: nowrap;\">.\u00a0<\/span>The values\u00a0<span style=\"font-size: 15px; white-space: nowrap;\"><em>x<\/em>(<em>n<\/em>)<\/span>\u00a0are distributed in a certain way due to the\u00a0<em>ergodicity<\/em>\u00a0of the underlying process. The density associated with this distribution is the function\u00a0<span style=\"font-size: 15px;\"><em><span style=\"white-space: nowrap;\">f<\/span><\/em><\/span>, and for the immense majority of seeds <i>Z<\/i><span style=\"font-size: 15px; white-space: nowrap;\">,\u00a0<\/span>that density is uniform on [0, 1].\u00a0Seeds<span style=\"font-size: 15px; white-space: nowrap;\">\u00a0<\/span>producing the uniform density are sometimes called\u00a0<em>normal<\/em>\u00a0numbers; their digit distribution is also uniform.<\/p>\n<p>However, the functional equation 2<em>f<\/em>(<em>x<\/em>) =\u00a0<em>f<\/em>(<em>x<\/em>\/2) + <em>f<\/em>((1+<em>x<\/em>)\/2) may have plenty of other solutions. Such solutions are called <em>non-standard<\/em> solutions. The set of seeds producing non-standard solutions is known to have Lebesgue measure zero, but there are infinitely many such seeds.\u00a0All rational seeds are non-standard, but they produce a discrete distribution, so their density is of the discrete type. We are interested here in a non-discrete solution.\u00a0<\/p>\n<p><strong>1. Example with <em>p<\/em> = 0.75 and <em>b<\/em> = 2<\/strong><\/p>\n<p>The uniform distribution corresponds to <em>p<\/em> = 0.5. Below is a non-standard density satisfying the requirements. 
Actually, the plot below represents its percentile distribution. It was produced with a seed <i>Z<\/i>\u00a0in [0,1] built as follows: the\u00a0<em>n<\/em>-th binary digit of <em>Z<\/em> is 1 if Rand(<em>n<\/em>)\u00a0 <\u00a0\u00a0<em>p<\/em>, and 0 otherwise, using a pseudo random number generator. Here <em>p<\/em> = 0.75. Note that P.25 = 0.5 and corresponds to a dip in the chart below (P.25 denotes the 25-<em>th<\/em> percentile.) Dips are everywhere, only the big ones are visible. By contrast, the percentile distribution for the uniform (standard) case <em>p<\/em> = 0.5\u00a0is a straight line, with no dips.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444041396?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444041396?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/p>\n<p><strong>2. General solution<\/strong><\/p>\n<p>The functional equation is a bit more complicated if <em>b<\/em> is not equal to 2. It becomes<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444078196?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444078196?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/p>\n<p>Using the construction mechanism outlined in the previous section to generate a non-standard seed <i>Z<\/i>\u00a0(sometimes called a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Normal_number\" target=\"_blank\" rel=\"noopener noreferrer\">non-normal number<\/a> or <em>bad seed<\/em>), it is clear that <em>x<\/em>(<em>n<\/em>) is a random variable. 
We also have<a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444047106?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444047106?profile=RESIZE_710x\" class=\"align-center\"><\/a>where <em>b<\/em> is the base and <em>d<\/em>(<em>n<\/em>+<em>k<\/em>) is the (<em>n<\/em>+<em>k<\/em>)-th digit of the seed <i>Z<\/i>\u00a0in base <em>b<\/em>.\u00a0 This formula is very useful for computations. Note that <i>Z<\/i>\u00a0= <em>x<\/em>(0).\u00a0Furthermore, by construction, these digits are independent and identically distributed, following a Bernoulli distribution with parameter <em>p<\/em>. Thus, using the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Convolution_theorem\" target=\"_blank\" rel=\"noopener noreferrer\">convolution theorem<\/a>, the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Characteristic_function_(probability_theory)\" target=\"_blank\" rel=\"noopener noreferrer\">characteristic function<\/a> for the seed <i>Z<\/i>\u00a0is<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444057860?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444057860?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/p>\n<p><span>Take the derivative of the inverse Fourier transform (see section\u00a0<\/span><em>inverse formula<\/em><span>\u00a0<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Characteristic_function_(probability_theory)\" rel=\"nofollow noreferrer\">here<\/a><span>) and you obtain<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444900371?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444900371?profile=RESIZE_710x\" 
class=\"align-center\"><\/a><\/span><\/p>\n<p><span>If <em>p<\/em> = 0.5 and <em>b<\/em> = 2 we are back to the uniform case. Otherwise the solution is quite special: the density <em>f<\/em> appears to be nowhere differentiable. See picture below for <em>p<\/em> = 0.55 and <em>b<\/em> = 2.<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444066850?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444066850?profile=RESIZE_710x\" class=\"align-center\"><\/a><\/span><\/p>\n<p><span>It remains to prove that this case is <em>ergodic<\/em>, so that the functional equation applies. I also tried to check with some sampled values of <em>x<\/em> whether\u00a02<em>f<\/em>(<em>x<\/em>) =\u00a0<em>f<\/em>(<em>x<\/em>\/2) + <em>f<\/em>((1+<em>x<\/em>)\/2)<\/span><span>, but since the function is discontinuous everywhere and my approximation of its values is probably accurate to no more than two decimals, this is not easy.<\/span><\/p>\n<p><strong>3. Applications, properties and data<\/strong><\/p>\n<p><span>The distribution attached to this type of density has the following moments:<\/span><\/p>\n<ul>\n<li><span><strong>Expectation<\/strong>: <em>p<\/em> \/ (<em>b<\/em> &#8211; 1).<\/span><\/li>\n<li><span><strong>Variance<\/strong>: <em>p<\/em>(1 &#8211;\u00a0<em>p<\/em>) \/ (<em>b<\/em>^2 &#8211; 1).<\/span><\/li>\n<\/ul>\n<p>Why must <em>f<\/em>(<em>x<\/em>) satisfy the functional equation discussed above? This is a consequence of the fact that the underlying distribution is the equilibrium distribution for the sequence <em>x<\/em>(<em>n<\/em>) = {<em>b<\/em> <em>x<\/em>(<em>n<\/em>-1) } = {<em>b<\/em>^<em>n<\/em> <em>Z<\/em>}. 
In particular, the equilibrium distribution is a solution of the stochastic integral equation P(<i>X<\/i>\u00a0< <em>x<\/em>) = P({<em>b<\/em> <em>X<\/em>}\u00a0 <\u00a0\u00a0<em>x<\/em>).\u00a0 For details, see my book\u00a0<em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems<\/em>\u00a0<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/fee-book-applied-stochastic-processes\" target=\"_blank\" rel=\"noopener noreferrer\">available here<\/a>, pages 65-66.<\/p>\n<p>Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number generation, benchmarking statistical tests (see <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/fascinating-new-results-in-the-theory-of-randomness\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>) and even gaming (see <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/data-science-foundations-for-a-new-stock-market\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>). However, the most interesting application is probably to gain insights into what non-normal numbers look like, especially their chaotic nature. It is a fundamental tool to help solve one of the most intriguing mathematical conjectures of all time (still unsolved): are the digits of standard constants such as Pi or SQRT(2) uniformly distributed or not? For instance, when <em>b<\/em> = 2, any departure from <em>p<\/em> = 0.5 (a normal seed) results in a strong discontinuity for <em>f<\/em>(<em>x<\/em>) at <em>x<\/em> = 0.5. If you look at the above chart, <em>f<\/em>(0) = <em>f<\/em>(1\/2) = <em>f<\/em>(1) regardless of <em>p<\/em>, but discontinuities are masking this fact.\u00a0<\/p>\n<p>The charts featured here, as well as the underlying computations, were all produced in Excel. 
You can download the spreadsheet <a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/3444184412?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>. In particular, a very efficient algorithm is used to produce (say) one million digits of Z, and to compute one million successive values of <em>x<\/em>(<em>n<\/em>) each with a precision of 14 decimals. You can play interactively with the parameters <em>b<\/em> and <em>p<\/em> in the spreadsheet, and even try non-integer values of <em>b<\/em> (I suggest you try <em>b<\/em> = 1.5 and <em>p<\/em> = 0.5). If <em>b<\/em>\u00a0 < 2 is not an integer, the functional equation is more complicated: it is found in section 2.1\u00a0<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/fascinating-new-results-in-the-theory-of-randomness\" target=\"_blank\" rel=\"noopener noreferrer\">in this article<\/a>.\u00a0<\/p>\n<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:877593\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Vincent Granville I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2019\/08\/24\/a-strange-family-of-statistical-distributions\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":456,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2495"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=2495"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/2495\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/468"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=2495"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=2495"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=2495"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}