{"id":4833,"date":"2021-07-16T06:34:02","date_gmt":"2021-07-16T06:34:02","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2021\/07\/16\/central-limit-theorem-for-non-independent-random-variables\/"},"modified":"2021-07-16T06:34:02","modified_gmt":"2021-07-16T06:34:02","slug":"central-limit-theorem-for-non-independent-random-variables","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2021\/07\/16\/central-limit-theorem-for-non-independent-random-variables\/","title":{"rendered":"Central Limit Theorem for Non-Independent Random Variables"},"content":{"rendered":"<p>Author: Vincent Granville<\/p>\n<div>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9256085655?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9256085655?profile=RESIZE_710x\" width=\"600\" class=\"align-center\"><\/a><\/p>\n<p>The original version of the central limit theorem (CLT) assumes <em>n<\/em> independently and identically distributed (i.i.d.) random variables <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>X<span style=\"font-size: 8pt;\">n<\/span><\/em>, with finite variance. Let <em>S<span style=\"font-size: 8pt;\">n<\/span><\/em> =\u00a0<em>X<\/em><span style=\"font-size: 8pt;\">1<\/span> + &#8230; + <em>X<span style=\"font-size: 8pt;\">n<\/span><\/em>. Then the CLT states that<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9255975469?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9255975469?profile=RESIZE_710x\" width=\"200\" class=\"align-center\"><\/a><\/p>\n<p>that is, it follows a normal distribution with zero mean and unit variance, as <em>n<\/em> tends to infinity. 
Here <em>\u03bc<\/em> is the expectation of <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span>.<\/p>\n<p>Various generalizations have been discovered, including for weakly correlated random variables. Note that the absence of correlation alone is not enough for the CLT to apply (see counterexamples <a href=\"https:\/\/math.stackexchange.com\/questions\/2730696\/central-limit-theorem-for-dependent-random-variables-with-covariance-condition\" target=\"_blank\" rel=\"noopener\">here<\/a>). Conversely, even in the presence of correlations, the CLT can still hold under certain conditions: if autocorrelations decay fast enough, some results are available (see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Central_limit_theorem#CLT_under_weak_dependence\" target=\"_blank\" rel=\"noopener\">here<\/a>). The theory is somewhat involved; our goal here is to present a simple example that illustrates the mechanics of the CLT in this context. The example involves observations <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>X<span style=\"font-size: 8pt;\">n<\/span><\/em> that behave like a simple type of time series: AR(1), also known as an autoregressive time series of order one, a well-studied process (see section 3.2 in <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/new-approach-to-linear-algebra-in-machine-learning\" target=\"_blank\" rel=\"noopener\">this article<\/a>).<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>1. 
Example<\/strong><\/span><\/p>\n<p>The example in question consists of observations governed by the following time series model: <em>X<\/em><span style=\"font-size: 8pt;\"><em>k<\/em>+1<\/span> = <span><em>\u03c1X<\/em><span style=\"font-size: 8pt;\"><em>k<\/em><\/span> + <em>Y<\/em><span style=\"font-size: 8pt;\"><em>k<\/em>+1<\/span>, with <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span> = <em>Y<\/em><span style=\"font-size: 8pt;\">1<\/span>, and <em>Y<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>Y<\/em><span style=\"font-size: 8pt;\"><em>n<\/em><\/span> are i.i.d. with zero mean and unit variance. We assume that |<em>\u03c1<\/em>| &lt; 1. It is easy to establish the following:<\/span><\/p>\n<p><span><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9255974463?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9255974463?profile=RESIZE_710x\" width=\"300\" class=\"align-center\"><\/a><\/span><\/p>\n<p>Here &#8220;~&#8221; stands for &#8220;asymptotically equal to&#8221; as <em>n<\/em> tends to infinity. Note that the lag-<em>k<\/em> autocorrelation in the time series of observations <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>X<\/em><span style=\"font-size: 8pt;\"><em>n<\/em><\/span> is asymptotically equal to <span><em>\u03c1<\/em>^<em>k<\/em> (<em>\u03c1<\/em> to the power <em>k<\/em>), so autocorrelations decay exponentially fast. Finally, the adjusted CLT (the last formula above) now includes a factor 1 &#8211; <em>\u03c1<\/em>. Of course, if <em>\u03c1<\/em> = 0, this reduces to the classic CLT when the expected values are zero.<\/span><\/p>\n<p><strong>1.2. 
More examples<\/strong><\/p>\n<p>Let <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span> be uniform on [0, 1] and <em>X<\/em><span style=\"font-size: 8pt;\"><em>k<\/em>+1<\/span> = FRAC(<em>bX<span style=\"font-size: 8pt;\">k<\/span><\/em>), where <em>b<\/em> is an integer strictly larger than one and FRAC is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fractional_part\" target=\"_blank\" rel=\"noopener\">fractional part function<\/a>. Then it is known that <em>X<span style=\"font-size: 8pt;\">k<\/span><\/em> also has a uniform distribution on [0, 1], but the <em>X<span style=\"font-size: 8pt;\">k<\/span><\/em>&#8217;s are autocorrelated, with exponentially decaying lag-<em>k<\/em> autocorrelations equal to 1 \/ <em>b<\/em>^<em>k<\/em>. So I expect the CLT to apply in this case.<\/p>\n<p>Now let <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span> be uniform on [0, 1] and <em>X<\/em><span style=\"font-size: 8pt;\"><em>k<\/em>+1<\/span> = FRAC(<em>b<\/em> + <em>X<span style=\"font-size: 8pt;\">k<\/span><\/em>), where <em>b<\/em> is a positive irrational number. Again, <em>X<span style=\"font-size: 8pt;\">k<\/span><\/em> is uniform on [0, 1]. However, this time we have strong, long-range autocorrelations; see <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/long-range-correlation-in-time-series-tutorial-and-case-study\" target=\"_blank\" rel=\"noopener\">here<\/a>. I will publish results about this case (whether or not the CLT still applies) in a future article.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>2. 
Results based on simulations<\/strong><\/span><\/p>\n<p>The simulation consisted of generating 100,000 time series <em>X<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>X<span style=\"font-size: 8pt;\">n<\/span><\/em> as in section 1, with <em>\u03c1<\/em> = 1\/2, each with <em>n<\/em> = 10,000 observations, computing <em>S<span style=\"font-size: 8pt;\">n<\/span><\/em> for each of them, and standardizing <em>S<span style=\"font-size: 8pt;\">n<\/span><\/em> to check whether it follows an <em>N<\/em>(0, 1) distribution. The empirical density follows a normal law with zero mean and unit variance very closely, as shown in the figure below. We used uniform variables with zero mean and unit variance to generate the deviates <em>Y<span style=\"font-size: 8pt;\">k<\/span><\/em>.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9256083059?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9256083059?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>Below is one instance (realization) of these simulated time series, featuring the first 150 observations. 
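<\/p>
<p>The simulation described above can be sketched in Python as follows. This is a minimal sketch assuming NumPy; the number of series and their length are reduced here for speed (the article uses 100,000 series of 10,000 observations each):<\/p>

```python
import numpy as np

rng = np.random.default_rng(seed=42)

rho = 0.5      # autoregressive coefficient (|rho| < 1)
n = 1_000      # observations per series (reduced from 10,000)
m = 10_000     # number of simulated series (reduced from 100,000)

# Uniform innovations on (-sqrt(3), sqrt(3)) have zero mean and unit variance.
a = np.sqrt(3.0)

# AR(1): X_1 = Y_1 and X_{k+1} = rho * X_k + Y_{k+1}, vectorized across the
# m series; S accumulates S_n = X_1 + ... + X_n for each series.
x = rng.uniform(-a, a, size=m)   # X_1 = Y_1
S = x.copy()
for k in range(1, n):
    x = rho * x + rng.uniform(-a, a, size=m)
    S += x

# Adjusted CLT with the factor 1 - rho: (1 - rho) * S_n / sqrt(n)
# should be approximately N(0, 1) for large n.
Z = (1.0 - rho) * S / np.sqrt(n)
print(Z.mean(), Z.var())   # both approximately 0 and 1, respectively
```

<p>Plotting a histogram of <em>Z<\/em> (for instance with matplotlib) reproduces the bell curve shown in the empirical density figure above.<\/p>
<p>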
The Y-axis represents <em>X<span style=\"font-size: 8pt;\">k<\/span><\/em>; the X-axis represents <em>k<\/em>.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9256076454?profile=original\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/9256076454?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>It behaves quite differently from white noise, due to the autocorrelations.<\/p>\n<p><span><em>To receive a weekly digest of our new articles, subscribe to our newsletter,\u00a0<a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/check-out-our-dsc-newsletter\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/em><\/span><\/p>\n<p><span><em><strong>About the author<\/strong>: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, and former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, and eBay. Vincent is also a self-publisher at <a href=\"http:\/\/datashaping.com\/\" target=\"_blank\" rel=\"noopener\">DataShaping.com<\/a>, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central, acquired by Tech Target). He recently opened <a href=\"https:\/\/www.parisrestaurantandbar.com\/\" target=\"_blank\" rel=\"noopener\">Paris Restaurant<\/a> in Anacortes. 
You can access Vincent&#8217;s articles and books,\u00a0<a href=\"http:\/\/datashaping.com\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/em><\/span><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:1057241\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Vincent Granville The original version of the central limit theorem (CLT) assumes n independently and identically distributed (i.i.d.) random variables X1, &#8230;, Xn, with [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2021\/07\/16\/central-limit-theorem-for-non-independent-random-variables\/\">Read More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":461,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4833"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4833"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4833\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/473"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?paren
t=4833"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4833"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4833"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}