{"id":4149,"date":"2020-12-03T06:32:06","date_gmt":"2020-12-03T06:32:06","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2020\/12\/03\/new-tests-of-randomness-and-independence-for-sequences-of-observations\/"},"modified":"2020-12-03T06:32:06","modified_gmt":"2020-12-03T06:32:06","slug":"new-tests-of-randomness-and-independence-for-sequences-of-observations","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2020\/12\/03\/new-tests-of-randomness-and-independence-for-sequences-of-observations\/","title":{"rendered":"New Tests of Randomness and Independence for Sequences of Observations"},"content":{"rendered":"<p>Author: Vincent Granville<\/p>\n<div>\n<p>There is no statistical test that assesses whether a sequence of observations, time series, or residuals in a regression model, exhibits independence or not. Typically, what data scientists do is to look at auto-correlations and see whether they are close enough to zero. If the data follows a Gaussian distribution, then absence of auto-correlations implies independence. Here however, we are dealing with non-Gaussian observations. The setting is similar to testing whether a pseudo-random number generator is random enough, or whether the digits of a number such as&nbsp;<span>&pi;&nbsp;<\/span>behave in a way that looks random, even though the sequence of digits is deterministic. Batteries of statistical tests are available to address this problem, but there is no one-fit-all solution.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242402469?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242402469?profile=RESIZE_710x\" width=\"500\" class=\"align-center\"><\/a><\/p>\n<p>Here we propose a new approach. Likewise, it is not a panacea, but rather a set of additional powerful tools to help test for independence and randomness. 
The data sets under consideration are specific mathematical sequences, some of which are known to exhibit independence \/ randomness or not. Thus, it constitutes a good setting to benchmark and compare various statistical tests and see how well they perform. This kind of data is also more natural and looks more real than synthetic data obtained via simulations.&nbsp;<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>1. Definition of random-like sequences<\/strong><\/span><\/p>\n<p>Since we are dealing with deterministic sequences (<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>) indexed by <em>n<\/em> = 1, 2, and so on, it is worth defining what we mean by <em>independence<\/em> and <em>random-like<\/em>.&nbsp; These two elementary concepts are very intuitive, but a formal definition may help. You may skip this section if you have an intuitive understanding of the concepts in question, as the layman does. Independence in this context is sometimes called <em>asymptotic independence<\/em>, see <a href=\"https:\/\/mathoverflow.net\/questions\/372103\/recursive-random-number-generator-based-on-irrational-numbers\/\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>. Also, for all the sequences investigated here,&nbsp;<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&nbsp;&isin; [0,1].<\/p>\n<p><strong>1.1. Definition of random-like and independence<\/strong><\/p>\n<p>A sequence (<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>) with <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&nbsp;&isin; [0,1] is <em>random-like<\/em> if it satisfies the following property. 
For any finite index family <em>h<\/em><span style=\"font-size: 8pt;\">1<\/span>,&hellip;, <em>h<span style=\"font-size: 8pt;\">k<\/span><\/em> and for any <span style=\"font-size: 12pt;\"><em>t<span style=\"font-size: 8pt;\">1<\/span><\/em><\/span>,&hellip;, <em>t<span style=\"font-size: 8pt;\">k<\/span><\/em> &isin; [0,1], we have&nbsp;<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8238499286?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8238499286?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p>The probabilities are empirical probabilities, that is, based on frequency counts. For instance,<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8238501465?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8238501465?profile=RESIZE_710x\" width=\"450\" class=\"align-center\"><\/a><\/p>\n<p>where &chi;(<em>A<\/em>) is the indicator function (equal to 1 if the event <em>A<\/em> is&nbsp; true, and equal to 0 otherwise). Random-like implies independence, but the converse is not true. A sequence is <em>independently distributed<\/em> if it satisfies the weaker property&nbsp;<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8238506260?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8238506260?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p>Random-like means that the <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&#8216;s&nbsp; all have the same underlying uniform distribution on [0, 1], and are independently distributed.&nbsp;<\/p>\n<p><strong>1.2. 
Definition of lag-<em>k<\/em> autocorrelation<\/strong><\/p>\n<p>Again, this is just the standard definition of auto-correlations, but applied to infinite deterministic sequences.&nbsp;The lag-<em>k<\/em> auto-correlation &rho;<span style=\"font-size: 8pt;\"><em>k<\/em><\/span> is defined as follows. First define &rho;<span style=\"font-size: 8pt;\"><em>k<\/em><\/span>(<em>n<\/em>) as the empirical correlation between (<em>x<\/em><span style=\"font-size: 8pt;\">1<\/span>,&hellip;, <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>) and (<em>x<span style=\"font-size: 8pt;\">k<\/span><\/em><span style=\"font-size: 8pt;\">+1<\/span>,&hellip;, <em>x<span style=\"font-size: 8pt;\">k<\/span><\/em><span style=\"font-size: 8pt;\">+<em>n<\/em><\/span>). Then &rho;<span style=\"font-size: 8pt;\"><em>k<\/em><\/span> is the limit (if it exists) of &rho;<span style=\"font-size: 8pt;\"><em>k<\/em><\/span>(<span style=\"font-size: 12pt;\"><em>n<\/em><\/span>) as <em>n<\/em> tends to infinity.&nbsp;<\/p>\n<p><strong>1.3. Equidistribution and fractional part denoted as { }<\/strong><\/p>\n<p>The fractional part of a positive real number <em>x<\/em> is denoted as { <em>x<\/em> }. For instance, { 3.141592 } = 0.141592. The sequences investigated here come from number theory. In that context, concepts such as random-like and identically distributed are rarely used. Instead, mathematicians rely on the weaker concept of <em>equidistribution<\/em>, also called equidistribution modulo 1. Closer to independence is the concept of equidistribution in higher dimensions, for instance if two successive values (<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>, <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+1<\/span>) are equidistributed on [0, 1] x [0, 1].<\/p>\n<p>A sequence can be equidistributed yet exhibit strong auto-correlations.&nbsp;
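The definition above translates directly into code. Below is a minimal Python sketch (my own illustration, not the author's code), applied to the heavily auto-correlated sequence x_n = { alpha n } discussed next; for that p = 1 case, a short direct calculation (not spelled out in the article) gives the closed form rho_1 = 1 - 6c(1 - c) with c = { alpha }, which the empirical value matches:

```python
import math

def lag_autocorr(x, k):
    # Empirical rho_k(n): correlation between (x_1,...,x_n) and (x_{k+1},...,x_{k+n})
    a, b = x[:-k], x[k:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

alpha = math.sqrt(2)
x = [(alpha * n) % 1.0 for n in range(1, 100_001)]  # x_n = { alpha*n }
rho1 = lag_autocorr(x, 1)
c = alpha % 1
# rho1 and the closed form 1 - 6c(1-c) should both be close to -0.456
```

With 100,000 terms the empirical value agrees with the closed form to about three decimal places, illustrating the convergence of rho_k(n) as n grows.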
The most famous example is the sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> = { <em>&alpha;n<\/em> }&nbsp; where&nbsp;<em>&alpha;<\/em> is a positive irrational number. While equidistributed, it has strong lag-<em>k<\/em> auto-correlations for every strictly positive integer <em>k<\/em>,&nbsp;and it is anything but random-like. A sequence that looks perfectly random-like is the digits of <span>&pi;<\/span>: they cannot be distinguished from a realization of a perfect&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Bernoulli_process\" target=\"_blank\" rel=\"noopener noreferrer\">Bernoulli process<\/a>. Such random-like sequences are very useful in cryptographic applications.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>2. Testing well-known sequences<\/strong><\/span>&nbsp;<\/p>\n<p>The sequences we are interested in are <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> = { <em>&alpha; n<\/em>^<em>p<\/em> }<b>&nbsp;<\/b>&nbsp;where {&nbsp; } is the fractional part function (see section 1.3), <em>p<\/em>&nbsp; &gt;&nbsp; 1 is a real number and&nbsp;<em>&alpha;<\/em> is a positive irrational number. Other sequences are discussed in section 3. It is well known that these sequences are equidistributed. Also, if <em>p<\/em> = 1, these sequences are highly auto-correlated and thus the terms <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&#8217;s are not independently distributed, much less random-like; the exact theoretical lag-<em>k<\/em> auto-correlations are known. The question here is what happens if <em>p<\/em>&nbsp; &gt;&nbsp; 1. It seems that in that case, there is much more randomness. In this section, we explore three statistical tests (including a new one) to assess how random these sequences can be depending on the parameters <em>p<\/em> and&nbsp;<em>&alpha;<\/em>.&nbsp;
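These sequences are easy to generate and probe. Here is a minimal, stdlib-only sketch (my own illustration, not the author's code) that generates x_n = { alpha n^p } for p = SQRT(7) and alpha = 1 (the parameters of Figure 2 below), checks the sample mean against equidistribution, and applies the gap test of section 2.1 to the binary digits d_n:

```python
import math
from collections import Counter

def sequence(alpha, p, N):
    # x_n = { alpha * n^p }  (fractional part), for n = 1..N
    return [(alpha * n ** p) % 1.0 for n in range(1, N + 1)]

x = sequence(alpha=1.0, p=math.sqrt(7), N=20_000)

# Equidistribution check: the sample mean should be near 1/2
mean = sum(x) / len(x)

# Gap (run) test of section 2.1 on the binary digits d_n = floor(2 x_n):
# for a random-like sequence, run lengths follow a Geometric(1/2) law
d = [int(2 * v) for v in x]
runs = Counter()
length = 1
for prev, cur in zip(d, d[1:]):
    if cur == prev:
        length += 1
    else:
        runs[length] += 1
        length = 1
total = sum(runs.values())
freq1 = runs[1] / total   # should be close to P(G=1) = 1/2
```

For a truly random digit stream of this size, freq1 fluctuates around 0.5 with a standard error of roughly 0.005, so a large deviation is a clear signal of non-randomness.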
The theoretical answer to that question is known, thus this provides a good case study to check how various statistical tests perform to detect randomness, or lack of it.<\/p>\n<p><strong>2.1. The gap test<\/strong><\/p>\n<p>The gap test (some people may call it run test) proceeds as follows. Let us define the binary digit <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em> as <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em> = &lfloor;2<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&rfloor;. The brackets represent the integer part function. Say <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em> = 0 and <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+1&nbsp;<\/span>= 1 for a specific n. If <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em> is followed by <em>G<\/em> successive digits <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+1<\/span>,&hellip;,&nbsp;<em>d<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>G<\/em><\/span> all equal to 1 and then <em>d<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>G<\/em>+1<\/span> = 0, we have one instance of a gap of length <em>G<\/em>. Compute the empirical distribution of these gaps. Assuming 50% of the digits are 0 (this is the case in all our examples), then the empirical gap distribution converges to a geometric distribution of parameter 1\/2 if the sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> is random-like.<\/p>\n<p>This is best illustrated in chapter 4 of my book <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems,&nbsp;<\/em>available <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/fee-book-applied-stochastic-processes\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.&nbsp;<\/p>\n<p><strong>2.2. 
The collinearity test<\/strong><\/p>\n<p>Many sequences pass several tests yet fail the collinearity test. This test checks whether there are <em>k<\/em> constants <em>a<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>a<span style=\"font-size: 8pt;\">k<\/span><\/em>&nbsp;with <em>a<span style=\"font-size: 8pt;\">k<\/span><\/em> not equal to zero, such that the combination <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>k<\/em><\/span> &#8211; <em>a<\/em><span style=\"font-size: 8pt;\">1<\/span> <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>k-1<\/em><\/span> &#8211; <em>a<\/em><span style=\"font-size: 8pt;\">2<\/span> <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>k<\/em>-2<\/span> &#8211; &#8230; &#8211; <em>a<span style=\"font-size: 8pt;\">k<\/span><\/em> <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> takes on only a finite (usually small) number of values. In short, it addresses this question: do <em>k<\/em> successive values of the sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>&nbsp;always lie (exactly, approximately, or asymptotically)&nbsp;in a finite number of hyperplanes of dimension <em>k<\/em> &#8211; 1? This test has been used to determine that some congruential pseudo-random number generators were of very poor quality, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/RANDU\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>. It is illustrated in section 3, with <em>k<\/em> = 2.&nbsp;<\/p>\n<p>Source code and examples for <em>k<\/em> = 3 can be found <a href=\"https:\/\/mathoverflow.net\/questions\/372103\/recursive-random-number-generator-based-on-irrational-numbers\/\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.&nbsp;<\/p>\n<p><strong>2.3. The independence test<\/strong><\/p>\n<p>This may be a new test: I could not find any reference to it in the literature.&nbsp;
It does not test for full independence, but rather for random-like behavior in small dimensions (<em>k<\/em> = 2, 3, 4). Beyond <em>k<\/em> = 4, it becomes somewhat unpractical as it requires a number of observations (that is, the number of computed terms in the sequence) growing exponentially fast with <em>k<\/em>. However, it is a very intuitive test. It proceeds as follows, for a fixed <em>k<\/em>:<\/p>\n<ul>\n<li>Let <em>N&nbsp;<\/em> &gt;&nbsp; 100 be an integer<\/li>\n<li>Let <em>T<\/em> be a <em>k<\/em>-uple (<em>t<\/em><span style=\"font-size: 8pt;\">1<\/span>,&#8230;, <em>t<span style=\"font-size: 8pt;\">k<\/span><\/em>) with <i>t<span style=\"font-size: 8pt;\">j<\/span><\/i><span style=\"font-size: 8pt;\">&nbsp;<\/span>&isin; [0,1] for&nbsp;<em>j<\/em> = 1, &#8230;, <em>k.<\/em><\/li>\n<li>Compute the following two quantities, with &chi; being the indicator function as in section 1.2:<\/li>\n<\/ul>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242040856?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242040856?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<ul>\n<li>Repeat this computation for <em>M<\/em> different <em>k<\/em>-uples randomly selected in the <em>k<\/em>-dimensional unit hypercube<\/li>\n<\/ul>\n<p>Now plot the <em>M<\/em> vectors (<em>P<span style=\"font-size: 8pt;\">T<\/span>, Q<span style=\"font-size: 8pt;\">T<\/span><\/em>), each corresponding to a different <em>k<\/em>-uple, on a scatterplot. Unless the <em>M<\/em> points lie very close to the main diagonal on the scatterplot, the sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> is not random-like. To see how far away you can be from the main diagonal without violating the random-like assumption, do the same computations for 10 different sequences consisting this time of truly random terms. 
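As a sanity check of this procedure, here is a minimal sketch applied to a truly random reference sequence. Since the exact formulas are given in the images above, I am assuming that <em>P<\/em><sub>T<\/sub> is the empirical joint probability that <em>k<\/em> successive terms fall below <em>t<\/em><sub>1<\/sub>, &#8230;, <em>t<span style=\"font-size: 8pt;\">k<\/span><\/em>, and that <em>Q<\/em><sub>T<\/sub> is the product <em>t<\/em><sub>1<\/sub>&#8901;&#8901;&#8901;<em>t<span style=\"font-size: 8pt;\">k<\/span><\/em> predicted under the random-like hypothesis:

```python
import random

def joint_empirical(x, T):
    # P_T: fraction of windows (x_{n+1},...,x_{n+k}) with x_{n+j} <= t_j for all j
    k = len(T)
    wins = sum(
        all(x[n + j] <= T[j] for j in range(k))
        for n in range(len(x) - k + 1)
    )
    return wins / (len(x) - k + 1)

random.seed(42)
x = [random.random() for _ in range(20_000)]   # truly random reference sequence
k, M = 2, 50
pairs = []
for _ in range(M):
    T = [random.random() for _ in range(k)]
    P = joint_empirical(x, T)
    Q = 1.0
    for t in T:
        Q *= t        # product of marginals: the random-like prediction
    pairs.append((P, Q))
# For a random-like sequence, the M points (P_T, Q_T) hug the main diagonal
max_dev = max(abs(p - q) for p, q in pairs)
```

Plotting `pairs` produces the diagonal scatterplot described in the text; replacing `x` with the sequence under test reveals any departures from the diagonal.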
This will give you a confidence band around the main diagonal, and vectors&nbsp;(<em>P<span style=\"font-size: 8pt;\">T<\/span>, Q<span style=\"font-size: 8pt;\">T<\/span><\/em>) lying outside that band, for the original sequence you are interested in, suggests areas where the randomness assumption is violated. This is illustrated in the picture below, originally posted <a href=\"https:\/\/mathoverflow.net\/questions\/372103\/recursive-random-number-generator-based-on-irrational-numbers\/\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>:&nbsp;<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242055058?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242055058?profile=RESIZE_710x\" width=\"300\" class=\"align-center\"><\/a><\/p>\n<p style=\"text-align: center;\"><strong>Figure 1<\/strong><\/p>\n<p>As you can see, there is a strong enough departure from the main diagonal, and the sequence in question (see same reference) is known not to be random-like. The X-axis features <em>P<span style=\"font-size: 8pt;\">T<\/span><\/em>, and the Y-axis features <em>Q<span style=\"font-size: 8pt;\">T<\/span><\/em>. An example with known random-like behavior, resulting in an almost perfect diagonal, is also featured in the same article. Notice that there are fewer and fewer points as you move towards the upper right corner. The higher <em>k<\/em>, the more sparse the upper right corner will be. In the above example, <em>k<\/em> = 3. To address this issue, proceed as follows, stretching the point distribution along the diagonal:<\/p>\n<ul>\n<li>Let <em>P*<span style=\"font-size: 8pt;\">T<\/span><\/em> = (- 2 log <em>P<span style=\"font-size: 8pt;\">T<\/span><\/em>) \/ <em>k<\/em> and <em>Q<\/em>*<span style=\"font-size: 8pt;\"><em>T<\/em><\/span> = (- 2 log <em>Q<span style=\"font-size: 8pt;\">T<\/span><\/em>) \/ <em>k<\/em>. 
This is a transformation leading to a Gamma(<em>k<\/em>, 2\/<span style=\"font-size: 10pt;\"><em>k<\/em><\/span>) distribution. See explanations <a href=\"https:\/\/stats.stackexchange.com\/questions\/89949\/geometric-mean-of-uniform-variables\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.&nbsp;<\/li>\n<li>Let <em>P<\/em>**<span style=\"font-size: 8pt;\"><em>T<\/em><\/span> = <em>F<\/em>(<span style=\"font-size: 12pt;\"><em>P<\/em><\/span>*<span style=\"font-size: 8pt;\"><em>T<\/em><\/span>) and <em>Q<\/em>**<span style=\"font-size: 8pt;\"><em>T<\/em><\/span> = <em>F<\/em>(<i>Q<\/i>*<span style=\"font-size: 8pt;\"><em>T<\/em><\/span>) where <em>F<\/em> is the cumulative distribution function of a Gamma(<em>k<\/em>, 2\/<span style=\"font-size: 10pt;\"><em>k<\/em><\/span>) random variable.<\/li>\n<\/ul>\n<p>By virtue of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Inverse_transform_sampling\" target=\"_blank\" rel=\"noopener noreferrer\">inverse transform sampling theorem<\/a>, the points (<em>P<\/em>**<span style=\"font-size: 8pt;\"><em>T<\/em><\/span>, <em>Q<\/em>**<span style=\"font-size: 8pt;\"><em>T<\/em><\/span>) are now uniformly stretched along the main diagonal.&nbsp;<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>3. Results and generalization<\/strong><\/span><\/p>\n<p>Let&#8217;s get back to our sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> = {&nbsp;<em>&alpha; n<\/em>^<em>p<\/em> } with <em>p<\/em>&nbsp; &gt;&nbsp; 1 and <em>&alpha;<\/em> irrational. Before showing and discussing some charts, I want to discuss a few issues. First, if <em>p<\/em> is large, machine accuracy will quickly result in erroneous computations for <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>. You need to detect when loss of accuracy becomes a critical problem, usually well below <em>n<\/em> = 1,000 if <em>p<\/em> = 5. Working with double precision arithmetic will help. 
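One stdlib-based workaround for the accuracy loss (an illustration, not the author's method) is to evaluate { alpha n^p } in high-precision decimal arithmetic when p is an integer. With p = 5 and n = 10,000, alpha n^5 is about 1.4 x 10^20; float64 spacing at that magnitude is roughly 16,000, so the fractional part carries no information at all, while 50 significant decimal digits leave about 30 correct fractional digits:

```python
from decimal import Decimal, getcontext

def x_n(n, p=5, prec=50):
    # { alpha * n^p } with alpha = sqrt(2), using `prec` significant digits
    getcontext().prec = prec
    alpha = Decimal(2).sqrt()
    return (alpha * Decimal(n) ** p) % 1

x50 = x_n(10_000, prec=50)
x80 = x_n(10_000, prec=80)   # recompute with more digits as a stability check
```

The two values agree to roughly 25 decimal places, confirming stability, whereas the float computation `(math.sqrt(2) * 10_000**5) % 1.0` bears no relation to the true value.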
Another issue, if <em>p<\/em> is close to 1, is the fact that randomness does not kick in until <em>n<\/em> is large enough. You may have to ignore the first few hundred terms of the sequence in that case. If <em>p<\/em> = 1, randomness never occurs. Also, we have assumed that the marginal distributions are uniform on [0, 1]. From the theoretical point of view, they indeed are, and it will show if you compute the empirical percentile distribution of <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>, even in the presence of strong auto-correlations (this is due to the ergodic nature of the sequences in question, a topic beyond the scope of the present article). So it would be a good exercise to use various statistical tools or libraries to assess whether they can confirm the uniform distribution assumption.<\/p>\n<p><strong>3.1. Examples<\/strong><\/p>\n<p>The exact theoretical value of the lag-<em>k<\/em> auto-correlation is known for all <em>k<\/em>&nbsp;if <em>p<\/em> = 1. See section 5.4 in <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/fascinating-new-results-in-the-theory-of-randomness\" target=\"_blank\" rel=\"noopener noreferrer\">this article<\/a>.&nbsp; It is almost never equal to zero, but it turns out that if <em>k<\/em> = 1, <em>p<\/em> = 1 and&nbsp;<em>&alpha;<\/em> = (3 + SQRT(3))\/6, it is indeed equal to zero. Use a statistical package to see if it can detect this fact, or ask your team to do the test.&nbsp;
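The check is immediate in a few lines of Python (my own sketch, using the empirical lag-1 autocorrelation of section 1.2):

```python
import math

def corr(a, b):
    # Empirical Pearson correlation between two equal-length lists
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

alpha = (3 + math.sqrt(3)) / 6          # the special value cited above
x = [(alpha * n) % 1.0 for n in range(1, 100_001)]
rho1 = corr(x[:-1], x[1:])              # empirical lag-1 autocorrelation, ~0
```

With 100,000 terms the empirical lag-1 autocorrelation comes out very close to zero, as claimed, while a generic irrational alpha produces a value far from zero.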
Also, if <em>p<\/em> is an integer, show (using statistical techniques) that for some <em>a<\/em><span style=\"font-size: 8pt;\">1<\/span>, &#8230;, <em>a<\/em><span style=\"font-size: 8pt;\">k<\/span>, the combination&nbsp;<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>k<\/em><\/span> &#8211; <em>a<\/em><span style=\"font-size: 8pt;\">1<\/span> <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>k-1<\/em><\/span> &#8211; <em>a<\/em><span style=\"font-size: 8pt;\">2<\/span> <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+<em>k<\/em>-2<\/span> &#8211; &#8230; &#8211; <em>a<span style=\"font-size: 8pt;\">k<\/span><\/em> <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> takes on only a finite number of values, as discussed in section 2.2, and thus, the random-like assumption is always violated. In particular, <em>k<\/em> = 2 if <em>p<\/em> = 1. This is also true <em>asymptotically<\/em> if <em>p<\/em> is not an integer, see <a href=\"https:\/\/mathoverflow.net\/questions\/377697\/sequences-similar-to-n-alpha-that-are-both-equidistributed-and-truly-rando\/377748#377748\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a> for details. Yet, if <em>p<\/em>&nbsp; &gt;&nbsp; 1, the auto-correlations are very close to zero, unlike the case <em>p<\/em> = 1. But are they truly identical to zero? What about the sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> = {&nbsp;<em>&alpha;<\/em>^<em>n<\/em> } with, say,&nbsp;<em>&alpha;<\/em> = log 3? Is it random-like? Nobody knows.&nbsp;
Of course, if&nbsp;<em>&alpha;<\/em> = (1 + SQRT(5))\/2, that sequence is anything but random, so it depends on&nbsp;<em>&alpha;<\/em>.&nbsp;<\/p>\n<p>Below are three scatterplots showing the distribution of (<em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>, <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+1<\/span>) for a few hundred values of <em>n<\/em>, for various&nbsp;<em>&alpha;<\/em> and <em>p<\/em>, for the sequence <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em> = {&nbsp;<em>&alpha;<\/em> <em>n<\/em>^<em>p<\/em> }. The X-axis represents <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em>, the Y-axis represents <em>x<span style=\"font-size: 8pt;\">n<\/span><\/em><span style=\"font-size: 8pt;\">+1<\/span>.&nbsp;<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242305270?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242305270?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p style=\"text-align: center;\"><strong>Figure 2<\/strong>: <em>p = SQRT(7),&nbsp;&alpha; = 1<\/em><\/p>\n<p>Even to the trained naked eye, Figure 2 shows randomness in 2 dimensions. Independence may fail in higher dimensions (k&nbsp; &gt;&nbsp; 2) as the sequence is known not to be random-like. There is no apparent collinearity pattern as discussed in section 2.2, at least for <em>k<\/em> = 2.&nbsp;
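By contrast, the p = 1 case fails the collinearity test immediately, and the relation can be exhibited exactly: with a_1 = 2 and a_2 = -1, the combination x_{n+2} - 2x_{n+1} + x_n for x_n = { alpha n } is always an integer, since the alpha n terms cancel and only the dropped integer parts remain. A quick sketch (my own illustration):

```python
import math

alpha = math.sqrt(2)
x = [(alpha * n) % 1.0 for n in range(1, 10_001)]
# Second difference: alpha*(n+2) - 2*alpha*(n+1) + alpha*n = 0, so only the
# dropped integer parts remain -> y_n is always an integer
y = [x[n + 2] - 2 * x[n + 1] + x[n] for n in range(len(x) - 2)]
values = {round(v, 9) for v in y}
# Consecutive pairs (x_n, x_{n+1}) therefore lie on a handful of parallel lines
```

For alpha = SQRT(2), the set `values` contains only the integers -1, 0 and 1 (up to float rounding): the k = 2 collinearity described in section 2.2.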
Can you run some test to detect lack of randomness in higher dimensions?<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242307701?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242307701?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p style=\"text-align: center;\"><strong>Figure 3<\/strong>:&nbsp;<em>p = 1.4,&nbsp;&alpha; = log 2<\/em><\/p>\n<p>To the trained naked eye, Figure 3 shows lack of randomness as highlighted in the red band. Can you do a test to confirm this? If the test is inconclusive or provides the wrong answer, then the naked eye performs better, in this case, than statistical software.<\/p>\n<p><a href=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242319869?profile=original\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/storage.ning.com\/topology\/rest\/1.0\/file\/get\/8242319869?profile=RESIZE_710x\" width=\"400\" class=\"align-center\"><\/a><\/p>\n<p style=\"text-align: center;\"><strong>Figure 4<\/strong>:&nbsp;<em>p = 1.1,&nbsp;&alpha; = log 2<\/em><\/p>\n<p>Here (Figure 4) any statistical software and any human being, even the layman, can identify lack of randomness in more than one way. As <em>p<\/em> gets closer and closer to 1, lack of randomness is obvious, and the collinearity issue discussed in section 2.2, even if fuzzy, becomes more apparent even in two dimensions.<\/p>\n<p><strong>3.2. Independence between two sequences<\/strong><\/p>\n<p>It is known that if&nbsp;<em>&alpha;<\/em> and&nbsp;<em>&beta;<\/em> are irrational numbers linearly independent over the set of rational numbers, then the sequences { <em>&alpha;n<\/em> } and { <em>&beta;n<\/em> } are not correlated, even though each one taken separately is heavily auto-correlated.&nbsp;
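This is easy to verify empirically for alpha = log 2 and beta = log 3, the pair used in the exercise below; a minimal sketch (my own code):

```python
import math

def pearson(a, b):
    # Empirical Pearson correlation between two equal-length lists
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

N = 100_000
xa = [(math.log(2) * n) % 1.0 for n in range(1, N + 1)]   # { alpha*n }
xb = [(math.log(3) * n) % 1.0 for n in range(1, N + 1)]   # { beta*n }

cross = pearson(xa, xb)           # near zero: the two sequences are uncorrelated
auto = pearson(xa[:-1], xa[1:])   # strongly negative: heavy lag-1 autocorrelation
```

The cross-correlation is close to zero while the lag-1 autocorrelation of each sequence is far from zero, exactly the contrast stated above. Absence of correlation is of course weaker than independence.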
A sketch proof of this result can be found in the Appendix of <a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/state-of-the-art-statistical-science-to-address-famous-number-the\" target=\"_blank\" rel=\"noopener noreferrer\">this article<\/a>. But are they really independent? Test, using statistical software, the absence of correlation if <em>&alpha;&nbsp;<\/em>=&nbsp; log 2 and&nbsp;<em>&beta;<\/em> = log 3. How would you test independence? The methodology presented in section 2.3 can be adapted and used to answer this question empirically (although not theoretically).&nbsp;<\/p>\n<p><em><strong>About the author<\/strong>:&nbsp; Vincent Granville is a d<span class=\"lt-line-clamp__raw-line\">ata science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent also founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target).<\/span>&nbsp;You can access Vincent&#8217;s articles and books,<span>&nbsp;<\/span><a href=\"https:\/\/www.datasciencecentral.com\/profiles\/blogs\/my-data-science-machine-learning-and-related-articles\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/em><\/p>\n<\/div>\n<p><a href=\"https:\/\/www.datasciencecentral.com\/xn\/detail\/6448529:BlogPost:1004429\">Go to Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Vincent Granville There is no statistical test that assesses whether a sequence of observations, time series, or residuals in a regression model, exhibits independence [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2020\/12\/03\/new-tests-of-randomness-and-independence-for-sequences-of-observations\/\">Read&nbsp;
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":472,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[26],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4149"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=4149"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/4149\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=4149"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=4149"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=4149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}