State-of-the-Art Statistical Science to Tackle Famous Number Theory Conjectures

Author: Vincent Granville

The methodology described here has broad applications, leading to new statistical tests, new type of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers leading to cryptography, gaming and Fintech applications, and high quality random numbers generators (and when you really need them). It also features exact arithmetic / high performance computing and distributed algorithms to compute millions of binary digits for an infinite family of real numbers, including detection of auto- and cross-correlations (or lack of) in the digit distributions.

The data processed in my experiment, consisting of raw irrational numbers (described by a new class of elementary recurrences) led to the discovery of unexpected apparent patterns in their digit distribution: in particular, the fact that a few of these numbers, contrarily to popular belief, do not have 50% of their binary digits equal to 1. It turned out that perfectly random digits simulated in large numbers, with a good enough pseudo-random generator, also exhibit the same strange behavior, pointing to the fact that pure randomness may not be as random as we imagine it is. Ironically, failure to exhibit these patterns would be an indicator that there really is a departure from pure randomness in the digits in question.

In addition to new statistical / mathematical methods and discoveries and interesting applications, you will learn in my article how to avoid this type of statistical traps that lead to erroneous conclusions, when performing a large number of statistical tests, and how to not be misled by false appearances. I call them statistical hallucinations and false outliers.

This article has two main sections: section 1, with deep research in number theory, and section 2, with deep research in statistics, with applications. You may skip one of the two sections depending on your interests and how much time you have. Both sections, despite state-of-the-art in their respective fields, are written in simple English. It is my wish that with this article, I can get data scientists to be interested in math, and the other way around: the topics in both cases have been chosen to be exciting and modern. I also hope that this article will give you new powerful tools to add to your arsenal of tricks and techniques. Both topics are related, the statistical analysis being based on the numbers discussed in the math section.

One of the interesting new topics discussed here for the first time is the cross-correlation between the digits of two irrational numbers. These digit sequences are treated as multivariate time series. I believe this is the first time ever that this subject is not only investigated in detail, but in addition comes with a deep, spectacular probabilistic number theory result about the distributions in question, with important implications in security and cryptography systems. Another related topic discussed here is a generalized version of the Collatz conjecture, with some insights on how to potentially solve it.

1. On the Digits Distribution of Quadractic Irrational Numbers

We start with the following recursion, first discussed in my article Two New Deep Conjectures in Probabilistic Number Theory, here.

Ewan Delanoy proved the following (see his proof here). Let

Then, under conditions discussed below and referred to as the standard case, we haveThis result is valid if the following three conditions are met: y(0) ≠ 0, z(0) ≠ 2y(0), and z(0) ≥ y(0). We shall assume that this is always the case here. In short, x is solution of 2 x^2 + (z(0) – 1) x – y(0) = 0, and the sequence { d(k) } represents its binary digits. The values y(0) and z(0) are the initial conditions, also called seeds, of the recurrence system. This system has been used in a lottery and simulated stock trading where the winning numbers can be pre-computed with a public algorithm that requires hundreds of years of computation (thus for all purposes the winning numbers can not be predicted). This number guessing game, played with real money, assumes that the digits in question behave exactly like perfectly random numbers, with no detectable pattern. See here for a full description of this game.

In this article, the core of our discussion is a deep dive into that assumption. By itself, this assumption of perfect randomness is one of the oldest, most challenging and fascinating unsolved mathematical conjectures of all times. Very little progress has been made towards solving this mystery. We don’t even know if more than 1% of the binary digits of SQRT(2) are equal to 1, though all evidence points to the fact that this proportion is indeed 50%. In this section, I will discuss new promising results to help solve this problem, including the best possible asymptotic formula regarding the distribution of these digits. I will then discuss a result about cross-correlation between the binary digit sequences of two such numbers.

1.1. Properties of the recursion

In the standard case, we have the following result:

Using u(n) = 2y(n) − z(n) and v(n) = 2z(n) + 3, the recurrence can be rewritten as:

Note that mod(v(n), 8) = 5, that is, (v(n) – 5) / 8 is an integer. If n > 1 we have

Reverse recursion

This leads to the following simple reverse recurrence involving only one variable, allowing you to compute the digits of x backward, starting with a large n and moving backward down to n = 1:

The very tough problem, outlined in the section 1.2, is to prove that each of these two outcomes is equally likely to occur, on average. This would indeed be true if each v(n) is arbitrary, but this is not the case here. Also, if for some large n, we have d(n) = 1, then a run of R successive digits d(n) − 1, …, d(n) − R all equal to zero can only go so far, unless v(n) is a very special number not leading to x being irrational. Maybe R = ⌊2 SQRT(n)⌋ is an upper bound. This is something worth exploring. Here the brackets represent the integer part function

Properties of the reverse recursion

If mod(v(n), 8) = 5 and v(n) > 5, then the sequence v(n), v(n) − 1,… is strictly decreasing until it reaches 5 and stays there permanently; also each term is congruent to 5 modulo 8. This is true whether or not v(n) was generated using our forward recurrence.

An interesting application of this property is as follows. Take an arbitrary number, say x = log 2. Multiply by a large power of 2, say 2^30. Round the result to the closest integer congruent to 5 modulo 8, and let this be your v(n). In this case, v(n) = ⌊2^30 log 2⌋. Compute v(n) − 1, v(n) − 2 and so on, as well as the associated digits, using the reverse recurrence. Stop when you hit 5. The digits in question are the first binary digits of log 2 yielding the approximation 0.693147175… while the exact value is 0.693147180…

A similar reverse recurrence is also available for the original system: If mod[(z(n) − 1)/4, 2] = 1, then z(n−1) = (z(n) − 3) / 2, d(n) = 1, else z(n−1) = (z(n) + 1) / 2, d(n) = 0. We also have mod(z(n), 4) = 1.

Connection to Collatz conjecture

The recurrences discussed earlier are a particular case of a generic type of recurrences, leading to various kinds of chaotic behavior depending on the parameters. A particular and simple case of this family is the following one:

This is the connection between the Collatz conjecture and the digit distribution of SQRT(2). The famous unsolved Collatz conjecture states that no matter the initial value z(0) also called seed, the above sequence z(n) always reaches 1. This is a bit like our above sequence v(n) taken in reverse order: it always reaches 5. More about these recursions can be found here. Regarding the recursion associated to the Collatz conjecture, read this article.

A generalization of Collatz recursion is a follows:

where a, c > 0 and b, d are integers. The behavior of this recursion can take two forms: either growth (usually exponential but not always) or eventually becoming periodic (usually preceded by a chaotic period). The Collatz case is periodic. If a particular seed z(0) leads to the periodic mode for specific values of the parameters a, b, c, d, does it mean that for the same parameters, any other seed also lead to periodicity? Is the converse also true?

For the non-periodic, exponential-growth case, it is easy to prove that

where p is the average frequency for the event “mod(z(n), 2) = 1″, and q = 1 – p. In many cases, p = q = 0.5. An non-periodic example is when z(0) = 1, a = 6, b = 4, c = 1, and d = 2. In that case, p = q = 0.5 and the above limit is equal to (log 3) / 2. If we were able to determine all the cases leading to periodicity, and if the period, for a set of parameters a, b, c, d does not depend on the seed z(0), then we would be able to solve the Collatz conjecture, which is just a particular case. Solving the general problem consists in finding the parameter configurations leading to periodicity, and determining when the period does not depend on the seed.

Source code

The source code to compute the iterates z(n) for the above non-periodic example, is available here. The output, up to n = 5,000, is available here. This text file has three columns: n, mod(z(n), 2), and z(n). Note that the last iterates are numbers with more than 1,000 decimals.

A more general program to compute the iterates associated with all the various recursions, including the computation of summary statistics, can be downloaded here. It uses the Perl Bignum library to handle millions of iterations and numbers with millions of digits, using exact arithmetic. This is necessary as the various sequences discussed here grow exponentially fast.

1.2. New deep probabilistic number theory results

Here we are dealing with another famous unsolved math conjecture: the normality of math constants such as SQRT(2), see here. Let S(n) denote the number of binary digits d(k) of x, that are equal to 1, for k = 1,⋯, n. If irrational quadratic numbers were indeed normal as we all believe they are, then, there is an absolute constant K (not depending on x), and for each x, there is a number N(x) denoted as N, such that

This is a consequence of the Berry-Hessen theorem applied to Bernouilli variables, and I call it the Berry-Essen inequality. It is discussed in sections 1.1 and 1.2 in this article. The chart below shows | 2S(n) − n | / SQRT(n) on the Y-axis, with n between 0 and 530,000 on the X-axis, for the case y(0) = z(0) = 1 leading to x = 2 SQRT(2). It suggests (not a proof) that in this case, N = 0 and K = 0.90 might work. Note that the behavior looks like a bounded Brownian motion. It is not a standard Brownian motion, and it has potential applications to model stock prices that move up or down under some constraints that prevent them from fully exploding or collapsing.

To prove that x has 50% of its binary digits equal to 1, a potential approach thus consists in proving that if the previous inequality is true for n large enough, then it is also true for n+1, by looking at the worst case scenario for the potential distribution of the first n binary digits of x, using the recurrence relation introduced at the beginning, or the backward recurrence.

Some of the numbers x that I tested are approaching the 50% ratio in question rather slowly, for instance if y(0) = 1, z(0) = 16. Indeed, I am wondering if some of these quadratic irrationals, maybe a finite number of them, even though normal, do not satisfy the above Berry-Essen inequality, but instead a weaker result, say with SQRT(n) replaced by n^(3/4) or n / log n. To the contrary, a very fast convergence, say n^(1/4) or log n instead of SQRT(n), would also mean, even though x may be normal, that its digits are not distributed like i.i.d. Bernouilli(0.5) variables. The only way for this Bernouilli behavior to happen, is if the convergence to the 50% ratio occurs at the right speed, that is with SQRT(n) in the Berry-Essen inequality. In other words, for a specific x, any asymptotic departure from SQRT(n) in the inequality in question would mean that its binary digits are not distributed in a purely random way. This “pure randomness” criterion is stronger than having 50% of the digits equal to 1. For instance, x = 2/3 = 0.1010101… (in base 2) has 50% of its digits equal to 1, but the term O(SQRT(n)) in Berry-Essen can be replaced by the optimum bound O(1), and the digits look anything but random.

1.3. Spectacular new result about cross-correlations

Let x be a random number in [0, 1], in the sense that its binary digits are i.i.d. Bernouilli variables of parameter 1/2. Let p and q be two odd co-prime positive integers. Then the cross-correlation between the binary digits of of px and qx is equal to 1/(pq).

If x is a normal number (the immense majority of irrational numbers are normal numbers), then this statement is also true. It is believed but not proved yet, that SQRT(2) for instance, is normal in base 2. A simple consequence of this fact is that if x and x’ are irrational normal numbers linearly independent over the set of rational numbers, then the correlation between the binary digits of x and x’ is equal to zero. For instance, this is the case if x = SQRT(2) and x’ = SQRT(3) if we could prove that these two numbers are normal.

These results are backed by empirical evidence. The precise sense of what I mean by cross-correlation, and a proof sketch of the above theorem together with empirical results, can be found in the appendix.

Applications

In cryptography, random number generation, simulations, or Fintech modeling, the implication is that if you work with sequences of binary digits associated with irrational numbers, you should choose these numbers such that they are linearly independent over the set of rational numbers. This is the case if these numbers are the square roots of square-free integer numbers.

A more unusual application is with clinical trials or multivariate testing, and more specifically, fractional factorial designs involving a large number of factors. In your design of experiments table, which is typically an m x n binary matrix, to achieve randomness, the entries can ideally be the first n binary digits of m irrational numbers that are linearly independent over the set of rational numbers.

2. New Statistical Techniques Used in Our Analysis

First, I invite you to read the above short section 1.3, which machine learning practitioners and statisticians should be interested in, especially the paragraph about applications. Here, we will focus on something different.

As stated in the introduction we focus here an analyzing many time series consisting of 0’s and 1’s, that look like pure random noise. Yet we found patterns that seem to disprove the theory developed in section 1. However these patterns are artificially created by testing too many things at the same time. This is a typical problem faced by many machine learning practitioners, economists and statisticians, leading to publications and decisions that are fundamentally wrong. We explain how to detect these fallacies and avoid them, mostly via careful simulations. In the process, we also explain how to perform a sound design of experiments.

The n time series in question consist of the first m binary digits of n irrational numbers that are supposed to be unrelated, that is, in statistical terms, independent. Section 2.1. discuss the details of the analysis, while section 2.2, I show how to do things the right way, including doing a better selection of the data that needs to be gathered (the design of experiments work), and faster implementation. In many ways, the work performed resembles very much an analysis of variance, performed using a non-standard, model-free, simple methodology, that even non-statisticians can understand.

2.1. Data, features, and preliminary analysis

The word feature is used as synonymous to variable or metric. The n numbers used in our analysis are irrational numbers produced using n = 100 pairs of seeds y(0) and z(0) in the first recurrence in section 1. For each of the n numbers (referred to as time series), we computed the first m = 20,000 binary digits (referred to as the observations). The k-th pair of seeds (k = 1, …, n) was chosen as follows: y(0) = 1, z(0) = k. Thus the k-th number in this experiment, according to section 1, is equal to

For each time series, we counted the number of occurrences of all the potential 2^5 = 32 blocks of five successive digits: 00000, 00001, 00010, 00011, 00100, 00101, 00111, and so on. Each of these blocks was assigned a unique code, for instance the code for 00101 is 0*1 + 0*2 + 1*4 + 0*8 + 1*16 = 20. The code is always an integer between 0 and 31. We also checked whether these blocks were evenly distributed within each time series, and across all the time series. The source code for these computations can be downloaded here.

One would think that these blocks are distributed independently, within each time series, according to a multinomial distribution of parameters N = m – 5 + 1 = 19,996, with the probability of occurrence for each of the 2^5 blocks being 1 / 2^5. And that the distributions in two different time series would be independent. If you look at average counts, this is correct: the average block count observed within each time series, averaged across the n = 100 time series, is 624.81, while the expected theoretical value is 19,996 / 32 = 624.88. But that’s pretty much the only agreement between theory and observations.

You would also expect a bell curve for these block counts. And at first glance, when plotting the curve based on a very large n (far above n = 100) you get a perfect, very smooth symmetrical curve. Unfortunately, the excess kurtosis is way off, and even though very volatile, it keeps being way off (around 0.63) and always strictly above 0, even when repeating the experiment many times with a different set of numbers. The kurtosis measures how thick the tail of the distribution is.

What’s more, across the 32 blocks, the two extreme blocks (with codes 0 and 31, that is, 00000 and 11111) have a propensity to consistently occur much more frequently than the other blocks, about twice as frequently as the other 30 blocks. The blocks with codes 10 and 21 also occur too frequently, and their propensity to occur, while more reasonable, is well above 1/32 with strong statistical significance.

If that was not enough, one of the 100 time series seems to have an extreme behavior. While the block count, for each block and within each time series varies from around 530 to 730 with an average of 624, the k-th series corresponding to k = 86 has a very low count of 506 occurrences for block 31. This means that the number (-85 + SQRT(72,333)) / 4 has a very low count of 11111 (the block 31) in its first 20,000 digits.

There are two ways to react to these findings:

You can think that something is either wrong (possibly even the source code used for the computations) or it can somehow be explained.
Or you can think that you made a fantastic discovery: these digits are not random at all afterall, and despite everyone saying the contrary, you claim that you found an irrational algebraic number that is not normal. If true this would be a huge mathematical discovery that goes against what all mathematicians have firmly believed in for hundreds of years.

Unfortunately, many practitioners tend to opt for the latter conclusion — the wrong one. We shall see in the next section that there is actually an explanation for the apparent lack of randomness. And it is not an error in the data or the source code. Yes, (-85 + SQRT(72,333)) / 4 truly has a very low count of 11111 among its first 20,000 digits, but it is a false outlier, and very likely it is a normal number like all the other ones. You just have to look at more than the first 20,000 digits for that number.

2.2. Doing it the right way

The analysis performed in section 2.1. has several issues. First, we are doing simulations involving the generation of dozens of millions of uniform deviates. Any auto-correlations or patterns produced by the random number generator may generate biases in the estimated quantities, and erroneous conclusions. This is a case where trying two different types of random number generators, and see if the results agree, makes sense. In particular, the default random number generator function in the programming language used for the tests, has very severe issues: it can only produce 32,767 different numbers. After that, it somehow repeats itself, though things are a bit more complicated than that. See here for more on this topic. I had to write my own random number generator, using the algorithm below:

rnd = 1000
for (k=0; n<20000; k++) {
rnd=(10232193 * rnd + 3701101) % 54198451371
Rand= rnd / 54198451371

}

In the above code, % stands for the modulo operator. In the first line of code, rnd is the seed. I used various seeds to estimate the variances in the quantities involved. The random deviates, uniform on [0, 1], are stored in the variable Rand.

Another issue (though eventually it only had an impact on performance, not on accuracy) is the way the design of experiments was carried out. Testing n irrational numbers using z(0) = 1, 2, …, n-1 results in duplicates and thus lack of independence in the digits of the successive numbers being tested. An example is z(0) = 3 and z(0) = 11, leading respectively to the numbers (-1 + SQRT(3)) / 2 and (-5 + 3 SQRT(3)) / 2, whose binary digit sequences are not independent. The way to address this issue is to carefully choose the z(0)’s, for instance by choosing a list of square-free integers. Such list are available online, see here.

Finally, in some of the programs that use exact arithmetic to perform operations on numbers with millions of digits, the code line x = x / 2 does not perform what it is supposed to do. It does perform a division by 2, but not using exact arithmetic. It results in digits that are completely wrong. The simple fix is to replace x = x / 2 by x = x >> 1, which performs a true, exact division by 2. In other words, use the bit-shift operator to perform that division. It also runs much faster!

Now that we fixed the flaws in the design of experiments stage, we can focus on the core of the problem.

Are the patterns found a statistical illusion, or caused by errors, or real?

Even after detecting and eliminating sources of errors and biases, we still find patterns in a setting that is supposed to be purely random and pattern-free. Here we perform a final statistical deep dive to conclude that the same patterns in question are also present in perfectly random numbers, using simulations. Some of the patterns correspond to illusionary outliers, caused by testing the same assumptions many times without correcting for the large number of trials. Some are just natural, not well-known signatures of randomness: lack of these patterns would actually mean lack of randomness.

Now I describe the simulations and statistical analysis confirming that indeed, we are facing pure randomness. It is an interesting combinatoric problem.

Let W be a word (also called block) consisting k binary digits. Let S be a sequence (also called text or book) consisting of m binary digits, with k ≤ m. Let N(W,S) be the number of occurrences of W in S. For instance, if S = 010001010011 and W = 00, then N(W,S) = 3. Here k is small and m is large: for instance k = 5 and m = 20,000 in section 2.1.

For a positive integer x, a block W of length k and a random sequence S of length m, the number of occurrences of the event { N(W,S) = x } is denoted as P(N(W,S) = x). So x can be viewed as the realization of a discrete random variable X. In particular,

We are, among other things, interested in the distribution of Z = (X – E(X)) / SQRT(Var(X)). Remember, I am checking if all blocks of k = 5 binary digits are distributed as expected (that is, randomly) in the first m binary digits of a bunch of quadratic irrational numbers. To test my hypothesis that this is the case, I need to know the exact distribution of the test statistic for the null hypothesis. The exact distribution is the distribution attached to Z.

Here I summarize some of the findings, full details can be found here. The empirical percentile distribution of Z is pictured in the chart below.

Pattern #1: Non-Gaussian behavior

The computed values of the empirical distribution of Z, based on numerous simulations, are available here. The distribution, as m increases (here k = 5) appears smooth and symmetrical, with a nice bell shape. But there is a big challenge: the excess kurtosis is consistently well above 0, indeed around 0.63. So the tail is thinner than that of a normal distribution. In short, Z does not have a normal distribution.

If instead of one sequence S, you consider n random sequences S(1), …, S(n) all of the same length m, and independent from each other, then the variance for the counts N(W,S), computed across all sequences bundled together, satisfies

Note that in section 2.1., we used n = 100. In our simulations, each sequence is the analogous of one irrational number with its binary digits perfectly randomly distributed. This result about the variance of X can be used to test if sequences found in actual data sets are both random and independent from each other.

The problem is that the successive m − k + 1 blocks W of length k do overlap in any sequence S of length m, resulting in lack of independence between the various counts N(W,S). If the blocks (and thus their counts) were independent instead, then the counts would follow a multinomial distribution, with each of the n (m − k + 1) probability parameters being 1 / 2^k, and Z would be asymptotically normal. Here this is not the case: the excess kurtosis does not converge to zero. There is convergence to smooth, symmetrical distributions (depending on k) as n and m increase, but that limit is never Gaussian. My big question is: what is it then?

That said, for the first two moments (expectation and variance) attached to N(W,S), we get the same values (at least asymptotically) as those arising from the multinomial model. But not anymore for higher moments. This is in perfect agreements with the findings in section 2,1., applied to the binary digits of irrational algebraic numbers! So we’ve explained that weird pattern.

The code to perform simulations and compute the variances, expectations, kurtosis and all the counts N(W,S), is available here. In the code, n is represented by the variable $nbigstrings. Note that the variance and kurtosis depend on S, but they stabilize as n is increasing. The expectation depends only on m and k.

Pattern #2: Illusionary outliers

The expected number for N(W,S) is around 625 per sequence of m = 20,000 random digits. These counts for the vast majority are between 530 and 730 when computed over n = 100 sequences. But the extreme counts can be lower, just by chance. So a low count of 506, as observed in section 2.1., is low but by no means unusually low. Repeat the experiment many times with n = 100, and you are bound to find counts even lower. But a count of 450 would truly be unexpected and would constitute a real outlier. We don’t see this here. So that 506 count is not an outlier, but the fact that over a large number of sequences, the most extreme count will always look too high or too low, just by chance.

Pattern #3: Weird distribution for block counts

With k = 5, we have 2^5 potential blocks of bits such as 01010, 11010, 00001 and so on. One would expect that out of n = 100 binary digit sequences, each consisting of m = 20,000 random digits, to find approximately (m – k +1) / 2^k = 625 occurrences of each block per sequence on average, with a standard deviation close to 22. But blocks 00000 and 11111 have a much higher occurrence, resulting in a much higher standard deviation, see chart below.

Interestingly, in the simulations not involving irrational digits but instead, strings of random bits, we get the same statistics. Again this proves that this mysterious pattern is expected if the binary digits were random. An even distribution in the above chart would indeed correspond to digits that are not random.

2.3. Related articles and book

The material published here will soon be included in my upcoming book New Statistical Methods and Results in Probabilistic Number Theory, focusing on experimental math, and related statistical methods. It is part of a series that include the following books, already published:

Statistics: New Foundations, Toolbox, and Machine Learning Recipes
Applied Stochastic Processes – Chaos Modeling, and Probabilistic Properties of Numeration Systems

More similar articles can be found here. To not miss our upcoming books and articles on this topic, subscribe to our newsletter, here.

Appendix

Here we prove the result discussed in section 1.3, regarding the cross-correlation between the two binary digit sequences of two particular numbers that mimic normal numbers.

Let x be a random number in [0, 1], in the sense that its binary digits are i.i.d. Bernouilli variables of parameter 1/2. Let p and q be two odd co-prime positive integers. We prove here that the cross-correlation between the binary digits of of px and qx is equal to 1 / (pq).

Before starting, let us mention the following related result: the cross-correlation between the terms in the sequences { npx } and { nqx }, n = 1,2,⋯, is 1 / (pq). Here the curly brackets represent the fractional part function, and x is irrational. See proof here. In our case, the relevant sequences are { 2^n px } and { 2^n qx } as the n-th binary digit of x is ⌊2{ 2^n x }⌋. The right brackets represent the integer part function, the curly brackets the fractional part function. Another related result, with proof, can be found here.

The code used to simulate random numbers x and estimate the correlation between the binary digits of px and qx for various values of p, q and x, is available here. It can be used to assess the variability of the estimates, from one sample to the other, and to check the very slow convergence to the theoretical correlation 1 / (pq) if p or q is larger than 5. The chart below is based on one simulated x, computing 10 millions of its random binary digits as well as those of px and qx, with p = 1 and q = 3. The orange line corresponds to the true theoretical value of the cross-correlation, in this case 1/3. The Y-axis represents the correlation estimates computed incrementally over the first n digits of px versus qx, for n = 2, 3, …, 10^7.

Proof of main result (sketch)

First, note that if we shift the digits of either px or qx (that is by multiplying px or qx by a power of two, positive or negative) any apparent cross-correlation in the two digit distributions vanishes. Only one particular shift produces a non-zero cross-correlation, and that’s the shift produced when running the code mentioned above.

Here I will use the following notation:

b(k) represents the k-th binary digit of x
d(k) and d′(k) are the k-th binary digits of px and qx respectively
e(k) and e′(k) are auxiliary variables used in the computations, attached to px and qx respectively

The digits satisfy the recursions

We observe that

The sequences d(k) and e(k−1) are independent; same with d′(k) and e′(k−1)
The digits b(k) behave as i.i.d Bernouilli of parameter 12, by design
The terms in the sequences e(k) and e′(k) are uniformly distributed, respectively on { 0, 2, 4,⋯, 2(p−1) } and { 0, 2, 4,⋯, 2(q−1) }

Thus the cross-correlation between the binary digit sequences d(k) and d′(k) is equal to

Note that p, q are assumed to be odd co-prime integers. As a result, it is easy to prove that d(k)d′(k) = 1 if and only if e(k−1) / 2 = e′(k−1) / 2 (mod 2), and otherwise d(k)d′(k) = 0.

Let us consider the p × q matrix M defined as follows: M(i, j) is a positive integer, with

M(i, j) = 0 if and only if the joint event e(k−1) = 2i, e′(k−1) = 2j never occurs regardless of k. Otherwise M(i, j) is strictly positive.
The sum of the elements of M in any row is equal to q
The sum of the elements of M in any column is equal to p

These three properties define M uniquely. Let’s M∗ = M / (pq). Now M∗(i, j) is the probability that e(k−1) = 2i and e′(k−1) = 2j simultaneously, measured as the asymptotic frequency of this event computed on all observed (e(k),e′(k)). The probability P that d(k)d′(k) = 1 is the sum of the terms of M∗(i, j) over all indices i, j with i = j (mod 2). And of course, the sum of all the terms of M∗(i, j) (regardless of parity) is equal to 1. To conclude, it suffices to prove that P = (pq + 1) / (2pq) and ρ = 2P − 1 = 1 / (pq).

Examples

Here p = 7, q = 11. Non-null entries in M are starred below.

The above starred entries are based on counts computed on 10^6 values of (e(k), e′(k)). These counts are featured in the table below. The binary digits b(k) were generated as i.i.d Bernouilli with parameter 1/2 using the source code shared earlier.

The resulting matrix M is as follows:

Below is the matrix M for p = 31, q = 71. Click on the picture to zoom in.

Go to Source