Dear All,
Just sharing some weekend thoughts. I am looking for an answer to a question that may be so simple and obvious that it is not even discussed, but I would still like to have a mathematical proof of it, even if nobody doubts it.
Suppose I have a random number generator (RNG) that produces a sequence of pseudorandom numbers: r1, r2, r3, r4, ...
The vendor guarantees that the values will appear to be uniformly distributed and will pass the common test batteries (Diehard or similar) that are applied to such generators to confirm they are well behaved.
What I want to confirm is the following:
If r1, r2, r3, r4, ... is a sequence R of uniformly distributed pseudorandom numbers between 0 and 1, then:
First:
d1(r1), d1(r2), d1(r3), d1(r4), ... is also a sequence of uniformly distributed pseudorandom integers from 0 to 9 (where d1() returns the first digit after the decimal separator in the decimal notation of its argument);
and
Second:
d1(R) is uncorrelated with d2(R), where d2() is the analogous function returning the second digit after the decimal separator in the decimal notation of its argument.
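For an ideal, truly uniform variable, both statements can be written out directly. The sketch below assumes U is exactly uniform on [0,1) and expresses d1 and d2 through the floor function. That notation is an assumption for illustration, and the argument covers only the idealized case, not any particular pseudorandom generator.

```latex
% Assumption: U ~ Uniform[0,1); d_1, d_2 are the first two decimal digits of U.
\begin{align*}
d_1(U) &= \lfloor 10U \rfloor, \qquad
d_2(U) = \lfloor 100U \rfloor - 10\,\lfloor 10U \rfloor, \\
% The pair (a,b) occurs exactly when U falls in one interval of length 1/100.
P\bigl(d_1 = a,\ d_2 = b\bigr)
  &= P\!\left(\tfrac{10a+b}{100} \le U < \tfrac{10a+b+1}{100}\right)
   = \tfrac{1}{100}, \qquad a, b \in \{0,\dots,9\}, \\
% Marginals follow by summing over the other digit.
P(d_1 = a) &= \sum_{b=0}^{9} \tfrac{1}{100} = \tfrac{1}{10}, \qquad
P(d_2 = b) = \sum_{a=0}^{9} \tfrac{1}{100} = \tfrac{1}{10}, \\
% The joint probability factorizes, so the digits are independent.
P\bigl(d_1 = a,\ d_2 = b\bigr) &= P(d_1 = a)\,P(d_2 = b)
  \;\Longrightarrow\; \operatorname{Cov}(d_1, d_2) = 0 .
\end{align*}
```

Read in the other direction: if both digits are marginally uniform but dependent, the 100 two-digit cells cannot all have probability 1/100, so the dependence would show up in a frequency test at a resolution of 0.01.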
Alternatively, if one could construct an RNG whose generated sequence passes the tests yet shows a correlated pattern between d1 and d2, that would seriously damage the statement I am trying to confirm.
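To illustrate what a dependence between d1 and d2 would look like in simple diagnostics, here is a minimal sketch of a deliberately flawed sequence. The construction (copying the first digit into the second 5% of the time) is a made-up assumption, not a real generator. It would not be expected to pass a frequency test at two-digit resolution, so it is not the counterexample asked for. It only shows that the checks react when dependence exists.

```stata
* Hypothetical flawed sequence (an assumption for illustration, not a real RNG)
clear
set seed 12345678
set obs 1000000

* First digit: uniform on 0..9
generate byte d1 = floor(10*runiform())

* Second digit: copies d1 with probability 0.05, otherwise uniform on 0..9,
* so its marginal distribution is still uniform
generate byte d2 = cond(runiform() < 0.05, d1, floor(10*runiform()))

* Assemble a number in [0,1) from the two digits plus a uniform remainder
generate double r = (10*d1 + d2 + runiform())/100

* The correlation is close to 0.05 by construction
corr d1 d2

* Pearson chi-squared test of independence on the 10 x 10 table of digits
tabulate d1 d2, chi2 nofreq
```

Both marginal distributions stay uniform here, yet the joint distribution does not, which is the pattern a d1-d2 check is meant to catch.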
One can simply take Stata's built-in RNG for experimentation. Below is a fragment of code that produces a sample of d1-d2 pairs and some quick diagnostics on it. Statistically, the results comply with the statement I am making, but I still have doubts whether this is always the case.
Any thoughts on this?
Thank you, Sergiy
Code:
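clear all
version 17.0
* Fix the seed so the sample is reproducible
set seed 12345678
set obs `=1e6'
* Two-digit integer 0..99 built from the first two decimals of runiform()
generate rnd=int(runiform()*100)
* d1: first decimal digit, d2: second decimal digit
generate d1=int(rnd/10)
generate d2=rnd-d1*10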
Here is the heatmap of the resulting d1-d2 distribution:
Correlating d1 vs d2 results in:
Code:
. corr d1 d2
(obs=1,000,000)

             |       d1       d2
-------------+------------------
          d1 |   1.0000
          d2 |   0.0010   1.0000
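A correlation coefficient only measures linear association, so a small value alone does not rule out other forms of dependence between the digits. As a complementary check, the sketch below rebuilds the same sample and runs a chi-squared test of independence on the 10 x 10 table, plus a goodness-of-fit test of d1 against equal frequencies. The variable names observed and contrib and the local macro expected are introduced here for illustration only.

```stata
* Rebuild the sample with the same seed and construction as the code above
clear all
set seed 12345678
set obs 1000000
generate rnd = int(runiform()*100)
generate d1 = int(rnd/10)
generate d2 = rnd - d1*10

* Second claim: Pearson chi-squared test of independence between d1 and d2
tabulate d1 d2, chi2 nofreq

* First claim: chi-squared goodness of fit of d1 against equal frequencies
preserve
* Collapse to one row per digit with its observed count
* (assumes every digit 0..9 occurs at least once, which holds for large samples)
contract d1, freq(observed)
summarize observed, meanonly
local expected = r(sum)/_N
generate double contrib = (observed - `expected')^2/`expected'
summarize contrib, meanonly
* Degrees of freedom: number of digit categories minus one
display "chi2(" _N-1 ") = " r(sum) ", p = " chi2tail(_N-1, r(sum))
restore
```

Large p-values in both tests would be consistent with the two statements for this particular sample. They would not prove the statements for the generator in general.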
And the histogram of d1 by d2 is: