
Non-uniform random floats (_randommodule.c) #31606

Closed

Conversation

amrali-eg

The original implementation produces output that has some bias in the lower bits of the mantissa (more 0's than 1's). That's because we try to fit 2^53 pigeons into 2^52 available holes, so rounding to even is unavoidable. My solution is to generate just 2^52 pigeons, using the expression (randbits_53 | 1) to produce only the 2^52 odd numbers in the range [1, 2^53-1]. Zero can no longer be generated, but the distribution of mantissa bits becomes more uniform.

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@amrali-eg

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@tim-one
Member

tim-one commented Feb 27, 2022

Sorry, but this is misguided. There is no rounding of any kind going on. Each integer in range(2**53) is equally likely, and dividing it by 2**53 is an exact operation: the infinite-precision result is exactly representable as a 754 double.

Note that while the double storage format has 52 mantissa bits, the value it represents has 53 mantissa bits: a leading 1 bit is implicit (except for tiny subnormal numbers, which are irrelevant here).
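For concreteness, that scheme can be sketched in pure Python along these lines (an illustrative transcription, not the actual C in _randommodule.c; the function name is made up for this example):

import random

def random53(rng=random):
    # Combine two truncated 32-bit outputs into a uniform 53-bit integer;
    # dividing it by 2**53 is exact, so nothing is lost to rounding.
    a = rng.getrandbits(32) >> 5   # keep the top 27 bits
    b = rng.getrandbits(32) >> 6   # keep the top 26 bits
    return (a * 67108864.0 + b) / 9007199254740992.0  # (a*2**26 + b) / 2**53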

@tim-one
Member

tim-one commented Feb 28, 2022

Sorry, your links don't work for me. They land on "bpo-38576: Disallow control characters in hostnames in http.client #18995".

[OOPS! I wrote some nonsense here - deleted it - sorry]

Demonstration that nothing is lost to rounding:

>>> i = 2**52 + 1
>>> i.bit_length()
53
>>> float(i)
4503599627370497.0
>>> i
4503599627370497

See? No info is rounded away.

@amrali-eg
Author

amrali-eg commented Feb 28, 2022

I really didn't get that.
Anyway, what do you think about generating floats like this:

import random
from sys import float_info

def fRandom():
    mantissa = random.getrandbits(52)
    significand = 1.0 + mantissa * float_info.epsilon  # uniform in [1.0, 2.0)
    x = 0
    while not x:
        x = random.getrandbits(32)
        x = x & -x  # bitmask for rightmost 1-bit (exponential)
    return significand / (2 * x)

@tim-one
Member

tim-one commented Feb 28, 2022

Note that there's no reason to expect the mantissa bits in the storage format to be uniformly distributed. For example, the float 1.0 has 52 zero bits in the storage format's explicit mantissa. The float 3.0 has 51 trailing zero bits in the storage format's explicit mantissa. And so on.
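float.hex() makes this easy to see:

>>> (1.0).hex()   # all 52 explicit mantissa bits are zero
'0x1.0000000000000p+0'
>>> (3.0).hex()   # 1.5 * 2**1: one explicit mantissa bit set, 51 trailing zeros
'0x1.8000000000000p+1'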

@amrali-eg
Author

amrali-eg commented Feb 28, 2022

What I mean is: for all possible numbers in [0, 2**53), when divided by 2**53, I plotted the distribution of the bits of the generated numbers.
I also used 'detect-significand-bias.py' from the Rademacher Floating Point Library (on GitLab) to check the bits.

@tim-one
Member

tim-one commented Feb 28, 2022

The method Python uses is the same as was shipped with the original Mersenne Twister code, so is a de facto industry standard. We're not going to change it without superb reason, and so far here there's no reason at all 😉. Note that all the buildbot tests failed here: that's because you changed random()'s output. That's not something that can be done either without superb reason.

I have no idea what you did, and since your links don't work as intended for me, apparently no way to find out. Here's a simple test of the distribution of the last 3 bits you can run yourself:

from random import random
counters = [0] * 8
T53 = 2.0 ** 53
for x in range(1000000):
    r = random()
    i = int(r * T53)
    assert i / T53 == r
    counters[i & 7] += 1
for c in counters:
    print(c)

Here's output from one run:

124729
124854
125701
125369
124791
124641
124833
125082

By eyeball it looks fine.

Please move this to bugs.python.org if you want to pursue it. That's the place for extended discussions, not here. And attach the actual code you used to the issue report.

About your other code, there may be some niche demand for the possibility of generating random floats that can be less than 1/2**53, but, again, we cannot change the output of random.random() without superb reason. People expect to be able to seed the generator to a known value, and get exactly the same results across Python releases.

@tim-one
Member

tim-one commented Feb 28, 2022

BTW, current docs already show a way to get a uniform distribution across the entire range of representable doubles in [0, 1):

https://docs.python.org/3/library/random.html#recipes
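The idea of that recipe is roughly as follows (see the linked docs for the authoritative version): draw a 53-bit significand, then a geometrically distributed exponent.

from random import Random
from math import ldexp

class FullRandom(Random):
    def random(self):
        # 53-bit significand with the leading bit forced on: [2**52, 2**53)
        mantissa = 0x10_0000_0000_0000 | self.getrandbits(52)
        exponent = -53
        x = 0
        while not x:
            # every run of 32 zero bits pushes the exponent down by another 32
            x = self.getrandbits(32)
            exponent += x.bit_length() - 32
        return ldexp(mantissa, exponent)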

@amrali-eg
Author

amrali-eg commented Feb 28, 2022

Here is my code showing the bias:

import numpy as np
from random import random

def significand(xs: np.ndarray):  # extract the low 52 stored mantissa bits of each double
    ss = np.abs(xs).view(np.uint64) & (2**52 - 1)
    return ss

bits = 8
counters = [0] * bits
lst = []
for x in range(100000):
    lst.append(random())

xs = np.array(lst, dtype='float64')
ss = significand(xs)
for x in range(100000):
    mantissa = int(ss[x])
    for j in range(bits):
        counters[j] += (mantissa >> j) & 1
for c in counters:
    print(c)

Here is a sample output

25143
37622
43724
47063
48531
49187
49664
49867


My explanation:
dividing an even number by a power of 2 shifts the mantissa bits to the left during normalization, leaving 0's on the right. So I thought it would be better to avoid even numbers and use only odd numbers to prevent that behavior.
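For instance:

>>> ((2**53 - 1) / 2**53).hex()   # odd numerator: lowest stored mantissa bit is 1
'0x1.fffffffffffffp-1'
>>> ((2**53 - 2) / 2**53).hex()   # even numerator: normalization leaves a trailing 0
'0x1.ffffffffffffep-1'
>>> (2**52 / 2**53).hex()         # numerator divisible by 2**52: all 52 stored bits are 0
'0x1.0000000000000p-1'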

Sorry if I flooded this discussion.
Thanks for your time

@tim-one
Member

tim-one commented Feb 28, 2022

ss = np.abs(xs).view(np.uint64) & (2**52 - 1)

is extracting the last 52 bits of the raw double storage format. As before, there's no reason to expect that to be uniformly distributed. To the contrary, it "should be" highly skewed.

Let's look at the last byte instead, sticking to built-in Python functions:

from random import random
from collections import defaultdict
counters = defaultdict(int)
for x in range(1000000):
    r = random()
    string = r.hex() # e.g., '0x1.c000000000000p+3'
    i = string.index('p')
    lastbyte = string[i - 1]
    counters[lastbyte] += 1
for byte, c in sorted(counters.items()):
    print(byte, c)

with highly skewed output:

0 187891
1 31447
2 62451
3 31247
4 93683
5 31066
6 62588
7 31030
8 125074
9 31042
a 62429
b 31676
c 93564
d 31127
e 62338
f 31347

@amrali-eg
Author

Finally :)

import random
from collections import defaultdict

def unbiased_random():
    return (random.getrandbits(53) | 1) / 9007199254740992.0  # 2**53

counters = defaultdict(int)
for x in range(1000000):
    #r = random.random()
    r = unbiased_random()
    string = r.hex() # e.g., '0x1.c000000000000p+3'
    i = string.index('p')
    lastbyte = string[i - 1]
    counters[lastbyte] += 1
for byte, c in sorted(counters.items()):
    print(byte, c)

A nice uniform distribution:

0 62907
1 62211
2 62538
3 62681
4 62223
5 62398
6 62230
7 62183
8 62914
9 62635
a 62780
b 62323
c 62133
d 62253
e 62846
f 62745

@github-actions

This PR is stale because it has been open for 30 days with no activity. If the CLA is not signed within 14 days, it will be closed. See also https://devguide.python.org/pullrequest/#licensing

github-actions bot added the stale label (Stale PR or inactive for long period of time) on Mar 31, 2022
@sweeneyde
Member

On one hand, this changes the expected value of random() from 1/2 - 2**-54 to 1/2 exactly:

>>> from statistics import mean
>>> mean(x/4 for x in range(4)) # 3/8
0.375
>>> mean(x/8 for x in range(8)) # 7/16
0.4375
>>> mean(x/2**26 for x in range(2**26)) # (2**26-1)/2**27
0.4999999925494194
>>> mean((x|1)/4 for x in range(4))
0.5
>>> mean((x|1)/8 for x in range(8))
0.5
>>> mean((x|1)/2**26 for x in range(2**26))
0.5

# On my hypothetical superfast computer
>>> mean(x/2**53 for x in range(2**53)) # (2**53-1)/2**54
0.49999999999999994
>>> import math
>>> math.nextafter(0.5, 0.0)
0.49999999999999994
>>> mean((x|1)/2**53 for x in range(2**53)) # exactly 1/2
0.5

On the other hand, it makes 0.0 impossible, which probably isn't ideal.

>>> (0 | 1) / 9007199254740992.0
1.1102230246251565e-16

Consistency across versions and implementations, and the loss of one bit of entropy per float, feel like bigger issues to me though, so my suggestion would be to leave the code as is. Anyone needing integer precision should use integer operations.

@tim-one
Member

tim-one commented Apr 1, 2022

I'm closing this, because there's really no chance it would be adopted. As mentioned before, we'd need extremely strong reason to change what random() returns across releases, and there's at best a weak "theoretical wish list" kind of benefit here, but also stronger drawbacks.

Note that while the mean would theoretically (although probably not measurably, in our lifetime) become 0.5, it would no longer be possible for random() to return 0.5. Or 0.25, 0.75, 0.125, or any other rational of the form I / 2**53 for I in range(0, 2**53, 2). It's not just 0.0 that would vanish from the possible outputs. Of course half the possible outputs would vanish regardless, but this systemically excludes the "simplest" outputs.
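For example, 0.5 itself would become unreachable, because its numerator is even:

>>> int(0.5 * 2**53)                  # 0.5 corresponds to the even numerator 2**52
4503599627370496
>>> (4503599627370496 | 1) / 2**53    # the | 1 trick skips right past it
0.5000000000000001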

tim-one closed this on Apr 1, 2022