Our RNG interface and implementation seem outdated #6171

david-christiansen · 2024-11-22T15:57:21Z

It seems that our standard library's randomness support is not up to our usual standards. I think it's using old technology, and perhaps also old Lean naming conventions from before most of the current library was designed.

We have two separate interfaces to randomness: there's IO.getRandomBytes, and a splittable RNG based on RandomGen. Here, I'm talking about the latter.

First off, the namespacing is inconsistent. Some of the operators are in the IO namespace, like IO.setRandSeed and IO.rand. Most of them are in no defined namespace at all, like randBool, randNat, and stdSplit, making it harder to find them with autocomplete.

But the big issue seems to be that the StdGen in the Lean library uses an algorithm that's known to have poor randomness properties (see paper). It seems to have been based on an old version of the Haskell libraries. I'm far from an expert here, but I believe that SplitMix (which the Haskell libraries have used for the last four or so years) is the state of the art for splittable RNGs, though when I last looked into this a few years ago for a different project, I also ran across a blog post about common implementation mistakes in SplitMix. Git blame says that the Haskell libraries were used as inspiration five years ago, so Lean would have imported them just before they were updated.

For reference: here's the PR that updated the Haskell library in 2020. They considered, but rejected, having separate classes for splittable and non-splittable RNGs, deciding against it primarily due to backwards compatibility. It seems that ours is not used much, so perhaps we should have both?

I found one use of StdGen in Mathlib. Its comments point out what seem to be implementation issues with seeding.

Plausible also uses it. IIRC tf-random, which was a slower predecessor to SplitMix that used the Threefish block cipher to generate new states, came about when the statistical issues with the old Haskell StdGen started causing problems with QuickCheck generators in practice.

Impact

I don't know of anyone being actively blocked by this. It came up while working on the language reference, and I figured it was worth writing down what little I know about the matter for the stdlib team to prioritize as you see fit.

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

The text was updated successfully, but these errors were encountered:

leanprover-bot added the P-medium We may work on this issue if we find the time label Dec 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Our RNG interface and implementation seem outdated #6171

Our RNG interface and implementation seem outdated #6171

david-christiansen commented Nov 22, 2024 •

edited

Loading

Our RNG interface and implementation seem outdated #6171

Our RNG interface and implementation seem outdated #6171

Comments

david-christiansen commented Nov 22, 2024 • edited Loading

Impact

david-christiansen commented Nov 22, 2024 •

edited

Loading