[FEATURE REQUEST] Add np.random Module #569
Do you need only the `randn` method?
Atm I only need the `randn` method.
I would vote for the new nomenclature in https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator. However, a more fundamental question is what we should do with the
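The nomenclature linked above is numpy's split between the legacy global-state API (`np.random.seed` / `np.random.random`) and per-instance `Generator` objects. As a language-level illustration only, using CPython's stdlib `random.Random` as a stand-in (not ulab code), the difference looks like this:

```python
import random

# Legacy style: one hidden module-level state, shared by all callers.
random.seed(123)
legacy_first = random.random()

# Generator style: each instance carries its own state, seeded explicitly.
rng_a = random.Random(42)
rng_b = random.Random(42)

# Identically seeded instances reproduce the same stream...
assert [rng_a.random() for _ in range(3)] == [rng_b.random() for _ in range(3)]

# ...and drawing from an instance never disturbs the global stream.
random.seed(123)
random.Random(42).random()       # instance draw in between
assert random.random() == legacy_first
```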
Where should we pull this information from, if there is no underlying OS?
There already is a function
OK, but if that is the case, then you can already use this via the `vectorize` method. Is that prohibitively expensive? Another question is, if we were to re-implement this in

@MichaelRinger Do you have comments on the second question?

@MichaelRinger Did you close this by accident? I was actually going to add the module...
Hi @v923z, sorry to bump this. Did you end up adding the module? I can't seem to find it.

@ammar-bitar No, I haven't added it, because the OP closed the issue, and there was no response to the question after that. Do you need the module? If so, which methods? It would not be too much work to add, but it was not clear to me whether there is interest.

I'd like to upvote this. I think numpy is half-crippled without a vectorized RNG. I'm not sure, though, which RNG has decent entropy while being small enough in code and working-memory size. Specifically, the ESP32 that I'm interested in has a hardware RNG that, according to the docs, yields 32-bit integers in <100 CPU cycles.

I opened a discussion some time ago here: https://github.com/orgs/micropython/discussions/10550, and I think I could pick up the thread from there, and implement the module in a reasonable time. In the meantime, as suggested above, you can use the
Thanks. I wasn't aware of

As for yasmarang, I think it really is too spartan... a 32-bit state is not enough, even for a microcontroller. Especially seeing as in the other thread you were worried about quality (of seeding). I think that any hardware capable of running a python-like interpreter with per-object overhead and a GC can afford a few tens of bytes for an RNG with several hundred bits of state. Even more so if the system is capable of holding

If you do end up implementing an RNG, I implore you to forget micropython's poor choice of yasmarang and consider the much more modern xoroshiro, which has a 32-bit variant with a 128-bit state and a 64-bit variant with a 256-bit state, useful for the fortunate who have double-precision floats. Looking at the

Also, I would advise against taking micropython's (following original Python's) path of having one global static generator state. If we already paid the code size overhead, why hardwire it to a single piece of memory...

Thank you so much for the effort you put into
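For reference, the 32-bit family member with a 128-bit state is usually called xoshiro128+ (or xoshiro128**) in its current form. Below is a minimal Python sketch of xoshiro128+; the state transition follows the published reference algorithm, the seed words are arbitrary non-zero constants chosen for illustration, and Python big-int masking stands in for native 32-bit arithmetic:

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, k):
    # 32-bit left rotation
    return ((x << k) | (x >> (32 - k))) & MASK32

class Xoshiro128Plus:
    # 128 bits of state held as four 32-bit words; 32 bits of output per step
    def __init__(self, seed_words):
        assert len(seed_words) == 4 and any(seed_words), "state must not be all zero"
        self.s = [w & MASK32 for w in seed_words]

    def next32(self):
        s0, s1, s2, s3 = self.s
        result = (s0 + s3) & MASK32     # the '+' scrambler of xoshiro128+
        t = (s1 << 9) & MASK32
        s2 ^= s0
        s3 ^= s1
        s1 ^= s2
        s0 ^= s3
        s2 ^= t
        s3 = rotl32(s3, 11)
        self.s = [s0, s1, s2, s3]
        return result

# arbitrary non-zero seed words (in practice, seed from a hardware register)
rng = Xoshiro128Plus([0x9E3779B9, 0x243F6A88, 0xB7E15162, 0x8AED2A6A])
samples = [rng.next32() for _ in range(4)]
assert all(0 <= x <= MASK32 for x in samples)
```

The core is a handful of XORs, shifts and one rotation per draw, which is the speed argument made above.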
OK, then I would start here https://www.pcg-random.org/.
I think the actual choice of the RNG is irrelevant from the viewpoint of implementation effort. If you don't like a particular RNG, we can always replace it later by a better one. But could we use https://github.com/numpy/numpy/blob/3d8cfd68d8df1af4e25e8dffd0a155fda20dc770/numpy/random/src/pcg64/pcg64.c#L64 as a starting point?
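For illustration, here is a Python sketch of the smaller pcg32 variant from the same family (64-bit state, 32-bit output; the multiplier, seeding sequence and xorshift-rotate output function follow the pcg-random.org minimal reference implementation). The PCG64 code linked above works the same way, just with 128-bit state arithmetic:

```python
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005      # 64-bit LCG multiplier from the PCG reference

class PCG32:
    # pcg32: 64-bit LCG state, 32-bit output via xorshift + random rotation
    def __init__(self, initstate, initseq=0):
        self.state = 0
        self.inc = ((initseq << 1) | 1) & MASK64  # stream selector, must be odd
        self.next32()
        self.state = (self.state + initstate) & MASK64
        self.next32()

    def next32(self):
        old = self.state
        self.state = (old * MULT + self.inc) & MASK64
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59             # top 5 bits of the state pick the rotation
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF

rng = PCG32(42, 54)
draws = [rng.next32() for _ in range(4)]
assert all(0 <= v < 2**32 for v in draws)
```

Per draw this is one 64-bit multiply, one add, and a few shifts, which is the cost discussed later in the thread.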
Again, I feel that this is a minor implementation detail that we could change, if necessary. Do you happen to know how

Also, do you need distributions, too, or just a uniform RNG? Which methods of the `random` module would be important in your application?

You are right, of course, about the actual RNG being immaterial to the

Anyway, as you said, the RNG itself is interchangeable, and you can also choose to support multiple and have them compiled selectively via a configuration file for code-size efficiency (see

True again. But change after deployment is itself expensive. Your discussion thread in the micropython repo proves just that. There is nothing in the yasmarang and seeding code in micropython that prevents your desired use pattern (obtain random seed) or that prevents using yasmarang with multiple generator states instead of a single static one. But their choice not to export the proper interface makes this later extension difficult.

Sure.

Old

Modern

The whole point of the

I would start with the following

In all of the above, I think it's also important to support the

BTW I think this

If you feel like being even more generous to users, these are the second-tier methods. Useful, but not as often as the ones above, and they require a lot more code. I think they are entirely optional in a microcontroller environment:
OK, this is a good point.
Yes, this would be a reasonable approach. This could even be done automatically, based on whether you have floats or doubles on a microcontroller.
I understand that, and there is no disagreement in this regard. But we don't have to stick with
OK.
I agree. You're not the first who noticed this: #592

Just out of curiosity, what is your use case? One reason I didn't originally consider the

My use case is quite trivial: I just need some randomness for real-time graphics with many objects moving on the screen. This is roughly similar to game programming, where, to maintain a reasonable frame rate, I need to be quick about computing many object state updates. Without the array processing that

So for my use case, just having

I don't see why you'd need to plot / analyse, and you definitely don't have to do that on-device. The thing with PRNGs is that you can easily seed them with a known value, generate some sequence data, and store it to a file on an SD card. If you want to be really fancy, you can send that out over Ethernet or WiFi if so equipped.

As for the

I think this would be practical to test up to sequences of several 100 GB using a cheap SD card and a lot of patience to wait. I'm not sure it's necessary to test large volumes of data. Once a PRNG follows the sequence, it will not stray from it, unless its code is insanely written.

For the
In that case, the quality of the generator is not really an issue, I guess. In any case, here is a first implementation: https://github.com/v923z/micropython-ulab/tree/random

```
MicroPython v1.20.0-311-gea1a5e43d on 2023-07-23; linux [GCC 12.2.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> from ulab import numpy as np
>>> rng = np.random.Generator()
>>> rng.random(12)
array([1.688118533849092e-10, 0.7703293013781235, 0.6074833036730274, ..., 0.2697689388821915, 0.6484892791277977, 0.6955782450470177], dtype=float64)
>>> rng.random(size=(5, 5))
array([[0.7724060907456224, 0.1599236337293051, 0.7339406510831528, 0.5274410894040502, 0.6714481027287427],
       [0.6492265779888077, 0.06322272656419292, 0.4808784435223349, 0.8659584576783896, 0.6510361989559311],
       [0.7035330373879642, 0.1143488863528934, 0.3228607724092158, 0.9696441515163603, 0.3353713223418759],
       [0.4890397567261386, 0.8449387836308111, 0.4607909039388876, 0.1653575696739978, 0.4966210702938057],
       [0.3707932026109159, 0.7451473233433786, 0.1918072763583848, 0.7469201739384557, 0.7688914647671239]], dtype=float64)
```
Great. That was fast... Looks good. This certainly goes the numpy way about having as many generator states as needed. Just need a way to seed them differently from each other :-)

And while you're right that the quality of the generator is not an issue, its speed is, so I would love to have an option to use an XOR-based

Something like
Sure. I haven't yet had time to add that, but it should be trivial.
PCG64 has a single integer multiplication and bit shifts. Are you concerned about the time it takes to perform the multiplication?
Well, we've got to be compatible with

```python
seeds = [1, 2, 3, 4, 5]
generators = [random.Generator(seed) for seed in seeds]
```

so I wouldn't waste flash space for having that in C. In fact, if you really want to, you can also hook into https://github.com/v923z/micropython-ulab/tree/master/snippets, and extend the C code with
Yes. I'm working with ESP32 which has an Xtensa LX6 CPU.
My bad, I was trying to be succinct, so I used

To really be compatible with numpy, you'd have to implement a

Here is the gist of my suggestion above; it makes things match
I just read a little more in the numpy docs; apparently this is exactly what numpy does too, they don't delegate to the

Instead, the concrete
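numpy's architecture separates the bit generator (which produces raw 32/64-bit words) from the `Generator` (which turns words into distributions). A pure-Python sketch of that layering, with all names hypothetical and a trivial xorshift32 core standing in for PCG64/xoroshiro, shows how the distribution layer stays independent of the concrete algorithm:

```python
class ToyBitGenerator:
    # Hypothetical bit generator: produces raw 32-bit words.
    # (A tiny xorshift32 core stands in for PCG64 / xoroshiro.)
    def __init__(self, seed):
        self.state = (seed & 0xFFFFFFFF) or 1   # state must be non-zero

    def next32(self):
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        return x

class Generator:
    # Distribution layer: knows nothing about the core algorithm,
    # only calls next32() on whatever bit generator it was handed.
    def __init__(self, bit_generator):
        self.bit_generator = bit_generator

    def random(self, n):
        # uniform floats in [0, 1) from raw 32-bit words
        return [self.bit_generator.next32() / 2**32 for _ in range(n)]

rng = Generator(ToyBitGenerator(12345))
vals = rng.random(5)
assert all(0.0 <= v < 1.0 for v in vals)
```

Swapping in a different bit generator then changes the stream but not a single line of the distribution code.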
We could replace the RNG, if you insist.
I don't really care about compatibility to the very last bit. It should be compatible in the standard use cases, and only in the user-facing interface. In fact, using an XOR-based RNG would already break compatibility, for
I don't want to implement everything, just because it's done so in
Is a function pointer to an inline function going to work? I thought that you wanted to inline the function, so that you can save on execution time. Now, using a pointer to a function is not going to speed up the code.
I don't. I'm just hoping we can keep it open to additions.
I agree. I would keep PCG64 and add an optional xoroshiro. If you leave it open to extension, I might find time to add that later myself (under a #if config).
No, obviously a function called via a pointer cannot be inlined. Good point.

Anyway, since there is no intention to make this open to adding
I don't mind having multiple RNG implementations, and as you pointed out, performance could be platform-dependent.
No, that should be trivial. Either we simply decide this at compile time with the #if directive, or we find a way of switching between various implementations at run time. In any case, it would be great if you could report on execution time/performance, when all the dust settles.

One thing to keep in mind, however, is that it's not only the flash for the RNG that you have to pay for, but the loops, too. If you want to inline the RNG, then you've got to write out the loops for each case that you implement. So, you would have something like this (pseudocode):

```c
if(RNG == PCG64) {
    i = 0
    while(i < ndarray.shape[3]) {
        j = 0
        while(j < ndarray.shape[2]) {
            k = 0
            while(k < ndarray.shape[1]) {
                l = 0
                while(l < ndarray.shape[0]) {
                    // PCG64 implementation here
                    *array = ...
                    array += ndarray.strides[0]
                    l++
                }
                array += ndarray.strides[1]
                k++
            }
            array += ndarray.strides[2]
            j++
        }
        array += ndarray.strides[3]
        i++
    }
} else if(RNG == xoroshiro) {
    // the same four nested loops repeated verbatim,
    // with the xoroshiro implementation inlined instead
    ...
}
```

There is no other way of avoiding the function call overhead.
That would be true if the inner operation was on the order of one or two cycles. Since we are dealing with

```c
int const algo_index = algo_index_for_bitgen_type(figure_out_type(bitgen_argument));
void * const state_ptr = get_state_pointer(bitgen_argument);

while(...)
    while(...)
        while(...) {
            register uint32_t bits;
            switch(algo_index) {
                case 0: bits = algo_0_next(state_ptr); break;
                case 1: bits = algo_1_next(state_ptr); break;
                ...
            }
            *array = do_something_with(bits);
        }
```
OK.
The switch can actually always be placed outside the innermost while, which would amortize it better and might enable use of zero-overhead loops:

```c
while(...)
    while(...)
        while(...)
            switch(algo_index) {
                case 0: for(i = 0; i < inner_extent; ++i, array += inner_stride) *array = do_something_with(algo_0_next(state_ptr)); break;
                case 1: for(i = 0; i < inner_extent; ++i, array += inner_stride) *array = do_something_with(algo_1_next(state_ptr)); break;
            }
```

Also, you could (and perhaps already do) mark ndarrays as contiguous when they are, and then run lower-dimensionality loops on them. That would save time on loop code when users run something on arrays of shape (1000, 2, 1) and stride (2, 1, 1) by looping on shape (2000,) stride (1,) instead. But that is all second order; I don't think it's really necessary.
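The amortized-dispatch idea can be prototyped in Python as well: resolve the algorithm choice once, outside all loops, and let each kernel consume a whole inner extent. The names and toy kernels below are hypothetical stand-ins, not ulab code:

```python
MASK64 = (1 << 64) - 1

def kernel_a(n, state):
    # stand-in for the inlined PCG branch: a bare 64-bit LCG step
    out = []
    for _ in range(n):
        state = (state * 6364136223846793005 + 1) & MASK64
        out.append(state >> 32)
    return out, state

def kernel_b(n, state):
    # stand-in for the inlined xoroshiro branch: a bare xorshift64 step
    out = []
    for _ in range(n):
        state = (state ^ (state << 13)) & MASK64
        state ^= state >> 7
        state = (state ^ (state << 17)) & MASK64
        out.append(state >> 32)
    return out, state

KERNELS = {"pcg": kernel_a, "xoroshiro": kernel_b}

def fill(shape, algo, state):
    kernel = KERNELS[algo]        # dispatch resolved once, outside all loops
    rows = []
    for _ in range(shape[0]):     # outer loop stays generic
        row, state = kernel(shape[1], state)  # kernel handles the inner extent
        rows.append(row)
    return rows

data = fill((3, 4), "pcg", 1)
assert len(data) == 3 and all(len(row) == 4 for row in data)
```

The outer iteration stays algorithm-agnostic; only one branch per row is paid, analogous to the switch outside the innermost while.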
You can specify the seed now: an integer, or a tuple of integers.

Thanks. I'm going on vacation for a while. Will test this when I get back.
Hello all, I see there was quite some progress here 6 months ago. Any expectations on when it could reach the closed/prebuilt stage? I would love to use this.

I think the only outstanding issue is the

Awesome, I don't want to rush anything, so I'll patiently wait for "random". Glad that this is not dead. Thanks! ;)

I think #654 is pretty close to what you wanted, so we could in principle merge that, and add new features in separate PRs.

Amazing work! Yeah, that would unblock me for a start. If I find anything missing, I will let you know, but this is HUGE! Thanks a lot!

Completed in #654.
Hi Zoltán, I was looking for information on Yasmarang for my home-made RNG when I came across this thread. I have implemented an RNG on the ESP32 by directly calling the appropriate register in micropython, and I was hooked and implemented another one for the ESP8266 and one for the RP2040. The thing is, I'm not a great specialist in random generators and I'm wading a bit... I've published my source code, documentation, datasets and tests, so if you could have a look and give me your impressions, I'd be delighted. Could you tell me whether these RNGs seem to be sound (i.e. whether they generate sufficiently unpredictable and random sequences), and whether they meet the entropy criteria expected to guarantee a good level of security and diversity? PS: I'm more into drones and geomatics.
@MicroControleurMonde I can't claim to be an expert on the question of how good a random number generator is. As you can also see from this thread, at the time of the implementation, @mcskatkat had a rather strong opinion on the issue, so it might be worthwhile to ping them. In any case, the site https://www.pcg-random.org/ gives a rather good overview of the features one has to watch out for. There is also a link to a publication that is in-depth and detailed. I think you would find that useful.

Thank you for the swift response, Zoltán. Yes, I've started looking at pcg-random.org. A little stressful for me... I'm going to start digging around. I just received my Pyboard v1.1 board with its hardcoded RNG today. I'm going to do a test series and have a little fun with it this weekend. @mcskatkat any advice for a poor geomatician guy like me?

A quick feedback regarding the Pyboard and its hardcoded RNG:

Time spent to generate 200'000 numbers: 108.295601 seconds

I'm surprised, it's going at good speed!

Entropy = 3.458180 ... not good enough!
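For context, figures like this are typically byte-wise Shannon entropy, where ideal random data scores 8.0 bits per byte (so 3.46 is indeed poor). A sketch of the computation, assuming an `ent`-style byte histogram:

```python
import math
from collections import Counter

def shannon_entropy_bits_per_byte(data):
    # Byte-wise Shannon entropy over the observed histogram;
    # 8.0 bits/byte is the ideal score for uniformly random bytes.
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform sweep over all 256 byte values reaches the maximum:
assert abs(shannon_entropy_bits_per_byte(bytes(range(256))) - 8.0) < 1e-9

# A heavily biased stream scores far lower:
assert shannon_entropy_bits_per_byte(bytes([0] * 900 + [1] * 100) ) < 1.0
```

Note that a high score only rules out gross bias; passing this test says nothing about sequence correlation, which is what suites like Dieharder probe.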
I'm not an expert on the mathematical properties of RNGs either. Serious RNG research is a very deep field in which many PhDs have been written.

What I do know is that random registers on devices that have them are not intended to be used to generate streams of random numbers. The common practice is to use them to seed a mathematically robust pseudo-random generation algorithm, thus providing some amount of true randomness to what is otherwise always a deterministic sequence.

As for the speed of ~2k numbers per second, you didn't mention how many bits are in each number, but even if those are 128-bit numbers, 500 microseconds per scalar cannot be considered fast. Personally, I wrote a simulation that ran on a GPU and produced ~10 billion high-quality, 64-bit, accurately Gaussian-distributed random numbers per second. That's about 5M times faster, and it was on hardware that would now be ~13 years old.

Also note that physical-phenomenon random registers often have very specific instructions on how they need to be operated. This usually involves allowing some design-determined time to pass between taking samples, to let the physical phenomenon develop sufficiently to avoid sample correlation. This could explain why, if you generate sequences of numbers by repeated sampling of the register, you get unsatisfactory performance in entropy tests.

I couldn't figure out exactly what you are trying to accomplish in your project, but I hope this helps.

@mcskatkat Thanks, this is useful insight!

Yes indeed it is. Thank you very much, Mc Skatkat.
You are not an expert in the mathematical properties of RNGs, of course, but you still have an excellent understanding of the thing. It helps.

Concerning the random registers of MCUs equipped with an RNG, such as the ESP32 and the STM32F405, here is what I found and understood:

Number of bits for each number: I thought I had given it, silly me ...

In the case of the RP2040 and the ESP8266, I ‘tweaked’ a grinder based on reading the ADC converter register, mixed with SHA256 sauce (whitening).

My project: to test the most common MCUs available to me in terms of performance and production. It's far from my field of geomatics, but given that my drones, lidars and GPS cards are based on these little monsters, it's a good idea to check out what they've got and what they're worth. A small ESP32 or RP2040 MCU mounted on a card is only worth US$5 ... and in fact it allows you to do a lot of things, but it also has its own limitations. If I want to do triple-axis (X, Y, Z) coordinate collection with a bit of real-time processing + radio transmission, it's better to go for more robust Cortex ARMs like the STM32, for example.

The speed of the TRNG on an ESP32 or a Pyboard is still very good. Don't forget that the concept is still to test the relevance of small 32-bit MCUs. So inevitably, we're going to find ourselves very far from a 64-bit CPU or a small GPU.

Note: ‘~10 billion high-quality, 64-bit, precisely distributed Gaussian random numbers per second’

Today, I ported the PCG32 that you suggested and generated a sample of 2,000,000 values. Here are the results:

Target Values: 2000002

If I have enough time, I will test my batch tonight and will push the results tomorrow. Cheers ^^
The
I still can't say I understand where you are going with this. They are put there for the sole purpose of letting a tiny bit of controlled non-determinism seep into computer systems, which otherwise spend a lot of energy and design effort on suppressing the natural randomness of the physical world they operate in, as they strive to produce deterministic, predictable and repeatable results.
If you just want to experiment with compute on MCUs (which is indeed surprisingly good per $), there may be more promising paths such as learning to use their dedicated HW accelerators. You don't need to go to ARM for that. Xtensa (the CPU on ESP32) is actually very capable and very extensible, as its name implies. If producing one output (random numbers or anything else) requires, say, 16 compute operations, it should come close to ~30M 32-bit values per second when running carefully designed software.
Which
Quickly: I will make a more elaborate answer tomorrow. It all started several weeks ago when I looked at the block diagram of my ESP32-S. That's when I wondered what the Espressif chip's hard-coded functions were worth. Digging around in the source code, I couldn't find an RNG function for the ESP module. So, I went to the Espressif doc to find the information and wrote a little library that reads the correct register. It all started from there. To my great surprise, the Xtensa LX6 core is extremely swift for a chip that was produced in 2016. However, it does not have a SIMD vector unit or hardware acceleration. Correct me if I'm wrong. I can't get my hands on an ESP32-S3 card any time soon here.
No, the one in
Oh OK, I understand now. |
Yes, we agree. I would also be surprised if an MCU with TRNG was subjected to hundreds of thousands of random number generation requests per day... It doesn't make much sense from what I understand. Now there's the reality of tests like NIST, Dieharder and others... And there, on the other hand, you need a minimum sample volume to test things!
This is so true. I've plotted the disparity of the data over various random draws, and the problem is that the dispersion is pretty bad. The sources of entropy aren't that entropic. Lol.

On a more serious note, the variation intervals are quite small. As part of what I do, I also require the ability to ‘encrypt’ communications. This explains my
The module is useful for machine learning related applications.
I would be glad if someone could add the np.random.randn function, since I don't know how to do it in C.