
numpy.load() feature upgrade #644

Open
hamza-712 opened this issue Aug 9, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@hamza-712

hamza-712 commented Aug 9, 2023

numpy.load() feature upgrade
Hi,
Can you please add a very useful feature of using memmap to load only a part of numpy array from a file?
For example

  1. Let's create a memory-mapped array in write mode:
import numpy as np
nrows, ncols = 1000000, 100
f = np.memmap('memmapped.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))
  2. Let's feed the array with random values, one column at a time, because our system's memory is limited:
for i in range(ncols):
    f[:, i] = np.random.rand(nrows)
x = f[:, -1]
del f

### READING

f = np.memmap('memmapped.dat', dtype=np.float32,
              mode='r', shape=(nrows, ncols))
np.array_equal(f[:, -1], x)  # -> True
del f

Additional context
https://numpy.org/doc/stable/reference/generated/numpy.load.html
For example, supporting numpy.memmap inside numpy.load().
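For reference, CPython's numpy already supports this through load()'s mmap_mode argument: the returned object is a numpy.memmap, so slicing pulls only the requested rows off disk. A minimal sketch of the behaviour the request asks for (file name is illustrative):

```python
import numpy as np

# Write a small array to a standard .npy file.
arr = np.arange(12, dtype=np.float32).reshape(4, 3)
np.save('data.npy', arr)

# mmap_mode='r' maps the file instead of loading it; slicing the
# returned memmap reads only the requested rows from disk.
view = np.load('data.npy', mmap_mode='r')
block = np.array(view[1:3])  # copy just rows 1-2 into RAM
```
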

@hamza-712 hamza-712 added the enhancement New feature or request label Aug 9, 2023
@v923z
Owner

v923z commented Aug 9, 2023

I believe this is actually much more than just reading part of the file, at least, this is what I understand from this: https://numpy.org/doc/stable/reference/generated/numpy.memmap.html. Basically, you don't load anything with memmap, you just create a pointer to data on the disc, so if you take the method sum as an example, sum has to know how to handle data that are not stored in RAM, and that is highly non-trivial.

@hamza-712
Author

Could you implement a way to save numpy arrays in append mode? Similarly, a way to read a partial subarray of a numpy array with some kind of 'offset' variable.

@v923z
Owner

v923z commented Aug 13, 2023

Can you point to the relevant documentation?

@hamza-712
Author

For appending arrays there is a library, though it is not part of the official numpy docs:
https://pypi.org/project/npy-append-array/

For reading, I haven't seen any approach implemented other than h5py or numpy.memmap:
https://numpy.org/doc/stable/reference/generated/numpy.load.html

@v923z
Owner

v923z commented Aug 14, 2023

I feel that we're rapidly going off on a tangent, but still, here are a couple of comments:

  1. Dtype mod #327 implements more or less what you want. As I said, your request is not trivial, and we have to tread carefully here. It's no accident that it hasn't yet been merged, but we could dust it off.
  2. As you pointed out, npy-append-array is not part of numpy, which leads me to the question of whether what you would like could/should be implemented not at the C level, but in Python. If so, the next question is what you would need for that. Would it help if you had a method that simply lays bare the binary contents of an ndarray's pointer, which you could then write to a file from Python? We could extend the methods of https://github.com/v923z/micropython-ulab/blob/master/code/utils/utils.c and add one that gets you the ndarray's raw bytes. You would then manipulate the header of your .npy file from Python.
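A rough sketch of what such a Python-level append could look like, assuming the proposed utils method existed. Here CPython numpy's tobytes() stands in for it, and the two-uint32 header layout (append_rows, the file name) is purely illustrative, not numpy's .npy format:

```python
import os
import struct
import numpy as np

def append_rows(path, rows):
    # Illustrative header layout: two little-endian uint32 values
    # holding (nrows, ncols), followed by the raw array payload.
    data = rows.tobytes()  # in ulab, the proposed utils method would supply these bytes
    try:
        with open(path, 'r+b') as f:
            nrows, ncols = struct.unpack('<II', f.read(8))
            f.seek(0, 2)      # jump to the end of the file
            f.write(data)     # append the new rows
            f.seek(0)         # rewrite only the header
            f.write(struct.pack('<II', nrows + rows.shape[0], ncols))
    except OSError:           # file does not exist yet: create it
        with open(path, 'wb') as f:
            f.write(struct.pack('<II', rows.shape[0], rows.shape[1]))
            f.write(data)

if os.path.exists('append_demo.bin'):
    os.remove('append_demo.bin')
append_rows('append_demo.bin', np.ones((2, 3), dtype=np.float32))
append_rows('append_demo.bin', np.zeros((2, 3), dtype=np.float32))
with open('append_demo.bin', 'rb') as fh:
    total_rows, total_cols = struct.unpack('<II', fh.read(8))
```

Only the 8-byte header is rewritten on each append; the payload is written once, at the end of the file.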

@hamza-712
Author

  1. I need it for ndarray file reading. It's basically audio data that I'm loading from an SD card into an ESP32. I have implemented file handling in MicroPython, but it's slow compared to the normal numpy.load(): numpy.load() usually takes around 4 ms for this size of data, while my implementation takes around 50 ms.

  2. I have made a rudimentary Python-based implementation that puts a header at the beginning of a binary file. The header needs to be edited in append mode, which slows down the write process.

> a method that simply lays bare the binary contents of an ndarray's pointer, which you could then write to a file from python? We could turn the methods of https://github.com/v923z/micropython-ulab/blob/master/code/utils/utils.c, and add one that gets you the ndarray. You would then manipulate the header or your .npy file from python.

This would help a lot. Anything that lays out pointers to rows is good enough.

@v923z
Owner

v923z commented Aug 27, 2023

It's not quite clear to me what your vision for such a function would be. The way you describe it seems to indicate that you'd need access to data that is not contiguous. Is that the case?

@hamza-712
Author

Let's say I have my data stored as an ndarray of shape (1000, 7) in a file. I want to retrieve only a block of shape (10, 7) from the file without bringing the whole ndarray into memory.

The function should allow reading some block of rows. The function I have implemented in Python can only read contiguous rows from the file:

def filereader(rows_to_read=1, offset_index=0):
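A minimal sketch of such a contiguous-row reader, assuming float32 data, a known column count, and a headerless raw file (the function body, ITEMSIZE, and the file name are illustrative, not the implementation from the thread):

```python
import numpy as np

ITEMSIZE = 4  # bytes per float32 item

def filereader(path, rows_to_read=1, offset_index=0, ncols=7):
    """Read a contiguous block of rows from a raw float32 file.

    ncols is assumed known (e.g. from a header); offset_index is the
    first row to read. Only rows_to_read * ncols values are loaded.
    """
    with open(path, 'rb') as f:
        f.seek(offset_index * ncols * ITEMSIZE)  # skip preceding rows
        buf = f.read(rows_to_read * ncols * ITEMSIZE)
    return np.frombuffer(buf, dtype=np.float32).reshape(rows_to_read, ncols)

# Demo: write a (1000, 7) array raw, then read back rows 100-109 only.
data = np.arange(1000 * 7, dtype=np.float32).reshape(1000, 7)
data.tofile('rows.dat')
block = filereader('rows.dat', rows_to_read=10, offset_index=100)
```
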

@v923z
Owner

v923z commented Aug 28, 2023

OK, so one thing we could do is add the numpy-incompatible keywords offset and count to load, so that you could start from a particular place, and read a given number of values.

There might be an issue, and I don't quite know how to handle that: if you want to add offset and count, then you have to know beforehand what the shape in the file is, otherwise, you might request something that's not compatible with the contents of the file.
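For comparison, CPython numpy's fromfile() already takes exactly these two keywords, offset (in bytes) and count (in items), which could serve as a model for their semantics on load():

```python
import numpy as np

a = np.arange(20, dtype=np.float32)
a.tofile('raw.dat')  # raw binary dump, no .npy header

# Skip the first 5 items (5 * 4 bytes), then read 10 items.
part = np.fromfile('raw.dat', dtype=np.float32, offset=5 * 4, count=10)
```

Note that fromfile() sidesteps the shape-compatibility question by returning a flat array; load() would have to validate offset and count against the shape recorded in the .npy header.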

@hamza-712
Author

I have created a header struct in my Python implementation which keeps track of the array dtype and shape.

        header_format = "BBHH"
        header_data = ustruct.pack(header_format, byte_size, array.dtype, row_dimension, column_dimension)

Moreover, it's better if the write operation supports only overwrite mode, so that we don't have to edit the header again and again.
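For completeness, reading that header back is the mirror of the pack() call above; a sketch using CPython's struct (MicroPython's ustruct exposes the same API), assuming the dtype is stored as a one-byte character code:

```python
import struct  # MicroPython's ustruct has the same pack/unpack API

header_format = "BBHH"
header_size = struct.calcsize(header_format)  # native alignment may pad this

# Pack as in the snippet above, with 'f' (float32) as the dtype code
# (assumption: the dtype fits in one byte as a character code).
packed = struct.pack(header_format, 4, ord('f'), 1000, 7)

# A reader unpacks the header before touching the payload, so it knows
# the dtype and shape without loading the whole array.
byte_size, dtype_code, nrows, ncols = struct.unpack(header_format, packed)
```
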

@v923z
Owner

v923z commented Aug 29, 2023

What you're saying here doesn't address the issue I mentioned earlier. If we add a keyword or something like that to load, then we cannot rely on the fact that you know everything about the file that you're going to read. So, if the file contains data that were of the shape (4, 4, 4), which is 16 entries, but you're trying to read into a shape (2, 5), what should happen?

Also, the title of this thread is "numpy.load() feature upgrade", so we shouldn't talk about write operations here. Even memmap is about reading from a file, not writing to it. I have the feeling that we're dealing with feature creep here. Could you please define exactly what this new feature of the load function should do?

We might actually be better off adding the function to utils, if you really need it.

@jonnor

jonnor commented Aug 11, 2024

I wrote an implementation of .npy file loading/saving for MicroPython, which also supports streaming reading of data. The streaming API is different from the numpy.load() one, to allow accessing/validating the metadata/structure information before actually reading the data. https://github.com/jonnor/micropython-npyfile?tab=readme-ov-file#streaming-read
