Can it be made a more transparent drop-in for ndarray? #21
Hello, implementing It's possible to assign I'm not too sure about integrating a general lazy To implement
I spent much of the last week implementing lazy reshape. And yes, it was complicated. I ended up rewriting about 90% of the code. I got it working, and I tested quite a few combinations of transpose, reshape, and slice, but I'm sure there are some corner cases where it will fail. After I've had more time to play with it I'll push my changes to my fork, but it's such a huge change that I doubt a PR is what you want. I'll post here again when I feel it's ready for other eyes and you can let me know how you feel. I took a quick look at dask but it didn't seem to meet my use case. I should look again.
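Lazy reshape after a transpose is the genuinely hard combination, and plain numpy illustrates why (this is an independent illustration, not code from the thread): a reshape of a contiguous array is a strided view of the same buffer, but the same reshape applied after a transpose cannot be described by any stride pattern over the original buffer, so numpy silently falls back to a copy. A lazy wrapper has to detect exactly these cases.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Reshaping the contiguous array is a view: same buffer, new strides.
flat = a.reshape(-1)
print(np.shares_memory(a, flat))      # True

# After a transpose, the flattened order is 0, 4, 8, 1, 5, ... which no
# stride pattern over the original buffer can produce, so numpy copies.
t = a.T
flat_t = t.reshape(-1)
print(np.shares_memory(t, flat_t))    # False
```

This is why a lazy `reshape` composed with `lazy_transpose` needs its own bookkeeping rather than simple stride arithmetic.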
@cboulay
I'm not quite ready to say it's suitable for a PR, but I'll post the main commit for reference in case I get otherwise distracted and someone wants these features without waiting for me to clean it up more. As I was working on it, I thought it would have been better to use
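One general mechanism for making a wrapper class a more transparent drop-in for ndarray is NumPy's `__array_function__` protocol (NEP 18), which lets a non-ndarray object intercept calls like `np.transpose`. The sketch below is hypothetical (the `LazyView` class is invented for illustration and is not the project's `DatasetView`), assuming the goal is to record operations lazily and only touch the data on demand:

```python
import numpy as np

class LazyView:
    """Toy lazy wrapper (hypothetical, not the project's DatasetView).

    Records an axis permutation instead of moving data, and lets
    np.transpose dispatch to it via the __array_function__ protocol.
    """

    def __init__(self, data, axes=None):
        self._data = data
        self._axes = tuple(axes) if axes is not None else tuple(range(data.ndim))

    def __array_function__(self, func, types, args, kwargs):
        if func is np.transpose:
            # Record the permutation; do not touch the underlying data.
            axes = kwargs.get("axes") or (args[1] if len(args) > 1 else None)
            if axes is None:
                axes = tuple(reversed(range(len(self._axes))))
            return LazyView(self._data, [self._axes[a] for a in axes])
        return NotImplemented

    def compute(self):
        # Only here is the permutation actually applied.
        return np.transpose(self._data, self._axes)

lv = LazyView(np.arange(6).reshape(2, 3))
lt = np.transpose(lv)         # no data moved yet
print(lt.compute().shape)     # (3, 2)
```

With this protocol, user code can call the ordinary `np.transpose` on the wrapper, rather than needing to know about a method like `lazy_transpose`.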
I'm trying to see how far I can take my ~50 GB hdf5 datasets through my processing pipeline before explicitly creating an ndarray. My pipeline uses a framework (Neuropype) that puts the ndarray in a container along with some metadata and makes extensive use of ndarray functions returning views. I think I could get a lot further in this framework with my h5 dataset if a wrapper class like `DatasetView` over an `h5py` dataset reimplemented some of those ndarray functions that return views.

- Are there any downsides to renaming `lazy_transpose` to `transpose`?
- Do you foresee any problems with a lazy implementation of `reshape`?
- I'm also considering a custom implementation of `squeeze`. numpy users expect `flatten()` to return a copy, so probably not that one.
- What about `min`, `max`, `argmin`, `argmax`, `any` and `all` when an axis is provided? Even though all of the data will have to be loaded into memory eventually, it can be done sequentially row-by-row (or column-by-column), so maybe this will help avoid out-of-memory errors.

I am fairly new to processing data cached on disk, so I'm hoping others with more experience can tell me if this is a bad idea from the outset.
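The streamed-reduction idea in the last bullet can be sketched in plain numpy. This is an illustration, not code from the project: `chunked_max` and `get_chunk` are hypothetical names, and the `get_chunk(start, stop)` callable stands in for slicing an `h5py` Dataset, which reads only the requested rows from disk. Peak memory is then bounded by one block rather than the whole ~50 GB dataset.

```python
import numpy as np

def chunked_max(get_chunk, n_rows, chunk_rows):
    """Compute max over axis 0 by streaming blocks of rows.

    get_chunk(start, stop) returns rows [start, stop); for an h5py
    Dataset `ds` it would be `lambda a, b: ds[a:b]`.
    """
    result = None
    for start in range(0, n_rows, chunk_rows):
        block = get_chunk(start, min(start + chunk_rows, n_rows))
        block_max = block.max(axis=0)
        result = block_max if result is None else np.maximum(result, block_max)
    return result

# Simulate a dataset with an in-memory array.
data = np.arange(20).reshape(5, 4)
out = chunked_max(lambda a, b: data[a:b], n_rows=5, chunk_rows=2)
print(out)  # equal to data.max(axis=0)
```

`min`, `any`, and `all` combine across blocks the same way; `argmin`/`argmax` additionally need to carry the running index alongside the running value.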