
Supporting V1 dtype #126

Open
SergeyTsimfer opened this issue Jul 13, 2024 · 4 comments
Labels
feature New feature or request

Comments

@SergeyTsimfer

Hey,

Great work with the library!

Is it possible to add support for one-bit dtype (np.dtype('V1')) arrays as input for cc3d routines, mainly connected_components?

@william-silversmith william-silversmith added the feature New feature or request label Jul 13, 2024
@william-silversmith
Contributor

Hi!

Thanks for the suggestion. I've been mulling it over and am trying to decide whether the gain justifies the work. I presume you are trying to reduce memory consumption, but the odds are the output will need to be uint16 or uint32 to hold the final labels, particularly if your input is large enough to justify bit-packing.
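To put rough numbers on that (using a hypothetical 5000x5000x2000 volume purely for illustration), the label output dominates the memory budget no matter how tightly the input is packed:

```python
# Hypothetical volume size, just to put numbers on the trade-off.
voxels = 5000 * 5000 * 2000      # 5e10 voxels

packed_input = voxels // 8       # bit-packed input:  ~6.25 GB
bool_input = voxels              # one byte per voxel: ~50 GB
uint16_output = voxels * 2       # uint16 label output: ~100 GB

print(packed_input, bool_input, uint16_output)
```

So even a perfect 8x reduction on the input saves far less than what the output labels cost.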

I also gave this a try:

>>> arr = np.zeros([32], dtype='V1')
>>> arr.nbytes
32

It doesn't seem like numpy actually represents things as bit-packed?

I would recommend doing arr.astype(bool) or arr.astype(np.uint8).

Can you tell me a little bit more about your use case? How big the input array is, how many components there are, what are the performance constraints etc.?

Will

@SergeyTsimfer
Author

My use case is a ~5000x5000x2000 array with one very large component (a surface along the first two dimensions, so it has ~5000x5000x3 points) and a few thousand smaller ones. This array comes out of a binary segmentation task, so I only need the largest component anyway.
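(For context, extracting just the largest component from a label image is cheap once labels exist; a minimal numpy sketch, where `labels` stands in for the output of a connected components pass:)

```python
import numpy as np

# Stand-in for a label image, e.g. the output of connected_components.
labels = np.array([[0, 1, 1, 0],
                   [2, 2, 2, 2],
                   [0, 2, 2, 0]], dtype=np.uint16)

counts = np.bincount(labels.ravel())
counts[0] = 0                 # ignore the background label 0
largest = counts.argmax()     # label id of the biggest component
mask = labels == largest      # boolean mask of just that component
```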

I am already passing it as a dtype=bool array, and I know in advance that uint16 is enough for the final labels; this fits within my RAM budget, so I'm mainly trying to speed up component extraction. Using a smaller dtype was one of the ideas; however, as you pointed out, for numpy the minimum size of a dtype is one byte :(

On the other hand, the array can be manually packed into uint8, analogous to how np.packbits works. If connected_components were aware of such a packed dtype, would it run faster than it would on a true one-bit dtype?
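(For reference, the packing itself is a single call; a quick sketch of the 8x size reduction np.packbits gives over a bool array:)

```python
import numpy as np

arr = np.ones((4, 1000), dtype=bool)   # bool: one byte per element
packed = np.packbits(arr)              # eight elements per byte

print(arr.nbytes, packed.nbytes)       # 4000 vs 500

# Round trip back to the original shape:
restored = np.unpackbits(packed).reshape(arr.shape).astype(bool)
```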

@william-silversmith
Contributor

Interesting use case. Here are my thoughts:

  1. Have you considered downsampling the image? You can achieve very fast downsamples with my https://github.com/seung-lab/tinybrain/ library. I expect you'd want to use "mode" downsampling.

  2. Bit packing the image could make the algorithm slower or faster. If your image is so large that you are swapping, or the caches are otherwise poorly utilized, the bit-packed input could be a little faster to read. If that's not the case, though, bit-packed access is slower because extra work is needed to extract the right bit for each voxel. Either way, the output image still needs to be written voxel by voxel, and packing won't help with that.

  3. Binary images do have slightly faster algorithms available; the algorithm used here is designed for multi-valued images. You can expect somewhere in the range of a 10-20% performance improvement, maybe a bit more, from those specialized techniques.

  4. For your use case in particular, where the number of shapes is limited and most of the voxels do not need to be analyzed, using an alternative algorithm such as contour tracing may show superior performance. However, this is not guaranteed.

  5. I don't know what your time target is, but you have a 50 gigavoxel image and it will take some time to process. A parallel implementation of cc3d could potentially get you the kind of speedup you are looking for; however, it is not particularly easy to write.
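To illustrate point 2, reading one voxel from a packed array takes a shift and a mask on top of the byte load; a sketch assuming np.packbits' default big-endian bit order:

```python
import numpy as np

bits = (np.arange(64) % 3 == 0).astype(np.uint8)  # arbitrary 0/1 pattern
packed = np.packbits(bits)                        # 8 bytes instead of 64

def get_bit(packed, i):
    # np.packbits is big-endian by default: bit 0 of a byte is its MSB,
    # so voxel i lives at bit position 7 - (i % 8) of byte i // 8.
    return (packed[i >> 3] >> (7 - (i & 7))) & 1

assert all(get_bit(packed, i) == bits[i] for i in range(64))
```

That per-voxel shift/mask is the extra work a connected components pass would pay on every read of a packed input.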

@william-silversmith
Contributor

I did some experiments with parallel and found I got about a 2.3x improvement with 8 cores. I'm probably doing something wrong though.
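(For what it's worth, if the scaling were purely Amdahl-limited, a 2.3x speedup on 8 cores would imply roughly a 35% serial fraction; a back-of-the-envelope check:)

```python
def serial_fraction(speedup, n_cores):
    # Amdahl's law: speedup = 1 / (s + (1 - s) / n), solved for s.
    return (n_cores / speedup - 1) / (n_cores - 1)

print(round(serial_fraction(2.3, 8), 2))   # ~0.35
```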
