Speed improvements to loading HDF5 trees #25
I've added some caching logic to the group class. Try out this latest checkin: 1999417. This is not a single-operation recursive load, but I saw roughly a 4x speedup walking the tree for the sample NeXus file. This was using the hsls.py script in the app directory.
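The caching idea above can be sketched as a simple memoization layer over the per-group link request. This is not the actual h5pyd code — `CachedGroup` and `fetch_links` are hypothetical names — just an illustration of why repeated tree walks stop re-hitting the server:

```python
class CachedGroup:
    """Sketch: cache a group's link listing so that repeated tree
    walks don't re-issue the same GET request to the server."""

    def __init__(self, fetch_links):
        # fetch_links: callable that issues the (expensive) server request
        self._fetch_links = fetch_links
        self._link_cache = None  # populated on first access

    def links(self):
        if self._link_cache is None:
            self._link_cache = self._fetch_links()
        return self._link_cache


# Demonstrate with a stand-in for the server request
calls = []

def fake_fetch():
    calls.append(1)
    return ["entry", "data", "instrument"]

g = CachedGroup(fake_fetch)
g.links()
g.links()  # second call is served from the cache; no new request
```

The tradeoff, of course, is staleness: a cached listing won't reflect objects added to the group server-side after the first fetch.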
@rayosborn - did you get a chance to try this out?
I have tested it, but I wasn't sure of the previous speeds because I forgot to do a proper timing before upgrading. I need to revert to the old version. However, I don't think I saw a factor of four. It might have been a factor of two.
There will be some variability based on the latency between client and server. My testing was with a server running on the same LAN. Also, the test driver is different. Did the NeXpy GUI need a lot of mods to work with h5serv? I could set it up in my environment.
I haven't made any changes to the NeXpy GUI yet. In the latest development version on my own clone of the nexusformat API, the
If you push the branch to GitHub, I can just grab it from there. How would I use it to list the contents of a NeXus file?
The file path is the path relative to the data directory. The module converts that to a domain name. The top-level domain is currently 'exfac.org' to match the test repository.
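The path-to-domain conversion works roughly like this, assuming the h5serv convention of reversing the path components DNS-style under the top-level domain. The function name and the extension handling are illustrative, not the actual module code:

```python
def path_to_domain(file_path, top_domain="exfac.org"):
    """Sketch: convert a file path (relative to the data directory)
    into an h5serv-style domain name by reversing path components,
    e.g. 'tall/dset.h5' -> 'dset.tall.exfac.org'."""
    parts = file_path.rstrip("/").split("/")
    # strip the file extension from the final component
    parts[-1] = parts[-1].rsplit(".", 1)[0]
    return ".".join(reversed(parts)) + "." + top_domain


print(path_to_domain("tall/dset.h5"))  # dset.tall.exfac.org
```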
@rayosborn - some updates on this old issue... To compare the performance without the prefetch, you can use:
Thanks, @jreadey. I can't look into this for a couple of weeks, but I plan to soon.
In the nexusformat API, we load the entire HDF5 file tree by recursively walking through the groups in h5py, without reading in data values except for scalars and small arrays. On a local file, we can load files containing hundreds of objects without a significant time delay. For example, a file with 80 objects (groups, datasets, and attributes) takes 0.05s to load on my laptop. However, with h5pyd, the same load takes over 20s.
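The loading strategy described above can be sketched as follows. This is a simplified stand-in for what the nexusformat API does, not its actual code; the size threshold and the dictionary layout are illustrative:

```python
import h5py
import numpy as np

def load_tree(group, max_elements=1000):
    """Recursively walk an HDF5 group, recording shape/dtype/attribute
    metadata for every object, but reading values only for scalars and
    small arrays (here: at most max_elements elements)."""
    tree = {"attrs": dict(group.attrs)}
    for name, obj in group.items():
        if isinstance(obj, h5py.Group):
            tree[name] = load_tree(obj, max_elements)
        else:  # h5py.Dataset
            entry = {"shape": obj.shape, "dtype": obj.dtype,
                     "attrs": dict(obj.attrs)}
            if obj.size <= max_elements:
                entry["value"] = obj[()]  # small enough: read eagerly
            tree[name] = entry
    return tree


# Demonstrate on an in-memory file (no disk I/O)
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    entry = f.create_group("entry")
    entry.create_dataset("counts", data=np.arange(10))       # read eagerly
    entry.create_dataset("image", data=np.zeros((100, 100))) # metadata only
    tree = load_tree(f)
```

With local files each `items()` call and attribute read is cheap, which is why this walk is fast on disk; over HTTP each of those becomes one or more round trips.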
Loading all the items in an HDF5 group requires two GET requests, and sometimes three, for each object. There could be an improvement if all the metadata (shape, dtype, etc.) for each object were returned in a single call, and an even more significant one if all the items in a group could be returned with one GET request. Loading one group of 10 objects took 29 requests in my tests.
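To make the overhead concrete, here is a rough cost model under the assumption of two to three GETs per object (the breakdown into link info, type/shape metadata, and attributes is my guess at where the requests go; the function is purely illustrative):

```python
def request_range(n_objects, min_per_obj=2, max_per_obj=3):
    """Estimated GET requests to load one group, assuming each object
    needs 2-3 requests (e.g. link info, type/shape, attributes)."""
    return n_objects * min_per_obj, n_objects * max_per_obj


low, high = request_range(10)
print(low, high)  # 20 30 -- bracketing the 29 requests observed above
```

At a typical WAN round trip of tens of milliseconds, even 29 requests per group adds up quickly across a tree with hundreds of objects, which is consistent with the 20s load time reported above.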
Binary data reads are fast, though.