Rich visualization of large datasets #24
Also, this repo isn't very active, so I'm wondering if this project is still actively supported?
thanks for your interest and the pointer to the library. this repo is actively supported, but it gets updated only as relevant to dandisets and as contributors have interest in it. indeed, dandi supports remote reading in various formats, and the kind of example you want should exist in several dandisets. i would suggest focusing on the microns and ibl datasets, as they both have a very rich set of recordings. does the library work in the absence of a gpu? for example, would it work on a google colab notebook, with or without a GPU?
Hi, I finally got around to trying this, but it's very slow; the network seems to be the bottleneck. I'm on wifi getting ~5 MB/s (bytes, not bits), so I will also try wired.

```python
from dandi.dandiapi import DandiAPIClient
from fsspec import filesystem
from fsspec.implementations.cached import CachingFileSystem  # imported but unused below
from h5py import File
from pynwb import NWBHDF5IO  # imported but unused below

dandiset_id = "000168"
file_path = "jGCaMP8f/jGCaMP8f_ANM471993_cell01.nwb"

# Get the location of the file on DANDI
with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset_id, "draft").get_asset_by_path(file_path)
    s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)

# Create a virtual filesystem based on the http protocol; note that this
# reads directly over HTTP, with no caching of accessed data.
fs = filesystem("http")
file_system = fs.open(s3_url, "rb")
file = File(file_system, mode="r")
data = file["acquisition"]["Registered movie 0"]["data"]
```

Fetching single frames: (timing screenshot)
EDIT: My network is definitely not the bottleneck. The limit could be bandwidth on the storage side, or overhead from the way the data is accessed: the dandi client, the virtual filesystem layer, or NWB/HDF5 (I doubt it's NWB/HDF5). Are there any datasets hosted somewhere known to have very high bandwidth, to help rule this out?

EDIT 2: For this particular dataset, fetching single frames is also quite slow even when the data are on local disk. That is separate from the network issue, though, and is possibly due to this dataset's chunk size or compression. The network is still the main bottleneck: (timing screenshot)
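Whether chunking or compression is to blame is straightforward to check, since h5py exposes the storage layout on the dataset object. A quick diagnostic, reusing the `file` handle from the snippet above:

```python
# Inspect the dataset's storage layout: a chunk shape covering many
# frames means each single-frame read decompresses far more data than
# it returns, which is slow even from local disk.
dset = file["acquisition"]["Registered movie 0"]["data"]
print("shape:      ", dset.shape)
print("chunks:     ", dset.chunks)       # None means contiguous storage
print("compression:", dset.compression)  # e.g. 'gzip'; see also dset.compression_opts
```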
You'd need ~30 ms or less seek time to have useful random access. We have done this before many times with files on remote filesystems, but they were usually memmaps or zarr, not NWB files.
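A rough way to check against that ~30 ms budget is to time random single-frame reads directly; a sketch reusing `data` from the earlier snippet:

```python
import time

import numpy as np

# Time random single-frame reads to estimate effective per-frame latency.
rng = np.random.default_rng(0)
indices = rng.integers(0, data.shape[0], size=20)
start = time.perf_counter()
for i in indices:
    _ = data[i]
per_frame_ms = (time.perf_counter() - start) / len(indices) * 1e3
print(f"mean per-frame fetch: {per_frame_ms:.1f} ms")
```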
Is there any way to know what bandwidth is available for a given dandiset?
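There doesn't appear to be a published per-dandiset figure (and as the reply below notes, the bandwidth fluctuates), but raw throughput to an asset can be measured directly by timing a ranged HTTP read. A crude sketch, using an arbitrary 50 MiB range against the `s3_url` obtained earlier:

```python
import time

import requests

# Crude throughput probe: time the download of a fixed byte range
# from the asset URL and report effective bandwidth.
n_bytes = 50 * 1024 * 1024  # 50 MiB, arbitrary probe size
start = time.perf_counter()
resp = requests.get(s3_url, headers={"Range": f"bytes=0-{n_bytes - 1}"})
elapsed = time.perf_counter() - start
print(f"{len(resp.content) / elapsed / 1e6:.1f} MB/s")
```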
Regarding Colab compatibility, we are trying to implement that here: vispy/jupyter_rfb#77
@kushalkolar - a few things. we don't control the bandwidth, aws does, hence it fluctuates with demand at a given point in time; there are indeed times when it can be slow. you may also want to look into https://github.com/magland/remfile for nwb files at least, and talk to @magland, who is building new viz tools as well.
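For reference, a sketch of reading the same file through remfile instead of fsspec, going by the usage shown in that repo's README (treat this as an untested sketch, not a verified benchmark):

```python
import remfile
from h5py import File

# remfile exposes the remote file as a file-like object and fetches
# byte ranges lazily over HTTP; s3_url comes from the first snippet.
rem = remfile.File(s3_url)
h5_file = File(rem, mode="r")
data = h5_file["acquisition"]["Registered movie 0"]["data"]
frame = data[0]
```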
Hi, I just came across this repo. I haven't looked into DANDI in detail, but it seems like there's a nice API that can provide lazy loading and random access to files (assuming those file types support lazy loading); correct me if I'm wrong?
Anyway, I'm writing a new library to perform very fast visualizations in Jupyter notebooks; it can leverage Vulkan/WGPU through an expressive API. I'm curious to see how it would perform with DANDI.
https://github.com/kushalkolar/fastplotlib
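For concreteness, a hedged sketch of what pairing fastplotlib with the lazily read dataset from the snippets above might look like, assuming its `ImageWidget` accepts any array-like with numpy-style indexing (names are taken from the fastplotlib README and may differ across versions):

```python
import fastplotlib as fpl

# data is the lazily read h5py dataset from the earlier snippets;
# the widget indexes one frame at a time, so moving the frame slider
# should trigger a single remote frame fetch per step.
iw = fpl.ImageWidget(data=data)
iw.show()
```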