
enable browsing s3 via jupyter-fs #13

Merged: 3 commits merged into destination-earth:main from jupyter-fs on Apr 3, 2024

Conversation

@minrk (Collaborator) commented Apr 3, 2024

@tinaok this adds an S3 browser to the JupyterLab sidebar:

[Screenshot 2024-04-03 at 14:22:32: the S3 file browser in the JupyterLab sidebar]

@yuvipanda this is what I mentioned to you yesterday. It seems to work fine with JupyterLab 4, but I had to do some shenanigans to work around PyFilesystem/s3fs#70 because our files are not created with S3FS (they are created with s3fs), and S3FS makes some hard assumptions that it has created everything it might read (namely, that an empty Object exists representing each directory level, which is not true in general). I did the definitely-totally-fine thing of catching the error raised when a directory lacks a corresponding Object and creating those empty objects if they are missing.

@minrk minrk merged commit 565a13c into destination-earth:main Apr 3, 2024
1 check passed
@minrk minrk deleted the jupyter-fs branch April 3, 2024 12:30
@minrk minrk changed the title enable jupyter-fs enable browsing s3 via jupyter-fs Apr 3, 2024
@yuvipanda

our files are not created with S3FS (they are created with s3fs)

this is beautiful haha

@yuvipanda

Hmm, so this just doesn't recognize 'directories' unless a specific empty object exists? So it probably won't work for readonly data buckets that don't do that?

@minrk (Collaborator, Author) commented Apr 3, 2024

So it probably won't work for readonly data buckets that don't do that?

My exact workaround won't, though you could do a read-only version of it that returns a fake Info model as if it were a real one, instead of creating the object and trying again. The advantage of my version is that it only takes the fallback path once for any given missing directory, and does the 'right' thing forever after.
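
For reference, here's a rough sketch of what that create-the-object-and-retry variant could look like (hypothetical, not the code actually merged in this PR; it leans on listdir working for missing directories and on the public makedirs API):

import fs.errors
from fs_s3fs import S3FS


class CreateDirS3FS(S3FS):
    def getinfo(self, path, namespaces=None):
        try:
            return super().getinfo(path, namespaces)
        except fs.errors.ResourceNotFound:
            # listdir still works on 'directories' that have no empty Object,
            # so use it to check whether the prefix really exists
            self.listdir(path)  # re-raises ResourceNotFound if truly missing
            # create the empty directory Object(s) S3FS expects, then retry;
            # later getinfo calls on this path take the normal code path
            self.makedirs(path, recreate=True)
            return super().getinfo(path, namespaces)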

@minrk (Collaborator, Author) commented Apr 3, 2024

This one appears to work for read-only:

import fs.errors
from fs.info import Info, ResourceType
from fs_s3fs import S3FS


class EnsureDirS3FS(S3FS):
    def getinfo(self, path, namespaces=None):
        try:
            return super().getinfo(path, namespaces)
        except fs.errors.ResourceNotFound as e:
            # workaround https://github.com/PyFilesystem/s3fs/issues/70
            # check if it's a directory with no corresponding Object (not created by S3FS)
            # scandir/getinfo don't work on missing directories, but listdir does
            # if it's really a directory, return stub Info instead of failing
            try:
                self.listdir(path)
            except fs.errors.ResourceNotFound:
                raise e from None
            else:
                # return fake Info
                # based on S3FS.getinfo handling of root (`/`)
                name = path.rstrip("/").rsplit("/", 1)[-1]
                return Info(
                    {
                        "basic": {
                            "name": name,
                            "is_dir": True,
                        },
                        "details": {"type": int(ResourceType.directory)},
                    }
                )
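
A quick usage sketch (bucket and prefix are placeholders; credentials are assumed to come from the environment via boto3):

# hypothetical usage; bucket and prefix names are illustrative
s3 = EnsureDirS3FS("my-bucket")
info = s3.getinfo("/prefix/written-by-plain-s3fs")
print(info.is_dir)  # True, even though no empty directory Object exists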

@yuvipanda

@minrk this is great!

How do you control the list of buckets that show up here?

@minrk (Collaborator, Author) commented Apr 3, 2024

We're only working with one bucket. I think you need to explicitly list each bucket you want to mount in the resources config.

Ours is here. The bucket name (or arbitrary subdir) is in the mount.
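
For reference, a hedged sketch of that resources config, roughly following the jupyter-fs README (name, bucket, and path are placeholders):

# in jupyter_server_config.py -- sketch only, values are placeholders
c.JupyterFs.resources = [
    {
        "name": "project bucket",
        "url": "s3://my-bucket/optional/subdir",
    },
]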

@yuvipanda

Interesting! Was https://github.com/destination-earth/DestinE_ESA_GFTS/pull/13/files#diff-96599d676c72313e9986285fd7ab9d14b18d8bec6167a33056b85ad4d2529435R101 needed as well, even if you only want the sidebar to show up?

@minrk (Collaborator, Author) commented Apr 3, 2024

Yes, the listing requests use the contents API at special drive:/... subdirectories. A default FileContentsManager still serves the "root" so there's no noticeable effect on regular file UI. I think you can still specify the root contents manager class if you need to.

I'm not 100% sure why this is implemented by overriding ContentsManager rather than replicating the API on a different endpoint.
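
For context, the setting in question is the one that swaps in jupyter-fs's contents manager, roughly (sketch based on the jupyter-fs README):

# in jupyter_server_config.py -- sketch only
c.ServerApp.contents_manager_class = "jupyterfs.metamanager.MetaManager"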
