[#5139] feat: (python-client): Support GCS fileset in the python GVFS client #5160

Open
wants to merge 108 commits into
base: main

@yuqi1129 (Contributor) commented Oct 16, 2024

What changes were proposed in this pull request?

  • Support GCS fileset in the Python GVFS client
  • Add an integration test (IT) for the GCS fileset.

Why are the changes needed?

Users need GCS fileset support in the Python GVFS client.

Fix: #5139

Does this PR introduce any user-facing change?

Yes, it modifies the Python GVFS client.

How was this patch tested?

Tested locally, and added an IT that cannot run automatically.

To run it, modify the mode as shown in the following picture, then execute ./gradlew :clients:client-python:test -PskipDockerTests=false and confirm that it succeeds.

image image

@yuqi1129 (Contributor Author)

This PR is not ready for review until #5079 is merged.

@yuqi1129 requested a review from xloya October 17, 2024 13:42
@yuqi1129 self-assigned this Oct 17, 2024

@yuqi1129 (Contributor Author)

@jerryshao @xloya
This PR is ready for review; please take a look.

@jerryshao (Contributor)

@xloya, would you please help review this code? Thanks.

clients/client-python/gravitino/filesystem/gvfs.py (outdated)
@@ -27,6 +27,8 @@
from fsspec.implementations.arrow import ArrowFSWrapper
from fsspec.utils import infer_storage_options
from pyarrow.fs import HadoopFileSystem
from pyarrow.fs import GcsFileSystem
Collaborator

In the current implementation, all of the storage Python library dependencies are introduced into PyGVFS. Although there will be no conflicts in most cases, it may be better for users to load only the underlying storage dependencies they need. I wonder if there is a better way to introduce other FileSystem dependencies on demand; could you take some time to research this?

Contributor Author

Okay, I will do it today.

Contributor Author

@xloya
I have updated the code to use importlib so that file system classes are imported dynamically, only when they are needed.
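
For readers following the thread, here is a minimal sketch of what on-demand loading with importlib could look like. It is not the code from this PR: the registry name, the function name, and the scheme keys ("hdfs", "gs") are hypothetical and chosen only for illustration. The two class names are the pyarrow.fs imports that appear in the diff above.

```python
import importlib

# Hypothetical registry (not taken from the PR): storage scheme -> (module, class).
# Nothing listed here is imported until a caller actually asks for that scheme.
_FILESYSTEM_CLASSES = {
    "hdfs": ("pyarrow.fs", "HadoopFileSystem"),
    "gs": ("pyarrow.fs", "GcsFileSystem"),
}


def load_filesystem_class(scheme, registry=_FILESYSTEM_CLASSES):
    """Import and return the file system class registered for `scheme`."""
    try:
        module_name, class_name = registry[scheme]
    except KeyError:
        raise ValueError(f"Unsupported storage scheme: {scheme!r}") from None
    try:
        # The import happens here, at first use, instead of at module load time.
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Storage scheme {scheme!r} requires the optional module {module_name!r}"
        ) from exc
    return getattr(module, class_name)
```

With this shape, a user who only reads HDFS filesets would never trigger the import for the GCS entry, and a missing optional dependency surfaces as an ImportError that names the scheme that needed it.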

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support GCS and S3 for python GVFS client
3 participants