Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalize testing for gcs filesystem #1400

Merged
merged 4 commits into from
May 6, 2021
Merged

Conversation

vnghia
Copy link
Contributor

@vnghia vnghia commented May 5, 2021

I think the gcs filesystem plugins is now usable, the errors are fixed in #1396. I am inclined to refactor the gcs filesystem to have it worked with #1229 (mostly adding some classes for inheritance), but it seems useless since we can not do a dynamic_cast<RetryingGcsFileSystem*> anyway. I would like to ask what we will do with gcs filesystem plugin in the future ? @mihaimaruseac @yongtang

Copy link
Member

@yongtang yongtang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yongtang yongtang merged commit e397b6b into tensorflow:master May 6, 2021
@yongtang
Copy link
Member

yongtang commented May 6, 2021

@vnvo2409 In our SIG IO meeting, the plan was that:

  1. An independent tensorflow-io-plugin-gs wheel which consists of only gs plugin will be released. (At the moment we have a nightly build of tensorflow-io-plugin-gs-nightly. The final release of tensorflow-io-plugin-gs will be out the same time tensorflow-io 0.18.0 is out).
  2. In tensorflow 2.6.0 (or later), eventually tensorflow will depends on tensorflow-io-plugin-gs and the gs implementation inside tensorflow C++ source will be removed. tensorflow-io-plugin-gs will be loaded at the startup of tensorflow to allow the availability of gs file system whenever user imports tensorflow.

There are some ordering dependencies as we will need to wait for TF 2.5 final out first, then we will have tensorflow-io-plugin-gs out (together with tensorflow-io 0.18.0 which will be a separate wheel), then we will update tensorflow to depending on a stable version of tensorflow-io-plugin-gs. So it will take at least 2-3 rounds to get the plan in place.

/cc @mihaimaruseac @kvignesh1420 @terrytangyuan

@vnghia
Copy link
Contributor Author

vnghia commented May 6, 2021

@yongtang When I first wrote the gcs plugins, I was not aware of internal Google and I removed a level of inheritance:

  • FileBlockCache -> RamFileBlockCache to only RamFileBlockCache.

Afterwards, I discovered that tensorflow-gcs-config uses FileBlockCache and maybe other internal Google code uses it too. I think it is one of the root sources of breakages right now. I think it's not really too hard to re-add FileBlockCache. However, since the plugins come in the form of a shared library instead of a class, we can not really do a dynamic_cast<RetryingGcsFileSystem*> and pull the FileBlockCache out.

As you said in #1229 (comment), we will eventually need some extra ops but I don't see any progress related to it. I would like to ask about the plan for incorporating gcs plugins and tensorflow-gcs-configso that I can prepare some infrastructures for it. Thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants