Transitioning to Julian datasets and data loaders #7

Closed
4 tasks done
darsnack opened this issue Jun 12, 2020 · 5 comments
Labels
parity:pytorch Needed for feature parity with PyTorch

Comments

@darsnack
Member

darsnack commented Jun 12, 2020

I wanted to bring the general discussion from Zulip (linked below) into this issue. It's best to transfer concrete suggestions here so they're available for everyone to contribute to. The plan has evolved into two "milestones": short-term and long-term. I'll summarize both below.

Short-term:

  • Implement iterable and map-like datasets à la PyTorch
  • Implement a DataLoader interface mimicking PyTorch
  • The goal here is to take the PyTorch implementations as-is to get something up and running, since this functionality is desperately needed in the ecosystem
  • Being tracked here: Implement Dataloader interface #5
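
As a rough illustration of the short-term goal, here is a minimal sketch of a PyTorch-style loader in plain Julia. `SimpleLoader` and its `batchsize`, `shuffle`, and `rng` fields are hypothetical names chosen for this example, not the interface tracked in #5. The sketch assumes the wrapped data supports `length` and integer indexing.

```julia
using Random

# Hypothetical loader type; all names here are illustrative assumptions.
struct SimpleLoader{D}
    data::D            # anything supporting `length` and `data[i]`
    batchsize::Int
    shuffle::Bool
    rng::AbstractRNG
end

# Keyword constructor loosely mirroring PyTorch's `batch_size` / `shuffle` options.
SimpleLoader(data; batchsize = 1, shuffle = false, rng = Random.default_rng()) =
    SimpleLoader(data, batchsize, shuffle, rng)

# Number of batches; the last batch may be smaller than `batchsize`.
Base.length(l::SimpleLoader) = cld(length(l.data), l.batchsize)

# Start of iteration: fix the visiting order once per pass over the data.
function Base.iterate(l::SimpleLoader)
    n = length(l.data)
    order = l.shuffle ? randperm(l.rng, n) : collect(1:n)
    return iterate(l, (order, 1))
end

# Each step returns one batch (a vector of observations) and the next state.
function Base.iterate(l::SimpleLoader, state)
    order, start = state
    start > length(order) && return nothing
    stop = min(start + l.batchsize - 1, length(order))
    batch = [l.data[i] for i in order[start:stop]]
    return batch, (order, stop + 1)
end

loader = SimpleLoader(collect(1:10); batchsize = 4)
for batch in loader
    println(batch)
end
```

Without shuffling, the loop visits the batches `[1, 2, 3, 4]`, `[5, 6, 7, 8]`, and `[9, 10]` in that order. Passing `shuffle = true` draws a fresh permutation at the start of every pass.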

Long-term:

  • Transfer iterable dataset interface to Base iterator interface [1]
  • Transfer the map-like dataset interface to Base indexable collections [1]
  • The DataLoader interface doesn't necessarily need to change, but long term we should make sure that concrete implementations of the interface take advantage of things like the Random standard library and Distributions.jl for sampling

Relevant Zulip topics:


[1]: the Base interfaces may not be perfect for our needs, so we might need to build off of them
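
To make the long-term items a bit more concrete, here is a minimal sketch of a dataset that relies only on Base's indexing and iteration interfaces, plus the Random standard library for sampling. `SquaresDataset` and `random_observation` are hypothetical names invented for this example. As footnote [1] says, the real design might need to build on top of these interfaces rather than use them directly.

```julia
using Random

# Hypothetical toy dataset: observation i is the pair (x = i, y = i^2).
struct SquaresDataset
    n::Int
end

# Map-like access through Base's indexing interface.
Base.length(d::SquaresDataset) = d.n
Base.firstindex(d::SquaresDataset) = 1
Base.lastindex(d::SquaresDataset) = d.n
function Base.getindex(d::SquaresDataset, i::Integer)
    1 <= i <= d.n || throw(BoundsError(d, i))
    return (x = Int(i), y = Int(i)^2)
end

# Iterable access through Base's iteration interface, built on the indexing above.
Base.iterate(d::SquaresDataset, i::Int = 1) = i > d.n ? nothing : (d[i], i + 1)
Base.eltype(::Type{SquaresDataset}) = NamedTuple{(:x, :y), Tuple{Int, Int}}

# Sampling with Random: draw one observation uniformly at random.
random_observation(d::SquaresDataset; rng = Random.default_rng()) =
    d[rand(rng, 1:length(d))]

data = SquaresDataset(5)
data[3]                   # (x = 3, y = 9)
data[end]                 # (x = 5, y = 25)
[obs.y for obs in data]   # [1, 4, 9, 16, 25]
random_observation(data)  # one of the five observations
```

Because only `length`, `getindex`, and `iterate` are defined, generic Base functions such as `collect`, `first`, and `for` loops work on the dataset without further code.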

@darsnack
Member Author

cc @opus111 @SomTambe

darsnack added the interface (Interface design or implementation issue) and needs-discussion (Community input wanted) labels on Jun 12, 2020
darsnack added the parity:pytorch (Needed for feature parity with PyTorch) label and removed the interface and needs-discussion labels on Oct 26, 2020
@lorenzoh
Member

I think LearnBase.jl is getting established pretty nicely, with DataLoaders.jl and FastAI.jl fully embracing it.

As an update:

@darsnack
Member Author

I think most of the original issue has effectively been completed by adopting JuliaML/LearnBase.jl. In addition to the dataset transfer and data container transformations above, there are some short-term objectives that need to be met:

@lorenzoh
Member

Yeah, I didn't want to close this, just to give an update on what's been happening 👍

@darsnack
Member Author

darsnack commented Aug 9, 2022

Closing this since each sub-issue has either been completed as part of the LearnBase.jl -> MLUtils.jl transition or has a tracking issue on MLUtils.jl / MLDatasets.jl itself.

darsnack closed this as completed on Aug 9, 2022