Does Collator need to exist? #1181

Open
lendle opened this issue Jun 12, 2023 · 1 comment

lendle commented Jun 12, 2023

📚 The doc issue

Docs for Collator leave a lot of questions.

The docs say: "Collates samples from DataPipe to Tensor(s) by a custom collate function".
What does collate mean in this context? What is the collate function applied to? In the torch DataLoader docs, it's clear that collate_fn is meant to be applied to a batch of data, but that's not explained here at all. Looking at the implementation, I think the input datapipe is supposed to be batched here too, but that's not clear.
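
As a rough illustration of the DataLoader sense of "collate" mentioned above, here is a minimal, dependency-free sketch. It assumes each sample is a (features, label) pair and that a batch is a plain list of such samples. `collate_batch` is a hypothetical name used only for this example; it is not part of torch or torchdata, and it returns lists rather than tensors.

```python
def collate_batch(batch):
    """Regroup a list of (features, label) samples into (all_features, all_labels).

    Hypothetical helper for illustration only: a real collate function would
    usually also stack the regrouped values into tensors.
    """
    features, labels = zip(*batch)  # transpose: list of pairs -> pair of tuples
    return list(features), list(labels)


# A "batch" here is a list of individual samples.
batch = [([1.0, 2.0], 0), ([3.0, 4.0], 1), ([5.0, 6.0], 0)]
features, labels = collate_batch(batch)
print(features)  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(labels)    # [0, 1, 0]
```

The point of the sketch is only that the function receives the whole batch at once, not one sample at a time.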

What's the difference between this and Mapper? Sort of seems like the only difference is that the output of collate_fn is supposed to be tensors? Or collections of Tensors? I have used it with a function that returns a list of ints though, so there doesn't seem to be anything enforcing that the output is Tensors.
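
One way to picture the question is with plain generators. If a collate step is "a function applied to each item of an already-batched stream", it looks like a map whose items happen to be batches. The sketch below assumes exactly that reading; `batched` and `apply` are hypothetical stand-ins written for this example, not the torchdata Batcher, Mapper, or Collator implementations.

```python
from itertools import islice


def batched(samples, batch_size):
    """Yield consecutive lists of up to batch_size samples (hypothetical stand-in)."""
    it = iter(samples)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def apply(fn, items):
    """Apply fn to every item of a stream (hypothetical stand-in for a map step)."""
    for item in items:
        yield fn(item)


samples = range(6)

# Unbatched stream: fn sees one sample at a time.
per_sample = list(apply(lambda x: x * 10, samples))

# Batched stream: the same map step now hands fn a whole batch,
# which is what a collate function expects.
per_batch = list(apply(sum, batched(samples, 2)))

print(per_sample)  # [0, 10, 20, 30, 40, 50]
print(per_batch)   # [1, 5, 9]
```

Note that `sum` returns plain Python ints, which mirrors the observation above that nothing in this reading forces the output to be tensors.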

Suggest a potential alternative/fix

Get rid of Collator if it doesn't add anything over Mapper, it's confusing

If keeping it:

  • If it's basically Mapper with a default mapping function that converts things to tensors, don't allow specifying the function.
  • Or explain why this is different from Mapper.
  • State that input should be batched
  • Document the conversion argument

@josiahls

I was under the impression that the DataLoader could look for a Collator in the pipeline and, if one doesn't exist, just use the default_collate function from PyTorch, but I could be wrong.
