Parallel reading of files #8

Open
pierreroudier opened this issue May 5, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@pierreroudier
Owner

Implement reading of files in parallel chunks, most universally via future (e.g. future.apply::future_lapply()). With hundreds or thousands of files, the reading could be distributed over multiple cores to speed it up. Key points (let's discuss):

  • Dispatching single files to a core one at a time has too much overhead compared to the time it takes to read one file.
  • Chunking could be done by splitting the list of files into groups according to the number of cores (workers) the user has registered.
  • Each chunk of files is read separately, and the results are recombined at the end.
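
A minimal sketch of the chunking idea above, to make the discussion concrete. The names `chunk_files()` and `read_files_chunked()` and the `reader` argument are hypothetical and not part of the package; the only external call is `future.apply::future_lapply()`, which runs on whatever `future::plan()` the user has set.

```r
# Hypothetical helper: split a vector of file paths into at most
# `n_chunks` contiguous groups of roughly equal size (base R only).
chunk_files <- function(files, n_chunks) {
  n_chunks <- max(1L, min(as.integer(n_chunks), length(files)))
  group <- ceiling(seq_along(files) * n_chunks / length(files))
  unname(split(files, group))
}

# Hypothetical wrapper: one future per chunk, files read sequentially
# within each chunk, results recombined in the original file order.
# `reader` is any function taking a single path (an assumption here).
read_files_chunked <- function(files, reader, n_chunks = 2L) {
  chunks <- chunk_files(files, n_chunks)
  per_chunk <- future.apply::future_lapply(chunks, function(chunk) {
    lapply(chunk, reader)
  })
  out <- do.call(c, per_chunk)  # flatten one level: list of per-file results
  if (is.null(out)) out <- list()
  names(out) <- files
  out
}
```

In practice `n_chunks` would probably default to the number of registered workers, e.g. `future::nbrOfWorkers()`, after the user calls something like `future::plan(future::multisession)`.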
@pierreroudier pierreroudier added the enhancement New feature or request label May 5, 2021
@philipp-baumann
Collaborator

In addition to my list above, we might also think about adding a progress bar to the futurized version, i.e. using {progressr}.
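
A rough sketch of how that could look, assuming the files are already split into a list of chunks. `read_chunks_with_progress()` and `reader` are hypothetical names; `progressr::progressor()`, `progressr::with_progress()` and `future.apply::future_lapply()` are the only package calls used.

```r
# Hypothetical: `chunks` is a list of character vectors of file paths,
# `reader` is any function that reads a single file.
read_chunks_with_progress <- function(chunks, reader) {
  # One progress step per file, regardless of how files are chunked
  p <- progressr::progressor(steps = sum(lengths(chunks)))
  per_chunk <- future.apply::future_lapply(chunks, function(chunk) {
    lapply(chunk, function(f) {
      out <- reader(f)
      p()  # signal one completed file to the progress handler
      out
    })
  })
  do.call(c, per_chunk)  # flatten to one list of per-file results
}

# Progress is only reported when the call is wrapped, e.g.:
# progressr::with_progress(
#   res <- read_chunks_with_progress(chunks, readLines)
# )
```

The user keeps control over whether and how progress is displayed (via `progressr::handlers()`), so the reading function itself stays silent by default.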

@pierreroudier
Owner Author

FWIW the current progress bar implementation (pbapply::pblapply) does support parallel processing. But we could equally switch to progressr + future_lapply.
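
For comparison, a sketch of the pbapply route: `pbapply::pblapply()` takes a `cl` argument, so the existing progress bar could be kept while adding parallelism. `read_files_pb()` and `reader` are hypothetical names used only for illustration.

```r
# Hypothetical wrapper around pbapply::pblapply().
# `cl = NULL` reads sequentially; passing a cluster object
# (e.g. from parallel::makeCluster()) reads in parallel.
read_files_pb <- function(files, reader, cl = NULL) {
  pbapply::pblapply(files, reader, cl = cl)
}

# Parallel usage, assuming two local workers:
# cl <- parallel::makeCluster(2L)
# res <- read_files_pb(files, readLines, cl = cl)
# parallel::stopCluster(cl)
```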
