Parallel reading of files #8

pierreroudier · 2021-05-05T21:22:11Z

Implement reading of files in parallel chunks, most universally via future (e.g. future.apply::future_lapply()). With hundreds or thousands of files, reading could be distributed over multiple cores to speed it up. Key points (let's discuss):

Dispatching a single file per core at a time incurs too much overhead relative to the time it takes to read one file.
Chunking could be done by splitting the list of files into groups according to the number of cores (workers) registered by the user.
Each chunk of files is read separately, and the results are recombined at the end.
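
To make the three points above concrete, here is a rough sketch of what chunked reading could look like. It is only an illustration under assumptions: `read_one()` and `read_chunked()` are hypothetical names (not functions of this package), and `readLines()` stands in for the real file reader. Note that `future_lapply()` already groups elements into one chunk per worker by default, so the explicit split mainly serves to make the idea visible.

```r
library(future)
library(future.apply)

# Hypothetical single-file reader; a stand-in for the package's real reader.
read_one <- function(path) readLines(path)

# Hypothetical helper: read `files` in one chunk per registered worker.
read_chunked <- function(files, n_chunks = future::nbrOfWorkers()) {
  stopifnot(length(files) > 0)
  n_chunks <- max(1L, min(n_chunks, length(files)))
  # Split the file indices into roughly equal groups.
  idx <- parallel::splitIndices(length(files), n_chunks)
  chunks <- lapply(idx, function(i) files[i])
  # Each future reads a whole chunk sequentially, so the dispatch
  # overhead is paid once per chunk rather than once per file.
  res <- future_lapply(chunks, function(chunk) lapply(chunk, read_one))
  # Recombine the per-chunk lists into one list, in input order.
  out <- unlist(res, recursive = FALSE)
  names(out) <- files
  out
}

# Usage with ten small temporary files and two background R sessions.
plan(multisession, workers = 2)
files <- file.path(tempdir(), sprintf("f%02d.txt", 1:10))
for (f in files) writeLines(basename(f), f)
spectra <- read_chunked(files)
plan(sequential)
```

The number of chunks follows whatever plan the user has registered, so the same call also works unchanged under `plan(sequential)`.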

philipp-baumann · 2021-05-06T09:46:08Z

In addition to my list above, we might also think about implementing a progress bar for the futurized version, e.g. using {progressr}.

pierreroudier · 2021-05-06T21:58:43Z

FWIW the current progress bar implementation (pbapply::pblapply) does support parallel processing. But we could equally switch to progressr + future_lapply.
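
For reference, a minimal sketch of the progressr + future_lapply combination might look like the following. The names `read_one()` and `read_with_progress()` are hypothetical and `readLines()` again stands in for the real reader; this is one possible shape, not a settled design.

```r
library(future)
library(future.apply)
library(progressr)

# Hypothetical single-file reader; a stand-in for the package's real reader.
read_one <- function(path) readLines(path)

# Hypothetical helper: read files in parallel and signal one progress
# step per file. How progress is displayed is left to the caller.
read_with_progress <- function(files) {
  p <- progressor(along = files)
  future_lapply(files, function(f) {
    out <- read_one(f)
    p()  # signal that one more file has been read
    out
  })
}

# Usage: the caller opts in to progress reporting via with_progress().
plan(multisession, workers = 2)
files <- file.path(tempdir(), sprintf("p%02d.txt", 1:6))
for (f in files) writeLines(basename(f), f)
res <- with_progress(read_with_progress(files))
plan(sequential)
```

One appeal of this design is that the reading function only signals progress; whether and how a bar is shown stays under the user's control.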

pierreroudier added the enhancement New feature or request label May 5, 2021

pierreroudier mentioned this issue May 5, 2021: Initial CRAN release checklist #4 (Open, 4 tasks)
