Why `RepeatDataset` can reduce time of loading data between epochs #1193

MciaR · 2023-06-06T01:41:58Z

I found this note while using RepeatDataset, which is defined in mmengine/dataset/dataset_wrapper.py.

class RepeatDataset:
    """A wrapper of repeated dataset.

    The length of repeated dataset will be `times` larger than the original
    dataset. This is useful when the data loading time is long but the dataset
    is small. Using RepeatDataset can reduce the data loading time between
    epochs.
    """

When an epoch finishes, my understanding is that the DataLoader's sampler is simply reset, so reloading the data should take no extra time. Why does the note say that RepeatDataset helps reduce the data loading time between epochs?
Thanks for any comments!
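
To make the quoted docstring concrete, here is a minimal sketch of how a repeat wrapper could present a dataset as `times` copies of itself. `RepeatSketch` and its attributes are hypothetical names chosen for illustration; this is not MMEngine's implementation, only one plausible way to get the behavior the docstring describes.

```python
class RepeatSketch:
    """Hypothetical stand-in for a repeat wrapper; not MMEngine's code."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __len__(self):
        # One "epoch" over the wrapper covers the original data `times` times.
        return self.times * self._ori_len

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Map the wrapper index back onto the original dataset.
        return self.dataset[idx % self._ori_len]


samples = ["a", "b", "c"]
repeated = RepeatSketch(samples, times=2)
epoch = [repeated[i] for i in range(len(repeated))]
```

Under this sketch, a single pass over `repeated` visits every original sample twice, so the same amount of training needs half as many epoch boundaries.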

Answered by HAOCHENYE

HAOCHENYE · 2023-07-17T03:19:05Z

Maintainer

Sorry for the late reply. If `persistent_workers` is set to False in the DataLoader, there can be additional overhead from relaunching the workers at the start of each epoch; since RepeatDataset makes each epoch `times` longer, that cost is paid less often. If you enable `persistent_workers` instead, the workers keep running as independent processes, so any modification to the dataset or pipeline takes effect only in the main process; the copies of the dataset and pipeline in the subprocesses launched by the DataLoader are not affected.
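
The trade-off described above can be illustrated with a toy model. The sketch below does not use PyTorch; `ToyLoader` and `count_launches` are hypothetical names, and the "launch" counter only simulates the worker start-up that a real `torch.utils.data.DataLoader` with `num_workers > 0` and `persistent_workers=False` would incur when a new epoch begins.

```python
class ToyLoader:
    """Hypothetical loader that counts simulated worker start-ups."""

    def __init__(self, num_samples, persistent_workers=False):
        self.num_samples = num_samples
        self.persistent_workers = persistent_workers
        self.launches = 0

    def __iter__(self):
        # Without persistent workers, every new epoch starts workers again.
        # With persistent workers, they are started once and then reused.
        if not self.persistent_workers or self.launches == 0:
            self.launches += 1
        return iter(range(self.num_samples))


def count_launches(num_samples, times, total_passes, persistent_workers=False):
    """Iterate until the original data has been seen `total_passes` times."""
    loader = ToyLoader(num_samples * times, persistent_workers)
    epochs = total_passes // times  # a repeated epoch covers `times` passes
    for _ in range(epochs):
        for _ in loader:
            pass
    return loader.launches


plain = count_launches(num_samples=4, times=1, total_passes=10)
repeated = count_launches(num_samples=4, times=5, total_passes=10)
persistent = count_launches(
    num_samples=4, times=1, total_passes=10, persistent_workers=True
)
```

In this model, ten passes over the data cost ten simulated launches without repetition, two launches with `times=5`, and one launch with persistent workers. How much real time this saves depends on how expensive worker start-up is for a given dataset.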

1 reply

MciaR Jul 26, 2023
Author

Got it! Thanks for your answer!
