I found this note while using the class `RepeatDataset`:
"""A wrapper of repeated dataset.
The length of repeated dataset will be `times` larger than the original
dataset. This is useful when the data loading time is long but the dataset
is small. Using RepeatDataset can reduce the data loading time between
epochs.
"""

In my opinion, when an epoch is done, the sampler of the `DataLoader` is simply reset, so reloading the data should not take any extra time. Why does the note say that it helps reduce the data loading time between epochs?
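To make the note concrete, here is a minimal sketch of what such a wrapper does, assuming a plain map-style dataset (any object with `__len__` and `__getitem__`). It is an illustration, not the library's actual `RepeatDataset` source, which may carry extra attributes. Indices beyond the original length wrap around with a modulo, so one "epoch" over the wrapper is `times` passes over the original data.

```python
class RepeatDataset:
    """Illustrative repeated-dataset wrapper (a sketch, not the library's code)."""

    def __init__(self, dataset, times):
        if times < 1:
            raise ValueError("times must be >= 1")
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __len__(self):
        # The wrapper reports `times` x the original length.
        return self.times * self._ori_len

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Indices wrap around, so each original sample appears `times` times per epoch.
        return self.dataset[idx % self._ori_len]


repeated = RepeatDataset(["a", "b", "c"], times=3)
print(len(repeated))                                # 9
print([repeated[i] for i in range(len(repeated))])  # ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
```

Because the sampler only sees the longer wrapper, anything that happens once per epoch (such as starting the loader's worker processes) happens once per `times` passes instead of once per pass.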
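Sorry for my late reply. If `persistent_workers` is set to `False` in `DataLoader`, the worker processes are shut down at the end of each epoch and relaunched at the start of the next one, which adds overhead. Since `RepeatDataset` makes one epoch cover `times` passes over the original data, that relaunch happens `times` times less often. However, if you enable `persistent_workers` in `DataLoader`, all workers keep running independently in their own processes, so any modification to the dataset or pipeline only takes effect in the main process; the dataset and pipeline in the subprocesses launched by `DataLoader` are not affected.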