[Feature] Speed up the resume process of IterBased loop #1520
Comments
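A minimal-change approach is to mock the dataset's __getitem__ before iterating, so that the skipped steps do not run the real loading logic:

def run(self) -> None:
    """Launch training."""
    self.runner.call_hook('before_train')
    # In iteration-based training loop, we treat the whole training process
    # as a big epoch and execute the corresponding hook.
    self.runner.call_hook('before_train_epoch')
    if self._iter > 0:
        print_log(
            f'Advance dataloader {self._iter} steps to skip data '
            'that has already been trained',
            logger='current',
            level=logging.WARNING)
        # Mock: swap in a cheap __getitem__ while skipping.
        # `a_new_getitem_method` is a placeholder for a stub that returns
        # dummy data. Python looks up __getitem__ on the type, not the
        # instance, so the patch is applied to the dataset class.
        dataset_cls = type(self.dataloader_iterator.dataset)
        old_getitem = dataset_cls.__getitem__
        dataset_cls.__getitem__ = a_new_getitem_method
        for _ in range(self._iter):
            next(self.dataloader_iterator)
        # Restore the real loading logic before training continues.
        dataset_cls.__getitem__ = old_getitem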
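The same idea can also be written without monkey-patching, by wrapping the dataset. The sketch below is only an illustration in plain Python: `SkippableDataset`, `skip_loading`, and `CountingDataset` are hypothetical names, not part of mmengine or PyTorch. It assumes the placeholder value is something the collate function can handle. It also assumes the flag is visible where `__getitem__` actually runs, which may not hold once dataloader worker processes have already been started.

```python
from contextlib import contextmanager


class SkippableDataset:
    """Hypothetical wrapper: returns a cheap placeholder while skipping."""

    def __init__(self, dataset, placeholder=0):
        self.dataset = dataset
        self.placeholder = placeholder  # assumed to be collate-friendly
        self._skip = False

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        if self._skip:
            # Skip the real loading and preprocessing entirely.
            return self.placeholder
        return self.dataset[idx]

    @contextmanager
    def skip_loading(self):
        """Temporarily disable real loading, then restore it."""
        self._skip = True
        try:
            yield
        finally:
            self._skip = False


class CountingDataset:
    """Toy dataset for the demo; counts how often real loading happens."""

    def __init__(self, size):
        self.size = size
        self.loads = 0

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        self.loads += 1
        return idx * 10


inner = CountingDataset(8)
dataset = SkippableDataset(inner)
with dataset.skip_loading():
    skipped = [dataset[i] for i in range(5)]  # no real loading here
resumed = dataset[5]  # real loading resumes after the block
```

Because the flag is restored in a `finally` clause, an exception while skipping does not leave the dataset permanently stubbed.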
I believe this PR is the cause of the issue: #1471.
What is the feature?
mmengine/mmengine/runner/loops.py
Line 281 in 2c4516c
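The current resume logic iterates the dataloader for n steps. When n is large this is very slow, because every step runs the actual data loading and processing logic. Is there a good way to iterate over only the indices, without running the actual data loading pipeline?
I tried iterating the batch_sampler directly. This works with worker=0, but with multiple workers the data order after resuming is wrong. I would like to know whether there is a better solution.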
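One way to picture "iterating only the indices" is a wrapper around the batch sampler that discards the first n index batches before any of them is turned into a data-loading request. The following is a minimal sketch in plain Python, not the mmengine implementation. `SkipFirstBatches` is a hypothetical name, and the wrapper assumes the wrapped object is any iterable of index lists with a `__len__`. Passing such a wrapper as `batch_sampler` to a PyTorch `DataLoader` is assumed, not verified here, to keep the order consistent with multiple workers, since the skip happens before indices are dispatched.

```python
from itertools import islice


class SkipFirstBatches:
    """Hypothetical wrapper: drop the first `num_skip` index batches once."""

    def __init__(self, batch_sampler, num_skip):
        self.batch_sampler = batch_sampler
        self.num_skip = num_skip

    def __iter__(self):
        # Only the first pass skips; later passes yield every batch.
        skip, self.num_skip = self.num_skip, 0
        # islice consumes the skipped index batches without loading data,
        # so a seeded random sampler still advances through the same order.
        yield from islice(iter(self.batch_sampler), skip, None)

    def __len__(self):
        return len(self.batch_sampler) - self.num_skip


# Toy stand-in for a batch sampler: four batches of two indices each.
batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
sampler = SkipFirstBatches(batches, num_skip=2)
remaining_before = len(sampler)
first_pass = list(sampler)   # resumes after the two skipped batches
second_pass = list(sampler)  # a later pass sees every batch again
```

Skipping at this level costs only the index generation, so the time to resume should no longer depend on how expensive a single sample is to load.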
Any other context?
https://discuss.pytorch.org/t/is-there-any-way-to-skip-steps-in-a-dataloader/123201
https://pytorch.org/data/main/dataloader2.html
Snapshot the state of data-preprocessing pipeline (WIP)