
too many open files from dataloader #379

Closed
jpata opened this issue Dec 12, 2024 · 1 comment

@jpata (Owner)

jpata commented Dec 12, 2024

I'm seeing some issues with tfds: it does not seem to close open files properly. The switch to split datasets in #350 perhaps made the problem more apparent.

The reason is that random access into the concatenated datasets does not allow files to be closed.
With shuffling disabled here: https://github.com/jpata/particleflow/blob/main/mlpf/model/PFDataset.py#L259, the file usage seems to be somewhat lower.
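A minimal, hypothetical sketch of the failure mode (this is not the actual PFDataset/tfds code): with random access across a concatenation of file-backed shards, a reader handle is typically cached per shard, so every shard ever touched keeps a descriptor open until the dataset is torn down. Closing eagerly after each read bounds the descriptor count at the cost of reopening files.

```python
import os
import tempfile

class ShardedDataset:
    """Toy file-backed dataset: one file per shard, one sample per line."""

    def __init__(self, shard_paths):
        self.shard_paths = shard_paths
        self._open_handles = {}  # cache: shard index -> open file object

    def get(self, shard, line, keep_open=True):
        if keep_open:
            # Cached random access: every shard touched stays open.
            if shard not in self._open_handles:
                self._open_handles[shard] = open(self.shard_paths[shard])
            fh = self._open_handles[shard]
            fh.seek(0)
            return fh.readlines()[line].strip()
        # Eager close: at most one extra descriptor open at a time.
        with open(self.shard_paths[shard]) as fh:
            return fh.readlines()[line].strip()

    def close(self):
        for fh in self._open_handles.values():
            fh.close()
        self._open_handles.clear()

# Build a few toy shards and read one sample from each.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmpdir, f"shard{i}.txt")
    with open(p, "w") as f:
        f.write(f"sample-{i}\n")
    paths.append(p)

ds = ShardedDataset(paths)
for i in range(5):
    ds.get(i, 0, keep_open=True)
print("cached handles:", len(ds._open_handles))  # one per shard touched
ds.close()
print("after close:", len(ds._open_handles))
```

Multiply the per-shard handle count by `num_workers` dataloader processes and the per-process limit is reached quickly.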

@jpata jpata added the hard label Dec 12, 2024
@jpata jpata changed the title from "migrate from tfds array record datasets to native pytorch parquet datasets" to "too many open files from dataloader" Dec 18, 2024
@jpata jpata added the bug label and removed the hard label Dec 18, 2024
@jpata (Owner, author)

jpata commented Jan 15, 2025

I'm not getting issues on my systems right now, as I've raised the ulimits, but one needs to be careful not to run too many parallel jobs.
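For reference, the per-process open-file limit can be raised from inside the training process instead of relying on the shell's `ulimit -n` (Unix only; the soft limit can be raised up to the hard limit without privileges):

```python
import resource

# Current per-process open-file limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit to the hard limit (no root needed; guard against
# an unlimited hard limit, which some platforms reject as a soft value).
target = hard if hard != resource.RLIM_INFINITY else soft
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"new soft limit: {new_soft}")
```

The equivalent shell incantation before launching a job would be `ulimit -n <hard-limit>`; note that with N parallel jobs the effective descriptor budget per job shrinks accordingly, since system-wide limits still apply.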

For a CLIC training, the number of open files currently looks like this:
[screenshot: open file count over time during a CLIC training, 2025-01-15]
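A sketch of the kind of counting behind a plot like the one above: on Linux the current process's open descriptors can be listed under `/proc/self/fd` (externally, `lsof -p <pid>` or `psutil.Process().num_fds()` give the same number):

```python
import os

def count_open_fds():
    """Approximate count of this process's open file descriptors."""
    try:
        # Linux: one entry per open descriptor of this process.
        return len(os.listdir("/proc/self/fd"))
    except FileNotFoundError:
        # Portable fallback: probe a bounded range of descriptor numbers.
        n = 0
        for fd in range(4096):
            try:
                os.fstat(fd)
                n += 1
            except OSError:
                pass
        return n

print("open fds:", count_open_fds())
```

Polling this in the training loop (or from a separate watcher process) makes it easy to spot a dataloader that leaks descriptors over time.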

@jpata jpata closed this as completed Jan 15, 2025