
too many open files from dataloader #379

Closed
jpata opened this issue Dec 12, 2024 · 1 comment

@jpata (Owner)

jpata commented Dec 12, 2024

I'm seeing some issues with tfds: it does not seem to close open files properly. The switch to split datasets in #350 perhaps made the problem more apparent.

The reason is that random access into the concatenated datasets does not allow files to be closed.
With shuffling disabled here: https://github.com/jpata/particleflow/blob/main/mlpf/model/PFDataset.py#L259, the file usage seems to be somewhat lower.
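A minimal, hypothetical sketch of the failure mode (this is not the actual PFDataset/tfds code): with random access across a concatenation of file-backed shards, a reader handle is typically cached per shard, so every shard ever touched keeps a descriptor open until the dataset is torn down. Closing eagerly after each read bounds the descriptor count at the cost of reopening files.

```python
import os
import tempfile

class ShardedDataset:
    """Toy file-backed dataset: one file per shard, one sample per line."""

    def __init__(self, shard_paths):
        self.shard_paths = shard_paths
        self._open_handles = {}  # cache: shard index -> open file object

    def get(self, shard, line, keep_open=True):
        if keep_open:
            # Cached random access: every shard touched stays open.
            if shard not in self._open_handles:
                self._open_handles[shard] = open(self.shard_paths[shard])
            fh = self._open_handles[shard]
            fh.seek(0)
            return fh.readlines()[line].strip()
        # Eager close: at most one extra descriptor open at a time.
        with open(self.shard_paths[shard]) as fh:
            return fh.readlines()[line].strip()

    def close(self):
        for fh in self._open_handles.values():
            fh.close()
        self._open_handles.clear()

# Build a few toy shards and read one sample from each.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmpdir, f"shard{i}.txt")
    with open(p, "w") as f:
        f.write(f"sample-{i}\n")
    paths.append(p)

ds = ShardedDataset(paths)
for i in range(5):
    ds.get(i, 0, keep_open=True)
print("cached handles:", len(ds._open_handles))  # one per shard touched
ds.close()
print("after close:", len(ds._open_handles))
```

Multiply the per-shard handle count by `num_workers` dataloader processes and the per-process limit is reached quickly.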

@jpata jpata added the hard label Dec 12, 2024
@jpata jpata changed the title from "migrate from tfds array record datasets to native pytorch parquet datasets" to "too many open files from dataloader" Dec 18, 2024
@jpata jpata added the bug label and removed the hard label Dec 18, 2024
@jpata (Owner, author)

jpata commented Jan 15, 2025

I'm not getting issues on my systems right now, as I've raised the ulimits, but one needs to be careful not to run too many parallel jobs.
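For reference, the per-process open-file limit can be raised from inside the training process instead of relying on the shell's `ulimit -n` (Unix only; the soft limit can be raised up to the hard limit without privileges):

```python
import resource

# Current per-process open-file limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit to the hard limit (no root needed; guard against
# an unlimited hard limit, which some platforms reject as a soft value).
target = hard if hard != resource.RLIM_INFINITY else soft
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"new soft limit: {new_soft}")
```

The equivalent shell incantation before launching a job would be `ulimit -n <hard-limit>`; note that with N parallel jobs the effective descriptor budget per job shrinks accordingly, since system-wide limits still apply.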

For a CLIC training, the number of open files currently looks like this:
[screenshot: open file count over time during a CLIC training, 2025-01-15]
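A sketch of the kind of counting behind a plot like the one above: on Linux the current process's open descriptors can be listed under `/proc/self/fd` (externally, `lsof -p <pid>` or `psutil.Process().num_fds()` give the same number):

```python
import os

def count_open_fds():
    """Approximate count of this process's open file descriptors."""
    try:
        # Linux: one entry per open descriptor of this process.
        return len(os.listdir("/proc/self/fd"))
    except FileNotFoundError:
        # Portable fallback: probe a bounded range of descriptor numbers.
        n = 0
        for fd in range(4096):
            try:
                os.fstat(fd)
                n += 1
            except OSError:
                pass
        return n

print("open fds:", count_open_fds())
```

Polling this in the training loop (or from a separate watcher process) makes it easy to spot a dataloader that leaks descriptors over time.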

@jpata jpata closed this as completed Jan 15, 2025