
Limit number of open files #409

Open
madsbk opened this issue Jul 23, 2024 · 2 comments
Labels
improvement Improves an existing functionality

Comments

madsbk commented Jul 23, 2024

To avoid hitting the ulimit on open file descriptors, it would be useful to have an option that limits the number of simultaneously open files.
Maybe open files lazily?

cc. @VibhuJawa
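One way the lazy-open idea could work is an LRU-bounded handle cache: files are opened on first use, and when the cap is reached the least recently used handle is closed to make room. This is only a minimal sketch of the idea (plain Python, not this library's API; `BoundedFileCache` and `max_open` are made-up names):

```python
# Sketch only: lazily open files while capping the number of
# simultaneously open descriptors. When the cap is hit, the least
# recently used handle is closed and transparently reopened on next use.
from collections import OrderedDict


class BoundedFileCache:
    """Open files on demand, keeping at most `max_open` handles alive."""

    def __init__(self, max_open=2):
        self.max_open = max_open
        self._handles = OrderedDict()  # path -> open file object

    def get(self, path, mode="a"):
        if path in self._handles:
            self._handles.move_to_end(path)  # mark as most recently used
            return self._handles[path]
        if len(self._handles) >= self.max_open:
            # Evict and close the least recently used handle.
            _, lru = self._handles.popitem(last=False)
            lru.close()
        f = open(path, mode)
        self._handles[path] = f
        return f

    def close_all(self):
        for f in self._handles.values():
            f.close()
        self._handles.clear()
```

Opening in append mode means an evicted-and-reopened file keeps its earlier contents, so eviction is invisible to the writer apart from the reopen cost.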

@madsbk madsbk added the improvement Improves an existing functionality label Jul 23, 2024
VibhuJawa commented Jul 23, 2024

For context, I have seen this error most often while writing partitioned datasets, though I don't know exactly how it comes into play there:

dask_df.to_parquet(partition_on=["xyz"])

I also wonder whether #410 helps in that case too.
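Partitioned writes are a natural place to exhaust file descriptors because they fan out into one output file per distinct partition value. As a toy illustration (plain Python, not Dask or Parquet; `write_partitioned` is a made-up helper), the `key=value` directory layout below mirrors what `partition_on`-style writes produce:

```python
# Toy partitioned writer: one subdirectory and file per distinct key
# value, so the number of files written scales with dataset cardinality.
import os
from collections import defaultdict


def write_partitioned(rows, key, outdir):
    """Group rows by `key` and write each group under `key=value/`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    written = []
    for value, group in groups.items():
        part_dir = os.path.join(outdir, f"{key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part.0.txt")
        with open(path, "w") as f:  # one file per distinct key value
            for row in group:
                f.write(repr(row) + "\n")
        written.append(path)
    return written
```

A writer that keeps every partition's file open for the duration of the write (rather than one at a time, as above) would hit the descriptor limit as soon as the number of distinct key values exceeds it.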

VibhuJawa commented
Should help with issues mentioned in the comments here:

NVIDIA/NeMo-Curator#157 (comment)
