Bump file descriptor rlimit to hard rlimit by default #315

Merged (1 commit) on Feb 5, 2024

@cvonelm cvonelm commented Feb 5, 2024

The default soft limit for the number of open file descriptors per process on most Linux systems is 1024. This results in crashes on most HPC systems I've used recently, as even simple lo2s invocations exceed this limit with all the per-core perf_event_open calls.

This microscopic soft limit is in place because select() only supports file descriptors below 1024 (FD_SETSIZE). If you do not plan to use select() in your code, it is safe to raise the soft file descriptor limit to the hard limit.
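
To illustrate the select() constraint: an fd_set is a fixed-size bitmap with room for FD_SETSIZE descriptors (1024 with glibc), so code that still uses select() would have to guard against larger descriptors once the limit is raised. A minimal sketch, with a hypothetical helper name that is not part of lo2s:

```cpp
#include <sys/select.h>

// Hypothetical helper: true if `fd` can be stored in an fd_set at all.
// Calling FD_SET() with a descriptor >= FD_SETSIZE is undefined behavior,
// which is why the conservative default soft limit matches FD_SETSIZE.
bool fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```

Code that relies on poll() or epoll instead has no such upper bound on the descriptor value.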

However, we need to save the old limit and restore it before we start the program under measurement, because resource limits are inherited by forked processes and we cannot guarantee that the program under measurement does not do stupid stuff with select().
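
Roughly, the save/raise/restore sequence described above could look like the following sketch. The helper names are hypothetical and chosen for illustration; they are not the functions added in this PR.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft RLIMIT_NOFILE limit to the hard limit.
// The previous limits are stored in `previous` so they can be restored later.
// Returns false if either system call fails (errno is left set).
bool raise_fd_limit_to_hard(struct rlimit& previous)
{
    if (getrlimit(RLIMIT_NOFILE, &previous) != 0)
        return false;

    struct rlimit raised = previous;
    raised.rlim_cur = previous.rlim_max; // soft := hard, hard stays unchanged
    return setrlimit(RLIMIT_NOFILE, &raised) == 0;
}

// Hypothetical helper: put the saved limits back, e.g. in the forked child
// right before exec'ing the program under measurement.
bool restore_fd_limit(const struct rlimit& previous)
{
    return setrlimit(RLIMIT_NOFILE, &previous) == 0;
}
```

Raising the soft limit up to the hard limit and lowering it again needs no privileges, so both steps should work for an unprivileged process.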

@cvonelm cvonelm force-pushed the issue-313-bump-fd-rlimit branch from b9b725a to b30ece8 Compare February 5, 2024 09:09
src/util.cpp Outdated
@@ -366,4 +366,24 @@ std::string get_nec_thread_comm(Thread thread)
// If no '--' is found, fall back to the complete commandline as a name
return std::accumulate(args.begin(), args.end(), std::string(""));
}

struct rlimit save_rlimit_fd()

Don't call it save if it only saves once and is then used as a getter. Also, which (or whose) rlimit is saved? Maybe inherited_rlimit_fd, initial_rlimit_fd, or parent_process_rlimit_fd instead?
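
For illustration, a getter with one of the suggested names could capture the limit on first use via a function-local static. This is only a sketch of the naming idea and is not the code that was merged; it assumes the function is first called before the limit is raised.

```cpp
#include <sys/resource.h>

#include <cerrno>
#include <system_error>

// Returns the RLIMIT_NOFILE limits as they were when this function was
// first called. Later calls return the same cached value, even if the
// process has changed its limits in the meantime.
struct rlimit initial_rlimit_fd()
{
    static const struct rlimit initial = []() {
        struct rlimit current{};
        if (getrlimit(RLIMIT_NOFILE, &current) != 0)
        {
            throw std::system_error(errno, std::generic_category(),
                                    "getrlimit(RLIMIT_NOFILE)");
        }
        return current;
    }();
    return initial;
}
```

The name then describes what the caller gets back (the limit the process started with) rather than the one-time side effect.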

@@ -55,6 +55,9 @@ namespace monitor

[[noreturn]] static void run_command(const std::vector<std::string>& command_and_args)
{
struct rlimit saved_rlimit = save_rlimit_fd();
setrlimit(RLIMIT_OFILE, &saved_rlimit);

Why is it RLIMIT_OFILE instead of RLIMIT_NOFILE?

@cvonelm cvonelm force-pushed the issue-313-bump-fd-rlimit branch from c572212 to db8dfe0 Compare February 5, 2024 10:42
@cvonelm cvonelm merged commit 32266e2 into master Feb 5, 2024
40 checks passed

tilsche commented Feb 5, 2024

All those calls need error handling.
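
As a sketch of what that could look like: both getrlimit() and setrlimit() return -1 and set errno on failure, so thin checked wrappers are one option. The wrapper names are hypothetical, and throwing std::system_error is just one possible way to report the error.

```cpp
#include <sys/resource.h>

#include <cerrno>
#include <system_error>

// Hypothetical wrapper: getrlimit() for RLIMIT_NOFILE that throws on failure.
struct rlimit checked_getrlimit_fd()
{
    struct rlimit limits{};
    if (getrlimit(RLIMIT_NOFILE, &limits) != 0)
    {
        throw std::system_error(errno, std::generic_category(),
                                "getrlimit(RLIMIT_NOFILE)");
    }
    return limits;
}

// Hypothetical wrapper: setrlimit() for RLIMIT_NOFILE that throws on failure,
// e.g. EINVAL if the soft limit exceeds the hard limit.
void checked_setrlimit_fd(const struct rlimit& limits)
{
    if (setrlimit(RLIMIT_NOFILE, &limits) != 0)
    {
        throw std::system_error(errno, std::generic_category(),
                                "setrlimit(RLIMIT_NOFILE)");
    }
}
```

In the forked child right before exec, an exception may not be the right tool; logging the error and exiting the child would be an alternative there.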
