Thread count in parallel nut-scanner should scale down in case of "Too many open files" #2576

Open
jimklimov opened this issue Aug 2, 2024 · 1 comment
Labels
enhancement Low-hanging fruit A proposal or issue that is good for newcomers to codebase or otherwise a quick win need testing Code looks reasonable, but the feature would better be tested against hardware or OSes nut-scanner portability We want NUT to build and run everywhere possible
Milestone

Comments

@jimklimov
Member

As briefly noted in issue #2575 and in PRs that dealt with parallelized scans in nut-scanner, nut-scanner may run out of file descriptors, depending on platform defaults, the particular OS deployment and third-party library specifics. This happens even though it already tries to adapt its maximums to ulimit information where available.
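For context, the kind of ulimit-based adaptation mentioned above can be pictured roughly as follows. This is a minimal sketch and not the actual nut-scanner code: the helper names `thread_cap_from_nofile()` and `detect_thread_cap()` and the `reserve`/`fallback` parameters are hypothetical, and only `getrlimit()` with `RLIMIT_NOFILE` is a real POSIX API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: turn a soft RLIMIT_NOFILE value into a thread cap,
 * keeping `reserve` descriptors back for stdio, libraries, etc. */
static size_t thread_cap_from_nofile(rlim_t soft_limit, size_t reserve,
    size_t fallback)
{
    assert(fallback > 0);

    if (soft_limit == RLIM_INFINITY)
        return fallback;    /* no usable limit, use the built-in default */
    if (soft_limit <= (rlim_t)reserve)
        return 1;           /* nearly no descriptors: scan sequentially */
    return (size_t)(soft_limit - (rlim_t)reserve);
}

/* Hypothetical helper: query the process limit and derive the cap from it */
static size_t detect_thread_cap(size_t reserve, size_t fallback)
{
    struct rlimit nofile;

    if (getrlimit(RLIMIT_NOFILE, &nofile) != 0)
        return fallback;    /* limit unknown on this platform or run */
    return thread_cap_from_nofile(nofile.rlim_cur, reserve, fallback);
}
```

A cap computed this way assumes roughly one descriptor per scanning thread, which is exactly the assumption a third-party library can break.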

As seen recently and culminating in commit 2c3a09e of PR #2539 (issue #2511), certain libnetsnmp builds can consume FDs for network sockets, for local filesystem lookups of per-host configuration files or MIB files, for directory scanning during those searches, etc. This variable is beyond our control: different implementations and versions of third-party code can behave as they please. Example staged with that commit reverted and a scan of a large network range:

...
   0.321562     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.254
   0.321597     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1022 thread_count=1022 stwST=-1 stwS=0 pass=1
   0.321573     [D2] Entering try_SysOID_thready for 172.28.67.253
   0.321667     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.255
   0.321703     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1023 thread_count=1023 stwST=-1 stwS=0 pass=1
   0.321677     [D2] Entering try_SysOID_thready for 172.28.67.254
   0.321782     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.68.0
   0.321817     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1024 thread_count=1024 stwST=-1 stwS=-1 pass=0
   0.321851     [D2] nutscan_scan_ip_range_snmp: Running too many scanning threads (1024), waiting until older ones would finish
   0.321796     [D2] Entering try_SysOID_thready for 172.28.67.255
   0.475060     [D2] Failed to open SNMP session for 172.28.67.147
/var/lib/snmp/hosts/172.28.66.252.local.conf: Too many open files
/var/lib/snmp/hosts/172.28.65.208.local.conf: Too many open files

<blocks on "too many threads" anyway, but skips a number of hosts> 

Rather than abort a scan upon any hiccup, we can check for errno==EMFILE, then delay and retry later (or maybe even actively decrease the thread maximum variable of the process). We already have a way to detect "Running too many scanning threads (NUM), waiting until older ones would finish", so this is about detecting the issue and extending the criteria.
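The retry idea could look roughly like the sketch below. It is an illustration under assumptions, not a patch for nut-scanner: `open_fn`, `open_with_emfile_retry()`, `shrink_max_threads()`, the `max_threads` variable and the `flaky_opener()` stub are all hypothetical names, and it assumes the failing open call leaves `errno` set (as the errno-logging experiment in this thread suggests for the SNMP case). Real multi-threaded code would also need to protect the shared cap with a mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical opener callback: returns a handle, or NULL with errno set */
typedef void *(*open_fn)(void *arg);

/* Hypothetical process-wide cap on concurrent scanning threads
 * (unsynchronized here; a real scanner would guard it with a mutex) */
static size_t max_threads = 1024;

/* Halve the cap, but never go below one thread */
static void shrink_max_threads(void)
{
    if (max_threads > 1)
        max_threads /= 2;
}

/* Try to open; on "too many open files" shrink the cap, sleep, retry */
static void *open_with_emfile_retry(open_fn opener, void *arg,
    int max_attempts, long delay_ms)
{
    int attempt;

    assert(max_attempts > 0);
    for (attempt = 0; attempt < max_attempts; attempt++) {
        struct timespec ts;
        void *handle;

        errno = 0;
        handle = opener(arg);
        if (handle != NULL)
            return handle;
        if (errno != EMFILE && errno != ENFILE)
            return NULL;    /* unrelated failure: do not retry */

        shrink_max_threads();
        ts.tv_sec = delay_ms / 1000;
        ts.tv_nsec = (delay_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);   /* let older threads release their FDs */
    }
    errno = EMFILE;         /* still out of descriptors after all attempts */
    return NULL;
}

/* Demo stub: fails with EMFILE until the counter behind arg reaches 0 */
static void *flaky_opener(void *arg)
{
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        errno = EMFILE;
        return NULL;
    }
    return arg;
}
```

With the stub, `open_with_emfile_retry(flaky_opener, &failures, 5, 1)` succeeds once the simulated descriptor shortage clears, and each EMFILE along the way halves `max_threads`. Whether halving is the right back-off policy is an open design question; the point is only that the host is retried instead of skipped.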

@jimklimov jimklimov added enhancement nut-scanner portability We want NUT to build and run everywhere possible labels Aug 2, 2024
@jimklimov jimklimov added this to the 2.8.4 milestone Aug 2, 2024
@jimklimov
Member Author

jimklimov commented Aug 2, 2024

Experimented with a change to log errno - and yes: at nut-scanner level, at least for this use-case, we do know the cause of the problem:

diff --git a/tools/nut-scanner/nut-scanner.c b/tools/nut-scanner/nut-scanner.c
index a3d785f5a..711dc3307 100644
--- a/tools/nut-scanner/nut-scanner.c
+++ b/tools/nut-scanner/nut-scanner.c
@@ -84,7 +84,7 @@
  * Another +1 is for NetSNMP which wants to open MIB files,
  * potential per-host configuration files, etc.
  */
-#   define RESERVE_FD_COUNT 4
+#   define RESERVE_FD_COUNT 0
 #  endif /* HAVE_SYS_RESOURCE_H */
 # endif  /* HAVE_PTHREAD_TRYJOIN || HAVE_SEMAPHORE_UNNAMED || HAVE_SEMAPHORE_NAMED */
 #endif   /* HAVE_PTHREAD */
diff --git a/tools/nut-scanner/scan_snmp.c b/tools/nut-scanner/scan_snmp.c
index a8c3b42cb..fc3826454 100644
--- a/tools/nut-scanner/scan_snmp.c
+++ b/tools/nut-scanner/scan_snmp.c
@@ -969,7 +969,7 @@ static void * try_SysOID_thready(void * arg)
        /* Open the session */
        handle = wrap_nut_snmp_sess_open(&snmp_sess); /* establish the session */
        if (handle == NULL) {
-               upsdebugx(2,
+               upsdebug_with_errno(2,
                        "Failed to open SNMP session for %s",
                        sec->peername);
                goto try_SysOID_free;

...leads to:

...
   0.296940     [D2] Entering try_SysOID_thready for 172.28.67.252
   0.297073     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.254
   0.297115     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1022 thread_count=1022 stwST=-1 stwS=0 pass=1
   0.297190     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.255
   0.297235     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1023 thread_count=1023 stwST=-1 stwS=0 pass=1
   0.297083     [D2] Entering try_SysOID_thready for 172.28.67.253
   0.297190     [D2] Entering try_SysOID_thready for 172.28.67.254
   0.297351     [D2] Entering try_SysOID_thready for 172.28.67.255
   0.297359     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.68.0
   0.297396     [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1024 thread_count=1024 stwST=-1 stwS=-1 pass=0
   0.297413     [D2] nutscan_scan_ip_range_snmp: Running too many scanning threads (1024), waiting until older ones would finish
/var/lib/snmp/hosts/172.28.67.165.local.conf: Too many open files
   0.378710     [D2] Failed to open SNMP session for 172.28.65.167: Too many open files
   0.378813     [D2] Failed to open SNMP session for 172.28.65.113: Too many open files
   0.378755     [D2] Failed to open SNMP session for 172.28.67.165: Too many open files
^C

@jimklimov jimklimov added need testing Code looks reasonable, but the feature would better be tested against hardware or OSes Low-hanging fruit A proposal or issue that is good for newcomers to codebase or otherwise a quick win labels Aug 2, 2024