Reduce the number of open syscalls getting ENOENT from non-existing caches in sysfs #434
Perhaps using opendir()/readdir()?
The easiest solution would be to reduce the number of iterations and use the …
Instead of trying to open all "index%u" from 0 to 9. tests/hwloc/linux/gather must be updated to ignore obj ID/gp_index because readdir() doesn't always return the caches in the expected order when loading from the sysfs dump. Refs open-mpi#434 Signed-off-by: Brice Goglin <[email protected]>
I did a quick test. We actually get more syscalls using opendir. Instead of having one useless openat() for each of the 6 non-existing caches (those failing openat are likely very cheap), opendir + readdir + closedir use 7 syscalls (openat + newfstatat + 2 fcntl + 2 getdents + close). That's for each core. If you want to play with it, the code is in PR #629. There will be a tarball at https://ci.inria.fr/hwloc/job/basic/job/PR-629/ soon.
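For reference, here is a minimal sketch of what such an opendir()-based enumeration could look like (an assumed shape, not the actual PR #629 code; the function name is hypothetical):

```c
#include <stdio.h>
#include <dirent.h>

/* Enumerate indexY entries with opendir()/readdir() for one PU. */
static void list_pu_caches(unsigned pu)
{
  char path[256];
  DIR *dir;
  struct dirent *dirent;
  snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/cache", pu);
  dir = opendir(path);                      /* openat + newfstatat + 2 fcntl in glibc */
  if (!dir)
    return;
  while ((dirent = readdir(dir)) != NULL) { /* getdents, typically 2 calls */
    unsigned y;
    if (sscanf(dirent->d_name, "index%u", &y) == 1) {
      /* readdir() returns entries in arbitrary order, hence the
       * gather-test change mentioned in the commit message above */
      /* ... open index<y>/shared_cpu_map and parse it ... */
    }
  }
  closedir(dir);                            /* close */
}

int main(void)
{
  list_pu_caches(0); /* enumerate the caches of PU #0 */
  return 0;
}
```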
Instead of trying to open all "index%u" from 0 to 9. Refs open-mpi#434 Signed-off-by: Brice Goglin <[email protected]>
We currently try to open /sys/devices/system/cpu/cpuX/cache/indexY/shared_cpu_map for every PU, for Y between 0 and 9. Since most CPUs have 4 caches per PU, that's usually 6 useless syscalls per PU. That's almost 1ms per PU.
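A minimal sketch of that probing loop, as described above (not hwloc's actual code; the function name and the simplified error handling are mine):

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Probe index0..index9 for one PU. */
static void scan_pu_caches(unsigned pu)
{
  unsigned y;
  for (y = 0; y <= 9; y++) {
    char path[256];
    int fd;
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/cache/index%u/shared_cpu_map",
             pu, y);
    fd = open(path, O_RDONLY);
    if (fd < 0)
      continue; /* with 4 caches, index4..index9 all fail with ENOENT:
                 * 6 wasted openat() syscalls per PU */
    /* ... read and parse the shared cpumap here ... */
    close(fd);
  }
}

int main(void)
{
  scan_pu_caches(0); /* probe the caches of PU #0 */
  return 0;
}
```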
Linux numbers caches from 0 to N-1 internally, but some of them may get skipped when added to sysfs for various reasons (see cache_add_dev() in drivers/base/cacheinfo.c). That means we cannot simply break out of the loop as soon as one indexY is missing (e.g., when index4 is absent, as is usual): a later index may still exist.
Doing a stat() on the parent directory might be a good way to find out the total number of indexY subdirectories: one syscall to avoid 6. However, btrfs (used for fsroot regression tests) has issues with nlink being wrong (see the comments in topology-linux.c).
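A hedged sketch of that nlink idea (function name hypothetical; the fallback value and the "every subdirectory is an indexY entry" assumption are mine):

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the number of subdirectories of cpu<pu>/cache, or 10
 * (i.e. fall back to full probing) when nlink cannot be trusted. */
static unsigned count_cache_indexes(unsigned pu)
{
  char path[256];
  struct stat st;
  snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/cache", pu);
  if (stat(path, &st) == 0 && st.st_nlink >= 2)
    /* "." and ".." account for 2 links; each subdirectory adds one.
     * This assumes every subdirectory is an indexY entry. */
    return (unsigned) (st.st_nlink - 2);
  return 10; /* e.g. btrfs, which reports nlink = 1 for directories */
}

int main(void)
{
  printf("PU #0 has %u cache indexes\n", count_cache_indexes(0));
  return 0;
}
```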
Reducing the loop to 5 indexes (0-4) instead of 10 (0-9) is likely a good start for now. Most current CPUs have 4 caches in sysfs. There are some L4 caches out there, but I have never seen them in sysfs since they sit rather outside of the CPUs. Itanium had 5 caches (with separate L2i and L2d), but it's dead. So 5 works fine and leaves one free slot in case newer CPUs bring an additional level.
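Sketched as a one-line cap on the earlier probing loop (the constant name is hypothetical, not hwloc's):

```c
/* Cap the probe loop at index0..index4: 4 known levels + 1 spare slot. */
#define MAX_SYSFS_CACHE_INDEX 5

static void scan_pu_caches_capped(unsigned pu)
{
  unsigned y;
  (void) pu;
  for (y = 0; y < MAX_SYSFS_CACHE_INDEX; y++) {
    /* ... try cpu<pu>/cache/index<y>/shared_cpu_map as in the original
     * loop; on a 4-cache CPU this wastes 1 openat() per PU instead of 6 ... */
  }
}
```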