-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix earlyoom killing processes too early when ZFS is in use #191
Conversation
The Codacity report shows 2 issues:
Your choice, should I move it into the if block, or do you prefer it to remain at the top to indicate the memory use of the function?
Not sure if that is addressable, see e.g. from https://stackoverflow.com/questions/18368712/since-we-have-snprintf-why-we-dont-have-a-snscanf/18368925#18368925
|
https://github.com/openzfs/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_arc_min Maybe we should subtract |
That sounds like a good idea. |
@@ -56,17 +57,25 @@ static long long available_guesstimate(const char* buf) | |||
return MemFree + Cached + Buffers - Shmem; | |||
} | |||
|
|||
/* Parse /proc/meminfo. | |||
/* Parse /proc/meminfo and other related files: | |||
* * ZFS's /proc/spl/kstat/zfs/arcstats, if it exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a separate function (parse_arcstats
?) please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
meminfo.c
Outdated
size_t len = fread(buf, 1, sizeof(buf) - 1, fd); | ||
if (len == 0) { | ||
fatal(102, "could not read /proc/meminfo: 0 bytes returned\n"); | ||
// Loop to handle short reads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove retry loop, glibc handles this internally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted as per #189 (comment)
meminfo.c
Outdated
* This function either returns valid data or kills the process | ||
* with a fatal error. | ||
*/ | ||
meminfo_t parse_meminfo() | ||
{ | ||
static FILE* fd; | ||
static FILE* arcstats_fd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should keep this fd always-open. zfs is a kernel module that can be unloaded, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I've changed it and added a comment.
meminfo.c
Outdated
// hits 4 373259415 | ||
// ... | ||
// size 4 1721339808 | ||
// We scan for the "size" line and parse the second number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is possible to use
get_entry("size 4", arcstats_buf)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether the number of spaces here is guaranteed to be constant. It seems to be "suspiciously enough" to fit the length of all parameters. So far I have not found a specification that discussed this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok, I agree, too risky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment about that.
64feba5
to
8e07881
Compare
I've addressed the feedback, and implemented subtraction of |
8e07881
to
998cc4e
Compare
|
||
char* hit = strstr(buf, search_term); | ||
if (hit == NULL) { | ||
warn("parse_meminfo: arcstats does not contain size field\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warn("get_arcstats_entry: arcstats does not contain field %s", search_term);
// ` ` skips spaces, `%*u` ignores the `type` field. | ||
int matches = sscanf(hit + strlen(search_term), " %*u %lld", &result); | ||
if (matches < 1) { | ||
warn("parse_meminfo: unexpected /proc/spl/kstat/zfs/arcstats contents in size line\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warn("get_arcstats_entry: ..... %s", search_term);
if (fd == NULL && errno != ENOENT) { | ||
fatal(106, "could not open /proc/spl/kstat/zfs/arcstats: %s\n", strerror(errno)); | ||
} | ||
if (fd != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invert the if condition to get rid of one level of indentation:
if (fd == NULL) {
return 0;
}
107: Could read /proc/spl/kstat/zfs/arcstats | ||
|
||
108: Could not parse /proc/spl/kstat/zfs/arcstats contents | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fatal error is too much here. Keep in mind that this depends on the behavoir of a debug file of an out-of-tree kernel module
FILE* fd = NULL; | ||
fd = fopen("/proc/spl/kstat/zfs/arcstats", "r"); | ||
if (fd == NULL && errno != ENOENT) { | ||
fatal(106, "could not open /proc/spl/kstat/zfs/arcstats: %s\n", strerror(errno)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning is ok, but no fatal error
|
||
size_t len = fread(arcstats_buf, 1, sizeof(arcstats_buf) - 1, fd); | ||
if (ferror(fd)) { | ||
fatal(107, "could not read /proc/spl/kstat/zfs/arcstats: %s\n", strerror(errno)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning is ok, but no fatal error
fatal(107, "could not read /proc/spl/kstat/zfs/arcstats: %s\n", strerror(errno)); | ||
} | ||
if (len == 0) { | ||
fatal(108, "could not read /proc/spl/kstat/zfs/arcstats: 0 bytes returned\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning is ok, but no fatal error
// * `arc_meta_min`: https://github.com/openzfs/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_arc_meta_min | ||
long long arcstats_c_min_bytes = get_arcstats_entry("\nc_min ", arcstats_buf); | ||
if (arcstats_c_min_bytes > 0) { | ||
zfs_available_bytes -= arcstats_c_min_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value can become negative! You have to check
zfs_available_bytes > arcstats_c_min_bytes
first. See https://gist.github.com/rfjakob/0a674048b9efd4b985a70107d64f1110 for example when it would get negative
} | ||
long long arcstats_arc_meta_min_bytes = get_arcstats_entry("\narc_meta_min ", arcstats_buf); | ||
if (arcstats_arc_meta_min_bytes > 0) { | ||
zfs_available_bytes -= arcstats_arc_meta_min_bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that the value does not become negative
Maybe add I saw My suggestion:
|
Could you post the |
The ZFS ARC cache is memory-reclaimable, like the Linux buffer cache. However, in contrast to the buffer cache, it currently does not count to `MemAvailable` (see openzfs/zfs#10255), leading earlyoom to believe we are out of memory when we still have a lot of memory available (in practice, many GBs). Thus, until now, earlyoom tended to kill processes on ZFS systems even though there was no memory pressure. This commit fixes it by adding the `size` field of `/proc/spl/kstat/zfs/arcstats` to `MemAvailable`. The effect can be checked easily on ZFS systems: Before this commit, dropping the ARC via (command from [1]) echo 3 | sudo tee /proc/sys/vm/drop_caches would result in an increase of free memory in earlyoom's output; with this fix, it stays equal. [1]: https://serverfault.com/a/857386/128321
On the contrary, the use of
In this case (values in MiB): MemTotal: 3936 Monitor: https://github.com/hakavlad/nohang-extra/blob/master/zfs/m04 |
I made a spreadsheet of how things look before and after https://docs.google.com/spreadsheets/d/1ZVtcdkoZqsOAAEK20wd_h9mOvrNOYN-AwmsM7Ma6iMk/edit?usp=sharing Raw source data: https://gist.github.com/rfjakob/272691aa61798c134410a69a323cc5d7 |
This bug should be mentioned in the earlyoom documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No fatal errors please. Keep in mind that this depends on the behavoir of a debug file of an out-of-tree kernel module
Closing after 3 months of inactivity after review, please reopen when ready |
I don't currently have time to finish this, would appreciate if somebody else could take it over. |
Kinda bummed this PR is stuck after so much work has already been put into it 😢 The one thing I can add is that I use this bash command:
in my polybar setup for months now to indicate the usage of the ARC and it has been working flawless since the beginning. The command I use to display the total ram usage (excluding ARC) is this:
If these commands should fail, I would notice immediately. Of course that really depends on the setup a user might have, but I would argue its better than having my IDE killed at 55% ram usage or freezing the entire system because of extremely low memory... I am really hoping someone with more skill than me is willing to join this ❤️ |
@markusressel you could use nohang [1] as temporary solution. It tries to respect [2] ARC. [1] https://github.com/hakavlad/nohang |
The ZFS ARC cache is memory-reclaimable, like the Linux buffer cache. However, in contrast to the buffer cache, it currently does not count to
MemAvailable
(see openzfs/zfs#10255), leading earlyoom to believe we are out of memory when we still have a lot of memory available (in practice, many GBs).Thus, until now, earlyoom tended to kill processes on ZFS systems even though there was no memory pressure.
This commit fixes it by adding the
size
field of/proc/spl/kstat/zfs/arcstats
toMemAvailable
.The effect can be checked easily on ZFS systems:
Before this commit, dropping the ARC via (command from 1)
would result in an increase of free memory in earlyoom's output; with this fix, it stays equal.