Systemd: Restart on OOM #3611
|
OK, I have no familiarity with the tests here. Feel free to take it over, create a new PR, or give me some guidance to complete it. |
Thanks for the contribution @Dreamsorcerer. For testing reference, which systemd version are you using? There are no real systemd mtr tests at the moment, but we do have basic ones elsewhere (we'll take care of those). Docs for reference: https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Restart= An OOM kill is a SIGKILL, which should be an unclean signal, so I was assuming it would restart in the current state, but I assume you've tested and the behaviour is different? At one point we were relying on systemd to prevent dual processes running (but now there are mechanisms in MariaDB (MDEV-31568) and systemd v242 (systemd/systemd#11457)). Can you show the systemd logs around the unit? It's possible that my systemd change above has caused systemd not to restart, because the OOM-killed mariadbd process is in a defunct/zombie state. If this is the case, changing this setting won't help. I'll need to construct a local test case to test this properly. As an aside, MDEV-34753, now that it's fixed in yesterday's release, should avoid some OOM conditions if there is only transient memory pressure. |
I'm using Debian stable, with whatever package versions ship with it. From the logs, I think it was the watchdog that OOM-killed the process, which is why on-abnormal seems to work in restarting the service. From journalctl:
|
Automatic restarting (i.e. the last log line) does not happen with on-abort. From the logs, it's clear that systemd knows the process was OOM-killed, rather than just knowing the exit code. |
Description
After an OOM kill, the process should be restarted by systemd. Prior to this change, that did not happen.
The comment in the unit file says it doesn't use on-failure because of config errors, which I assume produce an unclean exit code. on-abnormal is the same as on-failure except that it does not restart on an unclean exit code.
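For reference, the setting discussed above is a single line in the unit's [Service] section. A minimal sketch of trying it locally as a drop-in override follows; the drop-in path and the unit name mariadb.service are assumptions for illustration and are not taken from this PR.

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/mariadb.service.d/restart-on-oom.conf
# (the path and the unit name are assumptions).
[Service]
# Per the systemd.service documentation, on-abnormal restarts the service
# on an unclean signal, a timeout, or a watchdog timeout, but not on an
# unclean exit code (so a configuration error should not cause a restart loop).
# Per the testing reported in this PR, it also restarts after an OOM kill,
# which on-abort did not.
Restart=on-abnormal
```

After adding a drop-in like this, `systemctl daemon-reload` makes systemd pick it up, and `systemctl show mariadb.service -p Restart` should report the effective value.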
Release Notes
Changed systemd Restart mode to ensure that the server is restarted in abnormal situations such as OOM.