Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stopping psd on shutdown could get timeout and killed #352

Open
felixonmars opened this issue May 3, 2023 · 4 comments
Open

Stopping psd on shutdown could get timeout and killed #352

felixonmars opened this issue May 3, 2023 · 4 comments

Comments

@felixonmars
Copy link
Contributor

May 03 11:24:54 flygon.felixc.at systemd[1820]: Stopping Profile-sync-daemon...
May 03 11:26:24 flygon.felixc.at systemd[1820]: psd.service: Stopping timed out. Terminating.
May 03 11:26:24 flygon.felixc.at systemd[1820]: psd.service: Control process exited, code=killed, status=15/TERM
May 03 11:26:24 flygon.felixc.at profile-sync-daemon[973086]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(713)>
May 03 11:26:25 flygon.felixc.at profile-sync-daemon[973087]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(713)>
May 03 11:26:25 flygon.felixc.at systemd[1820]: psd.service: Failed with result 'timeout'.
May 03 11:26:25 flygon.felixc.at systemd[1820]: Stopped Profile-sync-daemon.

This leaves PSD in ungrateful state. Probably TimeoutStopSec should be set for psd.service with a higher value.

@graysky2
Copy link
Owner

graysky2 commented May 3, 2023

Have you tried experimenting with a TimeoutStopSec=60 or something similar? Does it solve the issue?

@felixonmars
Copy link
Contributor Author

Not yet because I have more issues with PSD (firefox localstorage completely broken with PSD on, etc), unfortunately :(

@gma
Copy link

gma commented Dec 5, 2023

I've got a related problem, but different log output. I've got psd managing two Firefox profiles, Chromium, and Brave. I only really use Brave, so sync'ing its data takes significantly longer.

I usually experience data loss (i.e. Brave forgets open tabs and windows, etc) if I reboot the system without quitting Brave manually first, waiting a few seconds, then logging out of my desktop session.

I just had to login remotely and reboot the machine after my desktop environment went into an infinite loop.

Here are a few pertinent lines from my journalctl output:

Dec 05 12:32:02 dilaton sudo[2886391]:   graham : TTY=pts/1 ; PWD=/home/graham ; USER=root ; COMMAND=/usr/sbin/reboot

Dec 05 12:32:04 dilaton profile-sync-daemon[2886783]: firefox resync successful
Dec 05 12:32:05 dilaton profile-sync-daemon[2886783]: chromium resync successful
Dec 05 12:32:06 dilaton profile-sync-daemon[2886783]: brave resync successful
Dec 05 12:32:07 dilaton sudo[2887065]:   graham : PWD=/home/graham ; USER=root ; COMMAND=/usr/bin/psd-overlay-helper -d /run/u>
Dec 05 12:32:07 dilaton sudo[2887065]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Dec 05 12:32:07 dilaton systemd[1]: run-user-1000-graham\x2dfirefox\x2d45e08wmw.default.mount: Deactivated successfully.
Dec 05 12:32:07 dilaton systemd[1]: Unmounted /run/user/1000/graham-firefox-45e08wmw.default.
Dec 05 12:32:07 dilaton sudo[2887065]: pam_unix(sudo:session): session closed for user root
Dec 05 12:32:07 dilaton profile-sync-daemon[2886783]: firefox unsync successful
Dec 05 12:32:07 dilaton sudo[2887084]:   graham : PWD=/home/graham ; USER=root ; COMMAND=/usr/bin/psd-overlay-helper -d /run/u>
Dec 05 12:32:07 dilaton sudo[2887084]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Dec 05 12:32:07 dilaton systemd[1]: run-user-1000-graham\x2dfirefox\x2dbtxrhgoe.default\x2drelease.mount: Deactivated successf>
Dec 05 12:32:07 dilaton systemd[1]: Unmounted /run/user/1000/graham-firefox-btxrhgoe.default-release.
Dec 05 12:32:07 dilaton sudo[2887084]: pam_unix(sudo:session): session closed for user root
Dec 05 12:32:08 dilaton profile-sync-daemon[2887087]: rm: cannot remove '/run/user/1000/.graham-firefox-btxrhgoe.default-relea>
Dec 05 12:32:08 dilaton profile-sync-daemon[2886783]: firefox unsync successful
Dec 05 12:32:08 dilaton sudo[2887102]:   graham : PWD=/home/graham ; USER=root ; COMMAND=/usr/bin/psd-overlay-helper -d /run/u>
Dec 05 12:32:08 dilaton sudo[2887102]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Dec 05 12:32:08 dilaton systemd[1]: run-user-1000-graham\x2dchromium.mount: Deactivated successfully.
Dec 05 12:32:08 dilaton systemd[1]: Unmounted /run/user/1000/graham-chromium.
Dec 05 12:32:08 dilaton sudo[2887102]: pam_unix(sudo:session): session closed for user root
Dec 05 12:32:08 dilaton profile-sync-daemon[2886783]: chromium unsync successful

Dec 05 12:32:09 dilaton systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing.
Dec 05 12:32:09 dilaton systemd[1]: [email protected]: Killing process 2886783 (profile-sync-da) with signal SIGKILL.
Dec 05 12:32:09 dilaton systemd[1]: [email protected]: Main process exited, code=killed, status=9/KILL
Dec 05 12:32:09 dilaton systemd[1]: [email protected]: Failed with result 'timeout'.
Dec 05 12:32:09 dilaton systemd[1]: Stopped User Manager for UID 1000.

Dec 05 12:32:09 dilaton systemd[1]: Stopped User Manager for UID 1000.
Dec 05 12:32:09 dilaton systemd[1]: [email protected]: Consumed 1w 1d 2h 35min 10.135s CPU time.
Dec 05 12:32:09 dilaton systemd[1]: Stopping User Runtime Directory /run/user/1000...
Dec 05 12:32:09 dilaton systemd[1]: run-user-1000-graham\x2dbrave.mount: Deactivated successfully.
Dec 05 12:32:09 dilaton systemd[1]: Unmounted /run/user/1000/graham-brave.
Dec 05 12:32:09 dilaton systemd[1]: run-user-1000.mount: Deactivated successfully.
Dec 05 12:32:09 dilaton systemd[1]: Unmounted /run/user/1000.
Dec 05 12:32:09 dilaton systemd[1]: [email protected]: Deactivated successfully.
Dec 05 12:32:09 dilaton systemd[1]: Stopped User Runtime Directory /run/user/1000.

I trimmed out a bunch of unrelated stuff (e.g. a few other services that hung around a bit too long got SIGKILL'd too), and added some blank lines around snippets of logs that belong together.

Notice that:

  1. All browsers except Brave have a "unsync successful" log message, and
  2. "profile-sync-da" (process 2886783) got killed after the attempt to SIGTERM all my user space stuff timed out (the penultimate block of log messages).

What's the best way to fix this? I looked into TimeoutStopSec that was mentioned above, and see that it's not configured at all my in psd.service file.

The value for DefaultTimeoutStopSec in my /etc/systemd/system.conf and user.conf files is 90s.

My reading of the systemd.service(5) man page suggests that my system should be waiting up to 90 seconds during shutdown for things to shutdown cleanly. So TimeoutStopSec for the psd service doesn't seem related.

Perhaps this is the culprit?

cat /usr/lib/systemd/system/[email protected]/timeout.conf 
# Avoid long hangs during shutdown if user services fail/hang due to X.org
# going away too early
[Service]
TimeoutStopSec=5

I strongly suspect that the solution (for me at least) will be to override TimeoutStopSec=5 for the [email protected] service. For those unfamiliar, 1000 is my user id, and systemd is starting the [email protected] as my user. That's why it's showing up in the log output as user@1000, but the timeout.conf file I just cat'd is in a directory that doesn't contain "1000" in the name.

Once I've worked out a clean way to "fix" this I might pop back with the relevant command...

For anyone wondering, I'm on Pop!_OS 22.04. To all extents and purposes, I think that's equivalent to me being on Ubuntu in terms of how we can expect this area of the system to behave.

@gma
Copy link

gma commented Dec 5, 2023

I've worked around this on my Pop!_OS (i.e. Ubuntu-like) machines like this:

sudo mkdir /etc/systemd/system/[email protected]
sudo tee -a /etc/systemd/system/[email protected]/timeout.conf <<EOF
[Service]
TimeoutStopSec=30s
EOF

To get the new setting picked up run sudo systemctl daemon-reload.

To check that it's been detected correctly, use systemctl show:

systemctl show user@$(id -u) | grep TimeoutStop
TimeoutStopUSec=30s
TimeoutStopFailureMode=terminate

You should see the new value from your config file is now assigned to TimeoutStopUSec. Yes, it says USec not Sec on the end; that's just how systemd stores these values internally.

From what I've learned this evening, I think you could argue that this is a bug in the operating system. I don't think the OS has any right to set the timeout before it sends SIGKILL to user-level services to just 5 seconds, when the upstream (systemd) default is 2 minutes.

I assume it's present in Ubuntu as well as Pop!_OS, but I don't know. But as such, I wonder if it's worth documenting this problem fairly prominently in the README?

I've been putting up with intermittently (and unpredictably) losing all my browser state (with my open tabs reverting to whatever I was reading a couple of months ago) for at least two years now, before deciding to dig in and find out what was going wrong. I can imagine most people might just blame psd and stop using it (I was very close to doing that myself this evening).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants