Use logind instead of utmp because of Y2038 #2300

Open: wants to merge 10 commits into base: master

aplanas

@aplanas aplanas commented Aug 22, 2023

Summary

  • OS: GNU/Linux with systemd
  • Bug fix: no
  • Type: core
  • Fixes:

Description

Bi-arch systems like x86-64 are exposed to the Y2038 problem: an overflow can occur because of some glibc compatibility decisions (see https://github.com/thkukuk/utmpx/blob/main/Y2038.md for more information).

This patch uses logind from systemd instead of utmp on Linux systems, if the systemd version supports the new API (>= 254).

@aplanas
Author

aplanas commented Aug 22, 2023

This approach uses macros to share the code, but it may be cleaner to split the function in two: one for utmp and another for logind. Is there any preference?

@aplanas aplanas force-pushed the utmp branch 5 times, most recently from 50d5199 to 89b2d18, on August 23, 2023 09:26
@aplanas aplanas marked this pull request as ready for review August 23, 2023 09:31
@giampaolo
Owner

giampaolo commented Aug 23, 2023

Mmmm... what a tricky problem. In order to reproduce the problem I've tried setting the system time to year 2050, then calling psutil.users(), but it didn't work. Do you have a way to reproduce it?

~/svn/psutil {master}$ sudo date -s "2050-11-22"
[sudo] password for giampaolo: 
mar 22 nov 2050, 00:00:00, CST
~/svn/psutil {master}$ date
mar 22 nov 2050, 00:00:01, CST
~/svn/psutil {master}$ python3 -c "import psutil, datetime; print(datetime.datetime.fromtimestamp(psutil.users()[0].started))"
2023-08-22 22:12:56

How about the other platforms? Are they affected as well?

@aplanas
Author

aplanas commented Aug 23, 2023

Mmmm... what a tricky problem. In order to reproduce the problem I've tried setting the system time to year 2050, then calling psutil.users(), but it didn't work. Do you have a way to reproduce it?

Uhm ... try to set the date and reboot. IIUC the issue is when the ut_tv struct gets created. There is a bit more info here: https://www.thkukuk.de/blog/Y2038_glibc_utmp_64bit/

As you can see there, the tv_sec field of ut_tv is declared as int32_t, so it is signed. It can store up to 2**31 seconds, so around 68 years (hence 1970 + 68 = 2038). The issue should appear when the record is created, not when it is compared.
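
The arithmetic above can be checked with a short Python sketch. It assumes only what the comment states, namely that tv_sec is a signed 32-bit count of seconds since the Unix epoch; the variable names are illustrative and are not part of psutil or glibc.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit tv_sec can hold

# Last instant representable in a signed 32-bit seconds counter.
last_representable = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the value no longer fits. ctypes.c_int32 does no
# overflow checking, so it wraps the way a two's-complement 32-bit
# field would, giving a large negative number.
wrapped = ctypes.c_int32(INT32_MAX + 1).value

print(last_representable.isoformat())  # 2038-01-19T03:14:07+00:00
print(wrapped)                         # -2147483648
print((EPOCH + timedelta(seconds=wrapped)).isoformat())
```

The wrapped value maps back to a date in 1901, which is why a record written after the overflow point would carry a nonsensical login time.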

How about the other platforms? Are they affected as well?

It is a glibc issue, so that depends on whether they use the same struct with int32_t or the struct timeval ut_tv; one. I do not know : (

* On Linux we use `setutent(), getutent(), endutent()` (so different APIs): https://github.com/giampaolo/psutil/blob/77e5b7445748d30d22c0e3b2e651414da96a88b4/psutil/_psutil_linux.c#L377

That is the one that I fixed here! : )

For the rest of the platforms, if they are affected (not sure about that), this solution is not applicable as it depends on systemd.

@thkukuk

thkukuk commented Aug 23, 2023

Some more documentation why this is necessary:
https://www.thkukuk.de/blog/Y2038_glibc_utmp_64bit/

So for Linux with systemd: coreutils, procps-ng, shadow, util-linux and more have already implemented support for using logind instead of utmp, or have accepted PRs for this.
glibc developers don't plan to fix the problem; instead they want to remove support for utmp/wtmp.
The situation for other OSes and libraries may be different: musl libc never even supported utmp, and others are 64-bit clean and thus Y2038 safe.

@giampaolo
Owner

giampaolo commented Aug 23, 2023

If binaries / wheels are produced with systemd support, but the Linux system installing the wheel doesn't have systemd, are we sure that this patch won't cause a "sd_get_sessions symbol not found" error on psutil import?

I remember we had a similar problem with prlimit() syscall on Linux. We used #ifdefs in the C code to check whether prlimit() was available at compile time, but on some old systems (CentOS) it wasn't, so the resulting .whl could be installed but crashed with "prlimit symbol not found" on psutil import (see code). For this reason I rewrote it by using ctypes (the only part of psutil using ctypes), so that the check for prlimit() availability is done at runtime instead of compile time: #1879.

I'm not sure if the same problem may arise here as well. Are there Linux distros without systemd support and / or where sd_get_sessions() is not available? Is this even a problem?

@aplanas
Author

aplanas commented Aug 23, 2023

If binaries / wheels are produced with systemd support, but the Linux system installing the wheel doesn't have systemd, are we sure that this patch won't cause a "sd_get_sessions symbol not found" error on psutil import?

It will indeed produce this error. One solution could be what you describe:

  • Remove the SYSTEMD_LINUX macro and the systemd version detection from setup.py
  • Rename psutil_users to psutil_users_utmp
  • Create ctypes code for psutil_users_logind
  • users will now try the ctypes version first and, if it fails, fall back to the utmp version
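
The fallback order in the list above could be sketched roughly as follows. This is only an illustration: `users_with_fallback` and the two stub backends are hypothetical names, not psutil APIs, and the real logind and utmp readers are replaced by stand-ins.

```python
def users_with_fallback(backends):
    """Return the result of the first backend that does not fail.

    `backends` is an ordered list of zero-argument callables, for
    example [logind_reader, utmp_reader]. A backend signals that it is
    unavailable by raising OSError, which is what ctypes raises when a
    shared library cannot be loaded.
    """
    last_error = None
    for backend in backends:
        try:
            return backend()
        except OSError as err:
            last_error = err  # remember why this backend was skipped
    raise RuntimeError("no backend available") from last_error


# Stand-ins for the two hypothetical implementations.
def users_logind_stub():
    # Simulates a system where libsystemd cannot be loaded.
    raise OSError("libsystemd.so.0 not found")


def users_utmp_stub():
    return [("root", "pts/0")]


print(users_with_fallback([users_logind_stub, users_utmp_stub]))
```

Keeping the order in a plain list makes the preference (logind first, utmp second) explicit and easy to exercise from unit tests.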

Would this be the preferred approach?

@giampaolo
Owner

giampaolo commented Aug 23, 2023

Yes, ctypes sounds like the safest bet at the time of writing, but it has a cost: slower speed + extra and convoluted code to maintain (I am also thinking about the dual unit tests).

I am skeptical how much of a problem this is in reality. Who sets the system date to >= 2035? If done, it looks like a user error. After setting the system date to 2050 all my browser tabs automatically logged out and I could not establish new HTTPS connections, so basically a system with a date that far in the future is not usable in general.

@aplanas
Author

aplanas commented Aug 23, 2023

I am skeptical how much of a problem this is in reality. Who sets the system date to >= 2035?

It will happen automatically in less than 15 years. Systems with long-term support created now will eventually suffer from this issue.

I mean, the issue to fix is not that a user will set the date to 2050 by accident or for a specific use case; it is that a system built in the next 5 years that needs to be supported for around 10 more years will stop working because of this bug.

Edit: math is hard

@aplanas
Author

aplanas commented Aug 25, 2023

On a second thought I think that dlopen() can be a better solution here. I will try this path.

@aplanas aplanas force-pushed the utmp branch 2 times, most recently from af91ac8 to 20fe497, on August 25, 2023 12:47
@aplanas
Author

aplanas commented Aug 25, 2023

@giampaolo I updated the PR using dlopen(). Now there is no link to libsystemd, so a Linux system without the library will not produce the missing symbol error. The symbols are now loaded dynamically.

@aplanas
Author

aplanas commented Sep 8, 2023

@giampaolo ping?

@aplanas
Author

aplanas commented Oct 19, 2023

@giampaolo did you have a chance to evaluate the new approach here?

Meanwhile, we added the patch to the openSUSE package.

@giampaolo
Owner

giampaolo commented Oct 19, 2023

Hi there.

@giampaolo I updated the PR using dlopen(). Now there is no link to libsystemd, so a Linux system without the library will not produce the missing symbol error. The symbols are now loaded dynamically.

I'm afraid this is the case only if you install psutil from sources. In the C code you do:

#ifdef SYSTEMD_LINUX
    #include <dlfcn.h>
#endif

...but SYSTEMD_LINUX is set only when setup.py is invoked (aka installation from sources). The problem we have is with the installation of wheels / binaries. If the wheel was compiled with systemd support, but the system which installs the wheel does not have systemd support, then the user will get "symbol not found". This is the annoying problem.

You should somehow avoid the #ifdef SYSTEMD_LINUX check.

@aplanas
Author

aplanas commented Oct 20, 2023

You should somehow avoid the #ifdef SYSTEMD_LINUX check.

Yes. Dropped.

@aplanas
Author

aplanas commented Oct 20, 2023

...but SYSTEMD_LINUX is set only when setup.py is invoked (aka installation from sources). The problem we have is with the installation of wheels / binaries. If the wheel was compiled with systemd support, but the system which installs the wheel does not have systemd support, then the user will get "symbol not found". This is the annoying problem.

But now that I recall ... this part is not true.

If the library is not present in the system, the load_library function will return NULL and the code will fall back to the current code that uses utmp.

The new macro dlsym_check checks whether the symbol is present in the library, and if one of the required symbols is not available, it again returns NULL and the same fallback is used.

The SYSTEMD_LINUX condition is used only to remove this part in case we want a build that never checks for systemd.
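
The load-then-check-symbols behaviour described above can be approximated in Python with ctypes, which wraps dlopen() and dlsym(). This is a sketch and not the PR's C code: `load_with_symbols` is a hypothetical helper, and the list of required symbols is simply the set of function names mentioned in this thread.

```python
import ctypes


def load_with_symbols(name, symbols):
    """Return a library handle, or None if the library or any symbol is missing."""
    try:
        lib = ctypes.CDLL(name)  # dlopen(); raises OSError if loading fails
    except OSError:
        return None
    for symbol in symbols:
        # Attribute lookup on a CDLL performs a dlsym(); hasattr() is
        # False when the symbol cannot be resolved.
        if not hasattr(lib, symbol):
            return None
    return lib


REQUIRED = [
    "sd_get_sessions",
    "sd_session_get_leader",
    "sd_session_get_username",
    "sd_session_get_start_time",
]

handle = load_with_symbols("libsystemd.so.0", REQUIRED)
print("logind path" if handle is not None else "utmp fallback")
```

On a machine without libsystemd, or with a version older than the one that introduced these functions, the sketch prints the fallback branch instead of failing at import time.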

So, the confusion matrix is like this (C = Compile time; R = At runtime; S = Systemd is present; N = No systemd present)

  • CN - RN: At compile time the systemd section gets disabled, and systemd is not present in the runtime system. The code will use the utmp path (the only one present in the binary).
  • CN - RS: At compile time the systemd section gets disabled, but systemd is present at runtime. There is no systemd code path, so it will still use utmp.
  • CS - RN: At compile time the systemd section gets enabled, so there are two code paths: one for systemd and another for utmp. At runtime the systemd library is not present (or some symbols are missing). Instead of showing the missing symbol error, it will fall back to the utmp code path.
  • CS - RS: Systemd is present in both scenarios, and the new systemd code path will be used.
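
The four cases above reduce to a single condition, which a small Python sketch makes explicit. The function name `code_path` and the string labels are illustrative only.

```python
def code_path(compiled_with_systemd, runtime_has_systemd):
    """Return which implementation ends up being used.

    The logind path needs both: it must have been compiled in, and the
    library with all required symbols must be loadable at runtime.
    """
    if compiled_with_systemd and runtime_has_systemd:
        return "logind"
    return "utmp"


for compiled in (False, True):
    for runtime in (False, True):
        label = "C{} - R{}".format(
            "S" if compiled else "N", "S" if runtime else "N"
        )
        print(label, "->", code_path(compiled, runtime))
```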

You should somehow avoid the #ifdef SYSTEMD_LINUX check.

Yes. Dropped.

I re-added it for now to have the same code base for discussion. We can drop all the systemd checks and make the systemd code path always present (with the fallback to utmp), but the reason would not be to avoid the "symbol not found" error, since that error already cannot occur.

@giampaolo
Owner

Me: You should somehow avoid the #ifdef SYSTEMD_LINUX check.
You: Yes. Dropped.
You: I re-added it for now to have the same code-base for discussion.

I'm confused now. :) Can you commit the "final" code?

@giampaolo
Owner

I'm confused now. :) Can you commit the "final" code?

Also, can you please avoid squashing commits and force pushing? It's clearer to see the individual commits and how they evolve.

@aplanas
Author

aplanas commented Oct 23, 2023

(you should update your branch)

Done. I also rebased on top of the aplanas-utmp branch, which contains your code.

@giampaolo
Owner

From GitHub CI https://github.com/giampaolo/psutil/actions/runs/6614641177/job/17968190988?pr=2300:

2023-10-23T15:42:36.3752124Z psutil-debug [psutil/arch/linux/users.c:44]> missing 'sd_session_get_leader' fun
2023-10-23T15:42:36.5224487Z psutil-debug [psutil/arch/linux/users.c:44]> missing 'sd_session_get_leader' fun

This means that the new code is currently not tested. I also cannot test it locally on Ubuntu 22.04 for the same reason (missing 'sd_session_get_leader'). What distro are you on? Can you run make test and make test-memleaks?

@aplanas
Author

aplanas commented Oct 23, 2023

What distro are you on?

openSUSE Tumbleweed, but I guess Arch will also have the systemd v254 there.

Can you run make test

That fails a lot. Some of the failures come from assumptions about partitions and mount points (I am using btrfs with subvolumes, etc.). But of the ones related to this PR I see one:

======================================================================
FAIL: psutil.tests.test_posix.TestSystemAPIs.test_users (psutil=[suser(name='root', terminal=None, host='192.168.122.1', started=1698081882.810051, pid=1209)], who='root     pts/0        Oct 23 19:24   .          1227 (192.168.122.1)')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/psutil/psutil/tests/test_posix.py", line 345, in test_users
    self.assertEqual(u.terminal, terminals[idx])
AssertionError: None != 'pts/0'

and make test-memleaks?

....
Ran 101 tests in 1.062s

OK (skipped=9)
SUCCESS

@giampaolo
Owner

giampaolo commented Oct 23, 2023

I tried playing with it a bit more and, other than sd_session_get_leader, I'm also missing sd_session_get_start_time and sd_session_get_username. I'm afraid it's too premature to add this right now. Most of these systemd APIs seem to have been added very recently.

@aplanas
Author

aplanas commented Oct 23, 2023

Most of these systemd APIs seem they were added very recently.

All those are already present in v254

@giampaolo
Owner

giampaolo commented Oct 23, 2023

Let's not rush this. Considering these new APIs are literally 5 months old, it seems unlikely to me that we would cause issues for long-term support distros if we don't do this right now.

We don't have to wait years, but the minimum requirement here should be that the default GitHub Linux image has systemd >= v254, so that both utmp and systemd code paths are covered by unit tests. We can leave this PR open until then.

In the meantime, another couple of suggestions I can give for this PR:

  • fix the TTY / terminal failure
  • cache the libsystemd.so.0 handle in a global var so that it's loaded only on first call

@giampaolo
Owner

I now realize though that there is some logic in the Python layer that is specific to the utmp implementation:
Perhaps it makes sense to translate this logic into C, and remove it from the Python layer, so that the 2 C implementations return the same things.

FYI I did this in 0c3a1c5, so you need to update your branch again.

aplanas and others added 6 commits October 24, 2023 08:49
Bi-arch systems like x86-64 are exposed to the Y2038 problem: an overflow
can occur because of some glibc compatibility decisions (see
https://github.com/thkukuk/utmpx/blob/main/Y2038.md for more
information)

This patch uses logind from systemd instead of utmp on Linux systems, if
the systemd version supports the new API (>= 254).

Signed-off-by: Alberto Planas <[email protected]>
Signed-off-by: Alberto Planas <[email protected]>
Signed-off-by: Alberto Planas <[email protected]>
Signed-off-by: Giampaolo Rodola <[email protected]>
Signed-off-by: Giampaolo Rodola <[email protected]>
@aplanas
Author

aplanas commented Oct 24, 2023

* cache the `libsystemd.so.0` handle in a global var so that it's loaded only on first call

This will leak the handle, as dlclose() can never be called. Would this be OK?

@giampaolo
Owner

Mmm no I don't think so. It will slightly increase memory usage the first time it's called, then it will just stay loaded in memory until the python process is terminated. The important thing is not to open a new handle on each call.
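
The "load once, keep it for the life of the process" idea can be sketched in Python as below. This is an illustration under assumptions, not the PR's C implementation: `get_library` and `_handles` are hypothetical names, and remembering a failed load (so it is not retried on every call) is a design choice of this sketch rather than something stated in the thread.

```python
import ctypes

# Module-level cache: library name -> handle, or None if loading failed.
# Entries are never removed, so a handle stays open until the process exits.
_handles = {}


def get_library(name, loader=ctypes.CDLL):
    """Return a cached handle for `name`, loading it only on first use."""
    if name not in _handles:
        try:
            _handles[name] = loader(name)
        except OSError:
            _handles[name] = None  # remember the failure too
    return _handles[name]


# Demo with a counting stand-in loader, so no real library is needed.
calls = []


def fake_loader(name):
    calls.append(name)
    return object()


first = get_library("libdemo.so.0", fake_loader)
second = get_library("libdemo.so.0", fake_loader)
print(first is second, len(calls))  # True 1
```

The second call returns the same object without invoking the loader again, which is the property asked for in the review.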

@aplanas
Author

aplanas commented Oct 24, 2023

The important thing is not to open a new handle on each call.

Done.

@aplanas
Author

aplanas commented Oct 24, 2023

@giampaolo I found what I think is a bug: if there is an error in the _utmp or _systemd version, it will return NULL without setting the exception. The last commit addresses this.

@aplanas
Author

aplanas commented Oct 24, 2023

The test is also fixed: the issue was not code related.

For now I will partially update the patch living in the openSUSE package, and we will revisit this PR at some point in the future.

Thanks a lot for the detailed review and all your patience working with me.

Signed-off-by: Alberto Planas <[email protected]>
Signed-off-by: Alberto Planas <[email protected]>
@giampaolo
Owner

giampaolo commented Oct 24, 2023

There's one last thing. From the man page https://manpages.debian.org/testing/libsystemd-dev/sd_session_get_username.3.en.html:

On success, sd_session_get_state(), sd_session_get_uid(), sd_session_get_seat(), 
sd_session_get_service(), sd_session_get_type(), sd_session_get_class(), 
sd_session_get_display(),sd_session_get_remote_user(), sd_session_get_remote_host() 
and sd_session_get_tty() return 0 or a positive integer. On failure, these calls return a 
negative errno-style error code.

Errors
Returned errors may indicate the following problems:

-ENXIO: The specified session does not exist.
-ENODATA: The given field is not specified for the described session.
-EINVAL: An input parameter was invalid (out of range, or NULL, where that is not accepted).
-ENOMEM: Memory allocation failed.

Right now on failure we get RuntimeError("cannot get user information via systemd"), without any indication of which call failed and why. Something like this should do it (not tested):

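#include <stdio.h>   // snprintf()
#include <stdlib.h>  // abs()
#include <string.h>  // strerror()

// Raise OSError(errno, "<strerror> (originated from <syscall>)") from a
// negative errno-style return value.
void set_systemd_errno(const char *syscall, int neg_errno) {
    PyObject *exc;
    int pos_errno;
    char fullmsg[1024];

    pos_errno = abs(neg_errno);
    snprintf(
        fullmsg, sizeof(fullmsg), "%s (originated from %s)",
        strerror(pos_errno), syscall
    );
    exc = PyObject_CallFunction(PyExc_OSError, "(is)", pos_errno, fullmsg);
    PyErr_SetObject(PyExc_OSError, exc);
    Py_XDECREF(exc);
}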


PyObject *
psutil_users_systemd(PyObject *self, PyObject *args) {
    int ret;
    
    ...
    
    ret = sd_session_get_username(session_id, &username);
    if (ret < 0) {
        set_systemd_errno("sd_session_get_username", ret);
        goto error;
    }

error:
    ...
    return NULL;  // NOT RuntimeError
}
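
Seen from Python, the helper above would surface as an OSError carrying the positive errno and a message naming the failing call. A rough Python equivalent, for illustration only (`systemd_oserror` is a hypothetical name and not part of psutil):

```python
import errno
import os


def systemd_oserror(syscall, neg_errno):
    """Build an OSError from a negative errno-style return value."""
    pos_errno = abs(neg_errno)
    msg = "{} (originated from {})".format(os.strerror(pos_errno), syscall)
    return OSError(pos_errno, msg)


# Example: the session vanished between listing it and querying it.
err = systemd_oserror("sd_session_get_username", -errno.ENXIO)
print(err.errno == errno.ENXIO)
print(err.strerror)
```

Callers can then branch on `err.errno` instead of parsing a generic RuntimeError message.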

@aplanas
Author

aplanas commented Oct 24, 2023

There's one last thing.

Adapted the code and tested it a bit. Done.
