Label for Syscall Tables #39

Open

fatherlinux opened this issue Sep 2, 2015 · 13 comments

@fatherlinux

What about having a label for syscall (system call) tables? Imagine that I want to verify that the user space packaged up in a container image can run on a given container host.

For example, what about verifying that a Fedora 22 container image can or can't run on a Fedora 18 container host?

@rhatdan (Member) commented Sep 2, 2015

I am not sure how we would do this. I would think this would be more about requiring a particular version of the Linux kernel than just syscalls. Other examples might be particular layouts of /proc or /sys.

So something like

minimal_kernel_version
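
As a rough illustration of how such a label might be consumed (the label name min_kernel_version and the version parsing here are assumptions, not anything agreed on in this thread), a Python sketch:

```python
# Hypothetical sketch: compare a min_kernel_version image label against the
# running kernel. The label name and how it reaches the tooling are assumptions.
import os
import re

def kernel_tuple(version):
    """Turn '3.10.0-327.el7.x86_64' or '4.2' into a comparable tuple of ints."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    return tuple(int(p) for p in m.groups(default="0"))

def host_satisfies(min_kernel_version):
    """True if the running kernel is at least the version named by the label."""
    return kernel_tuple(os.uname().release) >= kernel_tuple(min_kernel_version)

if __name__ == "__main__":
    # e.g. an image labeled min_kernel_version=3.10 checked on the current host
    print(host_satisfies("3.10"))
```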

@fatherlinux (Author)

Admittedly, I am not quite sure myself. My gut feeling is to always run Fedora 18 containers on Fedora 18 container hosts, but I know people are mixing and matching even though it's a bad idea. This proposal was partially about exposing the problem and getting some smart people to think about it :-)

@rhatdan (Member) commented Sep 2, 2015

Well obviously this would not be a supported environment, but consider RHEL7 in a few years running newer RHEL8 containers or Fedora 30 images. These images might require features of the host kernel that are not present in RHEL7. So having a min_kernel_version label could allow tooling like the atomic command to tell the user that the application/image will not run because the host kernel is too old.

@eparis commented Sep 2, 2015

Specifying specific syscalls would allow you to set up seccomp filters. Admittedly, no one knows what syscalls they make, so it's not actually useful/possible. But anyway, this is basically a lightweight rebuild of RPM dependency resolution in the atomic command, done with labels? I know why it's needed, but I still say yuck ;)
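
As a sketch of how a declared syscall list could feed a seccomp whitelist (the way the list is declared and the exact JSON schema a given runtime expects are assumptions here), in Python:

```python
# Rough sketch: if an image did declare the syscalls it needs, tooling could
# turn that list into a whitelist-style seccomp profile. The exact profile
# schema a given runtime accepts is an assumption for illustration.
import json

def seccomp_profile(allowed_syscalls):
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # everything not listed fails with an errno
        "syscalls": [
            {"names": sorted(allowed_syscalls), "action": "SCMP_ACT_ALLOW"}
        ],
    }

if __name__ == "__main__":
    declared = ["read", "write", "open", "close", "fstat", "mmap", "exit_group"]
    print(json.dumps(seccomp_profile(declared), indent=2))
```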

@fatherlinux (Author)

I agree, it's not pretty. The fun part is we need to build tooling to analyze the user space (container image). Firing it up and putting it through some kind of smoke test is like finding a needle in a haystack.

I haven't had time to play with this yet, but a technique similar to the one in this paper should be possible:

https://www.usenix.org/legacy/publications/library/proceedings/sec05/tech/full_papers/linn/linn_html/paper.html

Perhaps proposing a syscall list for LSB and having a file in /etc is a better alternative? Or maybe have the file wherever you want and use a label specifying where it is?

I am thinking out loud (quietly typing on my phone) ;-)

@fatherlinux (Author)

BTW, I really like the seccomp idea. I think Linux needs a methodology for this just to autogenerate seccomp policy.

@mtrmac commented Sep 2, 2015

(Diverging from the seccomp discussion.)

> Well obviously this would not be a supported environment, but consider RHEL7 in a few years running newer RHEL8 containers or Fedora 30 images. These images might require features of the host kernel that are not present in RHEL7. So having a min_kernel_version label could allow tooling like the atomic command to tell the user that the application/image will not run because the host kernel is too old.

Backporting features in RHEL7 also shows that min_kernel_version doesn’t work: we wouldn’t backport all features from the future kernel.

We could have the container list every feature it relies on (all syscalls, sysfs features, docker runtime features, atomic labels, …), which is pretty infeasible and not well-defined anyway (until we decide what to backport, the definition of what a given feature precisely includes is unknown).

The only approach which at least has a chance of working is to just let a container declare “this requires a RHEL 7.2 runtime”, and then have the host OS worry about compatibility (we could have RHEL 8 declare compatibility with 7.2, RHEL 8.1 declare compatibility with 7.2 and 7.4, or, shudder, RHEL 6.20 declare compatibility with 8.2), and it could equally be used across distributions (RHEL/Fedora/Ubuntu/Windows/whatever). It would still be a hard problem, but at least it would be owned by only one party which has an interest in making it work, instead of distributing the responsibility between platform vendor / language stack vendor / ISV and having them argue over who has violated the implied contract, or what the implied contract even is.
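
A toy sketch of that division of responsibility (the label key, version strings, and compatibility list are all invented for illustration): the image names one runtime, and the host ships the list of runtimes it claims to be compatible with.

```python
# Toy sketch of "image declares one runtime, host owns the compatibility list".
# The label key and version strings are invented for illustration.
HOST_COMPATIBLE_RUNTIMES = {"rhel7.2", "rhel7.4"}  # shipped/maintained by the host OS vendor

def can_run(image_labels):
    required = image_labels.get("required_runtime")
    return required in HOST_COMPATIBLE_RUNTIMES

print(can_run({"required_runtime": "rhel7.2"}))   # True: host claims 7.2 compatibility
print(can_run({"required_runtime": "fedora30"}))  # False: host makes no such claim
```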

@fatherlinux (Author)

Let me throw one more curve ball into it. I was chatting with a guy who came up to the booth yesterday, and he had worked on an interesting problem: latency skew between minor versions of the kernel.

The application was containerized and had associated performance benchmark testing. They noticed one day that the application had failed the performance tests. They then realized it was because they were rolling to new kernel versions, and the newest kernel they were using had a performance regression somewhere in the syscall table that slowed the application down enough to make it fail the performance tests :-(

Long story short, even if the syscall tables match between the userspace and kernel, there can be performance differences between versions of the kernel :-( I don't think we want to embed the entire syscall table and some kind of latency benchmark into some kind of label format. That would be really yuck :-)

Really long story short, just use Fedora 30 images on Fedora 30 kernels and RHEL 7 images on RHEL7 kernels :-P

@jmtd (Contributor) commented Sep 25, 2015

Seems to me that apps need to do feature detection (try a syscall, check errno?) and either gracefully degrade or fail early and loudly if they determine they can't run. Unfortunately, that would mean you couldn't tell a container won't work until you try it. But I can't see any reliable way for this to be managed at a layer other than the application itself.

@fatherlinux (Author)

Here is another problem. When an application starts, it doesn't make every single system call that it is going to make; the calls happen when certain code paths are run. This makes it very hard to determine at the beginning unless EVERY application checked on startup, which could definitely slow things down, not to mention it would require changes to glibc that others might not want.

What I am starting to envision around this is a test suite container to verify some "set" of system calls and all of their speeds (yes, regressions happen at the syscall layer too; I have heard of this happening with speed). Then I could envision a set of tools that would analyze binaries and determine what syscalls they need. You could then match container hosts with containers.

But, I am probably just dreaming, I suspect we will just "wing it" for many years to come. Until some kind of tooling is developed, I really don't think it's sane to run containers on any kernel other than the one they were compiled on....

@jmtd (Contributor) commented Sep 25, 2015

> Here is another problem. When an application starts, it doesn't make every single system call that it is going to make; the calls happen when certain code paths are run. This makes it very hard to determine at the beginning unless EVERY application checked on startup, which could definitely slow things down, not to mention it would require changes to glibc that others might not want.

The app image is going to ship a glibc version that corresponds to the kernel it needs, so even if that glibc supported a given syscall, it's still no guarantee that the host kernel does. For that reason the app would have to test via a raw syscall() call (the low-level one where you pass in the integer corresponding to the syscall you want), rather than trying to use the syscall in question through glibc. And yes, they'd need to test all the esoteric ones they wanted to use, and at startup, not at first use.
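
A minimal sketch of that kind of startup probe (the syscall number 319 is memfd_create on x86_64; syscall numbers are architecture-specific and only an example here):

```python
# Sketch of probing the host kernel directly via syscall(2), bypassing the
# glibc wrapper, and treating ENOSYS as "not supported". The syscall number
# (319 = memfd_create on x86_64) is architecture-specific and just an example.
import ctypes
import errno

libc = ctypes.CDLL(None, use_errno=True)

def kernel_has_syscall(nr, *args):
    ctypes.set_errno(0)
    if libc.syscall(nr, *args) != -1:
        return True
    # Any failure other than ENOSYS still means the kernel knows the syscall.
    return ctypes.get_errno() != errno.ENOSYS

if __name__ == "__main__":
    # Deliberately invalid flags: a supporting kernel answers EINVAL, an old one ENOSYS.
    print(kernel_has_syscall(319, b"probe", 0xFF))
```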

@fatherlinux (Author)

Yes, good catch. Even as I was typing it, I was second guessing myself. Before coffee. You are dead right. Glibc will think it can make the syscall and fail.

Interestingly, you led me to a small epiphany. If we could analyze what syscalls glibc can make, then it would help. Still, I think analyzing every binary in the packaged user space (aka container image) is the only deterministic way...

@mtrmac commented Sep 25, 2015

We don’t need to analyze what system calls glibc can make; that is an application-independent, glibc-dependent constant. We need to analyze what glibc calls the application makes, which is application-specific (and much easier to do than searching executables for the syscall instruction and analyzing the code to extract syscall numbers: just use nm -D).
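
A rough sketch of that nm -D approach (it assumes binutils' nm is installed, it only sees dynamically linked symbols, and mapping the glibc wrappers back to syscall numbers would still need a separate table):

```python
# Sketch: list the undefined dynamic symbols ("U") a binary imports via nm -D,
# which includes the glibc wrappers it calls. Mapping wrappers to syscalls is
# left out; this only gathers the symbol names.
import subprocess
import sys

def imported_symbols(path):
    out = subprocess.run(["nm", "-D", path], capture_output=True, text=True, check=True).stdout
    syms = set()
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[-2] == "U":      # undefined (imported) symbol
            syms.add(parts[-1].split("@")[0])          # strip the version suffix
    return sorted(syms)

if __name__ == "__main__":
    for sym in imported_symbols(sys.argv[1] if len(sys.argv) > 1 else "/bin/ls"):
        print(sym)
```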

(And FWIW if you did want to know what system calls glibc needs, glibc has --enable-kernel=$version to enable/disable fallbacks to support older kernels, so it inherently knows which kernel version is required. I don’t know whether this is easy to extract from the libc.so; anyway, it is still not useful to determine true container requirements or seccomp masks: every container’s glibc contains a settimeofday(2) call.)

Of course that can only work if all applications use glibc, not syscall(3), for all syscalls (and in particular if glibc exposes all Linux syscalls immediately, which IIRC is not consistently the case).

Also, this cannot distinguish between two applications which call the same syscall, but one of them is using a new flag value not supported by older kernels; we could in principle bump glibc symbol versions whenever a flag is added (arguably that would be the right thing to do, or perhaps glibc is doing it already?). Then, if we see the application use an old symbol version, we know that it is compatible with an old kernel (“know” as in “could in principle build a large database which knows”); if we see a new symbol version, we are not sure. In other words, “flags” arguments are problematic; it would be better to always add a new function call for new features, to make the use of new features visible to the linker (this is not going to be popular).
