Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: ExitWithError() will determine an incorrect exit code and result in an error. #24213

Closed
hiramotom opened this issue Oct 9, 2024 · 5 comments · Fixed by #24274
Closed
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@hiramotom
Copy link

hiramotom commented Oct 9, 2024

Issue Description

It seems that the changes made by #22661 might not be completely accurate.
When I ran the command in test environment, it returned an error: "Command exited with status 134 (expected 2)".
Could this test result be different depending on the environment?

I believe this code is causing the error output.

  • Expect(session).To(ExitWithError(2, "SIGFPE: floating-point exception"))

Steps to reproduce the issue

  1. Run run_signal_test.go
# FOCUS_FILE="run_signal_test.go" QUADLET_BINARY=/usr/libexec/podman/quadlet PATH=${PATH}:$(pwd)/hack PODMAN_BINARY=$(which podman) PODMAN_REMOTE_BINARY=$(which podman-remote) CGROUP_MANAGER=$(podman info --format {{.Host.CgroupManager}}) NETWORK_BACKEND=$(podman info --format {{.Host.NetworkBackend}}) OCI_RUNTIME=$(podman info --format {{.Host.OCIRuntime.Name}}) STORAGE_FS=overlay ROOTLESS_STORAGE_FS=overlay make localintegration`

2.The results will be output, so I will check them.

...snip...
  SIGQUIT: quit
  PC=0x561b8b47ba41 m=10 sigcode=0

  goroutine 0 gp=0xc0004d6380 m=10 mp=0xc0004a4808 [idle]:
  runtime.futex(0xc0004a4948, 0x80, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/runtime/sys_linux_amd64.s:557 +0x21 fp=0x7f7a99fffa78 sp=0x7f7a99fffa70 pc=0x561b8b47ba41
  runtime.futexsleep(0x7f7a99fffaf0?, 0x8b410750?, 0x561b8b410750?)
        /usr/lib/golang/src/runtime/os_linux.go:69 +0x30 fp=0x7f7a99fffac8 sp=0x7f7a99fffa78 pc=0x561b8b43adb0
  runtime.notesleep(0xc0004a4948)
        /usr/lib/golang/src/runtime/lock_futex.go:170 +0x87 fp=0x7f7a99fffb00 sp=0x7f7a99fffac8 pc=0x561b8b4108e7
  runtime.mPark(...)
        /usr/lib/golang/src/runtime/proc.go:1761
  runtime.stopm()
        /usr/lib/golang/src/runtime/proc.go:2782 +0x8c fp=0x7f7a99fffb30 sp=0x7f7a99fffb00 pc=0x561b8b445aec
  runtime.findRunnable()
        /usr/lib/golang/src/runtime/proc.go:3512 +0xd5f fp=0x7f7a99fffca8 sp=0x7f7a99fffb30 pc=0x561b8b44765f
  runtime.schedule()
        /usr/lib/golang/src/runtime/proc.go:3868 +0xb1 fp=0x7f7a99fffce0 sp=0x7f7a99fffca8 pc=0x561b8b448731
  runtime.goexit0(0xc0002d96c0?)
        /usr/lib/golang/src/runtime/proc.go:4181 +0x18 fp=0x7f7a99fffcf8 sp=0x7f7a99fffce0 pc=0x561b8b4495d8
  runtime.mcall()
        /usr/lib/golang/src/runtime/asm_amd64.s:458 +0x50 fp=0x7f7a99fffd10 sp=0x7f7a99fffcf8 pc=0x561b8b477bf0
  rax    0xca
  rbx    0x0
  rcx    0x561b8b47ba43
  rdx    0x0
  rdi    0xc0004a4948
  rsi    0x80
  rbp    0x7f7a99fffab8
  rsp    0x7f7a99fffa70
  r8     0x0
  r9     0x0
  r10    0x0
  r11    0x286
  r12    0x15
  r13    0x1
  r14    0xc0004d6380
  r15    0x1
  rip    0x561b8b47ba41
  rflags 0x286
  cs     0x33
  fs     0x0
  gs     0x0
  test2
  [FAILED] Command exited with status 134 (expected 2)
  In [It] at: /root/podman-upstream/test/e2e/run_signal_test.go:115 @ 10/07/24 04:51:30.368

  Full Stack Trace
    github.com/containers/podman/v5/test/e2e.init.func98.2()
        /root/podman-upstream/test/e2e/run_signal_test.go:115 +0x428
  < Exit [It] signals are not forwarded to container with sig-proxy false - /root/podman-upstream/test/e2e/run_signal_test.go:93 @ 10/07/24 04:51:30.368 (2.071s)
  > Enter [AfterEach] TOP-LEVEL - /root/podman-upstream/test/e2e/common_test.go:122 @ 10/07/24 04:51:30.368
  Running: /usr/bin/podman --storage-opt overlay.imagestore=/tmp/podman-e2e-2134768457/imagecachedir --root /tmp/podman-e2e-2134768457/subtest-2968878380/root --runroot /tmp/podman-e2e-2134768457/subtest-2968878380/runroot --runtime crun --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-2134768457/subtest-2968878380 --events-backend file --db-backend sqlite --storage-driver overlay stop --all -t 0
  3f7501ec1389c8da7224ec95ec58cc4ad10c8334bfd082c7f7f271975f5c0d16
  Running: /usr/bin/podman --storage-opt overlay.imagestore=/tmp/podman-e2e-2134768457/imagecachedir --root /tmp/podman-e2e-2134768457/subtest-2968878380/root --runroot /tmp/podman-e2e-2134768457/subtest-2968878380/runroot --runtime crun --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-2134768457/subtest-2968878380 --events-backend file --db-backend sqlite --storage-driver overlay pod rm -fa -t 0
  Running: /usr/bin/podman --storage-opt overlay.imagestore=/tmp/podman-e2e-2134768457/imagecachedir --root /tmp/podman-e2e-2134768457/subtest-2968878380/root --runroot /tmp/podman-e2e-2134768457/subtest-2968878380/runroot --runtime crun --conmon /usr/bin/conmon --network-config-dir /etc/containers/networks --network-backend netavark --cgroup-manager systemd --tmpdir /tmp/podman-e2e-2134768457/subtest-2968878380 --events-backend file --db-backend sqlite --storage-driver overlay rm -fa -t 0
  3f7501ec1389c8da7224ec95ec58cc4ad10c8334bfd082c7f7f271975f5c0d16
  < Exit [AfterEach] TOP-LEVEL - /root/podman-upstream/test/e2e/common_test.go:122 @ 10/07/24 04:51:30.514 (146ms)
  << Timeline
------------------------------
...snip...

Summarizing 1 Failure:
  [FAIL] Podman run with --sig-proxy [It] signals are not forwarded to container with sig-proxy false
  /root/podman-upstream/test/e2e/run_signal_test.go:115

Describe the results you received

This test will output an error.
"Command exited with status %d (expected %d)", matcher.ExitCode, matcher.ExpectedExitCode

Describe the results you expected

The test will end successfully if the exit status is 134 (128+6 SIGABRT).

podman info output

podman info

host:
  arch: amd64
  buildahVersion: 1.37.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 85.37
    systemPercent: 4.03
    userPercent: 10.6
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: workstation
    version: "40" 
  eventLogger: journald
  freeLocks: 2039
  hostname: fedora
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.10.11-200.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 719458304
  memTotal: 4098375680
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-2.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240906.g6b38f07-1.fc40.x86_64
    version: |
      pasta 0^20240906.g6b38f07-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: "" 
    package: "" 
    version: "" 
  swapFree: 3194482688
  swapTotal: 4097830912
  uptime: 1h 6m 16.00s (Approximately 0.04 days)
  variant: "" 
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 20397948928
  graphRootUsed: 5803491328
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false" 
    Supports d_type: "true" 
    Supports shifting: "true" 
    Supports volatile: "true" 
    Using metacopy: "true" 
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.3
  Built: 1727136000
  BuiltTime: Mon Sep 23 20:00:00 2024
  GitCommit: "" 
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.3

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Additional environment details

Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting

@hiramotom hiramotom added the kind/bug Categorizes issue or PR as related to a bug. label Oct 9, 2024
@Luap99
Copy link
Member

Luap99 commented Oct 9, 2024

The test is sending SIGFPE and that is causing golang to catch it and print SIGFPE: floating-point exception + stack trace and then exits 2. This works fine in all our CI systems and for me locally on f40.

There is nothing wrong with ExitWithError is simply check the exit code is what we expect, sending SIGABRT will also cause the podman process to exit 2 as this is also catched by golang.

So if you get anything else something seems to be wrong in your environment.

@hiramotom
Copy link
Author

Is there a way to identify issues in the execution environment?

For example, after creating a new F40 environment and updating it, I tried running the following command.
# podman run --name fedora-test --sig-proxy=false fedora:40 bash -c "trap 'echo Received' SIGFPE; while :; do sleep 0.25; done"

This creates a container that sends a trap message saying 'echo Received' when SIGFPE occurs.
Then, I triggered SIGFPE as follows:

# ps aux | grep fedora-test
root     1314618  3.7  1.6 1941592 67420 pts/1   Sl+  22:07   0:03 podman run --name fedora-test --sig-proxy=false fedora:40 bash -c trap 'echo Received' SIGFPE; while :; do sleep 0.25; done
# kill -s SIGFPE 1314618

The output in this new F40 environment is as follows.
It showed "SIGFPE: Floating Point Exception + Stack Trace," but the result of "echo $?" was 134.
It seems that the exit status caught in this test is 134 (128+6 SIGABRT).

https://gist.github.com/hiramotom/fa71edc0b9cbd855a8dc3eb77330bf27

@Luap99
Copy link
Member

Luap99 commented Oct 15, 2024

That is not what I am seeing when building locally and all our CI tests on different distros pass too.

However I can confirm that something in the fedora distro build changed, if I use the packaged version then it is behaving differently for some reason.

This can be easily tested with

$ podman events &
$ kill -SIGFPE $!
...
[1]+  Aborted                 (core dumped) podman events

Compared to a local build via make podman

$ bin/podman events &
$ kill -SIGFPE $!
...
[1]+  Exit 2                  bin/podman events

So most likely this is something in the distro build options that caused this change and has nothing to do with podman. I don't think it matters much

If I strace it we clearly see that the go program send itself SIGABRT

getpid()                                = 58502
gettid()                                = 58502
tgkill(58502, 58502, SIGABRT)           = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=58502, si_uid=0} ---
+++ killed by SIGABRT (core dumped) +++

I have no idea why, on the locally build version I see the expected

exit_group(2)                           = ?
+++ exited with 2 +++

@Luap99
Copy link
Member

Luap99 commented Oct 15, 2024

Ok it seems to be caused by the rpm_crashtraceback build tag that is passed in package builds but I have no idea what the purpose of this tag is.

@Luap99
Copy link
Member

Luap99 commented Oct 15, 2024

FYI I asked the fedora and RHEl golang maintainers what they now about rpm_crashtraceback, the only thing that we know is that it forces GOTRACEBACK=crash which causes the SIGABRT exit as documented in https://pkg.go.dev/runtime
However the change was not really explained why this was done so they are looking into it, anyway as GOTRACEBACK=crash is enough to trigger the failure I fixed the test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants