Podman 5.2.x hangs on heavy data loads #23681

Closed
hackeryarn opened this issue Aug 20, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@hackeryarn

hackeryarn commented Aug 20, 2024

Issue Description

When running a heavy data load, in my case loading a sql file into a postgres database, podman completely hangs in the middle of the run, after running normally for a short time. There is no memory or CPU activity during the hang, and I cannot ctrl-c out of the process. I have to run podman stop ..., which does stop the container.

I tried this with multiple data files, across different projects, and different versions of postgres. Downgrading to 5.1.2 completely fixes the issue. Using 5.2.0 seems to avoid the issue on smaller workloads but still has issues with loads bigger than 3GB.

Steps to reproduce the issue

  1. Start a postgres container: podman run -dt --name test -e POSTGRES_PASSWORD='test' -p 5432:5432 docker.io/postgres:16
  2. Try to load a largish sql file. This happened to me on a 3GB file.
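
For anyone without a large dump at hand, step 2 could be approximated with generated data. This is a minimal sketch, not the data from the report: the table name `load_test`, its columns, and the row count are hypothetical, the connection URL assumes the postgres image defaults together with the password from step 1, and whether synthetic data triggers the same hang is untested.

```shell
#!/bin/sh
# Sketch: emit a synthetic SQL script to stdout.
# `load_test` and its column layout are hypothetical, not from the original report.
gen_sql() {
  rows=$1
  echo 'CREATE TABLE IF NOT EXISTS load_test (id integer, payload text);'
  echo 'COPY load_test (id, payload) FROM stdin;'
  # One tab-separated row per line, as COPY's text format expects.
  awk -v n="$rows" 'BEGIN { for (i = 1; i <= n; i++) printf "%d\tpayload-row-%d\n", i, i }'
  # A line containing only \. ends the COPY data.
  printf '%s\n' '\.'
}

# Usage (not run here): write a file, then load it through the published port.
#   gen_sql 50000000 > data.sql
#   psql postgresql://postgres:test@localhost:5432/postgres < data.sql
```

Raise the row count until the file reaches the size of interest; the report mentions trouble at around 3GB.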

Describe the results you received

The container hangs in the middle of loading the data. I cannot ctrl-c out of the container and there are no CPU or memory spikes when it hangs.

Describe the results you expected

I expect it to run to completion.

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /nix/store/yamfar96szwhs671arpkb66i0kmzsspl-podman-helper-binary-wrapper/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 95.91
    systemPercent: 0.99
    userPercent: 3.1
  cpus: 16
  databaseBackend: sqlite
  distribution:
    codename: vicuna
    distribution: nixos
    version: "24.11"
  eventLogger: journald
  freeLocks: 1942
  hostname: nixos
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 100
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.10.5-zen1
  linkmode: dynamic
  logDriver: journald
  memFree: 36275949568
  memTotal: 67268026368
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: Unknown
      path: /nix/store/c1hhd1cjvlgy4cqlg02vjnd3ndli0fy8-podman-5.2.1/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: Unknown
    path: /nix/store/c1hhd1cjvlgy4cqlg02vjnd3ndli0fy8-podman-5.2.1/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: Unknown
    path: /nix/store/yamfar96szwhs671arpkb66i0kmzsspl-podman-helper-binary-wrapper/bin/crun
    version: |-
      crun version 1.16
      commit: 1.16
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /nix/store/c1hhd1cjvlgy4cqlg02vjnd3ndli0fy8-podman-5.2.1/libexec/podman/pasta
    package: Unknown
    version: |
      pasta 2024_07_26.57a21d2
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /nix/store/c1hhd1cjvlgy4cqlg02vjnd3ndli0fy8-podman-5.2.1/libexec/podman/slirp4netns
    package: Unknown
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 9448923136
  swapTotal: 9448923136
  uptime: 10h 20m 23.00s (Approximately 0.42 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/artem/.config/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 2
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/artem/.local/share/containers/storage
  graphRootAllocated: 973518606336
  graphRootUsed: 283149082624
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /tmp/nix-shell.aWqziX
  imageStore:
    number: 31
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/artem/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.1
  Built: 315532800
  BuiltTime: Mon Dec 31 18:00:00 1979
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Additional information

Here is podman info for the version that runs the workload without issues:

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /nix/store/mbxcn95wf32cy267jg6f46fpfnxznl49-podman-helper-binary-wrapper/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 94.39
    systemPercent: 1.35
    userPercent: 4.27
  cpus: 16
  databaseBackend: sqlite
  distribution:
    codename: vicuna
    distribution: nixos
    version: "24.11"
  eventLogger: journald
  freeLocks: 2040
  hostname: nixos
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 100
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.10.5-zen1
  linkmode: dynamic
  logDriver: journald
  memFree: 5056778240
  memTotal: 67268026368
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: Unknown
      path: /nix/store/qj017a8klv62p3gb6wr9788sbx1jcx3i-podman-5.1.2/libexec/podman/aardvark-dns
      version: aardvark-dns 1.11.0
    package: Unknown
    path: /nix/store/qj017a8klv62p3gb6wr9788sbx1jcx3i-podman-5.1.2/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: Unknown
    path: /nix/store/mbxcn95wf32cy267jg6f46fpfnxznl49-podman-helper-binary-wrapper/bin/crun
    version: |-
      crun version 1.15
      commit: 1.15
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /nix/store/qj017a8klv62p3gb6wr9788sbx1jcx3i-podman-5.1.2/libexec/podman/pasta
    package: Unknown
    version: |
      pasta 2024_06_24.1ee2eca
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /nix/store/qj017a8klv62p3gb6wr9788sbx1jcx3i-podman-5.1.2/libexec/podman/slirp4netns
    package: Unknown
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 9445777408
  swapTotal: 9448923136
  uptime: 17h 15m 8.00s (Approximately 0.71 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/artem/.config/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 2
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/artem/.local/share/containers/storage
  graphRootAllocated: 973518606336
  graphRootUsed: 270555635712
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /tmp/nix-shell.h9Wbcy
  imageStore:
    number: 31
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/artem/.local/share/containers/storage/volumes
version:
  APIVersion: 5.1.2
  Built: 315532800
  BuiltTime: Mon Dec 31 18:00:00 1979
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.2
@hackeryarn hackeryarn added the kind/bug Categorizes issue or PR as related to a bug. label Aug 20, 2024
@hackeryarn hackeryarn changed the title Podman 5.2.1 hands on heavy data loads Podman 5.2.1 hangs on heavy data loads Aug 20, 2024
@hackeryarn hackeryarn changed the title Podman 5.2.1 hangs on heavy data loads Podman 5.2.x hangs on heavy data loads Aug 20, 2024
@baude
Member

baude commented Aug 22, 2024

do you have a 3GB data file you can point to?

@hackeryarn
Author

@baude unfortunately, I cannot provide a DB file since none of the projects that experience the problem are open source. This is still happening, however, even with the upgrade to 5.2.2. I am happy to run any type of additional diagnostics; I am just not sure where to go from here to provide more information.

@Luap99
Member

Luap99 commented Sep 6, 2024

What does load even mean here? Load how, piped into podman exec, file load from a volume or copied into the container first, etc...?

You can try kill -ABRT; that will give us a stack trace so we can see where it hangs.

If you know the downgrade fixes the issue then you can also compile from source and run git bisect to find the root cause commit which would help us a lot.
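
The bisect suggestion could look roughly like the following. This is a sketch under assumptions: the tag names `v5.1.2` and `v5.2.1` are inferred from the versions in the report, `make podman` and `./bin/podman` are assumed to be the build target and output path of the podman source tree, and `bisect_verdict` is a hypothetical helper that only maps the outcome of one step to the exit codes `git bisect run` understands (0 for good, 125 for skip, other codes from 1 to 127 for bad).

```shell
#!/bin/sh
# Hypothetical helper: translate one bisect step into a `git bisect run` exit code.
#   $1 = exit status of the build, $2 = exit status of the load test
bisect_verdict() {
  build_status=$1
  load_status=$2
  if [ "$build_status" -ne 0 ]; then
    return 125   # commit cannot be built: ask git bisect to skip it
  fi
  if [ "$load_status" -ne 0 ]; then
    return 1     # load failed or timed out: mark the commit bad
  fi
  return 0       # load completed: mark the commit good
}

# Manual flow (not run here), inside a clone of the podman repository:
#   git bisect start v5.2.1 v5.1.2     # <bad> <good>
#   make podman                        # assumed build target
#   ... start the container with ./bin/podman and repeat the load ...
#   git bisect good                    # or: git bisect bad
#   git bisect reset                   # when finished
```

Wrapping the load in coreutils `timeout` is one way to turn a hang into a non-zero status, so that a script can feed it to the helper instead of waiting forever.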

@hackeryarn
Author

Sorry for the late reply. I did some extra investigating and will provide more detail.

It seems like it's the connection between a local command and the podman container that stops accepting data.

What I mean by a heavy load is that this happens when I load data into the running container. To do this, I run psql locally like so:

psql postgresql://...:...@localhost/db <data.sql

This communicates to the container running via:

podman run -dt -e POSTGRES_PASSWORD='password' -p 5432:5432 docker.io/postgres:14

When I run this command, it loads the first few statements. Then, when it gets to a table with over 100k rows, it just hangs; I can't even Ctrl-C to stop it. I can still run podman stop to stop the container, and then the psql command also exits with an error.

I tried running kill -ABRT on the psql command, and all that gave me was:

(core dumped) psql "$pg_url" < data.sql

I also tried running podman kill --signal ABRT to see if I got any info from the container, but it didn't do anything.

I will try to git bisect to get to the root cause next.

@hackeryarn
Author

hackeryarn commented Sep 11, 2024

@Luap99 I did confirm that it's a connection issue. If I load the data from inside the container, using podman exec -i postgres sh -c "psql $pg_url" <"$sql_file", everything loads just fine.

@Luap99
Member

Luap99 commented Sep 16, 2024

> I tried running kill -ABRT on the psql command, and all that gave me was:

You have to kill the podman command, as Go provides a useful stack trace. However, since you mentioned that the transfer goes via the network, it is not a podman issue but rather a pasta issue. You should try using the most recent pasta version.

@hackeryarn
Author

Thanks for pointing me in the right direction! This was a pasta issue. After updating passt from 2024_08_21.1d6142f to 2024_09_06.6b38f07, everything works perfectly.
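
To check which passt build is installed against the ones named in this thread, something like the following might help. It is a sketch under assumptions: `pasta_build_date` and `pasta_at_least` are hypothetical helpers, and `pasta --version` is assumed to print the same text that `podman info` shows under `pasta.version`. The thread establishes only that 2024_08_21.1d6142f hung and 2024_09_06.6b38f07 worked; it does not say which passt change made the difference.

```shell
#!/bin/sh
# Print the YYYY_MM_DD stamp from `pasta --version`-style text on stdin.
pasta_build_date() {
  sed -n 's/^ *pasta \([0-9][0-9_]*\)\..*$/\1/p' | head -n 1
}

# Succeed if stamp $1 is the same as or newer than stamp $2.
# Zero-padded YYYY_MM_DD stamps order correctly as plain strings.
pasta_at_least() {
  LC_ALL=C expr "$1" \>= "$2" >/dev/null
}

# Usage (not run here):
#   stamp=$(pasta --version | pasta_build_date)
#   pasta_at_least "$stamp" 2024_09_06 && echo "at least as new as the build reported working"
```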

@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Dec 17, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Dec 17, 2024