"No route to host" after restarting container #1129

Open
jorgeml opened this issue Nov 19, 2024 · 4 comments

jorgeml commented Nov 19, 2024

I'm running Prometheus in a container defined via a Quadlet:

# cat /etc/containers/systemd/prometheus.container 
[Unit]
Description=Podman container-prometheus.service
Wants=network-online.target
After=traefik.service

[Service]
Restart=always
TimeoutStartSec=900

[Container]
Image=quay.io/prometheus/prometheus
AutoUpdate=registry
HostName=prometheus
Network=dns.network
PublishPort=9090:9090
PublishPort=[::]:9090:9090
Volume=/etc/prometheus:/etc/prometheus:Z
Volume=prometheus-data:/prometheus

[Install]
WantedBy=multi-user.target default.target

If I restart the container, I lose connectivity to it:

# curl -v localhost:9090/metrics
* Host localhost:9090 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:9090...
*   Trying 127.0.0.1:9090...
* connect to ::1 port 9090 from ::1 port 42548 failed: No route to host
* connect to 127.0.0.1 port 9090 from 127.0.0.1 port 52496 failed: No route to host
* Failed to connect to localhost port 9090 after 14456 ms: Could not connect to server
* closing connection #0
curl: (7) Failed to connect to localhost port 9090 after 14456 ms: Could not connect to server

# curl -v localhost:9090/metrics
* Host localhost:9090 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:9090...
*   Trying 127.0.0.1:9090...
* connect to ::1 port 9090 from ::1 port 34122 failed: No route to host
* connect to 127.0.0.1 port 9090 from 127.0.0.1 port 50052 failed: No route to host
* Failed to connect to localhost port 9090 after 3264 ms: Could not connect to server
* closing connection #0
curl: (7) Failed to connect to localhost port 9090 after 3264 ms: Could not connect to server

From what I can see, Podman isn't able to clean up the nft rules when restarting:

# podman restart systemd-prometheus 
internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

internal:0:0-0: Error: Could not process rule: No such file or directory

ERRO[0000] Unable to clean up network for container c7ba876dddf8379fd801e685bd6c952a587109613854e8e94878581218f6696d: "netavark: nftables error: nft did not return successfully while applying ruleset" 

And that leaves stale entries in the nftables ruleset:

# grep 9090 nft_ruleset_good.txt 
		tcp dport 9090 accept
		tcp dport 9090 jump nv_fa15b266_10_89_0_0_nm24_dnat
		tcp dport 9090 jump nv_fa15b266_fd34-5749-e624--_nm64_dnat
		tcp dport 9090 jump nv_fa15b266_fd34-5749-e624--_nm64_dnat
		ip saddr 10.89.0.0/24 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		ip saddr 127.0.0.1 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip to 10.89.0.8:9090
		ip6 saddr fd34:5749:e624::/64 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip6 to [fd34:5749:e624::8]:9090
		ip6 saddr fd34:5749:e624::/64 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip6 to [fd34:5749:e624::8]:9090

# grep 9090 nft_ruleset_bad.txt 
		tcp dport 9090 accept
		tcp dport 9090 jump nv_fa15b266_10_89_0_0_nm24_dnat
		tcp dport 9090 jump nv_fa15b266_fd34-5749-e624--_nm64_dnat
		tcp dport 9090 jump nv_fa15b266_fd34-5749-e624--_nm64_dnat
		tcp dport 9090 jump nv_fa15b266_10_89_0_0_nm24_dnat
		tcp dport 9090 jump nv_fa15b266_fd34-5749-e624--_nm64_dnat
		tcp dport 9090 jump nv_fa15b266_fd34-5749-e624--_nm64_dnat
		ip saddr 10.89.0.0/24 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		ip saddr 127.0.0.1 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip to 10.89.0.8:9090
		ip saddr 10.89.0.0/24 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		ip saddr 127.0.0.1 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip to 10.89.0.12:9090
		ip6 saddr fd34:5749:e624::/64 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip6 to [fd34:5749:e624::8]:9090
		ip6 saddr fd34:5749:e624::/64 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip6 to [fd34:5749:e624::8]:9090
		ip6 saddr fd34:5749:e624::/64 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip6 to [fd34:5749:e624::c]:9090
		ip6 saddr fd34:5749:e624::/64 tcp dport 9090 jump NETAVARK-HOSTPORT-SETMARK
		tcp dport 9090 dnat ip6 to [fd34:5749:e624::c]:9090
# podman info 
host:
  arch: arm64
  buildahVersion: 1.37.5
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-3.fc41.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 88.3
    systemPercent: 5.05
    userPercent: 6.65
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: iot
    version: "41"
  eventLogger: journald
  freeLocks: 2028
  hostname: aldebaran
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.11.7-300.fc41.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 158474240
  memTotal: 8197677056
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc41.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.0-1.fc41.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.0
  ociRuntime:
    name: crun
    package: crun-1.18.1-1.fc41.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.18.1
      commit: c41f034fdbb9742c395085fc98459c94ad1f9aae
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241030.gee7d0b6-1.fc41.aarch64
    version: |
      pasta 0^20241030.gee7d0b6-1.fc41.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.fc41.aarch64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 8196190208
  swapTotal: 8196714496
  uptime: 0h 11m 46.00s
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 13
    paused: 0
    running: 10
    stopped: 3
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 123371671552
  graphRootUsed: 31992442880
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 16
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.5
  Built: 1729209600
  BuiltTime: Fri Oct 18 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.23.2
  Os: linux
  OsArch: linux/arm64
  Version: 5.2.5

jorgeml commented Nov 19, 2024

Possibly the same issue as containers/podman#23404

jorgeml commented Nov 19, 2024

strace.txt

Uploading a strace. As I struggled to get the output into a file, this was captured after a second container restart.

Luap99 commented Nov 19, 2024

PublishPort=9090:9090
PublishPort=[::]:9090:9090

The second line should not be needed; PublishPort=9090:9090 already forwards both IPv4 and IPv6 when the network has both v4 and v6 addresses.
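
For reference, a sketch of the simplified [Container] section — identical to the reporter's quadlet except for the single publish line:

[Container]
Image=quay.io/prometheus/prometheus
AutoUpdate=registry
HostName=prometheus
Network=dns.network
PublishPort=9090:9090
Volume=/etc/prometheus:/etc/prometheus:Z
Volume=prometheus-data:/prometheus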

Regardless, this still looks like a bug. I would have assumed that #1075 matched things correctly, but clearly there are still some differences in how the host IP is matched.

@Luap99
Copy link
Member

Luap99 commented Nov 19, 2024

@mheon I did debug this, but it is not clear to me how best to fix it.

The issue is that we add the same delete rule twice in the JSON we send to nft, which causes the second delete to fail with ENOENT.
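
A minimal sketch of what that duplicated delete could look like in libnftables JSON (delete-by-rule requires a handle; the family, table/chain names, and handle value here are illustrative, based on the ruleset above):

{"nftables": [
  {"delete": {"rule": {"family": "inet", "table": "netavark",
                       "chain": "NETAVARK-HOSTPORT-DNAT", "handle": 42}}},
  {"delete": {"rule": {"family": "inet", "table": "netavark",
                       "chain": "NETAVARK-HOSTPORT-DNAT", "handle": 42}}}
]}

The first delete succeeds and invalidates the handle; the second fails with ENOENT, which nft reports as "Error: Could not process rule: No such file or directory".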

What nft is complaining about are the rules in NETAVARK-HOSTPORT-DNAT, such as tcp dport 8080 jump nv_2f259bab_10_88_0_0_nm16_dnat, which get added twice for the two different host IPs because the rule itself does not contain the host IP. The teardown logic then matches both rules for both ports, so we end up with four delete commands.
But here is the thing: one might think we could just fix teardown to not delete the same thing twice, but that would be incorrect as well. In particular, think of two containers forwarding the same port on different host IPs. Once one of the containers stops, we delete the rule from NETAVARK-HOSTPORT-DNAT, breaking the port forward for the other container.
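
A sketch of that problematic chain state, assuming two containers on the same network both publishing port 8080 on different host IPs (the layout follows the grep output above; the per-container comments are mine):

table inet netavark {
	chain NETAVARK-HOSTPORT-DNAT {
		# added for container A (published on host IP 1)
		tcp dport 8080 jump nv_2f259bab_10_88_0_0_nm16_dnat
		# added for container B (published on host IP 2) -- byte-identical to A's rule
		tcp dport 8080 jump nv_2f259bab_10_88_0_0_nm16_dnat
	}
}

Since the two rules are indistinguishable, teardown for either container matches both, and deleting both breaks the forward that should survive.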

As such, I don't see a proper fix without breaking our rule setup, which then might cause other issues for upgrades. My best bet would be to drop the tcp dport <port> part from the rule and keep only the jump <network chain> part. That means each packet would have to walk all network chains, not just those where the dport matches, but then it should at least work consistently. The jump rule would then be added as part of network setup and only removed on full network teardown, like some of the other rules.
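
A sketch of the before/after shape of the NETAVARK-HOSTPORT-DNAT entries under that proposal (rule names taken from the ruleset above; this is one reading of the suggestion, not an implemented design):

# current: per-port jump, duplicated per host IP, deleted per container
tcp dport 9090 jump nv_fa15b266_10_89_0_0_nm24_dnat

# proposed: unconditional jump, added at network setup, removed at full network teardown
jump nv_fa15b266_10_89_0_0_nm24_dnat

With the unconditional jump, every packet traverses each network's dnat chain and the port matching happens only inside it, so container start/stop never has to touch the shared chain.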
