Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k0s reset: failed to connect to /run/k0s/containerd.sock #5274

Open
4 tasks done
neopointer opened this issue Nov 18, 2024 · 3 comments
Open
4 tasks done

k0s reset: failed to connect to /run/k0s/containerd.sock #5274

neopointer opened this issue Nov 18, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@neopointer
Copy link

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1 (2023-07-03) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.31.1+k0s.1

Sysinfo

`k0s sysinfo`
Total memory: 23.5 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 1.1 TiB (pass)
Relative disk space available for /var/lib/k0s: 94% (pass)
Name resolution: localhost: [127.0.0.1 ::1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-10-amd64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

Reset never works smoothly for me. So I would say I most likely faced different issues so far, but this one is the more commend one

Why does reset need containerd running anyway? you only do reset after stop, so wouldn't it be expected that containerd is offline when you run reset?

This issue could be related to #4211 and #4434.

Steps to reproduce

  1. install/start k0s
  2. run k0s stop
  3. run k0s reset

However, I'm sure this is not reproducible every time.
I haven't encountered a way to reproduce the issue reliably.

Expected behavior

No errors during reset.

Actual behavior

Errors during reset, sometimes it hangs for a very long time until it finally exits.

Screenshots and logs

W1118 11:24:12.320036   10086 logging.go:55] [core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:12.518752   10086 logging.go:55] [core] [Channel #3 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:12.779778   10086 logging.go:55] [core] [Channel #5 SubChannel #6]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:13.197238   10086 logging.go:55] [core] [Channel #7 SubChannel #8]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:14.078510   10086 logging.go:55] [core] [Channel #9 SubChannel #10]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:15.768146   10086 logging.go:55] [core] [Channel #11 SubChannel #12]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:19.023620   10086 logging.go:55] [core] [Channel #13 SubChannel #14]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:25.513591   10086 logging.go:55] [core] [Channel #15 SubChannel #16]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:24:38.331021   10086 logging.go:55] [core] [Channel #17 SubChannel #18]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
W1118 11:25:03.959317   10086 logging.go:55] [core] [Channel #19 SubChannel #20]grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
WARN[2024-11-18 11:25:14] To ensure a full reset, a node reboot is recommended.

Additional context

No response

@neopointer neopointer added the bug Something isn't working label Nov 18, 2024
@twz123
Copy link
Member

twz123 commented Nov 25, 2024

Why does reset need containerd running anyway?

Because kubelet / containerd don't stop running pods / containers when they get stopped. The kubelet doesn't have any APIs to stop all pods, so k0s reset relies on containerd to stop containers instead. Therefore, containerd must be up.

you only do reset after stop, so wouldn't it be expected that containerd is offline when you run reset?

In fact, k0s reset spawns its own containerd instance just for the purpose of shutting down all containers.

I suppose the containerd process that's being spawned fails to start for some reason, hence k0s cannot connect to it.

@neopointer
Copy link
Author

@twz123 where can I find logs or debug information about this? so that the next time it happens maybe I can collect more info and share here?

@twz123
Copy link
Member

twz123 commented Nov 29, 2024

where can I find logs or debug information about this?

You can run k0s reset --debug, which will be more verbose on standard out. But I fear that this won't include any containerd logs, as k0s reset sends containerd's output to /dev/null 🙈. (I have some draft PR floating around that'll change that, but for now, no containerd logs for us).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants