Fetch pause container image from ECR before starting kubelet #382

Merged: etungsten merged 2 commits into develop from sandbox-image-setting on Oct 16, 2019

@etungsten etungsten commented Oct 8, 2019

Issue #, if available: Fixes #351

Description of changes:
Adds a new sandbox_image setting that sets the pause container image for the containerd CRI plugin.
Converts /etc/containerd/config.toml to a template.
Updates pluto to match the renamed pause container setting.

Updated host-ctr with new command line options:

  • -pull-image-only to only pull and unpack the image specified by -source
  • -containerd-socket to specify the containerd socket
  • -namespace to specify the containerd namespace
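For readers unfamiliar with how such options are usually wired up in Go, here is a minimal sketch using the standard `flag` package. It is not the actual host-ctr code: the `options` struct, the `parseArgs` helper, and both default values are assumptions made for illustration, and the socket flag is spelled `-containerd-socket` as it appears in the ExecStart line of the test output.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options is a hypothetical container for the command line settings.
type options struct {
	source           string
	pullImageOnly    bool
	containerdSocket string
	namespace        string
}

// parseArgs parses the given arguments into options.
// The default values below are assumptions, not host-ctr's real defaults.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("host-ctr", flag.ContinueOnError)
	fs.StringVar(&o.source, "source", "", "image to pull")
	fs.BoolVar(&o.pullImageOnly, "pull-image-only", false, "only pull and unpack the image")
	fs.StringVar(&o.containerdSocket, "containerd-socket", "/run/containerd/containerd.sock", "path to the containerd socket")
	fs.StringVar(&o.namespace, "namespace", "default", "containerd namespace")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.source == "" {
		return options{}, errors.New("-source is required")
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Parsing through a `flag.FlagSet` instead of the package-level flags keeps the parsing logic callable from unit tests with an arbitrary argument slice.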

Adds a new oneshot service, pause-ctr-image-fetcher, that pulls the pause container image from ECR into the k8s.io containerd namespace.

Testing:
Launched Thar instance and the sandbox_image option is set successfully:

bash-5.0# cat /etc/containerd/config.toml        
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
disabled_plugins = [
    "io.containerd.snapshotter.v1.aufs",
    "io.containerd.snapshotter.v1.zfs",
    "io.containerd.snapshotter.v1.devmapper",
]

[grpc]
address = "/run/containerd/containerd.sock"

[plugins."io.containerd.grpc.v1.cri"]
# Pause container image is specified here, shares the same image as kubelet's pod-infra-container-image
sandbox_image = "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

[plugins."io.containerd.internal.v1.opt"]
path = "/opt/containerd"

pause-ctr-image-fetcher runs successfully:

bash-5.0# systemctl status pause-ctr-image-fetcher
● pause-ctr-image-fetcher.service - Fetches pause container image from ECR before kubelet starts
   Loaded: loaded (/x86_64-thar-linux-gnu/sys-root/usr/lib/systemd/system/pause-ctr-image-fetcher.service; enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2019-10-15 00:22:07 UTC; 6min ago
  Process: 2654 ExecStart=/usr/bin/host-ctr -source ${POD_INFRA_CONTAINER_IMAGE} -pull-image-only -containerd-socket /run/containerd/containerd.sock -namespace k8s.io (code=exited, status=0/SUCCESS)
 Main PID: 2654 (code=exited, status=0/SUCCESS)

Oct 15 00:22:07 ip-192-168-40-77.us-west-2.compute.internal systemd[1]: Starting Fetches pause container image from ECR before kubelet starts...
Oct 15 00:22:07 ip-192-168-40-77.us-west-2.compute.internal host-ctr[2654]: time="2019-10-15T00:22:07Z" level=info msg="Pulling from Amazon ECR" ref="ecr.aws/arn:aws:ecr:us-west-2:602401143452:repository/eks/pause-amd64:3.1"
Oct 15 00:22:07 ip-192-168-40-77.us-west-2.compute.internal host-ctr[2654]: time="2019-10-15T00:22:07Z" level=info msg="Pulled successfully" img="ecr.aws/arn:aws:ecr:us-west-2:602401143452:repository/eks/pause-amd64:3.1"
Oct 15 00:22:07 ip-192-168-40-77.us-west-2.compute.internal host-ctr[2654]: time="2019-10-15T00:22:07Z" level=info msg=Unpacking... img="ecr.aws/arn:aws:ecr:us-west-2:602401143452:repository/eks/pause-amd64:3.1"
Oct 15 00:22:07 ip-192-168-40-77.us-west-2.compute.internal host-ctr[2654]: time="2019-10-15T00:22:07Z" level=info msg="Tagging image" image name="602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1"
Oct 15 00:22:07 ip-192-168-40-77.us-west-2.compute.internal systemd[1]: Started Fetches pause container image from ECR before kubelet starts.

kubelet also starts successfully:

bash-5.0# systemctl status kubelet
● kubelet.service - Kubelet
   Loaded: loaded (/x86_64-thar-linux-gnu/sys-root/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-10-15 00:22:07 UTC; 5min ago
     Docs: https://github.com/kubernetes/kubernetes
  Process: 2787 ExecStartPre=/sbin/iptables -P FORWARD ACCEPT (code=exited, status=0/SUCCESS)
 Main PID: 2818 (kubelet)
    Tasks: 16
   Memory: 128.9M
      CPU: 4.261s
   CGroup: /system.slice/kubelet.service
           └─2818 /usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/config --kubeconfig /etc/kubernetes/kubelet/kubeconfig --container-runt

And I see the worker node registered and pods are scheduled on it successfully:

$ kubectl describe node ip-192-168-52-231.us-west-2.compute.internal
Name:               ip-192-168-52-231.us-west-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-west-2
                    failure-domain.beta.kubernetes.io/zone=us-west-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-192-168-52-231.us-west-2.compute.internal
                    kubernetes.io/os=linux
                    testLabel=foo
                    testLabel2=bar
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 16 Oct 2019 13:05:05 -0700
Taints:             dedicated=experimental:PreferNoSchedule
                    special=true:PreferNoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 16 Oct 2019 13:08:35 -0700   Wed, 16 Oct 2019 13:05:04 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 16 Oct 2019 13:08:35 -0700   Wed, 16 Oct 2019 13:05:04 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 16 Oct 2019 13:08:35 -0700   Wed, 16 Oct 2019 13:05:04 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 16 Oct 2019 13:08:35 -0700   Wed, 16 Oct 2019 13:05:25 -0700   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.52.231
  ExternalIP:   52.13.23.132
  InternalDNS:  ip-192-168-52-231.us-west-2.compute.internal
  ExternalDNS:  ec2-52-13-23-132.us-west-2.compute.amazonaws.com
  Hostname:     ip-192-168-52-231.us-west-2.compute.internal
Capacity:
 attachable-volumes-aws-ebs:  25
 cpu:                         2
 ephemeral-storage:           20624592Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      3792268Ki
 pods:                        110
Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         2
 ephemeral-storage:           19007623956
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      3689868Ki
 pods:                        110
System Info:
 Machine ID:                 ec2ae5856748d54644a0d989642e343b
 System UUID:                ec2ae585-6748-d546-44a0-d989642e343b
 Boot ID:                    34552b04-35df-4d5f-84b3-03869b381d15
 Kernel Version:             4.19.72
 OS Image:                   Thar, The Operating System
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  containerd://1.3.0+unknown
 Kubelet Version:            v1.14.6
 Kube-Proxy Version:         v1.14.6
ProviderID:                  aws:///us-west-2a/i-00ecf5a29e918e1bf
Non-terminated Pods:         (3 in total)
  Namespace                  Name                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                ------------  ----------  ---------------  -------------  ---
  default                    fluentbit-6pn2c     500m (25%)    0 (0%)      100Mi (2%)       500Mi (13%)    3m37s
  kube-system                aws-node-9947p      10m (0%)      0 (0%)      0 (0%)           0 (0%)         3m57s
  kube-system                kube-proxy-8lljn    100m (5%)     0 (0%)      0 (0%)           0 (0%)         3m57s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests    Limits
  --------                    --------    ------
  cpu                         610m (30%)  0 (0%)
  memory                      100Mi (2%)  500Mi (13%)
  ephemeral-storage           0 (0%)      0 (0%)
  attachable-volumes-aws-ebs  0           0
Events:
  Type     Reason                   Age                    From                                                      Message
  ----     ------                   ----                   ----                                                      -------
  Normal   Starting                 3m58s                  kubelet, ip-192-168-52-231.us-west-2.compute.internal     Starting kubelet.
  Warning  InvalidDiskCapacity      3m58s                  kubelet, ip-192-168-52-231.us-west-2.compute.internal     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  3m58s (x2 over 3m58s)  kubelet, ip-192-168-52-231.us-west-2.compute.internal     Node ip-192-168-52-231.us-west-2.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    3m58s (x2 over 3m58s)  kubelet, ip-192-168-52-231.us-west-2.compute.internal     Node ip-192-168-52-231.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     3m58s (x2 over 3m58s)  kubelet, ip-192-168-52-231.us-west-2.compute.internal     Node ip-192-168-52-231.us-west-2.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  3m58s                  kubelet, ip-192-168-52-231.us-west-2.compute.internal     Updated Node Allocatable limit across pods
  Normal   Starting                 3m48s                  kube-proxy, ip-192-168-52-231.us-west-2.compute.internal  Starting kube-proxy.
  Normal   NodeReady                3m37s                  kubelet, ip-192-168-52-231.us-west-2.compute.internal     Node ip-192-168-52-231.us-west-2.compute.internal status is now: NodeReady

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@etungsten etungsten changed the title from "settings: replace pod-infra-container-image setting with new sandbox_image setting" to "WIP: Fetch pause container from ECR" on Oct 8, 2019
etungsten commented Oct 9, 2019

Updating the PR to fetch the image from ECR before kubelet starts, so that containerd can start pause containers without needing to authenticate to ECR (which it cannot do).

@etungsten

Rebase develop

@etungsten

Fixed the issue from my previous comment: I wasn't including the repository path along with the pause container image name when building the ECR-resolvable reference.
Added a new unit test covering image sources that have an additional repository path (e.g. /eks/image:tag).
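To make the repository-path fix concrete, here is a minimal sketch of the conversion, inferred only from the image name and the `ref=` value visible in the host-ctr logs in the PR description. It is not the host-ctr implementation: the `ecrResolverRef` name and the regular expression are hypothetical, and the `aws` partition is hard-coded as an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// ecrImage matches <account>.dkr.ecr.<region>.amazonaws.com/<repository path and tag>.
// The last group keeps everything after the registry host, so repository
// paths with extra segments such as "eks/pause-amd64:3.1" are preserved.
var ecrImage = regexp.MustCompile(`^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com/(.+)$`)

// ecrResolverRef converts an ECR image name into the "ecr.aws/arn:..." form
// seen in the logs. It returns false when the source is not an ECR image.
func ecrResolverRef(source string) (string, bool) {
	m := ecrImage.FindStringSubmatch(source)
	if m == nil {
		return "", false
	}
	account, region, repoAndTag := m[1], m[2], m[3]
	// Assumption: the standard "aws" partition.
	return fmt.Sprintf("ecr.aws/arn:aws:ecr:%s:%s:repository/%s", region, account, repoAndTag), true
}

func main() {
	ref, ok := ecrResolverRef("602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1")
	fmt.Println(ref, ok)
}
```

Capturing the whole remainder after the registry host, rather than only the final path segment, is what keeps the `eks/` prefix in the resulting reference.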

@etungsten

Adds a new -namespace command line option to host-ctr for specifying the containerd namespace.
After pulling the image, host-ctr also tags it with the original image name if the source was converted into an ECR resolver image reference.
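The reason for the extra tag is that the image is pulled under the converted reference, while sandbox_image in the containerd config names the original image. The toy model below illustrates that bookkeeping only. It does not use the containerd client API: `imageStore` and `pullAndTag` are hypothetical names, and the digest is a placeholder.

```go
package main

import "fmt"

// imageStore is a hypothetical in-memory stand-in for the image store of one
// containerd namespace: image name -> content digest.
type imageStore map[string]string

// pullAndTag records a pulled image under the reference it was pulled with
// and, if that reference differs from the name originally requested, records
// the same digest under the original name as well.
func (s imageStore) pullAndTag(original, pullRef, digest string) {
	s[pullRef] = digest
	if pullRef != original {
		s[original] = digest
	}
}

func main() {
	store := imageStore{}
	store.pullAndTag(
		"602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1",
		"ecr.aws/arn:aws:ecr:us-west-2:602401143452:repository/eks/pause-amd64:3.1",
		"sha256:placeholder", // placeholder digest, not a real one
	)
	// The name configured as sandbox_image can now be found locally.
	_, ok := store["602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1"]
	fmt.Println("original name present:", ok)
}
```

When the source is not rewritten, the pull reference and the original name are identical and no second entry is needed.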

@etungsten etungsten changed the title from "WIP: Fetch pause container from ECR" to "Fetch pause container image from ECR before starting kubelet" on Oct 15, 2019
@etungsten etungsten marked this pull request as ready for review October 15, 2019 00:33
@etungsten

Updates README with information about settings.kubernetes.sandbox-image

@etungsten etungsten force-pushed the sandbox-image-setting branch 2 times, most recently from db763f4 to 8041fb6 Compare October 15, 2019 21:52
@etungsten

1st force push: Rebase develop.
2nd force push: Addresses a subset of @jahkeup's comments.

@jahkeup jahkeup left a comment

Changes LGTM!

@etungsten

Ran go fmt and added comment per @jahkeup

@etungsten

Addresses @bcressey's comments.

  • Removed pause-ctr-image-fetcher.service
  • Added ExecStartPre= directive to pull the pause container image in kubelet.service
  • Removed settings.containerd.* and now use kubernetes.pod-infra-container-image to set the sandbox_image option in containerd/config
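As an illustration of how a single setting can feed the sandbox_image line of a templated config, here is a small sketch. It uses Go's `text/template` purely as a stand-in; the template engine and syntax that Thar actually uses are not shown in this PR conversation, and the `settings` struct and `render` helper are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// configTemplate is a fragment of a containerd config with the pause image
// left as a placeholder to be filled from settings.
const configTemplate = `[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "{{.PodInfraContainerImage}}"
`

// settings is a hypothetical view of the value behind
// kubernetes.pod-infra-container-image.
type settings struct {
	PodInfraContainerImage string
}

// render fills the template with the given settings.
func render(s settings) (string, error) {
	t, err := template.New("config.toml").Parse(configTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, s); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(settings{
		PodInfraContainerImage: "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause-amd64:3.1",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Driving both kubelet's flag and containerd's sandbox_image from one setting keeps the two from drifting apart.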

@etungsten

Rebase develop. Resolves conflicts in pluto/src/main.rs

@etungsten etungsten force-pushed the sandbox-image-setting branch 2 times, most recently from 12677ce to 2ceacb9 Compare October 16, 2019 22:00
@etungsten

Rebase develop, fixes conflict from #420

@bcressey bcressey left a comment

LGTM

Converts the containerd/config.toml to a template, and specifies
`sandbox-image` option in containerd

Renamed pod_infra_container_image to pause_container_image in `pluto`.
Adds new command line options for `host-ctr` to explicitly specify
containerd socket, namespace, and optionally to only pull the container image

`kubelet.service` will invoke host-ctr to fetch the pause container
image before actually starting kubelet.
@etungsten

etungsten commented Oct 16, 2019

Addresses @bcressey's comment.

Testing again, things still work!

@etungsten etungsten merged commit 2924141 into develop Oct 16, 2019
@etungsten etungsten deleted the sandbox-image-setting branch October 16, 2019 23:48
etungsten added a commit that referenced this pull request Oct 21, 2019
Fixes a bug introduced in #382 where `ref` isn't set when pulling images from
non-ECR registries
Development

Successfully merging this pull request may close these issues.

kubelet: pod-infra-container-image flag does not work with containerd
4 participants