AKS: Not all services are starting #198

Open
greg-bucko opened this issue Nov 9, 2021 · 0 comments

greg-bucko commented Nov 9, 2021

The README.md for setup_f5_aks.sh states: "The cluster will have three Standard_D4_v3 nodes which have 4 CPU cores and 16 GB of memory." However, that does not seem to be enough for all services to start.
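
For what it's worth, the node pool can also be sized up after the fact rather than re-running the script; the sketch below is only an illustration with placeholder resource group and cluster names, adding a second pool of larger Standard_D8s_v3 nodes (8 vCPUs / 32 GB):

```bash
# Placeholder names; substitute whatever setup_f5_aks.sh created for you.
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name <cluster-name> \
  --name largepool \
  --node-count 3 \
  --node-vm-size Standard_D8s_v3
```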

kubectl get pods
NAME READY STATUS RESTARTS AGE
f5-admin-ui-6896d75965-g4mqp 1/1 Running 0 6d17h
f5-ambassador-57c8d798d-cjzwc 0/1 CrashLoopBackOff 1869 6d17h
f5-api-gateway-df9b65dc6-dfsd5 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-argo-ui-7c87f99b64-vmx2s 1/1 Running 0 6d17h
f5-auth-ui-7765cd5488-h5wx8 1/1 Running 0 6d17h
f5-classic-rest-service-0 0/1 Init:0/3 1361 6d17h
f5-connectors-68f64c488f-m6mts 0/1 Pending 0 6d17h
f5-connectors-backend-69d877b594-k6tb5 0/1 Pending 0 6d17h
f5-devops-ui-86bc48f54-2c65h 1/1 Running 0 6d17h
f5-fusion-admin-75794787c9-pn294 0/1 Pending 0 6d17h
f5-fusion-indexing-8479f45ffc-bmqkj 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-fusion-log-forwarder-9c768c45-tg4m9 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-insights-5ff56c5d-95vcd 1/1 Running 0 6d17h
f5-job-launcher-6f7896dc-59g8m 0/1 CrashLoopBackOff 2220 6d17h
f5-job-rest-server-58994d99dd-6v64z 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-ml-model-service-7448f97bf6-s9m6s 0/1 Init:CrashLoopBackOff 1357 6d17h
f5-monitoring-grafana-6647cddd56-m45cl 1/1 Running 0 6d17h
f5-monitoring-prometheus-kube-state-metrics-647cd65579-qc8kc 1/1 Running 0 6d17h
f5-monitoring-prometheus-pushgateway-5dd445ff4f-pccht 1/1 Running 0 6d17h
f5-monitoring-prometheus-server-0 2/2 Running 0 6d17h
f5-mysql-5666f7474f-xz7cs 1/1 Running 0 6d17h
f5-pm-ui-5d4cb9f8f6-xbsr8 1/1 Running 0 6d17h
f5-pulsar-bookkeeper-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-pulsar-broker-0 0/1 Init:0/4 0 6d17h
f5-pulsar-broker-1 0/1 Init:0/4 0 6d17h
f5-query-pipeline-6c4ff48788-8rw6c 0/1 Pending 0 6d17h
f5-rules-ui-5fd49b5974-smq4k 1/1 Running 0 6d17h
f5-solr-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-solr-exporter-778cfc8566-fqtg8 0/1 Init:0/1 0 6d17h
f5-templating-567f74c8c4-d8skj 0/1 Pending 0 6d17h
f5-tikaserver-6bbd4dd778-59hw8 1/1 Running 0 6d17h
f5-webapps-c5cb654cc-njjcs 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-workflow-controller-7bc469557b-l2dml 1/1 Running 0 6d17h
f5-zookeeper-0 1/1 Running 0 6d17h
f5-zookeeper-1 0/1 Pending 0 6d17h
milvus-writable-64bc9f8b75-hdfsw 1/1 Running 0 6d17h
seldon-controller-manager-85cc4458dc-w9zmw 1/1 Running 2 6d17h
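
In case it is useful, a quick way to check whether the Pending pods are blocked on node capacity rather than failing on their own (nothing cluster-specific assumed here):

```bash
# How much CPU/memory the scheduler still considers allocatable on each node.
kubectl describe nodes | grep -A 7 "Allocated resources"

# List only the pods that never got scheduled.
kubectl get pods --field-selector=status.phase=Pending
```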

I believe most containers are in CrashLoopBackOff because they cannot verify a connection to ZooKeeper.

kubectl describe pod f5-api-gateway-df9b65dc6-dfsd5
Name: f5-api-gateway-df9b65dc6-dfsd5
Namespace: default
Priority: 0
Node: aks-agentpool-20404971-vmss000000/10.240.0.4
Start Time: Wed, 03 Nov 2021 01:26:45 +0000
Labels: app.kubernetes.io/component=api-gateway
app.kubernetes.io/instance=f5
app.kubernetes.io/part-of=fusion
pod-template-hash=df9b65dc6
Annotations: prometheus.io/path: /actuator/prometheus
prometheus.io/port: 6764
prometheus.io/scrape: true
Status: Pending
IP: 10.244.0.18
IPs:
IP: 10.244.0.18
Controlled By: ReplicaSet/f5-api-gateway-df9b65dc6
Init Containers:
check-zk:
Container ID: containerd://764fad878747462caeb8147f618c8613ef9e1be76d446a31a61c452f8630056e
Image: lucidworks/check-fusion-dependency:v1.2.0
Image ID: docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
Port:
Host Port:
Args:
zookeeper
State: Waiting
Reason: CrashLoopBackOff
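
The check-zk init container only waits on ZooKeeper (its sole argument is zookeeper), so its logs should confirm that this is what it is stuck on, e.g.:

```bash
# Output of the failing init container on the api-gateway pod shown above.
kubectl logs f5-api-gateway-df9b65dc6-dfsd5 -c check-zk
```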

Events for kubectl describe pod f5-zookeeper-1:

Events:
Type Reason Age From Message
---- ------ --- ---- -------
Warning FailedScheduling 2m41s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 91m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 79m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 77m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 67m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 66m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 55m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 52m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 42m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636480742}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 41m default-scheduler 0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {node.cloudprovider.kubernetes.io/shutdown: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 39m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 29m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 28m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 17m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m47s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m36s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Normal NotTriggerScaleUp 2m42s (x29319 over 3d23h) cluster-autoscaler pod didn't trigger scale-up: 1 max node group size reached
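
Reading those events, f5-zookeeper-1 seems to be blocked by a combination of the per-node data disk attach limit ("exceed max volume count"), an affinity rule that excludes the other nodes, and the cluster autoscaler having hit its maximum node group size. A sketch of what I would check and adjust, assuming the StatefulSet is named f5-zookeeper and using the agentpool name visible in the node names above (resource group and cluster names are placeholders):

```bash
# Look at whatever affinity rules keep f5-zookeeper-1 off the remaining nodes (if any).
kubectl get statefulset f5-zookeeper -o yaml | grep -A 15 affinity

# Raise the autoscaler ceiling so another node can actually be added.
az aks nodepool update \
  --resource-group <resource-group> \
  --cluster-name <cluster-name> \
  --name agentpool \
  --update-cluster-autoscaler \
  --min-count 3 \
  --max-count 6
```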

Could you let me know what resources are needed to have all services up and running?

Thanks,
Greg

greg-bucko changed the title from "setup_f5_aks.sh - not all services are starting" to "AKS: Not all services are starting" on Nov 9, 2021