AKS: Not all services are starting #198

Open
greg-bucko opened this issue Nov 9, 2021 · 0 comments

greg-bucko commented Nov 9, 2021

The README.md for setup_f5_aks.sh states: "The cluster will have three Standard_D4_v3 nodes which have 4 CPU cores and 16 GB of memory." However, that does not seem to be enough for all services to start.
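
For what it's worth, the node pool can also be sized up after the fact rather than re-running the script; the sketch below is only an illustration with placeholder resource group and cluster names, adding a second pool of larger Standard_D8s_v3 nodes (8 vCPUs / 32 GB):

```bash
# Placeholder names; substitute whatever setup_f5_aks.sh created for you.
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name <cluster-name> \
  --name largepool \
  --node-count 3 \
  --node-vm-size Standard_D8s_v3
```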

kubectl get pods
NAME READY STATUS RESTARTS AGE
f5-admin-ui-6896d75965-g4mqp 1/1 Running 0 6d17h
f5-ambassador-57c8d798d-cjzwc 0/1 CrashLoopBackOff 1869 6d17h
f5-api-gateway-df9b65dc6-dfsd5 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-argo-ui-7c87f99b64-vmx2s 1/1 Running 0 6d17h
f5-auth-ui-7765cd5488-h5wx8 1/1 Running 0 6d17h
f5-classic-rest-service-0 0/1 Init:0/3 1361 6d17h
f5-connectors-68f64c488f-m6mts 0/1 Pending 0 6d17h
f5-connectors-backend-69d877b594-k6tb5 0/1 Pending 0 6d17h
f5-devops-ui-86bc48f54-2c65h 1/1 Running 0 6d17h
f5-fusion-admin-75794787c9-pn294 0/1 Pending 0 6d17h
f5-fusion-indexing-8479f45ffc-bmqkj 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-fusion-log-forwarder-9c768c45-tg4m9 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-insights-5ff56c5d-95vcd 1/1 Running 0 6d17h
f5-job-launcher-6f7896dc-59g8m 0/1 CrashLoopBackOff 2220 6d17h
f5-job-rest-server-58994d99dd-6v64z 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-ml-model-service-7448f97bf6-s9m6s 0/1 Init:CrashLoopBackOff 1357 6d17h
f5-monitoring-grafana-6647cddd56-m45cl 1/1 Running 0 6d17h
f5-monitoring-prometheus-kube-state-metrics-647cd65579-qc8kc 1/1 Running 0 6d17h
f5-monitoring-prometheus-pushgateway-5dd445ff4f-pccht 1/1 Running 0 6d17h
f5-monitoring-prometheus-server-0 2/2 Running 0 6d17h
f5-mysql-5666f7474f-xz7cs 1/1 Running 0 6d17h
f5-pm-ui-5d4cb9f8f6-xbsr8 1/1 Running 0 6d17h
f5-pulsar-bookkeeper-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-pulsar-broker-0 0/1 Init:0/4 0 6d17h
f5-pulsar-broker-1 0/1 Init:0/4 0 6d17h
f5-query-pipeline-6c4ff48788-8rw6c 0/1 Pending 0 6d17h
f5-rules-ui-5fd49b5974-smq4k 1/1 Running 0 6d17h
f5-solr-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-solr-exporter-778cfc8566-fqtg8 0/1 Init:0/1 0 6d17h
f5-templating-567f74c8c4-d8skj 0/1 Pending 0 6d17h
f5-tikaserver-6bbd4dd778-59hw8 1/1 Running 0 6d17h
f5-webapps-c5cb654cc-njjcs 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-workflow-controller-7bc469557b-l2dml 1/1 Running 0 6d17h
f5-zookeeper-0 1/1 Running 0 6d17h
f5-zookeeper-1 0/1 Pending 0 6d17h
milvus-writable-64bc9f8b75-hdfsw 1/1 Running 0 6d17h
seldon-controller-manager-85cc4458dc-w9zmw 1/1 Running 2 6d17h
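
In case it is useful, a quick way to check whether the Pending pods are blocked on node capacity rather than failing on their own (nothing cluster-specific assumed here):

```bash
# How much CPU/memory the scheduler still considers allocatable on each node.
kubectl describe nodes | grep -A 7 "Allocated resources"

# List only the pods that never got scheduled.
kubectl get pods --field-selector=status.phase=Pending
```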

I believe most containers are in CrashLoopBackOff because they cannot verify a connection to ZooKeeper.

kubectl describe pod f5-api-gateway-df9b65dc6-dfsd5
Name: f5-api-gateway-df9b65dc6-dfsd5
Namespace: default
Priority: 0
Node: aks-agentpool-20404971-vmss000000/10.240.0.4
Start Time: Wed, 03 Nov 2021 01:26:45 +0000
Labels: app.kubernetes.io/component=api-gateway
app.kubernetes.io/instance=f5
app.kubernetes.io/part-of=fusion
pod-template-hash=df9b65dc6
Annotations: prometheus.io/path: /actuator/prometheus
prometheus.io/port: 6764
prometheus.io/scrape: true
Status: Pending
IP: 10.244.0.18
IPs:
IP: 10.244.0.18
Controlled By: ReplicaSet/f5-api-gateway-df9b65dc6
Init Containers:
check-zk:
Container ID: containerd://764fad878747462caeb8147f618c8613ef9e1be76d446a31a61c452f8630056e
Image: lucidworks/check-fusion-dependency:v1.2.0
Image ID: docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
Port:
Host Port:
Args:
zookeeper
State: Waiting
Reason: CrashLoopBackOff
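
The check-zk init container only waits on ZooKeeper (its sole argument is zookeeper), so its logs should confirm that this is what it is stuck on, e.g.:

```bash
# Output of the failing init container on the api-gateway pod shown above.
kubectl logs f5-api-gateway-df9b65dc6-dfsd5 -c check-zk
```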

Events for kubectl describe pod f5-zookeeper-1:

Events:
Type Reason Age From Message
---- ------ --- ---- -------
Warning FailedScheduling 2m41s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 91m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 79m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 77m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 67m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 66m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 55m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 52m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 42m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636480742}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 41m default-scheduler 0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {node.cloudprovider.kubernetes.io/shutdown: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 39m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 29m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 28m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 17m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m47s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m36s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Normal NotTriggerScaleUp 2m42s (x29319 over 3d23h) cluster-autoscaler pod didn't trigger scale-up: 1 max node group size reached
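
Reading those events, f5-zookeeper-1 seems to be blocked by a combination of the per-node data disk attach limit ("exceed max volume count"), an affinity rule that excludes the other nodes, and the cluster autoscaler having hit its maximum node group size. A sketch of what I would check and adjust, assuming the StatefulSet is named f5-zookeeper and using the agentpool name visible in the node names above (resource group and cluster names are placeholders):

```bash
# Look at whatever affinity rules keep f5-zookeeper-1 off the remaining nodes (if any).
kubectl get statefulset f5-zookeeper -o yaml | grep -A 15 affinity

# Raise the autoscaler ceiling so another node can actually be added.
az aks nodepool update \
  --resource-group <resource-group> \
  --cluster-name <cluster-name> \
  --name agentpool \
  --update-cluster-autoscaler \
  --min-count 3 \
  --max-count 6
```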

Could you let me know what resources are needed to have all services up and running?

Thanks,
Greg

greg-bucko changed the title from "setup_f5_aks.sh - not all services are starting" to "AKS: Not all services are starting" on Nov 9, 2021