Fluentd pod crashing on Azure Container Service #6
From @sbulman on August 5, 2017 7:48 A bit more info. I created the ACS cluster with 1 agent. The fluentd pod that is crashing is on the master node. The pod running on the agent appears to be working fine.
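One quick way to confirm which node each fluentd pod is running on is to list the pods with their node names. A minimal sketch; the `deis` namespace and the grep pattern are assumptions about a default Workflow install, not commands taken from this thread:

```sh
# Sketch: list fluentd pods together with the node each one is scheduled on.
# Namespace and pod name prefix are assumptions about a default install.
kubectl --namespace deis get pods -o wide | grep fluentd
```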
From @henrydecovo on September 25, 2017 19:44 We're facing the same issue, same symptoms and circumstances as @sbulman.
From @bacongobbler on September 25, 2017 19:50 There should not be a fluentd pod running on the master node. There was an open ticket about DaemonSet pods being accidentally scheduled on the Kubernetes master node that was eventually solved upstream. More background context is in that ticket, which was resolved in Kubernetes 1.5.0+ via kubernetes/kubernetes#35526.
From @henrydecovo on September 25, 2017 21:36 Ok, thanks @bacongobbler for the context. It still appears to be an issue on ACS today, though. Any thoughts are much appreciated! The fluentd logger pod events for the master node indicate the following error:
K8S versions (client and Azure Container Service):
Deis version 2.18.0. The fluentd pod is definitely running on the master node on ACS, as shown by the event logs; in this case it was created by: k8s-master-47933ef9-0
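The events referred to above can be viewed with a describe on the affected pod. A minimal sketch with a placeholder pod name, not one taken from this thread:

```sh
# Sketch: show events for the crashing pod, including the node it was
# assigned to. Pod name is a placeholder.
kubectl --namespace deis describe pod deis-logger-fluentd-xxxxx
```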
From @monaka on December 25, 2017 7:54 I also got the same issue on my K8s/CoreOS. In my case, it was fixed by adding an option; note that the unschedulable field of a node is not respected by the DaemonSet controller.
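The exact option is not preserved above, but one common way to keep ordinary pods off a master node is to taint it. A hedged sketch, reusing the node name from the event logs earlier in this thread; this is not a command quoted from @monaka:

```sh
# Sketch: taint the master so pods without a matching toleration are not
# scheduled there. The taint key and effect are one common convention,
# not necessarily the flag used in the comment above.
kubectl taint nodes k8s-master-47933ef9-0 node-role.kubernetes.io/master=:NoSchedule
```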
I have the master tainted with the same flag but am still facing the same issue. Is there any workaround for this?
@Jaipreet95: Have you tried adding a toleration on the fluentd DaemonSet so that it does not get scheduled on the master nodes? Something like the last spec field below:
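The spec snippet referenced above did not survive the copy. Below is a minimal sketch of one way to keep a DaemonSet off master nodes, using node affinity rather than tolerations; the names, labels, and image are placeholders, not the original manifest:

```yaml
# Sketch only: not the original snippet from this comment.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: deis-logger-fluentd
  namespace: deis
spec:
  selector:
    matchLabels:
      app: deis-logger-fluentd
  template:
    metadata:
      labels:
        app: deis-logger-fluentd
    spec:
      # Keep the pods off nodes carrying the master role label.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: DoesNotExist
      containers:
      - name: fluentd
        image: fluentd:latest   # placeholder image
```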
Is this a bug in fluentd? Assuming that the k8s API is at 10.0.0.1 (the first address in the podCIDR range) is not a reasonable assumption; should it instead connect via the local DNS name for the master node?
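For what it is worth, the API endpoint the logger uses can typically be overridden in the filter configuration. A hedged sketch, assuming the Workflow image uses fluent-plugin-kubernetes_metadata_filter; whether /opt/fluentd/conf/fluentd.conf exposes this block in the same shape is an assumption:

```
# Sketch: point the kubernetes metadata filter at the API server by its
# in-cluster DNS name instead of the 10.0.0.1 ClusterIP. Parameter names
# come from fluent-plugin-kubernetes_metadata_filter; the surrounding
# config layout is an assumption.
<filter kubernetes.**>
  @type kubernetes_metadata
  kubernetes_url https://kubernetes.default.svc:443
  verify_ssl true
  bearer_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
  ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
</filter>
```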
@PaulCharlton, we are currently not doing any testing on ACS and have no way to really verify this; we would have to deploy a k8s cluster and deploy Hephy Workflow on top. This sounds like an issue where the fluentd pods expect the k8s API to be at 10.0.0.1 on the cluster's internal CIDR range. If I were to guess from the information we have, it is most likely failing because of number 2.
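One way to check that guess is to test whether the service ClusterIP is reachable from inside the cluster. A minimal sketch with a placeholder pod name and image; to reproduce the failing case exactly, the check would need to land on the master node:

```sh
# Sketch: test whether 10.0.0.1:443 (the kube-apiserver ClusterIP) is
# reachable from a pod. Pod name and image are placeholders.
kubectl run api-check --rm -it --restart=Never --image=busybox -- \
  nc -w 5 -zv 10.0.0.1 443
```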
From @sbulman on August 5, 2017 7:27
Hi All,
I'm following the instructions to set up Deis on Azure Container Service. One of the deis-logger-fluentd pods is crashing with the following log.
2017-08-05 07:21:26 +0000 [info]: reading config file path="/opt/fluentd/conf/fluentd.conf"
2017-08-05 07:22:27 +0000 [error]: config error file="/opt/fluentd/conf/fluentd.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API v1 endpoint https://10.0.0.1:443: Timed out connecting to server"
Any ideas?
Thanks.
Copied from original issue: deis/workflow#847
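For later readers: the log above is what the container prints before it restarts. A minimal sketch of pulling it again after a crash, with a placeholder pod name and the namespace assumed to be `deis`:

```sh
# Sketch: fetch the log of the previous (crashed) container instance.
# Pod name is a placeholder.
kubectl --namespace deis logs deis-logger-fluentd-xxxxx --previous
```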