monasca agent running error #433

Open
zhangjianweibj opened this issue Nov 5, 2018 · 13 comments

@zhangjianweibj

[screenshot]

Kubernetes version: v1.9.5

@zhangjianweibj
Author

Hello, I ran these commands in a Kubernetes environment (1.9.5):
$ helm repo add monasca http://monasca.io/monasca-helm
$ helm install monasca/monasca --name monasca --namespace monitoring
I found that the agent and aggregator pods crash. Is anything wrong? Thanks.
[screenshot]
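
For reference, the pod status can be checked with something like (namespace from the install command above):

$ kubectl get pods -n monitoring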

@timothyb89
Member

It looks like RBAC isn't turned on. Have you set rbac.create to true in your Helm values (e.g. by adding --set rbac.create=true to the install/upgrade command)?
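
Something along these lines should work (release name and namespace taken from the commands above; adjust to your setup):

# enable RBAC resources on the existing release
$ helm upgrade monasca monasca/monasca --set rbac.create=true

# or pass the flag on a fresh install
$ helm install monasca/monasca --name monasca --namespace monitoring --set rbac.create=true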

@zhangjianweibj
Author

@timothyb89 Thanks very much. I reinstalled monasca with the RBAC value and it works well now, but the aggregator and cleanup pods still crash.
[screenshot]

@zhangjianweibj
Author

The aggregator pod shows no error log.
[screenshot]

@timothyb89
Member

The cleanup job will have trouble as it's probably still running with the old configuration that had no RBAC enabled - you can just delete the job (kubectl delete job monasca-cleanup-job-...) and any leftover pods manually. I'm not sure about the aggregator; did you check the previous log, e.g. kubectl logs -p monasca-aggregator-...?
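
For example (names are placeholders; substitute the real ones from kubectl get jobs,pods -n monitoring):

# delete the stale cleanup job and any pods it left behind
$ kubectl -n monitoring delete job monasca-cleanup-job-...
$ kubectl -n monitoring delete pod -l job-name=monasca-cleanup-job-...

# logs from the previous (crashed) aggregator container
$ kubectl -n monitoring logs -p monasca-aggregator-...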

The most likely cause for the aggregator crashing is that it received no metrics. That's normal for the first hour or so after a fresh deployment, but if it keeps crashing it might be a sign of Kafka issues or the agent pods failing to collect any metrics. Logs for both of those would be helpful if things continue to crash.
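
A quick way to gather those logs (the pod name prefixes here are a guess; take the actual names from kubectl get pods):

$ kubectl -n monitoring get pods
$ kubectl -n monitoring logs monasca-agent-...     # one of the agent pods
$ kubectl -n monitoring logs monasca-kafka-...     # the Kafka pod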

@zhangjianweibj
Author

@timothyb89 Thanks very much. I will reinstall monasca. I think the aggregator crashes because the pod resource limit and request are too low; you can see the aggregator pod has restarted 5 times, but no error logs appear.

@zhangjianweibj
Author

Why does the monasca-thresh pod restart 40 times in 4 hours?
thresh pod logs:
[screenshot]

[screenshot]

@zhangjianweibj
Author

[screenshot]

@timothyb89
Member

Hmm, I don't see any errors in those logs - are those the previous container logs (kubectl logs -p ...)?

It looks like thresh is running alright in that 2nd screenshot ("no left over resources ..." is unrelated to CPU/memory resources), so I think it's either being OOM killed and logging nothing or is actually running alright in that log and we need to look at the logs generated before the last crash.
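
One way to check for an OOM kill (pod name as in your cluster):

# the last terminated state will show Reason: OOMKilled if the container was killed for memory
$ kubectl -n monitoring describe pod monasca-thresh-... | grep -A 8 'Last State'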

@zhangjianweibj
Author

The result of kubectl logs -p monasca-thresh-74758d6db-fk8zg -n monitoring is:

[screenshot]

[screenshot]

[screenshot]

The resource limit is cpu: 2, memory: 2G. Is that too low to run the thresh pod, or is the JVM heap size not enough?

[screenshot]

@timothyb89
Member

That definitely should be enough memory, at least if you only have a few agents running. You might need more resources if you have more nodes (like 10+) but it's probably fine as-is.

Based on those errors, it looks like thresh is having trouble keeping its connection to zookeeper. Do the zookeeper logs show anything interesting? Possibly some network trouble between nodes/pods?
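
To rule out networking, something like this could help (this assumes the thresh image ships a shell and nc, and that the zookeeper service is named monasca-zookeeper - both are assumptions):

# zookeeper's client port is 2181 by default
$ kubectl -n monitoring exec -it monasca-thresh-... -- sh -c 'nc -zv monasca-zookeeper 2181'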

@zhangjianweibj
Author

OK, thanks. The zookeeper pod contains many error logs, but it seems those errors would not cause the thresh pod to crash.
kubectl logs monasca-zookeeper-5bc74dc5f-dk6zz -n monitoring | grep Error
[screenshot]

@zhangjianweibj
Author

@timothyb89 I think the reason is that the resources were not enough for Storm. I edited the deployment and daemonset and set resource.limit.cpu=4, resource.limit.memory=8G, and THRESH_STACK_SIZE=4096K.
Now it works well and does not restart any more.
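
For reference, the same change can also be made with kubectl directly (the thresh deployment name matches the pods above, the agent daemonset name is a guess, and this assumes THRESH_STACK_SIZE is exposed as a container environment variable):

$ kubectl -n monitoring set resources deployment monasca-thresh --limits=cpu=4,memory=8Gi
$ kubectl -n monitoring set env deployment/monasca-thresh THRESH_STACK_SIZE=4096K
$ kubectl -n monitoring set resources daemonset monasca-agent --limits=cpu=4,memory=8Gi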

[screenshot]

[screenshot]

But I find some metrics contain negative numbers. Is this a bug?

[screenshot]

And slave6 has two agents.
