Thanos failing to query data due to memcached #797
Comments
Extracted from: https://github.com/operate-first/SRE/issues/280
This is a separate issue from 280. Trying to upscale the memcached pod.
The currently deployed Observatorium version doesn't provide a way to scale up the default memory limit on memcached, so we had to upgrade Observatorium to the latest image. The upgrade wasn't smooth: we had to delete the PVCs and add anyuid to the containers.
Memcached is still OOMKilled when a big query is run.
So we're seeing issues on 3 fronts:
Prometheus errors:
Also it seems like the images being specified in the observatorium CR are not being respected.
Similar issues being handled elsewhere:
Takeaway: lower the memcached memory limit so the store is not hammered by big queries, and avoid big queries. We need to reach out to upstream to help us tune the setup. Maybe we're missing some rollup/downsampling settings somewhere.
I have created an issue upstream for this: observatorium/operator#67
For the time being, I have scaled down the Observatorium Operator and manually updated the deployments to use the correct versions of the image (
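A rough sketch of what that workaround could look like, assuming it was done with `kubectl scale` and `kubectl set image`. The comment above does not say which tool was used, and every deployment, namespace, container, and image name below is a hypothetical placeholder, not a value from this cluster. The helpers only build the argument lists and do not run anything.

```python
from typing import List


def scale_command(deployment: str, namespace: str, replicas: int) -> List[str]:
    """Build a `kubectl scale` invocation for one deployment."""
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}", "-n", namespace,
    ]


def set_image_command(deployment: str, namespace: str,
                      container: str, image: str) -> List[str]:
    """Build a `kubectl set image` invocation that pins one container's image."""
    return [
        "kubectl", "set", "image", f"deployment/{deployment}",
        f"{container}={image}", "-n", namespace,
    ]


if __name__ == "__main__":
    # All names here are placeholders chosen for illustration only.
    namespace = "example-namespace"
    commands = [
        # Stop the operator first so it does not revert the manual image change.
        scale_command("example-operator", namespace, 0),
        set_image_command("example-deployment", namespace,
                          "example-container", "example.io/org/image:tag"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

Scaling the operator to zero first matters because an operator that is still running would normally reconcile the deployments back to the images it considers default.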
can this be closed @4n4nd ?
the default image issue is still there in the operator, so let's keep this issue open to track it.
@4n4nd is this issue still relevant? can it be closed? iirc you mentioned you'll be making some changes to how we deploy observatorium for smaug.
yeah we can close this
Grafana reporting error:
Thanos logs:
Thanos shard logs:
Memcached logs: