[Alerting][Security] Rules fail due to a security exception: missing authentication credentials for REST request #118520
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)
Pinging @elastic/kibana-security (Team:Security)
Pinging @elastic/security-solution (Team: SecuritySolution)
The security solution executor utilizes the …
Ah, that makes sense, thanks @dhurley14!
I'd been thinking this could be caused by the API key / task doc race condition issues: #106292 and #110096. Another source of this could be (see linked SDH issue above) a bad upgrade, where the original encryption key isn't available during the migration. It appears that in such cases we migrate the rule with the API key set to null. Clearly we want to "disable" the rule, but we can't really: the task document still exists and would need to be deleted, which we can't do during the migration. We also presumably have an API key that should be invalidated, but we can't invalidate it because we couldn't recover it. This gets complicated to reason about, because for "no security" deployments, the API key WILL BE …

In a Slack conversation, @ymao1 noted:
I was able to repro that changing the encryption key on a migration causes this error. During migration, this was logged for every rule:
It didn't lie! It did cause problems seconds later:
Seems like we need to do better than logging during migration. I think we need to somehow mark these as not runnable, and then disable them sometime after startup. I wonder if we could even do it DURING startup? Or does it need to be a cleanup task, so that not every Kibana instance tries to "fix" these? Another possibility is fixing these as needed: if we recognize that we'll get this error because there SHOULD be an API key but isn't, disable the rule instead of running it. But then you won't know until you try to run it.
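The "fix as needed" option could look roughly like a guard evaluated just before a rule executes. This is a minimal sketch, assuming a simplified rule shape and a boolean `securityEnabled` flag; `StoredRule`, `ExecutionDecision`, and `decideExecution` are hypothetical names, not part of Kibana's alerting code.

```typescript
// Hypothetical, simplified rule shape; not Kibana's actual rule saved object.
interface StoredRule {
  id: string;
  apiKey: string | null;
}

type ExecutionDecision =
  | { action: "run" }
  | { action: "disable"; reason: string };

// Decide, before executing a rule, whether it can authenticate at all.
function decideExecution(
  rule: StoredRule,
  securityEnabled: boolean
): ExecutionDecision {
  // Without security, a missing API key is expected and the rule can run.
  if (!securityEnabled) {
    return { action: "run" };
  }
  // With security on, a rule without a key would only fail with
  // "missing authentication credentials", so disable it instead.
  if (!rule.apiKey) {
    return {
      action: "disable",
      reason: `rule ${rule.id} has no API key but security is enabled`,
    };
  }
  return { action: "run" };
}

// Example: a rule whose key was lost during a migration.
const decision = decideExecution({ id: "rule-1", apiKey: null }, true);
console.log(decision.action); // prints: disable
```

As the comment points out, the trade-off of this approach is that a broken rule is only detected the next time it is scheduled to run.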
One theory that may cause this: if a user sets up alerting rules with security disabled (… Though, I don't think this scenario is possible on Cloud (security always enabled?).
Ya, security is always on for Cloud, but this could obviously happen on-prem. I thought about that for a second while doing my repro, but shoved it to the back of my mind. We obviously need to take this into account, though. We want to disable these rules, because they NEED an API key at that point, but I guess the questions are: when do we make that call and actually disable them, and how do we notify the user that we disabled them?
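One way to frame "when do we make that call" is a single cleanup pass, run once rather than on every Kibana node, that collects the rules to disable together with a message for the user. This sketch assumes a simplified rule shape; `RuleRecord`, `CleanupPlan`, and `planApiKeyCleanup` are hypothetical, and how such a task would be scheduled or how the messages would reach the user is left open.

```typescript
// Hypothetical, simplified rule shape; not Kibana's actual rule saved object.
interface RuleRecord {
  id: string;
  name: string;
  enabled: boolean;
  apiKey: string | null;
}

interface CleanupPlan {
  toDisable: string[]; // ids of rules that should be disabled
  notifications: string[]; // messages to surface to the user
}

// Scan all rules once and plan which ones to disable and what to tell the user.
function planApiKeyCleanup(
  rules: RuleRecord[],
  securityEnabled: boolean
): CleanupPlan {
  const plan: CleanupPlan = { toDisable: [], notifications: [] };
  // Without security, rules legitimately have no API key; nothing to do.
  if (!securityEnabled) {
    return plan;
  }
  for (const rule of rules) {
    // Only enabled rules keep failing; already-disabled ones are left alone.
    if (rule.enabled && !rule.apiKey) {
      plan.toDisable.push(rule.id);
      plan.notifications.push(
        `Rule "${rule.name}" (${rule.id}) was disabled because it has no API key.`
      );
    }
  }
  return plan;
}

// Illustrative input: one rule missing its key, one healthy rule.
const plan = planApiKeyCleanup(
  [
    { id: "1", name: "rule A", enabled: true, apiKey: null },
    { id: "2", name: "rule B", enabled: true, apiKey: "key" },
  ],
  true
);
console.log(plan.toDisable); // prints: [ '1' ]
```

Separating the planning step from the actual disabling keeps the decision testable and lets the notification be delivered however the team decides.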
This overlaps well with upcoming efforts to ensure alerting rules run continuously. This becomes a scenario where rules stop running indefinitely until a user intervenes. And we'll need to find a way to notify the user in these cases. So, lots TBD :)
@deepikakeshav-qasource reproduced this issue in her test Cloud environment in #120872 without doing any Kibana upgrades; this was a fresh 8.0.0 deployment. Could this mean that the race condition mentioned by @pmuellr is the root cause in this case?
I wasn't able to reproduce it, though, even in the same Cloud environment where she managed to.
Sadly, this had the side effect of killing our Kibana nodes' connection to Elasticsearch, and requests to Kibana would display TLS handshake errors. If we restarted Kibana, it would work for 3 or 4 minutes, then the TLS errors returned. After 2 days of chaos, we disabled all alerts, which resolved the problem temporarily, and all errors stopped in our logs. We have a few alerts that are throwing the "has privileges" error, so more investigation is needed. We are running v7.16.3 on-premise; the deployment began life as v6.4.2 and was upgraded over the past 3 years. A support case was opened today as well. Let me know if you want the number.
It sounds like this isn't a Platform Security issue, so I'll remove our team's label.
@jugsofbeer Thanks for chiming in! It's helpful to know on this issue whether users are affected, and we will get the right eyes on the support case.
We had the same issue with our on-prem deployment of 8.4.x. Some rules would produce this error for weeks. We found that if you edit the rule and re-save it, it stops failing. Hope this helps.
Kibana version: 7.15.0
Looking at the Kibana server logs on Cloud, I've noticed a high rate of security errors causing many of our rule types to fail.
Specifically:
...appears a lot and accounts for around 200 rule execution failures per minute.
Interestingly, this seems to happen predominantly to the following Rule Types:
So this is likely not something that's happening at the platform level, but rather specific to the implementation of these three rule types.