[Monitoring] remote_cluster_client role shouldn't be required to use Monitoring #93432

Closed
chrisronline opened this issue Mar 3, 2021 · 21 comments
Labels
bug Fixes for quality problems that affect the customer experience Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services

Comments

@chrisronline
Contributor

Currently, we rely on the remote_cluster_client role attached to the configured user (monitoring.ui.elasticsearch.username) to make CCS requests in the Stack Monitoring UI. If this role is not available, all out-of-the-box alerts start to fail because we do not handle this scenario gracefully, and the user sees something like this in the Kibana server log:

An error occurred when running the alert.
[illegal_argument_exception] node [es-node] does not have the [remote_cluster_client] role

In this scenario, there are two fixes to resolve the issue:

  1. Add the remote_cluster_client role to the monitoring.ui.elasticsearch.username user
  2. Set monitoring.ui.ccs.enabled: false in kibana.yml

However, neither of these should be necessary. We should be able to handle this scenario more gracefully and simply not perform any further CCS requests if the role is not available.
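
A minimal sketch of that graceful fallback, written in Python for brevity (Kibana itself is TypeScript). Everything here is hypothetical: the CcsFallbackSearcher class, the injected search callable, and the "*:" remote prefix are illustrative assumptions, not Kibana's actual implementation. Only the error text is taken from the log message above.

```python
from typing import Callable, Dict, List

# Error text as it appears in the Kibana server log quoted above.
ROLE_ERROR_MARKER = "does not have the [remote_cluster_client] role"


class CcsFallbackSearcher:
    """Hypothetical wrapper that stops issuing CCS requests after the first role error."""

    def __init__(self, search: Callable[[str], List[Dict]], ccs_enabled: bool = True):
        # `search` is an injected function that runs a query against an index pattern.
        self._search = search
        self.ccs_enabled = ccs_enabled

    def index_pattern(self, pattern: str) -> str:
        # Assumption: a "*:" prefix targets every configured remote cluster.
        return f"*:{pattern},{pattern}" if self.ccs_enabled else pattern

    def run(self, pattern: str) -> List[Dict]:
        try:
            return self._search(self.index_pattern(pattern))
        except Exception as err:
            if self.ccs_enabled and ROLE_ERROR_MARKER in str(err):
                # Remember the failure so later requests skip CCS entirely,
                # then retry this request against the local cluster only.
                self.ccs_enabled = False
                return self._search(pattern)
            raise  # any other error is not ours to swallow
```

With a wrapper like this, the first failing request would be retried locally and later requests would never attempt CCS, so the alert keeps running instead of failing on every execution.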

@chrisronline chrisronline added bug Fixes for quality problems that affect the customer experience Team:Monitoring Stack Monitoring team labels Mar 3, 2021
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@bvierra

bvierra commented Apr 7, 2021

I am seeing a similar issue; however, I do not believe that it is related to a user role but rather to the node role. Using the eck-operator and giving each node a specific role, we get the following in the logs for each node that does not have the remote_cluster_client role:

{"type":"log","@timestamp":"2021-04-07T10:19:50+00:00","tags":["error","plugins","alerts","plugins","alerting"],"pid":6,"message":"Executing Alert \"4e63d010-9789-11eb-9637-0166accd96bb\" has resulted in Error: [illegal_argument_exception] node [es-logs-es-master-edc2-0] does not have the [remote_cluster_client] role"}
{"type":"log","@timestamp":"2021-04-07T10:19:54+00:00","tags":["error","plugins","alerts","plugins","alerting"],"pid":6,"message":"Executing Alert \"4e63f720-9789-11eb-9637-0166accd96bb\" has resulted in Error: [illegal_argument_exception] node [es-logs-es-master-edc2-0] does not have the [remote_cluster_client] role"}
{"type":"log","@timestamp":"2021-04-07T10:19:54+00:00","tags":["error","plugins","alerts","plugins","alerting"],"pid":6,"message":"Executing Alert \"4e641e30-9789-11eb-9637-0166accd96bb\" has resulted in Error: [illegal_argument_exception] node [es-logs-es-data-edc2-1] does not have the [remote_cluster_client] role"}
{"type":"log","@timestamp":"2021-04-07T10:19:54+00:00","tags":["error","plugins","alerts","plugins","alerting"],"pid":6,"message":"Executing Alert \"4e5f1520-9789-11eb-9637-0166accd96bb\" has resulted in Error: [illegal_argument_exception] node [es-logs-es-data-edc1-0] does not have the [remote_cluster_client] role"}
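
To see which nodes are affected, log entries like these can be tallied per node. The following is a rough sketch, assuming the Kibana log is written with one JSON object per line and that the message keeps the wording shown here; the function name nodes_missing_role is made up for illustration.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Captures the node name in: node [<name>] does not have the [remote_cluster_client] role
NODE_PATTERN = re.compile(
    r"node \[([^\]]+)\] does not have the \[remote_cluster_client\] role"
)


def nodes_missing_role(log_lines: Iterable[str]) -> Counter:
    """Count role errors per node name in JSON-formatted Kibana log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        match = NODE_PATTERN.search(str(entry.get("message", "")))
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to nodes_missing_role would return a Counter keyed by node name, which makes it easy to confirm that only the nodes with customized roles are affected.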

We are not specifying the monitoring.ui.elasticsearch.username config option, which per the docs means it falls back to elasticsearch.username, which with the eck-operator I believe is a superuser.

This is also a test cluster, so all of the monitoring reports back to the cluster itself and it should not need the remote_cluster_client role.

@travisby

travisby commented Jul 9, 2021

We are currently on e/k 7.9.2. We attempted an upgrade to 7.13.3, and even tried ES 7.12.1, and started experiencing this issue.

It looks like somewhere between elasticsearch 7.9.2 and 7.12.1 something introduced this!

@neptunian
Contributor

neptunian commented Aug 19, 2021

@ravikesarwani I wonder if CCS should be on by default. Otherwise, if the user doesn't have the remote_cluster_client role added, we would need to inform them in some way that CCS is on but not working due to the missing role. This affects other SM queries, not just alerts. Perhaps we can have a toast within the SM app and a warning (if that's possible) in the alerts, whilst gracefully handling the scenario.

@katefarrar Perhaps we need some UX help with this.

@ravikesarwani
Contributor

What is the default value for monitoring.ui.ccs.enabled when it's not explicitly set by the user? I am assuming it is false (or should be false), and hence by default the alerts (queries) shouldn't rely on it.

If the value is set by the user to "true" then we are saying that they should also add the remote_cluster_client role to the monitoring.ui.elasticsearch.username user. This is something that we should clarify in our documentation.

@neptunian
Contributor

The default for monitoring.ui.ccs.enabled is true. From what I've read, remote_cluster_client is included in a node's roles by default, but if you specify the roles explicitly without adding it back, it will be gone. Users do not know they need to add this role, though, as they never explicitly turned on CCS.

@ravikesarwani
Contributor

Do we know why "monitoring.ui.ccs.enabled" is true by default, especially if it requires specific roles?
I was trying to find the monitoring.ui.ccs.enabled setting in our documentation to see why it was set to true by default, but haven't found anything.

My take is that we should start simple by default and make sure things work correctly without any extra work by the users (so monitoring.ui.ccs.enabled should be false).
When users need a specific feature, they can set the config and add more roles.

I don't think that asking all users either to add the role to the user or to change the config value to false is the best answer.

Anyone know why monitoring.ui.ccs.enabled is set as true by default? Do we see any reason to make this "false" as a default value to fix this issue?
cc: @elastic/kibana-app

@simianhacker
Member

simianhacker commented Aug 19, 2021

It seems like the intention was "Everything should just work" as long as you didn't do anything custom. But the moment you start customizing roles for nodes, things will start going haywire. I want to point out that if we make monitoring.ui.ccs.enabled: false the default, we are going to have to decide how that works on Cloud. Right now it "just works" and users don't have to enable CCS in Kibana for Cloud.

@ravikesarwani
Contributor

Does this relate to #109100 (comment), and is it a regression in 7.15?

I am not sure about the comment around Cloud. Is the behavior different there than in self-managed because of how we package things differently?

@neptunian
Contributor

It's possible the 7.15 ES issue could affect alerts, but this particular issue is not related to that and exists in previous versions.

@simianhacker
Member

simianhacker commented Aug 19, 2021

@ravikesarwani Yes, it sort of relates, but it looks like the issue in #109100 (comment) is a regression in Elasticsearch 7.15.0 which hopefully they fix.

Anyone know why monitoring.ui.ccs.enabled is set as true by default? Do we see any reason to make this "false" as a default value to fix this issue?

monitoring.ui.ccs.enabled is true by default.

In order to remove the remote_cluster_client role requirement for Stack Monitoring, we will need to either turn off CCS support (by default) OR add a check to ensure the node we are sending requests to has the remote_cluster_client role. If it doesn't have the role, we should throw an error or message explaining what the user needs to do (add the role OR turn off CCS).
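
The second option (checking node roles before using CCS) could look roughly like the sketch below. It is an illustration in Python rather than Kibana code, and it assumes the input has the shape of an Elasticsearch nodes info response, where each node entry carries a name and a roles list. The function names and the message wording are hypothetical.

```python
from typing import Dict, List

REQUIRED_ROLE = "remote_cluster_client"


def nodes_without_ccs_role(nodes_info: Dict) -> List[str]:
    """Return the names of nodes whose roles list lacks remote_cluster_client.

    Assumed input shape:
    {"nodes": {"<node id>": {"name": "...", "roles": ["master", ...]}}}
    """
    missing = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        if REQUIRED_ROLE not in node.get("roles", []):
            # Fall back to the node id if the entry has no name.
            missing.append(node.get("name", node_id))
    return sorted(missing)


def ccs_advice(nodes_info: Dict) -> str:
    """Build a user-facing message instead of letting the request fail."""
    missing = nodes_without_ccs_role(nodes_info)
    if not missing:
        return "CCS can be used"
    return (
        "CCS is unavailable on: " + ", ".join(missing) + ". "
        "Add the remote_cluster_client role to these nodes "
        "or set monitoring.ui.ccs.enabled: false"
    )
```

A check along these lines would let the UI surface both remedies mentioned in this thread before any CCS request is sent.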

@ravikesarwani
Contributor

From a product perspective, my thinking is clear.
The default should work with no extra dialogs, questions, or work from the users.
If someone wants to configure CCS to see monitoring data stored on a remote cluster from the production cluster's Kibana, then we should have steps on how to configure this.
In that case, requiring an extra role, configuring the user, etc. is okay as long as we clearly document that.

In ESS I don’t see the monitoring.ui.ccs.enabled Kibana setting available to users (or at least documented). Also, I spun up a cluster in ESS (7.14), enabled self monitoring, said yes to creating rules in the SM app, and things are working okay. I don’t get the errors described here.
Has anyone reproduced the issue in Cloud (as we see it in self-managed environments)?

My vote would be to keep things simple "by default": turn off CCS support by default and document the steps to turn it on (where we will explain that the remote_cluster_client role is needed).

@neptunian
Contributor

neptunian commented Aug 19, 2021

You would get the error if you removed the remote_cluster_client role from the node role. I believe this is how the user ends up in this state. #93432 (comment)

@simianhacker
Member

@ravikesarwani I agree with this approach. I think we should schedule this for 7.16 so we have some time to discuss how this will affect other parts of the stack.

@matschaffer
Contributor

Is it possible to switch SM behavior based on node roles? The error isn’t good, but having CCS “just work” once the remote is connected is pretty nice behavior. It would be a shame to lose that, doubly so if getting it back required modifying kibana.yml.

@jmp601

jmp601 commented Mar 1, 2022

How is this issue resolved with a self-managed instance? I'm running ECK 2.0 on version 7.17.1 of Elastic/Kibana on an Azure AKS cluster.

@matschaffer
Contributor

matschaffer commented Mar 1, 2022

@jmp601 you can set monitoring.ui.ccs.enabled: false in kibana.yml, which will stop Stack Monitoring from attempting to use CCS remotes.

@smith smith added Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services and removed Team:Monitoring Stack Monitoring team labels Apr 4, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@smith
Contributor

smith commented Apr 4, 2022

Closing this as a duplicate of #120384.

@jmp601

jmp601 commented May 7, 2023

The setting is monitoring.ui.ccs.enabled: false, and not monitoring.ui.css.enabled: false.

@miltonhultgren
Contributor

@jmp601 Thanks for noting that! I've updated the above comments to correct the setting name and avoid any further misunderstandings 🙏🏼
