
Fleet managed Elastic Agent installation with Helm Chart missing topics #1531

Open

eedugon opened this issue Dec 11, 2024 · 12 comments

@eedugon
Contributor

eedugon commented Dec 11, 2024

Description

There are 2 topics missing from the document https://www.elastic.co/guide/en/fleet/current/example-kubernetes-fleet-managed-agent-helm.html:

  • Explain that --set agent.fleet.insecure=true will be needed in some cases.
  • Explain that kube-state-metrics should be installed separately (as it's not installed by the helm chart in this mode).

**Insecure flag explanation**

In the document we suggest a command like:

helm install demo ./deploy/helm/elastic-agent \
--set agent.fleet.enabled=true \
--set agent.fleet.url=https://fleet-svc.default.svc \
--set agent.fleet.token=TTg1NHNaTUJoNkpaNzE4R3IzeGg6WXo2MUxSakJTNmVvZUE3d212V0JGUQ== \
--set agent.fleet.preset=perNode

The previous command only works when the Fleet Server HTTPS certificate is signed by a publicly trusted CA.

If the Fleet Server certificate is signed by a corporate / private CA, or if the Fleet Server was created in quick start mode (which creates a self-signed certificate), the previous command won't work, as the Elastic Agents won't be able to talk to the Fleet Server for enrollment.

The solution would be to add --set agent.fleet.insecure=true to the previous command.
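For example, the adjusted command would look like this (values reused from the example above; only the insecure flag is new):

helm install demo ./deploy/helm/elastic-agent \
--set agent.fleet.enabled=true \
--set agent.fleet.url=https://fleet-svc.default.svc \
--set agent.fleet.token=TTg1NHNaTUJoNkpaNzE4R3IzeGg6WXo2MUxSakJTNmVvZUE3d212V0JGUQ== \
--set agent.fleet.preset=perNode \
--set agent.fleet.insecure=true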

We should add a comment and disclaimer in the docs about it.

Also note that the recommended solution for a Fleet Server using a private CA would be to provide the Elastic Agents with the CA during installation, rather than using the insecure flag. But that's not yet supported by the Helm chart (issue created to support this: elastic/elastic-agent#6285).

**kube-state-metrics explanation**

In the doc, when we ask the user to add and configure the Kubernetes integration in Kibana with default values, we should indicate that the integration expects KSM to be installed and available at a specific endpoint. We should add that to the doc, with a link to KSM.
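For reference, installing KSM separately is a single Helm command; the release name and namespace below are just examples:

helm install --repo https://prometheus-community.github.io/helm-charts ksm kube-state-metrics -n default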

@pkoutsovasilis, let us know if you agree with this, or if it would make sense to have a flag to automatically install KSM also for this "fleet managed -> perNode" installation.

I will be able to work on this and raise a PR soon if you like, @kilfoyle

Resources

N/A

Collaboration

Please choose a preferred collaboration model.

Point of contact.

Main contact: @eedugon

Stakeholders: @kilfoyle / @pkoutsovasilis

@kilfoyle
Contributor

I will be able to work on this and raise a PR soon if you like, @kilfoyle

Thanks Edu! Since you know what changes are needed (much more than me) I would certainly like to take you up on that offer!

@pkoutsovasilis

Hey Edu 👋 Just to echo one of your topics: the Helm chart skips deploying kube-state-metrics only in the case where a user deploys a fleet-managed agent. For standalone agents that's not the case, and autosharded KSM does get deployed.

The first reason why we can't do for fleet-managed agents what we do for standalone ones (deploy both a DaemonSet and KSM-sharded agents) is that, although all agents will be enrolled successfully through Fleet, there isn't a way, at least as of now, to say that parts of the Kubernetes integration apply to the DaemonSet and others to the KSM-sharded agent.

Now, again about Fleet-managed agents: the configuration of the Kubernetes integration (which might or might not require KSM) happens in Kibana. Thus I find it way easier and more consistent to suggest to the user there in the UI to run this single CLI command, helm install --repo https://prometheus-community.github.io/helm-charts ksm kube-state-metrics -n default (if they have selected kube-state-metrics), than doing that from the elastic-agent Helm chart, which in this case knows nothing about the user's Kubernetes integration preferences.

Of course this is only my opinion, and we could definitely investigate supporting this for fleet-managed agents. Let's try to get more opinions on the matter: @cmacknz @swiatekm @ycombinator @nimarezainia @strawgate

@eedugon
Contributor Author

eedugon commented Dec 11, 2024

@pkoutsovasilis, thanks! Your comment makes complete sense and I agree.

So we deploy sharded KSM together with the standalone agent in the complex use case. That really makes a difference, because it's indeed complex for users to achieve.

And in this specific case we will ask the user to ensure they deploy KSM independently (which has always been the case with Metricbeat and Elastic Agent until now). We will add something to the docs then.

@eedugon eedugon changed the title [REQUEST]: Fleet Managed Elastic Agent installation with Helm Chart missing topics Fleet Managed Elastic Agent installation with Helm Chart missing topics Dec 13, 2024
@eedugon eedugon self-assigned this Dec 13, 2024
@eedugon
Contributor Author

eedugon commented Dec 13, 2024

We will wait for the outcome of the following issues before deciding the exact changes to apply to the doc:

Leader election issue - elastic/elastic-agent#6284
CA certificate issue - elastic/elastic-agent#6285
HostNetwork issue - elastic/elastic-agent#6324
Log files mount issue - elastic/elastic-agent#6204

If we decide to postpone the fixes for any of them, we will include some comments in the doc to align expectations.

@swiatekm
Contributor

Now, again about Fleet-managed agents, the configuration of the kubernetes integration (which might or might not require KSM) happens in Kibana. Thus I find it way easier and consistent to suggest to the user there in UI to run this single cli command helm install --repo https://prometheus-community.github.io/helm-charts ksm kube-state-metrics -n default (if they have selected kube state metrics) than doing that from the elastic-agent Helm chart which in this case knows nothing about the kubernetes integration user preferences.

My feeling is that if the user has already installed the agent Helm Chart, then it's going to be easier for them to change their values file to enable KSM and upgrade than to install a whole new Chart. In the longer term, we could provide defaults in Kibana that would work with the agent Chart KSM, and user-installed KSM would become the advanced use case requiring additional configuration.

I think that, as a rule, if a fleet-managed agent installed via the Helm Chart requires some additional dependency, and the Helm Chart could ensure that dependency is present, then that's what we should do.

@pkoutsovasilis

pkoutsovasilis commented Dec 13, 2024

My feeling is that if the user has already installed the agent Helm Chart, then it's going to be easier for them to change their values file to enable KSM and upgrade, than install a whole new Chart. In the longer-term, we could provide defaults in Kibana that would work with the agent Chart KSM, and user-installed KSM would become the advanced use case requiring additional configuration.

Not really required though, right?! Since KSM will become a subchart of the Elastic Agent Helm chart, any configuration a user can achieve with the "standalone KSM Helm chart" is, by Helm's design, 100% applicable through the "Elastic Agent Helm chart". Thus I do not fully grasp the point here.

I think that, as a rule, if a fleet-managed agent installed via the Helm Chart requires some additional dependency, and the Helm Chart could ensure that dependency is present, then that's what we should do.

With that framing, if we apply the same logic generally in other places: the Redis integration has an additional dependency on Redis, so should we include a Redis subchart in the Elastic-agent chart?! In other words, these are not fleet-managed agent dependencies; an agent can run just fine without KSM. These are integration dependencies, and when a user chooses to install an integration the respective services should already exist, as is the case with both Redis and kube-state-metrics. This is just for the sake of categorising dependencies properly, but I do believe that providing KSM all the time from our Helm chart might bring some value for some users. The tricky bit is how many users would want to rely on the Elastic-agent Helm chart to install KSM for them.

@strawgate
Contributor

strawgate commented Dec 13, 2024

Redis integration has an additional dependency to Redis, should we include a Redis subchart in the Elastic-agent chart?!

I understand this point in principle, but if our goal is to offer easy out-of-the-box Kubernetes monitoring for customers looking for an Observability solution, then anything that makes it easier to get started is worth considering. If deploying Redis were important to the getting-started experience we were hoping to offer, then we'd probably consider it! :)

is that although all agents will be enrolled successfully through Fleet there isn't a way, at least as of now, to say that parts of the kubernetes integration apply to the daemonset and others to the KSM-sharded agent.

For the Fleet case, it seems we could offer an option for users intending to monitor Kubernetes where we deploy KSM without sharding and rely on the default leader election functionality. Is this right? Something like --set include.k8s.metrics.prerequisites or --set agent.fleet.include_ksm.

We could also offer a more advanced option where we could ask for a shard count and a second enrollment token so that we can have the KSM agents in their own agent policy? Something like --set agent.fleet.ksm.token and --set agent.fleet.ksm.shard_count?
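To illustrate (these flag names are purely hypothetical, just mirroring the proposal above, and the token values are placeholders), the advanced install could look something like:

helm install demo ./deploy/helm/elastic-agent \
--set agent.fleet.enabled=true \
--set agent.fleet.url=https://fleet-svc.default.svc \
--set agent.fleet.token=<daemonset-policy-enrollment-token> \
--set agent.fleet.preset=perNode \
--set agent.fleet.ksm.token=<ksm-policy-enrollment-token> \
--set agent.fleet.ksm.shard_count=2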

This would match our current recommendations for fleet-management with KSM sharding:
[screenshot: current recommendation for Fleet-managed agents with KSM sharding]

@swiatekm
Contributor

swiatekm commented Dec 16, 2024

My feeling is that if the user has already installed the agent Helm Chart, then it's going to be easier for them to change their values file to enable KSM and upgrade, than install a whole new Chart. In the longer-term, we could provide defaults in Kibana that would work with the agent Chart KSM, and user-installed KSM would become the advanced use case requiring additional configuration.

not really required though right?! Since KSM will become a subchart in Elastic-agent Helm chart any configuration a user can achieve with the "standalone KSM Helm chart" is - by Helm's design - 100% applicable through the "Elastic-agent Helm chart". Thus I do not grasp 100% the point here.

Yeah, it's functionally the same thing, minus the fact that we can provide a more opinionated default config for KSM in the agent Helm Chart. But in my experience, there's often less friction involved in an organization enabling a feature of a Helm Chart their vendor recommended they use versus adopting a separate, community-maintained Helm Chart, even if they're effectively identical.

I think that, as a rule, if a fleet-managed agent installed via the Helm Chart requires some additional dependency, and the Helm Chart could ensure that dependency is present, then that's what we should do.

With that framing, if we apply the same logic generally in other places: the Redis integration has an additional dependency on Redis, so should we include a Redis subchart in the Elastic-agent chart?! In other words, these are not fleet-managed agent dependencies; an agent can run just fine without KSM. These are integration dependencies, and when a user chooses to install an integration the respective services should already exist, as is the case with both Redis and kube-state-metrics. This is just for the sake of categorising dependencies properly, but I do believe that providing KSM all the time from our Helm chart might bring some value for some users. The tricky bit is how many users would want to rely on the Elastic-agent Helm chart to install KSM for them.

I don't think these cases are really all that similar.

The Redis integration does not have a dependency on Redis. The need a user has when installing the Redis integration is that they want to monitor a Redis instance they already have running in Kubernetes.

On the other hand, the need a user can have with the Kubernetes integration is that they want to get metrics about the state of K8s resources. The integration even installs dashboards which require these metrics to populate. The user doesn't want KSM itself; we tell them to install KSM to get the functionality they want. As such, I think we should make it as frictionless as possible.

Does that make sense?

@pkoutsovasilis

OK, I think the crowd has spoken.

our goal is to offer easy out-of-the-box Kubernetes monitoring for our customers looking for an Observability solution

often less friction involved in an organization enabling a feature of a Helm Chart their vendor recommended they use vs adopting a separate, community-maintained Helm Chart. Even if they're effectively identical

Let's go with a KSM sub-chart then. brb with a PR 🙂

@eedugon eedugon changed the title Fleet Managed Elastic Agent installation with Helm Chart missing topics Elastic Agent installation with Helm Chart missing topics Dec 17, 2024
@eedugon eedugon changed the title Elastic Agent installation with Helm Chart missing topics Fleet managed Elastic Agent installation with Helm Chart missing topics Dec 17, 2024
@eedugon
Contributor Author

eedugon commented Dec 17, 2024

Not sure if we should move this KSM-related conversation to a different issue in the elastic-agent repo, but I'd like to add something...

In case we decide to deploy a standard KSM together with the agent resources, I'd like to bring something to your attention:

  • The k8s integration in the UI is configured by default to point to http://kube-state-metrics:8080.
  • We have two possibilities:
    • Deploy KSM in the same namespace as the agent and with the service name kube-state-metrics (then the default integration config will work out of the box, but there's a higher chance of the installation failing due to conflicts with existing resources; see the sketch after this list).
    • Deploy KSM with a custom service name derived from the release name (which is usually good practice with Helm charts). In that case we will have to document how to configure the policy when installing the agent so it matches the KSM destination.
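As a sketch of the first option (assuming the standard prometheus-community chart naming, where a release named kube-state-metrics yields a Service named kube-state-metrics, and with <agent-namespace> as a placeholder for the agent's namespace):

helm install --repo https://prometheus-community.github.io/helm-charts kube-state-metrics kube-state-metrics -n <agent-namespace>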

About installation possibilities, and related to the comments by @strawgate:

For the Fleet case, it seems we could offer an option for users intending to monitor Kubernetes where we deploy KSM without sharding and rely on the default leader election functionality? Is this right? Something like --set include.k8s.metrics.prerequisites or --set agent.fleet.include_ksm

This would be the easiest to achieve and document, as only one Fleet policy / enrollment token will be needed, and technically we are almost there already. I would recommend starting with a simple dependency on the KSM chart and deploying a basic KSM when requested by the user.

As I also told @pkoutsovasilis, in my opinion the automatic installation of KSM should probably default to false, and we should just ensure our documentation and examples explain clearly how to use it and what to do with either option (enabled or disabled).

The reason for preferring to keep agent.fleet.include_ksm=false by default (while documenting that it should be set to true for this use case) is purely the end user experience. If, for example, including KSM causes any problem, it's easier to spot it and remove the --set agent.fleet.include_ksm=true parameter than if the parameter isn't there and there's a problem related to the KSM installation (in that case you need to investigate how to disable it explicitly).
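To make that concrete (flag name still hypothetical, and assuming helm upgrade --reuse-values to keep the rest of the configuration), enabling and rolling back would each be a single parameter change:

helm upgrade demo ./deploy/helm/elastic-agent --reuse-values --set agent.fleet.include_ksm=true
helm upgrade demo ./deploy/helm/elastic-agent --reuse-values --set agent.fleet.include_ksm=false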

We could also offer a more advanced option where we could ask for a shard count and a second enrollment token so that we can have the KSM agents in their own agent policy? Something like --set agent.fleet.ksm.token and --set agent.fleet.ksm.shard_count?

That would be awesome, but between KSM sharding and the first quick start approach (everything in a single DaemonSet with leader election), I think we should have the option of:

  1. One DS for node level monitoring and logs
  2. One Deployment for cluster level monitoring (with or without KSM being deployed, per the user's preference).

Maybe the previous could be accomplished with 2 installed releases, without any special change needed at the Helm chart level.
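A rough sketch of that two-release idea (assuming the chart also exposes a clusterWide-style preset for Fleet mode next to perNode; if it doesn't, the second release would need custom values), with one enrollment token per policy:

helm install agent-nodes ./deploy/helm/elastic-agent \
--set agent.fleet.enabled=true \
--set agent.fleet.url=https://fleet-svc.default.svc \
--set agent.fleet.token=<node-level-policy-token> \
--set agent.fleet.preset=perNode

helm install agent-cluster ./deploy/helm/elastic-agent \
--set agent.fleet.enabled=true \
--set agent.fleet.url=https://fleet-svc.default.svc \
--set agent.fleet.token=<cluster-level-policy-token> \
--set agent.fleet.preset=clusterWide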

In summary, and including the latest proposal by @strawgate, I think we have this on the table:

  • All in a DaemonSet relying on leader election (with or without an extra KSM included) and only 1 Fleet policy / token (to be honest, I'm not a fan of leader election for this use case, but I agree it's the simplest installation).

  • 1 DaemonSet + 1 Deployment (with or without an extra KSM), with no leader election and relying on 2 different policies: I don't think we need to solve this directly with a single Helm chart release installation, but it's up to you to decide. I highlight this scenario because it's the one we have been recommending to users in the past (Metricbeat) when trying to move away from the single DaemonSet due to known issues.

  • Request a shard count and 2 different tokens in order to perform the most sophisticated installation (Fleet managed): 1 DS + sharded KSM with embedded Elastic Agents. The only caveat of this approach is that it really requires KSM; it's not optional. But in my opinion it's probably the nicest and most elegant (plus scalable and production ready).

Let me know your thoughts. If you believe it's not worth documenting or mentioning that second approach, I'm totally OK with that too.

(Side / silly question) Would it make sense to suggest to the Kubernetes integration developers that they split the integration into 2 different integration packages in the UI? One for Kubernetes node-level stuff and another for Kubernetes cluster-level stuff. In the long term, that would make it much easier to understand situations where users might need to configure multiple policies.

@eedugon
Contributor Author

eedugon commented Dec 17, 2024

Arguing against myself here... I really don't see the point of offering my suggested method if the sharded KSM approach proposed by @strawgate is really better and better aligned with what we do in standalone mode :)

@strawgate
Contributor

The reason for preferring to keep agent.fleet.include_ksm=false by default (while documenting that it should be set to true for this use case) is purely the end user experience. If, for example, including KSM causes any problem, it's easier to spot it and remove the --set agent.fleet.include_ksm=true parameter than if the parameter isn't there and there's a problem related to the KSM installation (in that case you need to investigate how to disable it explicitly).

I would actually lean towards including it by default. We have been including it by default with kustomize, have we not? Has that caused any problems?
