[Enhancement] Add possible monitoring and metrics to decide the lifecycle of the Agent Nodes #498

peterzhuamazon · 2024-10-10T16:22:32Z

This is a follow-up to #494.

Removing an agent node after X runs is a quick fix for getting rid of agents with issues, but it is not fine-grained or robust enough when we have a lot of agents.

We could use a custom solution, such as Jenkins plugins, Groovy scripts, or even the metrics cluster, to find out the current status of the agent nodes. The idea is to define a baseline for agent health and remove a node once it has an outage.
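To make the baseline idea concrete, here is a rough sketch in Python. It assumes we already collect a few per-agent metrics from somewhere (a plugin, a script, or the metrics cluster). Every name, field, and threshold below (`AgentSample`, `Baseline`, `free_disk_gb`, `recent_failure_rate`, the default limits) is hypothetical and only for illustration. It makes no Jenkins API calls and only decides which agents fall below the baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentSample:
    """One health snapshot for an agent node (hypothetical fields)."""
    name: str
    online: bool
    free_disk_gb: float
    recent_failure_rate: float  # fraction of recent builds that failed, 0.0 to 1.0


@dataclass
class Baseline:
    """Minimum acceptable health; the defaults are made-up placeholders."""
    min_free_disk_gb: float = 10.0
    max_failure_rate: float = 0.5


def violations(sample: AgentSample, baseline: Baseline) -> List[str]:
    """Return the reasons this agent is below the baseline (empty if healthy)."""
    reasons = []
    if not sample.online:
        reasons.append("offline")
    if sample.free_disk_gb < baseline.min_free_disk_gb:
        reasons.append("low disk")
    if sample.recent_failure_rate > baseline.max_failure_rate:
        reasons.append("high failure rate")
    return reasons


def agents_to_remove(samples: List[AgentSample], baseline: Baseline) -> List[str]:
    """Names of agents with at least one baseline violation, in input order."""
    return [s.name for s in samples if violations(s, baseline)]
```

The actual removal step (and where the metrics come from) is left out on purpose, since that depends on which of the options above we pick.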

Example plugin: https://plugins.jenkins.io/monitoring/
cc: @getsaurabh02 @prudhvigodithi @gaiksaya

Thanks.

github-project-automation bot added this to Engineering Effectiveness Board Oct 10, 2024

github-project-automation bot moved this to 🆕 New in Engineering Effectiveness Board Oct 10, 2024

github-actions bot added the untriaged Issues that have not yet been triaged label Oct 10, 2024

peterzhuamazon mentioned this issue Oct 10, 2024

[Enhancement] Purge every agent after X amount of runs #494

Closed

peterzhuamazon added this to OpenSearch Engineering Effectiveness Oct 10, 2024

github-project-automation bot moved this to Backlog in OpenSearch Engineering Effectiveness Oct 10, 2024

gaiksaya added enhancement New feature or request and removed untriaged Issues that have not yet been triaged labels Oct 10, 2024

peterzhuamazon moved this to 📦 Backlog in Engineering Effectiveness Board Dec 4, 2024
