[Metricbeat] Improve the `elasticsearch` module when used for Stack Monitoring #39058

consulthys · 2024-04-18T15:58:38Z

While investigating the root cause of indexing failures (also reported here in the past), we discovered that when Metricbeat feeds Stack Monitoring, its elasticsearch module ships elasticsearch.shard documents with concrete IDs built from the current cluster state identifier (i.e., state_uuid) and some other constant data. Since the cluster state doesn't change at the same pace as Metricbeat collection rounds (10s by default), consecutive rounds often produce the same IDs, so version conflicts happen all the time.

Those version conflicts are probably a side effect of the switch to data streams in 8.0.0 (i.e., put-if-absent semantics with a concrete ID) and weren't apparent earlier, when the data was stored in regular indices. Since each elasticsearch.shard document describes a shard placement in the cluster, the logic makes sense: there's no point re-indexing a document whose content hasn't changed since the last collection round.
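To make the mechanism concrete, here is a minimal Go sketch of put-if-absent semantics combined with deterministic IDs. It is a toy in-memory model, not Metricbeat or Elasticsearch code: `createOnlyStore`, `shardDocID`, and the ID layout (state_uuid plus index, shard number, and node) are assumptions made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errVersionConflict stands in for the conflict returned when a create
// operation targets an ID that already exists.
var errVersionConflict = errors.New("version conflict: document already exists")

// createOnlyStore is a toy stand-in for a data stream: documents can be
// created but never overwritten (put-if-absent).
type createOnlyStore struct {
	docs      map[string]string
	conflicts int
}

func newCreateOnlyStore() *createOnlyStore {
	return &createOnlyStore{docs: make(map[string]string)}
}

// Create stores body under id unless the id is already taken, in which
// case it counts a conflict and returns errVersionConflict.
func (s *createOnlyStore) Create(id, body string) error {
	if _, exists := s.docs[id]; exists {
		s.conflicts++
		return errVersionConflict
	}
	s.docs[id] = body
	return nil
}

// shardDocID is a hypothetical ID scheme: the cluster state UUID plus data
// that stays constant for a given shard placement.
func shardDocID(stateUUID, index string, shard int, node string) string {
	return fmt.Sprintf("%s:%s:%d:%s", stateUUID, index, shard, node)
}

func main() {
	store := newCreateOnlyStore()
	// Three collection rounds; the cluster state only changes before the third.
	for round, stateUUID := range []string{"uuid-A", "uuid-A", "uuid-B"} {
		id := shardDocID(stateUUID, "logs-000001", 0, "node-1")
		err := store.Create(id, "shard placement")
		fmt.Printf("round %d: id=%s err=%v\n", round+1, id, err)
	}
	fmt.Println("conflicts:", store.conflicts)
}
```

In this model, every round that reuses a state_uuid yields one conflict per shard document, which is why the failure counter grows with the number of shards even though nothing is actually wrong.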

However, we could/should go one step further and detect whether the cluster state has changed between two collection rounds. I'm naively thinking about "simply" comparing the old and new state_uuid, but it might be more involved than that. Either way, if nothing has changed, there's no point in even rebuilding those documents and sending them again: we know they'll bounce, generate a version conflict, and increase the indexing failure counter for no reason. On top of that, it wastes network bandwidth and CPU/RAM resources on the ES side. For big clusters with many thousands of shards, that can make a big difference.
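The state_uuid comparison could look roughly like the Go sketch below. It is only an illustration of the idea under stated assumptions, not the metricset's actual code: `stateTracker`, `collectShardDocs`, and the string-based shard and document representations are hypothetical names and simplifications.

```go
package main

import "fmt"

// stateTracker remembers the state_uuid seen in the previous collection round.
type stateTracker struct {
	lastStateUUID string
}

// Changed reports whether stateUUID differs from the one recorded in the
// previous round, and records it for the next comparison. An empty UUID is
// treated as unknown and therefore always reported as changed.
func (t *stateTracker) Changed(stateUUID string) bool {
	if stateUUID == "" {
		return true
	}
	if stateUUID == t.lastStateUUID {
		return false
	}
	t.lastStateUUID = stateUUID
	return true
}

// collectShardDocs builds one document per shard, or returns nil without
// building anything when the cluster state is unchanged since the last round.
func collectShardDocs(t *stateTracker, stateUUID string, shards []string) []string {
	if !t.Changed(stateUUID) {
		return nil // same cluster state: these documents would only conflict
	}
	docs := make([]string, 0, len(shards))
	for _, shard := range shards {
		docs = append(docs, fmt.Sprintf("%s:%s", stateUUID, shard))
	}
	return docs
}

func main() {
	tracker := &stateTracker{}
	shards := []string{"logs-000001/0/p", "logs-000001/0/r"}
	// The second round reuses the same state_uuid and is skipped entirely.
	for round, stateUUID := range []string{"uuid-A", "uuid-A", "uuid-B"} {
		docs := collectShardDocs(tracker, stateUUID, shards)
		fmt.Printf("round %d: state=%s published=%d\n", round+1, stateUUID, len(docs))
	}
}
```

The tracker keeps only the last state_uuid, so the memory cost is constant regardless of cluster size. A real implementation would likely also need to decide what happens after a Metricbeat restart, when the previous state is unknown.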

Related issue: #36547 (comment)


pickypg · 2024-07-26T16:48:39Z

The UI may need to be updated to understand the lack of a changing timestamp, but comparing the state_uuid should be all that's needed for that suggestion.

consulthys · 2024-12-13T08:38:12Z

> The UI may need to be updated to understand the lack of a changing timestamp, but comparing the state_uuid should be all that's needed for that suggestion.

@pickypg Queries on shard data don't have any time range; the state_uuid is used as an implied time range instead. You can find more about this in elastic/kibana#189728

botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Apr 18, 2024

consulthys mentioned this issue Apr 20, 2024

Improve information about _stats index_failures elastic/elasticsearch#80802

Open

cmacknz added the Team:Monitoring (Stack Monitoring team) and Team:Infra Monitoring UI (Infrastructure Monitoring UI team) labels Apr 23, 2024

botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Apr 23, 2024

consulthys mentioned this issue Jul 26, 2024

elasticsearch-xpack shard metricset not sending metrics #26314

Closed

consulthys linked a pull request Sep 10, 2024 that will close this issue

[Metricbeat] Compare previous/current cluster state in the elasticsearch.shard metricset #40731

Open

3 tasks

consulthys mentioned this issue Oct 3, 2024

Investigate timeout issue and use of time range in stack monitoring queries elastic/kibana#189728

Open

consulthys mentioned this issue Dec 13, 2024

Provide more insights into indexing failures elastic/elasticsearch#107601

Open
