stats: export vttablet_tablet_type to prometheus #14303

Closed

Conversation

maxenglander
Collaborator

Description

This PR is very similar in spirit to #12772.

We already have a TabletType string that is exported to Golang expvars. E.g. after ./examples/local/101_initial_cluster.sh:

bash-5.2$ curl -s localhost:15100/debug/vars | jq -r '.TabletType'
primary

However, this is not exported to Prometheus. It's a pretty valuable thing to know for monitoring purposes, e.g. to detect and alert on conditions like "primary tablet is not serving".

This PR exposes vttablet_tablet_type via /metrics, without changing the shape of the data currently exported via /debug/vars.

E.g., after ./examples/local/101_initial_cluster.sh:

bash-5.2$ curl -s localhost:15100/metrics | grep tablet_type{
vttablet_tablet_type{tablet_type="primary"} 1
bash-5.2$ curl -s localhost:15101/metrics | grep tablet_type{
vttablet_tablet_type{tablet_type="replica"} 1
bash-5.2$ curl -s localhost:15102/metrics | grep tablet_type{
vttablet_tablet_type{tablet_type="rdonly"} 1
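For monitoring scripts that scrape /metrics directly rather than going through Prometheus, the tablet type can be read back out of the output shown above. The following is a minimal sketch using only the Go standard library; the function name currentTabletType and the sample input are illustrative assumptions and are not part of this PR or of Vitess.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// tabletTypeLine matches a sample in the shape shown in this PR, e.g.
//   vttablet_tablet_type{tablet_type="primary"} 1
var tabletTypeLine = regexp.MustCompile(`^vttablet_tablet_type\{tablet_type="([^"]+)"\} 1$`)

// currentTabletType (hypothetical helper) scans Prometheus text-format
// output and returns the tablet_type label of the first
// vttablet_tablet_type sample whose value is 1.
func currentTabletType(metrics string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if m := tabletTypeLine.FindStringSubmatch(line); m != nil {
			return m[1], true
		}
	}
	return "", false
}

func main() {
	// Assumed sample body; a real caller would fetch it from /metrics.
	body := "# unrelated comment line\nvttablet_tablet_type{tablet_type=\"primary\"} 1\n"
	if typ, ok := currentTabletType(body); ok {
		fmt.Println(typ)
	}
}
```

A caller could combine the returned type with a serving-state check to flag a condition such as "primary tablet is not serving".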

Notes

tablet_type label

A note about the label name, tablet_type. An existing metric, vttablet_tablet_type_count, uses the label type, while a more recent set of metrics added by @rafer in #13521 uses the label tablet_type. I had to pick one of these, so I went with the more recent one, which I also happen to prefer.

The "right way"

The "right way" to export Prometheus metrics is to define the time series at process start up, and then only change their value.

That means that this metric in its current form is using Prometheus the "wrong way" - a ChangeTabletType or reparent will change the value of the tablet_type label, which from Prometheus' point of view means that one time series has disappeared, and a new time series has suddenly appeared.

The current way should be somewhat usable, but will prevent Prometheus users from using the metric to its fullest.

Perhaps a better approach would be to define a new metric like vttablet_tablet_types, which is a map of tablet types to either 0 or 1. These time series would be registered at VTTablet startup, and then change during ChangeTabletType and reparents. I opted not to do this because it seemed like a bigger change, and would also emit a lot more metrics. But happy to change it if people prefer.
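The alternative described above could look roughly like the following. This is a sketch only, written against the Go standard library rather than the Vitess stats package or a Prometheus client; the type tabletTypes, its methods, and the tablet_type label on vttablet_tablet_types are assumptions made for illustration. Every known tablet type gets a series at startup, and a type change only flips values between 0 and 1.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// tabletTypes is a hypothetical one-hot gauge set: all series exist from
// startup, and exactly one of them holds 1 once set has been called.
type tabletTypes struct {
	mu     sync.Mutex
	values map[string]int
}

// newTabletTypes registers every known tablet type with value 0.
func newTabletTypes(known []string) *tabletTypes {
	g := &tabletTypes{values: make(map[string]int, len(known))}
	for _, k := range known {
		g.values[k] = 0
	}
	return g
}

// set marks current as 1 and every other type as 0. Unknown types are
// rejected so that no new series can appear after startup.
func (g *tabletTypes) set(current string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, ok := g.values[current]; !ok {
		return false
	}
	for k := range g.values {
		g.values[k] = 0
	}
	g.values[current] = 1
	return true
}

// render prints the series in Prometheus text format, sorted by label.
func (g *tabletTypes) render() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	names := make([]string, 0, len(g.values))
	for k := range g.values {
		names = append(names, k)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, k := range names {
		fmt.Fprintf(&b, "vttablet_tablet_types{tablet_type=%q} %d\n", k, g.values[k])
	}
	return b.String()
}

func main() {
	g := newTabletTypes([]string{"primary", "replica", "rdonly"})
	g.set("replica")
	fmt.Print(g.render())
	g.set("primary") // e.g. after a reparent: same series, new values
	fmt.Print(g.render())
}
```

The trade-off matches the one noted above: the set of time series stays stable across ChangeTabletType and reparents, at the cost of one series per tablet type on every tablet.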

Related Issue(s)

Fixes #14300

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Contributor

vitess-bot bot commented Oct 18, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Oct 18, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Oct 18, 2023
@maxenglander maxenglander marked this pull request as ready for review October 18, 2023 13:53
@maxenglander
Collaborator Author

I don't see a page in docs where vttablet metrics are documented.

Contributor

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale label Nov 18, 2023
Contributor

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this Nov 26, 2023
Labels
NeedsDescriptionUpdate, NeedsIssue, NeedsWebsiteDocsUpdate, Stale
Development

Successfully merging this pull request may close these issues.

Feature Request: emit tablet type metric from vttablet to Prometheus
1 participant