Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade prometheus #794

Merged
merged 8 commits into from
May 21, 2024
Merged

Upgrade prometheus #794

merged 8 commits into from
May 21, 2024

Conversation

thampiotr
Copy link
Contributor

@thampiotr thampiotr commented May 8, 2024

PR Description

This upgrades the Prometheus dependency. Some required, notable changes:

  • scrape_protocols was added to prometheus.scrape
  • enable_protobuf_negotiation will remain working the same as previous, but it is now deprecated and scrape_protocols should be used instead.
  • Loki had to be updated since the version we used depended on incompatible Prometheus.
  • Had to fork github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver as it depended on an incompatible Prometheus version.
  • Had to fork go.opentelemetry.io/collector/featuregate as we had to work around the problem where Prometheus and OTEL try to register duplicate OTEL feature flags and result in panics.
  • Had to change the way discoverers are created to make sure they correctly register/unregister SD metrics and refresh metrics and pass them over to Prometheus library.
  • Some package renames required changes, for example "github.com/prometheus/prometheus/model/textparse" -> "github.com/prometheus/common/model"
  • Some of the import changes were needed to add "/v3/" to Prometheus import paths. There are some more changes and shuffles due to merge conflicts.
  • Added support for AppendCTZeroSample method in appenders where needed. It is not exposed in any config right now to keep this PR smaller in scope.

Which issue(s) this PR fixes

Fixes #227

Notes to reviewer

Tested this with adding / removing discoverers at runtime and checking that their metrics are unregistered and no panics happen. For example one test scenario:

discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.file "local" {
  files = ["./file_discovery.yml"]
}

prometheus.scrape "default" {
  targets    = concat(discovery.docker.containers.targets, discovery.file.local.targets)
  forward_to = [prometheus.remote_write.cloud.receiver]
}

We can see these metrics:

➜  discoverers git:(main) ✗ curl -s localhost:12345/metrics | grep disco
prometheus_sd_file_read_errors_total{component_id="discovery.file.local",component_path="/"} 0
prometheus_sd_file_scan_duration_seconds{component_id="discovery.file.local",component_path="/",quantile="0.5"} 5.9375e-05
prometheus_sd_file_scan_duration_seconds{component_id="discovery.file.local",component_path="/",quantile="0.9"} 0.000206667
prometheus_sd_file_scan_duration_seconds{component_id="discovery.file.local",component_path="/",quantile="0.99"} 0.000206667
prometheus_sd_file_scan_duration_seconds_sum{component_id="discovery.file.local",component_path="/"} 0.000266042
prometheus_sd_file_scan_duration_seconds_count{component_id="discovery.file.local",component_path="/"} 2
prometheus_sd_file_watcher_errors_total{component_id="discovery.file.local",component_path="/"} 0
prometheus_sd_refresh_duration_seconds{component_id="discovery.docker.containers",component_path="/",mechanism="docker",quantile="0.5"} 0.085011917
prometheus_sd_refresh_duration_seconds{component_id="discovery.docker.containers",component_path="/",mechanism="docker",quantile="0.9"} 0.244720042
prometheus_sd_refresh_duration_seconds{component_id="discovery.docker.containers",component_path="/",mechanism="docker",quantile="0.99"} 0.244720042
prometheus_sd_refresh_duration_seconds_sum{component_id="discovery.docker.containers",component_path="/",mechanism="docker"} 0.7002042099999999
prometheus_sd_refresh_duration_seconds_count{component_id="discovery.docker.containers",component_path="/",mechanism="docker"} 7
prometheus_sd_refresh_failures_total{component_id="discovery.docker.containers",component_path="/",mechanism="docker"} 0

After renaming the file discovery component and reloading the config we can see:

➜  discoverers git:(main) ✗ curl -s localhost:12345/metrics | grep disco
prometheus_sd_file_read_errors_total{component_id="discovery.file.local_renamed",component_path="/"} 0
prometheus_sd_file_scan_duration_seconds{component_id="discovery.file.local_renamed",component_path="/",quantile="0.5"} 0.00023325
prometheus_sd_file_scan_duration_seconds{component_id="discovery.file.local_renamed",component_path="/",quantile="0.9"} 0.00023325
prometheus_sd_file_scan_duration_seconds{component_id="discovery.file.local_renamed",component_path="/",quantile="0.99"} 0.00023325
prometheus_sd_file_scan_duration_seconds_sum{component_id="discovery.file.local_renamed",component_path="/"} 0.00023325
prometheus_sd_file_scan_duration_seconds_count{component_id="discovery.file.local_renamed",component_path="/"} 1
prometheus_sd_file_watcher_errors_total{component_id="discovery.file.local_renamed",component_path="/"} 0
prometheus_sd_refresh_duration_seconds{component_id="discovery.docker.containers",component_path="/",mechanism="docker",quantile="0.5"} 0.085011917
prometheus_sd_refresh_duration_seconds{component_id="discovery.docker.containers",component_path="/",mechanism="docker",quantile="0.9"} 0.244720042
prometheus_sd_refresh_duration_seconds{component_id="discovery.docker.containers",component_path="/",mechanism="docker",quantile="0.99"} 0.244720042
prometheus_sd_refresh_duration_seconds_sum{component_id="discovery.docker.containers",component_path="/",mechanism="docker"} 0.7002042099999999
prometheus_sd_refresh_duration_seconds_count{component_id="discovery.docker.containers",component_path="/",mechanism="docker"} 7
prometheus_sd_refresh_failures_total{component_id="discovery.docker.containers",component_path="/",mechanism="docker"} 0

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@thampiotr thampiotr force-pushed the thampiotr/upgrade-prometheus branch 8 times, most recently from b8609c0 to 792ef4b Compare May 16, 2024 14:38
@thampiotr thampiotr changed the title upgrade prometheus Upgrade prometheus May 16, 2024
@thampiotr thampiotr marked this pull request as ready for review May 16, 2024 14:50
@thampiotr thampiotr requested a review from ptodev May 16, 2024 15:00
Copy link
Contributor

@ptodev ptodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I added a few minor comments. As long as Matt's comments are also addressed, I'd be happy to approve it.

@@ -222,6 +223,23 @@ func (a *appender) AppendHistogram(ref storage.SeriesRef, l labels.Labels, t int
return ref, multiErr
}

func (a *appender) AppendCTZeroSample(ref storage.SeriesRef, l labels.Labels, t, ct int64) (storage.SeriesRef, error) {
if a.start.IsZero() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this from some existing code? Looking at the prometheus code I don't see similar logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a new appender method AppendCTZeroSample and I'm implementing it here in fanout to match the way other appender methods are implemented. This is not copied from anywhere.

Copy link
Collaborator

@mattdurham mattdurham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, maybe a question on how ctzappender is used but I am not up to date on what it really does.

@thampiotr thampiotr force-pushed the thampiotr/upgrade-prometheus branch from a875a23 to 25d17b7 Compare May 21, 2024 12:59
@thampiotr
Copy link
Contributor Author

Looks good, maybe a question on how ctzappender is used but I am not up to date on what it really does.

I believe you cannot even use it in Alloy right now, you'd need to enabled it via https://github.com/prometheus/prometheus/blob/07355c9199e0913a76e1a992816e63b0683d16e7/cmd/prometheus/main.go#L224 - which we are not yet exposing. It's for the timestamp of the creation of the series, which may or may not become a popular feature: https://github.com/prometheus/proposals/blob/main/proposals/2023-06-13_created-timestamp.md

@thampiotr thampiotr merged commit be34410 into main May 21, 2024
18 checks passed
@thampiotr thampiotr deleted the thampiotr/upgrade-prometheus branch May 21, 2024 16:20
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade Alloy to Prometheus 2.50
3 participants