mlabns: stackdriver exporter crashing in all projects #896

Closed
stephen-soltesz opened this issue Apr 11, 2022 · 6 comments

@stephen-soltesz
Contributor

mlabns-stackdriver-69d7dcb79b-8bcff       0/1     CrashLoopBackOff   3860       15d

Something related to "duplicate label names":

$ kubectl logs mlabns-stackdriver-69d7dcb79b-vszrp  
level=info ts=2022-04-11T21:39:39.800Z caller=stackdriver_exporter.go:164 msg="Starting stackdriver_exporter" version="(version=0.11.0, branch=HEAD, revision=aafa2f0e851f2e07772a388dbf4590484513a19f)"
level=info ts=2022-04-11T21:39:39.801Z caller=stackdriver_exporter.go:165 msg="Build context" build_context="(go=go1.15.1, user=root@67413d69d3db, date=20200902-14:39:30)"
level=info ts=2022-04-11T21:39:39.801Z caller=stackdriver_exporter.go:166 msg="Using Google Cloud Project ID" projectID=mlab-ns
level=info ts=2022-04-11T21:39:39.801Z caller=stackdriver_exporter.go:188 msg="Listening on" address=:9255
panic: duplicate label names

goroutine 268 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
	/app/vendor/github.com/prometheus/client_golang/prometheus/value.go:107
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc00058da08, 0xc00062c580, 0x36, 0x387988e8, 0xed9e69620, 0x0, 0xc0006de800, 0x9, 0x10, 0x2, ...)
	/app/collectors/monitoring_metrics.go:139 +0x20b
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc00058da08)
	/app/collectors/monitoring_metrics.go:180 +0x1f2
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc00058da08)
	/app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc00021a540, 0xc000626090, 0xc0001c6800, 0xc00007eea0, 0xc000626090, 0x0)
	/app/collectors/monitoring_collector.go:404 +0x1465
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc00055c3d0, 0xc00021a540, 0x38798961, 0xed9e695e4, 0x0, 0x38798961, 0xed9e69620, 0x0, 0xc0006ab620, 0xc0001c6800, ...)
	/app/collectors/monitoring_collector.go:257 +0x6b6
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
	/app/collectors/monitoring_collector.go:231 +0x3f5
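
For context on the panic itself: prometheus.MustNewConstMetric refuses a descriptor that carries the same label name more than once, which is what happens when a user-defined metric label collides with a label the exporter already attaches (e.g. a GCP resource label). A minimal sketch of the failure mode, using made-up metric and label names (the exact panic text can vary by client_golang version):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// Hypothetical descriptor whose variable labels contain the same name
	// twice, e.g. a user-added "project_id" colliding with the exporter's
	// own "project_id". NewDesc records the error on the descriptor.
	desc := prometheus.NewDesc(
		"stackdriver_example_metric",
		"Example metric with a duplicated label name.",
		[]string{"project_id", "project_id"},
		nil,
	)

	// MustNewConstMetric panics with that recorded error: "duplicate label names".
	prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1, "mlab-ns", "mlab-ns")
}
```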

@nkinkade
Contributor

I believe this is due to prometheus-community/stackdriver_exporter#85

The resolution is apparently to figure out which user-created metrics are manually adding labels that duplicate the labels GCP already adds by default. In some projects there are a good number of metrics, so this process will likely be tedious.
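
One way to make that audit less tedious is to compare each custom metric's user-set label keys against the labels GCP attaches to its monitored resource and report the overlap. A small sketch (the helper and the example label sets are hypothetical, not part of the exporter):

```go
package main

import "fmt"

// duplicateLabels is a hypothetical helper: given the labels a user set on a
// custom metric and the labels GCP attaches to the monitored resource, it
// returns the names present in both, i.e. the collisions that trigger the panic.
func duplicateLabels(metricLabels, resourceLabels map[string]string) []string {
	var dups []string
	for name := range metricLabels {
		if _, ok := resourceLabels[name]; ok {
			dups = append(dups, name)
		}
	}
	return dups
}

func main() {
	// Example values, made up for illustration.
	metric := map[string]string{"project_id": "mlab-ns", "experiment": "ndt"}
	resource := map[string]string{"project_id": "mlab-ns", "zone": "us-central1-a"}
	fmt.Println(duplicateLabels(metric, resource)) // [project_id]
}
```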

@nkinkade
Contributor

nkinkade commented Apr 20, 2022

Interestingly, it looks like someone posted a PR to fix this issue in the stackdriver_exporter repo just a few hours ago:

prometheus-community/stackdriver_exporter#85 (comment)

@nkinkade
Contributor

The PR that apparently resolves this issue is now merged. As it happens, the stackdriver_exporter is one of the last container images I have on my list to upgrade as part of the general push to make sure our container images are relatively up to date. I will see if I can get this change into our version, assuming they are going to make a release containing this fix.
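
I haven't dug into the exact mechanics of the merged PR, but the general shape of a fix like this is to skip a label name that has already been added, rather than letting the duplicate reach the descriptor and panic. A rough sketch of that idea (not the upstream code):

```go
package main

import "fmt"

// dedupeLabels keeps the first occurrence of each label name and drops later
// duplicates, returning key/value slices that are safe to hand to
// prometheus.NewDesc / MustNewConstMetric. This is an illustration only, not
// the upstream implementation.
func dedupeLabels(keys, values []string) ([]string, []string) {
	seen := make(map[string]bool, len(keys))
	var outKeys, outValues []string
	for i, k := range keys {
		if seen[k] {
			continue
		}
		seen[k] = true
		outKeys = append(outKeys, k)
		outValues = append(outValues, values[i])
	}
	return outKeys, outValues
}

func main() {
	k, v := dedupeLabels(
		[]string{"project_id", "project_id", "zone"},
		[]string{"mlab-ns", "mlab-ns", "us-central1-a"},
	)
	fmt.Println(k, v) // [project_id zone] [mlab-ns us-central1-a]
}
```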

@nkinkade nkinkade self-assigned this Apr 27, 2022
@nkinkade
Contributor

@stephen-soltesz: From what I can see, a fix for this issue was merged into the main branch of stackdriver_exporter. However, a release containing this fix has not yet been published. I imagine we have a couple of options here:

  • keep waiting for a new release, checking back every so often
  • build and host an updated image ourselves

Perhaps we could build the image ourselves for now, and add a note to the stackdriver_exporter k8s manifest explaining the situation and that the image path should be reverted to the official location once a new release has been made. Do you have any opinion?

@stephen-soltesz
Contributor Author

stephen-soltesz commented Aug 22, 2022

Building the image ourselves and adding the note as you suggest sounds good to me. (I'm imagining this taking a few hours).

nkinkade added a commit that referenced this issue Aug 22, 2022
#896

This commit updates the stackdriver_exporter image path to a temporary
location in the measurementlab Docker hub org. Once a new version of
stackdriver_exporter is released, we should revert the image path to the
official one.
nkinkade added a commit that referenced this issue Aug 22, 2022
#896

This commit updates the mlabns-stackdriver_exporter image path to a
temporary location in the measurementlab Docker hub org. Once a new
version of stackdriver_exporter is released, we should revert the image
path to the official one.
nkinkade added a commit that referenced this issue Aug 22, 2022
* Uses temp custom image for stackdriver_exporter

#896

This commit updates the stackdriver_exporter image path to a temporary
location in the measurementlab Docker hub org. Once a new version of
stackdriver_exporter is released, we should revert the image path to the
official one.

* Uses temp image location for mlabns-stackdriver_exporter

#896

This commit updates the mlabns-stackdriver_exporter image path to a
temporary location in the measurementlab Docker hub org. Once a new
version of stackdriver_exporter is released, we should revert the image
path to the official one.
@nkinkade
Contributor

Done! And merged: #944

I'm going to close this issue for now. While this fix hasn't yet been deployed to production, it seems to be WAI in sandbox and staging. If it remains a problem we can reopen the issue.
