Skip to content

2.12.0-rc.1

Pre-release
Pre-release
Compare
Choose a tag to compare
@duricanikolic duricanikolic released this 20 Mar 12:21
· 1467 commits to main since this release
mimir-2.12.0-rc.1
6c30057

This release contains 525 PRs from 60 authors, including new contributors Benoit Schipper, Derek Cadzow, Edwin, Itay Kalfon, Ivan Farré Vicente, Jan O. Rundshagen, Jorge Turrado Ferrero, Lukas Monkevicius, Mickaël Canévet, Rafael Sathler, Rajakavitha Kodhandapani, Tim Kotowski, Vladimir Varankin, Zach, Zach Day, Zirko, blut, github-actions[bot], ncharaf, zhehao-grafana. Thank you!

Grafana Mimir version 2.12.0-rc.1 release notes

Grafana Labs is excited to announce version 2.12 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

  • Added support to only count series that are considered active through the Cardinality API endpoint /api/v1/cardinality/label_names by passing the count_method parameter.
    If set to active it counts only series that are considered active according to the -ingester.active-series-metrics-idle-timeout flag setting rather than counting all in-memory series.

  • The "Store-gateway: bucket tenant blocks" admin page contains a new column "No Compact".
    If block no compaction marker is set, it specifies the reason and the date the marker is added.

  • The estimated number of compaction jobs based on the current bucket-index is now computed by the compactor.
    The result is tracked by the new cortex_bucket_index_compaction_jobs metric.
    If this computation fails, the cortex_bucket_index_compaction_jobs_errors_total metric is updated instead.
    The estimated number of compaction jobs is also shown in Top tenants, Tenants, and Compactor dashboards.

  • Added mimir-distroless container image built upon a distroless image (gcr.io/distroless/static-debian12).
    This improvement minimizes attack surfaces and potential CVEs by trimming down the dependencies within the image.
    After comprehensive testing, the Mimir maintainers plan to shift from the current image to the distroless version.

Additionally, the following previously experimental features are now considered stable:

  • The number of pre-allocated workers used to forward push requests to the ingesters, configurable via the -distributor.reusable-ingester-push-workers CLI flag on distributors.
    It now defaults to 2000.
    Note that this is a performance optimization, and not a limiting feature.
    If not enough workers available, new goroutines will be spawned.

  • The number of gRPC server workers used to serve the requests, configurable via the -server.grpc.num-workers CLI flag.
    It now defaults to 100.
    Note that this is the number of pre-allocated long-lived workers, and not a limiting feature.
    If not enough workers are available, new goroutines will be spawned.

  • The maximum number of concurrent index header loads across all tenants, configurable via the -blocks-storage.bucket-store.index-header.lazy-loading-concurrency CLI flag on store-gateways.
    It defaults to 4.

  • The maximum time to wait for the query-frontend to become ready before rejecting requests, configurable via the -query-frontend.not-running-timeout CLI flag on query-frontends.
    It now defaults to 2s.

  • The CLI flag that allows queriers to reduce pressure on ingesters by initially querying only the minimum set of ingesters required to reach quorum, -querier.minimize-ingester-requests.
    It is now enabled by default.

  • Spread-minimizing token-related CLI flags: -ingester.ring.token-generation-strategy, -ingester.ring.spread-minimizing-zones and -ingester.ring.spread-minimizing-join-ring-in-order.
    You can read more about this feature in our blog post.

Important changes

In Grafana Mimir 2.12 the following behavior has changed:

  • Store-gateway now persists a sparse version of the index-header to disk on construction and loads sparse index-headers from disk instead of the whole index-header.
    This improves the speed at which index headers are lazy-loaded from disk by up to 90%. The added disk usage is in the order of 1-2%.

  • Alertmanager deprecated the v1 API. All v1 API endpoints now respond with a JSON deprecation notice and a status code of 410.
    All endpoints have a v2 equivalent.
    The list of endpoints is:

    • <alertmanager-web.external-url>/api/v1/alerts
    • <alertmanager-web.external-url>/api/v1/receivers
    • <alertmanager-web.external-url>/api/v1/silence/{id}
    • <alertmanager-web.external-url>/api/v1/silences
    • <alertmanager-web.external-url>/api/v1/status
  • Exemplar's label traceID has been changed to trace_id to be consistent with the OpenTelemetry standard.

  • Errors returned by ingesters now contain only gRPC status codes.
    Previously they contained both gRPC and HTTP status codes.
    To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
    Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.

  • Responses with gRPC status codes are now reported as status_code labels in the cortex_request_duration_seconds and cortex_ingester_client_request_duration_seconds metrics.

  • Responses with HTTP 4xx status codes are now treated as errors and used in status_code label of request duration metric.

The default value of the following CLI flags have been changed:

  • -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
  • -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes from 10MB to 100MB.
  • -blocks-storage.bucket-store.tenant-sync-concurrency from 10 to 1.
  • -query-frontend.max-cache-freshness from 1m to 10m.
  • -distributor.write-requests-buffer-pooling-enabled from false to true.
  • -locks-storage.bucket-store.block-sync-concurrency from 20 to 4.
  • -memberlist.stream-timeout from 10s to 2s.
  • -server.report-grpc-codes-in-instrumentation-label-enabled from false to true.

The following deprecated configuration options are removed in Grafana Mimir 2.12:

  • The YAML setting frontend.cache_unaligned_requests.
  • Experimental CLI flag -querier.prefer-streaming-chunks-from-ingesters.

The following configuration options are deprecated and will be removed in Grafana Mimir 2.14:

  • The CLI flag -ingester.limit-inflight-requests-using-grpc-method-limiter.
    It now defaults to true.

  • The CLI flag -ingester.return-only-grpc-errors.
    It now defaults to true.
    To guarantee backwards compatibility when migrating from a version prior to 2.11, it's necessary to first migrate to version 2.11, and then to version 2.12.
    Otherwise, it might happen that during the migration, some ingester errors with HTTP status code 4xx won't be recognized, and the corresponding request will be repeated.

  • The CLI flag -ingester.client.report-grpc-codes-in-instrumentation-label-enabled.
    It now defaults to true.

  • The CLI flag -distributor.limit-inflight-requests-using-grpc-method-limiter.
    It now defaults to true.

  • The CLI flag -distributor.enable-otlp-metadata-storage.
    It now defaults to true.

  • The CLI flag -querier.max-query-into-future.

The following metrics are removed or deprecated:

  • cortex_bucket_store_blocks_loaded_by_duration has been removed.
  • cortex_distributor_sample_delay_seconds has been deprecated and will be removed in Mimir 2.14.

Experimental features

Grafana Mimir 2.12 includes new features that are considered experimental and disabled by default.
Use them with caution and report any issues you encounter:

  • The maximum number of tenant IDs that may be for a federated query can be configured via the -tenant-federation.max-tenants CLI flag on query-frontends.
    By default, it's 0, meaning that the limit is disabled.

  • Sharding of active series queries can be enabled via the -query-frontend.shard-active-series-queries CLI flag on query-frontends.

  • Timely head compaction can be enabled via the -blocks-storage.tsdb.timely-head-compaction-enabled on ingesters.
    If enabled, the head compaction happens when the min block range can no longer be appended, without requiring 1.5x the chunk range worth of data in the head.

  • Streaming of responses from querier to query-frontend can be enabled via the -querier.response-streaming-enabled CLI flag on queriers.
    This is currently supported only for responses from the /api/v1/cardinality/active_series endpoint.

  • The maximum response size for active series queries, in bytes, can be set via the -querier.active-series-results-max-size-bytes CLI flag on queriers.

  • Metric relabeling on a per-tenant basis can be forcefully disabled via the -distributor.metric-relabeling-enabled CLI flag on rulers.
    Metrics relabeling is enabled by default.

  • Query Queue Load Balancing by Query Component. Tenant query queues in the query-scheduler can now be split into subqueues by which query component is expected to be utilized to complete the query: ingesters, store-gateways, both, or uncategorized.
    Dequeuing queries for a given tenant will rotate through the query component subqueues via simple round-robin.
    In the event that the one of the query components (ingesters or store-gateways) experience a slowdown, queries only utilizing the the other query component can continue to be serviced.
    This feature is recommended to be enabled.
    The following CLI flags must be set to true in order to be in effect:

    • -query-frontend.additional-query-queue-dimensions-enabled on the query-frontend.
    • -query-scheduler.additional-query-queue-dimensions-enabled on the query-scheduler.
  • Owned series tracking in ingesters can be enabled via the -ingester.track-ingester-owned-series CLI flag.
    When enabled, ingesters will track the number of in-memory series that still map to the ingester based on the ring state.
    These counts are more reactive to ring and shard changes than in-memory series, and can be used when enforcing tenant series limits by enabling the -ingester.use-ingester-owned-series-for-limits CLI flag.
    This feature requires zone-aware replication to be enabled, and the replication factor to be equal to the number of zones.

Bug fixes

  • Distributor: fixed an issue where -distributor.metric-relabeling-enabled could cause distributors to panic.
  • Distributor: fix an issue where -distributor.metric-relabeling-enabled could cause distributors to write unsorted labels and corrupt blocks.
  • Ingester: errors encountered while iterating through chunks or samples in response to a query request aren't ignored anymore.
  • Compactor: out-of-order blocks aren't allowed to prevent timely compaction anymore.
  • Querier: requests to store-gateway when a query gets canceled aren't retried anymore.
  • Querier: status code 499 is now returned instead of 500 when a request to remote read endpoint gets canceled.
  • Querier: fixed an issue where -querier.max-fetched-series-per-query wasn't applied to /series endpoint in case series loaded from ingesters.
  • Querier: fixed an issue with the remote-read requests HTTP status code translations.
    Previously, remote-read had conflicting behaviours: when returning samples all internal errors were translated to HTTP 400, while when returning chunks all internal errors were translated to HTTP 500.
    With this fix, all validation errors will be translated into HTTP 400 errors, while all other errors will be translated into HTTP 500 errors.
  • Query-frontend: the cortex_query_frontend_queries_total metric incorrectly reported op="query" for any request which wasn't a range query.
    Now the op label value can be one of the following:
    • query: instant query
    • query_range: range query
    • cardinality: cardinality query
    • label_names_and_values: label names / values query
    • active_series: active series query
    • other: any other request
  • Ruler: fixed an issue where "failed to remotely evaluate query expression, will retry" messages were logged without context such as the trace ID and didn't appear in trace events.
  • Ruler: requests to remote querier when server's response exceeds its configured max payload size aren't retried anymore.
  • Ruler: fixed a regression that caused client errors to be tracked in cortex_ruler_write_requests_failed_total metric.
  • Ruler: fixed an issue with recording rule result being corruption due to an usage of a bad native histogram pointer.

Helm chart improvements

The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.

Changelog

2.12.0-rc.1

Grafana Mimir

  • [BUGFIX] Query-frontend: Fix memory leak on every request. #7654

All changes in this release: mimir-2.12.0-rc.0...mimir-2.12.0-rc.1