Releases: grafana/mimir
Mimir 2.2.0-rc.0
2.2.0-rc.0
This release contains 214 contributions from 32 authors. Thank you!
Grafana Labs is excited to announce version 2.2 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Highlights include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.1, there is migration-related information as well.
For the complete list of changes, see the Changelog.
Features and enhancements
- Support for ingesting out-of-order samples: Grafana Mimir includes new, experimental support for ingesting out-of-order samples. This support is configurable: you can set, on a per-tenant basis, how far out of order a sample can be and still be accepted. Note that this feature still needs heavy testing and is not yet production-ready.
- Error messages: The error messages that Mimir reports are more human-readable, and the messages include error codes that are easy to search for.
- Configurable prefix for object storage: Mimir can now store block data, rules, and alerts in one bucket, each under its own user-defined prefix, rather than requiring one bucket for each. You can configure the storage prefix by using the -<storage>.storage-prefix option for the corresponding storage: ruler-storage, alertmanager-storage, or blocks-storage.
- Helm Chart update: TBD
- Store-gateway can now optionally prepopulate the file system cache when memory-mapping index-header files. This can help the store-gateway avoid appearing stuck while loading index-headers. You can enable this feature with the new experimental flag -blocks-storage.bucket-store.index-header.map-populate-enabled.
- Faster ingester startup: Ingesters now replay the write-ahead log (WAL) about 50% faster, and they also rejoin the ring sooner under some conditions.
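As a rough illustration of the out-of-order time window described above, the following sketch classifies an incoming sample. It is a simplified model, not Mimir's implementation: the function name, the returned labels other than sample-too-old, and the assumption that the window is measured back from the newest timestamp already accepted for the tenant are all hypothetical.

```python
from typing import Optional


def classify_sample(
    ts_ms: int,
    series_max_ts_ms: Optional[int],
    newest_ts_ms: int,
    window_ms: int,
) -> str:
    """Classify one incoming sample (all timestamps in milliseconds).

    series_max_ts_ms: newest timestamp already stored for this series, or None.
    newest_ts_ms:     newest timestamp accepted for the tenant (assumed reference point).
    window_ms:        configured out-of-order window; 0 means the feature is off.
    """
    # A sample at or after the series' newest timestamp is treated as in order.
    # (Duplicate-timestamp handling is deliberately left out of this sketch.)
    if series_max_ts_ms is None or ts_ms >= series_max_ts_ms:
        return "in-order"
    # Out of order, but still inside the allowed window: accept it.
    if window_ms > 0 and ts_ms >= newest_ts_ms - window_ms:
        return "out-of-order-accepted"
    # Out of order and beyond the window: this is the "sample-too-old" case.
    if window_ms > 0:
        return "sample-too-old"
    # Window disabled: out-of-order samples are rejected (hypothetical label).
    return "rejected-out-of-order"


TEN_MINUTES_MS = 10 * 60 * 1000
newest = 60 * 60 * 1000  # one hour, expressed in milliseconds

five_minutes_late = classify_sample(newest - 5 * 60 * 1000, newest, newest, TEN_MINUTES_MS)
twenty_minutes_late = classify_sample(newest - 20 * 60 * 1000, newest, newest, TEN_MINUTES_MS)
```

With a 10-minute window, the sample that is 5 minutes behind the newest accepted timestamp is accepted, while the one that is 20 minutes behind is classified as too old.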
Upgrade considerations
We have updated default values and some parameters in Grafana Mimir 2.2 to give you a better out-of-the-box experience:
- Message size limits for gRPC messages exchanged between internal Mimir components have increased from 4 MiB to 100 MiB. This helps to avoid internal server errors when pushing or querying large data.
- The default value of -blocks-storage.bucket-store.ignore-blocks-within changed from 0 to 10h, and the default value of -querier.query-store-after changed from 0 to 12h. Both changes improve query performance for most-recent data by querying only the ingesters, rather than object storage.
- The option -querier.shuffle-sharding-ingesters-lookback-period has been deprecated. If you previously changed this option from its default of 0s, set -querier.shuffle-sharding-ingesters-enabled to true and specify the lookback period by setting the -querier.query-ingesters-within option.
- The -memberlist.abort-if-join-fails parameter now defaults to false. When Mimir uses memberlist as a backend store for the hash ring and fails to join the memberlist cluster, Mimir no longer aborts startup by default.
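The effect of the new -querier.query-store-after default can be sketched as a small decision function. This is an illustrative model under stated assumptions, not Mimir's code: the function name is hypothetical, a value of 0 is assumed to disable the corresponding cutoff, and the 13h used for -querier.query-ingesters-within is only an example value.

```python
from datetime import datetime, timedelta, timezone


def plan_read_path(query_start, query_end, now, query_store_after, query_ingesters_within):
    """Return (use_ingesters, use_store) for a query's time range."""
    no_cutoff = timedelta(0)
    # Ingesters are consulted only if the query reaches into the recent window.
    use_ingesters = (
        query_ingesters_within == no_cutoff
        or query_end >= now - query_ingesters_within
    )
    # Object storage is consulted only if the query reaches back past the cutoff.
    use_store = (
        query_store_after == no_cutoff
        or query_start <= now - query_store_after
    )
    return use_ingesters, use_store


now = datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc)
store_after = timedelta(hours=12)       # new default named in these notes
ingesters_within = timedelta(hours=13)  # example value only (assumption)

# A query over the last hour touches only the ingesters.
last_hour = plan_read_path(now - timedelta(hours=1), now, now, store_after, ingesters_within)
# A query over the last day needs both ingesters and object storage.
last_day = plan_read_path(now - timedelta(hours=24), now, now, store_after, ingesters_within)
# With the previous default of 0, even the last-hour query also goes to storage.
last_hour_old_default = plan_read_path(
    now - timedelta(hours=1), now, now, timedelta(0), ingesters_within
)
```

Under these assumptions, the last-hour query skips object storage entirely with the new default, which is the performance improvement the notes describe.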
Bug fixes
- PR 1883: Fixed a bug that caused the query-frontend and querier to crash when they received a user query with a special regular expression label matcher.
- PR 1933: Fixed a bug in the ingester ring page, which showed incorrect status of entries in the ring.
- PR 2090: Ruler in remote rule evaluation mode now applies the timeout correctly. Previously the ruler could get stuck forever, which halted rule evaluation.
- PR 2036: Fixed panic at startup when Mimir is running in monolithic mode and query sharding is enabled.
CHANGELOG
Grafana Mimir
- [CHANGE] Increased default configuration for -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB. #1884
- [CHANGE] Default values have changed for the following settings. This improves query performance for recent data (within 12h) by only reading from ingesters: #1909 #1921
  - -blocks-storage.bucket-store.ignore-blocks-within now defaults to 10h (previously 0)
  - -querier.query-store-after now defaults to 12h (previously 0)
- [CHANGE] Alertmanager: removed support for migrating local files from Cortex 1.8 or earlier. Related to original Cortex PR cortexproject/cortex#3910. #2253
- [CHANGE] The following settings are now classified as advanced because the defaults should work for most users and tuning them requires in-depth knowledge of how the read path works: #1929
  - -querier.query-ingesters-within
  - -querier.query-store-after
- [CHANGE] Config flag category overrides can be set dynamically at runtime. #1934
- [CHANGE] Ingester: deprecated -ingester.ring.join-after. Mimir now behaves as if this setting is always set to 0s. This configuration option will be removed in Mimir 2.4.0. #1965
- [CHANGE] Blocks uploaded by ingester no longer contain the __org_id__ label. Compactor now ignores this label and will compact blocks with and without this label together. The mimirconvert tool will remove the label from blocks as an "unknown" label. #1972
- [CHANGE] Querier: deprecated -querier.shuffle-sharding-ingesters-lookback-period, instead adding -querier.shuffle-sharding-ingesters-enabled to enable or disable shuffle sharding on the read path. The value of -querier.query-ingesters-within is now used internally for shuffle sharding lookback. #2110
- [CHANGE] Memberlist: -memberlist.abort-if-join-fails now defaults to false. Previously it defaulted to true. #2168
- [CHANGE] Ruler: /api/v1/rules* and /prometheus/rules* configuration endpoints are removed. Use /prometheus/config/v1/rules*. #2182
- [CHANGE] Ingester: -ingester.exemplars-update-period has been renamed to -ingester.tsdb-config-update-period. You can use it to update multiple, per-tenant TSDB configurations. #2187
- [FEATURE] Ingester: (Experimental) Add the ability to ingest out-of-order samples up to an allowed limit. If you enable this feature, it requires additional memory and disk space. This feature also enables a write-behind log, which might lead to longer ingester-start replays. When this feature is disabled, there is no overhead on memory, disk space, or startup times. #2187
  - -ingester.out-of-order-time-window, as a duration string, allows you to set how far back in time a sample can be. The default is 0s, where s is seconds.
  - The cortex_ingester_tsdb_out_of_order_samples_appended_total metric tracks the total number of out-of-order samples ingested by the ingester.
  - cortex_discarded_samples_total has a new label reason="sample-too-old", when the -ingester.out-of-order-time-window flag is greater than zero. The label tracks the number of samples that were discarded for being too old; they were out of order, but beyond the time window allowed.
- [ENHANCEMENT] Distributor: Added limit to prevent tenants from sending an excessive number of requests: #1843
  - The following CLI flags (and their respective YAML config options) have been added:
    - -distributor.request-rate-limit
    - -distributor.request-burst-limit
  - The following metric is exposed to tell how many requests have been rejected:
    - cortex_discarded_requests_total
- [ENHANCEMENT] Store-gateway: Add the experimental ability to run requests in a dedicated OS thread pool. This feature can be configured using -store-gateway.thread-pool-size and is disabled by default. Replaces the ability to run index header operations in a dedicated thread pool. #1660 #1812
- [ENHANCEMENT] Improved error messages to make them easier to understand; each now has a unique, global identifier that you can use to look up more information in the runbooks. #1907 #1919 #1888 #1939 #1984 #2009 #2056 #2066 #2104 #2150 #2234
- [ENHANCEMENT] Memberlist KV: incoming messages are now processed on a per-key goroutine. This may reduce loss of "maintenance" packets in busy memberlist installations, but uses more CPU. The new memberlist_client_received_broadcasts_dropped_total counter tracks the number of dropped per-key messages. #1912
- [ENHANCEMENT] Blocks Storage, Alertmanager, Ruler: add support for a prefix to the bucket store (*_storage.storage_prefix). This enables using the same bucket for the three components. #1686 #1951
- [ENHANCEMENT] Upgrade Docker base images to alpine:3.16.0. #2028
- [ENHANCEMENT] Store-gateway: Add experimental configuration option for the store-gateway to attempt to pre-populate the file system cache when memory-mapping index-header files. Enabled with -blocks-storage.bucket-store.index-header.map-populate-enabled=true. Note this flag only has an effect when running on Linux. #2019 #2054
- [ENHANCEMENT] Chunk Mapper: reduce memory usage of async chunk mapper. #2043
- [ENHANCEMENT] Ingester: reduce sleep time when reading WAL. #2098
- [ENHANCEMENT] Compactor: Run sanity check on blocks storage configuration at startup. #2144
- [ENHANCEMENT] Compactor: Add HTTP API for uploading TSDB blocks. Enabled with -compactor.block-upload-enabled. #1694 #2126
- [ENHANCEMENT] Ingester: Enable querying overlapping blocks by default. #2187
- [ENHANCEMENT] Distributor: Auto-forget unhealthy distributors after ten failed ring heartbeats. #2154
- [ENHANCEMENT] Distributor: Add new metric cortex_distributor_forward_errors_total for error codes resulting from forwarding requests. #2077
- [ENHANCEMENT] /ready endpoint now returns and logs detailed services information. #2055
- [ENHANCEMENT] Memcached client: Reduce number of connections required to fetch cached keys from memcached. #1920
- [ENHANCEMENT] Improved error message returned when -querier.query-store-after validation fails. #...
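The -distributor.request-rate-limit and -distributor.request-burst-limit flags listed in the changelog above pair a steady rate with a burst allowance. One common way to combine the two is a token bucket, and the sketch below shows that general technique. Whether Mimir implements the limit exactly this way is an assumption, and the class and attribute names are hypothetical.

```python
class TokenBucket:
    """Generic token-bucket limiter: `rate_per_second` refill, `burst` capacity."""

    def __init__(self, rate_per_second: float, burst: int, now: float = 0.0):
        self.rate = rate_per_second
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now             # time of the last refill, in seconds
        self.discarded = 0          # stand-in for a "rejected requests" counter

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.discarded += 1
        return False


# Two requests per second on average, with bursts of up to three requests.
bucket = TokenBucket(rate_per_second=2.0, burst=3)
initial_burst = [bucket.allow(0.0) for _ in range(4)]
```

In this model the burst limit bounds how many requests can arrive at once, while the rate limit bounds the long-run average; the explicit `now` argument keeps the example deterministic.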
2.1.0
Grafana Labs is excited to announce version 2.1 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Below we highlight the top features, enhancements and bugfixes in this release, as well as relevant callouts for those upgrading from Grafana Mimir 2.0. The complete list of changes is recorded in the Changelog.
Features and enhancements
- Mimir on ARM: We now publish Docker images for both amd64 and arm64, making it easier for those on ARM-based machines to develop and run Mimir. Multiplatform images are available from the Mimir Docker registry. Note that our existing integration test suite only uses the amd64 images, which means we cannot make any functional or performance guarantees about the arm64 images.
- Remote ruler mode for improved rule evaluation performance: We've added a remote mode for the Grafana Mimir ruler, in which the ruler delegates rule evaluation to the query-frontend rather than evaluating rules directly within the ruler process itself. This allows recording and alerting rules to benefit from the query parallelization techniques implemented in the query-frontend (like query sharding). Remote mode is considered experimental and is off by default. To enable, see remote ruler.
- Per-tenant custom trackers for monitoring cardinality: In Grafana Mimir 2.0, we introduced a custom tracker feature that allows you to track the count of active series over time that match a specific label matcher. In Grafana Mimir 2.1, we've made it possible to configure custom trackers via the runtime configuration file. This means you can now define different trackers for each tenant in your cluster and modify those trackers without an ingester restart.
- Reduce cardinality of Grafana Mimir's /metrics endpoint: While Grafana Mimir does a good job of exposing a relatively small number of series about its own state, this number can tick up when running Grafana Mimir clusters with high tenant counts or high active series counts. To reduce this number (and the accompanying cost of scraping and storing these time series), we made several optimizations which decreased series count on the /metrics endpoint by more than 10%.
Upgrade considerations
We've updated the default values for 2 parameters in Grafana Mimir to give users better out-of-the-box performance:
- We've changed the default for -blocks-storage.tsdb.isolation-enabled from true to false. We've marked this flag as deprecated and will remove it completely in 2 releases. TSDB isolation is a feature inherited from Prometheus that didn't provide any benefit given Grafana Mimir's distributed architecture, and in our 1 billion series load test we found it actually hurt performance. Disabling it reduced our ingester 99th percentile latency by 90%.
- The store-gateway attributes cache is now enabled by default (achieved by updating the default for -blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items from 0 to 50000). This in-memory cache makes it faster to look up object attributes for chunk data. We've been running this optional cache internally for a while and, upon a recent configuration audit, realized it made sense to do the same for all users. The increase in store-gateway memory utilization from enabling this cache is negligible and easily justified given the performance gains.
Bug fixes
2.1.0 bug fixes
- PR 1704: Fixed a bug that previously caused Grafana Mimir to crash on startup when trying to run in monolithic mode with the results cache enabled due to duplicate metric names.
- PR 1835: Fixed a bug that caused Grafana Mimir to crash when an invalid Alertmanager configuration was set even though the Alertmanager component was disabled. After this fix, the Alertmanager configuration is only validated if the Alertmanager component is loaded.
- PR 1836: The ability to run Alertmanager with local storage broke in Grafana Mimir 2.0 when we removed the ability to run the Alertmanager without sharding. With this bugfix, we've made it possible to again run Alertmanager with local storage. However, for production use, we still recommend using an external store since this is needed to persist Alertmanager state (e.g. silences) between replicas.
- PR 1715: Restored Grafana Mimir's ability to use CNAME DNS records to reach memcached servers. The bug was inherited from an upstream change to Thanos; we contributed a fix to Thanos and subsequently updated our Thanos version.
CHANGELOG
Grafana Mimir
- [CHANGE] Compactor: No longer upload debug meta files to object storage. #1257
- [CHANGE] Default values have changed for the following settings: #1547
  - -alertmanager.alertmanager-client.grpc-max-recv-msg-size now defaults to 100 MiB (previously was not configurable and set to 16 MiB)
  - -alertmanager.alertmanager-client.grpc-max-send-msg-size now defaults to 100 MiB (previously was not configurable and set to 4 MiB)
  - -alertmanager.max-recv-msg-size now defaults to 100 MiB (previously was 16 MiB)
- [CHANGE] Ingester: Add user label to metrics cortex_ingester_ingested_samples_total and cortex_ingester_ingested_samples_failures_total. #1533
- [CHANGE] Ingester: Changed -blocks-storage.tsdb.isolation-enabled default from true to false. The config option has also been deprecated and will be removed in 2 minor versions. #1655
- [CHANGE] Query-frontend: results cache keys are now versioned. This will cause the cache to be refilled when rolling out this version. #1631
- [CHANGE] Store-gateway: enabled attributes in-memory cache by default. New default configuration is -blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items=50000. #1727
- [CHANGE] Compactor: Removed the metric cortex_compactor_garbage_collected_blocks_total since it duplicates cortex_compactor_blocks_marked_for_deletion_total. #1728
- [CHANGE] All: Logs that used the org_id label now use the user label. #1634 #1758
- [CHANGE] Alertmanager: the following metrics are not exported for a given user and integration when the metric value is zero: #1783
  - cortex_alertmanager_notifications_total
  - cortex_alertmanager_notifications_failed_total
  - cortex_alertmanager_notification_requests_total
  - cortex_alertmanager_notification_requests_failed_total
  - cortex_alertmanager_notification_rate_limited_total
- [CHANGE] Removed the following metrics exposed by the Mimir hash rings: #1791
  - cortex_member_ring_tokens_owned
  - cortex_member_ring_tokens_to_own
  - cortex_ring_tokens_owned
  - cortex_ring_member_ownership_percent
- [CHANGE] Querier / Ruler: removed the following metrics tracking the number of query requests sent to each ingester. You can use cortex_request_duration_seconds_count{route=~"/cortex.Ingester/(QueryStream|QueryExemplars)"} instead. #1797
  - cortex_distributor_ingester_queries_total
  - cortex_distributor_ingester_query_failures_total
- [CHANGE] Distributor: removed the following metrics tracking the number of requests from a distributor to ingesters: #1799
  - cortex_distributor_ingester_appends_total
  - cortex_distributor_ingester_append_failures_total
- [CHANGE] Distributor / Ruler: deprecated -distributor.extend-writes. Now Mimir always behaves as if this setting was set to false, which we expect to be safe for every Mimir cluster setup. #1856
- [FEATURE] Querier: Added support for streaming remote read. Note that the benefits of chunking the response are partial here, since in a typical query-frontend setup responses will be buffered until they've been completed. #1735
- [FEATURE] Ruler: Allow setting evaluation_delay for each rule group via rules group configuration file. #1474
- [FEATURE] Ruler: Added support for expression remote evaluation. #1536 #1818
  - The following CLI flags (and their respective YAML config options) have been added:
    - -ruler.query-frontend.address
    - -ruler.query-frontend.grpc-client-config.grpc-max-recv-msg-size
    - -ruler.query-frontend.grpc-client-config.grpc-max-send-msg-size
    - -ruler.query-frontend.grpc-client-config.grpc-compression
    - -ruler.query-frontend.grpc-client-config.grpc-client-rate-limit
    - -ruler.query-frontend.grpc-client-config.grpc-client-rate-limit-burst
    - -ruler.query-frontend.grpc-client-config.backoff-on-ratelimits
    - -ruler.query-frontend.grpc-client-config.backoff-min-period
    - -ruler.query-frontend.grpc-client-config.backoff-max-period
    - -ruler.query-frontend.grpc-client-config.backoff-retries
    - -ruler.query-frontend.grpc-client-config.tls-enabled
    - -ruler.query-frontend.grpc-client-config.tls-ca-path
    - -ruler.query-frontend.grpc-client-config.tls-cert-path
    - -ruler.query-frontend.grpc-client-config.tls-key-path
    - -ruler.query-frontend.grpc-client-config.tls-server-name
    - -ruler.query-frontend.grpc-client-config.tls-insecure-skip-verify
- [FEATURE] Distributor: Added the ability to forward specific metrics ...
2.1.0-rc.1
CHANGELOG since mimir-2.1.0-rc.0
- [CHANGE] Distributor / Ruler: deprecated -distributor.extend-writes. Now Mimir always behaves as if this setting was set to false, which we expect to be safe for every Mimir cluster setup. #1856
2.1.0-rc.0
Grafana Mimir version 2.1 release notes
2.0.0
Grafana Labs is excited to announce the first release of Grafana Mimir, the most scalable, most performant open source time series database in the world. In customer tests, we’ve shown that a single cluster can support more than 1 billion active time series.
Besides massive scale, Grafana Mimir offers a host of other benefits, including easy deployment, native multi-tenancy, high availability, durable long-term storage, and exceptional query performance on even the highest cardinality queries.
We’re launching Grafana Mimir with a 2.0 version number to signal our respect for Cortex, the project from which Grafana Mimir was forked. The choice of 2.0 also represents our conviction that Grafana Mimir is real-world-tested, production-ready software. It has served as the backbone of our Grafana Cloud Metrics and Grafana Enterprise Metrics products since their inception.
Learn more:
- Grafana Mimir 2.0.0 release notes
- Announcing Grafana Mimir, the most scalable open source TSDB in the world
- Q&A with Grafana Labs CEO Raj Dutt about Grafana Mimir
- Intro to Grafana Mimir webinar on April 26
The complete list of changes is recorded in the Changelog.
2.0.0-rc.4
mimir-2.0.0-rc.4 v2.0.0-rc.4
2.0.0-rc.3
mimir-2.0.0-rc.3 v2.0.0-rc.3
2.0.0-rc.2
mimir-2.0.0-rc.2 v2.0.0-rc.2
2.0.0-rc.1
mimir-2.0.0-rc.1 v2.0.0-rc.1
2.0.0-rc.0
mimir-2.0.0-rc.0 v2.0.0-rc.0