Releases: thanos-io/thanos
v0.12.0-rc.1
Fixed
- #2288 Ruler: fixes issue #2281, a bug causing incorrect parsing of query address with path prefix.
- #2238 Ruler: fixed issue #2204, where a bug in alert queue signaling filled up the queue and alerts were dropped.
- #2231 Bucket Web: sort chunks by thanos.downsample.resolution for better grouping.
- #2254 Bucket: fix issue where metrics were registered multiple times in bucket replicate.
- #2271 Bucket Web: fixed issue #2260, where the bucket passes null when storage is empty.
- #2339 Query: fix a bug where `--store.unhealthy-timeout` was never respected.
- #2208 Query and Rule: fix handling of `web.route-prefix` to correctly handle `/` and prefixes that do not begin with a `/`.
- #2311 Receive: ensure receive component serves TLS when TLS configuration is provided.
- #2319 Query: fixed inconsistent naming of metrics.
- #2390 Store: fixed bug that was causing all posting offsets to be used instead of only 1/32 as intended; added hidden flag to control this behavior.
- #2393 Store: fixed bug causing queries for certain non-existent label values to fail with an "invalid-size" error from the binary header.
- #2382 Store: fixed bug causing partial writes of index-header.
- #2383 Store: handle expected errors correctly, e.g. do not increment failure counters.
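The general technique behind keeping only 1 in 32 posting offsets (#2390) is a sparse index: keep every n-th entry of a sorted table in memory, binary-search the samples, then scan at most n entries of the full table. The Go sketch below illustrates that idea only; the hypothetical `postingOffset` and `sample` types and the in-memory slice standing in for the on-disk index-header are simplifications, not the Thanos format.

```go
package main

import (
	"fmt"
	"sort"
)

// postingOffset is a hypothetical postings offset table entry:
// a label value and the byte offset of its postings list.
type postingOffset struct {
	value  string
	offset int64
}

// sample is a sparse-index entry remembering where it sits in the full table.
type sample struct {
	value string
	pos   int
}

// buildSamples keeps every n-th entry of a table sorted by value.
func buildSamples(table []postingOffset, n int) []sample {
	var s []sample
	for i := 0; i < len(table); i += n {
		s = append(s, sample{value: table[i].value, pos: i})
	}
	return s
}

// lookup binary-searches the samples, then scans at most n full-table entries.
func lookup(table []postingOffset, samples []sample, n int, value string) (int64, bool) {
	// First sample whose value sorts strictly after the wanted value.
	i := sort.Search(len(samples), func(i int) bool { return samples[i].value > value })
	if i == 0 {
		return 0, false // sorts before the first entry
	}
	start := samples[i-1].pos
	end := start + n
	if end > len(table) {
		end = len(table)
	}
	for _, e := range table[start:end] {
		if e.value == value {
			return e.offset, true
		}
	}
	return 0, false
}

// makeTable builds a sorted demo table with n zero-padded label values.
func makeTable(n int) []postingOffset {
	table := make([]postingOffset, 0, n)
	for i := 0; i < n; i++ {
		table = append(table, postingOffset{value: fmt.Sprintf("v%03d", i), offset: int64(i * 16)})
	}
	return table
}

func main() {
	table := makeTable(100)
	samples := buildSamples(table, 32)
	fmt.Println(len(samples)) // 4
	off, ok := lookup(table, samples, 32, "v070")
	fmt.Println(off, ok) // 1120 true
}
```

Memory holds only `len(table)/n` samples, while each lookup costs one binary search plus a bounded scan.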
Added
- #2252 Query: add new `--store-strict` flag. More information available here.
- #2265 Compact: add `--wait-interval` to specify the compaction wait interval between consecutive compact runs when `--wait` is enabled.
- #2250 Compact: enable vertical compaction for offline deduplication (experimental). Uses the hidden `--deduplication.replica-label` flag to specify the replica label on which to deduplicate. Please note that this uses a NAIVE algorithm for merging (no smart replica deduplication, just chaining samples together). This works well for deduplication of blocks with precisely the same samples, like those produced by Receiver replication. We plan to add a smarter algorithm in the following weeks.
- #1714 Compact: the compact component now exposes the bucket web UI when it is run as a long-lived process.
- #2304 Store: added `max_item_size` configuration option to the memcached-based index cache. This should be set to the max item size configured in memcached (`-I` flag) to avoid wasting network round-trips on items larger than the limit configured in memcached.
- #2297 Store: add `--experimental.enable-index-cache-postings-compression` flag to enable re-encoding and compressing postings before storing them into the cache. Compressed postings take about 10% of the original size.
- #2357 Compact and Store: the compact and store components now serve the bucket UI on `:<http-port>/loaded`, which shows exactly the blocks that are currently seen by the compactor and the store gateway. The compactor also serves a different bucket UI on `:<http-port>/global`, which shows the status of object storage without any filters.
- #2166 Bucket Web: improve the tooltip for the bucket UI; it was reconstructed and now exposes much more information about blocks.
- #2172 Store: add support for sharding the store component based on the label hash.
- #2113 Bucket: added `thanos bucket replicate` command to replicate blocks from one bucket to another.
- #1922 Docs: create a new document to explain sharding in Thanos.
- #2230 Store: optimize conversion of labels.
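Sharding the store component by label hash (#2172) lets several store gateways split one bucket between them. The Go sketch below shows the general hashmod idea, in the spirit of Prometheus's `hashmod` relabel action; the hypothetical `shardOf` and `ownedBy` helpers, the choice of external labels as input and MD5 as the hash are assumptions, since the real Thanos behavior is driven by its relabel configuration.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

// shardOf maps a block's external labels to a shard in [0, shards).
// Labels are sorted first so map iteration order cannot change the result.
func shardOf(extLabels map[string]string, shards uint64) uint64 {
	keys := make([]string, 0, len(extLabels))
	for k := range extLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(extLabels[k])
		b.WriteByte(';')
	}
	sum := md5.Sum([]byte(b.String()))
	// Interpret the last 8 bytes of the digest as a big-endian integer.
	return binary.BigEndian.Uint64(sum[8:]) % shards
}

// ownedBy reports whether the gateway running as shard `shard` should load the block.
func ownedBy(extLabels map[string]string, shard, shards uint64) bool {
	return shardOf(extLabels, shards) == shard
}

func main() {
	blocks := []map[string]string{
		{"cluster": "eu1", "replica": "0"},
		{"cluster": "us1", "replica": "0"},
		{"cluster": "ap1", "replica": "1"},
	}
	for _, lbls := range blocks {
		fmt.Println(lbls, "-> shard", shardOf(lbls, 3))
	}
}
```

Because every gateway computes the same deterministic hash, each block is owned by exactly one shard without any coordination between gateways.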
Changed
- #2136 breaking Store, Compact, Bucket: schedule block deletion by adding deletion-mark.json. This adds a consistent way for multiple readers and writers to access object storage.
Since some object storage providers offer no consistency guarantees, this PR adds a consistent, lock-free way of dealing with object storage regardless of the provider. To achieve this coordination, blocks are not deleted directly. Instead, a block chosen for deletion is marked by uploading a `deletion-mark.json` file for it. This file contains the Unix time at which the block was marked for deletion. If you want to keep the existing behavior, add the `--delete-delay=0s` flag.
- #2090 breaking Downsample command: the `downsample` command has moved and is now a sub-command of the `thanos bucket` sub-command; it can no longer be called via `thanos downsample`.
- #2294 Store: optimizations for fetching postings. Queries using `=~".*"` matchers or negation matchers (`!=...` or `!~...`) benefit the most.
- #2301 Ruler: exit with an error when initialization fails.
- #2310 Query: report timespan 0 to 0 when discovering no stores.
- #2330 Store: index-header is no longer experimental. It is enabled by default for the store gateway. You can disable it with the new hidden flag `--store.disable-index-header`. The `--experimental.enable-index-header` flag was removed.
- #1848 Ruler: allow returning error messages when a reload is triggered via HTTP.
- #2270 All: Thanos components will now print stack traces when they error out.
v0.12.0-rc.0
Fixed
- #2288 Ruler: fixes issue #2281, a bug causing incorrect parsing of query address with path prefix.
- #2238 Ruler: fixed issue #2204, where a bug in alert queue signaling filled up the queue and alerts were dropped.
- #2231 Bucket Web: sort chunks by thanos.downsample.resolution for better grouping.
- #2254 Bucket: fix issue where metrics were registered multiple times in bucket replicate.
- #2271 Bucket Web: fixed issue #2260, where the bucket passes null when storage is empty.
- #2339 Query: fix a bug where `--store.unhealthy-timeout` was never respected.
- #2208 Query and Rule: fix handling of `web.route-prefix` to correctly handle `/` and prefixes that do not begin with a `/`.
- #2311 Receive: ensure receive component serves TLS when TLS configuration is provided.
- #2319 Query: fixed inconsistent naming of metrics.
Added
- #2252 Query: add new `--store-strict` flag. More information available here.
- #2265 Compact: add `--wait-interval` to specify the compaction wait interval between consecutive compact runs when `--wait` is enabled.
- #2250 Compact: enable vertical compaction for offline deduplication (experimental). Uses the hidden `--deduplication.replica-label` flag to specify the replica label on which to deduplicate. Please note that this uses a NAIVE algorithm for merging (no smart replica deduplication, just chaining samples together). This works well for deduplication of blocks with precisely the same samples, like those produced by Receiver replication. We plan to add a smarter algorithm in the following weeks.
- #1714 Compact: the compact component now exposes the bucket web UI when it is run as a long-lived process.
- #2304 Store: added `max_item_size` configuration option to the memcached-based index cache. This should be set to the max item size configured in memcached (`-I` flag) to avoid wasting network round-trips on items larger than the limit configured in memcached.
- #2297 Store: add `--experimental.enable-index-cache-postings-compression` flag to enable re-encoding and compressing postings before storing them into the cache. Compressed postings take about 10% of the original size.
- #2357 Compact and Store: the compact and store components now serve the bucket UI on `:<http-port>/loaded`, which shows exactly the blocks that are currently seen by the compactor and the store gateway. The compactor also serves a different bucket UI on `:<http-port>/global`, which shows the status of object storage without any filters.
- #2166 Bucket Web: improve the tooltip for the bucket UI; it was reconstructed and now exposes much more information about blocks.
- #2172 Store: add support for sharding the store component based on the label hash.
- #2113 Bucket: added `thanos bucket replicate` command to replicate blocks from one bucket to another.
- #1922 Docs: create a new document to explain sharding in Thanos.
- #2230 Store: optimize conversion of labels.
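The `max_item_size` option from #2304 exists because memcached refuses items above its `-I` limit (1 MB by default), so sending them only costs a wasted round-trip. A minimal Go sketch of that client-side guard follows; the hypothetical `setter` interface, `recordingCache` stand-in and `sizeLimitedCache` wrapper are not the Thanos memcached client API, and the check compares only the value length, ignoring memcached's per-item overhead.

```go
package main

import "fmt"

// setter is a hypothetical minimal cache interface.
type setter interface {
	Set(key string, value []byte) error
}

// recordingCache is an in-memory stand-in for a memcached client.
type recordingCache struct{ stored map[string][]byte }

func (c *recordingCache) Set(key string, value []byte) error {
	c.stored[key] = value
	return nil
}

// sizeLimitedCache drops values larger than maxItemSize instead of sending
// them to a server that would reject them anyway.
type sizeLimitedCache struct {
	next        setter
	maxItemSize int // 0 disables the check
	skipped     int
}

func (c *sizeLimitedCache) Set(key string, value []byte) error {
	if c.maxItemSize > 0 && len(value) > c.maxItemSize {
		c.skipped++
		return nil // caching is best-effort, so skipping is not an error
	}
	return c.next.Set(key, value)
}

func main() {
	backend := &recordingCache{stored: map[string][]byte{}}
	cache := &sizeLimitedCache{next: backend, maxItemSize: 1 << 20} // 1 MiB, matching memcached's default -I
	_ = cache.Set("small", make([]byte, 512))
	_ = cache.Set("huge", make([]byte, 2<<20))
	fmt.Println(len(backend.stored), cache.skipped) // 1 1
}
```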
Changed
- #2136 breaking Store, Compact, Bucket: schedule block deletion by adding deletion-mark.json. This adds a consistent way for multiple readers and writers to access object storage.
Since some object storage providers offer no consistency guarantees, this PR adds a consistent, lock-free way of dealing with object storage regardless of the provider. To achieve this coordination, blocks are not deleted directly. Instead, a block chosen for deletion is marked by uploading a `deletion-mark.json` file for it. This file contains the Unix time at which the block was marked for deletion. If you want to keep the existing behavior, add the `--delete-delay=0s` flag.
- #2090 breaking Downsample command: the `downsample` command has moved and is now a sub-command of the `thanos bucket` sub-command; it can no longer be called via `thanos downsample`.
- #2294 Store: optimizations for fetching postings. Queries using `=~".*"` matchers or negation matchers (`!=...` or `!~...`) benefit the most.
- #2301 Ruler: exit with an error when initialization fails.
- #2310 Query: report timespan 0 to 0 when discovering no stores.
- #2330 Store: index-header is no longer experimental. It is enabled by default for the store gateway. You can disable it with the new hidden flag `--store.disable-index-header`. The `--experimental.enable-index-header` flag was removed.
- #1848 Ruler: allow returning error messages when a reload is triggered via HTTP.
- #2270 All: Thanos components will now print stack traces when they error out.
v0.11.0
Fixed
- #2033 Minio-go: Fixed issue #1494 by supporting Web Identity providers for IAM credentials for AWS EKS.
- #1985 Store Gateway: Fixed case where series entry is larger than 64KB in index.
- #2051 Ruler: Fixed issue where ruler does not expose shipper metrics.
- #2101 Ruler: Fixed bug where thanos_alert_sender_errors_total was not registered.
- #1789 Store Gateway: Improve timeouts.
- #2139 Properly handle SIGHUP for reloading.
- #2040 UI: Fix URL of alerts in Ruler
- #2033 Ruler: Fix tracing in Thanos Ruler
Added
- #2003 Query: Support downsampling for /series.
- #1952 Store Gateway: Implemented binary index header. This significantly reduces resource consumption (memory, CPU, net bandwidth) for startup and data loading processes as well as baseline memory. This means that adding more blocks to object storage without querying them uses almost no resources. However, querying large amounts of data still results in high spikes of memory and CPU use as before, simply because large amounts of metrics data must be fetched. Since we fixed the baseline, we are now focusing on query performance optimizations in separate initiatives. To enable the experimental `index-header` mode, run store with the hidden `--experimental.enable-index-header` flag.
- #2009 Store Gateway: Add a minimum age that all blocks must reach before they are read. Set it to a safe value (e.g. 30m) if your object storage is eventually consistent. GCS and S3 are (roughly) strongly consistent.
- #1963 Mixin: Add Thanos Ruler alerts.
- #1984 Query: Add cache-control header to not cache on error.
- #1870 UI: Persist settings in query.
- #1969 Sidecar: allow setting http connection pool size via flags.
- #1967 Receive: Allow local TSDB compaction.
- #1939 Ruler: Add TLS and authentication support for query endpoints with the `--query.config` and `--query.config-file` CLI flags. See documentation for further information.
- #1982 Ruler: Add support for Alertmanager v2 API endpoints.
- #2030 Query: Add `thanos_proxy_store_empty_stream_responses_total` metric for number of empty responses from stores.
- #2049 Tracing: Support sampling on Elastic APM with new `sample_rate` setting.
- #2008 Querier, Receiver, Sidecar, Store: Add gRPC health check endpoints.
- #2145 Tracing: track queries sent to Prometheus via the remote read API.
Changed
- #1970 breaking Receive: Use gRPC for forwarding requests between peers. Note that existing values for the `--receive.local-endpoint` flag and the endpoints in the hashring configuration file must now specify the receive gRPC port and must be updated to be a simple `host:port` combination, e.g. `127.0.0.1:10901`, rather than a full HTTP URL, e.g. `http://127.0.0.1:10902/api/v1/receive`.
- #1933 Add a flag `--tsdb.wal-compression` to configure whether to enable TSDB WAL compression in ruler and receiver.
- #2021 Rename metric `thanos_query_duplicated_store_address` to `thanos_query_duplicated_store_addresses_total` and `thanos_rule_duplicated_query_address` to `thanos_rule_duplicated_query_addresses_total`.
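Before upgrading past #1970, it can help to check that every Receive endpoint is a plain `host:port` rather than an HTTP URL. The Go sketch below does that; the hashring JSON shape (a list of objects with `hashring` and `endpoints` keys) and the hypothetical `validateEndpoint` and `validateHashrings` helpers are assumptions, so compare them with the Receive documentation for your version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// hashring mirrors an assumed hashring configuration entry.
type hashring struct {
	Hashring  string   `json:"hashring"`
	Endpoints []string `json:"endpoints"`
}

// validateEndpoint accepts "host:port" and rejects full URLs such as
// "http://127.0.0.1:10902/api/v1/receive".
func validateEndpoint(ep string) error {
	if strings.Contains(ep, "://") || strings.Contains(ep, "/") {
		return fmt.Errorf("%q looks like a URL; use host:port", ep)
	}
	host, port, err := net.SplitHostPort(ep)
	if err != nil {
		return fmt.Errorf("%q: %v", ep, err)
	}
	if host == "" {
		return fmt.Errorf("%q: missing host", ep)
	}
	if p, err := strconv.Atoi(port); err != nil || p < 1 || p > 65535 {
		return fmt.Errorf("%q: invalid port %q", ep, port)
	}
	return nil
}

// validateHashrings parses a hashring file and checks every endpoint.
func validateHashrings(data []byte) error {
	var rings []hashring
	if err := json.Unmarshal(data, &rings); err != nil {
		return err
	}
	for _, r := range rings {
		for _, ep := range r.Endpoints {
			if err := validateEndpoint(ep); err != nil {
				return fmt.Errorf("hashring %q: %w", r.Hashring, err)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validateEndpoint("127.0.0.1:10901"))                              // <nil>
	fmt.Println(validateEndpoint("http://127.0.0.1:10902/api/v1/receive") != nil) // true
}
```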
v0.11.0-rc.1
Fixed
- #2189 minio-go: Fixed issue #2181, where IAM metadata credentials could not be used.
- #2033 Minio-go: Fixed issue #1494 by supporting Web Identity providers for IAM credentials for AWS EKS.
- #1985 Store Gateway: Fixed case where series entry is larger than 64KB in index.
- #2051 Ruler: Fixed issue where ruler does not expose shipper metrics.
- #2101 Ruler: Fixed bug where thanos_alert_sender_errors_total was not registered.
- #1789 Store Gateway: Improve timeouts.
- #2139 Properly handle SIGHUP for reloading.
- #2040 UI: Fix URL of alerts in Ruler
- #2033 Ruler: Fix tracing in Thanos Ruler
Added
- #2003 Query: Support downsampling for /series.
- #1952 Store Gateway: Implemented binary index header. This significantly reduces resource consumption (memory, CPU, net bandwidth) for startup and data loading processes as well as baseline memory. This means that adding more blocks to object storage without querying them uses almost no resources. However, querying large amounts of data still results in high spikes of memory and CPU use as before, simply because large amounts of metrics data must be fetched. Since we fixed the baseline, we are now focusing on query performance optimizations in separate initiatives. To enable the experimental `index-header` mode, run store with the hidden `--experimental.enable-index-header` flag.
- #2009 Store Gateway: Add a minimum age that all blocks must reach before they are read. Set it to a safe value (e.g. 30m) if your object storage is eventually consistent. GCS and S3 are (roughly) strongly consistent.
- #1963 Mixin: Add Thanos Ruler alerts.
- #1984 Query: Add cache-control header to not cache on error.
- #1870 UI: Persist settings in query.
- #1969 Sidecar: allow setting http connection pool size via flags.
- #1967 Receive: Allow local TSDB compaction.
- #1939 Ruler: Add TLS and authentication support for query endpoints with the `--query.config` and `--query.config-file` CLI flags. See documentation for further information.
- #1982 Ruler: Add support for Alertmanager v2 API endpoints.
- #2030 Query: Add `thanos_proxy_store_empty_stream_responses_total` metric for number of empty responses from stores.
- #2049 Tracing: Support sampling on Elastic APM with new `sample_rate` setting.
- #2008 Querier, Receiver, Sidecar, Store: Add gRPC health check endpoints.
- #2145 Tracing: track queries sent to Prometheus via the remote read API.
Changed
- #1970 breaking Receive: Use gRPC for forwarding requests between peers. Note that existing values for the `--receive.local-endpoint` flag and the endpoints in the hashring configuration file must now specify the receive gRPC port and must be updated to be a simple `host:port` combination, e.g. `127.0.0.1:10901`, rather than a full HTTP URL, e.g. `http://127.0.0.1:10902/api/v1/receive`.
- #1933 Add a flag `--tsdb.wal-compression` to configure whether to enable TSDB WAL compression in ruler and receiver.
- #2021 Rename metric `thanos_query_duplicated_store_address` to `thanos_query_duplicated_store_addresses_total` and `thanos_rule_duplicated_query_address` to `thanos_rule_duplicated_query_addresses_total`.
v0.11.0-rc.0
Fixed
- #2033 Minio-go: Fixed issue #1494 by supporting Web Identity providers for IAM credentials for AWS EKS.
- #1985 Store Gateway: Fixed case where series entry is larger than 64KB in index.
- #2051 Ruler: Fixed issue where ruler does not expose shipper metrics.
- #2101 Ruler: Fixed bug where thanos_alert_sender_errors_total was not registered.
- #1789 Store Gateway: Improve timeouts.
- #2139 Properly handle SIGHUP for reloading.
- #2040 UI: Fix URL of alerts in Ruler
- #2033 Ruler: Fix tracing in Thanos Ruler
Added
- #2003 Query: Support downsampling for /series.
- #1952 Store Gateway: Implemented binary index header. This significantly reduces resource consumption (memory, CPU, net bandwidth) for startup and data loading processes as well as baseline memory. This means that adding more blocks to object storage without querying them uses almost no resources. However, querying large amounts of data still results in high spikes of memory and CPU use as before, simply because large amounts of metrics data must be fetched. Since we fixed the baseline, we are now focusing on query performance optimizations in separate initiatives. To enable the experimental `index-header` mode, run store with the hidden `--experimental.enable-index-header` flag.
- #2009 Store Gateway: Add a minimum age that all blocks must reach before they are read. Set it to a safe value (e.g. 30m) if your object storage is eventually consistent. GCS and S3 are (roughly) strongly consistent.
- #1963 Mixin: Add Thanos Ruler alerts.
- #1984 Query: Add cache-control header to not cache on error.
- #1870 UI: Persist settings in query.
- #1969 Sidecar: allow setting http connection pool size via flags.
- #1967 Receive: Allow local TSDB compaction.
- #1939 Ruler: Add TLS and authentication support for query endpoints with the `--query.config` and `--query.config-file` CLI flags. See documentation for further information.
- #1982 Ruler: Add support for Alertmanager v2 API endpoints.
- #2030 Query: Add `thanos_proxy_store_empty_stream_responses_total` metric for number of empty responses from stores.
- #2049 Tracing: Support sampling on Elastic APM with new `sample_rate` setting.
- #2008 Querier, Receiver, Sidecar, Store: Add gRPC health check endpoints.
- #2145 Tracing: track queries sent to Prometheus via the remote read API.
Changed
- #1970 breaking Receive: Use gRPC for forwarding requests between peers. Note that existing values for the `--receive.local-endpoint` flag and the endpoints in the hashring configuration file must now specify the receive gRPC port and must be updated to be a simple `host:port` combination, e.g. `127.0.0.1:10901`, rather than a full HTTP URL, e.g. `http://127.0.0.1:10902/api/v1/receive`.
- #1933 Add a flag `--tsdb.wal-compression` to configure whether to enable TSDB WAL compression in ruler and receiver.
- #2021 Rename metric `thanos_query_duplicated_store_address` to `thanos_query_duplicated_store_addresses_total` and `thanos_rule_duplicated_query_address` to `thanos_rule_duplicated_query_addresses_total`.
v0.10.1
v0.10.0
Thanks to all contributors! ❤️
Highlights: Store now supports `memcached`; StoreAPI has a new `skip-chunks` option which is used to greatly speed up the `/api/v1/series` endpoint; Store/Compactor has improved synchronization of meta JSON files; Ruler supports TLS and authentication; fixed a potential data loss when uploading older blocks or when the upload takes a long time while the Compactor is running; the compaction process should take significantly less RAM but more time.
❗ `memcached` support is marked experimental for now ❗
As always, here is the detailed changelog:
Fixed
- #1919 Compactor: Fixed potential data loss when uploading older blocks, or when an upload takes a long time while the compactor is running.
- #1937 Compactor: Improved synchronization of meta JSON files. Compactor now properly handles partial block uploads for all operations like retention apply, downsampling and compaction. Additionally:
  - Removed `thanos_compact_sync_meta_*` metrics. Use `thanos_blocks_meta_*` metrics instead.
  - Added `thanos_consistency_delay_seconds` and `thanos_compactor_aborted_partial_uploads_deletion_attempts_total` metrics.
- #1936 Store: Improved synchronization of meta JSON files. Store now properly handles corrupted disk cache. Added meta.json sync metrics.
- #1856 Receive: close DBReadOnly after flushing to fix a memory leak.
- #1882 Receive: upload to object storage as 'receive' rather than 'sidecar'.
- #1907 Store: Fixed the duration unit for the metric `thanos_bucket_store_series_gate_duration_seconds`.
- #1931 Compact: Fixed the compactor exiting successfully when an error had actually occurred while compacting a group of blocks.
- #1872 Ruler: `/api/v1/rules` now shows a properly formatted value.
- #1945 `master` container images are now built with Go 1.13.
- #1956 Ruler: now properly ignores duplicated query addresses.
- #1975 Store Gateway: fixed panic caused by the memcached servers selector when there is only 1 memcached node.
Added
- #1852 Add support for `AWS_CONTAINER_CREDENTIALS_FULL_URI` by upgrading to minio-go v6.0.44.
- #1854 Update Rule UI to support displaying and filtering the alerts count.
- #1838 Ruler: Add TLS and authentication support for Alertmanager with the `--alertmanagers.config` and `--alertmanagers.config-file` CLI flags. See documentation for further information.
- #1838 Ruler: Add a new `--alertmanagers.sd-dns-interval` CLI option to specify the interval between DNS resolutions of Alertmanager hosts.
- #1881 Store Gateway: memcached support for index cache. See documentation for further information.
- #1904 Add a skip-chunks option in Store Series API to improve the response time of the `/api/v1/series` endpoint.
- #1910 Query: `/api/v1/labels` now understands `POST`, which is useful for sending bigger requests.
Changed
- #1947 Upgraded Prometheus dependencies to v2.15.2. This includes:
  - Compactor: Significant reduction of memory footprint for compaction and downsampling process.
  - Querier: Accepting spaces between time range and square bracket, e.g. `[ 5m]`.
  - Querier: Improved PromQL parser performance.
- #1833 `--shipper.upload-compacted` flag has been promoted to a non-hidden, non-experimental state. More info available here.
- #1867 Ruler: now sets a `Thanos/$version` `User-Agent` in requests.
- #1887 Service discovery now deduplicates targets between different target groups.
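Deduplicating targets across service-discovery target groups (#1887) boils down to an order-preserving set union. The Go sketch below shows that idea; the hypothetical `dedupTargets` helper and the plain string addresses are simplifications of the real discovery types.

```go
package main

import "fmt"

// dedupTargets flattens target groups into one list, keeping the first
// occurrence of each address and preserving discovery order.
func dedupTargets(groups [][]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, g := range groups {
		for _, addr := range g {
			if _, ok := seen[addr]; ok {
				continue
			}
			seen[addr] = struct{}{}
			out = append(out, addr)
		}
	}
	return out
}

func main() {
	groups := [][]string{
		{"store-0:10901", "store-1:10901"},
		{"store-1:10901", "store-2:10901"},
	}
	fmt.Println(dedupTargets(groups)) // [store-0:10901 store-1:10901 store-2:10901]
}
```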
v0.10.0-rc.1
v0.10.0-rc.0
v0.9.0
Thanks to all contributors!
Noteworthy changes: support for AlibabaCloud object storage; LightStep tracing; Ruler fixes; Store UI page fixed; Store Gateway now has metrics for its startup cycle, plus optimizations.
Added
- #1678 Add Lightstep as a tracing provider.
- #1687 Add a new `--grpc-grace-period` CLI option to components which serve gRPC to set how long to wait until the gRPC server shuts down.
- #1660 Sidecar: Add a new `--prometheus.ready_timeout` CLI option to the sidecar to set how long to wait until Prometheus starts up.
- #1573 Add support for `AliYun OSS` object storage; see the documentation for further information.
- #1680 Add a new `--http-grace-period` CLI option to components which serve HTTP to set how long to wait until the HTTP server shuts down.
- #1712 Bucket: Rename flag on bucket web component from `--listen` to `--http-address` to match other components.
- #1733 Compactor: New metric `thanos_compactor_iterations_total` on Thanos Compactor which shows the number of successful iterations.
- #1758 Bucket: `thanos bucket web` now supports `--web.external-prefix` for proxying on a subpath.
- #1770 Bucket: Add `--web.prefix-header` flags to allow the bucket UI to be accessible behind a reverse proxy.
- #1668 Receiver: Added TLS options for both server and client remote write.
Fixed
- #1656 Store Gateway: Store now starts the metric and status probe HTTP server earlier in its start-up sequence. The `/-/healthy` endpoint now starts to respond with success earlier, and the `/metrics` endpoint starts serving metrics earlier as well. Make sure to point your readiness probes to the `/-/ready` endpoint rather than `/metrics`.
- #1669 Store Gateway: Fixed store sharding. It no longer loads the meta.json files of excluded blocks, nor loads or fetches their index-cache.json files.
- #1670 Sidecar: Fixed un-ordered blocks upload. Sidecar now uploads the oldest blocks first.
- #1568 Store Gateway: Store now retains the first raw value of a chunk during downsampling to avoid losing some counter resets that occur on an aggregation boundary.
- #1751 Querier: Fixed labels for StoreUI
- #1773 Ruler: Fixed the /api/v1/rules endpoint that returned a 500 status code with the `failed to assert type of rule ...` message.
- #1770 Querier: Fixed `--web.external-prefix` 404s for static resources.
- #1785 Ruler: The /api/v1/rules endpoint now returns the original rule filenames.
- #1791 Ruler: Ruler now supports identical rule filenames in different directories.
- #1562 Querier: Downsampling option now carries through URL.
- #1675 Querier: Reduced resource usage while using certain queries like `offset`.
- #1725 & #1718 Store Gateway: Per-request memory improvements.
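Since #1656 makes `/metrics` answer before the store is actually ready, readiness checks should target `/-/ready`. The Go sketch below uses only the standard library; the `checkReady` helper, its 2-second timeout and the `fakeStore` test server are assumptions for illustration. In Kubernetes you would normally express the same check as an `httpGet` readiness probe on `/-/ready`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkReady returns nil when GET <baseURL>/-/ready answers 200 OK.
func checkReady(baseURL string) error {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/-/ready")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("not ready: %s", resp.Status)
	}
	return nil
}

// fakeStore simulates a component whose /-/ready endpoint reports the given
// state; it stands in for a real store gateway (usually on HTTP port 10902).
func fakeStore(ready bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/-/ready" && ready {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
}

func main() {
	for _, ready := range []bool{false, true} {
		srv := fakeStore(ready)
		fmt.Println(checkReady(srv.URL))
		srv.Close()
	}
}
```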
Changed
- #1666 Compact: `thanos_compact_group_compactions_total` now counts block compactions, i.e. operations that resulted in a compacted block. The old behaviour is now exposed by the new metrics `thanos_compact_group_compaction_runs_started_total` and `thanos_compact_group_compaction_runs_completed_total`, which count compaction runs overall.
- #1748 Updated all dependencies.
- #1694 `prober_ready` and `prober_healthy` metrics are removed in favor of `status`. Now `status` exposes the same metric with a label, `check`. `check` can have the value "healthy" or "ready" depending on the status of the probe.
- #1790 Ruler: Fixes subqueries support for ruler.
- #1769 & #1545 Adjusted most of the metrics histogram buckets.