Apply suggestions from code review
Co-authored-by: Clayton Cornell <[email protected]>
ptodev and clayton-cornell authored Oct 16, 2024
1 parent afc9a8e commit eecade2
Showing 1 changed file with 6 additions and 6 deletions.
@@ -408,22 +408,22 @@ prometheus.remote_write "default" {

### Out of order errors

You may sometimes see an "out of order" error in the {{< param "PRODUCT_NAME" >}} log files.
This means that {{< param "PRODUCT_NAME" >}} sent a metric sample with an older timestamp than a sample the database already ingested.
If your database is Mimir, the exact name of the [Mimir error][mimir-ooo-err] is `err-mimir-sample-out-of-order`.
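The rejection rule can be illustrated with a minimal sketch. This is not Alloy or Mimir code, only a model of the behavior described above: the database tracks the newest timestamp it has ingested for each series and rejects any sample that is older. The function and series names are hypothetical.

```python
def ingest(series_state, series, timestamp_ms, value):
    """Accept the sample only if it isn't older than the newest one
    already ingested for this series (a sketch, not Mimir's real logic)."""
    last = series_state.get(series)
    if last is not None and timestamp_ms < last:
        raise ValueError("err-mimir-sample-out-of-order")
    series_state[series] = timestamp_ms

state = {}
ingest(state, 'up{instance="a"}', 1000, 1.0)  # accepted
ingest(state, 'up{instance="a"}', 2000, 1.0)  # accepted
try:
    # A second collector scraping the same target sends an older sample:
    ingest(state, 'up{instance="a"}', 1500, 1.0)
except ValueError as e:
    print(e)  # prints "err-mimir-sample-out-of-order"
```

Two collectors scraping the same target interleave their samples, so one of them eventually writes a timestamp behind the other's, which is exactly the failure mode the steps below help you track down.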

The most common cause for this error is that there is more than one {{< param "PRODUCT_NAME" >}} instance scraping the same target.
To troubleshoot, take the following steps in order:
1. If you use clustering, check if the number of {{< param "PRODUCT_NAME" >}} instances changed at the time the error was logged.
This is the only situation in which it is normal to experience an out of order error.
The error would only happen for a short period, until the cluster stabilizes and all {{< param "PRODUCT_NAME" >}} instances have a new list of targets.
Since the cluster is expected to stabilize much faster than the scrape interval, this isn't a real problem.
If the out of order error you are seeing isn't related to scaling of clustered collectors, you must investigate it.
1. Check if there are active {{< param "PRODUCT_NAME" >}} instances which should not be running.
There may be an older {{< param "PRODUCT_NAME" >}} instance that wasn't shut down before a new one was started.
1. Inspect the configuration to see if there could be multiple {{< param "PRODUCT_NAME" >}} instances which scrape the same target.
1. Inspect the WAL to see which {{< param "PRODUCT_NAME" >}} instance sent those metric samples.
The WAL is located in a directory set by the [run command][run-cmd] `--storage.path` argument.
You can use [Promtool][promtool] to inspect it and find out which metric series were sent by this {{< param "PRODUCT_NAME" >}} instance since the last WAL truncation event.
For example:
   ```shell
   # Hypothetical path; replace it with the WAL directory under your
   # --storage.path setting.
   promtool tsdb dump /path/to/data/wal
   ```
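The misconfiguration described in the configuration-inspection step can be as simple as two scrape components that both list the same target and forward to the same `prometheus.remote_write` component. A hedged sketch in Alloy configuration syntax (the component labels and target address are hypothetical):

```alloy
// Both components scrape the same target and forward to the same
// remote_write component, so two streams of samples for the same series
// race each other and can trigger out of order errors.
prometheus.scrape "first" {
  targets    = [{"__address__" = "app:9090"}]
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.scrape "second" {
  targets    = [{"__address__" = "app:9090"}]
  forward_to = [prometheus.remote_write.default.receiver]
}
```

Consolidating the two components, or splitting their target lists so no target is scraped twice, removes the conflict.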
