[Feature Request] Redefine the computation of segment replication metrics in Node Stats #16801

vinaykpud · 2024-12-07T21:35:15Z

Is your feature request related to a problem? Please describe

Currently, during segment replication, primary shards collect stats from each of their replicas. However, search replicas will not report stats to the primary shard.
Node stats return the following metrics, which are based on stats collected by the primary from its replicas:

segments.segment_replication.max_bytes_behind
segments.segment_replication.total_bytes_behind
segments.segment_replication.max_replication_lag

We want to

Unify the way stats is calculated and reported for replicas and search replica
Redefine how lag is computed

Describe the solution you'd like

The idea is to stop comparing replicas with the primary, as is done currently, and instead report stats for each replica based on its ongoing replication events, similar to cat recovery. This approach decouples the primary from stats calculation, and the primary shard will no longer need to maintain segment replication checkpoint information for all its replicas in the ReplicationTracker.

We can generate metrics to answer the following questions:

How many bytes are left to be downloaded to the replica?
How long is the ongoing replication taking?

This will decouple the behavior so that we can use point in time stats of each shard for the node stats calculation for segment replication.

Please note that we are currently modifying how search replicas compute and return stats in cat segrep. Details are discussed here: #15534.

Related component

Search:Performance

The text was updated successfully, but these errors were encountered:

sachinpkale · 2024-12-10T05:07:12Z

search replicas will not report stats to the primary shard.

I assume "search replicas" term is from Reader-Writer separation proposal #15237 and it would be using pull based replication. Is that correct?

How many bytes are left to be downloaded to the replica?

If replica is overloaded and download is slow, search replica would be downloading data from metadata that was polled, let's say 10 mins back. Unless it polls the latest metadata again, will we be able to understand the bytes left to be downloaded?

gbbafna · 2024-12-10T06:48:13Z

This approach decouples the primary from stats calculation, and the primary shard will no longer need to maintain segment replication checkpoint information for all its replicas in the ReplicationTracker.

But they will still need this information for doing write rejections due to searchers lagging .

I like this method as it will help in pin pointing exact node/shard copy which is having issue, as opposed to a convoluted approach today.

mch2 · 2024-12-12T22:15:02Z

I think we can do something like this:

Each replica regardless of type tracks the latest received checkpoint that includes a timestamp of when the primary (writer) refreshed on it. Search replica would have to poll for the latest cp from the store while writer replicas get this already when checkpoints are published. @sachinpkale is correct, the current pull based replication skips the metadata fetch if the search replica is already running replication, so we'd have to add a separate process to fetch the cp or allow the checkpoint step to continue.

Node stats should report the state of the replicas on the node, not the state of the replicas of the primaries on the node (as is today). Each shard then compares its currently served checkpoint to the "latest received". If the sis version is different ie the shard is behind the store, lag becomes the diff between current time & the latest received cp's timestamp which is roughly the replication lag - upload time. Bytes behind is already computed by the diff of two checkpoints as metadata is included. Checkpoints behind could be the diff in sis version, although this metric is pretty useless imo.

Backpressure:
We update the existing async task to enforce pressure by instead fetching the checkpoints of each replica and compute these stats on the fly rather than precomputed metrics on the primary. Search replica state would not be factored in. We will need a separate mechanism to track search replica freshness & fail them accordingly.

Ultimately we are giving up some precision on lag computation for simplicity, which I think is well worth it.

vinaykpud added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 7, 2024

github-actions bot added the Search:Performance label Dec 7, 2024

mch2 mentioned this issue Dec 9, 2024

[META] Reader/Writer Separation #15306

Open

14 tasks

peterzhuamazon added this to Search Project Board Dec 19, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Dec 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] Redefine the computation of segment replication metrics in Node Stats #16801

[Feature Request] Redefine the computation of segment replication metrics in Node Stats #16801

vinaykpud commented Dec 7, 2024 •

edited

Loading

sachinpkale commented Dec 10, 2024

gbbafna commented Dec 10, 2024

mch2 commented Dec 12, 2024 •

edited

Loading

[Feature Request] Redefine the computation of segment replication metrics in Node Stats #16801

[Feature Request] Redefine the computation of segment replication metrics in Node Stats #16801

Comments

vinaykpud commented Dec 7, 2024 • edited Loading

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Related component

sachinpkale commented Dec 10, 2024

gbbafna commented Dec 10, 2024

mch2 commented Dec 12, 2024 • edited Loading

vinaykpud commented Dec 7, 2024 •

edited

Loading

mch2 commented Dec 12, 2024 •

edited

Loading