Mitigate contention with dense, high-tempo operations #742

BenjaminPelletier · 2022-03-16T22:40:39Z

When the prober's scd/test_operation_simple_heavy_traffic_concurrent.py test runs with 100 concurrent operations against a real-world, cross-data-center distributed CockroachDB (CRDB) cluster, one operation mutation often fails (as many as 3 failures have been observed) with a contention-type error, usually ABORT_REASON_PUSHER_ABORTED.
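
The shape of such a test can be sketched as a minimal Python harness, assuming each operation mutation is a callable that raises an exception on a contention error. The names run_concurrent_mutations, ContentionError, and fake_mutate are hypothetical and are not taken from the prober's code; the real test issues requests to a DSS instance rather than calling a local function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


class ContentionError(Exception):
    """Hypothetical stand-in for a CRDB contention error (e.g. ABORT_REASON_PUSHER_ABORTED)."""


def run_concurrent_mutations(
    mutate: Callable[[int], None], n_operations: int, max_workers: int
) -> List[int]:
    """Run n_operations mutations concurrently and return the sorted indices
    of those that failed with a ContentionError.

    Any other exception propagates to the caller.
    """
    failed: List[int] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the index of the operation it mutates.
        futures = {pool.submit(mutate, i): i for i in range(n_operations)}
        for future in as_completed(futures):
            try:
                future.result()
            except ContentionError:
                failed.append(futures[future])
    return sorted(failed)


if __name__ == "__main__":
    def fake_mutate(i: int) -> None:
        # Simulate two operations losing a contention race.
        if i in (3, 7):
            raise ContentionError(f"operation {i}: ABORT_REASON_PUSHER_ABORTED")

    print(run_concurrent_mutations(fake_mutate, n_operations=100, max_workers=20))
    # prints [3, 7]
```

An acceptance criterion for such a harness would then be that the returned list is empty for the chosen concurrency level.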

Even with the number of concurrent operations reduced to 40, a failure appears to occur occasionally despite 50 retries. This is not yet verified: the failure was observed before #740 was merged, and it is #740 that confirms the 50 retries.
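
Retrying on contention can be sketched as follows. This is a minimal Python illustration, not the DSS's or the prober's actual retry logic: it assumes contention surfaces as a retryable exception (CockroachDB reports transaction retry errors such as ABORT_REASON_PUSHER_ABORTED with SQLSTATE 40001), and run_with_retries and ContentionError are hypothetical names. The default of 50 retries mirrors the figure mentioned above; whether the real system waits between retries is not stated in this issue.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ContentionError(Exception):
    """Hypothetical retryable error, e.g. a CockroachDB SQLSTATE 40001 failure."""


def run_with_retries(
    attempt: Callable[[], T],
    max_retries: int = 50,
    base_delay_s: float = 0.01,
    max_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt(), retrying on ContentionError up to max_retries times.

    Between attempts, sleeps for a random duration drawn from
    [0, min(max_delay_s, base_delay_s * 2**retry)] (capped exponential
    backoff with full jitter).
    """
    for retry in range(max_retries + 1):
        try:
            return attempt()
        except ContentionError:
            if retry == max_retries:
                raise  # retries exhausted: surface the contention error
            delay = min(max_delay_s, base_delay_s * (2 ** retry))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The randomized delay is one common way to keep transactions that collided once from colliding again in lockstep; whether it would reduce the failures described here has not been tested.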

Although supporting even 10-20 concurrent operations should serve all foreseeable medium-term deployments, we should understand this failure better and identify a mitigation that enables future scaling. In the short term, we should reduce the number of concurrent operations in the test to align its acceptance criteria with current product needs. In the long term, we should mitigate the issue and restore the higher concurrency limit.

BenjaminPelletier added the P2 (Normal priority) and feature (Issue would improve software) labels Mar 16, 2022

BenjaminPelletier mentioned this issue Mar 16, 2022

[prober/scd] Align concurrency test acceptance criteria with short-term product needs #743 (Merged)

BenjaminPelletier added the dss (Relating to one of the DSS implementations) label Sep 20, 2022

BenjaminPelletier mentioned this issue Sep 20, 2022

Fix database indices #813 (Open)

This was referenced Feb 23, 2024

[dss] Reduce contention on OI creation/mutation endpoint #1004 (Merged)

SCD Traffic concurrent prober check fails on cloud deployment #1002 (Closed)

Shastick mentioned this issue Sep 10, 2024

[internal] prioritized issues for Q3-2024 #1110 (Closed)
