When the prober's `scd/test_operation_simple_heavy_traffic_concurrent.py` test runs against a real-world, cross-data-center distributed CRDB cluster with 100 concurrent operations, one operation mutation often fails (as many as 3 failures have been observed) with a contention-type error, usually ABORT_REASON_PUSHER_ABORTED.
Even with the number of concurrent operations reduced to 40, a failure still appears to occur occasionally despite 50 retries. This is not yet verified: the failure was observed before #740 was merged, and #740 is the change that ensures the 50 retries.
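To make the retry budget concrete, here is a minimal sketch of a bounded retry loop for contention errors, assuming a retryable error type and exponential backoff with jitter. `ContentionError`, `run_with_retries`, and `FlakyOp` are hypothetical names for illustration only; this is not the code from #740, and the backoff strategy is an assumption, not necessarily what the DSS does.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ContentionError(Exception):
    """Hypothetical stand-in for a retryable CRDB contention error,
    e.g. one reported with ABORT_REASON_PUSHER_ABORTED."""


def run_with_retries(
    op: Callable[[], T],
    max_retries: int = 50,
    base_delay_s: float = 0.01,
    max_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on ContentionError up to max_retries extra times.

    Waits a random ("full jitter") delay between attempts, capped by an
    exponentially growing bound. Once the budget is exhausted, the last
    ContentionError propagates to the caller.
    """
    attempt = 0
    while True:
        try:
            return op()
        except ContentionError:
            if attempt >= max_retries:
                raise  # retry budget exhausted: surface the failure
            bound = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, bound))
            attempt += 1


class FlakyOp:
    """Hypothetical operation that hits contention a fixed number of times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ContentionError("simulated ABORT_REASON_PUSHER_ABORTED")
        return "ok"


if __name__ == "__main__":
    op = FlakyOp(failures=3)
    print(run_with_retries(op, sleep=lambda _: None))  # prints "ok"
    print(op.calls)  # prints 4 (3 contended attempts + 1 success)
```

Because each retry re-runs the whole operation against the same contended rows, a fixed budget like this only bounds how long an operation keeps trying; under sustained contention an unlucky operation can still exhaust all 50 retries, which would be consistent with the occasional failures described above.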
While support for even 10-20 concurrent operations should serve all foreseeable medium-term deployments, we should understand this failure better and identify a mitigation that enables future scaling. In the short term, we should reduce the number of concurrent operations in the test to better align its acceptance criteria with current product needs. In the long term, we should mitigate the issue and restore the higher concurrency limit.