kv/kvserver: TestFlowControlRaftSnapshotV2 failed #132642
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ a1b013e763abd1454603985996a45663b6e6bcad:
kv/kvserver.TestFlowControlRaftSnapshotV2 failed with artifacts on master @ e49b56cc0e983ca2ec9a80ea898dbba60a1c6992:
…a.mu This method will eventually acquire replicaSendStream.mu, which needs to be ordered before Replica.mu. Fixes cockroachdb#132637 cockroachdb#132638 cockroachdb#132639 cockroachdb#132640 cockroachdb#132641 cockroachdb#132642 cockroachdb#132643 cockroachdb#132644 cockroachdb#132645 cockroachdb#132646 cockroachdb#132647 cockroachdb#132648 cockroachdb#132649 Epic: CRDB-37515 Release note: None
132676: kvserver: don't call Processor.AdmitRaftMuLocked while holding Replic… r=kvoli,pav-kv a=sumeerbhola
…a.mu
This method will eventually acquire replicaSendStream.mu, which needs to be ordered before Replica.mu.
Fixes #132637 #132638 #132639 #132640 #132641 #132642 #132643 #132644 #132645 #132646 #132647 #132648 #132649
Epic: CRDB-37515
Release note: None
Co-authored-by: sumeerbhola <[email protected]>
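The lock-ordering constraint in this change can be illustrated with a minimal Go sketch. It is not the CockroachDB code: the `replica` and `sendStream` types, their fields, and the `admit` and `admitStable` methods are hypothetical stand-ins, and the sketch only assumes that the stream's mutex must never be acquired while the replica's mutex is held.

```go
package main

import (
	"fmt"
	"sync"
)

// sendStream is a hypothetical stand-in for replicaSendStream.
type sendStream struct {
	mu       sync.Mutex
	admitted uint64
}

// admit acquires sendStream.mu. Because sendStream.mu is ordered before
// replica.mu, callers must not hold replica.mu when calling it.
func (s *sendStream) admit(index uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if index > s.admitted {
		s.admitted = index
	}
}

// replica is a hypothetical stand-in for Replica.
type replica struct {
	mu          sync.Mutex
	stableIndex uint64
	stream      *sendStream
}

// admitStable copies the state it needs under replica.mu, releases
// replica.mu, and only then calls into the stream.
func (r *replica) admitStable() uint64 {
	r.mu.Lock()
	idx := r.stableIndex
	r.mu.Unlock()

	r.stream.admit(idx) // replica.mu is no longer held here
	return idx
}

func main() {
	r := &replica{stableIndex: 7, stream: &sendStream{}}
	fmt.Println(r.admitStable())
}
```

Copying the value out and unlocking before the call keeps the two mutexes from ever being held in the reverse order, at the cost of the stream seeing a value that may already be slightly stale.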
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 025adb55b1d9d0072bee175c6d0581fc5d392b11:
Most recent failure cc @sumeerbhola:
…ing called with Replica.mu held
It isn't convenient to not hold Replica.mu in the caller so instead we avoid needing replicaSendStream.mu (which is and must be ordered before Replica.mu). This is done by lifting replicaSendStream.mu.Tracker out of the mu struct.
Additional changes:
- All methods in replicaState and replicaSendStream are named to include what locks are held. This makes them verbose, but it is important for correctness.
- Assertions are added for replicaSendStream.mu being held.
- Todos are added to make Replica.raftMu and Replica.mu assertions free in replica_rac2 and rac2 code, and once that is done to add more assertions in rac2.
- Todo is added to lift some more fields in replicaSendStream from inside mu (the main reason we need mu is for replicaSendStream.Notify). This todo is ordered after the previous one (more assertions).
Fixes cockroachdb#132646, cockroachdb#132642
Epic: CRDB-37515
Release note: None
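A rough Go sketch of the "lift the field out of mu" approach follows. All names are hypothetical, and it assumes (the message does not say so explicitly) that the lifted field can be guarded by a lock every caller already holds, here called `raftMu`, so that reading it never needs the stream's own mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// sendStream is a hypothetical stand-in for replicaSendStream.
type sendStream struct {
	// tracker has been lifted out of mu. In this sketch it is guarded by
	// the replica's raftMu, which every caller is assumed to hold.
	tracker map[uint64]int64 // log index -> tokens held

	mu struct {
		sync.Mutex
		closed bool // state that still needs the stream's own mutex
	}
}

// holdsTokensRaftMuLocked reads only raftMu-guarded state, so it never
// acquires s.mu and is safe to call while replica.mu is held.
func (s *sendStream) holdsTokensRaftMuLocked() bool {
	return len(s.tracker) > 0
}

// replica is a hypothetical stand-in for Replica.
type replica struct {
	raftMu sync.Mutex
	mu     sync.Mutex
	stream *sendStream
}

// holdsSendTokens holds both raftMu and mu across the call, which is the
// situation the commit message describes as inconvenient to avoid.
func (r *replica) holdsSendTokens() bool {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.stream.holdsTokensRaftMuLocked()
}

func main() {
	r := &replica{stream: &sendStream{tracker: map[uint64]int64{10: 1024}}}
	fmt.Println(r.holdsSendTokens())
}
```

The method name carries the lock it expects, mirroring the naming convention listed under "Additional changes"; the name is verbose, but the precondition is visible at every call site.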
132132: replica_rac2: check raft state instead of lead==self r=sumeerbhola a=pav-kv
Raft `RawNode` can step down from leadership, and yet retain a record that it was the leader of the current term. This makes the `Lead() == self.ID` check not robust to step downs. To understand whether the `RawNode` is acting as the leader, we must consult its raft state explicitly.
Epic: none
Release note: none

132712: changefeedccl: fix failure to updating PTS in retryable errors r=rharding6373 a=asg0451
Previously, in the face of retryable errors updating PTS records, the records would not be updated due to mismanagement of state.
Fixes: #132602
Release note (bug fix): Fixed an issue where changefeeds would fail to update protected timestamp records in the face of retryable errors.

132763: rac2,replica_rac2: fix deadlock due to HoldsSendTokensRaftMuLocked be… r=kvoli,pav-kv a=sumeerbhola
…ing called with Replica.mu held
It isn't convenient to not hold Replica.mu in the caller so instead we avoid needing replicaSendStream.mu (which is and must be ordered before Replica.mu). This is done by lifting replicaSendStream.mu.Tracker out of the mu struct.
Additional changes:
- All methods in replicaState and replicaSendStream are named to include what locks are held. This makes them verbose, but it is important for correctness.
- Assertions are added for replicaSendStream.mu being held.
- Todos are added to make Replica.raftMu and Replica.mu assertions free in replica_rac2 and rac2 code, and once that is done to add more assertions in rac2.
- Todo is added to lift some more fields in replicaSendStream from inside mu (the main reason we need mu is for replicaSendStream.Notify). This todo is ordered after the previous one (more assertions).
Fixes #132646, #132642
Epic: CRDB-37515
Release note: None

132765: drtprod: remove rollback for drt-scale r=nameisbhaskar a=vidit-bhat
This PR removes rollback in case of any failures in the `drt_scale.yaml`. We want to see the error logs and check in the fixes. Also, added a fail safe to remove `certs-$CLUSTER` to avoid putting any old certs on the machine.
Epic: none
Release note: None

Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Miles Frankel <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Vidit Bhat <[email protected]>
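The leadership check described in 132132 can be sketched in Go as follows. This does not use the real raft package: `raftState`, `status`, and both predicate functions are hypothetical, and the sketch only assumes the behaviour the message states, namely that a node which has stepped down may still record itself as the leader of the current term.

```go
package main

import "fmt"

// raftState is a hypothetical, simplified version of a raft node's role.
type raftState int

const (
	stateFollower raftState = iota
	stateLeader
)

// status is a hypothetical snapshot of a node's view of the group.
type status struct {
	id    uint64    // this node's ID
	lead  uint64    // recorded leader of the current term
	state raftState // role the node is currently acting in
}

// isLeaderByLead is the check the message calls not robust: it only
// compares the recorded leader with the node's own ID.
func isLeaderByLead(s status) bool { return s.lead == s.id }

// isLeaderByState consults the role explicitly.
func isLeaderByState(s status) bool { return s.state == stateLeader }

func main() {
	// A node that stepped down but still records itself as the term's leader.
	steppedDown := status{id: 1, lead: 1, state: stateFollower}
	fmt.Println(isLeaderByLead(steppedDown), isLeaderByState(steppedDown))
}
```

For the stepped-down node the two predicates disagree, which is the case the state-based check is meant to handle.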
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 6d3f108b4de3ae24ebb1543ca3882144678f8fa2:
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 3eb2bb04abaaaaac92f3f5f2f6952a30ada78de5:
The cause is a send queue forming shortly after n3 receives a snapshot, then immediately disappearing.
`TestFlowControl.*V2` tests assert on exact counters. This can be problematic if benign deltas occur while setting up the test, such as a send queue forming when a new learner is added and then quickly resolving. Clear the token metrics prior to commencing these tests, in order to prevent flakes that result from such deltas in setup.
Fixes: cockroachdb#132642
Release note: None
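The effect of clearing metrics after setup can be shown with a small Go sketch. The `tokenMetrics` type, its methods, and the byte counts are hypothetical and stand in for the real flow-control token metrics; the setup delta models a send queue that briefly forms and drains.

```go
package main

import "fmt"

// tokenMetrics is a hypothetical pair of cumulative counters.
type tokenMetrics struct {
	deducted int64
	returned int64
}

func (m *tokenMetrics) deduct(n int64)   { m.deducted += n }
func (m *tokenMetrics) giveBack(n int64) { m.returned += n }

// clear resets all counters so later assertions see only new activity.
func (m *tokenMetrics) clear() { *m = tokenMetrics{} }

// runScenario simulates setup noise followed by the workload under test.
func runScenario(clearAfterSetup bool) tokenMetrics {
	var m tokenMetrics

	// Setup: a benign, short-lived delta that nets out to zero but still
	// bumps the cumulative counters.
	m.deduct(512)
	m.giveBack(512)

	if clearAfterSetup {
		m.clear()
	}

	// Workload under test: exactly 1 MiB deducted and returned.
	m.deduct(1 << 20)
	m.giveBack(1 << 20)
	return m
}

func main() {
	fmt.Println(runScenario(false).deducted) // includes the setup delta
	fmt.Println(runScenario(true).deducted)  // only the workload
}
```

Without the clear, an exact-value assertion on `deducted` would only pass when setup happens to produce no delta, which is the kind of timing-dependent flake described above.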
kv/kvserver.TestFlowControlRaftSnapshotV2 failed with artifacts on master @ b37ca45ae2eee82de92b778f4553f9fe8a19603e:
kv/kvserver.TestFlowControlRaftSnapshotV2 failed with artifacts on release-24.3 @ 4cbedefd790c75cb0f21f77ed8d917c8528a7d15:
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on release-24.3 @ 4cbedefd790c75cb0f21f77ed8d917c8528a7d15:
kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ a1b013e763abd1454603985996a45663b6e6bcad:
Parameters:
attempt=1
deadlock=true
run=1
shard=16
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-43198