kv/kvserver: TestFlowControlRaftSnapshotV2 failed #132642

Open
cockroach-teamcity opened this issue Oct 15, 2024 · 11 comments · May be fixed by #132916
Labels: A-testing (Testing tools and infrastructure); branch-master (Failures and bugs on the master branch.); branch-release-24.3 (Used to mark GA and release blockers, technical advisories, and bugs for 24.3); C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.); C-test-failure (Broken test (automatically or manually discovered).); GA-blocker; O-robot (Originated from a bot.); P-1 (Issues/test failures with a fix SLA of 1 month); T-kv (KV Team)

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 15, 2024

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ a1b013e763abd1454603985996a45663b6e6bcad:

github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:162 kvcoord.(*txnSpanRefresher).SendLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go:188 kvcoord.(*txnCommitter).SendLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:319 kvcoord.(*txnPipeliner).SendLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_seq_num_allocator.go:111 kvcoord.(*txnSeqNumAllocator).SendLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_heartbeater.go:265 kvcoord.(*txnHeartbeater).SendLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:533 kvcoord.(*TxnCoordSender).Send ???
github.com/cockroachdb/cockroach/pkg/kv/db.go:1133 kv.(*DB).sendUsingSender ???
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1287 kv.(*Txn).Send ???
github.com/cockroachdb/cockroach/pkg/kv/db.go:964 kv.sendAndFill ???
github.com/cockroachdb/cockroach/pkg/kv/txn.go:802 kv.(*Txn).Run ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:2588 kvserver.execChangeReplicasTxn.func2 ???
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1051 kv.(*Txn).exec ???
github.com/cockroachdb/cockroach/pkg/kv/db.go:1097 kv.runTxn ???
github.com/cockroachdb/cockroach/pkg/kv/db.go:1060 kv.(*DB).TxnWithAdmissionControl ???
github.com/cockroachdb/cockroach/pkg/kv/db.go:1035 kv.(*DB).Txn ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:2452 kvserver.execChangeReplicasTxn ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:2081 kvserver.(*Replica).execReplicationChangesForVoters ???

goroutine 5125447 lock 0xc006f98468
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_create_replica.go:105 kvserver.(*Store).tryGetReplica ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_create_replica.go:104 kvserver.(*Store).tryGetReplica ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_create_replica.go:167 kvserver.(*Store).tryGetOrCreateReplica ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_create_replica.go:73 kvserver.(*Store).getOrCreateReplica ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:356 kvserver.(*Store).withReplicaForRequest ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:628 kvserver.(*Store).processRequestQueue ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:397 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???

goroutine 5125447 lock 0xc006f98a68
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:2296 kvserver.(*Replica).withRaftGroup ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:2295 kvserver.(*Replica).withRaftGroup ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:610 kvserver.(*Replica).stepRaftGroupRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:406 kvserver.(*Store).processRaftRequestWithReplica ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:626 kvserver.(*Store).processRequestQueue.func1 ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:361 kvserver.(*Store).withReplicaForRequest ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:628 kvserver.(*Store).processRequestQueue ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:397 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???

goroutine 56 lock 0xc00f51a198
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:749 admission.(*GrantCoordinator).CPULoad ??? <<<<<
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:748 admission.(*GrantCoordinator).CPULoad ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:231 goschedstats.(*schedStatsTicker).getStatsOnTick ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:161 goschedstats.init.0.func1 ???



Parameters:

  • attempt=1
  • deadlock=true
  • run=1
  • shard=16
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-43198

cockroach-teamcity added the branch-master, C-test-failure, O-robot, release-blocker, and T-kv labels on Oct 15, 2024
@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ a1b013e763abd1454603985996a45663b6e6bcad:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (23.83s)

Parameters:

  • attempt=1
  • race=true
  • run=1
  • shard=16

nvanbenschoten added the C-bug label on Oct 15, 2024
@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed with artifacts on master @ e49b56cc0e983ca2ec9a80ea898dbba60a1c6992:

        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (4.28s)
=== RUN   TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2
    --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2 (14.28s)

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 15, 2024
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 15, 2024
craig bot pushed a commit that referenced this issue Oct 15, 2024
132676: kvserver: don't call Processor.AdmitRaftMuLocked while holding Replica.mu r=kvoli,pav-kv a=sumeerbhola

This method will eventually acquire replicaSendStream.mu, which needs to be ordered before Replica.mu.

Fixes #132637 #132638 #132639 #132640 #132641 #132642 #132643 #132644 #132645 #132646 #132647 #132648 #132649

Epic: CRDB-37515

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
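
The ordering rule in the commit message above (replicaSendStream.mu before Replica.mu) implies that code holding Replica.mu must release it before calling anything that takes replicaSendStream.mu. A minimal sketch of that pattern follows. The types `replica` and `sendStream`, the field `stableIdx`, and the methods `admit` and `admitAfterUnlock` are hypothetical stand-ins for illustration, not the actual kvserver or rac2 code.

```go
package main

import (
	"fmt"
	"sync"
)

// sendStream is a hypothetical stand-in for a type guarded by its own mutex,
// playing the role of replicaSendStream in the commit message.
type sendStream struct {
	mu       sync.Mutex
	admitted uint64
}

// admit takes sendStream.mu. Under the stated lock order, callers must not
// already hold replica.mu when they call it.
func (s *sendStream) admit(index uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if index > s.admitted {
		s.admitted = index
	}
}

// replica is a hypothetical stand-in for a type guarded by Replica.mu.
type replica struct {
	mu        sync.Mutex
	stableIdx uint64
	stream    *sendStream
}

// admitAfterUnlock copies the state it needs under replica.mu, releases
// replica.mu, and only then calls into the stream. replica.mu is never held
// while sendStream.mu is acquired.
func (r *replica) admitAfterUnlock() uint64 {
	r.mu.Lock()
	idx := r.stableIdx
	r.mu.Unlock()

	r.stream.admit(idx)
	return idx
}

func main() {
	r := &replica{stableIdx: 7, stream: &sendStream{}}
	fmt.Println(r.admitAfterUnlock()) // prints 7
}
```

The trade-off in this sketch is that `idx` can be stale by the time `admit` runs, which is why `admit` only ever moves `admitted` forward.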
@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 025adb55b1d9d0072bee175c6d0581fc5d392b11:

          kvflowcontrol.tokens.eval.elastic.returned                        | 10 MiB   
          kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 4.0 MiB  
          kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
          kvflowcontrol.tokens.eval.regular.available                       | 80 MiB   
          kvflowcontrol.tokens.eval.regular.deducted                        | 0 B      
          kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
          kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
          kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
          kvflowcontrol.tokens.send.elastic.available                       | 40 MiB   
          kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
          kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
          kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
          kvflowcontrol.tokens.send.elastic.returned                        | 10 MiB   
          kvflowcontrol.tokens.send.elastic.returned.disconnect             | 4.0 MiB  
          kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
          kvflowcontrol.tokens.send.regular.available                       | 80 MiB   
          kvflowcontrol.tokens.send.regular.deducted                        | 0 B      
          kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
          kvflowcontrol.tokens.send.regular.returned                        | 0 B      
          kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
          kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
        
        
        -- Observe the total tracked tokens per-stream on n1; there should be nothing.
        SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
           FROM crdb_internal.kv_flow_control_handles_v2
        
          range_id | store_id | total_tracked_tokens  
        -----------+----------+-----------------------
          70       | 1        | 0 B                   
          70       | 2        | 0 B                   
          70       | 3        | 0 B                   
          70       | 4        | 0 B                   
          70       | 5        | 0 B                   
        
        
        -- Another view of tokens, using /inspectz-backed vtables.
        SELECT store_id,
        	   crdb_internal.humanize_bytes(available_eval_regular_tokens),
        	   crdb_internal.humanize_bytes(available_eval_elastic_tokens)
          FROM crdb_internal.kv_flow_controller_v2
         ORDER BY store_id ASC;
        
          range_id | eval_regular_available | eval_elastic_available  
        -----------+------------------------+-------------------------
          1        | 16 MiB                 | 8.0 MiB                 
          2        | 16 MiB                 | 8.0 MiB                 
          3        | 16 MiB                 | 8.0 MiB                 
          4        | 16 MiB                 | 8.0 MiB                 
          5        | 16 MiB                 | 8.0 MiB                 
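
The aggregate figures in these dumps are consistent with simple per-stream arithmetic: five streams at 16 MiB regular and 8.0 MiB elastic give 80 MiB and 40 MiB, and the earlier dumps with 10 MiB deducted show 70 MiB and 30 MiB. The sketch below checks that arithmetic. It assumes the aggregate `available` value is the per-stream limit summed over streams, reduced by `deducted` and increased by `returned`. That relationship is inferred from the numbers in this issue, not taken from the kvflowcontrol code, and the function `available` is a hypothetical helper.

```go
package main

import "fmt"

// MiB is one mebibyte in bytes.
const MiB = int64(1) << 20

// available models an aggregate token counter under the assumption stated
// above: streams * perStream - deducted + returned.
func available(streams int, perStream, deducted, returned int64) int64 {
	return int64(streams)*perStream - deducted + returned
}

func main() {
	// Earlier dumps: 10 MiB deducted, nothing returned yet.
	fmt.Println(available(5, 16*MiB, 10*MiB, 0) / MiB) // prints 70
	fmt.Println(available(5, 8*MiB, 10*MiB, 0) / MiB)  // prints 30

	// Latest dump: regular untouched, elastic deducted and fully returned.
	fmt.Println(available(5, 16*MiB, 0, 0) / MiB)          // prints 80
	fmt.Println(available(5, 8*MiB, 10*MiB, 10*MiB) / MiB) // prints 40
}
```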

Parameters:

  • attempt=1
  • deadlock=true
  • run=1
  • shard=17

kvoli added the GA-blocker label and removed the release-blocker label on Oct 16, 2024
@kvoli
Collaborator

kvoli commented Oct 16, 2024

Most recent failure cc @sumeerbhola:

=== RUN   TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all
POTENTIAL DEADLOCK: Inconsistent locking. saw this ordering in one goroutine:
happened before
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:2117 rac2.(*replicaState).handleReadyEntries ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:2116 rac2.(*replicaState).handleReadyEntries ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:1080 rac2.(*rangeController).HandleRaftEventRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go:775 replica_rac2.(*processorImpl).HandleRaftReadyRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1005 kvserver.(*Replica).handleRaftReadyRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:837 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:681 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
happened after
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:2085 kvserver.(*Replica).sendRaftMessage ??? <<<<<
github.com/sasha-s/go-deadlock/external/com_github_sasha_s_go_deadlock/deadlock.go:116 go-deadlock.(*RWMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1568 kvserver.(*Replica).SendMsgApp ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:2602 rac2.(*replicaSendStream).isEmptySendQueueLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:2121 rac2.(*replicaState).handleReadyEntries ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:1080 rac2.(*rangeController).HandleRaftEventRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go:775 replica_rac2.(*processorImpl).HandleRaftReadyRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1005 kvserver.(*Replica).handleRaftReadyRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:837 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:681 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
in another goroutine: happened before
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1441 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/sasha-s/go-deadlock/external/com_github_sasha_s_go_deadlock/deadlock.go:116 go-deadlock.(*RWMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
happened after
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:1897 rac2.(*replicaSendStream).holdsTokens ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:1896 rac2.(*replicaSendStream).holdsTokens ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2/range_controller.go:1337 rac2.(*rangeController).HoldsSendTokensRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/replica_rac2/processor.go:1097 replica_rac2.(*processorImpl).HoldsSendTokensRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:606 kvserver.(*Replica).hasSendTokensRaftMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft_quiesce.go:338 kvserver.shouldReplicaQuiesceRaftMuLockedReplicaMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft_quiesce.go:195 kvserver.(*Replica).maybeQuiesceRaftMuLockedReplicaMuLocked ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1469 kvserver.(*Replica).tick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
Other goroutines holding locks:
goroutine 4437633 lock 0xc00121f6e8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1430 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1429 kvserver.(*Replica).tick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437562 lock 0xc002747be8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1430 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1429 kvserver.(*Replica).tick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437620 lock 0xc02a0f3ce8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1441 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/sasha-s/go-deadlock/external/com_github_sasha_s_go_deadlock/deadlock.go:116 go-deadlock.(*RWMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 32 lock 0xc01c4190b8
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:749 admission.(*GrantCoordinator).CPULoad ??? <<<<<
github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:748 admission.(*GrantCoordinator).CPULoad ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:231 goschedstats.(*schedStatsTicker).getStatsOnTick ???
github.com/cockroachdb/cockroach/pkg/util/goschedstats/runnable.go:161 goschedstats.init.0.func1 ???
goroutine 4437447 lock 0xc0284e56e8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1430 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1429 kvserver.(*Replica).tick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437447 lock 0xc0284e5ce8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1441 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/sasha-s/go-deadlock/external/com_github_sasha_s_go_deadlock/deadlock.go:116 go-deadlock.(*RWMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437562 lock 0xc0027481e8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1441 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/sasha-s/go-deadlock/external/com_github_sasha_s_go_deadlock/deadlock.go:116 go-deadlock.(*RWMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437626 lock 0xc00c8fa468
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:836 kvserver.(*Replica).handleRaftReady ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:835 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:681 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437903 lock 0xc005e971e8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:836 kvserver.(*Replica).handleRaftReady ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:835 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:681 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437620 lock 0xc02a0f36e8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1430 kvserver.(*Replica).tick ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1429 kvserver.(*Replica).tick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437447 lock 0xc03489a2e0
github.com/cockroachdb/cockroach/pkg/util/hlc/pkg/util/hlc/hlc.go:424 hlc.(*Clock).NowAsClockTimestamp ??? <<<<<
github.com/cockroachdb/cockroach/pkg/util/hlc/pkg/util/hlc/hlc.go:423 hlc.(*Clock).NowAsClockTimestamp ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:1465 kvserver.(*Replica).tick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:710 kvserver.(*Store).processTick ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:410 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???
goroutine 4437626 lock 0xc00c8faa68
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:919 kvserver.(*Replica).handleRaftReadyRaftMuLocked ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica.go:181 kvserver.(*ReplicaMutex).Lock ???
github.com/sasha-s/go-deadlock/external/com_github_sasha_s_go_deadlock/deadlock.go:116 go-deadlock.(*RWMutex).Lock ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:837 kvserver.(*Replica).handleRaftReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:681 kvserver.(*Store).processReady ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:420 kvserver.(*raftSchedulerShard).worker ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:319 kvserver.(*raftScheduler).Start.func2 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:498 stop.(*Stopper).RunAsyncTaskEx.func2 ???

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 16, 2024
…ing called with Replica.mu held

It isn't convenient to not hold Replica.mu in the caller so instead we
avoid needing replicaSendStream.mu (which is and must be ordered before
Replica.mu). This is done by lifting replicaSendStream.mu.Tracker out
of the mu struct.

Additional changes:
- All methods in replicaState and replicaSendStream are named to include
  what locks are held. This makes them verbose, but it is important for
  correctness.
- Assertions are added for replicaSendStream.mu being held.
- Todos are added to make Replica.raftMu and Replica.mu assertions free
  in replica_rac2 and rac2 code, and once that is done to add more
  assertions in rac2.
- Todo is added to lift some more fields in replicaSendStream from
  inside mu (the main reason we need mu is for replicaSendStream.Notify).
  This todo is ordered after the previous one (more assertions).

Fixes cockroachdb#132646, cockroachdb#132642

Epic: CRDB-37515

Release note: None
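
The locking change described in this commit message can be illustrated with a minimal sketch. All types and names below (`replica`, `sendStream`, `tracked`, `holdsTokensRaftMuLocked`) are hypothetical stand-ins, not the actual CockroachDB code, and the sketch assumes the lifted field is protected by the caller holding `raftMu`. It shows why a method that only reads a field living outside `mu` can be called with `Replica.mu` held without violating the "`replicaSendStream.mu` before `Replica.mu`" order.

```go
package main

import (
	"fmt"
	"sync"
)

// replica is a hypothetical stand-in for Replica, with the two mutexes
// named in the commit message.
type replica struct {
	raftMu sync.Mutex
	mu     sync.Mutex
}

// sendStream is a hypothetical stand-in for replicaSendStream.
type sendStream struct {
	// tracked has been lifted out of mu. Assumption for this sketch: it is
	// only accessed while the caller holds replica.raftMu.
	tracked int

	mu struct {
		sync.Mutex
		closed bool
	}
}

// holdsTokensRaftMuLocked reads only the lifted field, so it never acquires
// sendStream.mu and is safe to call while replica.mu is held.
func (s *sendStream) holdsTokensRaftMuLocked() bool {
	return s.tracked > 0
}

// close follows the required order: sendStream.mu first, then replica.mu.
func (s *sendStream) close(r *replica) {
	s.mu.Lock()
	defer s.mu.Unlock()
	r.mu.Lock()
	defer r.mu.Unlock()
	s.mu.closed = true
}

func main() {
	r := &replica{}
	s := &sendStream{tracked: 2}

	// Caller holds raftMu and mu, as in the deadlocking call path.
	r.raftMu.Lock()
	r.mu.Lock()
	holds := s.holdsTokensRaftMuLocked()
	r.mu.Unlock()
	r.raftMu.Unlock()

	s.close(r)
	fmt.Println("holds send tokens:", holds)
}
```

Had `holdsTokensRaftMuLocked` needed `sendStream.mu`, the caller would acquire the two mutexes in the opposite order from `close`, which is the kind of inversion the deadlock detector reports.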
craig bot pushed a commit that referenced this issue Oct 16, 2024
132132: replica_rac2: check raft state instead of lead==self r=sumeerbhola a=pav-kv

Raft `RawNode` can step down from leadership, and yet retain a record that it was the leader of the current term. This makes the `Lead() == self.ID` check not robust to step downs. To understand whether the `RawNode` is acting as the leader, we must consult its raft state explicitly.

Epic: none
Release note: none
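
The distinction drawn in 132132 can be sketched as follows. The `nodeStatus` type and both helper functions are hypothetical simplifications for illustration, not the raft library's API. The sketch models a node that has stepped down but still records itself as the leader of the current term.

```go
package main

import "fmt"

type raftState int

const (
	stateFollower raftState = iota
	stateLeader
)

// nodeStatus is a hypothetical, simplified view of a raft node.
type nodeStatus struct {
	id    uint64
	lead  uint64    // leader recorded for the current term
	state raftState // role the node is currently acting in
}

// isLeaderByLead mirrors the fragile lead == self check.
func isLeaderByLead(s nodeStatus) bool { return s.lead == s.id }

// isLeaderByState consults the raft state explicitly.
func isLeaderByState(s nodeStatus) bool { return s.state == stateLeader }

func main() {
	// The node stepped down to follower but still remembers lead == self.
	steppedDown := nodeStatus{id: 1, lead: 1, state: stateFollower}
	fmt.Println("lead==self:", isLeaderByLead(steppedDown))
	fmt.Println("state==leader:", isLeaderByState(steppedDown))
}
```

In this scenario only the state-based check reports that the node is no longer acting as leader.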

132712: changefeedccl: fix failure to updating PTS in retryable errors r=rharding6373 a=asg0451

Previously, in the face of retryable errors
updating PTS records, the records would not be
updated due to mismanagement of state.

Fixes: #132602

Release note (bug fix): Fixed an issue where
changefeeds would fail to update protected
timestamp records in the face of retryable errors.


132763: rac2,replica_rac2: fix deadlock due to HoldsSendTokensRaftMuLocked be… r=kvoli,pav-kv a=sumeerbhola

…ing called with Replica.mu held

It isn't convenient to not hold Replica.mu in the caller so instead we avoid needing replicaSendStream.mu (which is and must be ordered before Replica.mu). This is done by lifting replicaSendStream.mu.Tracker out of the mu struct.

Additional changes:
- All methods in replicaState and replicaSendStream are named to include what locks are held. This makes them verbose, but it is important for correctness.
- Assertions are added for replicaSendStream.mu being held.
- Todos are added to make Replica.raftMu and Replica.mu assertions free in replica_rac2 and rac2 code, and once that is done to add more assertions in rac2.
- Todo is added to lift some more fields in replicaSendStream from inside mu (the main reason we need mu is for replicaSendStream.Notify). This todo is ordered after the previous one (more assertions).

Fixes #132646, #132642

Epic: CRDB-37515

Release note: None

132765: drtprod: remove rollback for drt-scale r=nameisbhaskar a=vidit-bhat

This PR removes rollback in case of any failures
in the `drt_scale.yaml`. We want to see the error
logs and check in the fixes. Also, added a fail safe to remove 
`certs-$CLUSTER` to avoid putting any old certs on the machine.

Epic: none
Release note: None

Co-authored-by: Pavel Kalinnikov
Co-authored-by: Miles Frankel
Co-authored-by: sumeerbhola
Co-authored-by: Vidit Bhat
@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 6d3f108b4de3ae24ebb1543ca3882144678f8fa2:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (24.05s)

Parameters:

  • attempt=1
  • race=true
  • run=1
  • shard=17
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 3eb2bb04abaaaaac92f3f5f2f6952a30ada78de5:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 8.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (57.91s)

Parameters:

  • attempt=1
  • race=true
  • run=1
  • shard=17

@kvoli kvoli added P-1 Issues/test failures with a fix SLA of 1 month branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 A-testing Testing tools and infrastructure labels Oct 18, 2024
@kvoli
Collaborator

kvoli commented Oct 18, 2024

The cause is a send queue that forms shortly after n3 applies a snapshot and then immediately drains, as the logs below show:

I241018 14:59:17.713487 48217 kv/kvserver/replica_raftstorage.go:521 ⋮ [T1,Vsystem,n3,s3,r70/3:‹/{Table/Max-Max}›] 3793  applied snapshot f06c9078 from (n1,s1):1 at applied index 15 ‹as write ›(total=3ms data=850 B ingestion=6@1ms)
I241018 14:59:17.721517 41483 kv/kvserver/kvflowcontrol/rac2/range_controller.go:2028 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,raft] 3794  creating send stream t‹1›/s‹3› for replica (n3,s3):3LEARNER
I241018 14:59:17.722846 47803 kv/kvserver/kvflowcontrol/rac2/token_counter.go:650 ⋮ [-] 3795  adjusted send flow tokens (wc=elastic stream=t‹1›/s‹3› delta=‹-4.0 KiB› flag=‹normal›): regular=‹+16 MiB› elastic=‹+8.0 MiB›
I241018 14:59:17.723214 41485 kv/kvserver/kvflowcontrol/rac2/range_controller.go:1992 ⋮ [T1,Vsystem,n1,s1,r70/1:‹/{Table/Max-Max}›,raft] 3796  r70:(n1,s1):1 stream t‹1›/s‹1› admit term:6, admitted:[LowPri:17,NormalPri:17,AboveNormalPri:17,HighPri:17]
I241018 14:59:17.723696 41485 kv/kvserver/kvflowcontrol/rac2/token_counter.go:650 ⋮ [T1,Vsystem,n1] 3797  adjusted send flow tokens (wc=elastic stream=t‹1›/s‹3› delta=‹+4.0 KiB› flag=‹normal›): regular=‹+16 MiB› elastic=‹+8.0 MiB›
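
The effect of the `-4.0 KiB` / `+4.0 KiB` pair in the logs above on an exact-match assertion can be sketched as follows. The `tokenCounter` type is a hypothetical model, not the `rac2` implementation, and it assumes the `deducted` and `returned` metrics are cumulative counters while `available` is a gauge.

```go
package main

import "fmt"

// tokenCounter is a hypothetical model of one stream's send tokens.
type tokenCounter struct {
	available int64 // current balance
	deducted  int64 // cumulative tokens deducted
	returned  int64 // cumulative tokens returned
}

// adjust applies a signed delta to the balance and records it in the
// matching cumulative counter.
func (c *tokenCounter) adjust(delta int64) {
	c.available += delta
	if delta < 0 {
		c.deducted += -delta
	} else {
		c.returned += delta
	}
}

func main() {
	const kib = 1 << 10
	const mib = 1 << 20

	c := &tokenCounter{available: 8 * mib}
	c.adjust(-4 * kib) // send queue forms
	c.adjust(+4 * kib) // send queue drains
	fmt.Printf("available=%d B deducted=%d B returned=%d B\n",
		c.available, c.deducted, c.returned)
}
```

Under these assumptions the balance ends where it started, but the cumulative `returned` counter is no longer zero, which matches the `4.0 KiB` versus `0 B` difference in the failing diffs.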

kvoli added a commit to kvoli/cockroach that referenced this issue Oct 18, 2024
`TestFlowControl.*V2` tests assert on exact counters. This can be
problematic if benign deltas occur while setting up the test, such as a
send queue forming when adding a new learner but being quickly
resolved.

Clear the token metrics prior to commencing these tests, in order to
prevent flakes that result from such deltas in setup.

Fixes: cockroachdb#132642
Release note: None
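
The approach described in this commit message can be sketched as follows. The `tokenMetrics` type and the `setup`, `workload`, and `reset` helpers are hypothetical and only illustrate the idea of clearing cumulative counters after setup and before the asserted workload; they are not the test's actual helpers.

```go
package main

import "fmt"

// tokenMetrics is a hypothetical set of cumulative token counters.
type tokenMetrics struct {
	deducted int64
	returned int64
}

// reset clears all counters so that later assertions see only the deltas
// produced after this point.
func (m *tokenMetrics) reset() { *m = tokenMetrics{} }

// setup simulates a benign delta during test setup, such as a short-lived
// send queue when a learner is added.
func setup(m *tokenMetrics) {
	m.deducted += 4 << 10
	m.returned += 4 << 10
}

// workload simulates the writes whose token accounting the test asserts on.
func workload(m *tokenMetrics) {
	m.deducted += 1 << 20
}

func main() {
	m := &tokenMetrics{}
	setup(m)
	m.reset() // drop setup noise before the asserted section
	workload(m)
	fmt.Printf("deducted=%d B returned=%d B\n", m.deducted, m.returned)
}
```

With the reset in place, the counters reflect only the workload, so an exact comparison no longer depends on whether the setup-time delta happened.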
@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed with artifacts on master @ b37ca45ae2eee82de92b778f4553f9fe8a19603e:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (8.60s)

@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed with artifacts on release-24.3 @ 4cbedefd790c75cb0f21f77ed8d917c8528a7d15:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (4.96s)

@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on master @ 472ea07a5232c98536293d13bb46cca59f9f2cd0:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (24.48s)

Parameters:

  • attempt=1
  • race=true
  • run=2
  • shard=18

@cockroach-teamcity
Member Author

kv/kvserver.TestFlowControlRaftSnapshotV2 failed on release-24.3 @ 4cbedefd790c75cb0f21f77ed8d917c8528a7d15:

        -  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.eval.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B     
        -  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B     
        -  kvflowcontrol.tokens.send.regular.available                       | 70 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB  
        -  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned                        | 0 B     
        -  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B     
        -  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B     
        +  kvflowcontrol.tokens.eval.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.eval.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.elastic.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.eval.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.eval.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.eval.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.eval.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.eval.regular.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.available                       | 30 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.elastic.deducted.force_flush_send_queue | 0 B      
        +  kvflowcontrol.tokens.send.elastic.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.elastic.returned                        | 4.0 KiB  
        +  kvflowcontrol.tokens.send.elastic.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.elastic.unaccounted                     | 0 B      
        +  kvflowcontrol.tokens.send.regular.available                       | 70 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted                        | 10 MiB   
        +  kvflowcontrol.tokens.send.regular.deducted.prevent_send_queue     | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned                        | 0 B      
        +  kvflowcontrol.tokens.send.regular.returned.disconnect             | 0 B      
        +  kvflowcontrol.tokens.send.regular.unaccounted                     | 0 B      
         
         
         -- Observe the total tracked tokens per-stream on n1. 2MiB is tracked for n1-n5;
         -- see last comment for an explanation why we're still deducting for n2, n3.
         SELECT range_id, store_id, crdb_internal.humanize_bytes(total_tracked_tokens::INT8)
        --- FAIL: TestFlowControlRaftSnapshotV2/v2_enabled_when_leader_level=2/kvadmission.flow_control.mode=apply_to_all (26.69s)

Parameters:

  • attempt=1
  • race=true
  • run=1
  • shard=17
