debug_assert causes leader to panic #898
Thank you for your detailed report. By the current design this should not be allowed: because the removed and then restarted node has unexpectedly changed its data, the leader's bookkeeping about that follower is no longer valid. In a distributed system like Openraft, a node must guarantee that its persisted state is never lost or rolled back. However, we understand that in certain scenarios, like testing or special deployments, such a reversion can be acceptable. To address your specific use case, I propose adding a feature flag that relaxes this check. Please let me know if you have further questions.
I guess I'm thinking of this slightly differently: in my case, all the nodes are known in advance and configured as a cluster. These clusters are then deployed as an isolated unit of 5 machines. I was thinking of production scenarios, where a node has to be replaced. In production the nodes could be VMs which can die at any time. Being able to bring a new one online to replace a failed one would make this easier to operate, and it does seem to work if we don't panic in the leader. Is there something I'm not considering here that makes this operational plan invalid or dangerous?
The feature flag seems like a good plan though. I must admit I didn't find a good place in the codepath to check for this case.
@pepinns
The standard way is to remove the failed node via a membership change and then add the replacement back as a new node. Another non-standard, tricky way is to set up a cluster of 6 nodes: the quorum is 4 nodes, so you can replace one node with empty state without losing data (a committed entry lives on at least 4 nodes, so after wiping one it still lives on at least 3, and those 3 intersect every future quorum of 4). I'll open a PR to show you where to add the patch to address the panicking issue.
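To make that quorum arithmetic concrete, here is a small standalone illustration (plain Rust, not Openraft code; the numbers are the ones from the 6-node example above):

```rust
// Illustration only, not Openraft code: why a 6-node cluster can survive
// wiping one node's state. Quorum = floor(n / 2) + 1.
fn quorum(n: usize) -> usize {
    n / 2 + 1
}

fn main() {
    let n = 6;
    let q = quorum(n); // 4: every committed entry is on at least 4 nodes
    let copies_after_wipe = q - 1; // wiping one node leaves at least 3 copies
    // Any future quorum of 4 must overlap those 3 remaining copies
    // (3 + 4 > 6), so a committed entry cannot be lost.
    assert!(copies_after_wipe + q > n);
    println!("quorum of {n} nodes = {q}");
}
```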
Thanks for clarifying, I was overlooking the fact that a log entry is committed before it's replicated everywhere.
…res loosen-follower-log-revert`: Add a new feature flag, `loosen-follower-log-revert`, to permit the follower's log to roll back to an earlier state without causing the leader to panic. Although log state reversion is typically seen as a bug, enabling it can be useful for testing or in some special scenarios. For instance, in a cluster with an even number of nodes, erasing a node's data and then rebooting it (its log reverts to empty) will not result in data loss. - Related issue: databendlabs#898
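As a rough illustration of how such a flag could gate the check, here is a minimal standalone sketch (hypothetical function and variable names, not the actual Openraft internals):

```rust
// Minimal sketch, not the actual Openraft code: a feature flag such as
// `loosen-follower-log-revert` can turn a hard assertion into a tolerated,
// logged condition so the leader keeps running when a follower's log reverts.
fn on_update_conflicting(matching: u64, conflicting: u64) {
    // Invariant under normal operation: the reported conflict point never
    // moves behind what the leader already knows is matched on the follower.
    #[cfg(not(feature = "loosen-follower-log-revert"))]
    debug_assert!(
        conflicting >= matching,
        "follower log reverted: conflicting {} < matching {}",
        conflicting,
        matching
    );

    #[cfg(feature = "loosen-follower-log-revert")]
    if conflicting < matching {
        // Tolerate the reversion: log it and let replication back off and
        // resend entries from an earlier index instead of panicking.
        eprintln!(
            "follower log reverted: conflicting {conflicting} < matching {matching}; resetting progress"
        );
    }
}

fn main() {
    // Normal case: conflict point is ahead of the matched index.
    on_update_conflicting(3, 10);
}
```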
@pepinns
Describe the bug
The `debug_assert` in the progress entry module can cause the leader to panic if a node goes offline and then comes back online after its data has been wiped such that its log is erased.
To Reproduce
Steps to reproduce the behavior:
Note that if we remove the `init()` on startup at step 5, the leader still panics, but the follower never recovers.
Expected behavior
I expect that the node coming back online would be brought up to date by the leader.
Actual behavior
The follower returns a Conflict message, which causes the leader to panic due to a `debug_assert` call. Once the leader panics, it is unrecoverable.
Env (please complete the following information):
- Openraft version: `main` branch
- Does the bug still exist in the latest version (`main` branch for the latest release branch, or `release-0.7` for the 0.7 branch)? yes
- Rust toolchain: `cargo 1.65.0-nightly (ce40690a5 2022-08-09)`, `nightly-2023-05-04`
- OS: ubuntu 20.04
- CPU: x86_64
Additional files:
Logs from my service right before the panic
Additional context
The following patch seems to resolve the issue, as it allows the leader to stay online, and the follower eventually gets updated.
The actual offending code could be in the `append_entries` handling on the follower, but I do think the leader should at the very least be resilient to faults and not panic in this case.
There was no error path in this code: it returns a `Result<T, E>`, but `E` is the same type as `T` in this case. The above was the simplest way I could think of to avoid this panic, but ideally, if the code has an error condition, there would be an actual error return path so the leader could react and perhaps fix whatever internal state is causing this situation.
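As a rough sketch of that idea (hypothetical names and a simplified signature, not the actual Openraft code), a distinct error variant would let the leader handle the fault explicitly instead of asserting:

```rust
// Hypothetical sketch of the error-path idea described above: return a
// distinct error when a follower's log appears to have reverted, so the
// caller (the leader's replication handling) can react, e.g. by resetting
// that follower's progress, rather than hitting a debug_assert.
#[derive(Debug)]
enum ProgressError {
    /// The follower reported a conflict index behind the already-matched index.
    LogReverted { matching: u64, conflicting: u64 },
}

fn update_conflicting(matching: u64, conflicting: u64) -> Result<u64, ProgressError> {
    if conflicting < matching {
        return Err(ProgressError::LogReverted { matching, conflicting });
    }
    // Normal case: move the probe window down to the conflict point.
    Ok(conflicting)
}

fn main() {
    // The leader can now handle the fault explicitly instead of panicking.
    match update_conflicting(10, 3) {
        Ok(next) => println!("probe next index {next}"),
        Err(e) => eprintln!("follower fault, resetting progress: {e:?}"),
    }
}
```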