Data inconsistency in etcd version 3.5.0 (3.5.x rollback -> 3.4 upgrade -> 3.5) story #13514
Comments
Interesting. What's the exact etcd version? [edit: see 3.5.0 in the title] In general, the behavior you are describing might happen in: The last changes around this logic I'm aware of were in #12762 (@wpedrak FYI). Things to verify:
I am also thinking that probably the 3 members do not belong to the same cluster at all. Could you execute the following commands to double confirm?
Please also double confirm the etcd and etcdctl versions.
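A minimal clientv3 sketch of such a check (an illustration only, not the commands originally posted, which were lost from this thread; the endpoint list is a placeholder): it asks each endpoint for its status and prints the cluster ID, member ID, and reported server version, so members that print different cluster IDs do not belong to the same cluster.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoints for the three members.
	endpoints := []string{"127.0.0.1:2379", "127.0.0.1:22379", "127.0.0.1:32379"}
	for _, ep := range endpoints {
		cli, err := clientv3.New(clientv3.Config{Endpoints: []string{ep}, DialTimeout: 5 * time.Second})
		if err != nil {
			fmt.Printf("%s: %v\n", ep, err)
			continue
		}
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		st, err := cli.Status(ctx, ep)
		cancel()
		cli.Close()
		if err != nil {
			fmt.Printf("%s: %v\n", ep, err)
			continue
		}
		// Members of one cluster must report the same cluster ID; Version is the
		// server binary version (compare it with what etcdctl version reports).
		fmt.Printf("%s: cluster=%x member=%x version=%s revision=%d raft_term=%d\n",
			ep, st.Header.ClusterId, st.Header.MemberId, st.Version, st.Header.Revision, st.RaftTerm)
	}
}
```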
The fault in the environment has been rectified. My way of avoiding it: the client still uses clientv3 from etcd 3.4. After I downgraded the etcd server to 3.4.15, the problem has not occurred again.
I can't restart etcd because when I do, I get the panic "failed to recover v3 backend from snapshot"; the error is "failed to find database snapshot file (snap: snapshot file doesn't exist)".
Wow, I have hit the same problem. Have you fixed it?
I'm unable to reproduce this with etcd v3.5.1 and etcdctl v3.4 and v3.5. I run a 3-member cluster where each member restarts every 10 seconds, with a periodic get of the revision (every 1 second) and etcdctl check perf to generate load. I have never seen an inconsistent read. Can you provide more detailed reproduction steps?
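A rough sketch of that kind of probe (an illustration, not the exact harness used; the endpoints and interval are assumptions): poll Header.Revision from every endpoint once per second and log the values, so a divergence that never closes after writes stop stands out.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoints for a local 3-member cluster.
	endpoints := []string{"127.0.0.1:2379", "127.0.0.1:22379", "127.0.0.1:32379"}
	cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints, DialTimeout: 5 * time.Second})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for range time.Tick(time.Second) {
		revs := map[string]int64{}
		for _, ep := range endpoints {
			ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
			st, err := cli.Status(ctx, ep)
			cancel()
			if err != nil {
				// A member that is currently restarting is simply skipped.
				log.Printf("%s unavailable: %v", ep, err)
				continue
			}
			revs[ep] = st.Header.Revision
		}
		// Small transient gaps are normal under load; a gap that never closes
		// once writes stop is the inconsistency discussed in this issue.
		log.Printf("revisions: %v", revs)
	}
}
```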
This is a blocker for v3.5.2 so I will focus on this. Help would be appreciated!
With a test specific for this problem on the way and no reproduction, I'm inclined to remove this as a blocker for the v3.5.2 release.
This issue can be closed. I found the reason why the consistent_index was not updated, but I did not find the reason why the revision was inconsistent.
Thank you @moonovo. That's a good story for rollback -> roll-forward safety. @serathius, I think that in 3.5 the server should, during bootstrap, reconcile the term between the WAL logs and bbolt, considering bbolt to be the authoritative source of truth.
Thanks @ptabor for calling this out in the "3.5 inconsistency issue" email thread! This issue looks worrying; it makes 3.5.x rollback a risky operation. I've got a reproduction (manual testing), reproduction.log, after following the description provided by @moonovo:

etcd_endpoints=http://127.0.0.1:2379,http://127.0.0.1:22379,http://127.0.0.1:32379
echo $etcd_endpoints | tr ',' '\n' | xargs -I '{}' etcdctl --endpoints '{}' get foo -w json

{"header":{"cluster_id":9108655428580501207,"member_id":16513504468151473617,"revision":5,"raft_term":3},"kvs":[{"key":"Zm9v","create_revision":2,"mod_revision":5,"version":4,"value":"ZXJyb3I="}],"count":1}
{"header":{"cluster_id":9108655428580501207,"member_id":17404356270211841594,"revision":5,"raft_term":3},"kvs":[{"key":"Zm9v","create_revision":2,"mod_revision":5,"version":4,"value":"ZXJyb3I="}],"count":1}
{"header":{"cluster_id":9108655428580501207,"member_id":463034352321715000,"revision":9,"raft_term":3},"kvs":[{"key":"Zm9v","create_revision":2,"mod_revision":9,"version":8,"value":"ZXJyb3I="}],"count":1}

To answer your question @moonovo:
The node with the lagging-behind consistent_index will re-apply some raft entries when it restarts (the consistent_index is reloaded from disk, and the following code then unnecessarily allows old raft entries to be applied): etcd/server/etcdserver/server.go, lines 2156 to 2163 in 99018a7.
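A self-contained toy model of that gate (a paraphrase for illustration, not the actual server.go code; the index and term numbers are invented to roughly mirror the repro output above): an entry is applied to the v3 backend only when its index is greater than the consistent_index reloaded from bbolt, so a lagging consistent_index makes already-applied entries get applied a second time.

```go
package main

import "fmt"

// entry models the raft log entry fields relevant to the apply gate.
type entry struct {
	Index uint64
	Term  uint64
}

func main() {
	// Entries 6..9 were already applied to the backend before the restart, but
	// the persisted consistent_index lagged behind at 5 (why it lagged is
	// analyzed further down in this thread).
	walTail := []entry{{6, 3}, {7, 3}, {8, 3}, {9, 3}}
	consistentIndex := uint64(5) // value reloaded from the backend on restart

	for _, e := range walTail {
		// The gate: apply to the v3 backend only if the entry is ahead of the
		// persisted consistent_index. With a stale consistent_index this
		// re-applies entries whose effects are already in the backend, bumping
		// revision/mod_revision/version a second time on this member.
		if e.Index > consistentIndex {
			consistentIndex = e.Index
			fmt.Printf("re-applying entry index=%d term=%d\n", e.Index, e.Term)
		}
	}
}
```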
Also, worryingly, the initial-corruption-check feature was broken like that. /cc @serathius
I will work on the fix (and test) soon.
Thanks @chaochn47, are you able to repeat this result? I've been trying to reproduce it with your reproduction log, with no success. At the end I get the same revisions and versions across all members. Here is my failed_repro.log.
Hi @michaljasionowski, thanks for the input!
Yes, it is repeatable. I will convert it into code, which will make it easier for someone else to reproduce.
I think I missed pasting something in the repro execution logs. After this step,
you need to kill infra10 and restart it, which will reload its lagging-behind consistent_index from disk, reload the raft storage from disk into memory, replay its log, and apply entries to the backend if necessary. infra10 will then apply twice the number of raft entries from when it received the mutation requests. Also, please dump the consistent_index from the member's backend as well.
This will help visualize the consistent_index in a human-readable format instead of encoded plain bytes. Let me know if it still does not work for you, thanks.
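One way to dump it (a sketch that opens the member's backend directly with go.etcd.io/bbolt while etcd is stopped; the db path is a placeholder, and the layout assumed here is the 3.5 schema, where the meta bucket stores consistent_index and term as 8-byte big-endian integers):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Placeholder path; point this at the stopped member's backend file.
	db, err := bolt.Open("infra10.etcd/member/snap/db", 0400, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return fmt.Errorf("meta bucket not found")
		}
		for _, key := range []string{"consistent_index", "term"} {
			v := meta.Get([]byte(key))
			if len(v) == 8 {
				// Both values are stored as 8-byte big-endian integers.
				fmt.Printf("%s = %d\n", key, binary.BigEndian.Uint64(v))
			} else {
				fmt.Printf("%s missing or has unexpected length (%d bytes)\n", key, len(v))
			}
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```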
Please refer to the discussion in pull/13844 and issues/13766.
Thanks @chaochn47, that was very helpful. I managed to reproduce it as well,
and the initial corruption check also passed.
This is the script I used for reproduction: upgradedowngrade.log. It sets 3 variables at the beginning that have to be updated.
I have been looking into this issue and found a strange behavior: it looks like etcd v3.5.2 re-applies entries upon restart. Each restart results in etcd re-applying the last entry. I have simplified the script provided by @michaljasionowski. If we look at the data dir of:
My current guess is that this is caused by an outdated term value resulting from the downgrade. cc @ptabor
Looks like the corruption happens before the restart; restarting just causes etcd to re-apply the entries.
cc @ahrtr
The cause of the corruption is in etcd/server/storage/schema/cindex.go, lines 66 to 69 in 451ea54.
An outdated term field (caused by the downgrade) results in etcd applying the record without updating the consistent index. The case is artificial because downgrades are not officially supported, but it brought up two issues that are not related to downgrades. I'm still thinking about what exactly the problem is that should be fixed:
Possibly both cases would be worth fixing. cc @spzala @ahrtr @ptabor for opinions.
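A toy model of the failure mode being described (a paraphrase for illustration, not the actual cindex.go code; the concrete numbers are invented to match the repro above): when the term attached to an update is older than the term already persisted in bbolt, the consistent_index update is skipped even though the entry itself is applied, so the persisted consistent_index falls behind and the next restart re-applies those entries.

```go
package main

import "fmt"

// storedIndex/storedTerm stand in for the values persisted in the backend's
// meta bucket; the term is a stale value left over from before the rollback
// (3.4 does not maintain the term field).
var storedIndex, storedTerm uint64 = 5, 4

// updateConsistentIndex models the guard: an update carrying an older term is
// dropped entirely, even if its index is ahead of the stored one.
func updateConsistentIndex(index, term uint64) {
	if term < storedTerm {
		return // the entry is still applied to the keyspace, but CI stays stale
	}
	if index > storedIndex {
		storedIndex, storedTerm = index, term
	}
}

func main() {
	// After rolling forward to 3.5, the cluster runs at term 3, which is lower
	// than the stale term 4 left over in the backend.
	for _, e := range []struct{ index, term uint64 }{{6, 3}, {7, 3}, {8, 3}, {9, 3}} {
		updateConsistentIndex(e.index, e.term)
	}
	fmt.Printf("entries applied up to index 9, but stored consistent_index=%d term=%d\n",
		storedIndex, storedTerm)
	// On the next restart, the apply gate (see the earlier sketch) re-applies
	// entries 6..9, producing the diverging revision/version seen above.
}
```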
I will have a deep dive into this one in the following couple of days. I have been busy with 13854 in the past week.
Looks like both issues were introduced in #12855 (comment), when we merged the migrator's consistent-index update logic with the WAL entry apply logic.
Intuitively, applyEntries should never get executed if we are in the wrong term.
When I enter the following command:
I get the response:
There are different mod_revision and version values on the 192.168.2.4 node.
Reproduce procedure:
If the second point is true, will there be data inconsistency? Why can a node with broken data be added to the etcd cluster?