Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop writing v2store and validate confChange against v3store instead of v2store #17019

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 7 additions & 29 deletions server/etcdserver/api/membership/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -263,9 +263,6 @@ func (c *RaftCluster) Recover(onSet func(*zap.Logger, *semver.Version)) {
if c.be != nil {
c.version = c.be.ClusterVersionFromBackend()
c.members, c.removed = c.be.MustReadMembersFromBackend()
} else {
c.version = clusterVersionFromStore(c.lg, c.v2store)
c.members, c.removed = membersFromStore(c.lg, c.v2store)
}
c.buildMembershipMetric()

Expand Down Expand Up @@ -303,9 +300,13 @@ func (c *RaftCluster) Recover(onSet func(*zap.Logger, *semver.Version)) {

// ValidateConfigurationChange takes a proposed ConfChange and
// ensures that it is still valid.
func (c *RaftCluster) ValidateConfigurationChange(cc raftpb.ConfChange) error {
// TODO: this must be switched to backend as well.
membersMap, removedMap := membersFromStore(c.lg, c.v2store)
func (c *RaftCluster) ValidateConfigurationChange(cc raftpb.ConfChange, shouldApplyV3 ShouldApplyV3) error {
// It makes no sense to validate a confChange if we don't apply it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true, we are still applying confChange as long as we bootstrap from snapshot and not consistent index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is a longstanding design problem to me.

Current status summary

  • in older etcd versions ( < 3.4?) we load all data from v2store, and replay WAL entries on top of the snapshot. All work well.
  • Starting from 3.5, we load normal data from v3store and membership data still from v2store, and replay WAL entries on top of v2 snapshot. It works well for conf change. A little ugly for normal key/value data, but it still works.
  • Starting from 3.6 (main branch for now), we load all data from v3store on startup, but we still replay WAL entries on top of v2 snapshot.
    • For normal key/value data, etcd skips already applied entries; it works although ugly.
    • For conf change, the principle of "skiping already applied entries" doesn't work anymore.
      • For etcd, it already loaded membership data from v3store, so etcd can just skip already applied entries.
      • But raft doesn't know the latest membership data, we still depend on the raft loop to re-apply the conf change (with the old entry index < consistent-index), so as to notify raft all the membership data. So etcdd has to call raft's ApplyConfChange even etcd just skips it

Solutions

Short-term solutions

  1. Validate the conf change against v2store if it is already in v3store (shouldApplyV3 == false)
  2. apply conf change (read from v3store) to raft directly when restarting etcd members to bypass the raft loop? (need investigation)

Long-term solutions

  1. Bootstrap etcd from consistent_index.
    • It works for key/value data.
    • For conf change, it still needs to apply the conf change to raft directly when etcd restarting to bypass the raft loop.
  2. Generate v3 snapshot periodically (similar to what etcd did on v2 snapshot), and bootstrap etcd from v3 snapshot, and then replay WAL entries on top of it.
  • need to evaluate the performance of generate v3snapshot on huge db file (e.g. 8GB)
  • The disk space usage will be multiple times bigger, depending on how many snapshot files to keep.
  • Then why do we need the latest bbolt db file in such case?
  1. Generate v3 snapshot periodically (similar to what etcd did on v2 snapshot), but bootstrap etcd from consistent_index. The v3 snapshot files are just for corruption recovery, especially in single-node edge case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate the conf change against v2store if it is already in v3store (shouldApplyV3 == false)

I know, #17017

Copy link
Member

@serathius serathius Nov 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply conf change (read from v3store) to raft directly when restarting etcd members to bypass the raft loop? (need investigation)

I have draft implementation for that. #17022

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to evaluate the performance of generate v3snapshot on huge db file (e.g. 8GB)

v3 db snapshot is not a thing. That's an incompatible concept. Recommend reading the raft paper on in-memory snapshotable storage (etcd v2) vs persistent one (etcd v3).

if !shouldApplyV3 {
return nil
}
membersMap, removedMap := c.be.MustReadMembersFromBackend()

id := types.ID(cc.NodeID)
if removedMap[id] {
return ErrIDRemoved
Expand Down Expand Up @@ -390,9 +391,6 @@ func (c *RaftCluster) ValidateConfigurationChange(cc raftpb.ConfChange) error {
func (c *RaftCluster) AddMember(m *Member, shouldApplyV3 ShouldApplyV3) {
c.Lock()
defer c.Unlock()
if c.v2store != nil {
mustSaveMemberToStore(c.lg, c.v2store, m)
}
if c.be != nil && shouldApplyV3 {
c.be.MustSaveMemberToBackend(m)

Expand Down Expand Up @@ -423,9 +421,6 @@ func (c *RaftCluster) AddMember(m *Member, shouldApplyV3 ShouldApplyV3) {
func (c *RaftCluster) RemoveMember(id types.ID, shouldApplyV3 ShouldApplyV3) {
c.Lock()
defer c.Unlock()
if c.v2store != nil {
mustDeleteMemberFromStore(c.lg, c.v2store, id)
}
if c.be != nil && shouldApplyV3 {
c.be.MustDeleteMemberFromBackend(id)

Expand Down Expand Up @@ -467,9 +462,6 @@ func (c *RaftCluster) UpdateAttributes(id types.ID, attr Attributes, shouldApply

if m, ok := c.members[id]; ok {
m.Attributes = attr
if c.v2store != nil {
mustUpdateMemberAttrInStore(c.lg, c.v2store, m)
}
if c.be != nil && shouldApplyV3 {
c.be.MustSaveMemberToBackend(m)
}
Expand Down Expand Up @@ -499,12 +491,6 @@ func (c *RaftCluster) PromoteMember(id types.ID, shouldApplyV3 ShouldApplyV3) {
c.Lock()
defer c.Unlock()

if c.v2store != nil {
m := *(c.members[id])
m.RaftAttributes.IsLearner = false
mustUpdateMemberInStore(c.lg, c.v2store, &m)
}

if c.be != nil && shouldApplyV3 {
c.members[id].RaftAttributes.IsLearner = false
c.updateMembershipMetric(id, true)
Expand All @@ -528,11 +514,6 @@ func (c *RaftCluster) UpdateRaftAttributes(id types.ID, raftAttr RaftAttributes,
c.Lock()
defer c.Unlock()

if c.v2store != nil {
m := *(c.members[id])
m.RaftAttributes = raftAttr
mustUpdateMemberInStore(c.lg, c.v2store, &m)
}
if c.be != nil && shouldApplyV3 {
c.members[id].RaftAttributes = raftAttr
c.be.MustSaveMemberToBackend(c.members[id])
Expand Down Expand Up @@ -589,9 +570,6 @@ func (c *RaftCluster) SetVersion(ver *semver.Version, onSet func(*zap.Logger, *s
c.version = ver
sv := semver.Must(semver.NewVersion(version.Version))
serverversion.MustDetectDowngrade(c.lg, sv, c.version)
if c.v2store != nil {
mustSaveClusterVersionToStore(c.lg, c.v2store, ver)
}
if c.be != nil && shouldApplyV3 {
c.be.MustSaveClusterVersionToBackend(ver)
}
Expand Down
69 changes: 4 additions & 65 deletions server/etcdserver/api/membership/cluster_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,14 +17,12 @@ package membership
import (
"encoding/json"
"fmt"
"path"
"reflect"
"testing"

"github.com/stretchr/testify/assert"
"go.uber.org/zap/zaptest"

"go.etcd.io/etcd/client/pkg/v3/testutil"
"go.etcd.io/etcd/client/pkg/v3/types"
"go.etcd.io/etcd/server/v3/etcdserver/api/v2store"
"go.etcd.io/etcd/server/v3/mock/mockstore"
Expand Down Expand Up @@ -278,6 +276,8 @@ func TestClusterValidateAndAssignIDs(t *testing.T) {

func TestClusterValidateConfigurationChange(t *testing.T) {
cl := NewCluster(zaptest.NewLogger(t), WithMaxLearners(1))
be := newMembershipBackend()
cl.SetBackend(be)
cl.SetStore(v2store.New())
for i := 1; i <= 4; i++ {
var isLearner bool
Expand Down Expand Up @@ -455,7 +455,7 @@ func TestClusterValidateConfigurationChange(t *testing.T) {
},
}
for i, tt := range tests {
err := cl.ValidateConfigurationChange(tt.cc)
err := cl.ValidateConfigurationChange(tt.cc, true)
if err != tt.werr {
t.Errorf("#%d: validateConfigurationChange error = %v, want %v", i, err, tt.werr)
}
Expand Down Expand Up @@ -517,52 +517,6 @@ func TestNodeToMemberBad(t *testing.T) {
}
}

func TestClusterAddMember(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please migrate such tests v3 instead of just removing them

st := mockstore.NewRecorder()
c := newTestCluster(t, nil)
c.SetStore(st)
c.AddMember(newTestMember(1, nil, "node1", nil), true)

wactions := []testutil.Action{
{
Name: "Create",
Params: []any{
path.Join(StoreMembersPrefix, "1", "raftAttributes"),
false,
`{"peerURLs":null}`,
false,
v2store.TTLOptionSet{ExpireTime: v2store.Permanent},
},
},
}
if g := st.Action(); !reflect.DeepEqual(g, wactions) {
t.Errorf("actions = %v, want %v", g, wactions)
}
}

func TestClusterAddMemberAsLearner(t *testing.T) {
st := mockstore.NewRecorder()
c := newTestCluster(t, nil)
c.SetStore(st)
c.AddMember(newTestMemberAsLearner(1, []string{}, "node1", []string{"http://node1"}), true)

wactions := []testutil.Action{
{
Name: "Create",
Params: []any{
path.Join(StoreMembersPrefix, "1", "raftAttributes"),
false,
`{"peerURLs":[],"isLearner":true}`,
false,
v2store.TTLOptionSet{ExpireTime: v2store.Permanent},
},
},
}
if g := st.Action(); !reflect.DeepEqual(g, wactions) {
t.Errorf("actions = %v, want %v", g, wactions)
}
}

func TestClusterMembers(t *testing.T) {
cls := newTestCluster(t, []*Member{
{ID: 1},
Expand All @@ -583,21 +537,6 @@ func TestClusterMembers(t *testing.T) {
}
}

func TestClusterRemoveMember(t *testing.T) {
st := mockstore.NewRecorder()
c := newTestCluster(t, nil)
c.SetStore(st)
c.RemoveMember(1, true)

wactions := []testutil.Action{
{Name: "Delete", Params: []any{MemberStoreKey(1), true, true}},
{Name: "Create", Params: []any{RemovedMemberStoreKey(1), false, "", false, v2store.TTLOptionSet{ExpireTime: v2store.Permanent}}},
}
if !reflect.DeepEqual(st.Action(), wactions) {
t.Errorf("actions = %v, want %v", st.Action(), wactions)
}
}

func TestClusterUpdateAttributes(t *testing.T) {
name := "etcd"
clientURLs := []string{"http://127.0.0.1:4001"}
Expand Down Expand Up @@ -903,7 +842,7 @@ func TestIsReadyToPromoteMember(t *testing.T) {
// 1/1 members ready, should succeed (quorum = 1, new quorum = 2)
[]*Member{
newTestMember(1, nil, "1", nil),
newTestMemberAsLearner(2, nil, "2", nil),
newTestMemberAsLearner(2, []string{}, "2", []string{}),
},
2,
true,
Expand Down
42 changes: 0 additions & 42 deletions server/etcdserver/api/membership/storev2.go
Original file line number Diff line number Diff line change
Expand Up @@ -78,18 +78,6 @@ func mustSaveMemberToStore(lg *zap.Logger, s v2store.Store, m *Member) {
}
}

func mustDeleteMemberFromStore(lg *zap.Logger, s v2store.Store, id types.ID) {
if _, err := s.Delete(MemberStoreKey(id), true, true); err != nil {
lg.Panic(
"failed to delete member from store",
zap.String("path", MemberStoreKey(id)),
zap.Error(err),
)
}

mustAddToRemovedMembersInStore(lg, s, id)
}

func mustAddToRemovedMembersInStore(lg *zap.Logger, s v2store.Store, id types.ID) {

if _, err := s.Create(RemovedMemberStoreKey(id), false, "", false, v2store.TTLOptionSet{ExpireTime: v2store.Permanent}); err != nil {
Expand All @@ -101,21 +89,6 @@ func mustAddToRemovedMembersInStore(lg *zap.Logger, s v2store.Store, id types.ID
}
}

func mustUpdateMemberInStore(lg *zap.Logger, s v2store.Store, m *Member) {
b, err := json.Marshal(m.RaftAttributes)
if err != nil {
lg.Panic("failed to marshal raftAttributes", zap.Error(err))
}
p := path.Join(MemberStoreKey(m.ID), raftAttributesSuffix)
if _, err := s.Update(p, string(b), v2store.TTLOptionSet{ExpireTime: v2store.Permanent}); err != nil {
lg.Panic(
"failed to update raftAttributes",
zap.String("path", p),
zap.Error(err),
)
}
}

func mustUpdateMemberAttrInStore(lg *zap.Logger, s v2store.Store, m *Member) {
b, err := json.Marshal(m.Attributes)
if err != nil {
Expand Down Expand Up @@ -184,18 +157,3 @@ func MemberStoreKey(id types.ID) string {
func MemberAttributesStorePath(id types.ID) string {
return path.Join(MemberStoreKey(id), attributesSuffix)
}

func clusterVersionFromStore(lg *zap.Logger, st v2store.Store) *semver.Version {
e, err := st.Get(path.Join(storePrefix, "version"), false, false)
if err != nil {
if isKeyNotFound(err) {
return nil
}
lg.Panic(
"failed to get cluster version from store",
zap.String("path", path.Join(storePrefix, "version")),
zap.Error(err),
)
}
return semver.Must(semver.NewVersion(*e.Node.Value))
}
2 changes: 1 addition & 1 deletion server/etcdserver/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -1935,7 +1935,7 @@ func removeNeedlessRangeReqs(txn *pb.TxnRequest) {
// applyConfChange applies a ConfChange to the server. It is only
// invoked with a ConfChange that has already passed through Raft
func (s *EtcdServer) applyConfChange(cc raftpb.ConfChange, confState *raftpb.ConfState, shouldApplyV3 membership.ShouldApplyV3) (bool, error) {
if err := s.cluster.ValidateConfigurationChange(cc); err != nil {
if err := s.cluster.ValidateConfigurationChange(cc, shouldApplyV3); err != nil {
cc.NodeID = raft.None
s.r.ApplyConfChange(cc)

Expand Down
Loading