Tracking Issue: Make VSchema management serializable #17398

Open · 4 tasks

mattlord opened this issue Dec 17, 2024 · 0 comments

Feature Description

Today, we have no consistency guarantees around VSchema writes. If there are concurrent read-modify-write cycles (we only offer an API for writing the entire object today), then intermediate writes can be lost. Given that the VSchema plays a critical role in serving queries and data, this is not an ideal situation.

You can see the general problem reported in a specific context here: #15794
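
As a minimal sketch of the lost update, assume two clients that each read a keyspace's VSchema, modify a different table, and write the whole object back via the current topo.Server API (GetVSchema/SaveVSchema); addTable here is a hypothetical helper:

import (
	"context"

	vschemapb "vitess.io/vitess/go/vt/proto/vschema"
	"vitess.io/vitess/go/vt/topo"
)

// addTable performs the racy read-modify-write cycle. If two clients
// run this concurrently for different tables, the second SaveVSchema
// silently overwrites the first client's change.
func addTable(ctx context.Context, ts *topo.Server, keyspace, table string) error {
	vs, err := ts.GetVSchema(ctx, keyspace) // read the whole object
	if err != nil {
		return err
	}
	if vs.Tables == nil {
		vs.Tables = make(map[string]*vschemapb.Table)
	}
	vs.Tables[table] = &vschemapb.Table{} // modify one table
	return ts.SaveVSchema(ctx, keyspace, vs) // write the whole object back
}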

We have support for linearizing writes using the topo key version:

// Update is part of the topo.Conn interface.
func (s *Server) Update(ctx context.Context, filePath string, contents []byte, version topo.Version) (topo.Version, error) {
	nodePath := path.Join(s.root, filePath)
	if version != nil {
		// We have to do a transaction. This means: if the
		// current file revision is what we expect, save it.
		txnresp, err := s.cli.Txn(ctx).
			If(clientv3.Compare(clientv3.ModRevision(nodePath), "=", int64(version.(EtcdVersion)))).
			Then(clientv3.OpPut(nodePath, string(contents))).
			Commit()
		if err != nil {
			return nil, convertError(err, nodePath)
		}
		if !txnresp.Succeeded {
			return nil, topo.NewError(topo.BadVersion, nodePath)
		}
		return EtcdVersion(txnresp.Header.Revision), nil
	}
	// No version specified. We can use a simple unconditional Put.
	resp, err := s.cli.Put(ctx, nodePath, string(contents))
	if err != nil {
		return nil, convertError(err, nodePath)
	}
	return EtcdVersion(resp.Header.Revision), nil
}
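
Callers of Update handle the BadVersion error by re-reading and reapplying their change. A minimal sketch of that loop against the topo.Conn Get/Update signatures (retryUpdate is a hypothetical helper):

// retryUpdate re-reads the key, applies modify, and writes back with
// the version from the read. A concurrent writer makes Update fail
// with BadVersion, in which case we simply loop and try again.
func retryUpdate(ctx context.Context, conn topo.Conn, filePath string, modify func([]byte) ([]byte, error)) error {
	for {
		data, version, err := conn.Get(ctx, filePath)
		if err != nil {
			return err
		}
		newData, err := modify(data)
		if err != nil {
			return err
		}
		if _, err := conn.Update(ctx, filePath, newData, version); err != nil {
			if topo.IsErrType(err, topo.BadVersion) {
				continue // lost the race; re-read and retry
			}
			return err
		}
		return nil
	}
}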

That is used for Keyspaces, Shards, and Tablets today via KeyspaceInfo, ShardInfo, and TabletInfo respectively, which wrap the Keyspace, Shard, and Tablet records and store the version of the key that was read from the topo server. If we look at Keyspace as an example:

  • KeyspaceInfo:
    // KeyspaceInfo is a meta struct that contains metadata to give the
    // data more context and convenience. This is the main way we interact
    // with a keyspace.
    type KeyspaceInfo struct {
    	keyspace string
    	version  Version
    	*topodatapb.Keyspace
    }
  • The version set on read:
    // GetKeyspace reads the given keyspace and returns it
    func (ts *Server) GetKeyspace(ctx context.Context, keyspace string) (*KeyspaceInfo, error) {
    	if ctx.Err() != nil {
    		return nil, ctx.Err()
    	}
    	if err := ValidateKeyspaceName(keyspace); err != nil {
    		return nil, vterrors.Wrapf(err, "GetKeyspace: %s", err)
    	}
    	keyspacePath := path.Join(KeyspacesPath, keyspace, KeyspaceFile)
    	data, version, err := ts.globalCell.Get(ctx, keyspacePath)
    	if err != nil {
    		return nil, err
    	}
    	k := &topodatapb.Keyspace{}
    	if err = k.UnmarshalVT(data); err != nil {
    		return nil, vterrors.Wrap(err, "bad keyspace data")
    	}
    	return &KeyspaceInfo{
    		keyspace: keyspace,
    		version:  version,
    		Keyspace: k,
    	}, nil
    }
  • The version is then used on write to linearize the writes: only an update to the latest/current version succeeds, so intermediate changes are not lost:
    // UpdateKeyspace updates the keyspace data. It checks the keyspace is locked.
    func (ts *Server) UpdateKeyspace(ctx context.Context, ki *KeyspaceInfo) error {
    	if ctx.Err() != nil {
    		return ctx.Err()
    	}
    	// make sure it is locked first
    	if err := CheckKeyspaceLocked(ctx, ki.keyspace); err != nil {
    		return err
    	}
    	data, err := ki.Keyspace.MarshalVT()
    	if err != nil {
    		return err
    	}
    	keyspacePath := path.Join(KeyspacesPath, ki.keyspace, KeyspaceFile)
    	version, err := ts.globalCell.Update(ctx, keyspacePath, data, ki.version)
    	if err != nil {
    		return err
    	}
    	ki.version = version
    	event.Dispatch(&events.KeyspaceChange{
    		KeyspaceName: ki.keyspace,
    		Keyspace:     ki.Keyspace,
    		Status:       "updated",
    	})
    	return nil
    }

We do not, however, use this same mechanism for VSchemas today. Workflow-related commands coming from systems and humans, humans using the vtctldclient vschema commands (GetVSchema and ApplyVSchema), systems using vschema-related RPCs, and vtgates doing the same via the VSchema SQL interface can all perform these read-modify-write cycles concurrently, with no mechanism to ensure consistency by linearizing the writes (all writes happening in a sequential order, without losing intermediate ones or going back in logical time) so that only the current/latest VSchema can be updated. This can lead to undefined behavior, which in turn can cause major failures and downtime.
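
For illustration, extending the Info-struct pattern above to VSchemas might look roughly like this; the VSchemaInfo name and the UpdateVSchema shape are a sketch under the existing VSchemaFile topo path, not the final design from #17399:

// A sketch mirroring KeyspaceInfo: wrap the VSchema proto together with
// the version that was read from the topo server. Illustrative only.
type VSchemaInfo struct {
	keyspace string
	version  Version
	*vschemapb.Keyspace
}

// UpdateVSchema would pass the version from the earlier read back to
// globalCell.Update, so a stale write fails with topo.BadVersion
// instead of silently clobbering a newer VSchema.
func (ts *Server) UpdateVSchema(ctx context.Context, vi *VSchemaInfo) error {
	data, err := vi.Keyspace.MarshalVT()
	if err != nil {
		return err
	}
	vschemaPath := path.Join(KeyspacesPath, vi.keyspace, VSchemaFile)
	version, err := ts.globalCell.Update(ctx, vschemaPath, data, vi.version)
	if err != nil {
		return err
	}
	vi.version = version
	return nil
}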

We should fix this in all topo server implementations (etcd, Consul, and ZooKeeper all support versions).

This work will involve several pieces:

  • Add versioning to the low level topo interface for VSchemas using the existing Info struct model #17399
  • Add an interface for modifying discrete parts of the VSchema via concrete actions: AddTable, AddVindex, RemoveTable, etc. (a sketch follows this list)
  • Provide a replacement or alternative to transition away from the one-shot ApplyVSchema command; if we keep it long term, find a way to make it linearizable and ensure consistency as well
  • Ensure that SrvVSchema management is also linearizable (it contains copies of the keyspaces' VSchema protos)
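
To make the second item concrete, a discrete-action interface could look something like the following; the VSchemaManager name and signatures are illustrative only, and each action would perform a versioned read-modify-write internally (retrying on BadVersion) so callers never hand us a whole VSchema object to overwrite:

type VSchemaManager interface {
	// Each method reads the current VSchema along with its version,
	// applies the single change, and writes back conditioned on that
	// version so concurrent changes are never silently lost.
	AddTable(ctx context.Context, keyspace, table string, spec *vschemapb.Table) error
	AddVindex(ctx context.Context, keyspace, vindex string, spec *vschemapb.Vindex) error
	RemoveTable(ctx context.Context, keyspace, table string) error
}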

Use Case(s)

Removing a sharp edge in Vitess cluster management.
