This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Scale mutation queue with multiple shards #1048

Merged
gdbelvin merged 11 commits into google:master from f/retry/multi-queue on Oct 11, 2018

Conversation

gdbelvin
Contributor

Tables with an incrementing value in the lower bits of the primary key are notoriously hard to scale because all the writes will go to a single server. This PR allows the queue to scale by adding write support for multiple queues.

Depends on: read by high/low watermarks #1045
Part of: retry-able mutations #1044
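
The sharding idea in the description can be sketched as follows. This is a minimal illustration, not the PR's implementation: each write picks one of the domain's shards at random, so inserts spread over several key ranges instead of always appending to the same one. `pickShard` and its injected `intn` parameter are hypothetical names; the PR's own `randShard` looks up the enabled shards for a domain.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// pickShard returns one of shardIDs, using intn to choose the index.
// intn must return a value in [0, n), for example rand.Intn.
// Hypothetical helper for illustration only.
func pickShard(intn func(n int) int, shardIDs []int64) (int64, error) {
	if len(shardIDs) == 0 {
		return 0, errors.New("no shards available")
	}
	return shardIDs[intn(len(shardIDs))], nil
}

func main() {
	shards := []int64{1, 2, 3}
	counts := make(map[int64]int)
	for i := 0; i < 9000; i++ {
		id, err := pickShard(rand.Intn, shards)
		if err != nil {
			panic(err)
		}
		counts[id]++
	}
	// Each shard should receive roughly a third of the writes.
	fmt.Println(counts)
}
```

Injecting the random source keeps the selection deterministic in tests; production code would likely just pass `rand.Intn`.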

@gdbelvin gdbelvin added the blocked PR cannot be merged until another PR is merged. label Sep 25, 2018
}
for rows.Next() {
var shardID int64
rows.Scan(&shardID)

G104: Errors unhandled.
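
One way to address this finding, sketched under assumptions: check the error from `Scan` inside the loop and the error from `Err` after it. The `rowIter` interface and `fakeRows` type below are hypothetical stand-ins so the example runs without a database; `*sql.Rows` has the same three methods.

```go
package main

import (
	"errors"
	"fmt"
)

// rowIter is the subset of *sql.Rows used below (hypothetical interface).
type rowIter interface {
	Next() bool
	Scan(dest ...interface{}) error
	Err() error
}

// scanShardIDs collects shard IDs and returns the first Scan or iteration error.
func scanShardIDs(rows rowIter) ([]int64, error) {
	var shardIDs []int64
	for rows.Next() {
		var shardID int64
		if err := rows.Scan(&shardID); err != nil {
			return nil, err
		}
		shardIDs = append(shardIDs, shardID)
	}
	// Next returns false both at the end of the rows and on failure,
	// so the iteration error has to be checked separately.
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return shardIDs, nil
}

// fakeRows is an in-memory stand-in for *sql.Rows (illustration only).
type fakeRows struct {
	vals     []int64
	pos      int
	failScan bool
}

func (f *fakeRows) Next() bool { f.pos++; return f.pos <= len(f.vals) }

func (f *fakeRows) Scan(dest ...interface{}) error {
	if f.failScan {
		return errors.New("scan failed")
	}
	if len(dest) != 1 {
		return errors.New("expected exactly one destination")
	}
	p, ok := dest[0].(*int64)
	if !ok {
		return errors.New("unsupported destination type")
	}
	*p = f.vals[f.pos-1]
	return nil
}

func (f *fakeRows) Err() error { return nil }

func main() {
	ids, err := scanShardIDs(&fakeRows{vals: []int64{1, 2, 3}})
	fmt.Println(ids, err)
}
```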

}

// ReadQueue reads all mutations that are still in the queue up to batchSize.
func (m *Mutations) ReadQueue(ctx context.Context, domainID string, shardID, low, high int64) ([]*mutator.QueueMessage, error) {

line is 128 characters

@codecov

codecov bot commented Sep 25, 2018

Codecov Report

Merging #1048 into master will increase coverage by 0.01%.
The diff coverage is 68.42%.

@@            Coverage Diff             @@
##           master    #1048      +/-   ##
==========================================
+ Coverage   65.24%   65.26%   +0.01%     
==========================================
  Files          39       39              
  Lines        2728     2764      +36     
==========================================
+ Hits         1780     1804      +24     
- Misses        630      633       +3     
- Partials      318      327       +9
Impacted Files | Coverage Δ
impl/sql/mutationstorage/mutations.go | 60.78% <ø> (ø) ⬆️
impl/integration/env.go | 73.21% <50%> (ø) ⬆️
impl/sql/mutationstorage/queue.go | 58.4% <61.01%> (+3.52%) ⬆️
core/adminserver/admin_server.go | 67.02% <80%> (-0.38%) ⬇️
core/sequencer/server.go | 63.73% <86.36%> (+0.18%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a70c371...69e197c.

@gdbelvin gdbelvin force-pushed the f/retry/multi-queue branch from 3c088f0 to bf5e4e2 Compare October 5, 2018 12:59
if err != nil {
t.Fatalf("Failed to create Mutations: %v", err)
}
domainID := "foo"

string foo has 3 occurrences, make it a constant

}
for rows.Next() {
var shardID int64
rows.Scan(&shardID)

G104: Errors unhandled.

@gdbelvin gdbelvin force-pushed the f/retry/multi-queue branch from bf5e4e2 to 75969ea Compare October 5, 2018 13:12
}

// randShard returns a random, enabled shard for domainID.
func (m *Mutations) randShard(ctx context.Context, domainID string) (int64, error) {

47-73 lines are duplicate of impl/sql/mutationstorage/queue.go:92-118

}

// randomShard returns a random shard from the list of active shards for domainID.
func (m *Mutations) randomShard(ctx context.Context, domainID string) (int64, error) {

92-118 lines are duplicate of impl/sql/mutationstorage/queue.go:47-73

@gdbelvin gdbelvin force-pushed the f/retry/multi-queue branch from 75969ea to 6ebf36c Compare October 5, 2018 13:16
@gdbelvin gdbelvin added for review PR is ready for review and removed blocked PR cannot be merged until another PR is merged. labels Oct 5, 2018
@gdbelvin gdbelvin requested a review from pav-kv October 5, 2018 13:19
@gdbelvin
Contributor Author

gdbelvin commented Oct 9, 2018

Updated, and everything is nice and green. PTAL.
Happy to walk through the code with you.

@@ -260,6 +270,13 @@ func (s *Server) CreateDomain(ctx context.Context, in *pb.CreateDomainRequest) (
}); err != nil {
return nil, fmt.Errorf("adminserver: domains.Write(): %v", err)
}

// Create shards for queue.
shardIDs := []int64{1}
Contributor

Who is supposed to create shards if they are more than 1? This looks like a place for some TODO.

Contributor Author

Created issue #1048

Contributor

You probably meant #1063
(posting it here to create a link between the GitHub issues)

@@ -85,6 +85,12 @@ func (e *miniEnv) Close() {
e.stopMockServer()
}

type fakeQueueAdmin struct{}

func (*fakeQueueAdmin) AddShards(ctx context.Context, domainID string, shardIDs ...int64) error {
Contributor

Can the receiver simply be taken by value?

Contributor Author

Yep, that would simplify things a bit.
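
A minimal sketch of the suggested change, under assumptions: because `fakeQueueAdmin` has no fields, a value receiver is enough, and both the value and a pointer to it still satisfy the interface. `QueueAdmin` and `setup` are hypothetical names modelled on the `AddShards` signature in the diff.

```go
package main

import (
	"context"
	"fmt"
)

// QueueAdmin is a hypothetical interface shaped after AddShards in the diff.
type QueueAdmin interface {
	AddShards(ctx context.Context, domainID string, shardIDs ...int64) error
}

type fakeQueueAdmin struct{}

// AddShards uses a value receiver: the struct has no state to mutate.
func (fakeQueueAdmin) AddShards(ctx context.Context, domainID string, shardIDs ...int64) error {
	return nil
}

// setup is a hypothetical caller that creates a single shard for domainID.
func setup(admin QueueAdmin, domainID string) error {
	return admin.AddShards(context.Background(), domainID, 1)
}

func main() {
	// The method set of *fakeQueueAdmin includes value-receiver methods,
	// so both forms compile.
	fmt.Println(setup(fakeQueueAdmin{}, "domain"))
	fmt.Println(setup(&fakeQueueAdmin{}, "domain"))
}
```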

@@ -36,8 +36,9 @@ message MapMetadata {
// of logged items MUST be monotonically increasing.
int64 highest_watermark = 2;
}
// source defines the range of inputs used for this map revision.
SourceSlice source = 1;
reserved 1;
Contributor

Are you at a point where you can't afford breaking compatibility by simply removing this field?

Contributor Author

Fortunately we're early enough that we're happy to break compatibility.
This PR also introduces backwards-incompatible database schema changes.

Contributor

Yeah, that was my point. Why use reserved 1 instead of just dropping the field or reassigning the tag to the new field?

Contributor Author

Good habits I suppose :-)

@gdbelvin
Contributor Author

PTAL

Contributor

@pav-kv pav-kv left a comment

LGTM % nits.

ReadQueue(ctx context.Context, domainID string, low, high int64) ([]*mutator.QueueMessage, error)
// HighWatermark returns the highest timestamp in the mutations table.
HighWatermarks(ctx context.Context, domainID string) (map[int64]int64, error)
// Read returns up to batchSize messages for domainID.
Contributor

I think yes. How about renaming the ReadQueue method to just Read, and nicely naming the interface (e.g. Logs as you say)? I think that a more verbose interface name like ShardedMutationLog or similar could be fine as well.
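
A rough sketch of what such an interface and its caller might look like, under stated assumptions. `ShardedLog`, `Message`, `readNew`, and `memLog` are hypothetical names. The per-shard `Read` signature follows the storage-layer `ReadQueue` shown earlier in this conversation, and the range semantics (`low < ID <= high`) are assumed rather than taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// Message is a hypothetical stand-in for mutator.QueueMessage.
type Message struct {
	ID   int64
	Data string
}

// ShardedLog is a hypothetical interface following the suggested rename.
type ShardedLog interface {
	// HighWatermarks returns the highest message ID per shard.
	HighWatermarks(ctx context.Context, domainID string) (map[int64]int64, error)
	// Read returns messages with low < ID <= high (assumed semantics).
	Read(ctx context.Context, domainID string, shardID, low, high int64) ([]Message, error)
}

// readNew reads each shard from its last-read watermark up to its current
// high watermark and returns the messages plus the new watermarks.
func readNew(ctx context.Context, l ShardedLog, domainID string, last map[int64]int64) ([]Message, map[int64]int64, error) {
	highs, err := l.HighWatermarks(ctx, domainID)
	if err != nil {
		return nil, nil, err
	}
	shardIDs := make([]int64, 0, len(highs))
	for id := range highs {
		shardIDs = append(shardIDs, id)
	}
	// Visit shards in a stable order so the output is deterministic.
	sort.Slice(shardIDs, func(i, j int) bool { return shardIDs[i] < shardIDs[j] })
	var out []Message
	for _, id := range shardIDs {
		msgs, err := l.Read(ctx, domainID, id, last[id], highs[id])
		if err != nil {
			return nil, nil, err
		}
		out = append(out, msgs...)
	}
	return out, highs, nil
}

// memLog is an in-memory ShardedLog keyed by shard ID.
// Illustration only: it ignores domainID and assumes positive IDs.
type memLog map[int64][]Message

func (m memLog) HighWatermarks(ctx context.Context, domainID string) (map[int64]int64, error) {
	highs := make(map[int64]int64)
	for shardID, msgs := range m {
		for _, msg := range msgs {
			if msg.ID > highs[shardID] {
				highs[shardID] = msg.ID
			}
		}
	}
	return highs, nil
}

func (m memLog) Read(ctx context.Context, domainID string, shardID, low, high int64) ([]Message, error) {
	var out []Message
	for _, msg := range m[shardID] {
		if msg.ID > low && msg.ID <= high {
			out = append(out, msg)
		}
	}
	return out, nil
}

func main() {
	store := memLog{
		1: {{ID: 10, Data: "a"}, {ID: 20, Data: "b"}},
		2: {{ID: 15, Data: "c"}},
	}
	// Shard 1 was already read up to ID 10; shard 2 has not been read yet.
	msgs, highs, err := readNew(context.Background(), store, "domain", map[int64]int64{1: 10})
	fmt.Println(len(msgs), highs, err)
}
```

Returning the new watermarks lets the caller persist them and pass them back as `last` on the next read, so each message is picked up once per shard.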

@gdbelvin
Contributor Author

Thanks very much for the review!

@gdbelvin gdbelvin merged commit e869a3b into google:master Oct 11, 2018
@gdbelvin gdbelvin deleted the f/retry/multi-queue branch October 11, 2018 17:12