Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: allow inflights size to dynamically increase #135225

Closed

Conversation

kvoli
Copy link
Collaborator

@kvoli kvoli commented Nov 14, 2024

Inflights.Add() could encounter an OOB panic if more than the configured Inflights.size entries were added via Add(), despite stating support for this behavior:

// [..] However, Add() is still valid and allowed, for cases when this pacing
// is implemented at the higher app level. The tracker correctly tracks all the
// in-flight entries.

Allow the buffer to grow dynamically larger than Inflights.size, as it should remain bounded by the RangeController in cases where Full() is not already checked (such as Raft's on flow control machinery).

Fixes: #135223
Release note: None

Copy link

blathers-crl bot commented Nov 14, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kvoli kvoli self-assigned this Nov 14, 2024
@kvoli kvoli added the backport-24.3.x Flags PRs that need to be backported to 24.3 label Nov 14, 2024
Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/raft/tracker/inflights.go line 81 at r1 (raw file):

		next -= size
	}
	if next >= len(in.buffer) {

Is this branch ever reached at all?

When we get to the last entry (say size=100):

next := 0 + 100
size := 100
if 100 >= 100 {
    next -= 100
}
// next == 0

pkg/raft/tracker/inflights.go line 102 at r1 (raw file):

	}
	newBuffer := make([]inflight, newSize)
	copy(newBuffer, in.buffer)

This will never shrink now? Potentially concerning.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/raft/tracker/inflights.go line 77 at r1 (raw file):

func (in *Inflights) Add(index, bytes uint64) {
	next := in.start + in.count
	size := in.size

I am very confused by the logic that manages in.buffer. It seems a very peculiar implementation of a circular buffer.
The modulus logic here uses in.size and not len(in.buffer). Which suggests that say in.size was 100 and the stable number of entries were only 2 (due to Add alternating with FreeLE), we would still grow the buffer to 100. Which seems to go against the claim:

We grow on demand
// instead of preallocating to inflights.size to handle systems which have
// thousands of Raft groups per process.

It seems to me that the original logic should have been to grow up to in.size when there were actually that many in-flight entries.

Also, this suggests this PR is not quite right since the modulus logic is incorrect (the buffer could be bigger than size). It should do:

if in.count == len(in.buffer) {
   // grow
}

Then compute next using the current length of the buffer.

As a quick fix I think it is ok to grow and never shrink. But I think we should in the future move rac2.CircularBuffer into a util package and rewrite this code to use it.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)


pkg/raft/tracker/inflights.go line 102 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

This will never shrink now? Potentially concerning.

When were we ever shrinking before?

@kvoli kvoli force-pushed the 241114.rac-inflights-size-dyn branch from 51dfa68 to 6ebc78e Compare November 14, 2024 23:49
`Inflights.Add()` could encounter an OOB panic if more than the
configured `Inflights.size` entries were added via `Add()`, despite
stating support for this behavior:

```go
// [..] However, Add() is still valid and allowed, for cases when this pacing
// is implemented at the higher app level. The tracker correctly tracks all the
// in-flight entries.
```

Allow the buffer to grow dynamically larger than `Inflights.size`, as it
should remain bounded by the `RangeController` in cases where `Full()`
is not already checked (such as Raft's on flow control machinery).

Fixes: cockroachdb#135223
Release note: None
Copy link
Collaborator Author

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/raft/tracker/inflights.go line 77 at r1 (raw file):

Also, this suggests this PR is not quite right since the modulus logic is incorrect (the buffer could be bigger than size). It should do:

I'm not following how in.count >= in.size with this change (but I'm also questioning many things about this change myself), the only modifications to in.size and len(in.buffer) are made in grow outside of init, and the method there explicitly sets in.size to be equal to len(in.buffer) should it ever be smaller (or the newSize really).


pkg/raft/tracker/inflights.go line 81 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Is this branch ever reached at all?

When we get to the last entry (say size=100):

next := 0 + 100
size := 100
if 100 >= 100 {
    next -= 100
}
// next == 0

Doesn't this assume len(in.buffer) == in.size? That doesn't appear to hold, rather len(in.buffer) <= in.size.


pkg/raft/tracker/inflights.go line 102 at r1 (raw file):

Previously, sumeerbhola wrote…

When were we ever shrinking before?

Don't think so from looking around.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)


pkg/raft/tracker/inflights.go line 77 at r1 (raw file):

Which suggests that say in.size was 100 and the stable number of entries were only 2 (due to Add alternating with FreeLE), we would still grow the buffer to 100.

I think the above would not happen only if it shrank to 0.

	if in.count == 0 {
		// inflights is empty, reset the start index so that we don't grow the
		// buffer unnecessarily.
		in.start = 0
	}

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)


pkg/raft/tracker/inflights.go line 77 at r1 (raw file):

I'm not following how in.count >= in.size with this change

Maybe I am misunderstanding the question: in.size is the limit we no longer respect in lazy replication mode, so in.count can exceed it. Isn't that attempt what caused the panic.

@kvoli
Copy link
Collaborator Author

kvoli commented Nov 15, 2024

I'm in favor of closing this and adopting #135237 @pav-kv.

@kvoli
Copy link
Collaborator Author

kvoli commented Nov 15, 2024

Closing in favor of #135237.

@kvoli kvoli closed this Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.3.x Flags PRs that need to be backported to 24.3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

raft: PANIC inflights buffer is never resized beyond the initial size
4 participants