Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate ack checking #1561

Merged
merged 2 commits into from
Nov 18, 2024
Merged

Consolidate ack checking #1561

merged 2 commits into from
Nov 18, 2024

Conversation

mkeeter
Copy link
Contributor

@mkeeter mkeeter commented Nov 12, 2024

(staged on top of #1552)

There are a bunch of places where we check to see if jobs should be acked and/or retired:

  • Downstairs::enqueue (if skipped on all three Downstairs)
  • Downstairs::skip_all_jobs (for skipped jobs only)
  • DownstairsClient::process_io_completion (lots of conditionals to decide if we're ready to ack; returns a flag to Downstairs::process_io_completion_inner)

This PR consolidates ack and retire checks into a single place: Downstairs::ack_check. This new function is responsible for deciding when to ack and retire jobs; it's called in all of the places where you'd expect.

DownstairsClient::process_io_completion is reorganized, with one minor logical change: non-fatal read errors now increment the downstairs_errors counter. It's no longer responsible for deciding when to ack a job; that's now in Downstairs::ack_check.

The up-to-ds-*-done DTrace probes are removed, because (after #1552) they fire immediately before the equivalent gw-*-done probes; there's no delay waiting for ackable jobs to be acked.

// snapshot exists already but we should not increment
// `downstairs_errors` and transition that Downstairs to
// Failed - that downstairs is still able to serve IO.
(_, CrucibleError::SnapshotExistsAlready(..)) => (),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not matching on Flush here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the previous behavior (L1296), but I agree that we could be more specific; fixed!

(Note that this only handles error counters, the actual error checking in the Downstairs remains agnostic to IOop type)

@@ -474,12 +474,77 @@ impl Downstairs {
}
}

/// Checks whether an ack and/or a retire check is necessary for the job
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the follow up to 1552 I was going to chat to you about, but it was next in the queue :) 👍

@mkeeter mkeeter force-pushed the mkeeter/immediate-ack branch from 4b0a34e to a33bee3 Compare November 15, 2024 16:19
Base automatically changed from mkeeter/immediate-ack to main November 15, 2024 16:20
@mkeeter mkeeter force-pushed the mkeeter/consolidate-ack branch from c065ec2 to f3d07aa Compare November 15, 2024 16:20
@faithanalog
Copy link
Contributor

it seems CI may not agree with us on these changes

@mkeeter mkeeter merged commit 0fc0ed8 into main Nov 18, 2024
16 checks passed
@mkeeter mkeeter deleted the mkeeter/consolidate-ack branch November 18, 2024 19:18
@mkeeter
Copy link
Contributor Author

mkeeter commented Nov 18, 2024

CI weirdness appears to be #1390 (or similar)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants