
Improve lifecycle management #416

Merged
merged 5 commits into from
Jul 7, 2024


@Stebalien (Member) commented Jul 6, 2024

  1. Merge the client and the runner. The distinction was unclear and the client/runner/module tended to reach into each other. This change merges the client/runner and then separates the new "runner" from the module as much as possible.
  2. Completely stop/discard the runner when rebootstrapping. The new logic carefully waits for all components to stop before moving on.
  3. Simplify locking and make sure we take the locks where appropriate.
  4. Merge bootstrap and re-configure logic. The dynamic manifest client no longer cares about when a manifest should be applied; it simply passes it to the module (F3) and lets F3 use its normal bootstrap logic.

Finally, I've improved the tests to:

  1. Always stop on exit (checking for errors).
  2. Never fail from goroutines.
  3. Correctly wait for manifest changes (previously, it would wait for at least one node to change manifests).

NOTE: This removes the ability to reconfig without rebootstrap, but preserves the ability to pause without rebootstrap.
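One way to picture this note: when a new manifest arrives, compare it with the current one while ignoring the pause flag. If anything else changed, rebootstrap; if only the flag flipped, pause or resume in place. The `manifestConfig` type, its `Paused` field, and `classify` are hypothetical and only illustrate the rule stated in the note, not how F3 represents manifests.

```go
package main

import "fmt"

// manifestConfig is a hypothetical, trimmed-down manifest: Paused toggles
// participation; every other field is consensus configuration.
type manifestConfig struct {
	NetworkName    string
	BootstrapEpoch int64
	Paused         bool
}

type action int

const (
	actionNone action = iota
	actionPause
	actionResume
	actionRebootstrap
)

func (a action) String() string {
	return [...]string{"none", "pause", "resume", "rebootstrap"}[a]
}

// classify decides how to react to a new manifest. Any change other than
// the Paused flag requires a full rebootstrap; flipping Paused alone does not.
func classify(current, next manifestConfig) action {
	a, b := current, next
	a.Paused, b.Paused = false, false
	switch {
	case a != b:
		return actionRebootstrap
	case !current.Paused && next.Paused:
		return actionPause
	case current.Paused && !next.Paused:
		return actionResume
	default:
		return actionNone
	}
}

func main() {
	base := manifestConfig{NetworkName: "testnet", BootstrapEpoch: 1000}
	paused := base
	paused.Paused = true
	moved := base
	moved.BootstrapEpoch = 2000

	fmt.Println(classify(base, paused)) // pause
	fmt.Println(classify(paused, base)) // resume
	fmt.Println(classify(base, moved))  // rebootstrap
}
```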


codecov bot commented Jul 6, 2024

Codecov Report

Attention: Patch coverage is 66.33065% with 167 lines in your changes missing coverage. Please review.

Project coverage is 74.45%. Comparing base (de5c871) to head (0a4f523).


@@           Coverage Diff           @@
##             main     #416   +/-   ##
=======================================
  Coverage   74.45%   74.45%           
=======================================
  Files          39       41    +2     
  Lines        3754     3719   -35     
=======================================
- Hits         2795     2769   -26     
+ Misses        683      666   -17     
- Partials      276      284    +8     
Files                          Coverage Δ
manifest/static.go             100.00% <100.00%> (ø)
ec/powerdelta.go               77.77% <77.77%> (ø)
manifest/manifest.go           57.89% <71.42%> (ø)
store.go                       55.55% <55.55%> (ø)
cmd/f3/run.go                  0.00% <0.00%> (ø)
manifest/manifest_sender.go    75.00% <68.75%> (+6.81%) ⬆️
manifest/dynamic_manifest.go   71.42% <78.57%> (-5.43%) ⬇️
cmd/f3/manifest.go             0.00% <0.00%> (ø)
f3.go                          70.66% <70.14%> (+1.53%) ⬆️
host.go                        63.04% <67.58%> (+0.80%) ⬆️

... and 2 files with indirect coverage changes
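The numbers in the coverage diff are consistent with coverage computed as hits / (hits + misses + partials), truncated to two decimal places, with partially covered lines counting against coverage. The sketch below reproduces both percentages under that assumption (rounding to nearest would give 74.46% for the head, so the reported 74.45% suggests truncation; the report's actual settings aren't shown). It also shows why coverage stayed flat even though hits fell by 26: total lines fell by 35. `coverageBP` and `formatPct` are illustrative helpers, not part of any Codecov API.

```go
package main

import "fmt"

// coverageBP returns hits / (hits + misses + partials) in hundredths of a
// percent, rounded down. Partially covered lines count as not covered.
func coverageBP(hits, misses, partials int) int {
	return hits * 10000 / (hits + misses + partials)
}

// formatPct renders hundredths of a percent as e.g. "74.45%".
func formatPct(bp int) string {
	return fmt.Sprintf("%d.%02d%%", bp/100, bp%100)
}

func main() {
	fmt.Println("base:", formatPct(coverageBP(2795, 683, 276))) // base: 74.45%
	fmt.Println("head:", formatPct(coverageBP(2769, 666, 284))) // head: 74.45%
}
```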

@Stebalien force-pushed the steb/fix-tests branch 2 times, most recently from 9a5d293 to 65eb1ac (July 6, 2024 15:43)
Resolved review threads: cmd/f3/manifest.go (3), cmd/f3/run.go (1), f3.go (1), host.go (4, 1 outdated), manifest/dynamic_manifest.go (3)
@Stebalien force-pushed the steb/fix-tests branch 2 times, most recently from 4c77453 to cf135ad (July 6, 2024 20:58)
Resolved review thread: f3.go
@Stebalien force-pushed the steb/fix-tests branch 4 times, most recently from 814a237 to b76a60c (July 6, 2024 21:50)
Resolved review threads: f3.go (2), host.go (1, outdated)
1. Merge the client and the runner. The distinction was unclear and the
client/runner/module tended to reach into each other. This change merges
the client/runner and then separates the new "runner" from the module as
much as possible.
2. Completely stop/discard the runner when rebootstrapping. The new
logic carefully waits for all components to stop before moving on.
3. Simplify locking and make sure we take the locks where appropriate.
4. Merge bootstrap and re-configure logic. The dynamic manifest client
no longer cares about _when_ a manifest should be applied; it simply
passes it to the module (F3) and lets F3 use its normal bootstrap logic.

Finally, I've improved the tests to:

1. Always stop on exit (checking for errors).
2. Never fail from goroutines.
3. Correctly wait for manifest changes (previously, it would wait for at
least one node to change manifests).

Notes:

1. This removes the ability to reconfig without rebootstrap, but
preserves the ability to _pause_ without rebootstrap.
2. This causes bootstrap to start at the time the bootstrap epoch
_should_ have happened instead of starting at the next non-null epoch.
In practice, this should behave better as all nodes will start at the
same time (and will look back 900 epochs anyway).
Resolved review threads: manifest/dynamic_manifest.go (1), manifest/manifest.go (4), test/f3_test.go (5), f3.go (11, 2 outdated), host.go (2, 1 outdated)
@Kubuxu (Contributor) left a comment

Directionally LGTM

And have the test chain "catch up" if it's too far behind.
@Stebalien enabled auto-merge (July 7, 2024 14:14)
@Stebalien added this pull request to the merge queue Jul 7, 2024
Merged via the queue into main with commit 8df83eb Jul 7, 2024
12 of 13 checks passed
@Stebalien deleted the steb/fix-tests branch (July 7, 2024 14:46)
2 participants