New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[Locality] Location-aware NodeSetChecker #2500

Open

AhmedSoliman wants to merge 1 commit into main from pr2500

+1,474 −298

Contributor

AhmedSoliman commented Jan 16, 2025 •

edited

Loading

This introduces a new sophisticated and performance-optimized nodeset checker for bifrost flexible quorums. It pre-computes aggregates to reduce the cost of repeatative quorum checks.

The new nodeset checker has extensive test coverage

AhmedSoliman mentioned this pull request

[Locality] Set node location on cluster join #2498

Merged

github-actions bot commented Jan 16, 2025 •

edited

Loading

Test Results

7 files ±0 7 suites ±0 4m 13s ⏱️ -1s
47 tests ±0 46 ✅ ±0 1 💤 ±0 0 ❌ ±0
182 runs ±0 179 ✅ ±0 3 💤 ±0 0 ❌ ±0

Results for commit 4c47f6c. ± Comparison against base commit 6de7e0f.

♻️ This comment has been updated with latest results.

AhmedSoliman force-pushed the pr2500 branch from 005a0d5 to d1cda65 Compare

January 16, 2025 12:59

AhmedSoliman marked this pull request as ready for review

January 16, 2025 12:59

muhamadazmy reviewed

View reviewed changes

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs Outdated Show resolved Hide resolved

muhamadazmy reviewed

View reviewed changes

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs Outdated Show resolved Hide resolved

AhmedSoliman requested a review from tillrohrmann

January 16, 2025 14:49

AhmedSoliman force-pushed the pr2500 branch 2 times, most recently from 43b2609 to 2c47453 Compare

January 16, 2025 15:06

muhamadazmy reviewed

View reviewed changes

Contributor

muhamadazmy left a comment

LGTM! I left 2 really minor comments (mainly questions)

I would love to see explanation of how f-majority algorithm works. I still can't fully comprehend how it works.

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs Show resolved Hide resolved


          [Locality] Location-aware NodeSetChecker

4c47f6c

This introduces a new sophisticated and performance-optimized nodeset checker for bifrost flexible quorums. It pre-computes aggregates to reduce the cost of repeatative quorum checks.

The new nodeset checker has extensive test coverage

AhmedSoliman force-pushed the pr2500 branch from 2c47453 to 4c47f6c Compare

January 16, 2025 18:21

tillrohrmann approved these changes

View reviewed changes

Contributor

tillrohrmann left a comment •

edited

Loading

Impressive work @AhmedSoliman! It is quite a bit of brain jogging to follow you along here. As far as I can tell, the code looks good to me. I left my usual nitpicks about duplicate words in the comments ;-) +1 for merging.

Nice trick with the raw pointers!

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                  }

                  pub fn is_empty(&self) -> bool {

                      self.node_attribute.is_empty()

                  /// The number of nodes in that nodeset that has matches the storage state predicate

Contributor

tillrohrmann Jan 16, 2025

Suggested change

      
                /// The number of nodes in that nodeset that has matches the storage state predicate
          
                /// The number of nodes in that nodeset that matches the storage state predicate

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                      for (_, v) in self.node_attribute.iter_mut() {

                          *v = Default::default();

                      }

                  /// How many nodes have attributes set. Note that this doesn't count nodes with has not

Contributor

tillrohrmann Jan 16, 2025

Suggested change

      
                /// How many nodes have attributes set. Note that this doesn't count nodes with has not
          
                /// How many nodes have attributes set. Note that this doesn't count nodes which have not

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                              .expect("total domains must fit into u32");

                          debug_assert!(scope_state.replication_factor > 0);

                          let f_majority_needed =

                              if candidate_domains >= u32::from(scope_state.replication_factor - 1) {

Contributor

tillrohrmann Jan 16, 2025

I don't see why - 1 is needed here. Wouldn't this mean that we need to find f_majority if the replication factor is 1 and candidate_domains == 0? However, if candidate_domains == 0, then there is no way to form a valid write quorum given a replication of 1 and, thus, f_majority is already reached?

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                              // strings at this scope (the node scope has the most number of domains)

                              node_id.to_string()

                          } else if location.is_scope_defined(*scope) {

                              location.domain_string(*scope, Some(node_id))

Contributor

tillrohrmann Jan 16, 2025

Why are we providing the node_id as an argument if we exclude the Node scope from this branch? Wouldn't it be ignored in all other NodeLocationScope cases?

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                          if authoritative >= fmajority_requires {

                              return FMajorityResult::SuccessWithRisk;

                  fn rm_attr_internal(&mut self, node_id: PlainNodeId, attr: &Attr, storage_state: StorageState) {

                      // Apply this attribute on every scope

Contributor

tillrohrmann Jan 16, 2025

Suggested change

      
                    // Apply this attribute on every scope
          
                    // Remove this attribute on every scope

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                  num_complete_domains: HashMap<Attr, u32>,

                  /// For each Attribute, set of the domains at that scope that have at least one node with the

                  /// attribute. Those are the failure domains considered for

                  /// replication/write-quorum/write-quorum.

Contributor

tillrohrmann Jan 16, 2025

Suggested change

      
                /// replication/write-quorum/write-quorum.
          
                /// replication/write-quorum.

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

Comment on lines +527 to +532

    
                              if fd.as_ref().is_domain_complete(&attr) {

                                  *scope_state

                                      .num_complete_domains

                                      .entry(attr.clone())

                                      .or_default() += 1;

                              }

Contributor

tillrohrmann Jan 16, 2025

Given the implementation of is_domain_complete which requires num authoritative nodes > 0, won't we double count a failure domain if it contains more than one authoritative node?

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                  {

                      // For each attribute, merge the set of FailureDomainState that have at least one node

                      // with the attribute here. We stop merging immediately after the set's size reaches

                      // the replication target. This makes `check_write_quorum` O(replication * num_domain_matches * num_scopes) in the

Contributor

tillrohrmann Jan 16, 2025

This O notation does not refer to this check_write_quorum but the one belonging to the NodeSetChecker, does it? If so, maybe move it there. The num_scopes is a bit confusing in this context.

Probably it's already to late for my brain. I don't see how replication * num_domain_matches * num_scopes is the worst case. Isn't it a bit better because of the early break condition: O(num_scopes * (#attributes + replication)) (worst case for every scope we need to go over all attributes and in the last attribute our predicate matches and we find replication domains)?

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

    
                  replication_factor: u8,

                  /// For each attribute, the number of domains at that scope that have all their nodes with the

                  /// attribute.

                  num_complete_domains: HashMap<Attr, u32>,

Contributor

tillrohrmann Jan 16, 2025

For what is this field used? I can only see that we update it but we seem to never really read it.

crates/bifrost/src/providers/replicated_loglet/replication/checker.rs

Comment on lines +1151 to +1153

    
                      checker.set_attribute(10, false);

                      // trying 3 instead

                      checker.set_attribute(3, true);

Contributor

tillrohrmann Jan 16, 2025

Could we have set [1, 2, 3] to true right away instead of going through 10?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet