Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement secure boost scheme phase 1 - vertical pipeline with hist sync #10037

Merged
merged 27 commits into from
Mar 1, 2024

Conversation

ZiyueXu77
Copy link

@ZiyueXu77 ZiyueXu77 commented Feb 9, 2024

For implementing Vertical Federated Learning with Secure Features, as discussed in
#9987
The first phase is to implement an alternative vertical pipeline that sync the histograms from clients to label owner.
This PR implemented this feature as a standalone data mode.

Functional changes finished, currently adding unit testings

Note: phase 2 will be adding HE encryption features, which will be added in an independent PR

@rongou
Copy link
Contributor

rongou commented Feb 9, 2024

@trivialfis

include/xgboost/data.h Outdated Show resolved Hide resolved
Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on federated learning! Exciting new features.

Initial review as I'm still reading the secure boost paper. It would be great if you could make a summary of the various differences between all data modes.

src/common/quantile.cc Outdated Show resolved Hide resolved
src/common/quantile.cc Outdated Show resolved Hide resolved
src/common/quantile.cc Show resolved Hide resolved
src/tree/hist/evaluate_splits.h Show resolved Hide resolved
@ZiyueXu77 ZiyueXu77 changed the base branch from master to vertical-federated-learning February 28, 2024 19:42
src/common/quantile.cc Outdated Show resolved Hide resolved
src/common/quantile.cc Outdated Show resolved Hide resolved
src/tree/hist/evaluate_splits.h Outdated Show resolved Hide resolved
@@ -401,6 +413,9 @@ class HistEvaluator {
if (is_col_split_) {
// With column-wise data split, we gather the best splits from all the workers and update the
// expand entries accordingly.
// Note that under secure vertical setting, only the label owner is able to evaluate the split
// based on the global histogram. The other parties will receive the final best splits
// allgather is capable of performing this (0-gain entries for non-label owners),
auto all_entries = AllgatherColumnSplit(entries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the code even useful for passive parties? Considering that they don't evaluate splits. If not, then it would be much cleaner to skip the call to evaluation altogether. Keep spreading conditions like if (secure) can make the code difficult to change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the (secure && passive parties) is skipped with "if ((!is_secure_) || (collective::GetRank() == 0)) {", recommendations on skipping it in other places?

@@ -190,6 +193,17 @@ class HistogramBuilder {
reinterpret_cast<double *>(this->hist_[first_nidx].data()), n);
}

if (is_distributed_ && is_col_split_ && is_secure_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to allgather the histogram across workers? I thought we only need to send it to the active worker?

Copy link
Author

@ZiyueXu77 ZiyueXu77 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we only need to collect histograms to the active party, but my understanding is we currently do not have a "gather" function to do that? it will be great if we have it, similar to broadcast(..., rank), just reverse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing, I can look into a gather function in the future.

src/tree/hist/histogram.h Outdated Show resolved Hide resolved
@trivialfis
Copy link
Member

I have enabled all the CI pipelines, please don't push until they finish, otherwise a new commit will interrupt the previous run. The PR looks good to me overall and will approve once all tests pass.

Please note that after having all the desired features in the feature branch and having a full picture of the code changes, we might do a few rounds of refactors before merging into the master. This way we can unblock these individual PRs while keeping the code maintainable in the future.

@ZiyueXu77
Copy link
Author

I have enabled all the CI pipelines, please don't push until they finish, otherwise a new commit will interrupt the previous run. The PR looks good to me overall and will approve once all tests pass.

Please note that after having all the desired features in the feature branch and having a full picture of the code changes, we might do a few rounds of refactors before merging into the master. This way we can unblock these individual PRs while keeping the code maintainable in the future.

sounds good! Thanks a lot. :)

@trivialfis trivialfis merged commit fe73294 into dmlc:vertical-federated-learning Mar 1, 2024
28 checks passed
@ZiyueXu77 ZiyueXu77 deleted the SecureBoost branch March 1, 2024 19:38
trivialfis pushed a commit to trivialfis/xgboost that referenced this pull request Jul 1, 2024
The first phase is to implement an alternative vertical pipeline that syncs the histograms from clients to the label owner.
trivialfis added a commit that referenced this pull request Jul 2, 2024
The first phase is to implement an alternative vertical pipeline that syncs the histograms from clients to the label owner.

Co-authored-by: Ziyue Xu <[email protected]>
trivialfis added a commit to trivialfis/xgboost that referenced this pull request Jul 15, 2024
)

The first phase is to implement an alternative vertical pipeline that syncs the histograms from clients to the label owner.

Co-authored-by: Ziyue Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants