Implement secure boost scheme phase 1 - vertical pipeline with hist sync #10037

ZiyueXu77 · 2024-02-09T02:39:00Z

For implementing Vertical Federated Learning with Secure Features, as discussed in #9987, the first phase is to implement an alternative vertical pipeline that syncs the histograms from clients to the label owner.
This PR implements this feature as a standalone data mode.

Functional changes are finished; unit tests are currently being added.

Note: phase 2 will add HE encryption features in an independent PR.
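
As a rough illustration of what a standalone data mode could look like, the sketch below models the split mode as an enum with a secure column-split variant. The name kColSecure comes from a later commit in this PR; the numeric values and the two helper functions are assumptions made for illustration, not the actual definitions in include/xgboost/data.h.

```cpp
// Hypothetical model of the data split modes; the numeric values are illustrative.
enum class DataSplitMode : int { kRow = 0, kCol = 1, kColSecure = 2 };

// Column-wise split covers both the plain and the secure vertical pipeline.
inline bool IsColumnSplit(DataSplitMode mode) {
  return mode == DataSplitMode::kCol || mode == DataSplitMode::kColSecure;
}

// Only the secure variant would trigger the histogram sync to the label owner.
inline bool IsSecure(DataSplitMode mode) {
  return mode == DataSplitMode::kColSecure;
}
```

Treating the secure pipeline as its own mode keeps the existing column-split path unchanged while letting callers ask a single question (is this run secure?) where behavior differs.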

rongou · 2024-02-09T15:53:57Z

@trivialfis

include/xgboost/data.h

trivialfis

Thank you for working on federated learning! Exciting new features.

Initial review as I'm still reading the secure boost paper. It would be great if you could make a summary of the various differences between all data modes.

src/common/quantile.cc

src/tree/hist/evaluate_splits.h

trivialfis · 2024-02-28T20:22:08Z

src/tree/hist/evaluate_splits.h

@@ -401,6 +413,9 @@ class HistEvaluator {
    if (is_col_split_) {
      // With column-wise data split, we gather the best splits from all the workers and update the
      // expand entries accordingly.
+      // Note that under secure vertical setting, only the label owner is able to evaluate the split
+      // based on the global histogram. The other parties will receive the final best splits
+      // allgather is capable of performing this (0-gain entries for non-label owners),
      auto all_entries = AllgatherColumnSplit(entries);


Is this part of the code even useful for passive parties, considering that they don't evaluate splits? If not, it would be much cleaner to skip the call to evaluation altogether. Continuing to spread conditions like if (secure) can make the code difficult to change.

Currently, the (secure && passive party) case is skipped with "if ((!is_secure_) || (collective::GetRank() == 0)) {". Any recommendations on skipping it in other places?
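
The guard quoted above can be read as a small predicate. The sketch below only illustrates that condition, assuming (as the quoted line suggests) that rank 0 is the label owner; ShouldEvaluateSplits is a hypothetical helper name, not a function from the PR.

```cpp
// Hypothetical helper mirroring the quoted guard:
//   if ((!is_secure_) || (collective::GetRank() == 0)) { ... }
// Assumption: rank 0 is the label owner (active party).
inline bool ShouldEvaluateSplits(bool is_secure, int rank) {
  // Non-secure runs: every worker evaluates splits.
  // Secure runs: only the label owner evaluates; passive parties skip.
  return !is_secure || rank == 0;
}
```

Keeping the check in one named place is one possible way to avoid scattering if (secure) conditions across the evaluator.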

trivialfis · 2024-02-28T20:24:36Z

src/tree/hist/histogram.h

@@ -190,6 +193,17 @@ class HistogramBuilder {
          reinterpret_cast<double *>(this->hist_[first_nidx].data()), n);
    }

+    if (is_distributed_ && is_col_split_ && is_secure_) {


Why do we need to allgather the histogram across workers? I thought we only need to send it to the active worker?

Yes, we only need to collect histograms at the active party, but my understanding is that we currently do not have a "gather" function to do that. It would be great to have one, similar to broadcast(..., rank), just in reverse.

Thank you for sharing, I can look into a gather function in the future.
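
To make the difference between the two collectives concrete, here is a toy in-process model, not XGBoost's collective API. SimAllgather, SimGather, and SumHistograms are hypothetical names, and the element-wise sum assumes every party's histogram has the same length and is zero for bins of features it does not own.

```cpp
#include <cstddef>
#include <vector>

using Hist = std::vector<double>;

// Toy model: per_worker[r] is the histogram held by rank r.
// Allgather: every rank ends up with a copy of every rank's histogram.
std::vector<std::vector<Hist>> SimAllgather(const std::vector<Hist>& per_worker) {
  return std::vector<std::vector<Hist>>(per_worker.size(), per_worker);
}

// Hypothetical gather: only `root` receives the histograms; other ranks receive nothing.
std::vector<std::vector<Hist>> SimGather(const std::vector<Hist>& per_worker,
                                         std::size_t root) {
  std::vector<std::vector<Hist>> received(per_worker.size());
  received.at(root) = per_worker;
  return received;
}

// The label owner combines the collected histograms by element-wise sum.
Hist SumHistograms(const std::vector<Hist>& collected) {
  Hist total(collected.empty() ? 0 : collected.front().size(), 0.0);
  for (const Hist& h : collected) {
    for (std::size_t i = 0; i < total.size(); ++i) total[i] += h.at(i);
  }
  return total;
}
```

In this model the label owner obtains the same global histogram either way; the gather variant simply avoids delivering other parties' histograms to the passive workers.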

trivialfis · 2024-02-29T18:42:25Z

I have enabled all the CI pipelines, please don't push until they finish, otherwise a new commit will interrupt the previous run. The PR looks good to me overall and will approve once all tests pass.

Please note that after having all the desired features in the feature branch and having a full picture of the code changes, we might do a few rounds of refactors before merging into the master. This way we can unblock these individual PRs while keeping the code maintainable in the future.

ZiyueXu77 · 2024-02-29T18:46:20Z

sounds good! Thanks a lot. :)

ZiyueXu77 added 8 commits January 31, 2024 10:48

Add additional data split mode to cover the secure vertical pipeline

8570ba5

Add IsSecure info and update corresponding functions

2d00db6

Modify evaluate_splits to block non-label owners to perform hist compute under secure scenario

ab17f5a

Continue using Allgather for best split sync for secure vertical, equivalent to broadcast

fb1787c

Modify histogram sync scheme for secure vertical case, can identify global best split, but need to further apply split correctly

7a2a2b8

Sync cut informaiton across clients, full pipeline works for testing case

3ca3142

Code cleanup, phase 1 of alternative vertical pipeline finished

22dd522

Code clean

52e8951

ZiyueXu77 marked this pull request as draft February 9, 2024 02:39

ZiyueXu77 marked this pull request as ready for review February 9, 2024 02:40

ZiyueXu77 mentioned this pull request Feb 9, 2024

Vertical Federated Learning with Secure Features (secure inference and encrypted training) RFC #9987

Closed

rongou reviewed Feb 9, 2024

include/xgboost/data.h (outdated, resolved)

ZiyueXu77 and others added 11 commits February 12, 2024 11:32

change kColS to kColSecure to avoid confusion with kCols

e9eef15

Add additional data split mode to cover the secure vertical pipeline

70e6ca6

Add IsSecure info and update corresponding functions

a54ea6a

Modify evaluate_splits to block non-label owners to perform hist compute under secure scenario

6fe61dd

Continue using Allgather for best split sync for secure vertical, equivalent to broadcast

1c2b7ed

Modify histogram sync scheme for secure vertical case, can identify global best split, but need to further apply split correctly

b36ff2b

Sync cut informaiton across clients, full pipeline works for testing case

0707731

Code cleanup, phase 1 of alternative vertical pipeline finished

dce7609

Code clean

6cebc31

change kColS to kColSecure to avoid confusion with kCols

1562f52

Add one unit test

f31c824

trivialfis reviewed Feb 18, 2024

src/common/quantile.cc (outdated, resolved)

src/common/quantile.cc (outdated, resolved)

src/common/quantile.cc (resolved)

src/tree/hist/evaluate_splits.h (resolved)

ZiyueXu77 added 4 commits February 20, 2024 13:06

Merge branch 'SecureBoost' into add_alternate_vertical_splits

6fcbe02

Merge pull request #1 from YuanTingHsieh/add_alternate_vertical_splits

967e307

Add alternate vertical splits

Merge branch 'dmlc:master' into SecureBoost

04cd1cb

Merge branch 'dmlc:master' into SecureBoost

087a8dd

trivialfis mentioned this pull request Feb 28, 2024

Implement secure boost scheme - secure evaluation and validation (during training) without local feature leakage #10079

Merged

ZiyueXu77 changed the base branch from master to vertical-federated-learning February 28, 2024 19:42

ZiyueXu77 and others added 2 commits February 28, 2024 15:00

Merge branch 'vertical-federated-learning' into SecureBoost

616f68e

remove redundant print

7e407a8

trivialfis reviewed Feb 28, 2024

updates according to comments

add5dcd

fix linting issues

7d4b99d

trivialfis approved these changes Mar 1, 2024

trivialfis merged commit fe73294 into dmlc:vertical-federated-learning Mar 1, 2024
28 checks passed

ZiyueXu77 deleted the SecureBoost branch March 1, 2024 19:38

trivialfis pushed a commit to trivialfis/xgboost that referenced this pull request Jul 1, 2024

[secure boost] Vertical pipeline with hist sync (dmlc#10037)

ada7f43

The first phase is to implement an alternative vertical pipeline that syncs the histograms from clients to the label owner.

trivialfis mentioned this pull request Jul 1, 2024

[secure boost] Vertical pipeline with hist sync (#10037) #10528

Merged
