[WIP] Federated learning plugin. #10410
Conversation
Many existing federated tests have to be disabled with the plugin:
Do we really want to merge it at this stage?
Sounds like a no? :)
I like the idea of limiting the data split to horizontal and vertical while controlling the encryption with the plugin (pass-by vs. encrypt). Some notes on secure vertical vs. our existing well-established vertical (@rongou's implementation):
This is indeed concerning. The current implementation of this PR should be significantly slower than the existing one. Histogram build is the bottleneck in XGBoost, and the PR switches it from the existing parallel build to a single-threaded build. We can optimize it, of course, but it would be great if we could refactor them to share more code instead of having two different histogram build paths.
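For illustration, here is a minimal sketch (not XGBoost's actual kernels; the types and shapes are simplified) of a serial histogram build next to the per-thread-scratch parallel build that the existing path is based on:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: a gradient-pair histogram accumulated over binned rows.
struct GradientPair {
  double grad{0.0};
  double hess{0.0};
};

// Serial build: one thread walks all rows.  `hist` must be pre-sized to the
// number of bins; `bin_idx[i]` is the bin of row i.
void BuildHistSerial(std::vector<GradientPair>* hist,
                     std::vector<uint32_t> const& bin_idx,
                     std::vector<GradientPair> const& gpair) {
  for (size_t i = 0; i < bin_idx.size(); ++i) {
    (*hist)[bin_idx[i]].grad += gpair[i].grad;
    (*hist)[bin_idx[i]].hess += gpair[i].hess;
  }
}

// Parallel build: each thread accumulates into a private scratch histogram,
// then the scratch histograms are reduced into the shared one.
void BuildHistParallel(std::vector<GradientPair>* hist,
                       std::vector<uint32_t> const& bin_idx,
                       std::vector<GradientPair> const& gpair) {
#pragma omp parallel
  {
    std::vector<GradientPair> local(hist->size());
#pragma omp for nowait
    for (int64_t i = 0; i < static_cast<int64_t>(bin_idx.size()); ++i) {
      local[bin_idx[i]].grad += gpair[i].grad;
      local[bin_idx[i]].hess += gpair[i].hess;
    }
#pragma omp critical
    for (size_t b = 0; b < local.size(); ++b) {
      (*hist)[b].grad += local[b].grad;
      (*hist)[b].hess += local[b].hess;
    }
  }
}
```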
@rongou Please help review when you are available. We will be targeting a feature branch instead of the master branch for this PR.
Brought back the tests by @rongou. Here's the configuration:
All of them have column-split and row-split variants. I think this is getting out of hand now. We need a better way to manage them.
The underlying logic is the same as #10231, with a couple of changes in the interface. This PR is still extremely early; I'm opening it for collaboration and discussion. The code is not ready for merge.
At the moment, vertical learning is blocked by an IPC issue in nvflare where the message size can change for the allgather V call. I haven't tested horizontal yet.
Differences
A C interface
The original PR proposes a C++ interface. This appears to be convenient but has some drawbacks: `malloc` and `free` calls (along with `new` and `delete`) should come from the same version of `libc` and the C++ runtime, along with the C++ compiler. XGBoost is distributed in binary format on PyPI, which makes this requirement difficult to fulfill. This rule applies to CUDA memory as well.
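As a rough sketch of the alternative, a C interface can route every allocation through the plugin itself via an opaque handle, so the host and the plugin never free each other's memory. All names below are hypothetical, not the actual API proposed in this PR:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C plugin interface; the real API in this PR may differ.
// An opaque handle keeps all C++ objects on the plugin's side of the ABI.
extern "C" {
typedef void* FederatedPluginHandle;

// Create/destroy: the plugin allocates and frees its own state, so the
// host never calls free/delete on memory allocated inside the plugin.
FederatedPluginHandle FederatedPluginCreate(int argc, char const** argv);
int FederatedPluginClose(FederatedPluginHandle handle);

// Encrypt gradients: the output buffer is owned by the plugin and stays
// valid until the next call on this handle (or until the handle is closed).
int FederatedPluginEncryptGPairs(FederatedPluginHandle handle,
                                 float const* in_gpair, size_t n_in,
                                 uint8_t const** out_gpair, size_t* n_out);
}  // extern "C"
```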
Split up the histogram implementation
Remove secure data split
`collective::IsFederated()` will be the only source of information about whether training is performed in an encrypted space. If we don't remove the secure split in data, we will have 8 different states: `row-split/col-split x secure/insecure x federated/normal`, which is difficult to maintain.
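A small sketch of the intended effect, with a stubbed-in `collective::IsFederated()` and a hypothetical dispatch site:

```cpp
// Hypothetical dispatch sketch.  collective::IsFederated() does exist in
// XGBoost's collective module; the stub below only makes this snippet
// self-contained.
namespace collective {
bool IsFederated() { return false; }  // stub; the real one queries the communicator
}  // namespace collective

enum class DataSplitMode { kRow, kCol };

// With the separate secure/insecure flag removed, only 2 x 2 states remain:
// {row, col} x {federated, plain}, instead of the 8 that keeping a secure
// data split would create.
void BuildHistogram(DataSplitMode split) {
  if (collective::IsFederated()) {
    // hand the data/histogram to the plugin for encryption or encoding
  } else {
    // plain (unencrypted) allreduce path
  }
  (void)split;  // row- vs. column-split handling elided
}
```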
Rebased to the new collective implementation
This PR uses the latest gRPC backend.
Interface
The plugin interface I have at the moment has two major steps for both horizontal and vertical:
1. Takes in raw data (vertical) or a histogram (horizontal), and returns an encrypted/encoded histogram.
2. Takes an encrypted histogram, and returns a decrypted one.
The plugin itself owns all the buffers, which are freed after the plugin is closed. In addition, minimal data copying is required from the interface's perspective; for instance, we can directly reuse the row partition as input for the plugin. Lastly, since the histogram procedure is split into a policy class, feel free to modify the plugin interface in the future as you see fit.
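To make these lifetime rules concrete, here is a hedged sketch of the two steps with stubbed plugin entry points; the function names and signatures are illustrative, not the final interface:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plugin entry points.  Stub bodies (which just echo the bytes)
// stand in for a real encryption plugin so the sketch compiles on its own.
extern "C" {
// Step 1: raw data (vertical) or a histogram (horizontal) goes in; an
// encrypted/encoded histogram comes out.  The plugin owns *out.
int FederatedPluginBuildEncryptedHist(double const* in, size_t n_in,
                                      uint8_t const** out, size_t* n_out) {
  static std::vector<uint8_t> buf;  // plugin-owned buffer, reused per call
  buf.assign(reinterpret_cast<uint8_t const*>(in),
             reinterpret_cast<uint8_t const*>(in) + n_in * sizeof(double));
  *out = buf.data();
  *n_out = buf.size();
  return 0;
}
// Step 2: an encrypted histogram (after allgather) goes in; a decrypted
// one comes out.  Again, the plugin owns *out.
int FederatedPluginSyncEncryptedHist(uint8_t const* in, size_t n_in,
                                     double const** out, size_t* n_out) {
  static std::vector<double> buf;
  buf.assign(reinterpret_cast<double const*>(in),
             reinterpret_cast<double const*>(in) + n_in / sizeof(double));
  *out = buf.data();
  *n_out = buf.size();
  return 0;
}
}  // extern "C"

int main() {
  std::vector<double> hist{1.0, 2.0, 3.0};  // a local (plaintext) histogram
  uint8_t const* enc{nullptr};
  size_t enc_size{0};
  // The returned buffer lives inside the plugin; the caller never frees it.
  FederatedPluginBuildEncryptedHist(hist.data(), hist.size(), &enc, &enc_size);
  // ... allgather `enc` across parties here, then decrypt with step 2 ...
  double const* dec{nullptr};
  size_t dec_size{0};
  FederatedPluginSyncEncryptedHist(enc, enc_size, &dec, &dec_size);
  return 0;
}
```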