Update vertical xgb examples for gpu support #1977

Merged
merged 8 commits on Sep 19, 2023
12 changes: 9 additions & 3 deletions examples/advanced/vertical_xgboost/README.md
@@ -7,13 +7,13 @@ Before starting please make sure you set up a [virtual environment](../../../REA
python3 -m pip install -r requirements.txt
```

-> **_NOTE:_** If vertical federated learning support is not available in the XGBoost PyPI release yet, reinstall XGBoost from a [wheel](https://xgboost.readthedocs.io/en/stable/install.html#nightly-build) with a recent commit.
+> **_NOTE:_** If vertical federated learning support or GPU support is not yet available in the XGBoost PyPI release, either reinstall XGBoost from a [wheel](https://xgboost.readthedocs.io/en/stable/install.html#nightly-build) built from a recent commit on the master branch, or build it from [source](https://github.com/dmlc/xgboost/blob/master/plugin/federated/README.md). When building XGBoost from source, ensure that sufficiently recent versions of gRPC, CUDA, and NCCL are installed, and use the cmake options `-DPLUGIN_FEDERATED -DUSE_CUDA -DUSE_NCCL` (`-DNCCL_LIBRARY` and `-DUSE_NCCL_LIB_PATH` might also be needed depending on the location of NCCL). Lastly, we recommend using a [CUDA image](https://hub.docker.com/r/nvidia/cuda/tags) if you prefer working with Docker.
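A quick way to confirm that the installed XGBoost build has the needed capabilities is to inspect its build information (a minimal sketch; the exact keys reported, such as `USE_FEDERATED` and `USE_CUDA`, vary with the XGBoost version):

```python
import xgboost as xgb

# Print the installed version and the flags XGBoost was compiled with.
# A build suitable for this example should report federated-plugin and
# CUDA/NCCL support among these entries (key names vary by version).
print(xgb.__version__)
for key, value in xgb.build_info().items():
    print(f"{key}: {value}")
```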

## Preparing HIGGS Data
In this example we showcase a binary classification task based on the [HIGGS dataset](https://archive.ics.uci.edu/dataset/280/higgs), which contains 11 million instances, each with 28 features and 1 class label.

### Download and Store Dataset
-We first download the dataset from the HIGGS link above, which is a single zipped `.csv` file.
+First download the dataset from the HIGGS link above, which is a single zipped `.csv` file.
By default, we assume the dataset is downloaded, uncompressed, and stored in `~/dataset/HIGGS.csv`.

### Vertical Data Splits
@@ -47,7 +47,13 @@ Next, we can use `FedXGBHistogramExecutor` and set XGBoost training parameters i

Lastly, we must subclass `XGBDataLoader` and implement the `load_data()` method. For vertical federated learning, it is important when creating the `xgb.Dmatrix` to set `data_split_mode=1` for column mode, and to specify the presence of a label column `?format=csv&label_column=0` for the csv file. To support PSI, the dataloader can also read in the dataset based on the calculated intersection, and split the data into training and validation.
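A minimal sketch of such a loader is shown below (the file layout, constructor argument, and `load_data()` signature are assumptions for illustration; the example's shipped loader also intersects the data via PSI before building the matrices):

```python
import os

import xgboost as xgb
from nvflare.app_opt.xgboost.data_loader import XGBDataLoader


class HiggsVerticalDataLoader(XGBDataLoader):
    def __init__(self, data_split_dir: str):
        # Directory holding this site's vertical slice of HIGGS,
        # pre-split into train.csv and valid.csv (assumed layout).
        self.data_split_dir = data_split_dir

    def load_data(self, client_id: str):
        # "?format=csv&label_column=0" marks the first column as the label
        # (only the label-owning site actually has one), and
        # data_split_mode=1 selects column-wise (vertical) splitting.
        dtrain = xgb.DMatrix(
            os.path.join(self.data_split_dir, client_id, "train.csv") + "?format=csv&label_column=0",
            data_split_mode=1,
        )
        dvalid = xgb.DMatrix(
            os.path.join(self.data_split_dir, client_id, "valid.csv") + "?format=csv&label_column=0",
            data_split_mode=1,
        )
        return dtrain, dvalid
```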

-> **_NOTE:_** For secure mode, make sure to provide the required certificates for the federated communicator. Also as of now, GPUs are not yet supported by vertical federated XGBoost.
+> **_NOTE:_** For secure mode, make sure to provide the required certificates for the federated communicator.

+### GPU Support
+By default, CPU-based training is used.
+
+To enable GPU-accelerated training, first ensure that your machine has CUDA installed and at least one GPU available.
+In `config_fed_client.json`, set `"use_gpus": true` and `"tree_method": "hist"` in `xgb_params`. `FedXGBHistogramExecutor` then uses the `device` parameter in `xgb_params` to map each rank to a GPU device ordinal. With multiple GPUs, each rank can be mapped to a different device; with a single GPU, all ranks can also map to the same device.
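The rank-to-device mapping amounts to the following (a simplified sketch with `assign_device` as a hypothetical helper; the actual assignment happens inside `FedXGBHistogramExecutor.train()`, shown further down):

```python
def assign_device(xgb_params: dict, rank: int, num_gpus: int) -> dict:
    # Each rank trains on its own GPU when enough are available;
    # when simulating several ranks on one GPU, they all share cuda:0.
    gpu_ordinal = rank % num_gpus if num_gpus > 0 else 0
    xgb_params["device"] = f"cuda:{gpu_ordinal}"
    return xgb_params


# e.g. rank 1 in a 2-GPU setup trains on cuda:1
params = assign_device({"tree_method": "hist"}, rank=1, num_gpus=2)
```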

## Run the Example
Run the vertical xgboost job:
config_fed_client.json
@@ -17,11 +17,12 @@
"gamma": 1.0,
"max_depth": 8,
"min_child_weight": 100,
"tree_method": "approx",
"tree_method": "hist",
"grow_policy": "depthwise",
"eval_metric": "auc"
},
"data_loader_id": "dataloader"
"data_loader_id": "dataloader",
"use_gpus": false
}
}
}
21 changes: 18 additions & 3 deletions nvflare/app_opt/xgboost/histogram_based/executor.py
@@ -73,7 +73,15 @@ class FedXGBHistogramExecutor(Executor):
This class implements a basic xgb_train logic, feel free to overwrite the function for custom behavior.
"""

-def __init__(self, num_rounds, early_stopping_rounds, xgb_params: dict, data_loader_id: str, verbose_eval=False):
+def __init__(
+    self,
+    num_rounds,
+    early_stopping_rounds,
+    xgb_params: dict,
+    data_loader_id: str,
+    verbose_eval=False,
+    use_gpus=False,
+):
"""Federated XGBoost Executor for histogram-base collaboration.

This class sets up the training environment for Federated XGBoost.
@@ -88,14 +96,17 @@ def __init__(self, num_rounds, early_stopping_rounds, xgb_params: dict, data_loa
https://xgboost.readthedocs.io/en/stable/python/python_api.html#module-xgboost.training
data_loader_id: the ID points to XGBDataLoader.
verbose_eval: verbose_eval in xgboost.train
+use_gpus: flag to enable gpu training
"""
super().__init__()
self.app_dir = None

self.num_rounds = num_rounds
self.early_stopping_rounds = early_stopping_rounds
-self.verbose_eval = verbose_eval
self.xgb_params = xgb_params
+self.data_loader_id = data_loader_id
+self.verbose_eval = verbose_eval
+self.use_gpus = use_gpus

self.rank = None
self.world_size = None
@@ -104,7 +115,6 @@ def __init__(self, num_rounds, early_stopping_rounds, xgb_params: dict, data_loa
self._client_key_path = None
self._client_cert_path = None
self._server_address = "localhost"
-self.data_loader_id = data_loader_id
self.train_data = None
self.val_data = None

@@ -236,6 +246,11 @@ def train(self, shareable: Shareable, fl_ctx: FLContext, abort_signal: Signal) -
self.rank = rank_map[client_name]
self.world_size = world_size

+if self.use_gpus:
+    # map each rank to a GPU device ordinal (every rank can use cuda:0 when simulating with a single GPU)
+    self.log_info(fl_ctx, f"Training with GPU {self.rank}")
+    self.xgb_params["device"] = f"cuda:{self.rank}"

self.log_info(fl_ctx, f"Using xgb params: {self.xgb_params}")
params = XGBoostParams(
xgb_params=self.xgb_params,