This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

A potential refinement on document #123

Open
0as1s opened this issue Aug 9, 2021 · 0 comments

Comments

0as1s commented Aug 9, 2021

When I started deploying xgboost-operator on my Kubeflow cluster, I used https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/utils.py#L47 as a reference to implement my own version that reads my own data. Following this function, it's common to manually read only the part of the whole dataset that corresponds to each worker's rank.

However, I found that DMatrix already has internal logic to read only part of the data when it detects distributed mode. As a result, with N ranks, my manual data reading caused each rank to read only 1/N² of the data (1/N of 1/N) instead of 1/N.
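To make the arithmetic concrete, here is a minimal Python sketch. `shard` is a hypothetical helper standing in for any rank-based contiguous split; it is not the sample's code or an XGBoost API, and the second call only models the loader-side split described above.

```python
def shard(rows, rank, world_size):
    """Hypothetical helper: return the contiguous slice of `rows` owned by `rank`."""
    per_rank = len(rows) // world_size
    start = rank * per_rank
    return rows[start:start + per_rank]


world_size = 4
rows = list(range(1600))

# Manual split by rank, as a reader modeled on the sample would do.
manual = shard(rows, rank=1, world_size=world_size)
# A second split by rank on top of the manual one (models the loader-side split).
double = shard(manual, rank=1, world_size=world_size)

print(len(manual) / len(rows))  # 0.25   -> 1/N
print(len(double) / len(rows))  # 0.0625 -> 1/N^2

# Across all ranks, the double split covers only 1/N of the dataset in total.
covered = set()
for r in range(world_size):
    covered.update(shard(shard(rows, r, world_size), r, world_size))
print(len(covered) / len(rows))  # 0.25
```

With N = 4, each rank trains on 100 of 1600 rows instead of 400, and three quarters of the data is never seen by any worker.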

I think it would help to add a comment in that function guiding users to rewrite it accordingly.
