stuck at "foreachPartition at XGBoost.scala:565" #10795

Open
fkjhaflkjgg opened this issue Sep 2, 2024 · 4 comments

Comments

@fkjhaflkjgg

I have run into this issue many times. Sometimes the job hangs for more than 10 hours, while a normal run succeeds in about 1 hour.
It happens with both a small dataset (1M+ samples) and a large dataset (10M+ samples).

XGBoostSpark: Running XGBoost 1.0.0 with parameters:
alpha -> 0.0
min_child_weight -> 300.0
sample_type -> uniform
base_score -> 0.5
weight_col ->
rabit_timeout -> -1
colsample_bylevel -> 1.0
grow_policy -> depthwise
skip_drop -> 0.0
lambda_bias -> 0.0
silent -> 0
scale_pos_weight -> 1.0
seed -> 0
cache_training_set -> false
features_col -> features
num_early_stopping_rounds -> 0
label_col -> label
num_workers -> 200
subsample -> 1.0
lambda -> 1.0
max_depth -> 6
probability_col -> probability
raw_prediction_col -> rawPrediction
tree_limit -> 0
custom_eval -> null
dmlc_worker_connect_retry -> 5
rate_drop -> 0.0
max_bin -> 16
train_test_ratio -> 1.0
use_external_memory -> false
objective -> binary:logistic
eval_metric -> auc
num_round -> 200
timeout_request_workers -> 1800000
missing -> 0.0
rabit_ring_reduce_threshold -> 32768
checkpoint_path ->
tracker_conf -> TrackerConf(0,python)
tree_method -> hist
max_delta_step -> 0.0
eta -> 0.15
verbosity -> 1
colsample_bytree -> 1.0
normalize_type -> tree
allow_non_zero_for_missing -> false
custom_obj -> null
gamma -> 0.0
sketch_eps -> 0.03
nthread -> 1
prediction_col -> prediction
checkpoint_interval -> -1

@fkjhaflkjgg
Author

This appears to be the same issue as #5013.

@fkjhaflkjgg
Author

When I change tree_method to "approx", the application succeeds, but this may lose some precision.

@fkjhaflkjgg
Author

Compared with the logs of a successful run, the logs of the stuck run are missing these lines:
24/09/02 12:04:09 INFO MemoryStore: Block rdd_43_0 stored as values in memory (estimated size 984.0 B, free 7.8 GB)
24/09/02 12:04:11 INFO Executor: 1 block locks were not released by TID = 3005:
[rdd_43_0]

@wbo4958
Contributor

wbo4958 commented Sep 3, 2024

@fkjhaflkjgg, could you try the latest XGBoost from https://mvnrepository.com/artifact/ml.dmlc ?
