Segmentation fault when trying to get feature importance of multilabel binary classifier #10686
Comments
Thank you for sharing! Will try to reproduce it.
Hi @shreyaspuducheri23, could you please share a reproducible example? I tried the following toy example and did not observe a segfault:

```python
from sklearn.datasets import make_multilabel_classification
import xgboost as xgb

X, y = make_multilabel_classification()
clf = xgb.XGBClassifier()
clf.fit(X, y)
clf.feature_importances_
clf.get_booster().get_score(importance_type='weight')
```
Hi @trivialfis, the issue arises when using the vector leaf option.
Ah, the parameter is still a work in progress. I will implement feature importance after sorting out some current work.
I see, thank you! Do you have an estimated time frame (i.e., weeks, months)? Just wondering whether it would be in my best interest to wait for the feature or to switch to one-output-per-tree for my current project.
Opened a PR to add support for it. If the PR is approved, you can use the nightly build for testing.
@trivialfis, I'm here because of the same issue as @shreyaspuducheri23. I can see that your last change (#10700) is approved and merged, but I still can't access feature importance properly when I set multi_strategy to multi_output_tree; when I tried, it just returned 0.0 as the importance for every feature. On a separate note, when I set multi_strategy to one_output_per_tree, I get a single 1D array of feature importances even though I have 3 labels. What's going on under the hood? I was expecting feature importance for each label, since three different independent models are built.
I would like to work on this |
They were combined to represent the whole model instead of individual models.
Thank you for volunteering! Maybe #10700 can be a good start for looking into where it's calculated?
Thanks @trivialfis for your response. When you say they were combined, what combination method is used? Is it the average of the feature importance across all the models for each feature?
Either total or average, depending on the importance type you specified (e.g. total_gain vs. gain).
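Concretely, for a given feature, `total_gain` sums the gain over every split that used the feature, while `gain` divides that sum by the number of such splits (`weight`). A plain-Python sketch of the relationship, using hypothetical per-split gain values:

```python
# Hypothetical per-split gains recorded for one feature across all trees
split_gains = [1.2, 0.8, 2.0]

weight = len(split_gains)       # 'weight': number of splits using the feature
total_gain = sum(split_gains)   # 'total_gain': summed over all splits
gain = total_gain / weight      # 'gain': average gain per split

print(weight, total_gain, round(gain, 4))  # 3 4.0 1.3333
```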
I am experiencing a segmentation fault with XGBoost 2.1.0 when trying to access feature importances in a multi-label binary classification model. The model trains and predicts as expected; however, when I attempt to retrieve feature importances using either `xgb_model.feature_importances_` or `xgb_model.get_score(importance_type='weight')`, the process fails. In a Jupyter kernel, this results in a kernel crash, and when executed from the terminal, it outputs "Segmentation fault". The issue occurs specifically under these conditions, without any problems during other operations like fitting or predicting.