XGBoost incremental training, issue with ONNX Conversion #18841

Open
kiransarv opened this issue Dec 15, 2023 · 11 comments
Labels: stale (issues that have not been addressed in a while; categorized by a bot), training (issues related to ONNX Runtime training; typically submitted using template)

Comments

@kiransarv

Describe the issue

I trained an XGBoost classifier incrementally, fitting one batch at a time:

    # Excerpt from a method; assumes `import numpy as np` and
    # `from xgboost import XGBClassifier` at module level.
    batch_size = 1024
    print(vectors.shape, labels.shape, len(np.unique(labels)))
    self.model: XGBClassifier = XGBClassifier(**self.init_param)
    for start in range(0, vectors.shape[0], batch_size):
        itr_vector = vectors[start : start + batch_size]
        itr_label = labels[start : start + batch_size]
        if start > 0:
            # Continue training from the model fitted on the previous batches.
            fit_params["xgb_model"] = self.model
        self.model.fit(itr_vector, itr_label, **fit_params)

Running the resulting ONNX model then fails with:
RUNTIME_EXCEPTION : Non-zero status code returned while running TreeEnsembleClassifier node. Name:'TreeEnsembleClassifier' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/ml/tree_ensemble_aggregator.h:201 void onnxruntime::ml::detail::TreeAggregatorSum<InputType, ThresholdType, OutputType>::ProcessTreeNodePrediction(onnxruntime::InlinedVector<onnxruntime::ml::detail::ScoreValue >&, const onnxruntime::ml::detail::TreeNodeElement&, gsl::span<const onnxruntime::ml::detail::SparseValue >) const [with InputType = float; ThresholdType = float; OutputType = float; onnxruntime::InlinedVector<onnxruntime::ml::detail::ScoreValue > = absl::lts_20220623::InlinedVector<onnxruntime::ml::detail::ScoreValue, 6, std::allocator<onnxruntime::ml::detail::ScoreValue > >] it->i < (int64_t)predictions.size() was false.

If the model is not trained incrementally and is fitted only once with self.model.fit(vectors, labels, **fit_params), there is no issue with the ONNX model and predictions work fine.
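The thread does not establish why the incremental model behaves differently from the single-fit one. One thing that may be worth ruling out (an assumption, not something confirmed in this issue) is whether every batch contains every class, since each `fit` call only sees the labels in its own slice. The helper below is a hypothetical diagnostic, not part of XGBoost or ONNX Runtime; it reports which batches are missing which classes.

```python
from typing import Dict, Sequence, Set


def batches_missing_classes(labels: Sequence[int], batch_size: int) -> Dict[int, Set[int]]:
    """Map each batch start offset to the set of classes absent from that batch."""
    all_classes = set(labels)
    missing: Dict[int, Set[int]] = {}
    for start in range(0, len(labels), batch_size):
        seen = set(labels[start:start + batch_size])
        if seen != all_classes:
            missing[start] = all_classes - seen
    return missing


# Toy labels: the second batch of four never sees class 0.
print(batches_missing_classes([0, 1, 2, 0, 1, 1, 2, 2], batch_size=4))  # {4: {0}}
```

An empty result means every batch covers the full label set, in which case this particular explanation can be discarded.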

To reproduce

Steps are detailed above.

Urgency

No response

Platform

Mac

OS Version

MacOS Ventura

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@yf711 added the training label (issues related to ONNX Runtime training; typically submitted using template) on Dec 22, 2023
@kiransarv changed the title from "XGBoost issue with ONNX" to "XGBoost incremental training, issue with ONNX Conversion" on Jan 2, 2024
@baijumeswani
Contributor

@xadupre would you please help with this issue?

@xadupre
Member

xadupre commented Jan 4, 2024

This error means that a leaf returns a class index outside the expected number of classes. The attribute classlabels_int64s is probably shorter than max(class_ids), but I wonder why that would happen. I'll need to know the versions you used to train and convert the model (xgboost and onnxmltools).
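The condition described above can be checked directly on the converted file. The following is a minimal sketch, assuming the model contains a `TreeEnsembleClassifier` node with integer class labels; the function names and the file path in the comment are hypothetical, and `onnx` is imported lazily so the range check itself runs without it.

```python
from typing import Dict, List, Sequence


def read_tree_classifier_attrs(path: str) -> Dict[str, List[int]]:
    """Collect class_ids and classlabels_int64s from the first TreeEnsembleClassifier node."""
    import onnx  # lazy import: only needed when reading a real model file

    model = onnx.load(path)
    for node in model.graph.node:
        if node.op_type == "TreeEnsembleClassifier":
            wanted = ("class_ids", "classlabels_int64s")
            return {att.name: list(att.ints) for att in node.attribute if att.name in wanted}
    raise ValueError("no TreeEnsembleClassifier node found")


def leaf_class_ids_in_range(class_ids: Sequence[int], class_labels: Sequence[int]) -> bool:
    """True when every leaf's class index is a valid position in class_labels."""
    if not class_ids:
        return True
    return min(class_ids) >= 0 and max(class_ids) < len(class_labels)


# With a real file (hypothetical path):
#   attrs = read_tree_classifier_attrs("model.onnx")
#   leaf_class_ids_in_range(attrs["class_ids"], attrs["classlabels_int64s"])
print(leaf_class_ids_in_range([0, 1, 2, 0], [0, 1, 2]))  # True
print(leaf_class_ids_in_range([0, 1, 2, 0], [0, 1]))     # False
```

A `False` result would match the situation described in the comment above: some leaf points at a class index that the label list does not cover.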

@kiransarv
Author

XGBoost version 2.0.2
ONNX Version 1.16.1

@xadupre
Member

xadupre commented Jan 4, 2024

What about onnxmltools?

@kiransarv
Author

onnxmltools 1.11.2

@xadupre
Member

xadupre commented Jan 4, 2024

Is it possible to try with 1.12.0? We released it last month. It fixes some bugs with xgboost >= 2.0.

@kiransarv
Author

Sure, thanks.

@kiransarv
Author

I get the same error even after upgrading.

@xadupre
Member

xadupre commented Jan 8, 2024

Thanks for trying. I'll try to replicate your issue unless you already have a full script to share.

Contributor

github-actions bot commented Feb 8, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions bot added the stale label (issues that have not been addressed in a while; categorized by a bot) on Feb 8, 2024
@addisonklinke

the attribute classlabels_int64s probably shorter than max(class_ids)

Thanks for the tip, @xadupre. I've been trying to convert a PySpark XGBoost model, and because it doesn't have the .classes_ attribute from the sklearn implementation, I had to fill that attribute myself. Initially I had the column names hardcoded, and then realized I was fitting on one column and setting the attribute from another, which would indeed lead to len(classlabels_int64s) != max(class_ids).
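One way to avoid that kind of mismatch is to name the label column once and derive the class list from that same column. The sketch below uses plain dictionaries as a stand-in for DataFrame rows; it does not use the PySpark API, and the column names and helper are hypothetical.

```python
from typing import Dict, Iterable, List


def class_labels_from_rows(rows: Iterable[Dict[str, int]], label_col: str) -> List[int]:
    """Sorted distinct values of label_col, to use as the class-label attribute."""
    return sorted({row[label_col] for row in rows})


# Hypothetical data: "label" is the column the model was fitted on.
rows = [
    {"label": 0, "coarse_label": 0},
    {"label": 1, "coarse_label": 0},
    {"label": 2, "coarse_label": 1},
]
LABEL_COL = "label"  # defined once, reused for both fitting and the attribute

fitted_labels = class_labels_from_rows(rows, LABEL_COL)
wrong_labels = class_labels_from_rows(rows, "coarse_label")
print(fitted_labels)  # [0, 1, 2]
print(wrong_labels)   # [0, 1]
```

Setting the attribute from the second column would give a label list one entry shorter than the one the model was trained against.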
