
[Performance] Very slow load of ONNX model in Windows #22219

Open
dhatraknilam opened this issue Sep 25, 2024 · 3 comments
Labels
performance issues related to performance regressions platform:windows issues related to the Windows platform stale issues that have not been addressed in a while; categorized by a bot

Comments


dhatraknilam commented Sep 25, 2024

Describe the issue

I am trying to load XGBoost ONNX models with onnxruntime on a Windows machine.
The model file is 52 MB, yet loading it consumes 1378.9 MB of RAM and takes 15 minutes!
The slow load is observed only on Windows; on Linux the models load in a few seconds, although memory consumption is high there as well.

I tried the solution suggested in [https://github.com//issues/3802#issuecomment-624464802], but I get this error:
AttributeError: 'onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions' object attribute 'graph_optimization_level' is read-only

This is the simple code I used to load the model:

```python
sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])
```

To reproduce

Train an XGBoost classification model with the following params:

```python
# Classifier
from sklearn.pipeline import Pipeline
from sklearn.multioutput import MultiOutputClassifier
from xgboost import XGBClassifier
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost

# Register the XGBoost converter with skl2onnx.
update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputClassifier(XGBClassifier(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```

Train an XGBoost regressor model with the following params:

```python
# Regressor
from sklearn.pipeline import Pipeline
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_regressor_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost

# Register the XGBoost converter with skl2onnx.
update_registered_converter(
    XGBRegressor,
    "XGBoostXGBRegressor",
    calculate_linear_regressor_output_shapes,
    convert_xgboost,
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputRegressor(XGBRegressor(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
    options={type(pipe): {'zipmap': False}},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```

Load the model with the following code, then observe the load time and RAM usage:

```python
sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])
```
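To make the load-time measurement reproducible, it can be wrapped in a small stdlib-only helper (a sketch; `rt` and `modelSav_path` are as in the snippet above, and RAM would still be observed separately, e.g. via Task Manager):

```python
import time

def timed_load(loader):
    """Call `loader()` and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = loader()
    return result, time.perf_counter() - t0

# Usage, assuming onnxruntime is imported as rt and modelSav_path is set:
# sess, secs = timed_load(lambda: rt.InferenceSession(
#     modelSav_path, providers=["CPUExecutionProvider"]))
# print(f"load time: {secs:.1f}s")
```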

Urgency

This is a release-critical issue: we cannot ship these models with such slow load times. Although the models perform well, we are blocked by the loading-time problem. We also considered packaging the ML models with other libraries, but we lack the necessary compliance approvals, and we trust Microsoft.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@dhatraknilam dhatraknilam added the performance issues related to performance regressions label Sep 25, 2024
@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Sep 25, 2024
@dhatraknilam dhatraknilam changed the title [Performance] Very slow load of ONNX model in memory in Windows [Performance] Very slow load of ONNX model in Windows Sep 25, 2024
@xadupre
Member

xadupre commented Sep 27, 2024

This PR should solve this: #22043.

@dhatraknilam
Author

> This PR should solve this: #22043.

Thanks @xadupre for the prompt response; I will try it and update here.


This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Oct 31, 2024