
[Web] Light GBM .ort model multiple times larger than .onnx model #17691

Open
rstokes92 opened this issue Sep 25, 2023 · 3 comments

@rstokes92

Describe the issue

When I convert a LightGBM .onnx model with python -m onnxruntime.tools.convert_onnx_models_to_ort, the resulting .ort file is 2-3 times larger than the original .onnx file. I understand that the ORT format is primarily intended to enable a smaller runtime build, but I was surprised that the model size itself increased.

Is this expected in general, or could it be specific to LightGBM/tree-based models?

To reproduce

# The operator-level converter is what update_registered_converter expects;
# also importing the top-level onnxmltools.convert_lightgbm would shadow it.
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import (
    convert_lightgbm,
)

from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import (
    calculate_linear_classifier_output_shapes,
)

from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris


X, y = load_iris(return_X_y=True)

model = LGBMClassifier().fit(X, y)

# Register the LightGBM converter so skl2onnx knows how to handle LGBMClassifier
update_registered_converter(
    LGBMClassifier,
    "LightGbmLGBMClassifier",
    calculate_linear_classifier_output_shapes,
    convert_lightgbm,
    options={"nocl": [True, False], "zipmap": [True, False]},
)

dim = len(model.feature_name_)
initial_type = [("float_input", FloatTensorType([None, dim]))]
onnx_model = convert_sklearn(
    model,
    initial_types=initial_type,
    options={id(model): {"zipmap": False, "nocl": True}},
)

with open("lgbm_iris_test.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

Gives lgbm_iris_test.onnx with a size of 142.3 KB.

python -m onnxruntime.tools.convert_onnx_models_to_ort "lgbm_iris_test.onnx" --output_dir "lgbm_iris_ort"

Results in lgbm_iris_ort/lgbm_iris_test.ort with a size of 337 KB.
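
For reference, a quick way to confirm the sizes on disk (a small sketch using the paths produced by the repro above):

import os

# Paths as written by the steps above
onnx_size = os.path.getsize("lgbm_iris_test.onnx")
ort_size = os.path.getsize("lgbm_iris_ort/lgbm_iris_test.ort")

print(f".onnx: {onnx_size / 1024:.1f} KB")
print(f".ort:  {ort_size / 1024:.1f} KB")
print(f"ratio: {ort_size / onnx_size:.1f}x")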

onnx==1.14.1
onnxconverter-common==1.14.0
onnxmltools==1.11.2
onnxruntime==1.16.0
skl2onnx==1.15.0

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16

Execution Provider

Other / Unknown

@rstokes92 rstokes92 added the platform:web label Sep 25, 2023
@hariharans29
Member

hariharans29 commented Sep 25, 2023

CC: @skottmckay

Removing the platform:web tag as this is not really a Web issue. Not sure what the appropriate tag is (probably a new tools tag, since this is a question about the ONNX->ORT model conversion tool?).

@hariharans29 hariharans29 removed the platform:web label Sep 25, 2023
@skottmckay
Contributor

It's expected when the model has traditional ML operators, because we don't currently pack integers. For example, the ids in a TreeEnsembleClassifier node each take 64 bits in the ORT format model flatbuffer, but are packed into just the required bits when saved in an ONNX protobuf.

We've considered adding this, but there's never been a production use case that required it. Doing so would also mean you couldn't reference the data directly in the ORT format model flatbuffer, so it would potentially cost more memory at runtime.
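
To illustrate the packing difference described above (a minimal sketch, not ORT's actual serialization code): protobuf encodes integers as varints, using roughly one byte per 7 bits of payload, while a fixed-width int64 vector such as a flatbuffer's spends 8 bytes per element regardless of value.

def varint_size(value: int) -> int:
    # Bytes needed to encode a non-negative int as a protobuf varint:
    # 7 payload bits per byte, high bit used as a continuation flag.
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

# Node ids in a tree ensemble are typically small integers.
node_ids = list(range(1000))

varint_bytes = sum(varint_size(v) for v in node_ids)  # 1872 bytes (1-2 per id)
fixed_bytes = 8 * len(node_ids)                       # 8000 bytes (8 per id)
print(varint_bytes, fixed_bytes)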

@github-actions
Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale label Oct 28, 2023