
Error in ONNX conversion #563

Closed
fgrosa opened this issue Feb 1, 2022 · 9 comments

Comments

@fgrosa

fgrosa commented Feb 1, 2022

I tried to run the following script based on the example in the README.md changing backend from pytorch to onnx:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert, load

# Create some random data for binary classification
num_classes = 2
X = np.random.rand(1000, 4)
y = np.random.randint(num_classes, size=1000)

# Create and train a model (scikit-learn RandomForestClassifier in this case)
skl_model = RandomForestClassifier(n_estimators=10, max_depth=10)
skl_model.fit(X, y)

# Use Hummingbird to convert the model to ONNX
model = convert(skl_model, 'onnx', X[0:1])

# Run predictions on CPU
model.predict(X)

# Save the model
model.save('hb_model')

# Load the model back
model = load('hb_model')

Everything works as expected, but at the end of the execution I get the following error:

Fatal Python error: take_gil: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: finalizing (tstate=0x7fceb1409790)

Abort trap: 6

Is there something wrong in my installation or is something in the onnx conversion?

@ksaur
Collaborator

ksaur commented Feb 1, 2022

Hello and welcome! We have not seen that error before, and I'm pretty sure I personally have hit all the errors ONNX has to offer. :-D So I don't think it is specific to Hummingbird, but let's see.
What platform are you using? Does it happen to be related to rclpy/#805? Can you tell us a little more about your system and versions?

@fgrosa
Author

fgrosa commented Feb 1, 2022

Hi @ksaur thanks for the quick reply! Indeed it looks like the same error, not sure if it is related.
My platform is the following:

  • Operating System: macOS 11.6.1 (non-M1)
  • Installation type: pip
  • Version: hummingbird_ml[extra]>=0.4.2

@ksaur
Collaborator

ksaur commented Feb 1, 2022

Ok, and do I understand correctly that it only happens when you change the convert backend from torch to onnx? Is it possible for you to just use torch instead?

Also, does it still happen when you kill python, and THEN call load on the model? You are able to still run predictions?

@fgrosa
Author

fgrosa commented Feb 2, 2022

That is correct, it happens only with onnx. Unfortunately, I currently need to use onnx because I need to run the models from C++, and in our framework we use ONNX Runtime for this.

If I load the converted model and call predict on a test sample, I get the correct output. The error only appears when I call convert.

@ksaur
Collaborator

ksaur commented Feb 2, 2022

Hmm, ok. Can you share the full stack trace with us so we can see where the error is in convert and try to sort it out? Can you also share version information on your OS/GPU/Python/onnx libraries? I'm assuming you installed onnxruntime-gpu as opposed to regular onnxruntime?
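(For collecting the version information requested here, a small stdlib-only sketch could work; the package names below are assumptions about the usual PyPI names, so adjust them to your environment:)

```python
# Hypothetical helper to gather OS/Python/onnx version info for a bug report.
import platform
from importlib import metadata

def report_versions(packages=("onnx", "onnxruntime", "onnxmltools",
                              "skl2onnx", "hummingbird-ml")):
    # Start with interpreter and OS details.
    info = {"python": platform.python_version(), "os": platform.platform()}
    # Look up each package's installed version, if any.
    for pkg in packages:
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = "not installed"
    return info

for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Pasting that output into the issue gives the maintainers everything in one place.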

I am curious...does it work on your platform if you instead install onnxruntime and run on CPU? That would be a good baseline check to rule out any other issues.

We mostly test on Linux/Windows GPU, but do our best to also support Mac! We don't have access to a MacOS-enabled GPU in our pipeline.

I'm not sure if this will be helpful, but one idea could be to check your onnx[runtime,tools,...] versions. We had a bug (unrelated to this) which caused us to pin to this version of sklearn-onnx in our workflow. (This is likely not the issue, but I'm just trying to think of all possible ideas.) See also the versions in the most recent version of our workflow for macOS Python 3.7-3.9 for which onnx versions we expect to work, on CPU at least.
(Screenshot attached: Screen Shot 2022-02-02 at 9 17 30 AM)

Please let us know how it goes, and we will try to help you further troubleshoot!

@fgrosa
Author

fgrosa commented Feb 2, 2022

We are indeed working on CPU and therefore already using onnxruntime. In principle we needed older versions of onnx, onnxruntime, and onnxmltools (onnx==1.8.0, onnxruntime==1.7.0, and onnxmltools==1.7.0), but I also tried with the latest versions: with those I get no error, but the script remains stuck after the last line and does not exit. I also tried installing the sklearn-onnx version that you have in your workflow, but unfortunately it does not help. I got the same behaviour on Linux (CentOS7) as well. Since I don't get a full stack trace, is there a command you would suggest I use? Thanks!

@ksaur
Collaborator

ksaur commented Feb 2, 2022

If it is CPU-only, then I am surprised it doesn't work, because we have tested onnx+cpu+mac quite a bit! Are you running the exact code above, or with any modifications? What version of Python?

In the past, we've used this to force traceback:

import sys
import traceback

try:
    do_stuff()  # placeholder for the code that triggers the error
except Exception:
    print(traceback.format_exc())
    # or, alternatively:
    print(sys.exc_info()[2])
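(Since `Fatal Python error: take_gil` is a C-level abort during interpreter finalization, it bypasses Python's exception machinery and a try/except may never fire; the stdlib `faulthandler` module can dump the C-level stack of every thread when such a fatal error occurs. A minimal sketch, enabled at the top of the script:)

```python
# Sketch: faulthandler prints a low-level traceback for fatal errors
# (like the GIL abort above) that ordinary try/except cannot catch.
import faulthandler
import sys

faulthandler.enable(file=sys.stderr, all_threads=True)

# ... run the conversion script here; on a fatal error the interpreter
# dumps the stack of every thread to stderr before aborting.
```

This often pinpoints which extension library the crash originates from.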

For CentOS7, I would expect it to work; we've tested on other Fedora-based distros but not that one specifically. Does it also hang there, or does it print the GIL error? Is this in a VM on the same machine as your Mac, or on a separate machine?

@ksaur
Collaborator

ksaur commented Feb 3, 2022

I tried just now on CentOS7 with the above code and the older onnx versions and wasn't able to reproduce the error; it worked as expected, so I'm not sure.

@ksaur
Collaborator

ksaur commented Mar 10, 2022

Hi @fgrosa I'm going to close this since I couldn't reproduce it. If you are still stuck, please reopen or file another issue! Thanks!

@ksaur ksaur closed this as completed Mar 10, 2022