
Export to ONNX and use ONNX Runtime, working. Guide. #746

Open
Kromtar opened this issue Jun 5, 2022 · 41 comments

Kromtar commented Jun 5, 2022

This is an explanation of how to export the recognition and detection models to ONNX format, followed by a brief explanation of how to run these models with ONNX Runtime.

ONNX is an interoperability standard for AI models. It allows the same model to be used across different programming languages, operating systems, acceleration platforms and runtimes. Personally, I need to make a C++ build of EasyOCR's functionality. After failing, for several reasons, to make a C++ build using PyTorch and the EasyOCR models, I found that the best solution is to convert the models to ONNX and then program in C++ against ONNX Runtime. Compiling is then very easy compared to PyTorch.

Due to time constraints I am not presenting a PR, so you will need to modify a local copy of EasyOCR.

Requirements

We must install the onnx and onnxruntime modules. In my case I also had to manually install version 3.20 of the protobuf module.

I am using:

  • EasyOCR 1.5.0
  • Python 3.9.9
  • torch 1.10.1
  • torchvision 0.11.2
  • onnx 1.11.0
  • onnxruntime 1.11.1
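
For reference, the installs can be done with pip, pinning the versions listed above (a sketch; protobuf pinned as noted):

pip install onnx==1.11.0 onnxruntime==1.11.1 protobuf==3.20.0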

Exporting ONNX models

The best place to modify the EasyOCR code to export the models is right after EasyOCR uses the loaded model to perform the prediction.

Exporting the detection model

In easyocr/detection.py after y, feature = net(x) (line 46) add:

    batch_size_1 = 500
    batch_size_2 = 500
    in_shape = [1, 3, batch_size_1, batch_size_2]  # NCHW: batch, channels, height, width
    dummy_input = torch.rand(in_shape)
    dummy_input = dummy_input.to(device)

    torch.onnx.export(
        net.module,
        dummy_input,
        "detectionModel.onnx",
        export_params=True,
        opset_version=11,
        input_names=['input'],
        output_names=['output'],
        dynamic_axes={'input': {2: 'batch_size_1', 3: 'batch_size_2'}},
    )

We generate a dummy, fully random input so that ONNX can perform the export. The values do not matter; what matters is that the input has the correct structure. The detection model takes a 4-dimensional tensor, where the first dimension is always 1, the second is always 3, and the third and fourth depend on the resolution of the analyzed image. I reached this conclusion by analyzing the data flow; I may be wrong, and this may need to be corrected.

Note that we export the weights (export_params=True) and declare the two final dimensions of the input tensor as dynamic in size (dynamic_axes=...).

Then we can add this code to immediately import the exported model and validate that it is not corrupted:

onnx_model = onnx.load("detectionModel.onnx")
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print('The model is invalid: %s' % e)
else:
    print('The model is valid!')

Remember to import onnx in the file header.

To run the export, just use EasyOCR to perform an analysis on any image, indicating the language to be detected. This will download the corresponding model, run the detection and export the model at the same time. If we change the language, we will have to export a new model. Once the model is exported, we can comment out or delete the added code.
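
A minimal run that triggers the export might look like this (the image path and language here are placeholders; any image and language will do):

import easyocr

# Creating the reader downloads the detection model on first use;
# readtext() runs detection, which executes the export code added above.
reader = easyocr.Reader(['en'])
reader.readtext('some_image.jpg')  # hypothetical image path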

Exporting the recognition model

This model is a bit more difficult to export and we will have to do some black magic.

In easyocr/recognition.py after preds = model(image, text_for_pred) (line 111) add:

    batch_size_1_1 = 500
    in_shape_1 = [1, 1, 64, batch_size_1_1]  # image input: batch, channels, height, width
    dummy_input_1 = torch.rand(in_shape_1)
    dummy_input_1 = dummy_input_1.to(device)

    batch_size_2_1 = 50
    in_shape_2 = [1, batch_size_2_1]  # text input
    dummy_input_2 = torch.rand(in_shape_2)
    dummy_input_2 = dummy_input_2.to(device)

    dummy_input = (dummy_input_1, dummy_input_2)

    torch.onnx.export(
        model.module,
        dummy_input,
        "recognitionModel.onnx",
        export_params=True,
        opset_version=11,
        input_names=['input1', 'input2'],
        output_names=['output'],
        dynamic_axes={'input1': {3: 'batch_size_1_1'}},
    )

As with the detection model, we create a dummy input to be able to export the model. In this case, the model input consists of 2 elements.

The first element is a 4-dimensional tensor, where the first dimension always has a value of 1, the second a value of 1, the third a value of 64 and the fourth a dynamic value.

The second element is a 2-dimensional tensor, where the first dimension always has a value of 1 and the second a dynamic value.

Again, I may be wrong about the structure of these inputs; this is what I observed empirically.

First strange thing: for some reason, when ONNX analyzes the model structure it concludes that the second input element serves no function. So even if we tell ONNX to export a model with 2 input elements, it will always export a model with 1 input element. This appears to be due to an internal ONNX process that "cuts" parts of the graph defining the network that do not alter its output. According to the documentation, we can stop this "cutting" and export the network without optimization by passing the do_constant_folding=False option, but due to a bug it is not taking effect. Despite the above, we can observe that the absence of the second element does not reduce the accuracy of the model. For this reason, in the dynamic elements (dynamic_axes=) we only declare one input element, whose fourth dimension is variable in size. If anyone manages to export the model with both input elements, it would be appreciated if you could let us know.
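
For completeness, this is how the option mentioned above would be passed (a sketch; as noted, it currently has no effect because of the bug):

torch.onnx.export(
    model.module,
    dummy_input,
    "recognitionModel.onnx",
    export_params=True,
    opset_version=11,
    do_constant_folding=False,  # documented switch to disable folding; ineffective here due to the bug
    input_names=['input1', 'input2'],
    output_names=['output'],
    dynamic_axes={'input1': {3: 'batch_size_1_1'}},
)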

Second strange thing: in order to export the recognition model, we must edit easyocr/model/vgg_model.py. It turns out that the AdaptiveAvgPool2d operator is not fully supported by ONNX: when the configuration tuple contains the "None" option (which indicates that the output size must equal the input size), the export fails. To fix this we need to change line 11:

From
self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((None, 1))
to
self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((256, 1))

Why 256? I don't know. Is there a better option? I have not found one. Does it generate errors in the model? I have not been able to find any accuracy problems. If someone can explain why with 256 it works and what the consequences are, it would be appreciated.
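
A plausible explanation, pointed out later in this thread, is that 256 matches the recognition network's configured channel sizes; the generation-2 models use these network parameters:

network_params = {
    'input_channel': 1,
    'output_channel': 256,
    'hidden_size': 256
}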

Well then, just like the detection model we can add these lines to validate the exported model:

onnx_model = onnx.load("recognitionModel.onnx")
try:
    onnx.checker.check_model(onnx_model)
except onnx.checker.ValidationError as e:
    print('The model is invalid: %s' % e)
else:
    print('The model is valid!')

Remember to import onnx in the file header.

To export the recognition model we must run EasyOCR on any image with the desired language. You will see some warnings during the process, but you can ignore them. The model will be exported several times, since the added code sits inside a for loop, but this should not cause any problems. Remember to comment out or remove the added code afterwards. If you change language, you must export a new ONNX model.

Using ONNX models in EasyOCR

To test and validate that the models work, we will modify the code again. This time we will comment out the lines where EasyOCR performs the PyTorch prediction and add code that uses ONNX Runtime to perform the prediction instead.

Using the ONNX detection model

First we must add this helper function to the file easyocr/detection.py:

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

Then we must comment out line 46, where it says y, feature = net(x). After this line we must add:

ort_session = onnxruntime.InferenceSession("detectionModel.onnx")
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)
y = ort_outs[0]

Remember to import onnxruntime in the file header.

In this way we load the detection ONNX model and pass the value "x" as input. Since ONNX Runtime does not use PyTorch, we must convert "x" from a tensor to a standard numpy array; for that we use the helper function above. The ONNX output is stored in the "y" variable.

One last modification must be made on lines 51 and 52. Change from:

score_text = out[:, :, 0].cpu().data.numpy()
score_link = out[:, :, 1].cpu().data.numpy()

to

score_text = out[:, :, 0]
score_link = out[:, :, 1]

This is because the model output is already a numpy array and does not need to be converted from a Tensor.

To test, we can run EasyOCR with some image and see the result.

Using the ONNX recognition model

We must add the same helper function to the file easyocr/recognition.py:

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

Then we must comment out line 111, preds = model(image, text_for_pred), to stop using the PyTorch prediction, and right after it add:

ort_session = onnxruntime.InferenceSession("recognitionModel.onnx")
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(image)}
ort_outs = ort_session.run(None, ort_inputs)
preds = torch.from_numpy(ort_outs[0])

Remember to import onnxruntime in the file header.

Note that we are passing only one input entity, although in theory this model is supposed to receive two. As with the detection model, the input must be converted from a tensor to a numpy array. We convert the output from an array back to a tensor so that the data flow continues normally.

To test, we can run EasyOCR with some image and see the result.

Others

We can use this function to compare the output of the PyTorch model and the ONNX model to quantify the difference:

np.testing.assert_allclose(to_numpy(<PYTORCH_PREDICTION>), <ONNX_PREDICTION>, rtol=1e-03, atol=1e-05)

In my tests, the difference between the detection models' outputs is minimal and the test passes.

For the recognition models the difference is slightly larger and the test fails, but only by very little, and I have not observed failures in the actual recognition of characters. I don't know whether this is due to ONNX not detecting the two input entities, the AdaptiveAvgPool2d issue, or just the natural error of model export and decimal approximation.

Final note

I hope this will be of help in continuing the development of this excellent tool, and that people with expertise in EasyOCR and PyTorch can review it and find answers to the questions raised.

@AutumnSun1996

Looks great.

The pytorch library will take nearly 3GB, way too large to publish.

I managed to get a simple working demo with onnxruntime, and the folder generated by pyinstaller is 376MB (167MB zipped).

Is there anyone else interested in implementing a runtime version of easyocr, with no torch dependency?

@Kromtar

Kromtar commented Jun 7, 2022

@AutumnSun1996 I'm working on it in my spare time.

I have doubts about whether the PyTorch-to-ONNX conversion process I described in the guide reduces EasyOCR's accuracy and/or performance. Could you confirm whether your ONNX implementation shows the same accuracy and performance as PyTorch? Have you tested the performance of ONNX Runtime using CUDA?

@AutumnSun1996

Found similar behavior for the exported model: there is a diff in the recognition model output, but the final text is the same.
I did not test the performance, since my goal is to minimize package size. Maybe I can do some simple checks when I get some spare time.

@Itaybre

Itaybre commented Jun 24, 2022

@Kromtar can I ask you which version of PyTorch you have?
I cannot export the Recognition Model, I get this error:

Traceback (most recent call last):
  File "/Users/itaybrenner/tesis/OnDevice/OnDevice/read_ocr.py", line 39, in <module>
    scan3 = detector.read_easyocr(image)
  File "/Users/itaybrenner/tesis/OnDevice/OnDevice/MachineLearning/LicenseReader.py", line 48, in read_easyocr
    result = self.easyocr.readtext(optimized_image, allowlist='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/easyocr.py", line 400, in readtext
    result = self.recognize(img_cv_grey, horizontal_list, free_list,\
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/easyocr.py", line 330, in recognize
    result0 = get_text(self.character, imgH, int(max_width), self.recognizer, self.converter, image_list,\
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/recognition.py", line 246, in get_text
    result1 = recognizer_predict(recognizer, converter, test_loader,batch_max_length,\
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/easyocr/recognition.py", line 128, in recognizer_predict
    torch.onnx.export(
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 493, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 385, in _trace_and_get_graph_from_model
    orig_state_dict_keys = _unique_state_dict(model).keys()
  File "/Users/itaybrenner/tesis/venv/lib/python3.9/site-packages/torch/jit/_trace.py", line 71, in _unique_state_dict
    filtered_dict[k] = v.detach()
AttributeError: __torch__.torch.classes.rnn.CellParamsBase (of Python compilation unit at: 0x0) does not have a field with name 'detach'

@Kromtar

Kromtar commented Jun 24, 2022

@Itaybre

python 3.9.9
torch 1.10.1
torchvision 0.11.2
onnx 1.11.0
onnxruntime 1.11.1

These are the versions of the main packages I have installed.

@dovanhuong

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error: (same traceback as above)

hello, I'm having the same problems, did you solve this?

@Itaybre

Itaybre commented Jul 5, 2022

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error: (same traceback as above)

hello, I'm having the same problems, did you solve this?

Unfortunately no, I ended up exporting the default model with an older version of PyTorch to make it work in my environment.

@MaxAntson

Getting the same error here, with the same versions installed as specified by @Kromtar (thanks for your hard work on this, by the way; it's awesome that you managed to do this, and hopefully more of us will follow!). If anybody has a solution it'd be great to hear. I'll keep exploring as well and will post if I find something.

Also, just another note: I got an error prior to this and had to change model.module to model when running torch.onnx.export() - this worked to export the detector.

@Kromtar can I ask you which version of PyTorch you have? I cannot export the Recognition Model, I get this error: (same traceback as above)

hello, I'm having the same problems, did you solve this?

@skottmckay

    dynamic_axes={'input' : {2 : 'batch_size_1', 3: 'batch_size_2'}},

Technically it would be clearer to call these 'height' and 'width' respectively. The first dimension is the batch size (1), the second the number of channels (3), the third the height and fourth the width as the input is in NCHW format.

FWIW if your input during inferencing will have a fixed height and width, there are potential performance benefits from leaving those dimensions as fixed sizes instead of using dynamic_axes for them. e.g. constant folding may be able to pre-calculate some values during model loading instead of during every inference.

@Kromtar

Kromtar commented Jul 11, 2022

Hi everyone, thanks for the feedback. I have been incredibly busy the last 3 weeks. I finally have some free time.

I will look into the bug again to find a solution, since in my environment it is working perfectly.

Best regards.

@bit-scientist

Hi, @Kromtar. What is the next step for testing with real images on mobile phones? I am sorry if it's too obvious, but I am not familiar with ONNX at all and need to know what to do to test images using Android/iOS phones. Is there any environment where you connect a phone with EasyOCR? Thank you.

Also, since I am into it anyway, I'd love to learn more about every single step needed to make any ML/DL model work on mobile phones. I'd appreciate it if you could share materials (posts, projects, etc.) related to this task. Thanks once again.

@Kromtar

Kromtar commented Jul 16, 2022

Okay guys
I found the source of the error mentioned by @Itaybre.

The problem is that torch.onnx.export, in the case of the recognition model, only works when EasyOCR is running in GPU mode (i.e. using CUDA cores). Apparently this is due to how ONNX parses and exports one very specific network layer, used only by the recognition model. I have not been able to find a solution that does not involve making substantial changes to Torch.

My recommendation is to follow the guide, but make sure EasyOCR is running in GPU mode. For this you need an NVIDIA graphics card with CUDA and the corresponding drivers installed.

Soon I will publish a container with everything previously configured.

...I tried my best to make the export work without having to have a CUDA enabled card... but I didn't succeed, I'm sorry ;(
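
In practice that means constructing the reader with the GPU enabled before triggering the export (a sketch; it assumes a CUDA-enabled PyTorch build, and the image path is a placeholder):

import easyocr

# gpu=True keeps the models on CUDA, which the recognition export needs per the above
reader = easyocr.Reader(['en'], gpu=True)
reader.readtext('some_image.jpg')  # running recognition triggers the export code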

@Kromtar

Kromtar commented Jul 17, 2022

I have created a new issue where I have made available the ONNX version of the EasyOCR models for all languages.
Feel free to download and use them.

@Kromtar

Kromtar commented Jul 17, 2022

I have published a branch in this fork where you can find the whole process using containers. You can see the readme to understand how to use it.

@long-senpai

@Kromtar Thank you for your conversion code. I have a dumb question: in your code for converting the recognition model to ONNX format we have 2 inputs, but when I use the Netron app to preview the ONNX file I cannot find input_2 in the converted model. I also observed this in your converted models.
(screenshot)

@Kromtar

Kromtar commented Aug 10, 2022

@long-senpai I don't really know why this happens in the conversion process. I think that ONNX, when optimizing the model, discovers that the weights provided by input 2 are unnecessary, so it deletes them.

The last few weeks I have been working on comparing performances between the models before and after converting.

What I can confirm is that, regardless of input 2, the output of the converted model is the same as the output of the original model. So don't worry, there is no loss of performance.

Again, the origin of why ONNX does that, I don't know.

@Phelan164

Phelan164 commented Aug 28, 2022

@Kromtar does the detection model in ONNX format support running with batches?

@Kromtar I tried exporting with a dynamic batch axis and it worked.

@nissansz

nissansz commented Jan 15, 2023

Do you have exported ONNX models for generation2 available for direct download? Can these ONNX files be used directly, without EasyOCR?
Is any dict file needed?

@A2va

A2va commented Jan 17, 2023

My recommendation is to follow the guide, but make sure EasyOCR is running in GPU mode. For this we will be required to have an NVIDIA graphics card with CUDA and the corresponding drivers installed.

I found out how to export the recognition model to ONNX with CPU only. It requires removing quantization, so you need to comment out these lines:

if quantize:
    try:
        torch.quantization.quantize_dynamic(model, dtype=torch.qint8, inplace=True)
    except:
        pass
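
Equivalently, when loading a model directly (as in the snippets later in this thread), quantization can be skipped via the quantize flag instead of editing the source; a sketch using the detector loader:

from easyocr import detection

# quantize=False skips the dynamic-quantization step that breaks the CPU export
model = detection.get_detector(trained_model='craft_mlt_25k.pth',
                               device='cpu', quantize=False)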

@samiechan

Having trouble exporting the model:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/easyocr/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 968, in symbolic_fn
    output_size = _parse_arg(output_size, "is")
  File "/home/user/anaconda3/envs/easyocr/lib/python3.9/site-packages/torch/onnx/symbolic_helper.py", line 83, in _parse_arg
    raise RuntimeError("Failed to export an ONNX attribute '" + v.node().kind() +
RuntimeError: Failed to export an ONNX attribute 'onnx::Gather', since it's not constant, please try to make things (e.g., kernel size) static if possible

@samiechan

samiechan commented Feb 7, 2023

Nvm, I needed to update nn.AdaptiveAvgPool2d in my custom_model.py

@samiechan

samiechan commented Feb 22, 2023

Here is another approach to export the CRAFT (detection) model to ONNX format on Windows using WSL2:

  1. Install conda on WSL2 Ubuntu 20.04
  2. Install the necessary libraries and create a new environment:
conda create -n easyocr
conda activate easyocr
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install onnxruntime-gpu
pip install easyocr
  3. Download the CRAFT model file from the EasyOCR release page:
wget https://github.com/JaidedAI/EasyOCR/releases/download/pre-v1.1.6/craft_mlt_25k.zip
unzip craft_mlt_25k.zip
  4. Load the CRAFT model and export it to ONNX format:
from easyocr import detection
import torch

# load model using CPU - default
model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cpu', quantize=False)
# load model using GPU - for custom models if trained on gpu
# model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cuda:0', quantize=False)
dummy_input = torch.randn(1, 3, 384, 512)
torch.onnx.export(model, dummy_input, "craft.onnx")
  5. Perform text detection with the ONNX model using ONNX Runtime (also see clovaai/CRAFT-pytorch#4, "Export to ONNX"):
import torch  # needed for torch.from_numpy below
import onnxruntime as rt
import cv2
import numpy as np
from easyocr.craft_utils import getDetBoxes, adjustResultCoordinates
from easyocr.imgproc import resize_aspect_ratio, normalizeMeanVariance
from easyocr.utils import reformat_input

# Read input image
img, _ = reformat_input('https://jeroen.github.io/images/testocr.png')

# Resize and normalize input image
img_resized, target_ratio, size_heatmap = resize_aspect_ratio(img, 512, interpolation=cv2.INTER_LINEAR, mag_ratio=1.)
ratio_h = ratio_w = 1 / target_ratio
x = normalizeMeanVariance(img_resized)
x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)

# Create ONNX Runtime session and load model
providers = ['CPUExecutionProvider']
session = rt.InferenceSession("craft.onnx", providers=providers)
input_name = session.get_inputs()[0].name

# Prepare input tensor for inference
inp = {input_name: x.numpy()}

# Run inference and get output
y, _ = session.run(None, inp)

# Extract score and link maps
score_text = y[0, :, :, 0]
score_link = y[0, :, :, 1]

# Post-processing to obtain bounding boxes and polygons
boxes, polys, mapper = getDetBoxes(score_text, score_link, 0.5, 0.4, 0.4)
boxes = adjustResultCoordinates(boxes, ratio_w, ratio_h)
polys = adjustResultCoordinates(polys, ratio_w, ratio_h)

You can use the craft-text-detector export_detected_regions function to export bounding boxes as cropped images (there are issues with polygons: you need to pass poly=True to getDetBoxes to get values).

@samiechan

samiechan commented Feb 23, 2023

@Kromtar

Why 256? I don't know. Is there a better option? I have not found one. Does it generate errors in the model? I have not been able to find any accuracy problems. If someone can explain why with 256 it works and what the consequences are, it would be appreciated.

It is related to network parameters, I believe.

network_params = {
    'input_channel': 1,
    'output_channel': 256,
    'hidden_size': 256
    }

You can find it here.

@samiechan

Here is an approach for the recognition model:

  1. Download the desired recognition language model. Example:
    wget https://github.com/JaidedAI/EasyOCR/releases/download/v1.6.1/cyrillic_g2.zip
  2. Provide language model configuration:
from easyocr import recognition
#import yaml
import os

recog_network = 'generation2'

# for custom model
#with open(recog_network + '.yaml', encoding='utf8') as file:
  #recog_config = yaml.load(file, Loader=yaml.FullLoader)

#network_params = recog_config['network_params']

network_params = {
    'input_channel': 1,
    'output_channel': 256,
    'hidden_size': 256
    }

# for custom model
#character = recog_config['character_list']


# see https://github.com/JaidedAI/EasyOCR/blob/ca9f9b0ac081f2874a603a5614ddaf9de40ac339/easyocr/config.py for other language config examples
character = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ €₽ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюяЂђЃѓЄєІіЇїЈјЉљЊњЋћЌќЎўЏџҐґҒғҚқҮүҲҳҶҷӀӏӢӣӨөӮӯ'
symbol = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ €₽'
model_path = "cyrillic_g2.pth"
separator_list = {}
cyrillic_lang_list = ['ru','rs_cyrillic','be','bg','uk','mn','abq','ady','kbd',\
                      'ava','dar','inh','che','lbe','lez','tab','tjk', 'en']
package_dir = os.path.dirname(recognition.__file__)

dict_list = {}
for lang in cyrillic_lang_list:
    dict_list[lang] = os.path.join(package_dir, 'dict', lang + ".txt")

model, converter = recognition.get_recognizer(recog_network=recog_network, network_params=network_params, character=character, separator_list=separator_list, dict_list=dict_list, model_path=model_path, device='cpu', quantize=False)
  3. Export the recognition model to ONNX:
import torch
import torchvision.transforms as transforms

# Define the dimensions of the input image
batch_size = 1
num_channels = 1
image_height = imgH = 64
image_width = 128
device = 'cpu'

# Create dummy input tensors for the image and text inputs
dummy_input = torch.randn(batch_size, num_channels, image_height, image_width)

# Define the maximum length of the text input
max_text_length = 10

dummy_text_input = torch.LongTensor(max_text_length, batch_size).random_(0, 10)

# Convert the input image to grayscale
grayscale_transform = transforms.Grayscale(num_output_channels=1)
grayscale_input = grayscale_transform(dummy_input)

input_names = ["image_input", "text_input"]
output_names = ["output"]
dynamic_axes = {"image_input": {0: "batch_size"}, "text_input": {1: "batch_size"}}
opset_version = 12

torch.onnx.export(model, (grayscale_input, dummy_text_input), "recog.onnx", 
                  input_names=input_names, output_names=output_names, 
                  dynamic_axes=dynamic_axes, opset_version=opset_version)
  4. Modify recognizer_predict to run model inference using ONNX Runtime:
    Replace preds = model(image, text_for_pred) with
providers = ['CPUExecutionProvider']
session = rt.InferenceSession("recog.onnx", providers=providers)
inputs = session.get_inputs()
inp = {inputs[0].name: image.numpy()}
preds = session.run(None, inp)
preds = torch.from_numpy(preds[0])

Here is an example of how to run it with one cropped image from the detection model:

import os
import torch
import onnxruntime as rt
import numpy as np
from easyocr.utils import reformat_input, get_image_list
from easyocr.recognition import get_text  # assumption: the (modified) get_text from the steps above

# read image
img, img_cv_grey = reformat_input('/content/outputs/image_crops/crop_0.png')

y_max, x_max = img_cv_grey.shape

horizontal_list = [[0, x_max, 0, y_max]]

lang = 'ru'  # assumption: pick any one entry from cyrillic_lang_list above
lang_char = []
char_file = os.path.join(package_dir, 'character', lang + "_char.txt")
with open(char_file, "r", encoding = "utf-8-sig") as input_file:
  char_list =  input_file.read().splitlines()
lang_char += char_list
lang_char = set(lang_char).union(set(symbol))

ignore_char = ''.join(set(character)-set(lang_char))

result = []

for bbox in horizontal_list:
    h_list = [bbox]
    f_list = []
    image_list, max_width = get_image_list(h_list, f_list, img_cv_grey, model_height=64) # 64 is default value
    result0 = get_text(character, imgH, int(max_width), converter, image_list,\
                              ignore_char, 'greedy', beamWidth = 5, batch_size = batch_size, contrast_ths = 0.1, adjust_contrast = 0.5, filter_ths = 0.003,\
                              workers = 0, device = device)
    result += result0

Also for custom models you can use GPU:
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if rt.get_device()=='GPU' else ['CPUExecutionProvider']

Also yes, the ONNX model has only 1 input.

@light42

light42 commented Feb 27, 2023

@samiechan The ONNX image_width is not dynamic; if our input image width exceeds it, we get an error. Is there a way to make it dynamic?

@samiechan

@samiechan The ONNX image_width is not dynamic; if our input image width exceeds it, we get an error. Is there a way to make it dynamic?

For detection model:

from easyocr import detection
import torch

model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cpu', quantize=False)

input_shape = (1, 3, 480, 640)
inputs = torch.ones(*input_shape)
input_names=['input']
output_names=['output']

dynamic_axes= {'input':{0:'batch_size', 2:'height', 3:'width'}, 'output':{0:'batch_size', 2:'height', 3:'width'}} #adding names for better debugging
torch.onnx.export(model, inputs, "craft.onnx", dynamic_axes=dynamic_axes, input_names=input_names, output_names=output_names)

For recognition model:

import torch
import torchvision.transforms as transforms

# Define the dimensions of the input image
batch_size = 1
num_channels = 1
image_height = imgH = 64
image_width = 128

image_input_shape = (batch_size, 1, image_height, image_width)
image_input = torch.ones(*image_input_shape)

max_text_length = 10
text_input_shape = (batch_size, max_text_length)
text_input = torch.ones(*text_input_shape)

input_names=['image_input', 'text_input']
output_names=['output']

dynamic_axes = {"image_input": {0: "batch_size", 3: "width"}, "text_input": {0: "batch_size"}}
opset_version = 12

torch.onnx.export(model, (image_input, text_input), "recog.onnx", 
                  input_names=input_names, output_names=output_names, 
                  dynamic_axes=dynamic_axes, opset_version=opset_version)

@skottmckay

skottmckay commented Feb 27, 2023

It's important to note that there are two aspects here. One is whether the model inputs have fixed or dynamic sizes, and the other is whether the model itself supports dynamic sizes. e.g. if the model was trained with input of 64 x 128 and does not internally resize input, you will most likely get an error about sizes mismatching as the nodes and weights in the model will be expecting sizes relative to 64 x 128.

Basically the model inputs being dynamic allows any value to be specified but that does not mean the model itself supports any value.

If you are pre-processing the image prior to running the original pytorch model (e.g. resize, crop, normalize) you need to do the same things prior to running the exported ONNX model.

We have some new helpers that are about to be released and may be useful here. They allow adding these common pre-processing steps into the ONNX model so that onnxruntime can perform them. The latest onnxruntime (1.14) also includes the updated ONNX Resize operator that supports anti-aliasing, providing equivalency with the typical Pillow-based image resizing used by PyTorch.

Overview documentation: https://github.com/microsoft/onnxruntime-extensions/blob/main/onnxruntime_extensions/tools/Example%20usage%20of%20the%20PrePostProcessor.md

There are some example implementations, including one showing what the pre-processing pipeline would look like for a model with the common pytorch image pre-processing here: https://github.com/microsoft/onnxruntime-extensions/blob/7578af836146b015bbd7a8539f3288cc539660ad/onnxruntime_extensions/tools/add_pre_post_processing_to_model.py#L23

It's also possible to do image conversion from png or jpg as part of the pre-processing, although that requires the onnxruntime-extensions library to be available at runtime as it uses a custom operator (i.e. not an operator defined in the ONNX spec). We have prebuilt android and ios packages for onnxruntime-extensions in this first release of the new tools, so you'd have to build it yourself for other platforms.

@kadmor

kadmor commented Mar 1, 2023

Here is an approach for the recognition model: (samiechan's full comment quoted above)

@samiechan Hello. Can you please tell me the versions of your libraries? Your code gave me an error (pytorch 1.9.1):
RuntimeError Traceback (most recent call last)
...
RuntimeError: Unsupported: ONNX export of operator adaptive pooling, since output_size is not constant.. Please feel free to request support or submit a pull request on PyTorch GitHub.

@samiechan

(recognition export code from the comment above, quoted again)

Here is my Google Colab: https://colab.research.google.com/drive/1pcoueUxhWFX5Ac6AA4paYDLgZMf819GT?usp=sharing

@kadmor

kadmor commented Mar 3, 2023

@samiechan Thank you very much!

@zoldaten

zoldaten commented Mar 4, 2023

I have published a branch in this fork where you can find the whole process using containers. You can see the readme to understand how to use it.

hi! @Kromtar
I tried to build the Docker container but it failed with a missing file:

Step 4/11 : RUN curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | POETRY_HOME=/opt/poetry python &&   cd /usr/local/bin &&   ln -s /opt/poetry/bin/poetry &&   poetry config virtualenvs.create false
 ---> Running in 9eb14c51d577
  File "<stdin>", line 1
    404: Not Found
    ^
SyntaxError: illegal target for annotation
ERROR: Service 'easyocr_hybrid_onnx' failed to build: The command '/bin/sh -c curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | POETRY_HOME=/opt/poetry python &&   cd /usr/local/bin &&   ln -s /opt/poetry/bin/poetry &&   poetry config virtualenvs.create false' returned a non-zero code: 1

I also tried to start it up with the modifications you made, without using Docker, but failed with almost the same error:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: input1 for the following indices
 index: 1 Got: 3 Expected: 1
 index: 2 Got: 96 Expected: 64
 Please fix either the inputs or the model.

@zoldaten

zoldaten commented Mar 4, 2023

(samiechan's recognition export code and Colab link quoted above)

@samiechan hi!
Do you have any inference time and correctness checks compared with the standard EasyOCR models?

@samiechan

It's important to note that there are two aspects here. (skottmckay's full comment quoted above)

I agree. It should be noted somewhere that the output from the detection model gives us bounding boxes with a height of 64 px, while the recognition model takes images with a height of 32 px. To support dynamic shapes we would need to modify the VGG model, which is used by default for gen2 models, or resize images before feeding them into the model. I came across one Google Colab which makes it clear how Deep Text Recognition Benchmark models work. A transformation could be added to the VGG model to support dynamic input sizes (but the model would have to be retrained).

@samiechan

(samiechan's recognition export code quoted above)

@samiechan hi! Do you have any inference time and correctness checks compared with the standard EasyOCR models?

I will try to do that once I finish experimenting with the recognition model.

@samiechan

Also, the reason why the second parameter (text) is missing from the ONNX model might be that it is not used anywhere in the VGG model.
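
One quick way to check is to list the exported model's inputs (a sketch, assuming the recog.onnx file produced by the snippets above):

import onnxruntime as rt

sess = rt.InferenceSession("recog.onnx", providers=['CPUExecutionProvider'])
# If the text input was pruned during export, only 'image_input' will be listed here.
print([i.name for i in sess.get_inputs()])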

@zoldaten

zoldaten commented Mar 6, 2023

@samiechan I went through your Colab and saw that you export CRAFT to ONNX at the beginning.
As your image is 640x480, I changed it to my size, 1920x1080:

from easyocr import detection
import torch
#https://drive.google.com/open?id=1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ

model = detection.get_detector(trained_model='craft_mlt_25k.pth', device='cpu', quantize=False)

#input_shape = (1, 3, 480, 640)
input_shape = (1, 3, 1080, 1920)
inputs = torch.ones(*input_shape)
input_names=['input']
output_names=['output']

dynamic_axes= {'input':{0:'batch_size', 2:'height', 3:'width'}, 'output':{0:'batch_size', 2:'height', 3:'width'}} #adding names for better debugging
torch.onnx.export(model, inputs, "craft.onnx", dynamic_axes=dynamic_axes, input_names=input_names, output_names=output_names)

The conversion was OK, but I don't think the model works well.
Try this image: https://drive.google.com/file/d/1pAwYCMxqk7I4H7uXr13XpwimoA2FZRF6/view?usp=sharing

and here is your code, a bit modified:

import onnxruntime as rt
import cv2
import numpy as np
from easyocr.craft_utils import getDetBoxes, adjustResultCoordinates
from easyocr.imgproc import resize_aspect_ratio, normalizeMeanVariance
from easyocr.utils import reformat_input
import torch
from functools import wraps
import time

def craft_onnx(): 
    # Read input image
    #img, _ = reformat_input('https://jeroen.github.io/images/testocr.png')
    #img, _ = reformat_input('testocr.png') #640x480
    img, _ = reformat_input('test_image.jpg') #1920x1080

    # Resize and normalize input image
    img_resized, target_ratio, size_heatmap = resize_aspect_ratio(img, 512, interpolation=cv2.INTER_LINEAR, mag_ratio=1.)
    ratio_h = ratio_w = 1 / target_ratio
    x = normalizeMeanVariance(img_resized)
    x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)

    # Create ONNX Runtime session and load model
    providers = ['CPUExecutionProvider']
    session = rt.InferenceSession("craft_1920x1080.onnx", providers=providers)
    input_name = session.get_inputs()[0].name

    # Prepare input tensor for inference
    inp = {input_name: x.numpy()}

    # Run inference and get output
    y, _ = session.run(None, inp)

    # Extract score and link maps
    score_text = y[0, :, :, 0]
    score_link = y[0, :, :, 1]

    # Post-processing to obtain bounding boxes and polygons
    boxes, polys, mapper = getDetBoxes(score_text, score_link, 0.5, 0.4, 0.4)
    boxes = adjustResultCoordinates(boxes, ratio_w, ratio_h)
    polys = adjustResultCoordinates(polys, ratio_w, ratio_h)


    from craft_text_detector import export_detected_regions
    #from craft_text_detector import export_extra_results
    output_dir = 'outputs1/'

    exported_file_paths = export_detected_regions(
            image=img,
            regions=boxes,
            output_dir=output_dir,
            rectify=False
        )

craft_onnx()

@zoldaten

zoldaten commented Mar 9, 2023

@samiechan hi.
I tried to convert cyrillic_g2.pth to ONNX (from your Colab):

import torch
import onnxruntime as rt
import numpy as np
from easyocr.utils import reformat_input, get_image_list


from easyocr import recognition
#import yaml
import os

recog_network = 'generation2'

# for custom model
#with open(recog_network + '.yaml', encoding='utf8') as file:
  #recog_config = yaml.load(file, Loader=yaml.FullLoader)

#network_params = recog_config['network_params']

network_params = {
    'input_channel': 1,
    'output_channel': 256,
    'hidden_size': 256
    }

# for custom model
#character = recog_config['character_list']


# see https://github.com/JaidedAI/EasyOCR/blob/ca9f9b0ac081f2874a603a5614ddaf9de40ac339/easyocr/config.py for other language config examples
# https://github.com/JaidedAI/EasyOCR/releases/download/v1.6.1/cyrillic_g2.zip
character = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ €₽ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюяЂђЃѓЄєІіЇїЈјЉљЊњЋћЌќЎўЏџҐґҒғҚқҮүҲҳҶҷӀӏӢӣӨөӮӯ'
symbol = '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ €₽'
model_path = "cyrillic_g2.pth"
#symbols= "0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ "
#character= '0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZЁЂЄІЇЈЉЊЋЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћўџҐґҮүө'
#model_path = "cyrillic.pth"
separator_list = {}
cyrillic_lang_list = ['ru','rs_cyrillic','be','bg','uk','mn','abq','ady','kbd',\
                      'ava','dar','inh','che','lbe','lez','tab','tjk', 'en']

#cyrillic_lang_list = ['ru','rs_cyrillic','be','bg','uk','mn','abq','ady','kbd',\
#                      'ava','dar','inh','che','lbe','lez','tab','tjk']
package_dir = os.path.dirname(recognition.__file__)

dict_list = {}
for lang in cyrillic_lang_list:
    dict_list[lang] = os.path.join(package_dir, 'dict', lang + ".txt")

model, converter = recognition.get_recognizer(recog_network=recog_network, network_params=network_params, character=character, separator_list=separator_list, dict_list=dict_list, model_path=model_path, device='cpu', quantize=False)


# Define the dimensions of the input image
batch_size = 1
num_channels = 1
image_height = 64  # EasyOCR's imgH
image_width = 128

image_input_shape = (batch_size, num_channels, image_height, image_width)
image_input = torch.ones(*image_input_shape)

max_text_length = 10
text_input_shape = (batch_size, max_text_length)
text_input = torch.ones(*text_input_shape)

input_names=['image_input', 'text_input']
output_names=['output']

dynamic_axes = {"image_input": {0: "batch_size", 3: "width"}, "text_input": {0: "batch_size"}}
opset_version = 12

torch.onnx.export(model, (image_input, text_input), "recog.onnx", 
                  input_names=input_names, output_names=output_names, 
                  dynamic_axes=dynamic_axes, opset_version=opset_version)

But it failed:

Traceback (most recent call last):
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py", line 1758, in symbolic_fn
    output_size = symbolic_helper._parse_arg(output_size, "is")
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 104, in _parse_arg
    raise errors.SymbolicValueError(
torch.onnx.errors.SymbolicValueError: Failed to export a node '%82 : Long(device=cpu) = onnx::Gather[axis=0](%79, %81), scope: easyocr.model.vgg_model.Model::/torch.nn.modules.pooling.AdaptiveAvgPool2d::AdaptiveAvgPool # /home/al/.local/lib/python3.8/site-packages/torch/nn/functional.py:1213:0
' (in list node %83 : int[] = prim::ListConstruct(%82, %49), scope: easyocr.model.vgg_model.Model::/torch.nn.modules.pooling.AdaptiveAvgPool2d::AdaptiveAvgPool
) because it is not constant. Please try to make things (e.g. kernel sizes) static if possible.  [Caused by the value '83 defined in (%83 : int[] = prim::ListConstruct(%82, %49), scope: easyocr.model.vgg_model.Model::/torch.nn.modules.pooling.AdaptiveAvgPool2d::AdaptiveAvgPool
)' (type 'List[int]') in the TorchScript graph. The containing node has kind 'prim::ListConstruct'.] 

    Inputs:
        #0: 82 defined in (%82 : Long(device=cpu) = onnx::Gather[axis=0](%79, %81), scope: easyocr.model.vgg_model.Model::/torch.nn.modules.pooling.AdaptiveAvgPool2d::AdaptiveAvgPool # /home/al/.local/lib/python3.8/site-packages/torch/nn/functional.py:1213:0
    )  (type 'Tensor')
        #1: 49 defined in (%49 : Long(device=cpu) = onnx::Constant[value={1}](), scope: easyocr.model.vgg_model.Model::/easyocr.model.modules.VGG_FeatureExtractor::FeatureExtraction/torch.nn.modules.container.Sequential::ConvNet/torch.nn.modules.conv.Conv2d::ConvNet.0
    )  (type 'Tensor')
    Outputs:
        #0: 83 defined in (%83 : int[] = prim::ListConstruct(%82, %49), scope: easyocr.model.vgg_model.Model::/torch.nn.modules.pooling.AdaptiveAvgPool2d::AdaptiveAvgPool
    )  (type 'List[int]')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/idlelib/run.py", line 559, in runcode
    exec(code, self.locals)
  File "/home/al/Desktop/easy_ocr_optimized/easy_ocr_onnx.py", line 67, in <module>
    torch.onnx.export(model, (image_input, text_input), "recog.onnx", 
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 504, in export
    _export(
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1529, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1115, in _model_to_graph
    graph = _optimize_graph(
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 663, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1899, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 380, in wrapper
    return fn(g, *args, **kwargs)
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py", line 1762, in symbolic_fn
    return symbolic_helper._onnx_unsupported(
  File "/home/al/.local/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 588, in _onnx_unsupported
    raise errors.SymbolicValueError(
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator adaptive pooling, since output_size is not constant.. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues  [Caused by the value 'input.52 defined in (%input.52 : Float(*, *, 256, 3, strides=[23808, 1, 93, 31], requires_grad=1, device=cpu) = onnx::Transpose[perm=[0, 3, 1, 2]](%75), scope: easyocr.model.vgg_model.Model:: # /home/al/.local/lib/python3.8/site-packages/easyocr/model/vgg_model.py:26:0
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Transpose'.] 
    (node defined in /home/al/.local/lib/python3.8/site-packages/easyocr/model/vgg_model.py(26): forward
/home/al/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(1182): _slow_forward
/home/al/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(1194): _call_impl
/home/al/.local/lib/python3.8/site-packages/torch/jit/_trace.py(118): wrapper
/home/al/.local/lib/python3.8/site-packages/torch/jit/_trace.py(127): forward
/home/al/.local/lib/python3.8/site-packages/torch/nn/modules/module.py(1194): _call_impl
/home/al/.local/lib/python3.8/site-packages/torch/jit/_trace.py(1184): _get_trace_graph
/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py(891): _trace_and_get_graph_from_model
/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py(987): _create_jit_graph
/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py(1111): _model_to_graph
/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py(1529): _export
/home/al/.local/lib/python3.8/site-packages/torch/onnx/utils.py(504): export
/home/al/Desktop/easy_ocr_optimized/easy_ocr_onnx.py(67): <module>
/usr/lib/python3.8/idlelib/run.py(559): runcode
/usr/lib/python3.8/idlelib/run.py(156): main
<string>(1): <module>
)

    Inputs:
        #0: 75 defined in (%75 : Float(*, 256, 3, *, strides=[23808, 93, 31, 1], requires_grad=1, device=cpu) = onnx::Relu(%input.48), scope: easyocr.model.vgg_model.Model::/easyocr.model.modules.VGG_FeatureExtractor::FeatureExtraction/torch.nn.modules.container.Sequential::ConvNet/torch.nn.modules.activation.ReLU::ConvNet.19 # /home/al/.local/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
    )  (type 'Tensor')
    Outputs:
        #0: input.52 defined in (%input.52 : Float(*, *, 256, 3, strides=[23808, 1, 93, 31], requires_grad=1, device=cpu) = onnx::Transpose[perm=[0, 3, 1, 2]](%75), scope: easyocr.model.vgg_model.Model:: # /home/al/.local/lib/python3.8/site-packages/easyocr/model/vgg_model.py:26:0
    )  (type 'Tensor')

my modules:

torch==1.13.1+cpu
onnx==1.13.1
onnxruntime==1.14.1
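
For what it's worth, the traceback points at the AdaptiveAvgPool2d in easyocr/model/vgg_model.py, whose output size is taken from the input shape at trace time and is therefore not constant. A workaround that may help (an assumption on my side, following the exporter's "make things static" hint, not an official fix) is to give that layer a constant output size before exporting. With output_channel=256, the preserved dimension is 256, so, reusing model, image_input, text_input and the export arguments from the script above:

import torch.nn as nn

# Hedged workaround: swap the dynamically-sized pooling layer for one with a
# constant output_size, so the exporter no longer sees a non-constant
# prim::ListConstruct. 256 matches output_channel in network_params above.
model.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((256, 1))

torch.onnx.export(model, (image_input, text_input), "recog.onnx",
                  input_names=input_names, output_names=output_names,
                  dynamic_axes=dynamic_axes, opset_version=opset_version)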

@light42

light42 commented Mar 10, 2023

If you want to convert it to TensorRT, I found that torch2trt already provides the complete pipeline from conversion to inference: https://github.com/NVIDIA-AI-IOT/torch2trt/tree/master/examples/easyocr. I tested the code and it works.

It can even handle dynamic input shapes.
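
For reference, a minimal sketch of what the torch2trt conversion looks like (my own condensed version, not the exact example code; the shapes, the fp16 flag and the min/opt/max shape parameter names are assumptions, so check the torch2trt README for your installed version):

import torch
from torch2trt import torch2trt
from easyocr import detection

# Load the CRAFT detector on the GPU; on CUDA get_detector wraps the net in
# DataParallel, so unwrap it with .module before conversion
net = detection.get_detector('craft_mlt_25k.pth', device='cuda', quantize=False).module.eval()

x = torch.ones((1, 3, 608, 800)).cuda()

# torch2trt builds a TensorRT engine directly from the PyTorch module;
# min/opt/max shapes enable dynamic input sizes
net_trt = torch2trt(net, [x], fp16_mode=True,
                    min_shapes=[(1, 3, 256, 256)],
                    opt_shapes=[(1, 3, 608, 800)],
                    max_shapes=[(1, 3, 1280, 1280)])

y_trt, feature_trt = net_trt(x)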

@pratap73

pratap73 commented Mar 13, 2023

recognitionModel.onnx gives a different output in OpenCV DNN than in ONNX Runtime, even though the input array is the same. Could anyone please look into this?
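
A minimal way to pin this down is to feed both runtimes the identical array and diff the raw outputs, before any pre/post-processing. A hedged sketch (the file name, input shape and single image input are assumptions; if your export has a second text input you would need one net.setInput(blob, name) call per named input):

import cv2
import numpy as np
import onnxruntime as rt

x = np.random.rand(1, 1, 64, 128).astype(np.float32)  # dummy recognizer input

# ONNX Runtime
session = rt.InferenceSession("recognitionModel.onnx", providers=['CPUExecutionProvider'])
ort_out = session.run(None, {session.get_inputs()[0].name: x})[0]

# OpenCV DNN
net = cv2.dnn.readNetFromONNX("recognitionModel.onnx")
net.setInput(x)
cv_out = net.forward()

# If these disagree by more than float noise, suspect preprocessing (e.g.
# blobFromImage rescales/swaps channels by default) or an op/opset that
# OpenCV's ONNX importer handles differently
print("max abs diff:", np.abs(ort_out - cv_out).max())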

@rmast

rmast commented Jun 3, 2023

@Kromtar writes:
Again, I may be wrong about the structure of these inputs; it's what I observed empirically.

I'm now trying to get the CRAFT model working in EasyOCR on an Intel desktop (an i5-8500 with Intel HD 630 graphics and 8 GB RAM) on Windows, using ONNX via OpenVINO.

Since I'm using a sample image that trips up every automatic attempt with whatever product I try, it's a perfect sample for seeing what goes wrong as I get used to its difficulties:

https://user-images.githubusercontent.com/3341558/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg
(see internetarchive/archive-pdf-tools#55 for the default EasyOCR-output)

I was trying to get the text of this image detected with CUDA on my GT1030. Unfortunately it ran short of memory (only 2 GB). Only cutting the image to a quarter let the detection finish, but tiny dots on characters were lost and words were glued together. Running a build without AVX2 on my old AMD is unacceptably slow, so my next attempt was to get it GPU-detected on this Intel with 8 GB of shared memory.

I rewrote the detect function to get the first detection-step done with OpenVino:

diff --git a/easyocr/detection.py b/easyocr/detection.py
index 072178a..bff7eb4 100644
--- a/easyocr/detection.py
+++ b/easyocr/detection.py
@@ -10,6 +10,13 @@ from .craft_utils import getDetBoxes, adjustResultCoordinates
 from .imgproc import resize_aspect_ratio, normalizeMeanVariance
 from .craft import CRAFT

+import logging as log
+import sys
+
+from openvino.preprocess import PrePostProcessor, ResizeAlgorithm
+from openvino.runtime import Core, Layout, Type, PartialShape
+
+
 def copyStateDict(state_dict):
     if list(state_dict.keys())[0].startswith("module"):
         start_idx = 1
@@ -38,18 +45,41 @@ def test_net(canvas_size, mag_ratio, net, image, text_threshold, link_threshold,
     # preprocessing
     x = [np.transpose(normalizeMeanVariance(n_img), (2, 0, 1))
          for n_img in img_resized_list]
-    x = torch.from_numpy(np.array(x))
-    x = x.to(device)
-
+    #x = torch.from_numpy(np.array(x))
+    x = np.array(x)  # added by me
+    print('Shape' + str(x.shape))
+    log.info('Creating OpenVINO Runtime Core')
+    core = Core()
+#    core.set_property("CPU", {"INFERENCE_PRECISION_HINT": "i8"})
+#    core.set_property("GPU.0", {"INFERENCE_PRECISION_HINT": "FP16"})
+#    log.info(f'Reading the model: {model_path}')
+    # (.xml and .bin files) or (.onnx file)
+    model = core.read_model('C:/Users/nicor/Downloads/craft.onnx')
+#    model.reshape({model.input(0).any_name: PartialShape([1, 3, 2560, 2560])})
+#    x = x.to(device)
+    input_tensor = x
+#    ppp = PrePostProcessor(model)
+#    _, h, w, _ = input_tensor.shape
+#    ppp.input().tensor() \
+#        .set_shape(input_tensor.shape) \
+#        .set_element_type(Type.u8) \
+#        .set_layout(Layout('NHWC'))  # noqa: ECE001, N400
+#    ppp.input().preprocess().resize(ResizeAlgorithm.RESIZE_LINEAR)
+#    ppp.input().model().set_layout(Layout('NCHW'))
+#    ppp.output('output1').tensor().set_element_type(Type.f32)
+#    model = ppp.build()
+    compiled_model = core.compile_model(model, "CPU")
+    results = compiled_model.infer_new_request({0: input_tensor})
+    predictions = next(iter(results.values()))
     # forward pass
-    with torch.no_grad():
-        y, feature = net(x)
-
+    # with torch.no_grad():
+    #    y, feature = net(x)
+    y = predictions
     boxes_list, polys_list = [], []
     for out in y:
         # make score and link map
-        score_text = out[:, :, 0].cpu().data.numpy()
-        score_link = out[:, :, 1].cpu().data.numpy()
+        score_text = out[:, :, 0]
+        score_link = out[:, :, 1]

         # Post-processing
         boxes, polys, mapper = getDetBoxes(
diff --git a/easyocr/imgproc.py b/easyocr/imgproc.py
index ab09d6f..d72cb7b 100644
--- a/easyocr/imgproc.py
+++ b/easyocr/imgproc.py
@@ -56,7 +56,7 @@ def resize_aspect_ratio(img, square_size, interpolation, mag_ratio=1):
         target_h32 = target_h + (32 - target_h % 32)
     if target_w % 32 != 0:
         target_w32 = target_w + (32 - target_w % 32)
-    resized = np.zeros((target_h32, target_w32, channel), dtype=np.float32)
+    resized = np.zeros((square_size, square_size, channel), dtype=np.float32)
     resized[0:target_h, 0:target_w, :] = proc
     target_h, target_w = target_h32, target_w32

You can see I had to pad the input image to a square with zeros, as OpenVINO doesn't accept a mismatching shape; I don't know the negative implications of that oversizing for performance and memory use.
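
An alternative to zero-padding that may be worth trying (a hedged sketch on my side; whether a plugin accepts fully dynamic spatial dimensions depends on the OpenVINO version) is to reshape the model to dynamic height/width before compiling, so the input can keep its 32-aligned rectangular shape:

from openvino.runtime import Core, PartialShape

core = Core()
model = core.read_model('C:/Users/nicor/Downloads/craft.onnx')

# -1 marks a dynamic dimension; batch and channels stay fixed at 1 and 3
model.reshape({model.input(0).any_name: PartialShape([1, 3, -1, -1])})

compiled_model = core.compile_model(model, "CPU")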

When I used the 1,3,768,768 shape found in the CRAFT main repo, the text '14 Wijziging' at the top of the page got glued together, just as when cutting the page to a quarter of its original size, so I knew that shape wasn't the right one.

Then I did a new ONNX export with the shape of the square defined in EasyOCR: 1,3,2560,2560. The Intel GPU ran out of RAM during detection with that model and would only do 32-bit float arithmetic. The Intel CPU didn't use that much RAM and did finish with default settings, as in the diff above.

This time the detection result didn't glue the '14' and the 'Wijziging' together:

Shape(1, 3, 2560, 2560)
([[[107, 500, 181, 308], [546, 659, 212, 303], [697, 1079, 209, 333], [2187, 2264, 323, 359], [546, 1337, 359, 424], [2188, 2241, 368, 399], [2262, 2340, 368, 399], [545, 866, 600, 636], [992, 1373, 600, 641], [1433, 1853, 597, 642], [548, 902, 638, 677], [992, 1384, 638, 677], [1433, 1752, 638, 677], [1894, 2371, 636, 677], [544, 922, 670, 718], [992, 1398, 676, 715], [1432, 1790, 673, 718], [545, 899, 715, 754], [992, 1359, 712, 748], [1430, 1771, 712, 755], [994, 1340, 748, 790], [1433, 1677, 750, 789], [1896, 2124, 750, 789], [545, 808, 789, 829], [989, 1370, 789, 828], [1433, 1850, 789, 828], [545, 715, 827, 866], [1430, 1831, 827, 866], [569, 798, 861, 902], [992, 1113, 863, 902], [1433, 1826, 863, 903], [569, 951, 901, 937], [992, 1362, 901, 937], [1433, 1798, 901, 940], [569, 839, 937, 976], [992, 1384, 936, 978], [1433, 1847, 940, 978], [569, 930, 973, 1017], [992, 1368, 974, 1015], [1433, 1796, 973, 1015], [568, 654, 1012, 1051], [992, 1340, 1014, 1050], [1433, 1804, 1014, 1052], [1896, 2139, 1009, 1053], [569, 913, 1048, 1088], [1434, 1556, 1053, 1084], [569, 726, 1088, 1124], [992, 1354, 1090, 1126], [1432, 1832, 1084, 1133], [569, 954, 1126, 1162], [992, 1387, 1126, 1162], [1430, 1688, 1126, 1162], [569, 907, 1159, 1204], [993, 1320, 1165, 1197], [1897, 2074, 1165, 1197], [569, 921, 1199, 1242], [990, 1338, 1201, 1243], [545, 957, 1236, 1278], [990, 1404, 1236, 1278], [552, 573, 1336, 1368], [621, 1075, 1328, 1385], [112, 343, 1416, 1471], [546, 585, 1429, 1460], [698, 789, 1425, 1461], [787, 1707, 1416, 1472], [114, 475, 1461, 1506], [112, 477, 1498, 1540], [114, 422, 1537, 1579], [623, 709, 1541, 1572], [111, 473, 1569, 1619], [112, 480, 1612, 1655], [113, 245, 1653, 1685], [112, 449, 1688, 1727], [112, 436, 1726, 1765], [618, 922, 1718, 1772], [619, 984, 1836, 1880], [549, 577, 1933, 1970], [620, 887, 1924, 1984], [112, 343, 2016, 2071], [546, 588, 2029, 2060], [621, 982, 2021, 2071], [117, 403, 2062, 2102], [114, 447, 2102, 2141], [657, 1236, 2099, 2144], [112, 477, 2138, 2179], [131, 436, 2176, 2215], [697, 1010, 2172, 2221], [128, 334, 2214, 2253], [128, 310, 2253, 2292], [744, 1201, 2246, 2295], [546, 594, 2478, 2510], [618, 2145, 2471, 2520], [657, 1129, 2549, 2590], [1496, 1708, 2551, 2590], [656, 1317, 2620, 2668], [1496, 1708, 2628, 2664], [656, 1435, 2697, 2745], [1496, 1708, 2702, 2741], [656, 1164, 2769, 2818], [1496, 1710, 2779, 2815], [656, 1060, 2845, 2894], [1496, 1708, 2853, 2892], [655, 1429, 2919, 2969], [1496, 1688, 2925, 2967], [657, 1059, 2998, 3041], [1496, 1708, 3004, 3040], [656, 1010, 3069, 3121], [1496, 1708, 3078, 3117], [115, 901, 3408, 3440], [2278, 2326, 3408, 3436], [2342, 2396, 3414, 3435]]], [[[[624.0298574998546, 1419.1194299994186], [700.8096965887974, 1429.7808970915848], [694.9701425001454, 1466.8805700005814], [618.1903034112026, 1456.2191029084152]]]])

Every other optimization hint failed to get it to use FP16, BF16 or even INT8. I don't think compressing the model would make it any better, as the CPU/GPU don't support smaller number sizes. Luckily the Core i5-8500 with its six cores and AVX2 is able to do the job in a reasonable time, but it would do that with unmodified EasyOCR as well.

So picking the export shape at random is suboptimal. The consuming program should pick a shape to fit and resize images to; that is probably the best shape to export the ONNX with.
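
To make that shape choice reproducible, here is a small helper (my own sketch) that mirrors the rounding in easyocr.imgproc.resize_aspect_ratio: it computes the height/width the detector will actually see for a given image, capped at canvas_size and rounded up to multiples of 32, which should be the right static shape to export the ONNX with:

def detector_input_shape(height, width, canvas_size=2560, mag_ratio=1.0):
    # Mirror resize_aspect_ratio: scale the longer side to min(mag_ratio * side, canvas_size)
    target_size = min(mag_ratio * max(height, width), canvas_size)
    ratio = target_size / max(height, width)
    target_h, target_w = int(height * ratio), int(width * ratio)
    # EasyOCR rounds both dimensions up to the next multiple of 32
    target_h += (32 - target_h % 32) % 32
    target_w += (32 - target_w % 32) % 32
    return (1, 3, target_h, target_w)

print(detector_input_shape(3507, 2480))  # A4 scan at 300 dpi -> (1, 3, 2560, 1824)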

@MinGiSa

MinGiSa commented Aug 3, 2023

(quoting @zoldaten's conversion script and traceback above)

Try to do it with EasyOCR ver 1.5.
