Export to ONNX and use ONNX Runtime, working. Guide. #746
Comments
Looks great. The PyTorch library takes nearly 3 GB, way too large to publish. I managed to get a simple working demo with onnxruntime, and the folder generated by PyInstaller is 376 MB (167 MB zipped). Is anyone else interested in implementing a runtime version of EasyOCR with no torch dependency?
@AutumnSun1996 I'm working on it in my spare time. I have doubts about whether the PyTorch-to-ONNX transition process I described in the guide lowers accuracy and/or performance in EasyOCR. Could you confirm whether you observe the same accuracy and performance in your ONNX implementation as in PyTorch? Have you tested the performance of ONNX Runtime using CUDA?
Found similar behavior for the exported model: a diff in the recognition model output, but the final text is the same.
@Kromtar can I ask which version of PyTorch you have?

Python 3.9.9. These are the versions of the main packages I have installed:
Hello, I'm having the same problems. Did you solve this?
Unfortunately no; I ended up exporting the default model with an older version of PyTorch to make it work in my environment.
Getting the same error here, with the same versions installed as specified by @Kromtar (thanks for your hard work on this btw, awesome that you managed to do this and hopefully more of us will follow!). If anybody has a solution it'd be great to hear; I'll keep exploring and will post if I find something. Also, just another note that I got an error prior to this and had to change
Technically it would be clearer to call these 'height' and 'width' respectively. The first dimension is the batch size (1), the second the number of channels (3), the third the height and the fourth the width, as the input is in NCHW format. FWIW, if your input during inferencing will have a fixed height and width, there are potential performance benefits from leaving those dimensions as fixed sizes instead of using dynamic axes.
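As a quick sketch of the fixed-size option, assuming `net` is the loaded detection module (the file name is made up): simply omit `dynamic_axes` so all four NCHW dimensions are baked into the exported graph.

```python
import torch

# Fixed-size export: with no dynamic_axes, all four NCHW dims become fixed sizes.
dummy = torch.randn(1, 3, 64, 128)  # batch, channels, height, width
torch.onnx.export(
    net, dummy, "detection_fixed.onnx",
    export_params=True,
    input_names=["input"], output_names=["output"],
)
```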
Hi everyone, thanks for the feedback. I have been incredibly busy the last 3 weeks, but I finally have some free time. I will look into the bug again to find a solution, since in my environment it is working perfectly. Best regards.
Hi, @Kromtar. What is the next step for testing with real images on mobile phones? I am sorry if it's too obvious, but I am not familiar with ONNX at all and need to know what to do to test images using Android/iOS phones. Is there an environment where you connect a phone with EasyOCR? Thank you. Also, since I am into it anyway, I'd love to learn more about every single step needed to make any ML/DL model work on mobile phones. I'd appreciate it if you can share materials (posts, projects, etc.) related to this task. Thanks once again.
Okay guys, the problem is that `torch.onnx.export`, in the case of the recognition model, only works when EasyOCR is running in GPU mode (i.e. using CUDA cores). Apparently this is due to how ONNX parses and exports a very specific network layer used only by the recognition model. I have not been able to find a solution that does not involve making substantial changes to Torch. My recommendation is to follow the guide, but make sure EasyOCR is running in GPU mode. For this you will need an NVIDIA graphics card with CUDA and the corresponding drivers installed. Soon I will publish a container with everything configured. ...I tried my best to make the export work without a CUDA-enabled card, but I didn't succeed, I'm sorry ;(
I have created a new issue where I have made available the ONNX version of the EasyOCR models for all languages. |
I have published a branch in this fork where you can find the whole process using containers. You can see the readme to understand how to use it. |
@Kromtar Thank you for your conversion code. I have a dumb question: in your code for converting the recognition model to ONNX format, I saw we have 2 inputs, but when I use the Netron app to preview the onnx file, I cannot find the second one.
@long-senpai I don't really know why this happens in the conversion process. I think that ONNX, when optimizing the model, discovers that the weights provided by input 2 are unnecessary, so it deletes them. The last few weeks I have been comparing the models' performance before and after converting. What I can confirm is that, independent of input 2, the output of the converted model is the same as the output of the original model. So don't worry, there is no loss of performance. As to why ONNX does that, again, I don't know.
Do you have exported ONNX models for generation 2 available to download directly? Can these ONNX files be used directly, without EasyOCR?
I found out how to export the ONNX model with CPU only for the recognition model. It needs quantization removed, so comment out these lines:

```python
if quantize:
    try:
        torch.quantization.quantize_dynamic(model, dtype=torch.qint8, inplace=True)
    except:
        pass
```
Having trouble exporting the model:
Nvm, I needed to update `nn.AdaptiveAvgPool2d` in my custom_model.py.
Here is another approach to export the CRAFT (detection) model to ONNX format on Windows using WSL2: you can use craft-text-detector.
It is related to network parameters, I believe.
You can find it here. |
Here is an approach for the recognition model:

Here is an example of how to run it with one cropped image from the detection model:
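Since the snippet itself isn't reproduced above, here is a minimal sketch of the idea, assuming the model was exported as recognition_model.onnx with a single dynamic-width input of shape (1, 1, 64, W); adjust the height and file name to whatever your export used.

```python
import cv2
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("recognition_model.onnx")

# One crop produced by the detection model, loaded as grayscale.
img = cv2.imread("crop.png", cv2.IMREAD_GRAYSCALE)
h, w = img.shape
img = cv2.resize(img, (int(w * 64 / h), 64))      # keep aspect ratio, height 64
blob = (img.astype(np.float32) / 127.5) - 1.0     # normalize to [-1, 1]
blob = blob[np.newaxis, np.newaxis, :, :]         # NCHW: (1, 1, 64, W)

input_name = session.get_inputs()[0].name
preds = session.run(None, {input_name: blob})[0]  # e.g. (1, T, num_classes)
print(preds.argmax(axis=2))                       # greedy per-step class indices
```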
Also, for custom models you can use the GPU:

Also yes, the ONNX model has only 1 input.
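Presumably this means selecting the CUDA execution provider when creating the session, e.g.:

```python
import onnxruntime

# GPU first, with CPU fallback if CUDA is unavailable.
session = onnxruntime.InferenceSession(
    "recognition_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```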
@samiechan The ONNX image_width is not dynamic; if our input image width exceeds it, we get an error. Is there a way to make it dynamic?
For the detection model:

For the recognition model:
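For the recognition model specifically, a hedged sketch of marking the width axis dynamic at export time (the file name, input names and axis label are assumptions; `model`, `image` and `text_for_pred` are the objects available at the export point in easyocr/recognition.py):

```python
import torch

torch.onnx.export(
    model,
    (image, text_for_pred),
    "recognition_model.onnx",
    export_params=True,
    input_names=["input1", "input2"],
    output_names=["output"],
    dynamic_axes={"input1": {3: "width"}},  # axis 3 of NCHW = image width
)
```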
It's important to note that there are two aspects here. One is whether the model inputs have fixed or dynamic sizes, and the other is whether the model itself supports dynamic sizes. E.g. if the model was trained with input of 64 x 128 and does not internally resize input, you will most likely get an error about sizes mismatching, as the nodes and weights in the model will be expecting sizes relative to 64 x 128. Basically, the model inputs being dynamic allows any value to be specified, but that does not mean the model itself supports any value.

If you are pre-processing the image prior to running the original pytorch model (e.g. resize, crop, normalize), you need to do the same things prior to running the exported ONNX model.

We have some new helpers that are about to be released that may be helpful. They allow adding these common pre-processing steps into the ONNX model so that onnxruntime can do them. The latest onnxruntime (1.14) also includes the updated ONNX Resize operator that supports anti-aliasing, providing equivalency with the typical Pillow-based image resizing used by pytorch.

Overview documentation: https://github.com/microsoft/onnxruntime-extensions/blob/main/onnxruntime_extensions/tools/Example%20usage%20of%20the%20PrePostProcessor.md

There are some example implementations, including one showing what the pre-processing pipeline would look like for a model with the common pytorch image pre-processing, here: https://github.com/microsoft/onnxruntime-extensions/blob/7578af836146b015bbd7a8539f3288cc539660ad/onnxruntime_extensions/tools/add_pre_post_processing_to_model.py#L23

It's also possible to do image conversion from png or jpg as part of the pre-processing, although that requires the onnxruntime-extensions library to be available at runtime, as it uses a custom operator (i.e. not an operator defined in the ONNX spec). We have prebuilt Android and iOS packages for onnxruntime-extensions in this first release of the new tools, so you'd have to build it yourself for other platforms.
@samiechan Hello. Can you please tell me the versions of your libraries? Your code gave me an error. pytorch 1.9.1
Here is my Google Colab: https://colab.research.google.com/drive/1pcoueUxhWFX5Ac6AA4paYDLgZMf819GT?usp=sharing |
@samiechan Thank you very much! |
Hi @Kromtar!

Also, I tried to start it up with the mods you made, without using Docker, but failed with almost the same error:
@samiechan hi!
I agree. It should be noted somewhere that the output from the detection model gives us bounding boxes with a height of 64 px, whereas the recognition model takes images with a height of 32 px. To support dynamic shapes we need to modify the VGG model, which is used by default for gen2 models, or to resize images before feeding them into the model (a minimal resize sketch follows below). I came across one Google Colab which makes it clear how Deep Text Recognition Benchmark models work. A transformation can be added to the VGG model to support dynamic input sizes (but the model would have to be retrained).
Will try to do it once I finish experimenting with the recognition model.
Also, the reason why the second parameter (text) is missing in the ONNX model might be that it is not used anywhere in the VGG model.
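A minimal sketch of the resize-before-feeding option mentioned above, assuming grayscale crops from the detector (the helper name is made up):

```python
import cv2

def resize_to_height(crop, target_h=32):
    """Resize a detector crop to the recognizer's expected height, keeping aspect ratio."""
    h, w = crop.shape[:2]
    new_w = max(1, int(round(w * target_h / h)))
    return cv2.resize(crop, (new_w, target_h))
```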
@samiechan I moved on with your Colab and saw that you export CRAFT to ONNX at the beginning.

The conversion was OK, but I don't think that model works well. Here is your code, slightly modified:
@samiechan hi.

but it failed:

my modules:
If you want to convert it to TensorRT, I found out that torch2trt already provides a complete pipeline from conversion to inference. It even supports dynamic input shapes.
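A hedged sketch of that flow with torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt), assuming the detection model and a sample input already live on the GPU; the input shape is illustrative:

```python
import torch
from torch2trt import torch2trt

model = model.cuda().eval()             # the PyTorch detection model
x = torch.randn(1, 3, 608, 800).cuda()  # example NCHW input

model_trt = torch2trt(model, [x])       # build a TensorRT-backed module
y_trt = model_trt(x)                    # same call signature as the original
```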
The recognitionmodel.onnx output is different in OpenCV DNN than in onnxruntime, although the input array is the same. Could anyone please look into this?
I'm now trying to get the CRAFT model working in EasyOCR on an Intel desktop processor (i5-8500 with Intel HD 630 graphics and 8 GB) on Windows, using ONNX via OpenVINO. I'm using a sample image that triggers a lot of problems in every automatic attempt with whatever product I try it on, so it's a perfect sample for seeing what goes wrong as I get used to its difficulties: https://user-images.githubusercontent.com/3341558/175789293-f39ddfdb-6f3e-4598-8d16-80a1f4a88b36.jpg

I was trying to get the text of this image detected with CUDA on my GT1030. Unfortunately it ran short of memory (only 2 GB). Only cutting the image to a quarter would make the detection come to an end, but then tiny dots on characters were lost and words were glued together. Running without AVX2 compiled on my old AMD makes it unacceptably slow. My next attempt was to get it GPU-detected on this Intel with 8 GB shared memory. I rewrote the detect function to get the first detection step done with OpenVINO:
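The rewritten function itself isn't shown here; below is a hedged sketch of the idea using the openvino.runtime API (file name, device string and function name are assumptions), including the zero-padding to a square described next:

```python
import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model(core.read_model("detection_model.onnx"), "CPU")

def detect_openvino(image_nchw):
    # OpenVINO rejects a mismatching shape, so zero-pad the image to a square.
    n, c, h, w = image_nchw.shape
    side = max(h, w)
    padded = np.zeros((n, c, side, side), dtype=np.float32)
    padded[:, :, :h, :w] = image_nchw
    result = compiled([padded])
    return result[compiled.output(0)]
```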
You see I had to pad the input image to a square with zeros, as OpenVINO doesn't accept a mismatching shape; I don't know the negative implications of that oversize for performance and memory use. When I used the 1,3,768,768 shape as found in the CRAFT main, the text '14 Wijziging' on top of the page got glued together, just as when cutting the page to a quarter of its original size, so I knew that shape wasn't the right one. Then I did a new ONNX export with the shape of the square defined in EasyOCR: 1,3,2560,2560. The Intel GPU ran out of RAM during detection with that model and only wanted to do 32-bit float arithmetic. The Intel CPU didn't use that much RAM and did come to an end with default settings, as in the current diff above. The detection result didn't glue the '14' and the 'Wijziging' together:
Every other optimization hint failed to get it to use fp16, bf16 or even int8. I don't think compressing the model would make it any better, as the CPU/GPU don't support smaller number sizes. Luckily the Core i5-8500 with its six cores and AVX2 is able to do the job in a reasonable time, but it would even do that with the unmodified EasyOCR. So picking the shape at random is suboptimal. The target program might pick a shape to fit and to resize images to; that might be the best shape to export the ONNX with.
Try to do it with EasyOCR ver 1.5.
This is an explanation of how to export the recognition model and the detection model to ONNX format, followed by a brief explanation of how to use ONNX Runtime to run these models.
ONNX is an interoperability standard for AI models. It allows us to use the same model across different programming languages, operating systems, acceleration platforms and runtimes. Personally, I need to make a C++ build of EasyOCR's functionality. After failing, for several reasons, to make a C++ build using PyTorch and the EasyOCR models, I found that the best solution is to convert the models to ONNX and then program in C++ using ONNX Runtime. Compiling is then very easy compared to PyTorch.
Due to time constraints I am not presenting a PR. It will be necessary for you to modify a copy of EasyOCR locally.
Requirements
We must install the onnx and onnxruntime modules. In my case I also had to manually install the protobuf module at version 3.20.
I am using:
Exporting ONNX models
The best place to modify the EasyOCR code to export the models is right after EasyOCR uses the loaded model to perform the prediction.
Exporting the detection model
In `easyocr/detection.py`, after `y, feature = net(x)` (line 46), add the export code.

We generate a dumb input, totally random, so that ONNX can perform the export. The content of the input doesn't matter; the important thing is that it has the correct structure. The detection model uses an input that is a 4-dimensional tensor, where the first dimension always has a value of 1, the second a value of 3, and the third and fourth depend on the resolution of the analyzed image. I reached this conclusion after analyzing the data flow; I may be in error and this may need to be corrected.
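The exact snippet isn't reproduced here; a minimal sketch under the assumptions above (the file name, input/output names and dynamic-axis labels are mine, and a commenter above notes that 'height' and 'width' would be clearer labels for the last two axes):

```python
import torch

# Dumb random input with the detector's NCHW structure: (1, 3, H, W).
dummy_input = torch.randn(1, 3, 608, 800, device=x.device)
torch.onnx.export(
    net.module if hasattr(net, "module") else net,  # unwrap DataParallel if needed
    dummy_input,
    "detection_model.onnx",
    export_params=True,
    input_names=["input"],
    output_names=["output", "feature"],
    dynamic_axes={"input": {2: "dim2", 3: "dim3"}},  # last two dims dynamic
)
```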
Note that we export with parameters (`export_params=True`) and specify that the two final dimensions of the input tensor are of dynamic size (`dynamic_axes=...`).

Then we can add code that immediately re-imports the exported model and validates that it is not corrupted:
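A minimal sketch of such a validation step, using the onnx checker (same assumed file name):

```python
import onnx

# Re-load the exported file and let the checker flag a corrupt graph.
onnx_model = onnx.load("detection_model.onnx")
onnx.checker.check_model(onnx_model)
```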
Remember to `import onnx` in the file header.

To run the export, just use EasyOCR and perform an analysis on any image, indicating the language to be detected. This will download the corresponding model, run the detection and simultaneously export the model. If we change the language, we will have to export a new model. Once the model is exported, we can comment out or delete the added code.
Exporting the recognition model
This model is a bit more difficult to export and we will have to do some black magic.
In `easyocr/recognition.py`, after `preds = model(image, text_for_pred)` (line 111), add the export code.

As with the detection model, we create a dumb input to be able to export the model. In this case, the model input has 2 elements.
The first element is a 4-dimensional tensor, where the first dimension always has a value of 1, the second a value of 1, the third a value of 64 and the fourth a dynamic value.
The second element is a 2-dimensional tensor, where the first dimension always has a value of 1 and the second a dynamic value.
Again, I may be wrong about the structure of these inputs; this is what I observed empirically.
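The original snippet isn't reproduced here; a hedged sketch matching the structure just described (the file name, the sizes 256 and 33, and the axis label are my assumptions):

```python
import torch

# Dumb inputs matching the structure above: (1, 1, 64, W) and (1, T).
dummy_image = torch.randn(1, 1, 64, 256, device=image.device)
dummy_text = torch.zeros(1, 33, dtype=torch.long, device=image.device)
torch.onnx.export(
    model,
    (dummy_image, dummy_text),
    "recognition_model.onnx",
    export_params=True,
    do_constant_folding=False,              # see the note below; may not take effect
    input_names=["input1", "input2"],
    output_names=["output"],
    dynamic_axes={"input1": {3: "width"}},  # only the width axis is dynamic
)
```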
First strange thing: ONNX, for some reason, in performing its analysis of the model structure, concludes that the second input element does not perform any function. So even if we tell ONNX to export a model with 2 input elements, it will always export a model with 1 input element. It appears that this is due to an internal ONNX process where it "cuts" parts of the graph defining the network that do not alter the network output. According to the documentation, we can stop this "cutting" process and export the network without optimization using the `do_constant_folding=False` parameter, but due to a bug it is not taking effect. In spite of the above, we can observe that the lack of the second element does not cause losses in model accuracy. For this reason, in the dynamic elements (`dynamic_axes=`) we only define one input element, whose last dimension (the width) is variable in size. If anyone manages to export the model with the two input elements, it would be appreciated if you could notify us.

Second strange thing: in order to export the recognition model, we must edit `easyocr/model/vgg_model.py`. It turns out that the AdaptiveAvgPool2d operator is not fully supported by ONNX: when the "None" option is used in the configuration tuple (indicating that the size must be equal to the input), the export fails. To fix this we need to change line 11 from

```python
self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((None, 1))
```

to

```python
self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((256, 1))
```
Why 256? I don't know. Is there a better option? I have not found one. Does it generate errors in the model? I have not been able to find any accuracy problems. If someone can explain why it works with 256 and what the consequences are, it would be appreciated.
Well then, just as with the detection model, we can add lines to validate the exported model:
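For example, mirroring the detection-model check (file name assumed):

```python
import onnx

# Same checker pattern as for the detection model.
onnx.checker.check_model(onnx.load("recognition_model.onnx"))
```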
Remember to `import onnx` in the file header.

To export the recognition model, we must run EasyOCR using any image and the desired language. In the process you will see that some warnings are generated, but you can ignore them. The model will be exported several times, since the added code sits inside a for loop, but this should not cause any problems. Remember to comment out or remove the added code afterwards. If you change the language, you must export a new ONNX model.
Using ONNX models in EasyOCR
To test and validate that the models work, we will modify the code again. This time we will comment out the lines where EasyOCR uses the PyTorch prediction and add code that uses ONNX Runtime to perform the prediction.
Using the ONNX detection model
First we must add a helper function to the file `easyocr/detection.py`.

Then we must comment out line 46, where it says `y, feature = net(x)`, and after this line add the ONNX Runtime prediction code. Remember to `import onnxruntime` in the file header.

In this way we load the ONNX detection model and pass the value "x" as input. Since ONNX does not use PyTorch, we must convert "x" from a Tensor to a standard numpy array; for that we use the helper function. The ONNX output is left in the "y" variable.
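The helper and the replacement lines aren't reproduced above; a hedged reconstruction, with the session file name assumed and the helper modeled on the usual PyTorch-to-ONNX examples:

```python
import onnxruntime

def to_numpy(tensor):
    # Helper: convert a (possibly gradient-tracking) Tensor to a numpy array.
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

session = onnxruntime.InferenceSession("detection_model.onnx")
ort_inputs = {session.get_inputs()[0].name: to_numpy(x)}
y, feature = session.run(None, ort_inputs)  # both outputs come back as numpy arrays
```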
One last modification must be made on lines 51 and 52, removing the Tensor-to-numpy conversion. This is because the model output is already a numpy array and does not need to be converted from a Tensor.
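A hedged illustration of that change; the exact indexing follows easyocr/detection.py and may differ between versions:

```python
# Before (PyTorch output is a Tensor that must be converted):
#   score_text = out[:, :, 0].cpu().data.numpy()
#   score_link = out[:, :, 1].cpu().data.numpy()
# After (the ONNX Runtime output is already a numpy array):
score_text = out[:, :, 0]
score_link = out[:, :, 1]
```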
To test, we can run EasyOCR with some image and see the result.
Using the ONNX recognition model
We must add the same helper function to the file `easyocr/recognition.py`.

Then we must comment out line 111 to stop using the PyTorch prediction, `preds = model(image, text_for_pred)`, and right after that add the ONNX Runtime prediction code. Remember to `import onnxruntime` in the file header.

Note that we are only passing one input entity, although this model, in theory, is supposed to receive two. As with the detection model, the input must be transformed from a Tensor to a numpy array. We convert the output from an array back to a Tensor, so that the data flow continues normally.
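The snippet itself isn't reproduced above; a hedged reconstruction (session file name assumed; `to_numpy` is the helper mentioned earlier):

```python
import torch
import onnxruntime

session = onnxruntime.InferenceSession("recognition_model.onnx")
ort_inputs = {session.get_inputs()[0].name: to_numpy(image)}  # single input only
ort_outs = session.run(None, ort_inputs)
preds = torch.from_numpy(ort_outs[0])  # back to a Tensor for the rest of the pipeline
```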
To test, we can run EasyOCR with some image and see the result.
Others
We can use this function to compare the output of the PyTorch model and the ONNX model and quantify the difference:

```python
np.testing.assert_allclose(to_numpy(<PYTORCH_PREDICTION>), <ONNX_PREDICTION>, rtol=1e-03, atol=1e-05)
```
In my tests, the difference between the detection models is minimal and the test passes correctly.

In the case of the recognition models, the difference is slightly larger and the test fails. In spite of this, it fails by very little, and I have not observed failures in the actual recognition of the characters. I don't know if this is due to the problem with ONNX not detecting the two input entities, the problem with AdaptiveAvgPool2d, or just natural error from the model export and decimal approximations.
Final note
I hope this will be of help in continuing the development of this excellent tool. I hope that experts in EasyOCR and PyTorch can review this and find answers to the questions raised.