Define minimal tesseract dependencies #2333

Closed
zdenop opened this issue Mar 16, 2019 · 10 comments

Comments

@zdenop
Contributor

zdenop commented Mar 16, 2019

I tried to build tesseract (cmake & clang) with minimal dependencies, so I built leptonica without any dependencies (not even zlib). The build finished fine, and tesseract -v produced:

tesseract 4.1.0-rc1-97-g681e
 leptonica-1.78.0 (Mar 13 2019, 19:12:40) [MSC v.1915 LIB Release x64]
  (null)
 Found AVX2
 Found AVX
 Found SSE

But when I ran tesseract test.pnm - it produced the following error messages and then crashed:

Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 333
Error in pixWriteMemPng: function not present

This indicates that there are several mandatory dependencies for running tesseract (tiff, png and therefore zlib...), and they have to be checked (bmfCreate and pixReadMem) when configuring the build.

A more advanced approach would be to analyze whether the relevant code could be skipped, but then we would need some way to communicate which API functions are not available (e.g. ProcessPagesMultipageTiff).

@zdenop zdenop added this to the 4.1.0 milestone Mar 16, 2019
@amitdo
Collaborator

amitdo commented Mar 17, 2019

Looks like a bug in baseapi.cpp.

@stweil stweil added the bug label Mar 17, 2019
@stweil
Contributor

stweil commented Mar 17, 2019

Leptonica is normally used as a shared library. So it is possible to provide a minimal liblept.so which obviously causes a crash or to use a full liblept.so which will work fine. The library used during the build is not necessarily the same as the library used when running tesseract.

Therefore I think it is not a build problem, but a problem of the error handling at runtime (which should also be fixed, of course).

@amitdo
Collaborator

amitdo commented Mar 17, 2019

Leptonica can handle the pnm format without any dependency.

@Shreeshrii
Collaborator

Shreeshrii commented Mar 17, 2019

what about adding minimum versions of python (3.6) and bash (4.4) for training tools?

#2319 (comment)

#2249 (comment)

@zdenop
Contributor Author

zdenop commented Mar 17, 2019

@Shreeshrii: we can put info about minimum dependencies in README.md.
Maybe I will find some time to rewrite the python script to a 3.5-compatible version (it should not be a big issue) or at least to check the python version at the beginning...

@zdenop
Contributor Author

zdenop commented Mar 17, 2019

@amitdo : this was exactly the reason why I used the pnm format: I guess this minimal tesseract & leptonica build is needed by tesseract wrappers (python, java, maybe C#) that will use their standard way of opening files and pass only the image data to tesseract.
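
To illustrate the wrapper scenario described above, here is a minimal C++17 sketch of a wrapper-side loader that decodes a binary PGM (the `P5` variant of pnm, 8-bit only) into raw bytes without any image library. `GrayImage`, `ReadHeaderInt` and `ParsePgm` are hypothetical names invented for this example; they are not part of tesseract or leptonica.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for raw 8-bit grayscale data held by a wrapper.
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<unsigned char> pixels;  // row-major, one byte per pixel
};

// Reads the next decimal header field, skipping whitespace and '#' comments.
static bool ReadHeaderInt(const std::string& data, std::size_t* pos, int* value) {
  while (*pos < data.size()) {
    const unsigned char c = static_cast<unsigned char>(data[*pos]);
    if (c == '#') {
      while (*pos < data.size() && data[*pos] != '\n') ++*pos;
    } else if (std::isspace(c)) {
      ++*pos;
    } else {
      break;
    }
  }
  if (*pos >= data.size() ||
      !std::isdigit(static_cast<unsigned char>(data[*pos]))) {
    return false;
  }
  int v = 0;
  while (*pos < data.size() &&
         std::isdigit(static_cast<unsigned char>(data[*pos]))) {
    if (v > 100000) return false;  // reject absurdly large fields
    v = v * 10 + (data[*pos] - '0');
    ++*pos;
  }
  *value = v;
  return true;
}

// Parses an in-memory binary PGM; returns std::nullopt on any malformed input.
std::optional<GrayImage> ParsePgm(const std::string& data) {
  if (data.size() < 2 || data[0] != 'P' || data[1] != '5') return std::nullopt;
  std::size_t pos = 2;
  int width = 0, height = 0, maxval = 0;
  if (!ReadHeaderInt(data, &pos, &width) ||
      !ReadHeaderInt(data, &pos, &height) ||
      !ReadHeaderInt(data, &pos, &maxval)) {
    return std::nullopt;
  }
  if (width <= 0 || height <= 0 || maxval <= 0 || maxval > 255) {
    return std::nullopt;
  }
  // Exactly one whitespace byte separates the header from the raster.
  if (pos >= data.size() ||
      !std::isspace(static_cast<unsigned char>(data[pos]))) {
    return std::nullopt;
  }
  ++pos;
  const std::size_t count =
      static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
  if (data.size() - pos < count) return std::nullopt;
  GrayImage img;
  img.width = width;
  img.height = height;
  img.pixels.assign(data.data() + pos, data.data() + pos + count);
  return img;
}
```

A wrapper could then presumably hand `pixels.data()`, `width` and `height` to the raw-buffer overload of `TessBaseAPI::SetImage` (one byte per pixel, `width` bytes per line), so that no image file decoding is needed on the tesseract side.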

@zdenop
Contributor Author

zdenop commented Mar 17, 2019

@stweil : you are right - I did not account for the scenario of replacing the shared leptonica with a version with fewer features (which I did yesterday on linux "successfully").
But doing the check at build time is IMO the first step (which can actually hide this bug).
Anyway, having a working tesseract and leptonica library without any other dependency should be the final goal.

@bertsky
Contributor

bertsky commented Mar 20, 2019

> what about adding minimum versions of python (3.6) and bash (4.4) for training tools?

Why bash >= 4.4, has that actually been ascertained?

@zdenop
Contributor Author

zdenop commented May 1, 2019

Here are more details if anybody would like to help with it:

  • Building leptonica with libpng (and therefore zlib) support is sufficient to avoid the crash.
  • Just api->Init("./tessdata", nullptr); will cause error messages (but no crash):
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
  • api->SetImage(pixs) works OK (at least PIX* check = api->GetInputImage() confirms it)
  • The crash is caused by calling outText = api->GetUTF8Text() (without png support):
Warning: Invalid resolution 0 dpi. Using 70 instead.
Error in pixWriteMemPng: function not present
Error in pixReadMem: Unknown format: no pix returned
src_pix != nullptr:Error:Assert failed:in file imagedata.cpp, line 232

The assert fires because GetPixInternal did not return a pix, since pixReadMem could not determine the format of image_data_. The reason is that image_data_ is supposed to be stored in png format, which is not available...
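
The failure chain above (the encoder is missing, the stored buffer stays empty, decoding returns nothing, the assert fires) can be modelled with a small self-contained C++17 sketch. Everything in it is hypothetical: `Image`, `EncodePng`, `Decode` and `RoundTrip` are toy stand-ins written for illustration, not leptonica or tesseract functions, and the "PNG" bytes are a placeholder format. It only shows the general idea of reporting the error to the caller instead of asserting; it is not the fix that was actually applied.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an image; not Leptonica's PIX.
struct Image {
  int width = 0;
  int height = 0;
};

// Toy encoder. With png_available == false it mimics a library built
// without PNG support by failing and leaving the buffer untouched.
bool EncodePng(const Image& img, std::vector<std::uint8_t>* out,
               bool png_available) {
  if (!png_available) return false;
  out->clear();
  out->push_back(0x89);  // placeholder magic byte, not a real PNG stream
  out->push_back(static_cast<std::uint8_t>(img.width));
  out->push_back(static_cast<std::uint8_t>(img.height));
  return true;
}

// Toy decoder. Returns std::nullopt when the buffer has no known format,
// which is the situation an empty buffer ends up in.
std::optional<Image> Decode(const std::vector<std::uint8_t>& data) {
  if (data.size() != 3 || data[0] != 0x89) return std::nullopt;
  return Image{data[1], data[2]};
}

// Stores the image in encoded form and reads it back. Failures are
// reported through *error and an empty optional rather than an assert.
std::optional<Image> RoundTrip(const Image& img, bool png_available,
                               std::string* error) {
  std::vector<std::uint8_t> buffer;
  if (!EncodePng(img, &buffer, png_available)) {
    *error = "PNG encoding unavailable; image data was not stored";
    return std::nullopt;
  }
  std::optional<Image> decoded = Decode(buffer);
  if (!decoded) *error = "stored image data could not be decoded";
  return decoded;
}
```

With `png_available` set to false, `RoundTrip` returns an empty optional and fills in the error string, so a caller can print a message and stop cleanly instead of hitting an assertion.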

@zdenop
Contributor Author

zdenop commented May 1, 2019

Crash is fixed. Init is open for improvement.
