I have OCR'd a PDF containing a fax. I expected to be able to find words from the fax with the Plone search, but that does not work: the words are not in the catalog. I checked /Plone/portal_catalog/plone_lexicon.
I placed a pdb breakpoint in catalog.py, but it was never reached, even though the SearchableText adapters appear to be registered correctly.
After further investigation I found the following. If a PDF already contains embedded text, that text is added to the index and can be found with the Plone search. If you add an image, or a PDF without embedded text (for example an image saved as a PDF), the text is not added to the index. A PDF that already contains text is OCR'd anyway. That is unnecessary, because OCR is expensive and error-prone: some words are misrecognized. If you find a misrecognized word in the text view of the document viewer and search for it with Plone's full-text search, it won't be found, but the original word will be. For example, if the word "proper" were recognized as "prooer", then "proper" would be in the index but "prooer" would not. In such cases the bug is a feature. This shows that the OCR'd text is not written to the index.
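To make the observed behaviour concrete, here is a toy model in plain Python. It is not Plone code and uses no Plone APIs; `PdfDocument`, `ToyCatalog`, and the `embedded_text`/`ocr_text` fields are hypothetical names chosen for illustration. It assumes, as the observations above suggest, that only the PDF's embedded text reaches the full-text index while the OCR result is kept elsewhere (e.g. for the viewer's text view).

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens, roughly what a lexicon would store."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class PdfDocument:
    # Hypothetical fields: the text layer already embedded in the PDF,
    # and the text produced by OCR.
    embedded_text: str = ""
    ocr_text: str = ""


@dataclass
class ToyCatalog:
    lexicon: set[str] = field(default_factory=set)

    def index(self, doc: PdfDocument) -> None:
        # Assumption mirroring the observed behaviour: only embedded text
        # is indexed; ocr_text is never added to the lexicon.
        self.lexicon |= tokenize(doc.embedded_text)

    def search(self, word: str) -> bool:
        return word.lower() in self.lexicon


catalog = ToyCatalog()
# PDF with embedded text; OCR misreads "proper" as "prooer".
catalog.index(PdfDocument(embedded_text="a proper sentence", ocr_text="a prooer sentence"))
# Image-only PDF (e.g. a fax): no embedded text, only OCR output.
catalog.index(PdfDocument(embedded_text="", ocr_text="faxed words"))

print(catalog.search("proper"))  # True: comes from the embedded text
print(catalog.search("prooer"))  # False: the OCR variant is never indexed
print(catalog.search("faxed"))   # False: the fax text exists only as OCR output
```

In this model the misrecognized "prooer" is harmless only because the correct word was already embedded; for an image-only fax, nothing at all becomes searchable, which matches the original report.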